The post raises several points that I wholeheartedly agree with, but the framing is poor and honestly kind of elitist (or just short-sighted). Maybe to the point that I think much of it might just be bait, lol. For example:
> Ask a twenty-two-year-old to connect to a remote server via SSH. Ask them to explain what DNS is at a conceptual level. Ask them to tell you the difference between their router’s public IP and the local IP of their laptop. Ask them to open a terminal and list the contents of a directory. These are not advanced topics. Twenty years ago these were things you learned in the first week of any serious engagement with computers.
What? Computers were everywhere in all kinds of domains by 2006, but you can bet that your average accountant of the time would most likely not be able to SSH into a server (nor should they need to...) I guess it really depends on what the author qualifies as a "serious engagement with computers."
They've basically got the dates wrong. It would make sense if they'd said 35 years ago; that's when this was common knowledge among people who used computers at all.
I'd say almost all of that became redundant for the average person with the Windows 3.1 release (34 years ago) or, maybe more so, Windows 95 (31 years ago).
I remember desperately trying to get two computers to talk to each other so we could play Doom in the early '90s; whatever black magic we had to do seemed to take hours to get working.
The time we had 3 or even 4 computers playing Baldur's Gate together, I swear we started trying to get the machines talking at 7pm and didn't start playing till 10 (but it was amazing).
It obviously depends on local laws, but it's very commonly illegal to sell prepared food without a license/permit. You might not get caught selling food on FB Marketplace, but that doesn't make it legal.
I agree with the author regarding Apple's walled-garden app distribution, but the analogy just doesn't work here.
I'm interested in how the poison data was generated and why it's "practically endless". It looks like bits of code, structured data, and prose, but with small modifications that make it subtly incorrect. Usually off-by-a-few numbers, e.g. I got the text of GPL-3.0 with a copyright date of 2738.
Who says you need to pipe the entire document with JSON-LD directly into the context window? I agree, that is very wasteful. You can just parse the relevant bits out and convert the JSON-LD data into something like your txt format before presenting it to the LLM. Bake that right into whatever tool it uses to scrape websites.
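To make the idea concrete, here's a minimal sketch of that preprocessing step, using only the Python standard library. The HTML snippet and the "path: value" flattening format are made up for illustration; a real scraping tool would plug this in before anything reaches the model's context window.

```python
# Sketch: extract <script type="application/ld+json"> blocks from HTML and
# flatten them into compact plain text, instead of feeding raw HTML to an LLM.
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    """Collects parsed JSON-LD blocks found in script tags."""
    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld and data.strip():
            self.blocks.append(json.loads(data))

def flatten(obj, prefix=""):
    """Turn nested JSON-LD into 'path: value' lines, dropping @-metadata keys."""
    lines = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k.startswith("@"):
                continue  # skip @context, @type, etc.
            lines += flatten(v, f"{prefix}{k}.")
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            lines += flatten(v, f"{prefix}{i}.")
    else:
        lines.append(f"{prefix.rstrip('.')}: {obj}")
    return lines

# Hypothetical page with embedded schema.org product data:
html = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product",
 "name": "Widget", "offers": {"@type": "Offer", "price": "9.99"}}
</script></head><body>lots of markup the LLM never needs to see</body></html>"""

p = JSONLDExtractor()
p.feed(html)
for block in p.blocks:
    print("\n".join(flatten(block)))
```

The point isn't this exact format; it's that the conversion from verbose markup to a token-efficient representation can happen entirely on the client side, in the scraper itself.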
That solves the Token Tax. It fails the Bandwidth Tax.
To get that JSON-LD, you still download 2MB of HTML. You execute JS. You parse the DOM.
You are buying a haystack to find a needle, then cleaning the needle. We propose serving just the needle.
Furthermore, JSON-LD is strictly for facts. It cannot express @SEMANTIC_LOGIC. It lacks the instructions on how to sell.
Anubis doesn't target crawlers which run JS (or those which use a headless browser, etc.) It's meant to block the low-effort crawlers that tend to make up large swaths of spam traffic. One can argue about the efficacy of this approach, but those higher-effort crawlers are out of scope for the project.
Wait, but then why bother with the PoW system at all? If they're just trying to block anyone without JS, that's way easier and doesn't require slowing things down for end users on old devices.
Reminds me of how Wikipedia makes all of its data available, even in a nice format specifically intended for scrapers (I think), and even THEN some scrapers still scraped the site directly, costing Wikipedia enough money that I'm pretty sure they had to address it publicly.
Even then, man, I feel like scrapers could save so many resources (both theirs and Wikipedia's) if they had the sense to not scrape Wikipedia directly and instead follow Wikipedia's rules.
I don't think that's the threat model here. The concern is regarding potentially sensitive information being sent to a third-party system without being able to audit which information is actually sent or what is done with it.
So, for example, if your local `.env` is inadvertently sent to Cursor and it's persisted on their end (which you can't verify one way or the other), an attacker targeting Cursor's infrastructure could potentially compromise it.
The forks aren't actually automatically taken down in most cases. The claimant must list every individual fork in the claim. Which I love, because it's kind of petty but still follows the DMCA to the letter.
Here is an example[1] of the form claimants must fill out.
> Each fork is a distinct repository and must be identified separately if you believe it is infringing and wish to have it taken down
IIRC it took them a couple months to get through all of the Yuzu forks after the initial DMCA and lawsuit. I doubt there were nearly as many forks of Ryujinx, though.