"web previews: I'd do this by making it the client's responsibility."
Actually a good example of how difficult the problem is. A very common attack is to switch a bit.ly link or something like that to a malicious destination. You would also DoS the hosts... as the Mastodon folks are discovering (https://www.jwz.org/blog/2022/11/mastodon-stampede/)
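One way a server-side previewer can blunt the link-swap attack is to resolve the shortener's redirect chain once, at post time, and pin the preview to the final destination, so a later swap doesn't silently change what was verified. A minimal sketch of that idea (the `resolve_final_url` helper and the dict standing in for HTTP 301/302 hops are hypothetical, for illustration only):

```python
def resolve_final_url(url, redirects, max_hops=10):
    """Follow a redirect chain to its final destination.

    `redirects` maps url -> next hop; in a real previewer these hops
    would come from HTTP 301/302 responses instead of a dict.
    """
    seen = set()
    while url in redirects:
        if url in seen or len(seen) >= max_hops:
            raise ValueError("redirect loop or chain too long")
        seen.add(url)
        url = redirects[url]
    return url

# At post time, store the resolved URL alongside the preview; if
# bit.ly later points the short link elsewhere, the stored preview
# still reflects the destination that was actually checked.
hops = {"https://bit.ly/abc": "https://example.com/real-page"}
assert resolve_final_url("https://bit.ly/abc", hops) == "https://example.com/real-page"
```

This only addresses the swap-after-posting case; it does nothing for the DoS side of the problem, which is the point of the linked post.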
For blocks/mutes, you have to account for retweets and quotes, it's just not a fun problem.
Shipping the product is much more difficult than what's in your post. It's not realistic at all, but it is fun to think about.
I do agree that some of this could be done better a decade later (like, using Rust for some things instead of Scala), but it was all considered. A single machine is a fun thing to think about, but not close to realistic. CPU time was not usually the concern in designing these systems.
I'll go ahead and quote that blog post because they block HN users using the referer header.
---
"Federation" now apparently means "DDoS yourself."
Every time I do a new blog post, within a second I have over a thousand simultaneous hits of that URL on my web server from unique IPs. Load goes over 100, and mariadb stops responding.
The server is basically unusable for 30 to 60 seconds until the stampede of Mastodons slows down.
Presumably each of those IPs is an instance, none of which share any caching infrastructure with each other, and this problem is going to scale with my number of followers (followers' instances).
This system is not a good system.
Update: Blocking the Mastodon user agent is a workaround for the DDoS. "(Mastodon|http\.rb)/". The side effect is that people on Mastodon who see links to my posts no longer get link previews, just the URL.
---
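The workaround quoted above could be implemented with a user-agent match at the web-server layer. A sketch, assuming nginx (the regex is the one from the quoted post; the `403` status is my choice, any error response would do):

```nginx
# Reject Mastodon's link-preview fetchers by user agent.
# Side effect (as noted above): Mastodon users see bare URLs, no previews.
if ($http_user_agent ~* "(Mastodon|http\.rb)/") {
    return 403;
}
```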
I personally find this absolutely hilarious. Is that blog hosted on a Raspberry Pi or something? "Over a thousand" requests per second shouldn't even show up on the utilization graphs on a modern server. The comments suggest that he's hitting the database for every request instead of caching GET responses, but even with such a weird config a normal machine should be able to do over 10k/second without breaking a sweat.
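To make the caching point concrete: if identical GETs for a just-published URL are served from a short-lived in-memory cache instead of hitting the database each time, the stampede collapses to a single expensive render. A minimal sketch (the `TTLCache` class and `render_from_db` stand-in are hypothetical, not anything Mastodon or the blog actually runs):

```python
import time

class TTLCache:
    """Serve repeated GETs for the same URL from memory for `ttl` seconds."""

    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self.store = {}  # url -> (expires_at, response_body)

    def get(self, url, fetch):
        now = time.monotonic()
        hit = self.store.get(url)
        if hit and hit[0] > now:
            return hit[1]              # cache hit: no database work at all
        body = fetch(url)              # cache miss: render once, expensively
        self.store[url] = (now + self.ttl, body)
        return body

cache = TTLCache(ttl_seconds=60)
calls = []

def render_from_db(url):
    calls.append(url)                  # stands in for the slow DB query
    return f"<html>post at {url}</html>"

# A "stampede" of 1000 identical requests triggers one render, not 1000.
for _ in range(1000):
    cache.get("/blog/2022/11/post", render_from_db)
assert len(calls) == 1
```

A real deployment would get the same effect from a reverse proxy (e.g. caching at the web-server layer) without touching application code; the point is only that the thousand fetchers are all asking for the same bytes.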
> I personally find this absolutely hilarious. Is that blog hosted on a Raspberry Pi or something? "Over a thousand" requests per second shouldn't even show up on the utilization graphs on a modern server.
Mastodon is written in Ruby on Rails. That should really answer all your questions about the problem, but if you're unfamiliar: Ruby is slow compared to any compiled language, Rails is slow compared to nearly every framework on the planet, and Mastodon itself isn't written that well either.
While that may be funny, the number of Mastodon instances is growing rapidly, to the point where this will eventually need to be dealt with (not least because hosting on a Pi, or running a badly optimized setup, both happen in real life). But more to this example, it shows that passing preview responsibility to end-user clients is an even bigger problem. E.g., not many servers would be able to handle the onslaught of being linked from a highly viral tweet if previews weren't cached.