That would be the natural next step, but it's a question of whether it's worth the engineering and maintenance effort, especially compared to other things that need doing.
For failures that don't take down the datacenter, we already have a hot standby. For datacenter failures, we can migrate to a different host (at least, we believe we can—it's been a while since we verified this). But it would take at least a few hours, and probably the inevitable glitches would make it take the better part of a day. Let's say a day. The question is whether the considerable effort to build and maintain a cross-datacenter standby, in order to prevent outages of a few hours like today's, would be a good investment of resources.
My vote is no. We will all be fine for a day without HN, as today proved. There must be plenty of other ways to improve HN that would have more impact for its users over the remaining 364 days of the year.
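To put "a day" in perspective, here is the arithmetic as a small Python sketch. It assumes, purely for illustration, a 365-day year with at most one such outage in it. The 24-hour figure is the "let's say a day" estimate for migrating hosts, and the 3-hour figure is the length of today's network outage.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a 365-day year

def availability_pct(downtime_hours: float) -> float:
    """Share of the year the site is up, in percent, given total yearly downtime."""
    return 100.0 * (1 - downtime_hours / HOURS_PER_YEAR)

# Datacenter failure handled by migrating to another host: "let's say a day".
migration_day = availability_pct(24)
# An outage of a few hours, like today's (~3 hours).
few_hours = availability_pct(3)

print(f"one day down per year:     {migration_day:.2f}% up")  # 99.73% up
print(f"three hours down per year: {few_hours:.2f}% up")      # 99.97% up
```

A cross-datacenter standby would only recover the gap between those numbers and 100%, under the one-outage-per-year assumption.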
> For failures that don't take down the datacenter, we already have a hot standby. For datacenter failures, we can migrate to a different host (at least, we believe we can—it's been a while since we verified this).
> Question: what are the other things that need doing?
I'm currently working on fixing a bug where collapsing comments in Firefox jumps you back to the top of the page. I'm taking it as an opportunity to refine my (deliberately) dead-simple implementation from 2016.
> But this forum has seen little change over the years and it's pretty awesome as is.
That's an illusion that we work hard to preserve, because users like it. People may not have seen much change over the years but that's not because change isn't happening, it's because we work mostly behind the scenes. Though I have to say, I really need more time to work on the code. I shouldn't have to wait for 3 hours of network outage to do that (but before anyone gets indignant, it's my own fault).
Does that mean it might get more performant? On mobile, the time it takes seems to scale with the number of comments on the page, not the number it actually collapses.
Yes, I certainly hope so. The dead-simple implementation first expands all the comments and then collapses the ones that should be collapsed, so your observation is spot on.
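A rough sketch of the difference, in Python rather than the site's actual front-end code (which isn't shown here). It assumes the thread is rendered as a flat, pre-order list of rows, each with a nesting depth, so a comment's replies are the contiguous run of deeper rows right after it. `Row`, `subtree_end`, `apply_collapsed_naive`, and `toggle` are hypothetical names. The naive version touches every row on the page, which matches the scaling observed on mobile; the targeted version touches only the subtree being collapsed or expanded.

```python
from dataclasses import dataclass

@dataclass
class Row:
    """One comment row in a flat, pre-order rendering of the thread (assumed layout)."""
    depth: int
    collapsed: bool = False  # the user collapsed this comment
    hidden: bool = False     # not displayed because an ancestor is collapsed

def subtree_end(rows: list[Row], i: int) -> int:
    """Index just past the replies to rows[i]: the run of deeper rows after it."""
    j = i + 1
    while j < len(rows) and rows[j].depth > rows[i].depth:
        j += 1
    return j

def apply_collapsed_naive(rows: list[Row]) -> int:
    """Dead-simple approach: expand everything, then re-collapse.
    Returns how many rows were touched; always at least len(rows)."""
    touched = 0
    for r in rows:
        r.hidden = False
        touched += 1
    for i, r in enumerate(rows):
        if r.collapsed:
            for j in range(i + 1, subtree_end(rows, i)):
                rows[j].hidden = True
                touched += 1
    return touched

def toggle(rows: list[Row], i: int) -> int:
    """Targeted approach: flip rows[i] and visit only its subtree.
    Returns how many rows were touched."""
    rows[i].collapsed = not rows[i].collapsed
    end = subtree_end(rows, i)
    touched = 0
    j = i + 1
    while j < end:
        if rows[i].collapsed:
            rows[j].hidden = True
            j += 1
        else:
            rows[j].hidden = False
            # A reply that is itself collapsed keeps its own subtree hidden.
            j = subtree_end(rows, j) if rows[j].collapsed else j + 1
        touched += 1
    return touched

# Usage: a 1,000-row page where the first comment has two replies.
page = [Row(0), Row(1), Row(1)] + [Row(0) for _ in range(997)]
print(toggle(page, 0))  # 2

naive_page = [Row(0), Row(1), Row(1)] + [Row(0) for _ in range(997)]
naive_page[0].collapsed = True
print(apply_collapsed_naive(naive_page))  # 1002
```

Note that the expand branch skips over replies that were collapsed on their own, so expanding a parent doesn't undo a collapse the user made further down the thread.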
I had a lot of help today from one of the brilliant programmers on YC's incredible software team. And there are other people who work on HN, just not full-time since Scott left.