
Discord is entirely down right now, both the website and the app itself. Amusingly, a lot of the sites that normally track outages are also down, which made me think it was my internet at first. Downdetector, monitortheinternet, etc.

Lots of other big sites that are down: Patreon, npmjs, DigitalOcean, Coinbase, Zendesk, Medium, GitLab (502), Fiverr, Upwork, Udemy

Edit: 15 min later, looks like things are starting to come back up




Hacker News is an excellent status page for those cases.


Out of curiosity, does HN use a CDN or any other form of DDoS protection? dang?


Their DNS record points to only one IP (209.216.230.240), which goes to M5 Computer Security.
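For anyone who wants to check this themselves, a couple of ordinary lookups (assuming dig and whois are installed) show the single A record and who the address block is registered to; the exact whois field names vary by registry:

    dig +short news.ycombinator.com a
    # who owns that address block (field names differ between ARIN, RIPE, etc.)
    whois 209.216.230.240 | grep -iE 'orgname|netname|descr'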


It's a hosting company in San Diego: https://www.m5hosting.com/

I host a dedicated server there (running https://www.circuitlab.com/) and when I traceroute/ping news.ycombinator.com, it's two hops (and 0.175 ms) away :)


ówò


[flagged]


This was a BGP/routing issue and has already been documented. Please don't spread misinformation and hysteria, especially on technical issues like this.

https://twitter.com/eastdakota/status/1284253034596331520


[flagged]


It was a misconfiguration we applied to a router in Atlanta during routine maintenance. That caused bad routes on our private backbone. As a result, traffic from any locations connected to the backbone got routed to Atlanta. It resulted in about 50% of traffic to our network failing to resolve for about 20 minutes. Locations not connected to our backbone were not impacted. It was a human error. It was not an attack. It was not a failure or bug in the router. We're adding mitigations to our backbone network as we speak to ensure that a mistake like this can't have broad impacts in the future. We'll have a blog post with a full explanation up in the next hour or so; it's being written now by our CTO.
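Nothing Cloudflare-specific here, but for anyone curious how an outsider can watch a routing problem like this: a traceroute (or mtr, which annotates each hop with its AS number) toward an affected anycast address shows whether your packets are suddenly detouring toward, or dying in, one region. Something like:

    # report mode, wide output, show the AS number for each hop
    mtr -rwz 1.1.1.1
    # or the plain classic
    traceroute 1.1.1.1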


You should really read up on BGP. It really is that flimsy.


It's not a failure, it's a misconfiguration. Being resilient to fat fingers is a totally different issue from being resilient to fires.


I wouldn't count on Keemstar as a reliable source of cyber-attack coverage


It was no such thing. A Cloudflare router advertised some bad routes.


Keemstar? Not a reliable source. The dude has no idea what he is talking about.


Don't think so. They used to use Cloudflare but stopped. To my knowledge, it's a single server without a database (using the filesystem as a database).


So HN is serving 5.5M page views daily (excluding API access) on a single server, without a CDN and without a database?

Holy crap. I am thinking either there is some magic, or everything we are doing in the modern web is wrong.

Edit: The number is from Dang [1]

>These days around 5.5M page views daily and something like 5M unique readers a month, depending on how you try to count them.

[1] https://news.ycombinator.com/item?id=23808787


>Holy crap. I am thinking either there is some magic, or everything we are doing in the modern web is wrong.

Spin up an apache installation and see how many requests you can serve per second if you're just serving static files off of an SSD. It's a lot.
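For a rough feel of the numbers, a quick and admittedly crude benchmark with ApacheBench against a locally served static file makes the point; the URL, request count and concurrency below are just placeholders:

    # 10,000 requests, 100 concurrent, against one static file
    ab -n 10000 -c 100 http://127.0.0.1/index.html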

edit: I see that there are already a bunch of other comments to this effect. I think your comment is really going to bring out the old timers, haha. From my perspective, the "modern web" is absolutely insane.


> From my perspective, the "modern web" is absolutely insane.

Agreed.

I was brought up as a computer systems engineer... So, not a scientist, but I always worked with the basic premise of keep it simple. I've worked on projects where we built all the newfangled clustering and master/slave (sorry to the PC crowd, but that's what it was called) stuff but never once needed it in practice. Our stuff could easily handle saturated gigabit networks with the 2-core CPU running at only 40%. We had CPU to spare and could always add more network cards before we needed to split the server. It was less maintenance, for sure. It also had self-healing, so that some packets could be dropped if the client config allowed it and the server decided it wanted to (but it only ever did on the odd dodgy client connection).

That said, I was always impressed by Google's map-reduce for search results (yes, I know they've moved on), which showed how massive systems can be fast too. It seemed that the rest of the world wanted to become like Google, and the complexity grew for the standard software shop, when it didn't need to imho.

I jumped ship at that point and went embedded, which was a whole lot more fun for me.

Sincerely, old timer


[flagged]


How about we spend our energy fixing systemic/institutional racism first, because language will follow quite naturally.

The other way around surely doesn't work, and is just symbolic gestures without actual change.


There is only so much I can do - I'm not American and that's a change I can't make on my own, aside from doing my best to be an ally when possible.

However, I can open a few PRs and use some of my time to make that change. It's a minor inconvenience to me, and if it makes even one black person feel heard and supported then yeah, I'm gonna do it.


>I think your comment is really going to bring out the old timers, haha.

That is great! :D

>It's a lot.

Well yes, but HN isn't really static, though. It's fairly dynamic, with a huge number of users and comments. But still, I think I need to rethink a lot of assumptions in terms of speed, scale and complexity.


Huge numbers of users don't really mean that much. Bandwidth is the main cost, but that's kept low by having a simple design.

Serving the same content several times in a row requires very few resources - remember, reads far outnumber writes, so even dynamic comment pages will be served many times in between changes. 5.5 million page views a day is only 64 views a second, which isn't that hard to serve.

As for the writes, as long as significant serialization is avoided, it is a non-issue.

(The vast majority of websites could easily be designed to be as efficient.)


There is some caching somewhere as well, probably provides a bit more boost.

I've been at my work laptop (not logged in) and found something I wanted to reply to, so I pulled out my phone and did so. For a good 10 seconds afterwards, I could refresh my phone and see my comment, but refresh the laptop and not see it.


> From my perspective, the "modern web" is absolutely insane.

You know, it should be even better than it was in the past, because a lot of heavy lifting is now done on the client. If we properly optimized our stuff, we could potentially request tiny pieces of information from servers, as opposed to rendering the whole thing.

Kinda like native apps can do (if the backend protocols are not too bloated).


> Holy crap. I am thinking either there is some magic, or everything we are doing in the modern web is wrong.

It doesn't need to be crazy.

A static site on DigitalOcean's $5 / month plan using nginx will happily serve that type of traffic.

The author of https://gorails.com hosts his entire Rails video platform app on a $20 / month DO server. The average CPU load is 2% and half the memory on the server is free.

The idea that you need some globally redundant Kubernetes cluster with auto fail-over capabilities seems to be popular, but in practice it's totally not necessary in so many cases. This outage is also an unfortunate reminder that you can have the fanciest infrastructure setup ever and you're still going down due to DNS.


> The idea that you need some globally redundant Kubernetes cluster with auto fail-over capabilities seems to be popular but in practice it's totally not necessary in so many cases

True, but this is why it shouldn't be bashed either. When you need it, you need it (cue very complex enterprise applications with SLA requirements).


> True, but this is why it shouldn't be bashed either. When you need it, you need it (cue very complex enterprise applications with SLA requirements).

To support this, look at how many people criticize Kubernetes as being too focused on what huge companies need instead of what their small company needs. Kubernetes still has its place, but some people's expectations may be misplaced.

For a side project, or anything low traffic with low reliability requirements, a simple VPS or shared hosting suffices. Wordpress and PHP are still massively popular despite React and Node.js existing. Someone who runs a site off of shared hosting with Wordpress can have a very different vision about what their business/side project/etc. will accomplish compared to someone who writes a custom application with a "modern" stack.


Modern web is a completely broken mess.

We were serving around that traffic off a single dual Pentium III box in 2002 quite happily, on IIS/SQL Server/ASP. The amount of information presented has not grown either.

That little box had some top-tier brands' main corporate web sites on it too, and was pumping out 30-40 requests a second at peak. There was no CDN.


You were not serving that traffic, you were just serving your core functionality - no tracking, no analytics, no ads, no a/b, no dark mode, no social login, no responsiveness. Are most of those shitty? Sure, just let me know when you figure out how to pry a penny from your users for all your hard work.


Oh no, not the dark mode! The sacrifices we have to make for performance I guess...


Easy. We built something that was worth money without all that.

Not a one trick marketoid pony.


Dark mode and responsive webdesign are both good for the user and efficient for the server and user's device.


That means an average of about 63 pages per second. Let's say that the total number of queries is tenfold that, take a worst-case scenario and round up to 1,000 queries per second, and then multiply by ten again to get 10k queries per second, because why not.

I don't know what the server's specs are, but I'm sure it must be quite beefy and have quite a few cores, so let's say that it runs about 10 billion instructions per second. That means a budget of about one million instructions per page load in this pessimistic estimate.

The original PlayStation's CPU ran at 33MHz and most games ran at 30fps, so about 1 million cycles per fully rendered frame. The CPU was also MIPS and had 4KiB of cache, so it did a lot less with a single cycle than a modern server would. Meanwhile the HN server has the same instruction budget to generate some HTML (most of which can be cached) and send it to the client.

A middle of the line modern desktop CPU can nowadays emulate the entire PlayStation console on a single core in real time, CPU, GPU and everything else, without even breaking a sweat.
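Spelled out, the rough arithmetic above (orders of magnitude only, not measurements):

    # ~10 billion instructions/sec over ~10k requests/sec -> ~1M instructions per request
    python3 -c 'print(10_000_000_000 / 10_000)'
    # PlayStation: 33 MHz over 30 fps -> ~1.1M cycles per rendered frame
    python3 -c 'print(33_000_000 / 30)'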

>Holy crap. I am thinking either there is some magic, or everything we are doing in the modern web is wrong.

Magic, clearly.


That's only ~60 QPS. Assume it is peaky and hits something more like 1000 QPS in peak minutes, but also assume most of the hits are for the front page, which contains so little information it would literally fit in the registers of a modern x86-64 CPU.

Even a heavyweight and badly written web server can hit 100 QPS per core, and cores are a dime a dozen these days, and storage devices that can hit a million ops per second don't cost anything anymore, either.


In-memory databases? That's amateurish. Time to make a service that runs out of ymm registers.


Not sure where your 5.5M number came from, but that's only 64 requests per second.

90 to 99% of those are logged-out users, so fully cacheable.

Only a handful of dynamic requests each second remain.


Unlike Reddit, logged in and logged out users largely see the same thing. I wouldn't imagine there is much logic involved in serving personalized pages, when they don't care who you are.


The username is embedded in the page, so you can't do full caching unfortunately. But the whole content could be easily cached and concatenated to header/footer.


You also can't do that every time because of hidden, flagged, showdead, etc.


That could be split as a setting in the user-specific header with the visibility part handled client-side. A bit more work, but it's not impossible if it's worth it.


Oh! I have never looked at HN's frames, etc., but I assumed the header was separate. Thank you!


It's amazing how little CPU power it takes to run a website when you're not trying to run every analytics engine in the world and a page load only asks for 5 files (3 of which will likely already be cached by the client) that total less than 1 MB.


It is certainly doable. PoF famously ran a lot of page views off a single IIS server for a long time.

HN is written in a Lisp variant and most of the stack is built in-house; it is not difficult to imagine efficiency improvements when many abstraction layers have been removed from your stack.


I don't remember PoF being famous for that, but they got a lot of bang for the buck on their serving costs.

What I do remember is that it was a social data collection experiment for a couple of social scientists, who never originally expected that many people would actually find ways to find each other and hook up using it.

I miss their old findings reports about how weird humans are and what they lie about. Now, it's just plain boring with no insights released to the public.


POF was also run by a single dude until he sold it for some 600 million dollars!


There are a couple of old posts about it on the hpc blog.

Nick Craver from SO also once mentioned that they could run SO off a single server; while it wasn't fun, it was doable and had happened at some point.


For all my sibling comments, there is also context to be aware of. 5.5M page views daily can come in many shapes and sizes. Yes, modern web dev is a mess, but the situation is very different from site to site. This should be taken as a nice anecdote, not as a benchmark.


You can serve a lot of flat files from a properly configured server in 2020. It's just that most people don't bother trying.


You don't need a CDN if what you're serving up, as in this case, is mostly all text.

Just need good stable code and server side caching.


Back in 2000 a joke project of mine got slashdotted. Ran outta bandwidth before anything else.


With DO these days, they don't run me out of bandwidth, but my instance falls over (depending on what I am doing, which ain't much). With AWS, they auto-scale and I get a $5000 bill at the end of the month. I prefer the former.


Yeah, the bandwidth overage was a grand, give or take. It was a valuable lesson in a number of ways, and why for any personal things I wouldn't touch AWS with a shitty stick.


AWS only auto scales if you configure it to...


Yeah, but I'm stupid and didn't understand all the switches.


S3 will scale as much as is needed by itself


A lot of modern web technology is inefficient for the sake of being ergonomic. Here's what Hacker News looks like: https://github.com/arclanguage/anarki/blob/master/apps/news/...


From an old-school lisper perspective, the code seems perfectly ergonomic to me.

It's ergonomic in a very lispy way but perfectly reasonably so from the POV of that aesthetic.


Well, it's not magic. So, the other one.


I had this same reaction. Definitely feels like most of what we're doing with "the modern web" is probably wrong.


That's why we have these bloated, over-engineered multi-node applications: people just underestimate how fast modern computing is. Serving ~2^6 requests/sec is trivial.

It's easily served by a simple server.


1M queries per day is ~10 queries per second. It's a useful conversion rate to keep in mind when you see anyone brag about millions of requests per day.


That number is not very big.

I used to host a WordPress site that had 5M pageviews a month on a $10 (and later $20) DigitalOcean instance.

That's WordPress on a shared VPS. I imagine it could be a lot higher if I had a dedicated server and used self-written software.


You have to remember there are a lot of seconds in a day; that's only about 60 QPS.


HN could probably be served by Python running on a fancy laptop.


Everything we're doing is wrong.


Flat-file DBs and mountable DB file systems are the future.


Man, if this is true, these guys have steel balls.


We've been saying this for a while now...


I don't get this comment. What does page-serving performance have to do with "the modern web"? It's not as if serving up a JS payload would make it more difficult to host millions of requests on a single machine; HTML and JS are both just text.


Makes sense based on what I've read about Arc, which HN is written in.

I've been working on something where the DB is also part of the application layer. The performance you can get on one machine is insane, since you spend minimal time on marshalling structures and moving things around.


"They used to use Cloudflare but stopped."

They are still using Cloudflare. Unlike CF, M5 does not require SNI.

   curl --resolve news.ycombinator.com:443:104.20.43.44 https://news.ycombinator.com
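For comparison, the same --resolve trick pointed at the M5 address mentioned upthread (209.216.230.240) checks whether that box answers for the hostname directly; treat it as a diagnostic, not a statement about where HN "really" lives:

    curl -sI --resolve news.ycombinator.com:443:209.216.230.240 https://news.ycombinator.com | head -n 1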


It's hosted on AWS, you don't need DDoS protection, just a big wallet.


That's not even slightly true.

Their IP belongs to AS21581, which is registered to a company called 'M5 Computer Hosting' on the US west coast.

m5hosting.com

The last hop is Santa Barbara.

Definitely does not fall in the AWS ranges.


Hacker News is self-hosted: https://news.ycombinator.com/item?id=22767439. Let me see if I can find a better link where the specs are discussed.


Only if you use their "infinitely scaling" services, e.g. S3. If the attacker is hammering you with expensive queries and your database is on one EC2 server, you're still going to go down.


The nameserver is AWS, but the IPv4 record points to “M5 Computer Security”.


My iPhone actually popped up a message saying that my Wi-Fi didn't appear to have internet, which was strange and obviously false, as I was actively using the internet on it and on the laptop next to it. But now it makes sense: it must have been pinging something backed by Cloudflare!


Discord attempted to route me to: everydayconsumers.com/displaydirect2?tp1=b49ed5eb-cc44-427d-8d30-b279c92b00bb&kw=attorney&tg1=12570&tg2=216899.marlborotech.com_47.36.66.228&tg3=fbK3As-awso

(Visit at your own risk.)

Hack?


I'd be looking at your browser extensions or malware (if you use the Discord app).


Sure you didn't misspell discord?


I've never even heard of the site before. Nor have I searched for "attorney" any time recently.


We operate that site and are using Cloudflare to prevent DDOS attacks. Probably some sort of hash collision...


Crazy stuff!


It looks like you mistyped it and landed on a domain with spammy redirects. They have all kinds of weird URLs and there's not always any connection to anything you did other than go to the wrong domain.


I didn't mistype. I press 'd' and Firefox fills in the site for me.

If I type 'e' I get 'en.wikipedia.org'.

I was redirected.


How can this be reproduced?


Same here!

I even checked to see if an AWS region was down once I realised it wasn't on my side (I thought it might have been my ISP's DNS servers or something).

The next move was to check Hacker News - thankfully it's not also hosted on Cloudflare, ha!


I noticed Discord being down, so I went to check downforeveryoneorjustme; also down. So I figured I'd check the NANOG mailing list; also down :P


Yep, we were down completely. We are quite dependent on Cloudflare (frontend + DNS).



And that is why you host your status page on separate infra.


We're dealing with a deeper problem here. Since a lot of the internet relies on Cloudflare DNS at some layer or another, even many backup solutions fail. With so much of DNS centralised in so few services, such outages hit the core infrastructure of the internet.


A sudden disruption on a large number of services for everybody at once doesn't look like a DNS problem to me, with all the caching in DNS. It would fail progressively and inconsistently.


DNS absolutely was an issue. I changed DNS manually from Cloudflare's 1.0.0.1 (which is what DHCP was giving me) to 8.8.8.8 (Google) and most things I'm trying to reach work. There may be other failures going on as well, but 1.0.0.1 was completely unreachable.
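A quick way to isolate this kind of failure, rather than flipping system settings back and forth, is to query each resolver directly (the timeout options here are just to fail fast):

    # ask Cloudflare's public resolver directly
    dig @1.0.0.1 discord.com +time=2 +tries=1
    # ask Google's resolver for the same name
    dig @8.8.8.8 discord.com +time=2 +tries=1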


I don't use Cloudflare DNS but Google DNS, and I got the same problems as everyone else.

The problem seems to have been resolved now; you might have made the change when they fixed it.


No, I changed the setting back and forth while it was down to confirm that the issue was that I could not reach 1.0.0.1. All the entries I tried from my host file were responsive (which is how I ruled out an issue with my local equipment initially and confirmed that it wasn't a complete failure upstream -- I could still reach my servers). Changing 1.0.0.1 to 8.8.8.8 allowed me to reach websites like Google and HN, and changing back to default DNS settings (which reset to 1.0.0.1, confirmed in Wireshark) resulted in the immediate return of failures. 1.0.0.1 was not responsive to anything else I tried.

Again, it may not have been the only issue -- and there are a number of possible reasons why 1.0.0.1 wasn't reachable -- but it certainly was an issue.


> I don't use Cloudflare DNS but Google DNS, and I got the same problems as everyone else.

Cloudflare is also the authoritative DNS server for many services. If Cloudflare is down, then for those services Google's DNS has nowhere to get the authoritative answers from.


Except the services mentioned in the original post have a TTL of 1h. It's unlikely they would all go down at the same time.


status.discord.com 5 minutes


Given that TTLs are usually very short now, if your DNS server is configured correctly then caching shouldn't make a bit of difference.


I looked at the examples above; all of Patreon, DigitalOcean, Coinbase and GitLab (I haven't checked the others) have a TTL of 1h.
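For anyone checking TTLs themselves: a plain dig shows the remaining TTL from whatever cache you hit (the second column counts down), so to see the configured value you want the authoritative nameserver, e.g. using the one from the dig output elsewhere in this thread:

    # remaining TTL as seen through a caching resolver
    dig +noall +answer discord.com
    # configured TTL, straight from the authoritative nameserver
    dig +noall +answer discord.com @gabe.ns.cloudflare.com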


pr0 tip: don't set your DNS TTL to 5 min (status.discord.com does)


Yeah, but then your status page provider switches THEIR provider, and suddenly you are on the same infra again.


It's hosted by statuspage.io, and their own status page was also down (metastatuspage.com). It is now back up, but their page shows the outage.


Status turtles all the way down, it seems!


Works for me.


I get a DNS servfail when resolving that DNS record, and many others:

    Server:    8.8.8.8
    Address:   8.8.8.8#53
    
    ** server can't find status.discord.com: SERVFAIL
So it's either not just Cloudflare, or all those sites use Cloudflare to host their DNS.


The latter.

    $ dig +short discord.com ns
    sima.ns.cloudflare.com.
    gabe.ns.cloudflare.com.


8.8.8.8 is Google and is down in addition to Cloudflare. 8.8.4.4 was still up.


8.8.8.8 is a caching resolver and it wasn't down. Do you understand how caching resolvers work?


Same. It has a big banner saying CloudFlare is down.


Works for me too... Australia


I can live without creepy instant messengers, but it's shocking just how much everything else relies on one central system. And furthermore, why is it always Cloudflare?


Cloudflare is free and has a nice UI. I manage ~40 domains from about six domain registrars through it; the consistency is great. The caching is a bonus.


idk about the UI; the redesign, with the forced collapses and extra clicks everywhere, is annoying when handling a multitude of subdomains plus their Let's Encrypt TXT entries. I'm there mostly for the freebies.


Discord confirmed Cloudflare is also the reason they're down: https://twitter.com/discord/status/1284237737638461453


Doordash too. My first order. On my wife's birthday.

Ah, well. This too shall pass.


I did a pickup order a few weeks ago when, of all things, T-Mobile SMS went down for 3+ hours. I couldn't go into the restaurant (covid) and I couldn't text them the parking # I was sitting at in a packed parking lot. I got a flood of about 50 texts a few hours later. Sat there for about an hour waiting for a $9 sandwich. I have no idea if they didn't get my order until late, or if they finally realized it was me, or what. About 45 minutes in I decided to just give up on the day and take a nap, and woke up to a door knock.


Kudos to the people at Discord. Just a few minutes after I got disconnected, they had already tweeted about the issue. Some minutes later they had a message in their desktop app confirming it was an issue with Cloudflare. All while Cloudflare's status page said there were 'minor outages'.


Every company rushes to report an outage when they can blame another vendor. Well, that might be hyperbolic, but it sure is a lot easier!


As a percentage of total traffic, a 'minor' outage for Cloudflare probably equates to a significant outage for a non-trivial amount of the internet.

It will also be especially noticeable to end-users, because sites using Cloudflare are typically high-traffic sites, and so a 'minor' issue that affects only a handful of sites is still going to be noticed by millions of people.


"Your nines are not my nines" [0]

[0] https://rachelbythebay.com/w/2019/07/15/giant/


I wonder if they are all using Cloudflare's free DNS stuff or if they're paying for business accounts?

My stuff is on Netlify (for the next week or so) and the rest is on a VPS bought from a local business who isn't reselling cloud resources. I'm kinda glad I moved all my stuff from cloudflare.


I think it's going to be everyone. Some of my free sites are dead, but huge enterprise Cloudflare users (Discord/Patreon/4chan) are dead as well.


> Amusingly, a lot of the sites that normally track outages are also down, which made me think it was my internet at first.

That is why if you have this question, you should go to google.com

My guess is that there are more resources invested in making sure google.com stays up than for any other site on the internet.


Depending on what part we're talking about, it varies. But yeah, just a few.


Crazily, my local name resolution started failing, because I have these name servers: 192.168.0.99, 1.1.1.1 and 8.8.8.8. The first does the local resolution, but macOS wasn't consulting it because 1.1.1.1 was failing?? Crazy. When I removed 1.1.1.1 from the list, everything started working.
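On macOS the order shown in the Network preferences isn't the whole story; scutil shows what the system resolver is actually using, which helps debug this sort of thing:

    # list every resolver configuration macOS currently knows about, in order
    scutil --dns | grep -A 3 'resolver #'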


DNS over HTTPS might bypass your local nameservers.


What was failing was ssh to a local host. I can't imagine that Brew ssh uses DNS over HTTPS.


Yup


Thought something like this was going on. At first I thought it was my router and restarted everything - to no avail. Glad to see confirmation that it wasn't an issue on my end.


Discord works for me, but https://redbubble.com/ prints "Service unavailable".


Freenode's IRC servers were down which was unexpected for me. I was expecting old-school communication networks to not have a dependency on Cloudflare.


I've had no connection interruptions to the three IRC networks I'm connected to. Freenode, EFnet and Hackint.

I loathe Discord, and I can barely contain myself with schadenfreude at this news.


I also had no issues connecting to a few different networks, including Freenode and EFnet.


Ironically, downdetector.com is also down.


Who watches the watchmen?


Would IRC be down?


Same for the German downtime trackers.


Same here. I checked whether Hacker News still works and saw this.



