Seeing such a large web property go down like this is fascinating.
It's like a power plant grid failure, except for attention instead of energy.
When Meta is down, a horde of internet users desperately seek somewhere else to place their attention... But the system is designed with the expectation that Meta will take all that traffic... And boom! Everything starts falling over. Wild.
And rather than any indication that it wasn't working, I was just told to log in again; since the attempts were unsuccessful, I was led to reset my password. Now I have no clue whether the password has been reset properly, whether the old one still applies, whether login with Google is in effect, or whether I will keep being logged out after they fix things.
If your service is down, please say "my service is down".
One weekend I was on call. We had a system we ran for the government, but authentication was handled by a separate government-owned service, hosted and managed by another company.
A user called in early Saturday evening, saying that the system was down. After a bit of debugging, wondering where our alerting had failed, I concluded that the system was perfectly fine, but the authentication service had been returning 500 errors for around an hour. When I called the user back, he made a comment that rather changed how I think about monitoring and systems. He said: "Well, from my point of view it really doesn't matter which part isn't working, it's just down."
I've learned the same lesson, and it's why things like ping checks aren't enough. Even just a wget of the front page isn't enough. You need the monitor to log in and replicate at least part of the user experience if you have a complex setup. It's very embarrassing when someone tells a user "it shows as up from my side" and they're relying on a naïve monitor.
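For anyone curious what that looks like in practice, here's a minimal sketch of a synthetic login probe in Python, assuming a hypothetical site with a form-based login at /login and an authenticated page at /home (real flows with CSRF tokens, SSO, or 2FA need more work):

    import sys
    import requests

    BASE = "https://example.com"  # assumption: the site being monitored
    CREDS = {"username": "probe-account", "password": "pulled-from-a-secret-store"}

    def check_login_flow() -> bool:
        with requests.Session() as s:
            r = s.post(f"{BASE}/login", data=CREDS, timeout=10)
            if r.status_code != 200:
                return False
            # A 200 on the login POST isn't enough: verify we actually reached
            # a page that only renders for authenticated users.
            r = s.get(f"{BASE}/home", timeout=10)
            return r.status_code == 200 and "Log out" in r.text

    if __name__ == "__main__":
        sys.exit(0 if check_login_flow() else 1)  # non-zero exit = page the on-call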
I lament the fact that real error messages are rarely, if ever, displayed anymore. There's just a generic message in their place, usually not even indicating whether it's a 'you' problem or a 'them' problem :(
It amazes me how most smartphone apps, which operate in a wireless environment of likely flaky connections, don't seem to be tested or designed at all to handle loss of connectivity. They tend to just become non-responsive or do erroneous things, like playing the first second of an audio file and then auto-pausing, without reporting any errors.
Exactly. It's a daily thing for users; it's only an edge case for the developers, because they are likely developing and testing on hard-wired high-speed connections.
Good chance most of the phones are virtualized as well, to test across OS builds, and that deploys to real phones aren't tested nearly as thoroughly.
Personally, I despise cutesy error messages that try to make the program sound more human. "Oops, we're sorry this happened! Would you like a nice cup of hot cocoa while we try and figure out what happened? Or should I read you Goodnight Moon again?"
Yeah, but at least it's a bit understandable that it's changed in that direction, and usually you can get a developer-friendly error message if you look at the HTTP responses in the devtools.
I remember one time a company I worked at received a support message where the title was (paraphrased) "DONT HURT MY CHILDREN" from some person who saw an error message saying something about "couldn't dispose of child" or something similar, when the frontend broke. "Child/children" being kind of common in programming, I'm sure others faced similar scenarios.
I'm not sure if they were genuinely scared or just decided to have some fun with us while reporting the issue, but after that we made the error messages even more generic, so nothing could be misunderstood.
I guess it is this quest to not alienate the lowest common denominator that is the reason behind the stupidification of error messages. On one hand, people won't get scared, on the other hand, people get less context about why the error happened in the first place.
Wonder how many of the average users actually care?
>The Bomb icon is a symbol designed by Susan Kare that was displayed inside the System Error alert box when the "classic" Macintosh operating system (pre-Mac OS X) had a crash which the system decided was unrecoverable. It was similar to a dialog box in Windows 9x that said "This program has performed an illegal operation and will be shut down." Since the classic Mac OS offered little memory protection, an application crash would often take down the entire system.
Unfortunately, the Mac's bomb dialog could cause naive users to jump up out of their seat and run away from the computer in terror, because they thought it was going to explode!
And Windows' "This program has performed an illegal operation and will be shut down" error message was just as bad: it could cause naive users to fear they might get arrested for accidentally doing something illegal!
Generally, error messages get watered down to nothingness because it gives news reporters less meat to chew on. Not knowing if it's a "me or you" problem means that small problems might actually get missed and not need any PR response. Details won't leak and produce all kinds of speculation over the causes, etc.
I'd assume the security vulnerability is there no matter what the error message says, but I guess very explicit and verbose error messages might expose details to make it easier to find said vulnerability.
You mean e2e from the mobile app? Because they would have to be able to trigger those failure situations from all the layers of microservices, which they have no control over.
My first thought was that I'd been hacked as well...
I have gotten 4-6 "here's your facebook reset" type emails over the last month. All unrequested. I've always assumed they're casual attacks (mostly interested in seeing if my password can be stuffed into another, more valuable account)
It has nothing to do with Facebook; it's the trend of sending email from alternative domains like facebookmail.com that trains users to trust unknown domain names.
It's particularly egregious with critical stuff like banking.
Password works on THREE other machines, fails completely on ONE.
I was logged in when the outage happened on machine A and phone B - immediately tried to reset password, which took me into the hellish abyss many of us are experiencing now...
This evening - about 18hrs after the event - I fire up machine C - it logs STRAIGHT INTO Messenger. OK... This is something, too scared to open a browser in case that triggers the session disconnect...
So I fire up laptop D - STRAIGHT INTO MESSENGER... OK, open browser, STRAIGHT INTO FB. Log into Google password manager, VISUALLY CHECK password, and it is my last known good password (in use at time of outage).
Fire up iPad E - Straight into Messenger!!! All these machines are on the same network!
Back to Machine A - clear cookies and try to log in, no joy; different browser and try to log in, no joy; try to reset password on this different browser (I might add, I did get a new Change password Token number off this attempt), but no joy; clear cookies and restart, then attempt to log in, no joy!
WHAT THE F?!?!?!?
I initially thought it may have been a 24hr block due to password change attempts? But now not so sure... I've also tried logging in via Machine A in a VM with a different OS to see if that may have something to do with it - but again no joy - this environment had NOT been logged in to FB before...
Same... This was literally the first time I wanted to use my Facebook account in years, so I changed my password twice before I realized that there must be an issue on their end.
I just wanted to use their oauth login to buy a jonsbo n3 case from AliExpress. Sadge
Yeah, and they've dug themselves a hole with having near permanent login persistence. I can't remember the last time on a computer I use frequently that I've had to log in.
I fear my initial attempts to 'reset my password' and getting that same security(at)facebookmail.com email with the SAME reset code, have not helped me - if only I'd been asleep, I probably could have slept through it.
At least Insty came back up - though to be honest I can't remember that password either and am now terrified to try to recover or change that one too!
I have exactly this issue. Been frantically jumping around this morning dealing with perceived security issues and have no idea where my many FB resets led to.
It logged me out and told me that my credentials were incorrect; I thought my credentials had been stolen, so I'm kinda personally glad that it seems to be happening to a lot of other people too. I know that's a bit selfish, but :shrug:
There are some quite harsh comments below. You can't plan for every possible failure point; who knows which part of everything they run went down and triggered this behaviour. Some things you just can't catch or predict, especially in systems as huge as theirs. I would expect people here to understand that and not just call people names over something like this. We all know things seem simple and clear from the outside, but debugging and fixing something like this takes quite some effort.
This is a company with one of the largest digital infrastructures in the world. An outage is understandable, inability to tell they're having an outage and inform users appropriately is not. Stop making excuses for people who are literally awash in resources.
> Stop making excuses for people who are literally awash in resources.
This is a pretty weird outlook to have. Look at any group awash with resources, whether governments or other companies, and you can clearly see that even with those resources, failures still happen.
You can jump up and down and pretend that this is solvable, or you can look at reality, look at all the evidence of this happening over and over to almost everyone, and conclude with some humility that these things just happen to everyone.
(Looking this reality in the face is one of the things motivating my beliefs around e.g. AI safety, climate change, etc.)
It is always better for the company's rep for the issue to have been on your end. Admitting fault comes with a potential liability. It's gaslighting written as an SLA.
You can't plan for every contingency, but you can reserve potentially scary messages for situations where you know they are correct. An unexpected error state should NOT result in an "invalid credentials" error.
Pushing people to unnecessarily reset credentials increases risk. Not only does it increase acute risk, but it also decreases the value of the signal by crying wolf.
The argument here is the kind of nonsense cargo cult security that pervades the industry.
- in general, if the system is broken enough to be giving false-negatives on valid credentials, it's broken enough that there isn't much planning to be done here because the system's not supposed to break. So if they give me "Sorry, backend offline" instead of "invalid credential," they've now turned their system into an oracle for scanning it for queries-of-death. That's useful for an attacker.
- in the specifics of this situation, (a) credential reset was offline too so nobody could immediately rotate them anyway and (b) as a cohort, Facebook users could stand to rotate their credentials more often than the "never" that they tend to rotate them, so if this outage shook their faith enough that they changed their passwords after system health was restored... Good? I think "accidentally making everyone wonder if their Facebook password is secure enough" was a net-positive side-effect of this outage.
So your approach to security is to never admit that an application had an error to a user, but to instead gaslight that user with incorrect error messages that blame them?
This is security by obscurity of the worst kind, the kind that actively harms users and makes software worse.
No. My approach to security is to never admit that an application had an error to an unauthenticated user.
That information is accessible to two cohorts:
- authenticated users (sometimes; not even authenticated users get access to errors as low-level as "The app's BigTable quota was exceeded because the developers fucked up" if it's closed source cloud software)
- admins, who have an audit log somewhere of actual system errors, monitoring on system health, etc.
Unfortunately, I can't tell if the third cohort (unauthenticated users) is my customers or actively-hostile parties trying to make the operation of my system worse for my customers, so my best course of action is to refrain from providing them information they can use to hurt my customers. That means, among other things, I 403 their requests to missing resources instead of 404ing them, I intentionally obfuscate the amount of time it takes to process their credentials so they can't use timing attacks to guess whether they're on the right track, I never tell them if I couldn't auth them because I don't recognize their email address (because now I've given them an oracle to find the email addresses of customers), and if my auth engine flounders I give them the same answer as if their credentials were bad (and I fix it fast, because that's impacting my real users too).
To be clear: I say all this as a UX guy who hates all this. UX on auth systems is the worst and a constant foil to system usability. But I understand why.
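As a concrete illustration of that stance, here's a minimal Python sketch (the names, user store, and hashing are hypothetical, and a real system would use a slow password hash like bcrypt or argon2): unknown user, wrong password, and a broken credential backend all collapse into the same generic response, and the comparison is constant-time so response timing doesn't reveal which accounts exist.

    import hashlib
    import hmac

    # Hypothetical in-memory user store standing in for a database lookup.
    USERS = {"alice@example.com": hashlib.sha256(b"correct horse").hexdigest()}
    DUMMY_HASH = hashlib.sha256(b"dummy").hexdigest()
    GENERIC_FAILURE = ("Invalid credentials", 401)

    def lookup_password_hash(email):
        # In a real system this hits a database/service and may raise when it's down.
        return USERS.get(email)

    def authenticate(email, password):
        try:
            stored = lookup_password_hash(email)
        except Exception:
            # Backend failure: log and alert internally, but give the
            # unauthenticated caller the same answer as a bad login.
            return GENERIC_FAILURE
        candidate = hashlib.sha256(password.encode()).hexdigest()
        # Constant-time compare; compare against a dummy hash for unknown users
        # so timing doesn't leak which email addresses are registered.
        if hmac.compare_digest(candidate, stored or DUMMY_HASH) and stored is not None:
            return ("OK", 200)
        return GENERIC_FAILURE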
You are absolutely correct. That would be a much better experience.
That said, getting there strikes me as pretty challenging. Automatically detecting a down state is difficult and any detection is inevitably both error-prone and only works for things people have thought of to check for. The more complex the systems in question, the greater the odds of things going haywire. At Meta's scale, that is likely to be nearly a daily event.
The obvious way to avoid those issues is a manual process. Problem there tends to be that the same service disruptions also tend to disrupt manual processes.
So you're right, but also I strongly suspect it's a much more difficult problem than it sounds like on the surface.
> That said, getting there strikes me as pretty challenging. Automatically detecting a down state is difficult and any detection is inevitably both error-prone and only works for things people have thought of to check for. The more complex the systems in question, the greater the odds of things going haywire. At Meta's scale, that is likely to be nearly a daily event.
Well, in principle, the frontend just has to distinguish between an HTTP 500 status (something broken in the backend, not the fault of the user) and a 4xx status code (the user did something wrong).
The "your username/password is wrong" message came in a timely manner. So someone transformed "some unforeseen error" into a clear but wrong error message.
And this caused a lot of extra trouble on top of the incident.
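A rough sketch of that distinction on the client side (the endpoint and payload are hypothetical): 5xx becomes "our problem, try later" and 401/403 becomes "check your credentials", instead of funnelling every failure into "wrong password".

    import requests

    def login(session: requests.Session, email: str, password: str) -> str:
        r = session.post("https://example.com/api/login",
                         json={"email": email, "password": password}, timeout=10)
        if r.ok:
            return "Logged in."
        if r.status_code >= 500:
            # Backend trouble: don't imply the user's credentials are wrong.
            return "Something broke on our side; your password has not changed. Please try again later."
        if r.status_code in (401, 403):
            return "Invalid credentials."
        return f"Unexpected response ({r.status_code}); please try again."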
But there's something off here. I wouldn't expect to be shown as logged out when the services are down. I'd expect calls to fail with something like a 500 and an error saying "something happened on our side", not all the apps going haywire.
At the scale of Meta, "down" is a nuanced concept. You are very unlikely to get every piece of functionality seizing up at once. What you are likely to get is some services ceasing to function and other services doing error-handling.
For example, if the service that authenticates a user stops working but the service that shows the login form works, then you get a complex interaction. The resulting messaging - and thus user experience - depend entirely on how the login page service was coded to handle whatever failure the authentication service offered up. If that happens to be indistinguishable from a failure to authenticate due to incorrect credentials from the perspective of the login form service, well, here we are.
At Meta's scale, there's likely quite a few underlying services. Which means we could be getting something a dozen or more complex interactions away from wherever the failures are happening.
Isn't this just the standard problem of reporting useful error messages? Like, yes, there are academic situations where you can't distinguish between two possible error sources, but the vast majority of insufficiently informative error messages in the real world arise because low effort was applied to doing so.
Yes, with the additions of sheer scale, a vast number of services, multiple layers, and the difficulty of defining "down" added in. I think the difficulty of reporting useful error messages is proportional to the number of places an error can reasonably happen and the number of connections it can happen over, and by any metric Meta's got a lot of those.
No, in that detecting when you should be reporting a useful error message is itself a complex problem. If a service you call gives you a nonsense response, what do you surface to the user? If a service times out, what do you report? How do you do all this without confusing, intimidating, and terrifying users to whom the phrase "service timeout" is technobabble?
> If a service you call gives you a nonsense response, what do you surface to the user?
If this occurred during the authentication process, I think I would tell the user "Sorry, the authentication process isn't working. Try again later." rather than "Invalid credentials". And you could include a "[technical details]" button that the user could click if they were curious or were in the process of troubleshooting.
> If that happens to be indistinguishable from a failure to authenticate due to incorrect credentials from the perspective of the login form service, well, here we are.
If you can't distinguish those, then that is bad software design.
Come on, use a little imagination. The DNS entry for the DB holding the shard with the user credentials disappears. The code isn't expecting this and throws a generic 4xx "because security" instead of a generic 5xx (plenty of people writing auth code will take the stance all failures are presented the same as a bad password or non-existing username); the caller interprets this as a login failure.
The same auth system is used to validate logins to the bastions that have access to DNS. Voilà.
> plenty of people writing auth code will take the stance all failures are presented the same as a bad password or non-existing username
Those people would be wrong. You can take all unexpected errors and stick them behind a generic error message like "something went wrong" but you should not lie to your users with your error message.
If you have different messages for invalid username vs invalid password, you can exploit that to determine if a user has an account at a particular service.
"Invalid credentials" for either case solves this problem.
But sure, let's report infra failures differently, as "unexpected error".
Now, what happens if the unexpected error is only when checking passwords, but not usernames?
Do you report "invalid credentials" when given an invalid username, but "unexpected error" when given a valid name but invalid password?
If so, you're leaking information again and I can determine valid usernames.
So, safe approach is to report "invalid credentials" for either invalid data or partial unexpected errors.
Only time you could safely report "unexpected error" is if both username check and password check are failing, which is so rare that it's almost not worth handling. Esp. at the risk of doing wrong and leaking info again.
If you really want to hide whether a username is in use, then you also have to obscure the actual duration of the authentication process among other things. The amount of hoops you need to jump through to properly hide username usage are sufficient that you need to actually consider if this is a requirement or not. Otherwise, it is just a cargo cult security practice like password character requirements or mandated password reset periods.
In this case, Facebook does not treat hiding username usage as a requirement. Their password reset mechanism not only exposes username / phone number usage, but ties it to a name and picture. So yes, Facebook returning an error that says credentials are incorrect when it has infrastructure problems is absolutely a defect.
What if, when one service doesn't respond at all or responds with something that doesn't fit the expected format, the whole thing just says "sorry, we had an error, try again later"? If it has to check both at the same time, and can't check them independently, wouldn't that solve the vulnerability? Or am I missing something? Totally understandable if I am, I just want to learn /gen
Yea, the wife came to me in a bit of a panic that her Facebook account got hacked. I tried logging in to FB to check if I had been unfriended, and I also got errors indicating my password was incorrect. My FB password is 96 bits from /dev/urandom in a GPG-based password manager I wrote for myself a couple decades ago. So, no my password wasn't wrong, and I'm not a big enough target for someone to put enough effort into figuring out how to snarf up my password data and crack my GPG passphrase.
Anyway, when FB thought my password was wrong I calmed way down. I thought maybe FB corrupted their password DB or something, so I just tried to reset my password, got into an odd workflow loop, and then quacked "downdetector facebook".
That's actually really cool. I hadn't considered writing my own password manager, but I feel like it'd be a fun and fairly useful project. Did it take you particularly long to do? I'm interested in giving it a go :D
The heavy lifting is done by GPG in a subprocess, taking information on stdin or outputting the decrypted data on stdout. The rest is just generating the passwords, organizing the encrypted files, and perhaps interacting with the clipboard.
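Roughly that shape, as a hedged sketch in Python (the gpg flags are standard; the file layout, recipient, and password length are whatever you choose):

    import secrets
    import subprocess

    def generate_password(bits: int = 96) -> str:
        # 96 random bits -> 24 hex characters.
        return secrets.token_hex(bits // 8)

    def encrypt_to_file(plaintext: str, recipient: str, path: str) -> None:
        # gpg reads the secret on stdin and writes the encrypted file to `path`.
        subprocess.run(
            ["gpg", "--quiet", "--yes", "--encrypt",
             "--recipient", recipient, "--output", path],
            input=plaintext, text=True, check=True)

    def decrypt_from_file(path: str) -> str:
        # gpg prompts for the passphrase (via its agent) and prints the plaintext.
        out = subprocess.run(
            ["gpg", "--quiet", "--decrypt", path],
            capture_output=True, text=True, check=True)
        return out.stdout.strip()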
Yes. My spidey sense went off and I told my work I'll be off for an hour while I redo all my passwords... might still do that but glad to know it's not necessarily me getting hacked.
I called out some comment for being racist a little earlier (yeah I know, just report and move on...) and figured they'd managed to pwn my account somehow. Good to know it's not just me.
In an "anything's possible" sense then yeah. But the fact that FB was not letting me login with the credentials I knew to be correct was directly attributed to a global outage, rather than a me-specific issue. Which I can now verify by checking the devices that are authorised to my account.
So you're saying you do own your racism. Well good for you, one of the brave racists -- now we know what kind of a person you really are. But it doesn't mean you're right, it just means your opinion is worthless and you're not worth debating because you're an intellectually dishonest bigot, even worse for believing in scientific racism.
Edit:
Your beloved scientific racism is not reality, it's a pseudoscience, as foolish and wrong as Astrology and Phrenology and Homeopathic Medicine. You're still an intellectually dishonest bigot.
If you're so intellectually honest and sure of yourself, then why don't you state right now unequivocally for the record that you're an unrepentant racist bigot?
Scientific racism, sometimes termed biological racism, is the pseudoscientific belief that the human species can be subdivided into biologically distinct taxa called "races", and that empirical evidence exists to support or justify racism (racial discrimination), racial inferiority, or racial superiority. Before the mid-20th century, scientific racism was accepted throughout the scientific community, but it is no longer considered scientific. The division of humankind into biologically separate groups, along with the assignment of particular physical and mental characteristics to these groups through constructing and applying corresponding explanatory models, is referred to as racialism, race realism, or race science by those who support these ideas. Modern scientific consensus rejects this view as being irreconcilable with modern genetic research.
Scientific racism misapplies, misconstrues, or distorts anthropology (notably physical anthropology), craniometry, evolutionary biology, and other disciplines or pseudo-disciplines through proposing anthropological typologies to classify human populations into physically discrete human races, some of which might be asserted to be superior or inferior to others. Scientific racism was common during the period from the 1600s to the end of World War II, and was particularly prominent in European and American academic writings from the mid-19th century through the early-20th century. Since the second half of the 20th century, scientific racism has been discredited and criticized as obsolete, yet has persistently been used to support or validate racist world-views based upon belief in the existence and significance of racial categories and a hierarchy of superior and inferior races.
And if your grandmother had wheels, then she'd be a bicycle.
Since you just don't get it, and you're such an intellectually dishonest unrepentant racist whose opinions are so worthless they should be dismissed, I will explain it for you:
A statement in this form is always true:
"If <something that is false>, then <anything in the world you want to make up, true or false, no matter how stupid of implausible>."
Because <something that is false> like "If the opinion is based on facts that are true but inconvenient" means that you can say anything you like after that, such as "the intellectual dishonesty is dismissing them", and the entire statement is true, because the condition is false.
I know that's going to whoosh right over your head, but in other words, it's false that your opinion is based on facts that are true but inconvenient. Your opinion is based on lies and pseudoscience, and it is false, which is inconvenient for you.
Gino D'Acampo "If my Grandmother had wheels she would have been a bike" -18th May 2010:
Same same. I went through the password reset flow (I was overdue anyways), it never sent anything to my SMS, so I did it again with email, reset the password and went to log in with the new password, "Incorrect password" error. Old password, also incorrect.
Didn't help that I had just posted a lukewarm spicy take on how linguistic prescriptivism is BS.
All the while the website felt like it was unstable, hard to describe, but it felt like it was bouncing around between URLs too much and reloading a lot.
Definitely feels like a botched update on their end.
E: Instagram is misbehaving as well, banner loads but big "Something is wrong" error on the feed.
E: now youtube has "Something went wrong" - WTF. I can't believe I'm saying this, but thank goodness for reddit and X[itter]???
E: interesting, seeing a big spike across multiple platforms on downdetector, including AWS: https://downdetector.com/status/aws-amazon-web-services/ I'm not able to log in right now, but that could be PEBCAK, I have too many saved IDs and I don't want to fail2ban myself
Downdetector reports have gone down, but for me it's still bugged out. I've been catching a livestream on YouTube all along, though. Meta's stock is back up from the dip, so I take it some regions are restored to normality.
I heard that for a while Netflix would fail open if auth was unavailable. Like, it's just movies, just let 'em see it.
Facebook data is more sensitive. Not so much the data people go there to see, cool memes that their friends liked, but the list of friends and interests.
Other places I worked had the ability for Ops to push out a change saying the site was down for maintenance. After a while we stopped using it and just took the hit of a bunch of 5xx errors. Basically when the planned down times became shorter than the time to propagate the down setting.
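That Ops-pushed "down for maintenance" switch can be as simple as a flag checked in front of the app. A minimal sketch as WSGI middleware, with a hypothetical flag file standing in for whatever config-push mechanism Ops actually used:

    from pathlib import Path

    MAINTENANCE_FLAG = Path("/var/run/site-maintenance")  # hypothetical: Ops creates/removes this

    class MaintenanceMiddleware:
        """Serve a 503 maintenance page while the flag exists; otherwise pass through."""

        def __init__(self, app):
            self.app = app

        def __call__(self, environ, start_response):
            if MAINTENANCE_FLAG.exists():
                body = b"Down for maintenance. Back shortly."
                start_response("503 Service Unavailable",
                               [("Content-Type", "text/plain"),
                                ("Content-Length", str(len(body))),
                                ("Retry-After", "600")])
                return [body]
            return self.app(environ, start_response)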
Likewise, I started a password reset process that won't complete, and asked my wife to double-check my account wasn't compromised and posting cryptocurrency crap or somesuch.
On a psychological note, I think the threat detection part of our brain doesn't always notify our conscious thought that it's actively monitoring for threats. I've often noticed that when I'm carefully handling a hot frying pan then my ringing phone is more likely to startle me than usual.
That makes sense. I've noticed too that my brain seems to have a threat pre-emption module as well as a threat reaction module. For example, I'll sometimes be walking and texting at the same time, only to stop in my tracks and suddenly realize that there's a hidden stair in front of me.
I have an active Instagram account. Today, coincidentally, was the first time I thought of promoting my best few posts as an ad to improve reach to fellow photographers. Bad luck! The app now seems to be back online, but my ads, which were supposed to run for 48-72 hours, show <Disabled>. And there is a new "Pay Now" link, even though it was paid for, and the ad spend already shows that some of that payment was used.
As an individual, this is pretty confusing. I don't have much to lose. I am glad I spent 10-15 bucks on my favorite 2-3 posts only. I can imagine many others being more affected. What is normally to be expected for SME users? Does Meta resume the ads automatically? Do they make good on the lost time, since the clock seems to be ticking -- although no one saw any impression?
Edit: I submitted a ticket to Instagram Help and they responded by asking for a screencast video. The first time I sent it, the video bounced. I have re-sent it after trimming the video.
Out of curiosity I want to know firsthand how Meta handles the small customers.
If we get a huge surge of "Uncommitted" votes to Biden as we did in Michigan, you can argue this can cause significant pressure on him and might even lead to his ousting...
This would be a point for sure, except this is a primary and the only people who can vote in it are registered party members. Young people who are actually registered already care enough to know it's the primary today.
This one is already decided though. Biden on the Democrat side, Trump on the Republican. If anything it might hurt the Trump side a bit, as people may not realize that the Supreme Court only yesterday ruled states can not block him.
There’s all the down-ballot elections, and states like Texas have completely open primaries.
And states like Texas often have nearly one-party rule, so the primary pretty much is the election.
Because of the party that currently owns the non-urban parts of Texas, I usually vote in that primary, despite not voting for its candidates in the general elections.
> This would be a point for sure, except this is a primary and the only people who can vote in it are registered party members
Can't you register the same day as the voting happens? Seems utterly stupid that you have to register to vote to begin with, but if it's a requirement, you should at least be able to register the day of the voting.
Facebook and other social media platforms have played somewhat of a role in the 2016 election, for example. Meta has been under a lot of pressure for that, compare e.g. this recent Instagram change: https://about.instagram.com/blog/announcements/continuing-ou...
I can't get into the US-local conspiracy-mongering, but I'm not sure how you've missed social media becoming a hot-spot for election-day information/misinformation?
Shutting down social media has gone to the top of the list for regimes either "attempting to fabricate positive election results" or "attempting to combat the spread of misinformation about elections".
More sympathetically, for better or for worse (definitely the latter) there will be people trying to look up election information ("what are my local polling hours", "who is on my ballot") on social media websites, who will now not be able to be guided to the correct information.
Messenger is often used to coordinate logistics among friends. I would not be surprised if a Meta disruption led to at least momentary confusion if someone was planning on carpooling. Not to mention it messes with "get out the vote" posting.
I'm not very conspiracy-minded but this does smell a little weird.
At the very least, "there's a big event in the country of one of our biggest userbases, maybe hold off on risky deploys until tomorrow"
Actually I can see this morphing into a Russia interference conspiracy.
If the main social media used by liberals goes down, while the one used by Trump (forgot what it's called) stays up, surely that is an advantage for him?
In an election between the two parties I think that would be a very likely narrative but in this case it's the primaries. And (as others have mentioned) neither primary has really had anything resembling a competition at this point so it's more or less irrelevant to the result (most likely).
For example, you can then pose the conspiracy theory that the Russian-interference claim was itself a conspiracy to justify more social media policing during the real election.
Perhaps, but why would downtime increase policing? Something like a large-scale fake news campaign (i.e. basic deep fakes or something) is easier to pull off (generally speaking) and would be more likely to cause increased policing than a short outage.
Not relevant to the primary election in any way, but this has been more than annoying to me already.
We have a small livestock operation, and won an online auction late last night for a pig about four hours away. Facebook was the only listed means of contacting the person, and we were planning on driving to pick it up this morning.
Now I get to re-arrange my day today to deal with that, and will probably have to take a PTO day from work to drive there later in the week.
Real businesses are in fact impacted by Facebook being down - including those not based around Facebook and that you might never expect.
Did they imply anything as ridiculous as needing facebook to vote?
The implication is merely that everyone would otherwise have been talking about voting and the results, ie, simply higher than normal traffic. (I'm not sure it would be that much but that was the reasonable interpretation of the comment.)
I agree with the first part, but the second is taking it too far. Plenty of people use Facebook Messenger for communication about semi-important things. I'm not seeing how that would affect the primaries, but it is not just for vacation pictures.
Sweden: logged out from all devices, and Google authentication to reset the password results in an error. Authorization codes sent via SMS to reset the password seem non-responsive. Authorization codes sent by e-mail work, but setting a password results in the set-password page two times in a row and, after the second try, "An unexpected error occurred. Please try logging in again."
It seems to not just be Meta - sites like Downdetector[0] are showing a spike in reported issues from AWS, Google services, and X/Twitter as well. I noticed issues with Google myself.
The sparklines on Downdetector's homepage can't be compared to each other. Spikes that look similar can actually have a difference of several orders of magnitude. Only Meta's services have truly large spikes.
That's true and an excellent point. I commented about the reported issues elsewhere mainly because I experienced them myself (google.com and drive.google.com not loading or being extremely slow to load content). That could be entirely sympathetic though - people having issues with Meta flooding other services and briefly overwhelming them.
Everyone panicked that they got hacked. I had a slim hope that I also got hacked and that I would not bother to recover my account and just roll without FB :) I'm too weak to quit myself.
I know your intention is to help, but please don't share your FB password (if that wasn't obvious already lol). Letting randoms log into your FB account will just have massive consequences with your friends and family thinking it's you talking to them etc.
That's what Facebook used to be. I think they really lost their way when they went from "useful tool" to "let's try to get users to spend as much time as possible".
I dream of a "facebook-like" app where you can only add someone as friend via a bluetooth protocol, forcing you to only add people that you've met in real life. Then text only, or with very limited image options.
That's my best guess. And for them to log out every user in the world makes me incredibly curious about what would have happened if they'd chosen a different course.
I think I’ve experienced 2 or 3 global session resets on fb in my life. Usually followed by some kind of reason they had to protect everyone. This probably isn’t great, hopefully a precaution not an active exploit.
Note that this web page only covers the status of Meta's offerings for business users. This doesn't track the downtime of Facebook, Instagram, WhatsApp, Messenger, etc. as normal users experience them.
I think the downtime associated with other services could just be people choosing alternative sites for their social media time.
Facebook + Insta make up a huge share of the social media market, and when they go offline, it'd be natural for their competitors to receive large sudden upticks in traffic they're not immediately prepared for on a Tuesday morning.
Could this be related to Google's log in page change? Seemed cosmetic only, but funny timing that Google's page update happens the same time all these log in issues pop up.
They've been previewing that change for weeks, and I wondered: why do they need to change it at all? Is it some product manager justifying his need to exist?
Based on their hype banners, I was ready for a major overhaul, too. Or at least something obviously different about the login flow. It sure looks like somebody just clicked the left-align button and spit-shined the typeface a tiny bit.
Which, I guess, is the best possible redesign: one that freshens up without rocking the boat.
> Could this be related to Google's log in page change? Seemed cosmetic only, but funny timing that Google's page update happens the same time all these log in issues pop up.
That rollout is staggered over time, so not all users receive it at the same time. It's unlikely to be related.
I can’t answer your question, but when I was at Google I made a mistake that caused ads serving on Google results to become unclickable. For the postmortem they had me calculate (I don’t think a dollar amount but) the number of ad clicks that would have happened during the time it was down. Of course I looked up average cost per click rates. Not sure if I could share even if I remembered, but it really put things in perspective.
Overall it was a good learning experience. I didn’t get reprimanded; several months later I got a promotion.
This is rough napkin math, no need to downvote if anyone knows the real number and this is way off :)
Meta 2023 ad revenue was $131 billion. To make it easy, let's assume an even spread for # of users and ad revenue generation per hour/minute of the day and day of the year (which I'm sure is not the case).
This would be:
$358 million per day
$15 million per hour
$249k per minute
This also assumes a minute of downtime won't be somewhat or totally offset by a spike in users when it comes back online.
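The same napkin math, spelled out (with the same even-spread assumption the comment already flags as unrealistic):

    annual_ad_revenue = 131e9  # Meta 2023 ad revenue in USD, per the comment
    per_day = annual_ad_revenue / 365
    per_hour = per_day / 24
    per_minute = per_hour / 60
    print(f"${per_day / 1e6:.1f}M/day, ${per_hour / 1e6:.1f}M/hour, ${per_minute / 1e3:.0f}k/minute")
    # -> $358.9M/day, $15.0M/hour, $249k/minute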
Naively, divide ad revenue by time to get a dollars-per-time.
But that's naive, because ad serving isn't totally sold out, so they can make up for it by increasing the density of ads in the next time window. If the outage is short, then the impact is small.
But some markets are totally sold out and there’s no making up for lost impressions.
This is something I've thought about a while back. Like Facebook probably has a "maximum number of ads shown to users per post" value. So theoretically, they have a ceiling for how many ads can be bought in a specific time frame before having to increase the ceiling/find new users.
> they can make up for it by increasing the density of ads in the next time window
Not only that but the bigger spenders will have more budget so the bidding after a large outage should return higher bids on average leading to increased profit per ad slot.
Possibly, but increasing ad density is usually negative for performance. So it'll probably end up being bad for Meta as a whole, as people spend more but don't get more value out.
From my experience you'll receive a partial refund - and in some instances like inexplicable overspending, etc. - you won't receive anything. This may be an exception given a full sitewide outage, though
When you've more-or-less monopolized a lot of the web's content sharing you get to tell your clients to pound sand. Where else they gonna go? Twitter? The incel white supremacist dollar is not what advertisers call "the good dollar".
For me it happened when my long-dormant Messenger lit up (for a perfectly valid reason, no surprise there) and made me go through those unbundling options presumably mandated by the EU. Certainly a coincidence, but it does irrationally strengthen that satisfying feeling I get whenever I see the ad giant stumble.
Session timed out. I'm in Macedonia. In the quick login menu there was another person with name and profile photo beside my name and profile photo. It seems the girl was from my country, don't know her. I thought I was hacked. This is messed up.
IG is not working as well. Feeds, profiles, messages are all blank.
This is so funny because the status page itself actually shows 500 Internal Server Error to me in the API calls. So the status page clearly isn't isolated from the FB network itself. I highly suspect it is either a BGP convergence issue, or their OIDC service hits the dirt.
I had immediately started the account hijack process which, funnily enough, is also down. I was kicked off Messenger and Facebook and then my password was rejected by the login page. This whole flow made it seem like an account takeover and not an authentication outage.
Tangent to this but I saw a link to metastatus last week and thought it would be a status page for other services' status pages. This makes it sound like a useful thing now, too bad the name is taken.
The cable cut would only affect the Arab world, or maybe India, but European-to-India communication doesn't depend on those cables that much, so it's unlikely; so probably not.
In a lot of developing countries, Facebook is often the only app that a significant part of the population uses. And that includes young adults as well.
My Messenger and FB on my phone magically logged back in a day or so ago, but on my PC, no luck - and that PC doesn't seem to recognise the password I thought it was before the outage.
Interesting how such large-scale outages can ONLY happen because of human error, in this case a poorly thought-out heuristic. Like, a literal explosion could hardly achieve this level of disruption when there are physical failover protections everywhere.
Neither is the only Facebook status page I found, interestingly. https://metastatus.com/ At least, it shows everything green (except for WhatsApp Business API).
They can't log in to update the status page about facebook login because they use facebook login for auth on the status page!
Joking aside, I wouldn't be surprised if this was the case, because during their last major outage (something related to DNS iirc) they were having issues pushing fixes because they couldn't login because their DNS was down.
EDIT: seems like the status page was recently updated.
> I don't really trust Down Detector, which despite the claims is really People Winging on Twitter Detector.
I'm confused. Isn't listening for spikes in complaints about outages a great way to detect them? I know for a fact some service companies monitor social media channels for this purpose (among others). I'd be surprised if that wasn't more or less standard practice.
I've checked Down Detector for ISP outages in my area many times now. It's always confirmed them before my ISP did.
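For what it's worth, the basic version of that monitoring isn't complicated. A rough sketch of complaint-stream spike detection (the data source, window, and threshold are all assumptions for illustration):

    from collections import deque
    from statistics import mean, pstdev

    class ComplaintSpikeDetector:
        def __init__(self, window_minutes: int = 60, threshold_sigmas: float = 4.0):
            self.history = deque(maxlen=window_minutes)  # complaints per minute
            self.threshold_sigmas = threshold_sigmas

        def observe(self, complaints_this_minute: int) -> bool:
            """Return True if this minute looks like an outage-sized spike."""
            spike = False
            if len(self.history) >= 10:  # need some baseline first
                baseline = mean(self.history)
                spread = pstdev(self.history) or 1.0
                spike = complaints_this_minute > baseline + self.threshold_sigmas * spread
            self.history.append(complaints_this_minute)
            return spike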
> Isn't listening for spikes in complaints about outages a great way to detect them?
When there's a major ISP outage, people report problems with all the major sites. When Facebook's down, people report problems with any site that has "Login with Facebook" as an option.
It's almost never actually an outage impacting all of FAANG at once.
> It's almost never actually an outage impacting all of FAANG at once.
Exactly. If you click through down detector when things are _up_ you'll see people still complaining that $site is down. Could be a local power outage or even a flaky connection in their own home.
Down Detector is one of many signal sources and should have a "credibility" score associated with it that's proportional to the number of people complaining that something's down.
I can guarantee you with 100% confidence from experience that the call centers for AT&T, T-Mobile, Comcast, etc. are all blowing up right now because of users who assume that if the Instagram app isn’t loading it means the “wifi” is broken. Also keep in mind “wifi” doesn’t mean 802.11, it means “anything related to the internet” up to and including 4g/5g and Ethernet.
Heh, as soon as I saw Instagram failing to load, I immediately assumed it was Rogers' fault. They just suck when it comes to reliability, and Instagram has a much better track record.
The important step is to filter downdetector from your consciousness. It only exists as rage/cable news bait and nothing more. It is not a useful tool, it’s just a clever way to serve AdWords iFrames.
> do you really think there are masses of people who can’t tell the difference between a single sign on service being down and individual sites being down and reporting it to downdetector?
Absolutely without a doubt.
99.9% of people don’t know what single sign on means or how it works.
> do you really think there are masses of people who can’t tell the difference between a single sign on service being down and individual sites being down and reporting it to downdetector?
Ahh, I see. In that case, most of DownDetector's data comes from Twitter and other sources, not first-party reporting, although even in the case of first-party data, it is also sourced via "visits to DownDetector", which can be from a simple Google search for "is Instagram down?"
If DownDetector relied primarily on direct reporting, they'd be the last to know.
> do you really think there are masses of people who can’t tell the difference between a single sign on service being down and individual sites being down and reporting it to downdetector?
Yes, absolutely. 100%.
> Even if there were doesn’t the outage graph give you exactly the information your asking be curated?
> When Facebook's down, people report problems with any site that has "Login with Facebook" as an option.
If users log into your site with Facebook, then the login functionality of your site effectively is down when "Login with Facebook" is down.
From the user's perspective, your subcontractors, including authentication subcontractors, are a problem for you to deal with and never show them. From your perspective, you could have architected your site in a way that logging in doesn't "go down" when Facebook login is down.
If the user chooses "Login with Facebook" over other authentication options available, and they don't want to use other options, educating them with a good error message might help. Or you could remove the Facebook login option, if you (totally reasonably) don't want Facebook's failures to reflect poorly on you.
> If users log into your site with Facebook, then the login functionality of your site effectively is down when "Login with Facebook" is down.
There are plenty of sites where "Login with Facebook" is a convenience but hardly the only way to log in. Reddit, for example, has "Login with Google" and "Login with Apple"; it would be highly misleading to claim "Reddit is down" if Google's OAuth flow was having an outage.
> educating them with a good error message might help
Nothing in the API or OAuth flow would make that doable in an automatic fashion with this outage. It'd have to be something you put up manually as a banner after hearing of the outage.
> Or you could remove the Facebook login option, if you (totally reasonably) don't want Facebook's failures to reflect poorly on you.
I don't particularly care; we're talking about why DownDetector isn't necessarily ideal for assessing. It can be a useful signal, in some scenarios, but I've seen plenty of spurious signals come from it.
> Nothing in the API or OAuth flow would make that doable in an automatic fashion with this outage. It'd have to be something you put up manually as a banner after hearing of the outage.
That is fair: if I choose to architect my site such that a user-critical feature goes down when a 3rd party service goes down, it behooves me to monitor the 3rd party service and do whatever necessary to properly inform users what's going on.
I edited my post unfortunately after you replied, but another option is removing the parts of your site that rely on 3rd parties, if you don't want the failures of those 3rd parties to reflect poorly on you (which they reasonably would).
>we're talking about why DownDetector isn't necessarily ideal for assessing. It can be a useful signal, in some scenarios, but I've seen plenty of spurious signals come from it.
Indeed, and if a bunch of users say that a feature of your site is down, even if it's a result of a 3rd party failure: chances are, that part of your site is down, and it's partially your fault for relying on a 3rd party for that feature. The users correctly don't care what the root cause is, they expect you to either mitigate it or don't have a feature they rely upon be unreliable.
Ignore the comments on DownDetector for a moment and check out that huge spike in reports recently. Clearly something went wrong with AWS's user experience. That's something AWS needs to resolve, in the eyes of their users.
>The chart shows a big spike this morning, but there was no AWS outage
Are you sure? If hundreds of users simultaneously reported there was some sort of outage, particularly a huge spike like we saw, chances are there was an outage.
>Again, DownDetector can be a useful "is something unusual happening right now" signal
Exactly! Specifically, "is something unusual happening right now with my site, in the eyes of my users?" Every site owner should know when that condition is true. What you think about your site "up-ness" isn't as important as what your users think about your site "up-ness". What you attribute your downtime to, isn't as important as what your users attribute your downtime to (you.)
> Clearly something is going on with AWS's user experience.
But that's not the case. It's a false positive.
Pick a DownDetector service and open the page every day for a few days. You'll see it most of the time just reflects people waking up in the US timezones.
Is it a false positive, though? The data shows there was an outage. We would need more evidence to conclude hundreds of users, at that 1 spike, weren't actually having issues.
In other words, we have hundreds of people saying there was an outage, and 1 person saying there wasn't.
That's a problem AWS needs to resolve, regardless of what they think might be the root cause. If the users weren't experiencing any issues with AWS, I doubt they'd be reporting it.
Your comment about timing is a good point: if people are working with AWS early in the day, and AWS is giving them problems, then they will probably report problems with AWS early in the day. I wouldn't expect them to report problems while they're sleeping.
Hundreds of users, representing more users who didn't bother reporting, say they experienced issues when interacting with AWS this morning, so we'll need better evidence to the contrary to conclude otherwise.
The fact that some people accessed AWS without reporting issues does not mean that all people did. For those who had issues, AWS is responsible for dealing with those perceptions.
Indeed, it could have been a fault that affected a subset of users, for example 1 service in 1 availability zone. That's still an outage in the eyes of users, which AWS is responsible for managing. It could have been an issue with a route from 1 ISP. That's still an outage in the eyes of users, which AWS is responsible for managing.
An even better example is the DownDetector page for Facebook, with hundreds of thousands of reports. Do we really think there's no correlation between what DownDetector reports and what users experience?
tl;dr: what users think about your site is more important than both what you think about your site and the reality of your site, and you should be tracking it.
> When there's a major ISP outage, people report problems with all the major sites. When Facebook's down, people report problems with any site that has "Login with Facebook" as an option.
Yes? That's how all top-level reporting is going to work. It's not going to tell you which part of your service is inaccessible. It's just telling you that people can't access it. You obviously have to do additional investigation to figure out why people are having trouble.
Even here on HN, where people should know better, people take its incorrect attribution as useful info. TikTok isn't down. X isn't down. Google isn't down.
I would completely agree that people are bad at interpreting Down Detector-type results, but that doesn't mean it isn't providing a very useful signal.
Indeed I haven't noticed any blip in functionality, but then again I don't ever do FB (or other external service) login. Absolutely no reason to do so, long term drawbacks are too serious to be lazy about this.
That's kinda the point though isn't it? DownDetector is showing an early indication of a major outage in both of your examples. The issue may not be caused by the indicated service, but it's still a useful information source especially when we can correlate reports on there with what we are seeing in our internal monitoring.
A big spike on DownDetector is an indication of something going on.
Its attribution of what/who is often incorrect. You'll see "maybe it's more than Big Site X!" comments come up on every HN thread like this citing DownDetector; it's almost never the case, and folks on HN should know better.
The problem is the source of the reports and display of the reporting.
I'd trust Down Detector a lot more if it was filled with Hacker News community -- people who are able to understand that there's "DNS" and "Routing".. and that your phone can have internet access at home while your home PC does not.
I personally hate Down Detector's graphing because it can make it 'look' like there's an issue when there isn't really... Facebook with 500,000 reports looked as down as Google with 1,000 reports... For equally sized / used entities, I would not trust that "Google" is down with 1,000 reports. I had a coworker ask me what was going on with the internet because "everything is down.. Facebook, google, gmail, microsoft!" (when seeing the Down Detector home page)
DD should normalize the graphs against the service history in some way. A service shouldn't spike because it had 30 reports / hour for a day, then suddenly has 100... when it has a history of being out with 100,000+ reports. The 100 reports are probably mis-reporting, but you can't tell until you dig into each service, one by one, with separate page loads.
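As a rough sketch of that kind of normalization (the report counts and peaks below are invented for illustration): score each service's current reports against its own typical chatter and worst-ever spike, so 100 reports registers for a small service but not for Facebook.

    def normalized_severity(current_reports: int, historical_peak: int, typical_hourly: int) -> float:
        """0 ~ normal background chatter, 1 ~ as bad as the worst outage on record."""
        excess = max(current_reports - typical_hourly, 0)
        peak_excess = max(historical_peak - typical_hourly, 1)
        return min(excess / peak_excess, 1.0)

    print(normalized_severity(100, historical_peak=100_000, typical_hourly=30))  # ~0.0007: noise for Facebook
    print(normalized_severity(100, historical_peak=150, typical_hourly=30))      # ~0.58: notable for a small service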
The OP means there is a lot of collateral noise from people who are just tech savvy. E.g. "oh no, I can't access Facebook, my internet must be down. Let me log in to Down Detector to file a complaint against my ISP"
In the Twitter operations area there was a big TV that streamed searches for #failwhale etc. It was actually very useful to detect problems with Twitter by looking for people complaining about Twitter on Twitter.
YouTube was definitely doing something weird that doesn't seem likely connected to Facebook.
A couple hours ago after watching a video I went to my home page, which usually shows recommendations based on what I've recently watched plus a few videos labeled as sponsored that have nothing to do with any of my interests.
Instead everything on the home page was either a sponsored video, or a movie that was free to view with ads, or something from one of their music products.
I tried from an incognito window to see if it had something to do with being logged in. Normally going incognito loses the history-based recommendations but at least recommends user-uploaded content. But now it was just like my logged-in home page. No user content. Just ads and videos from Google's movie and music services.
Refreshing gave an error that said something went wrong. I then logged in on that page and again got something went wrong. Another refresh got a page with some user content. Another refresh was the ads and Google stuff page.
A little later it seemed to clear up and now my home page is back to normal.
Yeah, it's wild that it's now treated as an authoritative source, especially by some news organizations.
It's as good as asking a neighbor what happened with a loud noise down the street. Sometimes you'll get something good, sometimes it'll be completely wrong.
Downdetector is nice because it answers my question of "is anyone else having issues with this?" When it takes AWS an hour to even acknowledge "increased error rates", and tells me that everything is a-ok in the meantime, I want another perspective.
Twitter's search used to be my go-to for this - a search for "AWS down" would typically be very illuminating - but it's tough to get it to genuinely spit out the most recent tweets with a keyword these days.
> Yeah, it's wild that it's now treated as an authoritative source, especially by some news organizations.
> It's as good as asking a neighbor what happened with a loud noise down the street. Sometimes you'll get something good, sometimes it'll be completely wrong.
Asking my neighbors if they know what some loud noise was or about some local disturbance has been extremely reliable in my experience. The one time someone gave me an explanation about something which wasn't mostly right they qualified it with something like "So-and-so said it might be such-and-such but I don't know if it's true".
You must have an exceptional neighborhood. Everywhere I've lived, here's a handy map of "actual cause" :: "what the neighbors said it was":
Car exhaust :: gunshot
Appliance delivery truck liftgate :: gunshot
Transformer explosion :: gunshot
Garbage truck :: gunshot
787 at 25000ft :: complete ruining of peace and quiet
Any police activity :: probably someone robbed a bank
For the record, my city has (statistically indistinguishable from 0) homicides and bank robberies and, by American standards (I know, I know) no particular issues with gun crime.
I can imagine it being different in a city. I'm in a fairly quiet suburban area.
One time I heard a loud boom. A few hours later I saw a neighbor outside and asked if he'd heard it and if he knew what it was. He told me a house a few neighborhoods over had exploded. I was a bit skeptical of it but he turned out to be right.
I trust Down Detector more than the (majority of) companies who stay silent during outages.
Hell, I'm surprised Down Detector hasn't been outright sued, given that its graphs are an honest representation of availability that shitty companies cannot hide.
“Yeah so it turns out when Facebook and Instagram goes down so does Google”
I do not envy the SREs at either company. I'm pretty sure all those other ones use Facebook or Google as their OAuth provider which is why they are all being reported as down.
It's real. Single core performance improves all the time. People overestimate how much power it takes to handle lots of queries per second on a well-tuned system and well-written software in 2024.
I see the "sorry, we are receiving too many requests, try again in a few minutes" error several times a day on here. I don't think that HN is reliably able to handle the amount of users it currently has.
I believe that's by design if you send an action request very quickly after a previous one. It's very easy to replicate. Open a post. Then click the upvote button and very quickly click the favorite button too. That will trigger it. I think it's used to rate limit.
>"Sorry, we're not able to serve your requests this quickly" is our little server process saying "help, I only have a single core and I'm out of breath here". If your account were rate limited it would say something like "You're posting too fast, please slow down."
That quote is from dang, linked in one of the ancestor comments. But I still suspect you are correct.
I have been using HN daily since I was a teenager. I've seen that message maybe 10 times outside of serious issues in last 15 years. It's strange to me that it happens so frequently for you.
There's a manual rate limit that can be assigned to your account if you post frequently on controversial topics. Afaik once it's there it stays until removed by a moderator.
I've been seeing this multiple times a week for the past couple of years. It's gotten worse since 2020. I think that they are preparing to upgrade it, or did upgrade it?
That is a generic rate limiter that is independent of system load. As far as I can tell if you make more than one request per every 5 seconds, you will always be served the rate limit page.
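For what it's worth, that behavior would be consistent with a simple per-client fixed-interval limiter. A rough sketch of the idea in Python (the 5-second interval comes from the observation above; everything else is made up for illustration, not HN's actual code):

    import time

    MIN_INTERVAL = 5.0      # assumed seconds allowed between requests per client
    last_served = {}        # client id (e.g. IP or account) -> timestamp of last served request

    def allow(client_id, now=None):
        """Serve the request only if the client has waited out the interval."""
        now = time.monotonic() if now is None else now
        previous = last_served.get(client_id)
        if previous is not None and now - previous < MIN_INTERVAL:
            return False    # this is where the "requests this quickly" page would render
        last_served[client_id] = now
        return True

    print(allow("198.51.100.7", now=0.0))   # True
    print(allow("198.51.100.7", now=2.0))   # False: only 2s since the last served request
    print(allow("198.51.100.7", now=7.5))   # True again

A limiter like this fires regardless of system load, which would match the "independent of system load" observation above.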
I looked up the CPU mentioned in the link from your other comment. It looks like HN handles enormous traffic on about 2x the power of the last Celeron chip ever made.
It's not a lisp thing. Many lisps are capable of multithreading, including implementations of Common Lisp that have had it far longer than HN has been around, and Clojure, which is extremely good at it.
It even looks like Arc, the lisp HN is written in, has threads now, but Arc is built on top of Racket and uses Racket's green threads, so it only takes advantage of one CPU core. Racket does have OS threads, but Arc does not use them.
Down Detector is so unreliable. People who can't call an AT&T phone via Verizon will think (and report) that Verizon is down, when it's really AT&T. People who try logging in using Facebook's one-click login and can't get in will think TikTok is down. It's not all that useful. I hate when journalists cite it.
It has false positives and noise for sure, but it's also very sensitive and shows issues very quickly.
I wouldn't trust it as a single source, but in a case like this where our internal monitoring shows a spike of issues with the Google APIs and we can see a huge spike in reported issues for Google on Downdetector starting at the same time, it's useful to confirm that the issues have an external source.
It's only slightly better than "my mom claims". My mom would ask if I had the internet at my house. Yup. all of it. in a rack in my bedroom closet. She'd also report the "internet is down" when a single website was having issues. To me, that is down detector carrying on the legacy of moms everywhere.
and something is usually happening. The issue is that a lot of end users (the people Down Detector pulls reports from) don't understand the systems well enough to point to where that something is, and will often misattribute it, which is what both parent and grandparent are claiming.
Eh, mostly it's people misunderstanding what it represents.
If I can't login to tiktok because FB is down, then tiktok is effectively down for me. When it comes to technology most people don't care about the trip, they care about the destination.
So yeah, TikTok isn't "down", but for a lot of people it might as well be; hence coupling your infrastructure/auth to other providers has side effects like this that you must take into account.
I made that same mistake after seeing someone post an unlabeled set of graphs to a Slack. The Google peak reported outages is about 0.25% of the Facebook peak. It seems reasonable some people just made a mistake.
> The fact that twitter was usable the whole time does.
That's an assertion, not a substantiation. A single tweet does not corroborate that, even if you ignore the fact that most outages of large global services (including some of the outages of these Facebook properties mentioned above) are actually partial degradations.
Most of those seem OK for me now, and DD agrees. This seems to have been a temporary blip for all of them, possibly some kind of service switchover/fallback "not entirely unrelated" to the Meta outage?
Edit: actually a more attractive theory, given the very short timelines and near simultaneity of all those failures, is that downdetector itself had a failure, possibly a Meta-dependence, that they noticed and corrected quickly.
Interesting to see that all static content was still working during the outage (at least for Instagram).
It was still possible to swipe through all reels (I assume the list was cached).
All the electronic candy/soda vending machines in our office are also not working. Shudder to think of the chain of dependencies inside these machines.
A lot of sites have features that say "log in with your X or Y account." They connect to each other somehow; I never studied that protocol. I wonder if authentication failures across services could be tied to it.
For process of elimination, do all of these services do multi-platform logins? Or do some not connect to anyone else?
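The protocol behind most of those "log in with X" buttons is OAuth 2.0, usually with OpenID Connect layered on top. A rough sketch of the relying site's half of the authorization-code flow, with placeholder endpoints and credentials (every provider publishes its own real URLs, none of which are shown here):

    import json
    import secrets
    import urllib.parse
    import urllib.request

    # Placeholder values for illustration only.
    AUTHORIZE_URL = "https://idp.example.com/oauth/authorize"
    TOKEN_URL     = "https://idp.example.com/oauth/token"
    CLIENT_ID     = "my-app-id"
    CLIENT_SECRET = "my-app-secret"
    REDIRECT_URI  = "https://myapp.example.com/callback"

    def login_redirect():
        """Step 1: send the user to the identity provider to authenticate."""
        state = secrets.token_urlsafe(16)   # anti-CSRF value, checked when the user returns
        params = {"response_type": "code", "client_id": CLIENT_ID,
                  "redirect_uri": REDIRECT_URI, "scope": "openid email", "state": state}
        return AUTHORIZE_URL + "?" + urllib.parse.urlencode(params), state

    def exchange_code(code):
        """Step 2: the provider redirects back with ?code=...; swap it for tokens."""
        body = urllib.parse.urlencode({
            "grant_type": "authorization_code", "code": code,
            "redirect_uri": REDIRECT_URI,
            "client_id": CLIENT_ID, "client_secret": CLIENT_SECRET}).encode()
        with urllib.request.urlopen(urllib.request.Request(TOKEN_URL, data=body)) as resp:
            return json.loads(resp.read())  # access_token, id_token, etc.

The dependency is the point: if the provider's authorize or token endpoint is failing, step 1 or step 2 fails, and the relying site looks "down" to anyone who only ever signs in that way, even though the site itself is healthy.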
Yeah, I had issues accessing GCP's documentation site for AlloyDB around 8:00-9:00ish Eastern this morning. The page just said 'Service Unavailable'. Had to use Google Cache.
If your election integrity relies on Facebook, YouTube, or even DNS to be up... there are bigger issues.
Actually, if all of them including Xitter went down, maybe things would get better? All the sunlight photons might get sucked in by too many eyeballs though, and there could be grass trampling.
What's your fave conspiracy theory? Massive cyberattack for Super Tuesday? Powers-that-be mandated takedown? Mossad sleeper agents activated? Covid-brain struck that one engineer attending to that one wire that kept everything going?
This is also happening to me. When I try to reset my password using their password recovery I get "An unexpected error occurred. Please try logging in again." For some reason my phone number is not working as a way to get a recovery code, same error. I can't get the recovery code sent from the app or from my browser on my laptop, but when I use my chrome browser on my pixel it sends me a recovery code, which results in "An unexpected error occurred. Please try logging in again."
Seems like it. Pretty messy too because it sometimes pushes you to reset your password which then doesn't work so there are going to be a LOT of reset email codes floating out there.
Agreed. Probably could be a much better UX for handling a mass outage like this. Graceful, clear error messaging that FB login is down would be better than the current UI.
Triggering millions of people to unnecessarily reset their password yet still be unable to login is not a great UX. This seems like one of those cases that's high impact when it does happen, never likely to occur on any given day, but likely to happen at some point; probably just wasn't much focus put on handling a case like this.
From a process/QA perspective I doubt this can ever be properly tested.
Sure you can set up a UX to show that the auth server is somehow down and discourage users from trying to login/reset passwords, but when shit hits the fan, you actually never know the precise error that gets thrown to the client because it could be any layer between the backend and the client that failed...
The domain reads a lot like "metastasis" which fits quite well with the social media landscape and oodles of terrible "suggested" content on Facebook in general
Down here in South Africa too. It's weird that it's not affecting everyone, yet it's worldwide. It's almost as if it's taking down a percentage or section of addresses/UIDs in their database... I wouldn't be surprised to hear that everyone affected joined the app around a similar date, or has sequential UIDs.
To me this sounds like more than a bug.. I wouldn't be surprised if it's a hack.
I agree with this 100%! I do not believe this is just a little bug. This is more. There’s been a lot of things happening that people are trying to ignore. But this just adds to it..
I find it remarkable that Typepad, my blog host, not only isn't down but also is MUCH FASTER than its usual slothlike, approximately 5 seconds response time.
I've known several people who, when they got signed out, couldn't remember their password or access their email, and so made a new account. I'm sure Facebook will see a spike of this.
Yeah, I'm hoping it was just an auth server outage and browser sessions weren't invalidated, because this basically describes my situation, but I won't have access to the only device I own that still (hopefully) has an active login session until the weekend to find out.
Google login also seems to be having issues, multiple people reported to me that the login isn’t working and they’ve been logged out of their Google accounts.
Yes, I tried logging in today in two distinct Google accounts on separate Chrome profiles and it would sign me out in about ~ 5 seconds after logging in. And the login process was very sluggish.
Yup. That it’s related to the elections is also predictable, due to stress.
Made worse in big corp due to affirmative action + lack of enough qualified candidates meeting diversity criteria.
Which is inevitable when you have coarse criteria applied to such a large industry this way so quickly, as it takes decades for anyone to be qualified for the senior roles, and many years for junior/mid level, even if there were no pipeline issues, which there are.
And unqualified folks in leadership, and mid level == stupid mistakes.
And, with the DOL rules, the company can’t even pay people differently, so no bueno even giving the high performers keeping things afloat better bonuses - unless they happen to meet the diversity criteria and it makes the stats look good.
Which is already hard enough to do properly when there is only one dimension, and impossible when there are 2-3.
so the bigger the company, the faster it has to cut its own throat.
Bwahaha, just wait until you see the shrapnel flying over the next year.
You don’t think the steady erosion in system reliability and ever increasing outages is unrelated to these pressures do you?
I've seen the sausage being made at the middle-manager level in big corp for a long time. It's never any one person or hiring decision, but the pattern and its impact have been obvious (and getting unavoidable) for a long time.
That no one seems to want to talk about the actual issues, and instead resorts to character assassination and blacklisting (like this comment), is part and parcel of the problem.
> You don’t think the steady erosion in system reliability and ever increasing outages is unrelated to these pressures do you?
Outages have steadily decreased at major companies. I don't know what you're looking at.
Remember AWS taking out a good chunk of the internet many times a year because their east coast data center kept going down? Remember the fail whale meme-ing because Twitter was so unstable?
Industry site reliability has only gotten better over the years.
Bwaha, so now everything is actually getting better and more reliable in big corp land!
I’m sure AT&T, Google, Facebook/Whatsapp/Meta, BofA, Apple, MS, and many others who have had prominent massive outages and embarrassing product launch failures this year will be happy to hear this.
Notably, Amazon is one of the few companies that has managed to avoid a lot of the DEI noise somehow. Perhaps due to their reputation for having such a brutal work culture already?
I can’t wait to hear what you’re going to say next.
Big Corp Software quality improving AND running faster on existing hardware?
Your spitball is flying in a completely different universe than my spitball. Imo anytime you have layoffs that cut so deep, you cut through informal capability, knowledge and relationships that take a long time to form. If anything DEI helps create internal resilience because the personal networks end up a little different, giving you wider and more angles of coverage.
HN talks about people in open source holding up major functionality with little to no recognition. That happens within corporations too. Indiscriminate layoffs may directly fire those people, or signal to them that it's better to move elsewhere leaving gaps that only get discovered over time.
Of course, and not breaking anything with layoffs is already hard when your sole criteria as a manager is ‘are they effective’. Which it’s never been that simple, especially in big corp, but it’s waaay more difficult now.
And so what happens when you’re required by the gov’t and leadership to also comply with coarse grained population statistics AND you can’t find qualified people that meet those statistics enough? On top of having to make layoffs?
My ex was a reasonably qualified software engineer, and even 4 years ago was getting no-interview offers because she was a woman - as explicitly stated by the recruiters.
It’s only gotten worse since then for hiring managers. She was offended because they literally didn’t seem to care if she was qualified or not.
I can provide links to signed and in-force legal agreements between the DOL and Google, for instance, which formalize the need for this, and can point toward public court records of internal emails between recruiters which state the same, btw.
Actual job qualifications (as in skills) did not enter the conversation at all. Just coarse-grained DEI attributes.
So then they end up disproportionately cutting from the non-protected tranches (the groups that DO have to be qualified to stay) first because your stats still have to look good. I’m not saying DEI folks overall have no one qualified or hard working in them - rather, that there are little to no structural incentives for them to be. In many cases, they’re also unfireable/unlayoffable.
And eventually, the non-protected folks leave, burn out, or give up because f-this. Why do so much extra work when you literally can’t even get paid more for it, or be recognized because it will piss everyone else off?
And even if you’re superhuman on that front - everyone burns out eventually. Which is also why you tend to see what you see in Open Source.
It’s never any one decision, but stochastic movement in this direction has been relentless and inevitable.
I had just added Google Authenticator as a two-factor login for both Instagram and FB not ten minutes before it happened, so I thought something had gotten messed up when setting that up. I got a message on FB saying my session expired, and when I tried to log back on I got an unexpected error. Then I thought I was hacked, until my husband confirmed the same on his account.
In my country (Georgia), I received dozens of reports from people saying that after the outage, they have been offered to sign in to other users' Facebook accounts and were successful in doing that. I can't confirm it, but it appears that this was happening to accounts that were co-admins of Facebook pages.
I can access what I call my family FB account, which I run on Firefox.
When I try to access my general purpose account I'm forced to log in again. And when I try to change my password I get the "An unexpected error occurred. Please try logging in again." message.
I suspect that one of the password servers has been compromised.
I tried to do a SMS verify when I couldn't login, and the text never came. That service must be overloaded. That's when I realized it was an outage. Hundreds of thousands (millions?) must be doing that.
Also, going to /r/facebook doesn't load, heh, there must be per-subreddit load issues?
Shortly before, I received password recovery codes to my email from all the Facebook accounts I have.
My email requires 2FA, as does my facebook account, but when I went there and clicked "forgot password" there was an unknown email address added to my account. That shouldn't be possible.
Glad I found this. I literally just told my boss I spilled my coffee on my desk so I could take a break and figure this thing out. F'ing FB... Thank you to whoever started this post. I tried resetting my password many times and also thought I was HaX0red.
I received a highlight from a friend on FB tagging me in a KFC promotion. I clicked the ad for the promotion and liked, followed, and texted @highlight; after that I was completely logged out of FB and Instagram, with my password being rejected, etc.
I was logged in on mac, and the page refreshed and logged me out without any interaction from me, while I was reading a post. Then I went to check on my phone and I got the "We had a problem with the page you tried to reach." dialog box.
I was logged out of all devices and told my password was incorrect. It looks identical to a hijacked account attack and was quite scary until I started seeing similar stories from others.
Not only Facebook; Google logged me out as well, and I had trouble logging back in. Downdetector shows nicely that almost everything had a hiccup. What are the chances of this?
Every outage there is a discussion about how these status pages are failing to adequately notify and describe the problem. Is there anyone out there doing it right?
They recently started demanding that I log in on a phone app to make some choice about using Messenger on iPad. That is pretty much the end of Messenger for me.
Does FB still have that corporate chat/slack competitor offering? Personally I’m glad for the FB “forced” break, but may not be great for those users..
> They are not down. (They just don't work for lots of people!)
Seems about par for the course for big tech these days. There's currently an issue affecting the Google Ads API causing timeouts when sending data to it, but the Google Ads Status Summary page shows nothing [0]. However, there's an incident detail page showing some vague hand-wavy information about incidents [1], which appears to be unreachable from anywhere on the Summary page. Gah.
P.S.: The actual incident-detail URLs are only available in the "RSS feed" link very transiently and tend to disappear -- and the feed, incidentally and for fun reasons, is actually an Atom feed.
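If you'd rather machine-watch a status page than keep reloading it, that Atom feed is the useful hook. A small sketch using only the Python standard library; the feed URL below is a placeholder, since (as noted) the real incident-detail URLs move around:

    import urllib.request
    import xml.etree.ElementTree as ET

    ATOM = "{http://www.w3.org/2005/Atom}"

    def latest_incidents(feed_url, limit=5):
        """Return (updated, title, link) for the most recent entries in an Atom feed."""
        with urllib.request.urlopen(feed_url, timeout=10) as resp:
            root = ET.fromstring(resp.read())
        results = []
        for entry in root.findall(ATOM + "entry")[:limit]:
            title = entry.findtext(ATOM + "title", default="")
            updated = entry.findtext(ATOM + "updated", default="")
            link = entry.find(ATOM + "link")
            results.append((updated, title, link.get("href") if link is not None else ""))
        return results

    # Usage, substituting the real feed URL from the status page's "RSS feed" link:
    #   for updated, title, href in latest_incidents("https://status.example.com/feed.atom"):
    #       print(updated, title, href)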
According to them, WhatsApp Business on-premises solutions have had issues since the end of February.
But it also looks like WAB is the only product with API issues according to the status page.
I'd say "down for lots of people" is the same as "having widespread issues." If it was just a single person, then it would be annoying but I wouldn't assume it's a system problem.
My point was just that, of course, the status page is not reflecting reality.
It's on AWS, but likely just has issues with provisioned capacity now that it's actually being hammered.
Meta has a small collection of tools on AWS to deal with large SEV0 events like these. Another one of them is a basic communication tool that does not use Meta's own servers for anything (including auth), a super basic version of the internal SEV tool.
That might be the FB traffic instead going to the content producers? I mean half the internet users need to go somewhere else if they want procrastination.
My heart was pounding when I couldn't log in and couldn't reset my password just now. Seeing this HN post brought me back down to normal. I'm now going to move away from FB, Google, and all the other major tech companies I rely on.
Greetings from Indonesia. Here too, the same error. Lots of people are asking what is wrong with FB and IG, and there's lots of speculation about an update, but the repair process is taking too long without any confirmation from Meta. Something strange is happening.
Frankly, I don't buy this explanation, technically or logically, from both a devops and a systems architecture level. There is no way in hell a company like Meta is pushing database design changes this significant straight to production. We all know how many times these database architecture changes get run in staging, then in production subsets, before rolling out to production at large.
Should we assume the teams working to ensure business continuity and application resiliency redundancies fell asleep at the wheel?
Also, are we to assume that no downtime or outage messaging went out during a fairly routine, maintenance-based outage?
I call BS.
Lol, I can't find a scenario where this happens at a company of Meta's scale.
The 2021 outage was worse: it completely killed network access to FB data centers for hours, and even led to issues with FB employees accessing offices (since the badge readers were also offline). It was so bad it got its own Wikipedia page: https://en.wikipedia.org/wiki/2021_Facebook_outage
Google logged me out of an account and I had trouble logging back in. downdetector also reports Google having issues and friends also had trouble with both.
FWIW for me Google seems relatively stable, but I tried logging into a lesser used gmail account on a lesser used browser, successfully logged in for like 10 seconds, then got logged out.
It's very spooky when supposedly the two companies who probably aren't sharing infrastructure seem to go down at the same time...
Everyone beware,
The “Black Hat Hackers” that have accomplished this massive Global hack represent a Clear and Present Danger, immediate and significant threat/danger to National Security, as well as all People’s personal and financial Security
I have a very tangential question to this situation prompted by some conspiracy theories linking the outage to today being Super Tuesday election day in the US.
Why does Facebook need news-media sharing or political ad targeting on its platform from a business point of view? Am I being naive, or is it really such an important revenue driver for the business or the core experience of the app?
I think it is the source of enormous reputational damage and risk, such that if I were running the company I would happily trade 10% or more of the market cap to remove any feature that enables news sharing on the platform.
Actually, if you remove those two things (political ad targeting and commenting on news media) I struggle to find any other issue that would make Facebook a "political" target, and they could literally shut down the fake-news division that employs 10,000 people...
Why do you think they do this to our social media and our phones? To have control? To track us? To shut us down so we have nothing? What are your thoughts on what they are trying to do? As in, their plans, or what they are trying to accomplish?
Given Facebook is like 10% of total Internet bandwidth, I'm curious what happens when all that traffic suddenly gets "redirected" elsewhere by people going to other sites and platforms. Could YouTube cope with a sudden 10% uptick in streams for example?
Edit: YouTube seems now to be defaulting to a lower bitrate, leading me to guess it's a demand issue.
Anyone else find this shit weird? First the nationwide outages across phone carriers, mainly AT&T, Verizon, and T-Mobile... a massive outage... and now Facebook and Instagram. Then banks were having outages, and pharmacies... and when the phone outages occurred, mine was working fine and my fiancé's was not. I just find all this weird... feels like there's more to it!
I'm seeing a lot of non-FB services down too. Mostly AWS-based, but not all. My original conspiracy thoughts (this being Super Tuesday in the US) are giving way to thinking it's some low-level routing issue.
Meta doesn't do that. For one thing, "Patch Tuesday" is a Windows thing, and 0% of production traffic is served from Windows there. For another, they are constantly and always redeploying.
More likely that someone bungled a deploy of user auth. No doubt they are rolling back as we speak.
Technically true, just as if you replace Meta with Amazon, Google Cloud, Etsy, eBay, the local city council, etc. If you don't have a diversified, multi-market, preferably international presence, you're not viable in the long term.
Of course, in the really long term, we're all dead. Meanwhile, the local mom & pop candy shops keep advertising and selling online and making ends meet.
Greetings from Ukraine. And I confirm. At first I thought it was because of some war-related issue in Ukraine (we are at war, if somebody doesn't know), or that somebody had hacked me, but I decided to check HN and saw this thread.
This means IG is also down, which probably means a network-tier screw-up yet again. I don't know why they design a monolithic network that can basically take out all their infrastructure.
Does not look like a network issue, my guess is that they fucked up something major with their auth server when deploying a change related to the EU Digital Markets Act.
I was considering buying the Quest 3 this morning, this outage is timely. The fact that I can't use my perfectly working headset to play an offline game because facebook is down, makes me wonder if I should go for a different provider. Any recommendations? Excluding apple vision pro since it is too expensive.
As a principle, I think it is a good idea to avoid any technically unnecessary coupling of hardware or software to something else that is not strictly user-serving, else the priorities are inverted in favor of the vendor's priorities and not your own.
There is a point where you learn that everything is some sort of moral compromise. There is always going to be someone ahead of you and someone behind, it means someone is always going to get left out.
No system changes that, or makes it better, just different.
The Quest 3 is kind of the obvious mainstream consumer choice for a VR headset.
Standalone wireless headset, reasonably powerful chipset, can optionally stream from a PC either wired or wireless, good optics/resolution, decent controllers/tracking, large game library, large suite of features (including hand tracking and color passthrough), all for a reasonable price. Not sure any other headset really competes on all those things at once.
Not true. You can turn off the wifi and the headset works fine.
The current problems sound like a server-side bug while it phones home. But usually it can work fine without internet.
Standalone just means the VR compute is happening on the headset itself, not on a console or gaming PC the headset is tethered to. Of course, most of the people disputing "standalone" already know that, they're just playing definitional games.
For what it's worth it does require a Meta account, but not a Facebook one. I refused to buy one while it required a Facebook account, since I deleted mine a couple of years ago. Once they made the change I figured that was an OK compromise. I just found out today during the outage that my headset won't work if I get signed out of my Meta account. That was an unpleasant realization, although I suppose it's partially my fault for trusting Meta not to hamstring the hardware they're selling.
It's the equivalent of finding out that if Microsoft's auth servers go down no one with a Windows PC can use it since they can't authenticate. I'm fairly displeased.
I know they raised the price recently but it seemed pretty obvious to me they were selling these at a loss to try and get people locked in by the software.
Oh there are definitely people who avoid it because it's made by Meta. Maybe a bunch. On the other hand, it seems to be the most popular VR headset line by a wide margin.
It would be cool if Valve came out with a standalone headset, they're one of the few companies I can see that would be in a good position to do that: they already have a good amount of VR experience with one high-end headset + SteamVR APIs + a couple VR games, they have their own highly popular store/platform, they generally have a positive reputation with gamers, and they have a decent amount of hardware experience in general including the recent Steam Deck for mobile gaming hardware specifically.
And of course, a Valve headset would probably be significantly more open than the Quest. The Steam Deck has gotten some good reputation among more FOSS/hacker-oriented people for being fairly open: you can use it in a regular Linux desktop mode, you can install Windows (or presumably other OSes) on it, it's fairly repairable, etc. The default behavior is very console-like, but it's not very locked down if you don't want it to be. Best of both worlds, really.
A silly comparison. A standalone VR headset is more comparable to a smartphone or game console than a monitor or keyboard. The latter have little to no compute.
So compute requires vendor lock in? That seems silly to me.
Edit: Can we just acknowledge that a lot of the bells and whistles are for the company's benefit at the expense of the user? That's their right, but it's also our right to want something better.
> So compute requires vendor lock in? That seems silly to me.
Correct, it's very similar to game consoles, though it is somewhat more open than those (sideloading is possible, including standard Android apps IIRC, and you can run PC VR games from other stores while tethered).
> Can we just acknowledge that a lot of the bells and whistles are for the companies benefit at the expense of the user?
It's the same model as XBox or Playstation, seems like. They sell the hardware at cost or at a loss, and make it up via software.
A fully open headset with comparable specs would probably cost much more for the hardware. From a business standpoint that would be very stupid for a company like Meta, but this is hacker news, and many commenters here see nothing wrong or silly about asking businesses to commit suicide.
This doesn't explain why its _required_. It just means there is precedent.
Your other point is better, although I think you mean it would cost the consumer more for the hardware, right? The hardware would cost the same to produce, it's just that the company would miss out on surveillance based revenue.
It's a reasonable point: FB would make less money if they made an open headset, possibly to the point that they wouldn't make it at all.
But the world where fb doesn't make any headset, and the world where they make an unacceptable headset are basically equivalent to me - the former might even have an edge in that shitty relationships with corporations aren't being encouraged (like they are throughout everything tech related currently). Granted, them blazing the trail has a tiny chance of enabling a reasonable alternative to come along in the future.
But I am a bit of a Luddite, and I know that people want their toys, and they want them now.
> the company would miss out on surveillance based revenue.
More than likely most of Meta's revenue from the Quest series other than hardware is based off of, y'know, selling games. I doubt tracking what games you play to target ads in the OS is more valuable than the money they make when people actually buy games.
In Facebook or Instagram, you're looking at a space that they can shoot lots of ads into, and it's otherwise very hard to monetize. But a gaming-focused VR headset is a different story. Most of the time you're not looking at anything that can have ads in it, but you can actually sell stuff very easily.
Maybe this'll change someday if they actually get social media shit in there that's popular, I'm sure Meta would love that, but so far that hasn't happened.
> But the world where fb doesn't make any headset, and the world where they make an unacceptable headset are basically equivalent to me
Popularizing the format is useful for pushing the tech forward. A big player pushing lots of devices means that the supply chains feeding the manufacture of those devices bulk up too, not to mention other knock-on effects like greater consumer awareness, and "free research" for whoever copies what the market leader does (at least for things that aren't IP-protected).
> But I am a bit of a Luddite, and I know that people want their toys, and they want them now.
> More than likely most of Meta's revenue from the Quest series other than hardware is based off of, y'know, selling games. I doubt tracking what games you play to target ads in the OS is more valuable than the money they make when people actually buy games.
Isn't that a great argument for why they don't need such a hard requirement for a logged-in session? Consoles didn't have an internet connection for the longest time, though only because it wasn't feasible yet. They moved a lot of games.
> I can hear the sneer from over here, yes.
I don't mean it as judgment, I know I'm the weirdo here. Sorry if that came off rude.
> Consoles didn't have an internet connection for the longest time, though only because it wasnt feasible yet. They moved a lot of games.
Consoles had physical games. VR headsets don't. Consoles treat digital games the same way Meta is doing them here, I think; if you get logged out, no more games.
The problem here isn't that Meta servers are merely down -- losing connection usually doesn't mean losing access to your library of games on consoles, or Steam. The problem appears to be that authentication is failing such that you're actually being essentially logged out, which would definitely lose you access to digital games on every console as well as Steam.
Which, I mean yeah, that's a big fuck-up on Meta's part.
> Consoles had physical games. VR headsets don't. Consoles treat digital games the same way Meta is doing them here, I think; if you get logged out, no more games.
Again, consoles and steam do this because they want to, because it benefits them, and consumers don't put any meaningful pressure on them for doing so. It's not some kind of fundamental requirement. It's helpful for e.g. anti piracy stuff, but not necessary. It is 100% feasible to sell me a digital copy of a game and then not hang around on my system and watch me play it.
People let triple-A PC games basically put rootkits on their systems. It's not like the games wouldn't work just fine (or even better!) without them. It's just that approximately nobody cares, and the companies will do whatever you let them do.
> I view VR headsets and their peripherals as no different than a mouse, keyboard, and display
That could be valid when VR headsets were tethered to a PC via a DisplayPort or HDMI connection and essentially mirrored the display.
The Quest is closer to an iPhone or Android phone or an all-digital handheld gaming device. With integrated compute, display, battery, text input, pointing devices, mic, and speakers, it bears little functional resemblance to peripherals like a mouse, keyboard, or display, which have no utility unless slaved to another device.
Considering I can log in to my Quest with no wifi or other network (once initial setup is complete), it seems that the Meta back-end APIs must have broken in some way that confused the headsets into thinking they were available when they weren't.
It sounds like a server-side bug that forced a log out somehow. Which does really suck, Meta deserves criticism for that, but acting like this means the headset "isn't standalone" is silly, since that's not what "standalone" means in the context of VR headsets.
Agree, many posts I read seem like classic "I don't like Meta/VR/big companies/social media so let me use this specific incident to confirm my biases."
As you say, there's valid criticism to be made but it's hard to find the signal through the noise.
I think the desire for "standalone" VR headsets to mean offline-capable is totally reasonable. It has its own storage, apps and games get installed on it directly, and none of its core features need to rely on an online connection.
That it uses its own OS, essentially, is a fair point. I guess what I meant with my monitor analogy earlier is that it has the capability to serve that purpose, possibly without the sophisticated OS that wraps the store experience, the apps/games, and other features -- specifically by being able to use it with SteamVR or your PC in general.
This makes it a device that's generally capable of using any supported source for its screens, and can pass its peripheral input to other devices, like a PC, not unlike a mouse and keyboard.
VR headsets could treat their "OS" as a minimal experience akin to an OSD on a monitor that lets you switch sources and use the peripherals more generally like a mouse/keyboard with the right drivers on the target machine.
I'm more interested in calling out that Meta missed an opportunity here, and that it's confusing that they offer some semblance of these features (wireless linking for SteamVR...) while coupling that so closely to their OS and online-only experience.
I don't know if you'll ever see this, but thought I'd reply.
First, the original Rift headsets were as you describe: lightweight, passing through the PC VR image. However, Meta did not miss out on an opportunity. In what was perhaps the most effective A/B test they could run, they released the Rift S (tethered PCVR) and Quest 1 at effectively the same time. The market feedback was resounding: I believe it was a 10-to-1 preference for a standalone experience vs. tied to a PC. Since they doubled down on standalone (or all-in-one if you prefer), well over 20 million headsets have been sold. In fact, they're so popular that even the fraction that connects to Steam is basically tied for market share with the most popular PC VR headset ever, the Index.
Second, even as a PC VR HMD it was a real stretch to call it a monitor equivalent. It's wildly complicated to create compelling VR images. You need two screens at nearly 2Kx2K resolution each, running at 90 frames/second, sustained. Dip below that and you can induce nausea. Not every PC can do that, so you need careful engineering between the client and HMD, with tricks like time warp, space warp, interleaving, compression, prediction, pose estimation, etc. to take up the slack. Creating sub-millimeter precision of location with six degrees of freedom either requires external base stations (cost, complexity) or inside-out tracking with headset-mounted cameras and a processor running realtime simultaneous location and mapping and image recognition code, which implies a CPU and tech stack to support it. Nowadays people also expect passthrough (with real-time depth correction), hand tracking (AI routines for hand posing), and more. All this is to say that significant code must run on the HMD for a modern gaming headset (Meta's target market), as well as on the PC. And if you're investing that much in a custom software stack, you can't make it up on hardware margin - the cost to build an HMD is just too high. So you have to have an app store tie-in, because Valve sure isn't going to share its Steam profits with you.
Now, certainly there have been (and are) HMDs that tried this approach. HP (G2) and HTC (Vive series) both put out quality products leveraging the Steam ecosystem. Neither are sold in volume today, because the economics of selling a headset just aren't good enough.
Immersed and Big Screen are releasing very lightweight fixed-function HMDs for either work or movie watching that do operate the way you describe. Neither are expected to be high volume devices, and both are more expensive than Quest 3.
In short: VR is much, much harder than you may realize. Meta didn't miss an opportunity, they explicitly chose the market-tested, most popular solution that also has an economic model with some potential future payoff. If you want a "minimal experience akin to an OSD" then look at the Bigscreen Beyond ($999, https://www.bigscreenvr.com/) or the Immersed Visor ($1,049, https://www.visor.com/). (Note: compare the price of these hardware-model pass-through devices to the Quest 3 ($499), which also includes a CPU, battery, storage, audio, and more RAM.)
It's also worth noting that Quest 3 is not online-only. It works fine offline once you've logged in once (people use it on planes, in parks, in the car, etc.). But this particular issue at Meta forcibly logged out users, then the API appeared online while failing all future login attempts. Ironically, users that work offline never noticed the outage because the bug couldn't log them out.
Comparing it with a monitor is rather unfair; you have to bundle the computing along with it, not to mention the applications, to make it an actually fair comparison. At that point, is it still ludicrous?
Only because there's a bug, seems like. Normally you can turn off your wifi entirely and the headset continues to work fine. Tried disconnecting entirely in a non-standalone headset and see how that works out for you.
Standalone just means you don't need to tether to a PC or console.
The Valve Index is great if you don't mind cords / needing a PC. It's easy to do VRChat, iRacing, or even Blade & Sorcery game sessions for 3-4+ hours without any eye strain, headaches, motion sickness, or discomfort from the headset.
It also fits over / around glasses
Biggest reason not to, IMO, is the rumors around an "Index 2".
I'll third this - Valve is really unobtrusive about the steam related features of the index. It does have some requirements with steam for setup but if you want to run a local binary and mess around with dev tools it's extremely easy to do. It's also extremely well sealed and designed - I tend to sweat a lot and a few times I've been beat sabering for quite a while without any long term damage to the headset.
> Biggest reason not to, IMO, is the rumors around an "Index 2".
Another big reason not to buy a Valve Index is "not available in your country". The only VR headsets I've seen in stores here are the Quest 2/3 and the PSVR2; and the Steam store page for the Valve Index (and the Steam Deck) says "not available in your country".
There may be a bug or a change since I left, but I built the app library and authorization logic, and it was explicitly designed to work offline. Of course, using it day-to-day and initial setup are different, and I'd imagine if Apple is down it's hard to set up an AVP as well.
This is different from it being offline, it's like the device is kicked off the associated account. A "something went wrong" window pops up with a "generate device code" button, with instructions on how to remove and re-pair the headset.
There is no way to even access wifi settings or anything else to disconnect the device from the internet. If it's still a problem much later in the day, I'll try turning off my router to see what that does.
Interesting. I had a problem a few months ago with DNS not resolving Meta servers on my Starlink internet connection, but I was able to use the UI and the apps nonetheless, just couldn't open the store or update firmware.
Seems like they really did change something in the latest firmwares.
IMHO this is a stance you certainly have every right to take, but good luck. If you want to be part of the world of things like VR, smartphones, etc., then refusing (on principle) to participate in things like "accounts" and "cloud" is going to cost you far more time than the number of hours a massive company like Meta may have downtime. Likewise, yes, at some undetermined future date a lot of this hardware will become a complete doorstop because its supporting servers were taken offline; but again, if you are doing this advanced-gadget thing, that'll happen long after you would have decided to upgrade to new hardware anyway because it can't do any of the latest stuff.
(And yes, there are ways, if you're devoted enough, to roll your own everything: run Linux on a Framework laptop, use some kind of custom ROM on your phone without anything Google, 3D print yourself a VR headset, etc. But all of this will cost you several orders of magnitude more time than Meta outages ever would.)
I think the current buzz in the VR space is the "Bigscreen Beyond" which eschews all of the nice-to-haves in order to make the headset as light as possible, and the result is surprisingly compelling.
It looks compelling for high-end PC gaming VR enthusiasts, but if GP is more of a mainstream consumer it probably won't make sense for them.
At least from what I've read, there's a bunch of downsides for regular consumers: very expensive ($1000) -- SUPER expensive if you bundle in controllers and tracking points (~$1600), needs external tracking, wired instead of wireless, no built-in audio, can effectively only be used by one person (because each one is built custom to your face), and of course it's not a standalone headset, it has to be hooked up to a gaming PC.
That's not a Steam thing though, but rather the specific software. Steam explicitly has an exit path for the user if Valve disappeared overnight that allows their downloaded games to continue working offline.
The difference being that we are discussing platforms, not the things that run on those platforms.
Steam's offline mode wants you to go online and perform a bunch of steps, including launching games and then enabling offline mode. Every game launch is probably a download of at least a few hundred megabytes of data. And then every game requires its own network account linked to your Steam account, etc., and for Rockstar games, IIRC, you must be online when you launch the game. So the fact that the Steam client has an offline mode is irrelevant and misleading.
That has not been my experience. Any online game is going to be its own thing, dependent on the choices of the game company. Inherently, Steam does not require all the steps you're describing.
Sidenote, but my experience with the Rockstar launcher has been absolutely atrocious, to the point that I just avoid rockstar at this point even though I'd otherwise be interested because I've been burned so many times. That's a Rockstar issue, not a Steam issue.
In my experience, the Steam offline mode only works if your computer is actually offline (without any network connection); if you're connected to the local network but the Internet connection from your router is down, it still tries to connect to the authentication servers while starting up.
Rockstar games require you to have an account at launch, IIRC. And this is kind of misleading, because while technically Steam has an offline mode, the games you purchased on Steam don't necessarily. But having a unified UX is why I want to use a platform like Steam to begin with.
It does offer an offline mode. It does NOT work with most games, because the publishers literally can't help themselves but add more layers of DRM on top of steam DRM and most of those these days require always online connections.
Were you still connected to your local network? Next time it happens, try completely disconnecting from your local network before starting Steam, it seems to use the presence of a local network connection to decide whether to enable offline mode or not.
I can confirm that I was unable to use my Quest 3 this morning. I left it connected to the internet, it tried to phone home I guess, and then locked itself into a "please connect this headset to your Meta account" state.
I am so sick of companies "selling" computers that they continue to control. In what universe does Meta have the right to remotely lock my headset and prevent me from using it to run the software I installed on it? If I were to sell my current desktop computer, or phone, or whatever, on any marketplace, and leave a remote login account on it that I then used to continue to operate the computer as though it were mine remotely, installing software, playing games, and occasionally peeking at what the current owner was doing, that would be obviously criminal. How is this any different? Because I signed away my rights when I "agreed" to their Terms and Conditions box (which I was compelled to do to use the hardware I purchased)?
Something is so fundamentally broken in the current ownership/property landscape. We somehow ended up in a world where people don't own the most critical tools in their lives, companies have managed to recreate feudal fiefdoms within the bounds of the market.
Ubisoft Director of Subscriptions really opened the floodgates of bad behavior when they came out saying "Gamers need to get comfortable with not owning their games".
I think these companies need to be reminded they do not own our PCs either.
I'm really starting to like that mantra of "If buying isn't owning then piracy isn't stealing".
I can understand your frustration but were you not aware of the software lock in when you bought it? I'm not defending the ownership erosion, but I avoided these things specifically because of who was selling it and how it was locked to them.
I was aware. They are the only game in town when it comes to standalone VR. I want to play BeatSaber, a game I purchased when I used an Oculus Quest, and the only way to do that now is by subjecting myself to Meta's whims. I compromise on my ideals to have nice things, but will continue to complain when I feel that I or others have been wronged.
Not OP but I bought a Rift when it was still just Oculus.
...then Facebook bought Oculus...
...and then required you to have a Meta account to continue using the Oculus drivers.
It's a real "boil the frog" strategy and this is still early days for VR in terms of realized market value. The time to push back on this bullshit is yesterday. As we can all see, nobody can compete with Meta on price with the Quest 3, but the cost to purchase is heavily subsidized by the expected futures.
If you bought a Rift before facebook purchased them I wouldn't call it boiling the frog, more like being stabbed in the back. Not much to do there but sell your device but I guess most people probably hoped things would turn out differently than they did. This is one of the most infuriating parts of America now, if you hate a company and never want to interact with them some merger comes along and throws you into being their customer again against your will.
Of course, OP owns a Quest 3, so it's more cut and dried there.
You can't have everything. If you want the VR headset you described, there may not be any good ecosystem of software yet, and you'd have to wait. That means you'd use your headset even less. Think about how often FB goes down: it goes down for 2-3 hours maybe once a year, and it also has to coincide with the time you're using it. It doesn't make sense to be this risk averse.
Lucky you didn't already have a Quest 3, and you were in CyberSpace when Facebook crashed, because then you would be trapped in CyberSpace and die. That's how it works, you know.
Whatever you do, get one of the all-in-ones that doesn’t require setting up tracking beacons. They are universally a pain and will cause you to never bother playing.
So it's a bug of the "automated remediation that makes things worse" family.
This system that checks for invalid values in the cache looks like a very bad idea in the first place as in my understanding it checks things beyond "is the cache up to date?".
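The usual guardrail for that class of automation is a blast-radius limit: let the remediation fix a handful of suspect entries, but make it refuse to act (and page a human) when "almost everything looks invalid", since that usually means the checker is the broken part. A hedged sketch of the idea, in Python, not a description of Meta's actual system:

    def remediate_cache(entries, looks_invalid, drop_entry, max_fraction=0.01):
        """Drop cache entries the checker flags, unless it flags suspiciously many.

        A checker that suddenly flags a huge fraction of the cache is more likely
        to be wrong than the cache is, so bail out instead of amplifying the bug.
        """
        flagged = [e for e in entries if looks_invalid(e)]
        if len(flagged) > max_fraction * max(len(entries), 1):
            raise RuntimeError(
                f"refusing to drop {len(flagged)}/{len(entries)} entries: "
                "over the blast-radius limit, escalate to a human instead")
        for e in flagged:
            drop_entry(e)
        return len(flagged)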
> The cut lines include Asia-Africa-Europe 1, the Europe India Gateway, Seacom and TGN-Gulf, Hong Kong-based HGC Global Communications said. It described the cuts as affecting 25% of the traffic flowing through the Red Sea.
Pretty neat if a week after a cable is cut, FB falls over.
Especially when most of the source of truth databases are in the US and Europe, and that sort of data flow doesn't cross the Red Sea. FB has datacenters and points of presence all over, but outside the US/EU it's almost all caching.
It's not unreachable. I can easily see the FB page on my browser. It's just that even after resetting my password it doesn't accept it. Probably something's fucked up in the credentials database.
Those lines were cut yesterday, so it seems like a poor candidate for explaining the current outages. Likewise the geography doesn't match up with the outages.
We have satellites. We use cables because satellites lack the speed and bandwidth necessary to support the total requirements of the modern internet. Satellite-only is only feasible if you're fine with going back to waiting minutes for your saucy jpegs to load (elder millennials, you know what I'm talkin' about).
Ever heard of Musk's Starlink? From their website: "Starlink users typically experience download speeds between 25 and 220 Mbps, with a majority of users experiencing speeds over 100 Mbps" - https://www.starlink.com/legal/documents/DOC-1400-28829-70
Tip for Meta engineers: when your service is failing, don't just log people out and prevent logins. Display a cute image that shows that the service is drastically failing (like a whale or something), and then people will know to stop trying to repeatedly log in. The public might even come up with a catchy name for the whale.
Beyond unbelievable that going on an hour later, they're still showing "incorrect password" errors. How many hundreds of millions of people have wasted time frantically trying (in vain) to reset their passwords and pointlessly freaking out that their account might be compromised? What a bunch of careless, incompetent excuses for engineers.
Imagine how many hundreds of millions of users waste their time using Instagram and Facebook on a daily basis. Safe to say they don't mind wasting their customers' time.
What a poor bunch of overworked human beings, with almost no control over the product they work on. Frantically following the whims of managers, reduced to labour units in this late stage capitalist hellscape.
It probably depends what team you're on, but I would not describe it as "pretty shitty." Being oncall for a 24/7 service sucks, yeah, but for my team it is one week a quarter and I haven't had any outside-of-biz-hour alarms the last few shifts. Other than that -- my work is challenging and interesting, my colleagues are friendly and smart, and my manager is decent. Not a lot to complain about.
Major outages are periods of intense stress and extremely difficult to operate in. The folks troubleshooting may be many things, but careless and incompetent are unlikely to be among them.
I can almost guarantee you're getting mercilessly downvoted because half of the people here are sympathetic Meta worshippers who desperately (1) wish they worked there and (2) know they'd probably contribute similarly to this same horribly engineered system.
Having had a look at desperate Twitter posts during a major outage of a big German email provider with similar failure mode (login failed silently), it seemed like many people assumed that their email account was hacked. Close enough.
Right? I know I was like "oh, I haven't typed in my FB password in eons... maybe I changed it at some point and forgot? But if I change it what happens to all related services, is it going to log out my kids' Messenger Kids devices? Those are such a pain to log in. Should I change my password or not? What do I do?"
Surely the repeated login attempts can't be helping the situation. I suppose it is entirely auth-related across all Meta products. The repeated strain could pose a cold-start problem, for example.
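If that's what is happening, it's the classic retry-storm problem. As a rough, hypothetical client-side sketch (the `attempt_login` callable is made up, nothing Meta-specific): exponential backoff with jitter spreads retries out, whereas immediate retries mean the recovering auth service absorbs the entire backlog at once, which is exactly the cold-start scenario described above.

```python
import random
import time

def login_with_backoff(attempt_login, max_attempts=5, base_delay=1.0, cap=60.0):
    """Retry a failing login with exponential backoff and full jitter.

    `attempt_login` is a hypothetical callable that raises ConnectionError
    while the auth service is unhealthy and returns a session otherwise.
    """
    for attempt in range(max_attempts):
        try:
            return attempt_login()
        except ConnectionError:
            # Sleep a random amount between 0 and min(cap, base * 2^attempt)
            # so clients do not all retry at the same instant.
            time.sleep(random.uniform(0, min(cap, base_delay * 2 ** attempt)))
    raise RuntimeError("Login still failing; stop retrying and show a status page")
```

The same idea applies server-side: a login endpoint that sheds load (or serves a clear "we're down" response) recovers faster than one absorbing a full retry storm.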
> Display a cute image that shows that the service is drastically failing (like a whale or something), and then people will know to stop trying to repeatedly log in.
Probably not so easy to implement in behemoth apps, consisting of 20'000 source files...
For anything outside like-ing and post-ing, facebook's UI/UX is horrendous. Even an internet search does not help to find out how something trivial is done... The only way is to watch YouTube videos.
> For anything outside like-ing and post-ing, facebook's UI/UX is horrendous.
It isn't perfect for even that IMO.
> Even an internet search does not help to find out how something trivial is done... The only way is to watch YouTube videos.
That says more about where the web is heading than about facebook. Video is easier to monetise ATM⁰, and these days people don't put helpful stuff out there just to be helpful as much as they once did¹, so content creators are making videos instead of simple web pages.
--
[0] Everyone wants to be the next big influenza who doesn't need a day job to get by.
[1] That sort of people are still out there, though they are somewhat drowned out as the signal-to-noise ratio heads inexorably towards “WHAT? WHAT?! I can't hear a single thing above the manscaping adverts!”.
Unless it's an outage in their ability to log into their own servers, they should be able to swap out the login page with a static HTML page explaining the outage. Maybe a 503 status code.
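As a rough illustration of that idea (a toy sketch, nothing resembling Meta's actual edge infrastructure): a tiny fallback server that answers every request with a static maintenance page, a 503 status, and a Retry-After header, so users see an honest message and well-behaved clients stop hammering the login flow.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

MAINTENANCE_PAGE = b"""<!doctype html>
<html><body>
  <h1>We're having problems right now</h1>
  <p>Logins are failing on our side. Your password has not changed,
     so please don't reset it. Try again later.</p>
</body></html>"""

class MaintenanceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # 503 signals a temporary server-side failure; Retry-After hints
        # to clients and proxies when it is worth trying again.
        self.send_response(503)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Retry-After", "1800")
        self.send_header("Content-Length", str(len(MAINTENANCE_PAGE)))
        self.end_headers()
        self.wfile.write(MAINTENANCE_PAGE)

    do_POST = do_GET  # login form submissions get the same honest answer

if __name__ == "__main__":
    HTTPServer(("", 8080), MaintenanceHandler).serve_forever()
```

In practice the switch would happen at the load balancer or CDN (flip to the static page when health checks fail), but the user-facing effect is the same: "we're down" instead of "incorrect password".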
Yes, of course it was. The point is, an hour later, they could have flipped a circuit breaker to get people to stop trying and to stop going crazy over an error message that is completely inaccurate.
> During the fourth quarter of 2023, the number of daily active users on Facebook reached 2.1 billion, a minor increase on the previous quarter.
(Statista, Feb 9, 2024: https://www.statista.com › statistics › facebook-global-dau)
So why is this site metastatus.com and not status.meta.com, as mentioned in an earlier article here? The vast number of domain names these big corps have is not helping with making sure this is not a scam. Why metastatus.com and not metastatuscheck.com?
Surely the buffs at Meta could come up with an independent machine that can show the status even if everything else is down.
Yep. I've also noted that the people making such claims never seem to cite their own work as an example of how to implement something at Facebook or YouTube scale that is less "brittle".
Armchair quarterbacking isn't just a U.S. football phenomenon.
Frankly, I don't buy this explanation, technically or logically, from both a DevOps and a systems architecture level. There is no way in hell a company like Meta is pushing database design changes this significant straight to production. We all know how many times these database architecture changes get run in staging, and then in production subsets, before rolling out to production at large.
Should we assume the teams working to ensure Business Continuity & Application Resiliency redundancies fell asleep at the wheel?
Also, are we to assume that no downtime or outage messaging went out during a fairly routine, maintenance-based outage?
I call BS.
Lol, I can't see a scenario where this happens at a company of Meta's scale...
Is it a coincidence that Meta is trying to get portions of NSO's source code via the court system? Also, today is apparently a big voting day. The techno-apocalypse is near.
My shadow is extremely good at mimicking my every move, but I don’t live in fear that it will make the jump to the third dimension, kill me, and assume my identity. Should I?!
Same reason we're also not talking about the ten million non-coincidences that didn't seem to happen today. It's a kind of survivorship bias that coincidences seem like they mean something.
Or you could imagine it like this: your animal brain has evolved to notice patterns. Seeing coincidences like this is akin to seeing faces in the stars. The challenge for an evolved being is to override those impulses when they're not logically sound.
Sure. But if millions of evolved brains are "seeing the same pattern", they will behave as such, and so the meaning of that otherwise random noise in fact becomes something material.
We've seen continuous layoffs for the past year or so, and the stability of everything is wonky now. Yesterday I had trouble with LinkedIn, last week it was Coinbase, a month or so ago Gmail was hanging. I don't think it's because of Super Tuesday.
If this was 2016, I'd think there could be a possibility. Given that the candidates for the election are almost 100% a foregone conclusion at this point, what purpose would it serve?
Fueling clickbait conspiracy theories is one probable outcome that comes to mind. For example, even a winning candidate may claim that a "woke" employee sabotaged the site in an effort to subvert the people's will to have free and open discourse, that they won despite the media deck being stacked against them, etc.
I have yet to see one plausible reason for why this would affect primary voting in any way. Voting is done at the local level on local hardware; we don't vote on Facebook.
Messenger is also down. For some people this is an important means of communication (many young people I know in the US). People may either be in a panic or otherwise unable to sufficiently coordinate the plans they may have had to get to the polls.
That being said, this year's primaries have got to be so historically uncontested that it could not matter less.
Grassroots organizers use FB and messenger and Insta for “get out the vote” communications. People use these services to goad their friends to go and vote. People use FB as a search engine to identify their polling place.