One interesting thing I noticed is the ALT text that is displayed in place of the missing images. It shows how their image recognition is classifying everything. I guess a lot of my friends drink as a lot of the alt text was like "Image may contain: drink"
I noticed that too (mtg meme group just showed up as a wall of "image may contain:text" placeholders), but I didn't realize it was an outage and I thought that I'd accidentally turned on some kind of really annoying safe-mode.
My concern there is my understanding has always been that WhatsApp implements end to end encryption through the Signal protocol, how can images be stored and be considered end-to-end encrypted? So are only text messages end-to-end? Ignoring logs being uploaded to the cloud.
I wouldn't send any sensitive pictures over WhatsApp then.
Because the actual data for those images, that is stored on Facebook's media servers, is just a blob of ciphertext. Only the sending and receiving clients actually have the keys, not the servers.
That’s also what it looks like to me. The services themselves work and are responsive here. But oh boy, images have been down for the longest time now for being top websites.
All of my VPS servers have been slammed in the last 5 days with bruteforce SSH attacks. I've never seen anything this aggressive. One of my servers got PWNed, which has never happened to me before. There's definitely some wild, shady shit going on.
I highly recommend installing fail2ban to automatically firewall IPs with consecutive failed attempts, or if possible, disable password authentication altogether and use key auth.
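To make the idea concrete, here is a minimal sketch of the kind of bookkeeping a tool like fail2ban does: count failed logins per IP inside a time window and ban the address once a threshold is crossed. This is not fail2ban's actual code. `FailureTracker`, `max_retry` and `find_time` are names made up for illustration, and a real setup would act on the firewall rather than on a Python set.

```python
from collections import defaultdict, deque


class FailureTracker:
    """Toy model of per-IP banning after repeated failed logins."""

    def __init__(self, max_retry=5, find_time=600):
        self.max_retry = max_retry          # failures allowed inside the window
        self.find_time = find_time          # window length in seconds
        self.failures = defaultdict(deque)  # ip -> timestamps of recent failures
        self.banned = set()

    def record_failure(self, ip, now):
        """Register one failed attempt; return True if the IP is now banned."""
        if ip in self.banned:
            return True
        window = self.failures[ip]
        window.append(now)
        # Drop failures that fell out of the time window.
        while window and now - window[0] > self.find_time:
            window.popleft()
        if len(window) >= self.max_retry:
            self.banned.add(ip)
        return ip in self.banned


tracker = FailureTracker(max_retry=3, find_time=60)
# 203.0.113.7 is a documentation-only address (RFC 5737).
results = [tracker.record_failure("203.0.113.7", t) for t in (0, 10, 20)]
print(results)  # [False, False, True]
```

Note that an attacker who spaces attempts further apart than the window never trips the threshold, which is one reason key-only auth is the stronger of the two suggestions.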
Solid advice. I had fail2ban installed and enabled. SSHD Root/password login turned off and only ssh key had access. My firewall was also airtight, or so I thought. Clearly I mucked up a config or setting somewhere because the odds of someone getting past all that are extremely low. One thing I had not prepared for was IP spoofing which I learned can be prevented with a few net.ipv4.conf tweaks. I also just purchased a static IP from my provider so I can lock down ssh access even further. Here's hoping I never have to deal with this headache again! fingers crossed
You don't generally "get past" your firewall rules and into your box unless you have accounts that are not password protected.
If you really had password logins turned off, you need to identify and isolate how they gained access before you put that box online again. Never "hope" or "cross fingers" that it doesn't happen again. Unless you are an interesting target for some reason, chances are that these attacks are automated and you are running some insecure software somewhere.
Start by taking a snapshot of the machine before you do anything else. Go through the logs. Are there any unwanted processes? How were they started? Are there any unwanted binaries in the filesystem? How were they uploaded? Try to find IP addresses that can be tied to any unwanted login, and search your logs for any previous occurrences.
Pay special attention to any web-reachable software you have installed.
If they exploited a service or web app they might have gotten shell access. Gaining access through ssh with fail2ban in place is extremely unlikely, unless fail2ban was badly configured.
How did you discover the breach, and did you determine the vector? My guess is that it was a pivoted breach from another system on the LAN such as your PC.
I'm still picking up the pieces but from my logs I can see that hundreds of successive login attempts were made from different IPs, effectively circumventing fail2ban with what I can only assume is some form of automated IP spoofing. I'm hoping that strict ipv4 settings and ssh ip range restrictions will mitigate this in the future. I also used this python script to harden my SSH security with better algorithms. https://github.com/arthepsy/ssh-audit
It's scary to admit this but you are probably right. The first thing these bots do is use server resources to scan ports and brute force their way into other machines. I don't want to think about how many machines are pwned like this. Very sobering!
+1 for disabling password auth. On the other hand I find fail2ban pretty useless these days. Most attacks I see seem to come from botnets where you only get a few requests from each IP.
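This is easy to check against your own logs. A rough sketch, assuming OpenSSH-style "Failed password ... from <ip>" lines: the sample lines and addresses below are invented for illustration (documentation-only ranges), and the threshold of 3 is an arbitrary stand-in for a per-IP ban rule.

```python
import re
from collections import Counter

# Matches the source address in OpenSSH "Failed password" log lines.
FAILED_RE = re.compile(r"Failed password for (?:invalid user )?\S+ from (\S+) port \d+")


def failures_per_ip(lines):
    """Count failed password attempts per source IP."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample: one noisy host plus a "botnet" with one try per host.
sample = [
    "sshd[101]: Failed password for root from 203.0.113.7 port 4001 ssh2",
    "sshd[102]: Failed password for root from 203.0.113.7 port 4002 ssh2",
    "sshd[103]: Failed password for root from 203.0.113.7 port 4003 ssh2",
    "sshd[104]: Failed password for invalid user admin from 198.51.100.1 port 5001 ssh2",
    "sshd[105]: Failed password for invalid user admin from 198.51.100.2 port 5002 ssh2",
    "sshd[106]: Accepted publickey for alice from 192.0.2.10 port 6001 ssh2",
]

counts = failures_per_ip(sample)
threshold = 3  # arbitrary per-IP limit for this sketch
would_ban = sorted(ip for ip, n in counts.items() if n >= threshold)
print(would_ban)  # ['203.0.113.7']
```

The two single-attempt hosts stay under any per-IP threshold, which is the pattern described above.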
I don't trust key-only auth; what if I need to access my machine from a new computer I haven't done this with before?
Is there any way to configure SSH to use a custom high-entry password that's different from the user's local password? My local password is something reasonable for me to type regularly (e.g. for sudo prompts), but I'd love to have a super long password just for SSH that I have to copy from my password manager each time.
If this is a scenario that happens often, you probably should invest in a portable method to hold your ssh keys, such as a smart card. I have only used YubiKey for this purpose, but I'm sure others, like the Nitrokey, work similarly.
This could be done simply by creating a new user with your super long password. Lock down ssh so you can only log in as this user, and let them `sudo su` into your normal working user.
I mean, yeah, but I don't want to do that. Partially because I don't want the friction, and partially because that won't work with any other tools that tunnel over ssh (e.g. sftp).
There will be friction, but sftp and port tunneling - which are my most used features besides plain ssh - should be possible?
I mean either you sftp to a shared folder that can be accessed from your regular user as well or you use a staging area with a cron job (or you load/unload the staging area manually.)
If I'm sftp'ing to my server it's because I want to access my files. Not a special shared folder that I then have to separately ssh in, su to the real user, and move into place.
I'm not deploying a website so a staging area isn't applicable. This is just a VPS that I use for various purposes.
If your ssh user has the right privileges, you can read and write your real user's files as the ssh user just fine. I do get that this isn't ideal though.
There are definitely cases where you might need to expose an SSH setup where passwords are allowed to the world, but there is usually little reason to allow anyone to log in directly as root.
Set a long password for the user you log in as (correct horse battery staple-style passwords are perfect for such things), and make sure to put SSH on an alternate port to keep the more basic bots away and thereby reduce the noise.
Having to type a long password for sudo prompts is a bit of a pain, but that trade-off is worth it from a security perspective.
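For a rough sense of scale: if a password is chosen uniformly at random (a big assumption for anything a human invents), its entropy is the length times log2 of the pool size. The 2048-word list size and the helper names here are illustrative assumptions, not anything specified in this thread.

```python
import math
import secrets


def entropy_bits(pool_size, length):
    """Entropy of `length` symbols drawn uniformly from `pool_size` options."""
    return length * math.log2(pool_size)


def random_passphrase(words, count=4):
    """Pick `count` words with a cryptographically secure RNG."""
    return " ".join(secrets.choice(words) for _ in range(count))


# Four words from a hypothetical 2048-word list.
print(entropy_bits(2048, 4))           # 44.0
# Thirty random lowercase letters.
print(round(entropy_bits(26, 30), 1))  # 141.0
# Tiny placeholder list; a real list would have thousands of words.
print(random_passphrase(["correct", "horse", "battery", "staple"]))
```

Adding words to the passphrase adds entropy linearly, so six words from the same hypothetical list would give 66 bits.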
"a custom high-entry password that's different from the user's local password?"
You mean an RSA private key? (no really, that's exactly what it is... you can put your private key in your password manager and copy it to your computer)
I can manually type in a 30-letter password that I can see from my password manager on my phone. I'm pretty sure I'm not going to be manually typing in an RSA private key stored on my phone.
You should disable password auth entirely and whitelist the IPs you connect from in your firewall. If you need to connect from a large range of possible IPs, use a bastion host with 2FA and restricted to IP blocks from countries you actually connect from.
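The allowlist check itself is simple. A minimal sketch using Python's `ipaddress` module, assuming a hypothetical list of allowed networks (the ranges below are documentation-only addresses). In practice this decision belongs in the firewall, not in application code.

```python
import ipaddress

# Hypothetical allowlist; these are documentation-only ranges (RFC 5737).
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_allowed(addr):
    """Return True if `addr` falls inside any allowlisted network."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in ALLOWED_NETWORKS)


print(is_allowed("192.0.2.55"))     # True
print(is_allowed("198.51.100.18"))  # False
```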
Several European banks are also having issues with their login sites and online services. Backends are working though. I know, it's far fetched to throw them in the same bucket, and I won't, but it feels like some kind of reorganization is happening on the net, which in the worst case would be some kind of global state-sponsored MITM layer getting set up, maybe to prepare for new tech like 5G and TLS 1.3.
If I were going to buy into any theory other than coincidence my bet would be on an emergency patch situation where we're seeing the rushed deployment.
I agree that these occurring so close together by chance is very unlikely.
Separately, this is a huge blow for Instagram/Facebook since this is one of the busiest social media sharing weeks in the US. I bet a lot of their engineers are answering pagers on vacation.
In my experience, people jostle to take part in a rotating on-call roster where they are sufficiently compensated, in my case $1700/week, to occasionally answer one or two calls.
People who make defense policy have been assuming the answer to that question was "yes" for about 20 years now. It's sometimes described as "cyber precedes kinetic."
Ok, so unfriendly sovereign nations are trying to “hurt” us by making social media inaccessible? I’m half joking here, but perhaps we should all be out, about, and enjoying Independence Day in person rather than through our phone screens.
(Yes I realize WhatsApp is a different segment, an important way people communicate.)
All three apps are huge for communication. It doesn't seem far fetched that a nation would test its capabilities taking down different forms of communication.
Is there an official HN stance on this? There have been times I comment, with frustration, that a statement or position can be hard to grok for those who don't have an automatically-American context in mind (you know, because they might not live in America), and the downvotes come swiftly and snarky comments about "this is an American messageboard" follow.
So I'm actually genuinely curious what YC/HN's official position on this is. Asking broadly.
There was already a US-centric context via the parent comment:
It was a link to a RAND report warning the US Military of cyber-attacks in the US, the report starting off as:
"The chances are growing that the United States will find itself in a crisis in cyberspace, with the escalation of tensions associated with a major cyberattack, suspicions that one has taken place, or fears that it might do so soon."
The comment I was replying to was literally a link to a RAND report warning of such things in the US, the report starting off as:
"The chances are growing that the United States will find itself in a crisis in cyberspace, with the escalation of tensions associated with a major cyberattack, suspicions that one has taken place, or fears that it might do so soon."
If that's the case, then they should have made that critique directly rather than beat around it with questions that they didn't actually need answered.
If I'm not mistaken virtually all of the ones in the list above were caused by networking problems. Given how many Chinese parts are in those interfaces, it does make you wonder.
It could also be just a really strange coincidence.
So, people like to slam GCP but not Azure. Azure's didn't make the front page of HN, but GCP's wasn't even an outage, just increased latencies due to rerouting, and it still made the front page?
Honestly, I think it's much simpler than that -- the last decade has been a general trend of building more centralized, less resilient systems. The global internet is in an inherently unstable state because of this (i.e. when problems occur, the internet is no longer able to route around them because the problems are internal to large organizations which both control huge portions of the internet and which aren't internally incentivized to separate components -- see e.g. the AWS status images living on S3). These outages are simply more and more frequent as the system's instability increases. And they will continue to become more frequent as time goes on, until organizations realize that they can't control their own uptime unless they run their own infrastructure, at which point the pendulum will start swinging back the other way. Rinse, repeat.
This is a fallacy all humans experience, however, what is the probability specifically that all these problems are unrelated? At what probability is it mathematically impossible that these problems are unrelated?
"Impossible" means "impossible", not "unbelievably improbable".
That's even more true when you ask about "mathematically impossible", as it reinforces the idea of formal logic being the relevant domain, where precise meaning of words is fundamental.
If you adjust your question to be "at what probability is it unreasonable to claim these problems are unrelated?", then the answer is subjective - different people have different standards for reasonability.
I think we'd need mountains more data than we have about the incidents to compute a meaningful probability, anyway.
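For what it's worth, the shape of the calculation is simple even if the inputs are not. A back-of-the-envelope sketch, where every number is made up (20 large services, a 2% chance each of a major outage in any given week, all fully independent):

```python
from math import comb


def prob_at_least(k, n, p):
    """P(at least k of n independent events occur), each with probability p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Made-up inputs: 20 services, 2% weekly outage chance each, independent.
p_week = prob_at_least(3, 20, 0.02)
# Chance of seeing at least one such "three outages" week in 260 weeks (five years).
p_five_years = 1 - (1 - p_week) ** 260
print(p_week, p_five_years)
```

With these invented inputs a cluster of three is rare in any single week (under 1%) but more likely than not over five years. The real answer depends entirely on rates nobody in this thread has.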
In my experience this is so true! The opposite applies as well: for some reason everything behaves quietly all at once and I turn over every leaf and look under every carpet for a "problem" with the monitoring or something worse.
Probability is funny. For example, I don't think anyone has explained exactly why numbers that start with '1' are so common. I think it's just geometric symmetry of complex event chains.
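The pattern has a name, Benford's law: a leading digit d shows up with frequency log10(1 + 1/d), so '1' leads about 30% of the time. A quick way to see it, using powers of two as a stand-in data set (my choice for the demo, not something from this thread):

```python
import math
from collections import Counter

# Leading digits of the first 1000 powers of two.
leading = Counter(int(str(2**n)[0]) for n in range(1, 1001))

for digit in range(1, 10):
    observed = leading[digit] / 1000
    predicted = math.log10(1 + 1 / digit)  # Benford's law
    print(digit, round(observed, 3), round(predicted, 3))
```

The observed and predicted columns track each other closely for this sequence. Whether a given real-world data set follows the law depends on how many orders of magnitude it spans.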
Could it be a physical infrastructure problem? There are occasional issues with fiber cuts that cause outages. I'd imagine many companies rely on such a critical piece of infrastructure.
I first noticed Instagram, then realized it was Facebook and their other properties such as WhatsApp as well. It's very odd to have all of these major outages this week. Cloudflare at least said it was an employee error, so I'm assuming it's not subtle attacks by other countries, unless companies like Cloudflare are legally required to provide an alternate story for national security.
Feels like every day there's another one of these outages. Shame that everything is so centralized that there's a single point of failure that affects 3 of the biggest communication platforms around.
All those outages are strange. I wasn't able to access my personal server for a few hours today (I wasn't even able to ping the IP). When I was able to reconnect, everything seemed to be OK. Maybe there are some routing/BGP issues or an attack (again). Sadly, I didn't try to traceroute the server IP.
Second this, with my DigitalOcean VPS's. DigitalOcean posted an incident report yesterday. They say "global networking issues" were caused by a "major provider", but unfortunately there isn't much more detail than that [1].
Not only Facebook, but Instagram (which shares common infrastructure) has also been mostly down (read-only mode it seems). The most popular Twitter hashtag right now is #instagramdown
Very good point. I wonder if the recent outages on other well known services could be heavily influenced by a similar phenomenon.
If this holds water, it would be interesting to have an article or study around this issue. I certainly would be interested in reading it.
One interesting thing I've noticed from this (not sure if this is just my experience or if others noticed as well) but none of the ads seem to be impacted by this. Certainly makes me wonder if those are on a more prioritized and entirely separate infra and SLA that is designed to be more resilient and highly available.
It's a lot easier to serve ads than your custom photos. The ads are more generic, easier to cache globally, and show to a lot of people. They can also fall back to super generic ads if the database is unavailable to customize them.
Fallback is the main reason. Some web services even fall back to non-moneymaking ads for charities in the case of a technical fault, because shareholders react badly to an ads outage, but they don't notice an hour or two of charity ads...
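The fallback chain described here might look something like the sketch below. It is purely illustrative: `pick_ad`, the tier names and the integer user IDs are all hypothetical, and nothing is known here about how any real ad server is built.

```python
def pick_ad(user_id, personalized, generic_pool, charity_ad):
    """Return (ad, tier), degrading gracefully when a tier is unavailable."""
    try:
        return personalized(user_id), "personalized"
    except Exception:
        pass  # targeting backend down: fall through to cheaper options
    if generic_pool:
        # Deterministic pick from a cached list; no database lookup needed.
        return generic_pool[user_id % len(generic_pool)], "generic"
    return charity_ad, "charity"


def broken_targeting(user_id):
    """Simulates the personalization backend being unavailable."""
    raise ConnectionError("targeting database unavailable")


print(pick_ad(42, broken_targeting, ["generic-a", "generic-b"], "charity-psa"))
# ('generic-a', 'generic')
print(pick_ad(42, broken_targeting, [], "charity-psa"))
# ('charity-psa', 'charity')
```

A photo has no equivalent of the generic tier: if your friend's picture can't be fetched, there is nothing sensible to substitute.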
Our local transit system installed video boards showing arrival times for the next trains. It was paid for and operated by an advertising company who in return were able to run ads on their video boards. The functionality for arrival times rarely worked but the displaying of ads never failed. It was obvious where the company's priorities were at.
> Certainly makes me wonder if those are on a more prioritized and entirely separate infra and SLA that is designed to be more resilient and highly available.
There's absolutely no doubt that the ads are served differently. Think about it this way:
- Ads: Small number, each served to many people, funded with real cash, but you can't get the cash if you don't serve the ads.
- Photos: Large number, each served to few people, funded by money left over from ads, and if you don't serve them your (non-paying) customers get frustrated.
My guess is that ads is funded well enough that they can run at lower resource utilization, and much more effort is made to run photos at high resource utilization. That's how I'd run it.
Ads: Conduct an auction in milliseconds among hundreds of parties and hundreds of millions of ad creatives, with lots of very large in-memory machine learning systems, all to decide which ad to serve to maximize revenue.
In systems like this, you tend to want to maximise some other outcome than revenue (like clicks or conversions). In general, both GOOG and FB make money out of DR advertisers who care about this, so you tend to make more money if you optimise for the thing they care about (revenue goes up as a second-order effect).
1) Cryptocurrency ops are so vastly different from running a social website that I can't even think of any overlap.
2) I hate Facebook as a company, but as a builder/scaler of web apps for many years, I'm continually blown away by the speed and reliability of their website. Their operations are mind-blowing.
The only comparable apps (in terms of scale) are Gmail and YouTube, and Gmail is simpler in certain key areas (e.g. mail delivery isn't millisecond-sensitive for a user).
> Cryptocurrency ops are so vastly different from running a social website that I can't even think of any overlap
I know nothing about how cryptocurrency works, but wouldn't social media outage sources like multiple server failures, hurricane, tornado, sliced fiber line, etc... affect the kind of cryptocurrency that Facebook is embarking on?
Or is there something in the "distributed" nature of cryptocurrency that makes it more resilient? Is Facebook using that model, too?
And Google Search. That's actually treated even more specially (compared to GMail and YT) inside Google and it has amazing reliability.
It would be interesting to know more details. I bet results are sometimes incomplete, but somehow Google manages to keep the system correct enough nobody notices a sudden quality drop.
I also understood Search has some special QoS at all levels: from the network to the scheduling of jobs...
I don’t have extensive knowledge of how crypto works, but for 1), most, if not all large-scale applications like FB operate with distributed systems somewhere in the chain (ex. image and video processing), which is the core of how ledgers work, and how mining works.
Their website tech is impressive, I’ll give them that. I still wouldn’t trust them with my money.
How culture changes perspectives! Over here in Europe I was thinking "doesn't sound so bad". Then I realised over in the US, gas means petrol, and without petrol America doesn't work.
I'll reply to any replies after I've taken the electric train home.
You're not being fair to Europe's very impressive contributions. After all, European companies lead the way in the production of rolling pollution machines, aka ICE vehicles.
Besides, the US has cleaner air than most of Europe, thanks to the intense diesel particulate pollution across European cities. Who can forget the infamous Paris smog photos from a few years ago (which is still an ongoing problem)?
Don't worry though, the US invented the modern electric car, twice, in the form of GM's EV1 and Tesla. We'll lead the European automakers out of the dark ages - underway right now as they all chase Tesla - and take care of the problem of the ICE automobile that Europe was heavily responsible for. In the process cities like Paris will be smog free again, finally.
I would say cash is used much less often in Canada than it used to be and most people just tap or swipe their Interac debit cards. Outages happen from time to time, usually during the mad Christmas shopping season, but life goes on.
It's really only gone down for short periods of time here- maybe a couple of hours at most, but it's usually just limited to one of the major banks or retailers (like all the Walmarts in town).
In Canada though? Because in Canada it is definitely rare to see cash now, except among boomers. A lot of places, like gyms, don't accept cash even for purchases like drinks or towels.
If you frequent small (generally Asian) restaurants, it's quite common to see signs saying they only accept cash in Vancouver. Otherwise, if they take credit, they may attempt to incentivize paying with cash via discounts.
Same, here in Toronto. My favourite Asian fruit market only takes cash, and a lot of the corner stores have a minimum purchase for cards.
But any business with any sort of margin (as in, I’m not buying grapes for 99¢/pound) takes contactless payments. My gym would look at me like I’m crazy if I tried to pay with cash, and Billy Bishop airport has been cash-free for years.
Customer protection- credit cards allow you to reverse charges on defective or fraudulent purchases if you are unable to find recourse with the vendor. They usually side with the cardholder as long as a basic paper trail of a good faith attempt to resolve the issue with the vendor is included.
Also, credit cards often include insurance on purchases. For example, one of my cards will reimburse my hotel/car rental/plane tickets in certain cases of trip interruption if I purchase those items on that card.
Ignoring the obvious one that you can pay for things you don't currently have enough money for (bad idea in general), I pay for almost everything on a visa card for travel points.
The revenue service is unaware of cash payments/purchases made by private parties and so cannot tax such transactions. Unless those private parties volunteer the information.
This is basically the point of decentralized systems. For bitcoin to work it relies, by protocol, on whoever is online, so effectively everyone using it would have to go offline for it to fail. Additionally, bitcoin really doesn't get updates that would change how things function dramatically enough for there to be an error introduced that would take things down. In comparison, FB and friends rely on a handful of servers that typically can be taken out by one poorly planned software upgrade.
This is a gross simplification, but the basic idea is that more servers and diversity of the servers (which you get from decentralised protocols) should lead to crashes not taking out the whole network.
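In numbers, and keeping the same gross simplification: if each node is down independently with some probability, the chance that all of them are down at once shrinks exponentially with the node count. The 10% figure is made up, and independence is the big assumption, since a shared bug or a coordinated upgrade breaks it.

```python
def p_total_outage(p_node_down, nodes):
    """Chance that every node is down at once, assuming independent failures."""
    return p_node_down ** nodes


# Hypothetical: each node is unreachable 10% of the time.
print(p_total_outage(0.1, 1))  # 0.1
print(p_total_outage(0.1, 5))  # about 1e-05
```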
They have one wallet. But there is no reason for a wallet to depend on facebook being online, and presumably there will be 100 points of failure for the ledger on launch
Nope, that's the beauty of a decentralized network: it never goes offline while there are at least some clients connected. You could disrupt the network in other ways though.
(quote from a tv show where the anti-hero destroys an intergalactic empire by setting the value of all money from 1 to 0; in the scene is the alien "white house" having a discussion what to do without money)
I wonder what kind of economic damage all these recent outages add up to. How much harm are they causing, in terms of productivity and value. Facebook's social networks I'd imagine are less impactful than the slack outages, which could completely cripple a company if they were primarily remote.
There was an old joke from I believe Jerry Seinfeld which came out around the time the Y2K scare was in full force.
I am probably going to butcher it as I am no comedian but it went something like:
>All these people are freaking out about this whole Y2K thing. What are we going to do when the internet and computers all stop working?! I always respond with, "I know it will be horrible, it will be like living in the 80s again"
Now clearly I realize if all computers went down there would be real world consequences and issues, however I still like the joke.
Some popular images seem to load for me, but longer tail stuff doesn't. Maybe they accidentally flushed their edge cache and then their origin couldn't handle the load.
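A toy model shows why popular and long-tail content would behave so differently in that scenario. Everything below is an assumption for illustration (catalog size, cache size, 1/rank popularity, LRU eviction); nothing here is known about Facebook's actual CDN.

```python
import random
from collections import OrderedDict


def simulate(catalog=5000, cache_size=200, requests=50000, seed=0):
    """Return (head_hit_rate, tail_hit_rate) for an LRU cache that starts empty."""
    rng = random.Random(seed)
    items = list(range(catalog))
    weights = [1 / (rank + 1) for rank in items]  # rank 0 is the most popular
    cache = OrderedDict()
    stats = {"head": [0, 0], "tail": [0, 0]}  # [hits, total]
    for item in rng.choices(items, weights=weights, k=requests):
        bucket = "head" if item < 50 else "tail" if item >= 1000 else None
        hit = item in cache
        if hit:
            cache.move_to_end(item)  # mark as most recently used
        else:
            cache[item] = True  # a miss means a trip to the origin
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used
        if bucket:
            stats[bucket][0] += hit
            stats[bucket][1] += 1
    return tuple(hits / total for hits, total in stats.values())


head, tail = simulate()
print(f"head hit rate {head:.2f}, tail hit rate {tail:.2f}")
```

Popular items refill the cache almost immediately after a flush, while nearly every long-tail request goes to the origin. If the origin is overloaded, the tail is what fails.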
Could they be in your own browser's cache? Or maybe in your ISP's cache?
Considering the type of content Facebook, WhatsApp, and Instagram host, I'd imagine the cache hit rates to be pretty low for media. They'd be effective for JavaScript/CSS though, which might also be served from the same system
No, I checked if it was my own cache, and it's all HTTPS so it couldn't have been my ISP's cache either.
I compared a few celebrities' Instagram accounts with accounts of friends with many fewer followers. Celebs' images loaded, but fewer and fewer as you scroll back through time; friends' images all 503'd. Still seems to be the case.
They're implementing their personal data harvesting tools into WhatsApp and the sheer amount of content being pulled from their servers was too much to handle and it crashed
I wonder if there's a large enough market for a "decentralized cloud" along the lines of IPFS but for compute. And, yknow, actually working decently. Something where you could pay compute/storage operators for resources and also be a server host and share your excess compute power.
This is effectively the idea behind the Ethereum blockchain model. You can make "smart contracts" (think APIs with persistent state) using languages like Solidity[1] and deploy them onto the blockchain. After that, you can invoke individual functions by paying "gas" (small amounts of ETH) that goes to the node operator's account. Smart contracts also get their own blockchain addresses, so they're capable of sending and receiving transactions. Meaning you can build financial applications with little to no barriers to entry! (whether this is a good thing or not, I will leave up to the reader).
The really cool thing about it is that nobody owns a smart contract once it's deployed. You can't edit the code or even delete the contract itself. It's a truly autonomous entity that will continue to operate the same way forever (unless there's a 51% attack or something of that nature).
The obvious benefit for this is that you can mathematically ensure trust. For example, if you hosted a lottery app on a LAMP stack, you could steal the money, hackers could get into your server, your database could get corrupted, etc. On the blockchain, nobody can access your lottery funds or business logic, not even yourself, meaning that as long as you developed the application correctly (to be fair, that is a big assumption), it is truly fair.
I always wondered about this, so how do projects like CryptoKitties profit? Or do they give themselves the seed cryptocollectibles and profit off of those?
It needs to download it from the CDN first. The actual implementation is your media are stored encrypted on a CDN for a certain amount of time then you receive the keys as an encrypted message.
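A rough sketch of that flow, to show why the server-side copy is useless on its own. Loud caveats: the cipher here is a TOY (a SHA-256 counter keystream XORed with the data) standing in for a real authenticated cipher, the `cdn` dict stands in for the media servers, and every name is hypothetical. This is not WhatsApp's code or wire format.

```python
import hashlib
import secrets


def keystream_xor(key, data):
    """TOY cipher: XOR data with a SHA-256 counter keystream. Not for real use."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


cdn = {}  # stand-in for the media servers: they only ever see ciphertext


def send_media(plaintext):
    """Sender: encrypt, upload the blob, return what goes in the E2E message."""
    key = secrets.token_bytes(32)
    blob = keystream_xor(key, plaintext)
    blob_id = hashlib.sha256(blob).hexdigest()
    cdn[blob_id] = blob
    # This dict is what would travel over the already-encrypted chat channel.
    return {"blob_id": blob_id, "key": key}


def receive_media(message):
    """Receiver: download the blob, check its hash, decrypt with the key."""
    blob = cdn[message["blob_id"]]
    if hashlib.sha256(blob).hexdigest() != message["blob_id"]:
        raise ValueError("blob was tampered with")
    return keystream_xor(message["key"], blob)


photo = b"pretend this is JPEG data"
msg = send_media(photo)
assert receive_media(msg) == photo
assert cdn[msg["blob_id"]] != photo  # the stored copy is unreadable without the key
```

The key only ever exists on the two clients and inside the encrypted message, so the CDN can store and expire the blob without being able to read it.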
I think Zuck himself said recently they're moving to encrypt everything and merge infrastructures for single login across properties (which a lot of us think is them sneakily trying to make their web properties a lot harder to split up in an anti-trust case).
But in some countries many small to medium businesses use WhatsApp extensively for communication with their customers. For many people it's as bad, arguably worse, than their main ISP going down.
These "attack maps" tend to just be a map of population centers cross-referenced against where it's daytime. They're worthless for actually assessing anything about potential threats.