Facebook, WhatsApp, Messenger, and Instagram went down for six hours last week. Here’s why, and what you should do next
On Monday October 4, Facebook, WhatsApp, Messenger, and Instagram went totally dark for what felt like a pretty long time. It suddenly became impossible to talk to anyone, spread misinformation, or post selfies. We all realized that we don’t have anyone’s phone number anymore. And a lot of people who depend on these services for their businesses realized they were going to lose out too. It’s worth asking why we’re so dependent on Facebook’s services, and why they broke, as well as what our next moves should be.
Here at VPNadviser, we have a complicated relationship with Facebook. Sometimes we’re excoriating them as a dishonest, dangerous corporate behemoth that’s turning the internet into everything it used to be a refuge from, and turning your weird uncle into a Nazi at the same time. Sometimes we’re raiding r/totallynotrobots for memes of Mark Zuckerberg pretending to be human.
Like I said, complicated.
Facebook is just a website. How could it break?
What’s not that complicated is keeping Facebook operational. They may have plans for world domination (all the depositions aren’t in yet, so let’s not be definite) and they certainly have some extremely competent developers working for them. But they don’t have a whole new technology or something. They’re a website, websites are just a program on someone else’s computer, so if the computer’s running and the program’s OK, you should be fine. If the program goes wrong, you revert it — take it back to how it was before you made whatever change made it go crazy. If the computer goes wrong, you turn it off and back on again.
Websites: virtually programs, but on the phone
That may sound flippant, but this gets relevant later so let’s clarify. The ‘other people’s computers’ the internet runs on are big, high-capacity ‘headless’ (no monitor or keyboard) computers called servers. Looking after servers used to require quite serious technical skills; they run operating systems most home users never touch (most use Linux), and debugging them used to be a major part of keeping them operational. Now, servers are virtualized. If you’ve ever used VirtualBox or any other home virtualization app you’ll have some idea what this means. A program runs inside your operating system and pretends to be a whole computer: a second operating system, and all its programs, run inside that.
Suppose you ran MS Office, inside Windows Vista, inside VirtualBox on a Mac. When you do something in the program, like change font size, the program has to give instructions to the electronics that make up your actual computer. That’s where all the work really gets done.
Normally it goes program>OS>computer; virtualization means it goes program>OS>virtual computer>OS>real computer. Virtualization eats computational resources. But it puts everything inside the virtualization in a sandbox. If it gets corrupted, infected or hacked, you can just wink it out of existence by killing the virtualization program.
Have you tried turning it off, and then back on again?
When a server goes wrong now, you just turn it off forever and spin up a new virtualized server, in the time it takes to type a couple of commands. Virtualized servers have other virtualized servers as ‘failovers’ for when one fails or the load exceeds its capacity, so there’s never a time when the computers that run the internet have to be turned off. All the tech we use all the time is doing this all the time behind the scenes; no harm done.
That’s assuming business as usual, which as we shall see is a) a bold assumption and b) the problem.
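For the curious, the business-as-usual failover routine can be sketched in a few lines of Python. This is a toy model, not anyone's real infrastructure code: the `Server` class, the `pick_server` function, and the server names are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """A hypothetical virtual server: just a name and a health flag."""
    name: str
    healthy: bool = True


def pick_server(pool):
    """Return the first healthy server in the pool, or None if all are down."""
    for server in pool:
        if server.healthy:
            return server
    return None


pool = [Server("web-1"), Server("web-2"), Server("web-3")]

pool[0].healthy = False        # web-1 goes wrong
active = pick_server(pool)     # traffic fails over to the next healthy server
print(active.name)             # prints web-2

# "Turn it off forever and spin up a new one": swap in a fresh instance.
pool[0] = Server("web-1b")
```

The catch is that `pick_server` only helps while at least one server in the pool can still be reached. If nothing can reach the pool at all, no amount of failover saves you.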
Facebook and the empire’s other provinces fell because Facebook built a complicated system with a single point of failure and then lost the key. Here’s how the story plays out.
It’s a vicious circle: you have to get updates to stop your computer breaking, but then the updates break your computer. It’s like both halves of a Russian reversal at once, which come to think of it is also a pretty good analogy for Facebook: you can always find the party, but it turns out the Party can always find you too.
Facebook performed a company-wide configuration update that affected its BGP, or Border Gateway Protocol, routes. Facebook is a big company and like big companies always do when they’re allowed to, it’s vertically integrated: it owns its own supply and distribution chains, which in this case means its own servers. Among other things, Facebook is a telecoms company now and has been for a while. Facebook the website is hosted on Facebook servers in Facebook data centers.
That matters, because it makes Facebook the equivalent of a country on the internet. The internet works because everyone agrees: I can send my data through your computer, you can send yours through mine. In P2P networks that’s literally what’s happening, but the actual web uses servers because home computers are comparatively feeble and, in theory anyway, we periodically turn them off. BGP routes are like maps of the big roads at a country’s borders: each network announces them so that everyone else knows how to get in, and they sometimes change.
When that happens to a country’s roads it causes massive snarl-ups, supply chain catastrophes that ripple out to affect neighbours, and problems for the end user trying to do anything from buy a burger to refill a car. When it happens to the internet, the digital equivalent is data getting lost or rerouted in ways that make it much slower and less efficient. So digital countries like telecoms companies periodically update their BGP configuration. And sometimes when you do an update, everything crashes.
Facebook crashed everything on its domain when it blew its BGP update. Other entities on the internet now had no working map to allow them to locate the borders of the Facebook empire, so typing facebook.com into a browser returned a ‘server not found’ error (not even a 404, which would at least mean a server had answered) and sending a WhatsApp message just didn’t work. No biggie, though, we’ll just log in remotely to the servers and type in a couple of commands, right?
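Here is a rough sketch of what ‘no working map’ means in practice. It models a routing table as a plain Python dictionary and looks up an address by longest-prefix match. The `lookup` function and the network names are invented for this example, and the address ranges are the ones reserved for documentation, not Facebook’s real ones. Real BGP is far more involved than this.

```python
import ipaddress


def lookup(routes, address):
    """Longest-prefix match: return the owner of the most specific route, or None."""
    ip = ipaddress.ip_address(address)
    matches = [net for net in routes if ip in net]
    if not matches:
        return None  # nobody has announced a way to reach this address
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[best]


# Hypothetical routing table built from documentation-only prefixes.
routes = {
    ipaddress.ip_network("192.0.2.0/24"): "AS-EXAMPLE-SOCIAL",
    ipaddress.ip_network("198.51.100.0/24"): "AS-EXAMPLE-ISP",
}

target = "192.0.2.53"            # stands in for any machine inside the network
before = lookup(routes, target)  # "AS-EXAMPLE-SOCIAL"

# The botched update: the network stops announcing its own routes.
del routes[ipaddress.ip_network("192.0.2.0/24")]
after = lookup(routes, target)   # None: the machine is fine, but unreachable

print(before, after)
```

Nothing inside the network has to be broken for `after` to come back empty. The servers can be humming along happily; the rest of the internet simply has no directions to them.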
Put all your eggs in one basket, and then destroy your ability to digitally connect to that basket
Yeah, so, about that. Facebook personnel weren’t able to access the company’s building to deal with the problem, because their doors are all electronic. Guess where the computer program that manages the locks on those doors resides? Yep. On the server.
And they couldn’t even tell each other about it. The normal structures of reporting were broken in a similarly elegant and simple way: by having the company’s internal communications rely on Messenger and WhatsApp. Which were hosted on… Yeah. See the problem?
There’s an emergency, last-ditch, I’m-sure-we’ll-never-need-it Facebook communications system, an old-timey IRC chat tool that’s clunky and looks very web <1, but does the job and more importantly is very, very hard to break. In fact, the only way you could really break it would be to… take down all the servers it was hosted on. (Guess where it’s solely hosted?)
It gets worse. When Facebook staff were eventually able to access the building, they couldn’t access their systems to manage the server crisis because Facebook uses cloud computing and their cloud is…
So they had to go down to the physically-secured, multiply fail-safe-locked server room to do the job by hand with a laptop and a LAN cable. Except they couldn’t. Because all those locks are electronic, and managed by a system that is, indeed, hosted on facebook.com.
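If you lay the chain of events out as a dependency map, the problem is easy to see. The sketch below is a loose, hypothetical reading of the story so far, not a diagram of Facebook’s actual systems: every name in `DEPENDS_ON` and the `knocked_out` function are assumptions made for illustration.

```python
# Hypothetical dependency map, loosely based on the story above.
# Each entry lists what that thing needs in order to work.
DEPENDS_ON = {
    "remote login": ["company network"],
    "Messenger and WhatsApp": ["company network"],
    "backup IRC chat": ["company network"],
    "electronic door locks": ["company network"],
    "server room access": ["electronic door locks"],
    "laptop and LAN cable": ["server room access"],
}


def knocked_out(failed, depends_on):
    """Return the set of everything that stops working once `failed` goes down."""
    down = {failed}
    changed = True
    while changed:  # keep sweeping until no new casualties turn up
        changed = False
        for thing, needs in depends_on.items():
            if thing not in down and any(need in down for need in needs):
                down.add(thing)
                changed = True
    return down


down = knocked_out("company network", DEPENDS_ON)
still_working = sorted(set(DEPENDS_ON) - down)
print(still_working)  # prints []
```

Every recovery route leads back to the thing that failed. Give any one of those tools an empty dependency list (something that needs nothing else in order to work) and it would survive, which is the whole point of having a backup.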
Class, it’s short story time. Imagine for a second that you’re Bill Johnson. Bill is an industrial contractor, a jack-of-all-physical trades. In his sixties now, he’s mostly working because he’s forgotten how to stop. He still hangs doors, fixes broken steps, does a bit of welding, and tells people with electrical problems to call an actual electrician. (You can see him now, right? Looks like Sam Elliott in old Dickies.) He’s used to making OK money, and he owns all his own tools, including an industrial angle grinder that he bought from Lowe’s in 1996.
Facebook lost $100 million in ad revenue alone over the roughly six hours their services were down. Break that down, and Bill’s angle grinder is worth $277,777.78 per minute.
So here’s the $277,777.78 question:
How much do you charge to come out and fix it?
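For anyone checking the math, the per-minute figure is just the article’s $100 million estimate divided by six hours’ worth of minutes. The variable names are ours; treating ‘roughly six hours’ as exactly 360 minutes is an assumption.

```python
lost_revenue = 100_000_000   # dollars, the ad revenue figure quoted above
outage_minutes = 6 * 60      # "roughly six hours", taken as exactly 360 minutes

per_minute = lost_revenue / outage_minutes
print(f"${per_minute:,.2f} per minute")  # prints $277,777.78 per minute
```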
According to consultant Cullen Dudas, a contact at Facebook admitted that a local contractor did indeed have to come in and cut through the cage. Facebook later denied this story, and the NYT reporter who confirmed it retracted the confirmation (in a rather creepy, he-had-won-the-victory-over-himself kind of way). But the company did agree that ‘these facilities are designed with high levels of physical and system security in mind. They’re hard to get into,’ suggesting that if they didn’t use the standard tools for getting into things that are physically hard to access, maybe they called Bruce Banner instead of Bill Johnson.
Since whoever took the *cough-anglegrinder-cough* requisite tools to the site will have signed an NDA as gigantic and stone-cold serious as the Constitution, we’ll probably never know.
Enclosing the internet
All of this points to something that’s pretty important, but very dull and unfashionable to talk about. The internet was built specifically to avoid this kind of single-point-of-failure problem. Back in the olden days (the 1960s and 70s), the big concern was that well-targeted weapons from the Soviet Union might hit major US cities and communication centers, darkening the whole nation and making both civilian and military communication impossible.
Since destroying the enemy’s communications is a key element of modern battlefield doctrine, that makes sense; also, we were planning to do it to them if we could, so there’s that too. That’s why ARPANET was developed, as a communication system that worked as a self-adjusting layer over physical substrates. That’s why the internet uses packet switching technology. If a wire was cut or a city destroyed, the internet would simply route around the damage.
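‘Routing around the damage’ is easier to picture with a toy network. The sketch below uses a plain breadth-first search over a made-up four-city map. The cities, the links, and the `find_route` function are all hypothetical, and real packet routing uses quite different machinery, but the principle is the same.

```python
from collections import deque


def find_route(links, start, goal, destroyed=frozenset()):
    """Breadth-first search for a path from start to goal, skipping destroyed nodes."""
    if start in destroyed or goal in destroyed:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in links.get(node, []):
            if neighbour not in visited and neighbour not in destroyed:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no surviving path


# Hypothetical network: two independent paths between Boston and Denver.
links = {
    "Boston": ["Chicago", "Atlanta"],
    "Chicago": ["Boston", "Denver"],
    "Atlanta": ["Boston", "Denver"],
    "Denver": ["Chicago", "Atlanta"],
}

print(find_route(links, "Boston", "Denver"))
# prints ['Boston', 'Chicago', 'Denver']
print(find_route(links, "Boston", "Denver", destroyed={"Chicago"}))
# prints ['Boston', 'Atlanta', 'Denver']
```

The trick only works while a second path exists. Knock out both Chicago and Atlanta and `find_route` returns `None`: there is nothing left to route around the damage with.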
Later (not that much later), a whole bunch of nerds and freaks (people used to say AOL stood for Anarchists, Onanists, and Lunatics) figured out that the internet would treat any disruption as damage. In the words of Electronic Frontier Foundation co-founder John Gilmore:
‘The Net interprets censorship as damage and routes around it.’
That’s fine, and the internet is remarkably resilient to traditional censorship and to short-term destruction of chunks of infrastructure. But long-term infrastructure capture is a different matter.
Organizations like Facebook own things the original internet pioneers didn’t foresee people owning. They own their code, and they try their very hardest to own their users, deliberately engineering in high ‘switching costs’ to tilt the cost-benefit analysis in favor of continuing to tolerate their growing list of casual abuses. It’s never been clearer that the company’s everyday users are its product, not its customers.
But Facebook also own their own physical infrastructure. And as we’ve just seen, that’s a big problem for Facebook (even while being a boon to Telegram, Signal and other tools that are like WhatsApp used to be before it got Zucked up). However, it’s also a big problem for us.
The open range of products
What we’ve seen in the Facebook outage isn’t just that even billionaire bad guys can act just like Hong Kong Phooey villains (folks, hackers pwned these guys so fast they didn’t even see themselves do it!). It’s that the internet can’t route around this level of damage. Facebook’s biggest engineering achievement has been to build a single point of failure into the internet and break it — in a much less likeable way than Kim Kardashian.
It’s not just Facebook. Apple’s latest privacy updates lock Facebook out of user data, and Facebook are understandably furious about it; this data is how they make their money. But Apple are walled-gardeners too. In fact, all the big tech companies are. And because they own the regulators that are supposed to stop them, they’re carving up the internet exactly like 19th-century imperialists carving up the globe.
We talked earlier about telecoms companies being a little like countries on the internet. Increasingly, big tech companies are actually like empires, ruling as they please without any input from us and seizing things that used to belong to us, like easy communication and connectivity. They’re offering us ‘security’ and convenience in return for handing over control of our online world, but let’s be real: if we make that deal, we won’t get either.
If you want to have access to a free and open internet where your right to anonymity is respected, and where the network retains its capacity to route around censorship and damage, you won’t get it inside one of the corporate walled gardens. And you’ll have to pay for it.
It will mean using communication apps that don’t track you, spy on you, collude with spies and spooks, and service despots. Using social media that doesn’t treat you as a data source and ad target, doesn’t collude with war crimes or help foment coups. And it means getting a VPN, and using it.
Both Nord and Express have free trials — check them out!