The cause of the Amazon Web Services failure that resulted in parts of the internet failing?...

midian182

Staff member

You probably noticed that a large number of websites and internet-connected services went offline earlier this week, the result of an outage in Amazon Web Services' (AWS) S3 storage service. Now the online retail giant and cloud service provider has explained how it happened. The short answer: a simple typo.

On the AWS services page, Amazon apologized for the disruption. It writes that a Simple Storage Service (S3) engineer was debugging an issue that was causing the S3 billing system to run slowly. The engineer "executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process."

“Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended. The servers that were inadvertently removed supported two other S3 subsystems. One of these subsystems, the index subsystem, manages the metadata and location information of all S3 objects in the region.”

The S3 subsystems are designed to keep working even when some of their components fail, but Amazon hadn't restarted the index and placement subsystems for many years. S3 has also grown enormously over the last several years, and together these factors meant "the process of restarting these services and running the necessary safety checks to validate the integrity of the metadata took longer than expected." All told, the disruption lasted four hours and 17 minutes.

Sites and services that rely on Amazon's Northern Virginia data center region were affected by the outage. It's estimated that up to 100,000 websites were down because of the error, including Business Insider and Medium. Automation service IFTTT, Trello, and websites created with Wix were also taken offline.

The company said it has added safeguards to prevent the same thing from happening in the future. "We will do everything we can to learn from this event and use it to improve our availability even further," Amazon wrote.


 
I guess someone is being relocated to a nearby McDonald's or Burger King for his employment...

Can you imagine the razzing that person got from the people they work with? Never mind the screaming and probably royal ***-kicking from the bosses.

Whoever it is is probably submitting résumés to maintain the self-serve kiosks in those MickeyD's and Burger Kings.
 
I actually have difficulty imagining that; after all, this is supposed to be the civilized world, not China.
 
Yes, this is what happens in the real world when you screw up on the job, especially when your goof knocks out a chunk of the internet and costs customers millions in lost revenue. This isn't the fairytale world of academia and SJWs where every snowflake gets a participation trophy just for showing up.
 
Any half-decent HR department will tell you that maintaining a climate of fear is counterproductive. Besides, that was a mistake that would have happened sooner or later. I still don't see why you need to go the Chinese way on the poor fellow...
 
Razzing? It's Amazon. For a screw-up on this scale, I don't see them accepting anything less than ritual disembowelment.
 
Climate of fear? No one's talking about anything like that, and it doesn't apply here. We're talking about what happens in the real world when you screw up this big. You're held accountable for what you do. You don't get a hug and a safe space. When you get out in the world and have had a few jobs, you'll understand.
 