In brief: Amazon’s recent Prime Day was the biggest global shopping event in the company’s history, but it could have been very different. The site suffered numerous crashes and glitches on Monday, and now we know why: its servers couldn’t cope with the sheer amount of traffic.
CNBC reports that it obtained internal documents showing the issues Amazon engineers faced on Prime Day. Shortly after the event launched at noon PT Monday, a surge of traffic started causing errors that affected product searches and shopping carts and, in some cases, caused crashes.
The company was forced to launch a stripped-down version of the Amazon homepage and block international traffic as it tried to deal with the issues. It also had to add servers manually to keep up with traffic demand, which suggests something went wrong with its auto-scaling capabilities. An hour after the start of Prime Day, one of Amazon’s internal updates read: “Currently out of capacity for scaling. Looking at scavenging hardware.”
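To illustrate the kind of decision described above, here is a minimal Python sketch of a scaling plan that falls back to a lightweight page and traffic restrictions once spare capacity runs out. It is not Amazon's actual system: the `FleetState` fields, the `plan` function, the 70 percent utilization target, and all numbers are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class FleetState:
    """Hypothetical snapshot of a web fleet (all fields are assumptions)."""
    requests_per_sec: float     # current incoming traffic
    servers: int                # servers currently serving traffic
    per_server_capacity: float  # requests/sec a single server can handle
    spare_servers: int          # servers the autoscaler can still add


def plan(state: FleetState, target_utilization: float = 0.7) -> dict:
    """Decide how many servers to add and which fallbacks to enable."""
    # Servers needed to keep each one at or below the target utilization.
    needed = math.ceil(
        state.requests_per_sec
        / (state.per_server_capacity * target_utilization)
    )
    shortfall = max(0, needed - state.servers)

    # The autoscaler can only add what is actually available.
    to_add = min(shortfall, state.spare_servers)
    unmet = shortfall - to_add

    # Absolute ceiling once every available server is in service.
    max_capacity = (state.servers + to_add) * state.per_server_capacity

    return {
        "add_servers": to_add,
        # Out of spare capacity: serve a cheaper, stripped-down page.
        "serve_lightweight_homepage": unmet > 0,
        # Traffic exceeds even full capacity: restrict some of it.
        "restrict_traffic": state.requests_per_sec > max_capacity,
    }


if __name__ == "__main__":
    surge = FleetState(requests_per_sec=10_000, servers=10,
                       per_server_capacity=500, spare_servers=2)
    print(plan(surge))
```

In this sketch, the fallbacks only switch on when the pool of spare servers is exhausted, which mirrors the sequence in the report: scaling ran out of capacity first, and the stripped-down homepage and traffic blocking followed.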
A breakdown in Sable, an internally developed piece of technology used for computational and storage services across Amazon's businesses, caused problems across Prime, authentication and video playback. There were even issues with Alexa, Prime Now and Twitch, and some warehouses were temporarily unable to scan or pack products.
At one point, 300 people were taking part in an emergency conference call, but things returned to normal as more servers were added. As GeekWire notes, the CNBC report does not mention Amazon Web Services, so it appears that configuration errors on Amazon's end were behind the limited server capacity.
Amazon's CEO of worldwide retail, Jeff Wilke, said in an internal email that his team was “disappointed” about the issues and that the company was working to make sure they don’t happen again.