Azure (MS cloud) reliability is lagging the competition


An article by Tom Krazit on May 7, 2019 posted an interesting analysis of the reliability of the major cloud services: Microsoft Azure is trailing the pack...

This graphic summarizes the problem. “Azure has had significant downtime, not just in 2018, but even the first three months of 2019 have been not good for Microsoft,” said Raj Bala, an analyst with Gartner who compiled the data.

...last week when a routine DNS migration went haywire, disconnecting Azure services from customers and causing a major outage that lasted several hours and took out essential Microsoft services like Office 365 and Xbox Live, as well as websites...

The discovery of the Meltdown and Spectre chip bugs in 2017 forced all cloud providers to update their services in January 2018 with software mitigations that isolated cloud customers from those bugs, but Microsoft had to reboot everyone’s servers to put those changes into effect, and that takes time. But AWS and Google also needed to update their servers to add the patches for Spectre and Meltdown, and it didn’t appear to have as much of an impact on their service uptime. Google likes to tout its live migration capabilities that can update servers with no disruption to customer workloads.

In September 2018, a lightning strike at a data center in [Azure's] South Central U.S. region caused some cooling systems to fail, damaging servers and knocking out some services for more than 24 hours as engineers worked to preserve customer data and replace the damaged systems.

{more generally}
Operating cloud computing services at scale is really one of the more amazing things human beings have accomplished; the complexity involved is hard to appreciate without a fair amount of knowledge about how these systems work. And even if Microsoft lags AWS and Google in reliability scoring, unless your company is blessed with world-class operations talent, Microsoft is likely still better at operating data centers than most companies managing their own servers.