A bug in firewall systems took Cloudflare customers offline

Cal Jeffrey

In brief: Cloudflare lost several network services this morning in what was at first thought to be a DDoS attack. However, an investigation determined that the outage was caused by a bug in the company's firewall software. The problem was fixed in about 30 minutes.

AT&T was not the only service provider to suffer an outage Tuesday morning. Several websites relying on Cloudflare servers were knocked offline as well.

According to Gizmodo, Cloudflare CEO Matthew Prince said that a glitch in the company’s firewall software caused “a massive spike in CPU usage.” The result was a failure in global systems that lasted for about 30 minutes. The outage affected all company services, including websites reliant on Cloudflare’s network.

“We’ve never seen an outage like this before,” said Prince. At first, technicians thought it might have been a distributed denial-of-service (DDoS) attack, but they later discovered the firewall bug.

“While it would be convenient for this to be a nation-state or another attacker, this one was our fault.”

Europe and the East Coast of the United States were the most affected by the outage since it occurred during business hours in those regions. A Cloudflare operations center in London was the first to notice the CPU spike. Staff there immediately thought it was an attack since the firewall is designed to scale up to mitigate such situations. However, after investigating, they found no traffic or other evidence to indicate it was malicious.

Ironically, one of the affected websites was DownDetector, a site used by millions to check the status of websites and other online services, including Cloudflare. So users couldn’t even check DownDetector to see why the host was down. Cloudflare also has its own service status website.

The company takes full responsibility and apologizes for the interruption.

“One of Cloudflare’s core policies is radical transparency,” Prince told Gizmodo in a phone call. “Today’s outage was 100 percent in our control and 100 percent our responsibility. We’re reaching out to all our customers to honor our responsibilities to them. It’s important for people to know it’s a mistake on our part. While it would be convenient for this to be a nation-state or another attacker, this one was our fault.”
