Two Singapore banks were unable to process transactions because of an overheating data...

Alfonso Maruccia

In context: Located about one degree of latitude (137 km) north of the equator, the city-state of Singapore has no distinct seasons, with fairly uniform temperature and pressure and high humidity throughout the year. If something goes wrong during a system upgrade, the tropical climate can quickly become a problem for the data centers operating on the island.

On October 14, DBS and Citibank suffered an IT outage that affected millions of payment transactions in Singapore. Banking apps were down, servers could not be reached, and the two banks' customers were left with very few ways to pay for purchases or receive payments. The city-state relies heavily on digital banking systems, an approach that government authorities are now viewing more cautiously.

The October outage made the online banking services of DBS and Citibank fully or partially unavailable, Minister Alvin Tan confirmed during a parliamentary Q&A session. The root cause was later traced to a malfunctioning cooling system at the Equinix data center used by both banks, which pushed server temperatures above optimal operating conditions.

Tan said on Monday that the outage led to 810,000 failed access attempts and 2.5 million unsuccessful payment and ATM transactions. According to Equinix, the overheating was caused by a contractor who sent an incorrect signal to "close the valves from the chilled water buffer tanks" during a planned system upgrade.

DBS and Citibank had backup plans in place for this kind of situation, but those turned out to be useless when they were needed. DBS was unable to reach its backup data center because of a "network misconfiguration," Singapore's government said, while Citibank ran into unspecified connectivity issues.

The two financial institutions failed to comply with the Monetary Authority of Singapore (MAS) requirements on the resilience of critical IT systems. MAS dictates that unscheduled downtime for critical banking systems should not exceed four hours within a 12-month period, and the October issue clearly went beyond that limit.

According to Kevin Reed, chief information security officer at Singapore-based backup company Acronis, Equinix should have had a redundant cooling system for its servers. Reed remarked that, as is often the case, an incident is not a single issue but "a chain of interconnected events," as the DBS and Citibank case demonstrates.

Minister Tan also remarked that the "digital first" approach in Singapore's financial market shouldn't become a "digital only" affair. Consumers and businesses should be aware of the risks of paperless money, and companies should provide alternative payment options for when servers and apps are unavailable.


 
Sounds more like they have expanded their equipment and services without upgrading their HVAC system. This happens frequently with data centers that don't have a chief engineer, or don't listen to their chief engineer's advice on cooling system upgrades...
 
🤔 I wonder if they are trying to run any sort of AI in their data center?
Another warning against only-digital money. I've said it before and I'll say it again: digital assets are not real assets.
Real money is an illusion of data these days. If you were to go to a bank and you wanted to take out all your money, the bank might not have that cash on hand. After all, money is literally bits and bytes in a computer system these days.
 
🤔 I wonder if they are trying to run any sort of AI in their data center?

Real money is an illusion of data these days. If you were to go to a bank and you wanted to take out all your money, the bank might not have that cash on hand. After all, money is literally bits and bytes in a computer system these days.
All you are doing is proving his argument correct. Real money is only an "illusion" because of an overreliance on digital systems.
 
As someone who worked with some Singapore banks on testing their business continuity plans, holy ****, I'm glad neither of those two were our clients, or that would have been pretty embarrassing.
 
Sounds more like they have expanded their equipment and services without upgrading their HVAC system. This happens frequently with data centers that don't have a chief engineer, or don't listen to their chief engineer's advice on cooling system upgrades...
Unlikely. Our* government and corporates are tightly intertwined in 2023, and cost-cutting and outsourcing (to companies of dubious standards that offer the lowest bids) are beginning to rear their ugly heads. Read up a bit on our current labour policies and where many of our IT engineers are coming from, especially in the case of DBS.

*Yes, I'm from Singapore
 
"unscheduled downtime for critical banking systems should not exceed four hours within a 12-month period, and the October issue clearly went beyond that limit."

Nowhere does the article state the duration of the outage, so how it 'clearly went beyond that limit' is...left to our imagination? Likewise, the requirement concerns _unscheduled downtime_, yet the article states that this failure occurred during a _planned_ system upgrade. One could argue, on semantics, that if a planned system upgrade goes sideways and breaks things, it implicitly becomes unscheduled downtime.

The failure of the backup networking however is the real failure here. An untested backup plan is not a backup plan, it's a 'well we hope it will'.
 