What just happened? NASA is the world's foremost space research agency and has access to the latest and greatest technology that most organizations can only dream of. However, an audit conducted by the NASA Office of Inspector General has found that the agency's outdated and overburdened supercomputers are creating massive infrastructure bottlenecks, leading to severe mission delays.

In a scathing report (via The Register), the OIG stated that NASA's high-end computing (HEC) technologies need a complete overhaul if the agency wants to compete with the space research programs of other nations and retain its leadership position. Without major changes, the agency's supercomputing resources will "likely constrain future mission priorities and goals."

Describing the agency's HEC resources as "oversubscribed and overburdened," the report claimed that Mission Directorates are requesting more computing time than existing capacity can provide, often leading to schedule delays.

The situation is so dire that some NASA teams have had to use part of their allocated budgets to buy their own HEC resources to meet deadlines. For example, the report noted that the Space Launch System team spends about $250,000 annually to purchase and manage its own HEC systems instead of waiting for existing HEC resources to become available. According to the OIG, almost all NASA centers operate their own HEC systems, the exceptions being Goddard Space Flight Center and Stennis Space Center.

The audit also found that NASA is not keeping up with modern supercomputing trends, in part because of organizational and funding constraints. For example, NASA's Advanced Supercomputing facility has just 48 GPUs alongside 18,000 CPUs, while the HEC systems at the NASA Center for Climate Simulation are even more CPU-heavy. The report attributes the failure to modernize to multiple factors, including "supply chain concerns, modern computing language (coding) requirements, and the scarcity of qualified personnel needed to implement the new technologies."

As of June 2023, NASA had five supercomputers split between the NASA Advanced Supercomputing (NAS) facility at Ames Research Center in Moffett Field, California, and the NASA Center for Climate Simulation (NCCS) at Goddard Space Flight Center in Greenbelt, Maryland. They are Endeavour (154.8 TFLOPS), Aitken (13.12 PFLOPS), Electra (8.32 PFLOPS), Discover (8.1 PFLOPS), and Pleiades (7.09 PFLOPS).

The report criticizes NASA for lacking "a comprehensive strategy for when to use HEC assets on the premises versus when to utilize cloud computing options." According to the OIG, this haphazard management of HEC assets also creates cybersecurity risks that need to be addressed as soon as possible.

To mitigate the challenges, the OIG is recommending that NASA appoint "executive leadership to determine the appropriate definition, scope, ownership, organizational placement, and structure for NASA's HEC." In addition, the report says that the agency should establish "a tiger team to collaborate and strategize on HEC issues," including identifying and plugging mission-critical technology gaps.

The agency is also encouraged to develop a concrete strategy to improve the prioritization and allocation of HEC assets, mitigate cybersecurity concerns, and address the other issues holding it back from reaching its full potential.