Crystal ball: In its first hurricane season, Google's DeepMind AI framework not only matched decades of human expertise but surpassed the output of two of the world's most advanced supercomputer models. As the National Hurricane Center and global forecasting agencies process final verification data, the question is no longer whether AI can perform meteorological forecasting – but how soon traditional methods will adapt or give way entirely.
As the 2025 Atlantic hurricane season winds down, early evaluations of model performance reveal a shift in forecasting reliability that may redefine meteorology. Google DeepMind's Weather Lab, which began issuing tropical cyclone forecasts in June, has decisively outperformed traditional physics-based models used by national weather agencies.
Forecasters and researchers reviewing preliminary data say the results mark the first serious challenge in decades to the global dominance of numerical weather prediction systems such as the US National Weather Service's Global Forecast System.
University of Miami climate scientist Brian McNoldy analyzed forecast accuracy across 13 named storms this season. His preliminary comparison shows that DeepMind's AI model consistently produced lower average position errors than the United States' GFS at lead times of up to five days. According to his calculations, DeepMind's track error at 120 hours averaged 165 nautical miles, compared with 360 nautical miles for GFS – more than a twofold difference.
And here is the same, but for all 13 storms in the entire season so far. "OFCL" is the NHC's forecast, and "GDMI" is the Google #DeepMind ensemble mean and is the leading model for both track and intensity this season. [2/3]
– Brian McNoldy (@bmcnoldy.bsky.social) October 31, 2025 at 11:04 AM
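For readers curious how a figure such as "track error at 120 hours" is typically derived, the following is a minimal sketch, not McNoldy's actual method. It assumes position error is the great-circle distance between a forecast storm center and the observed center, averaged over all verified forecasts. The function names and the Earth-radius constant are illustrative choices, not part of any forecasting library.

```python
import math

# Mean Earth radius in nautical miles (6371.0088 km / 1.852 km per nmi).
EARTH_RADIUS_NMI = 3440.07


def great_circle_nmi(lat1, lon1, lat2, lon2):
    """Haversine distance in nautical miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_NMI * math.asin(min(1.0, math.sqrt(a)))


def mean_track_error(forecast_positions, observed_positions):
    """Average position error over paired (lat, lon) forecast and observed points."""
    if len(forecast_positions) != len(observed_positions):
        raise ValueError("forecast and observed lists must have the same length")
    if not forecast_positions:
        raise ValueError("at least one forecast is required")
    errors = [
        great_circle_nmi(f[0], f[1], o[0], o[1])
        for f, o in zip(forecast_positions, observed_positions)
    ]
    return sum(errors) / len(errors)


# Ratio of the two 120-hour averages quoted above (GFS versus DeepMind).
ratio = 360 / 165
print(round(ratio, 2))  # 2.18
```

The ratio of roughly 2.18 is what makes "more than a twofold difference" an accurate description of the preliminary numbers.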
The contrast is especially significant given the technical approaches behind each system. The GFS relies on explicit physical equations that simulate atmospheric motion across three dimensions, running on large-scale NOAA supercomputers with frequent data assimilation cycles.
DeepMind's system, by contrast, is a neural network trained on decades of archived meteorological data, enabling it to infer atmospheric patterns statistically rather than from physical first principles. Its architecture can deliver forecasts in minutes on standard GPU clusters, eliminating the need for massive computational infrastructure.
Michael Lowry, a hurricane researcher and author of the Eye on the Tropics newsletter, notes that DeepMind's near-instantaneous output reflects a fundamental computational advantage: "Models based on neural networks learn from previous forecast errors and adjust their pattern recognition, something physics-based systems simply can't do."
Lowry told Ars Technica that because AI models can retrain quickly using new data, "their improvement curve could be exponential compared to incremental updates to older systems."
DeepMind's model not only surpassed the individual model output from GFS but also outperformed human-generated official forecasts and consensus models such as TVCN and HCCA, which blend multiple model outputs to reduce bias.
If verified by final National Hurricane Center statistics, these results would represent the first time an AI system exceeded both automated and human consensus forecasts in the Atlantic Basin.
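The idea behind consensus guidance can be illustrated with a short sketch. This is a deliberately simplified stand-in, not the operational TVCN or HCCA procedure: it assumes each member model supplies a (latitude, longitude) position for the same lead time, and it blends them with an equal-weight or weighted average. The function name and the weights are hypothetical, and averaging longitudes this way only makes sense when the members do not straddle the date line.

```python
def consensus_position(member_positions, weights=None):
    """Blend (lat, lon) positions from several models into one consensus point.

    With weights=None every member counts equally; passing unequal weights
    mimics a consensus that favors some members over others.
    """
    if not member_positions:
        raise ValueError("at least one member forecast is required")
    if weights is None:
        weights = [1.0] * len(member_positions)
    if len(weights) != len(member_positions):
        raise ValueError("one weight per member is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    lat = sum(w * p[0] for w, p in zip(weights, member_positions)) / total
    lon = sum(w * p[1] for w, p in zip(weights, member_positions)) / total
    return lat, lon


# Illustrative positions only, not real forecast data.
members = [(20.0, -60.0), (22.0, -62.0)]
print(consensus_position(members))
```

Because errors from independent models partly cancel when averaged, a blend of this kind is usually harder to beat than any single member, which is why outperforming the consensus is notable.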

The early analysis does not include the European Centre for Medium-Range Weather Forecasts (ECMWF) model, which has long been considered the global benchmark for deterministic forecasts. Historically, ECMWF's tropical cyclone track accuracy has been similar to or slightly better than NHC official forecasts. Still, McNoldy's data suggest it is unlikely to have exceeded the DeepMind model's performance this season.
The implications extend beyond tropical weather. The exceptional performance of DeepMind's system raises questions about the long-term role of traditional numerical weather prediction. Physics-based models like GFS must integrate equations describing fluid dynamics, radiative transfer, and thermodynamics across millions of grid points. This process requires vast computing power and often introduces numerical truncation errors. By contrast, data-driven neural models infer underlying dynamics directly from global reanalysis datasets, bypassing explicit equations.
These architectures belong to a class of AI methods known as "deep generative models," capable of learning high-dimensional patterns. DeepMind's framework is believed to use encoder-decoder functions optimized for spatiotemporal prediction, enabling it to handle both track and intensity forecasting within a single network architecture.
During this hurricane season, it also demonstrated reliable performance in estimating maximum wind speeds and pressure fluctuations – tasks that even sophisticated physics-based systems continue to handle inconsistently.
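The numerical truncation error attributed to physics-based models can be seen in miniature with a toy example. The sketch below is not taken from GFS or any forecasting code: it approximates the derivative of a sine function with a central finite difference, the kind of grid-based approximation such models depend on, and measures how far the result is from the exact derivative. The function names are illustrative.

```python
import math


def central_difference(f, x, h):
    """Second-order central finite-difference estimate of f'(x) with spacing h."""
    return (f(x + h) - f(x - h)) / (2 * h)


def truncation_error(h, x=1.0):
    """Absolute gap between the finite-difference estimate and the exact cos(x)."""
    return abs(central_difference(math.sin, x, h) - math.cos(x))


coarse = truncation_error(0.1)   # wider grid spacing
fine = truncation_error(0.05)    # spacing halved

# Halving the spacing cuts the error by about a factor of four,
# as expected for a second-order scheme.
print(coarse / fine)
```

The same trade-off drives the cost of operational models: shrinking the error means a finer grid, and a finer grid means far more computation per forecast.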
Meanwhile, GFS's performance this year has puzzled meteorologists. Though the model underwent a major upgrade in 2019 with the Finite-Volume Cubed-Sphere (FV3) dynamical core, the transition appears to have produced a regression rather than an improvement.
The model's persistent biases and track divergences have frustrated operational forecasters, many of whom increasingly disregard its guidance for tropical systems.
Lowry and others suggest that lapses in observational data – possibly linked to federal budget gaps – may have compounded the issue, though that remains speculative. The National Weather Service has not yet released its internal assessment.