![]()
Illustration of an example 5-day forecast of near-surface wind speed (color fill) and mean sea level pressure (contours). On December 31, 2020, an extratropical cyclone impacted Alaska, setting a new North Pacific low-pressure record. Here, we evaluate the ability of Stormer to predict this record-breaking event 5 days in advance. Using initial conditions from 0000 UTC, 26 December 2020, Stormer successfully forecast both the location and strength of this extreme event.
Weather forecasting is a fundamental problem for anticipating and mitigating the impacts of climate change. Recently, data-driven approaches for weather forecasting based on deep learning have shown great promise, achieving accuracies that are competitive with operational systems. However, those methods often employ complex, customized architectures without sufficient ablation analysis, making it difficult to understand what truly contributes to their success. Here we introduce Stormer, a simple transformer model that achieves state-of-the-art performance on weather forecasting with minimal changes to the standard transformer backbone. We identify the key components of Stormer through careful empirical analyses, including weather-specific embedding, randomized dynamics forecasting, and a pressure-weighted loss. At the core of Stormer is a randomized forecasting objective that trains the model to forecast the weather dynamics over varying time intervals. During inference, this allows us to produce multiple forecasts for a target lead time and combine them to obtain better forecast accuracy. On WeatherBench 2, Stormer performs competitively at short to medium-range forecasts and outperforms current methods beyond 7 days, while requiring orders-of-magnitude less training data and compute. Additionally, we demonstrate Stormer's favorable scaling properties, showing consistent improvements in forecast accuracy with increases in model size and training tokens. Code and checkpoints will be made available.
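As a rough illustration of the pressure-weighted loss mentioned above, the sketch below weights the squared error at each pressure level in proportion to its pressure, so near-surface levels contribute more. The function name, tensor layout, and normalization are assumptions for illustration; Stormer's exact loss may differ, for example in how surface variables are weighted.

```python
import torch

# Hypothetical sketch of a pressure-weighted MSE; not Stormer's exact loss.
def pressure_weighted_mse(pred: torch.Tensor, target: torch.Tensor,
                          levels: torch.Tensor) -> torch.Tensor:
    # pred, target: (B, L, H, W) fields of one variable at L pressure levels (hPa).
    w = levels / levels.sum()            # weight each level by its pressure
    err = (pred - target) ** 2           # squared error per grid point
    # Weighted sum over levels, then mean over batch and spatial dimensions.
    return (w.view(1, -1, 1, 1) * err).sum(dim=1).mean()

# Example: levels = torch.tensor([50., 250., 500., 850.]) gives the
# near-surface 850 hPa level the largest weight.
```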
![]()
The key innovation of Stormer is a randomized forecasting objective that trains the model to forecast the weather dynamics over varying time intervals of 6, 12, and 24 hours. The 6- and 12-hour intervals encourage the model to learn and resolve the diurnal (day-night) cycle, one of the most important oscillations in the atmosphere driving short-term dynamics, while the 24-hour interval filters out the effects of the diurnal cycle and allows the model to learn longer, synoptic-scale dynamics. This objective enables a single model, once trained, to generate multiple forecasts for a specified lead time T and combine them to obtain better forecast accuracy, as sketched below.
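The sketch below shows one simple way such inference-time combination could work, assuming a trained `model(state, interval)` that predicts the change in the weather state over a given interval. The names and shapes are illustrative assumptions, and the paper's exact combination scheme may differ; here each ensemble member is a homogeneous rollout built from a single interval choice.

```python
import torch

INTERVALS = (6, 12, 24)  # hours, matching the randomized training intervals

@torch.no_grad()
def ensemble_forecast(model, x0: torch.Tensor, lead_time: int) -> torch.Tensor:
    """Combine rollouts built from different interval choices into one forecast."""
    forecasts = []
    for dt in INTERVALS:
        if lead_time % dt != 0:
            continue  # this interval cannot tile the target lead time exactly
        x = x0
        for _ in range(lead_time // dt):
            x = x + model(x, dt)  # the model forecasts the dynamics (residual)
        forecasts.append(x)
    # Averaging the member forecasts tends to reduce error at long lead times.
    return torch.stack(forecasts).mean(dim=0)

# Example: a 5-day (120-hour) forecast averages the 20x6h, 10x12h, and
# 5x24h rollouts produced by the same trained model.
```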
![]()
Stormer's architecture consists of two components: a weather-specific embedding and the Stormer backbone. The weather-specific embedding module embeds the input X₀ ∈ ℝ^{V×H×W} into a sequence of (H/p) × (W/p) tokens, while modeling the non-linear interactions between climate variables in the input. The tokens are fed to the Stormer backbone together with the time-interval embedding δt, and finally go through a linear and reshape module to produce the dynamics forecast. Each Stormer block employs adaptive layer normalization to condition on δt.
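The adaptive layer normalization mentioned in the caption above can be sketched as follows: a small linear head regresses per-block scale, shift, and gate parameters from the interval embedding and applies them around attention and MLP sub-layers. This is a minimal sketch of the general adaLN pattern, assuming a δt embedding of the same width as the tokens; module names and dimensions are illustrative, not Stormer's exact implementation.

```python
import torch
import torch.nn as nn

class AdaLNBlock(nn.Module):
    """Transformer block conditioned on a time-interval embedding via adaLN."""
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim, elementwise_affine=False)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        # Regress per-block scale/shift/gate parameters from the dt embedding.
        self.ada = nn.Linear(dim, 6 * dim)

    def forward(self, x: torch.Tensor, dt_emb: torch.Tensor) -> torch.Tensor:
        # x: (B, N, D) tokens; dt_emb: (B, D) embedding of the time interval.
        s1, b1, g1, s2, b2, g2 = self.ada(dt_emb).chunk(6, dim=-1)
        h = self.norm1(x) * (1 + s1.unsqueeze(1)) + b1.unsqueeze(1)
        x = x + g1.unsqueeze(1) * self.attn(h, h, h)[0]
        h = self.norm2(x) * (1 + s2.unsqueeze(1)) + b2.unsqueeze(1)
        x = x + g2.unsqueeze(1) * self.mlp(h)
        return x
```

Conditioning through normalization parameters, rather than by appending δt as an extra token, lets every block modulate its features according to the requested forecast interval.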
We evaluate the performance of Stormer on WeatherBench 2 using the RMSE (top) and ACC (bottom) metrics. For short-range, 1–5 day forecasts, Stormer's accuracy is on par with Pangu-Weather and GraphCast. At longer lead times, Stormer excels, consistently outperforming both baseline methods from day 6 onwards by a large margin, and the performance gap widens as the lead time increases. At 14-day forecasts, Stormer outperforms GraphCast by 10%–20% across all 9 key variables. Stormer is also the only model in this comparison that performs better than Climatology at long lead times, while the other methods approach or even fall below this simple baseline.
![]()
Comparison with the deep learning baselines on the Root Mean Squared Error (RMSE) metric. Lower RMSE is better.
![]()
Comparison with the deep learning baselines on the Anomaly Correlation Coefficient (ACC) metric. Higher ACC is better.