![]()
Illustration of an example 5-day forecast of near-surface wind speed (color fill) and mean sea level pressure (contours). On December 31, 2020, an extratropical cyclone impacted Alaska, setting a new North Pacific low-pressure record. Here, we evaluate the ability of Stormer to predict this record-breaking event 5 days in advance. Using initial conditions from 0000 UTC, 26 December 2020, Stormer successfully forecast both the location and strength of this extreme event.
Weather forecasting is a fundamental problem for anticipating and mitigating the impacts of climate change. Recently, data-driven approaches for weather forecasting based on deep learning have shown great promise, achieving accuracies that are competitive with operational systems. However, those methods often employ complex, customized architectures without sufficient ablation analysis, making it difficult to understand what truly contributes to their success. Here we introduce Stormer, a simple transformer model that achieves state-of-the-art performance on weather forecasting with minimal changes to the standard transformer backbone. We identify the key components of Stormer through careful empirical analyses, including weather-specific embedding, randomized dynamics forecasting, and a pressure-weighted loss. At the core of Stormer is a randomized forecasting objective that trains the model to forecast the weather dynamics over varying time intervals. During inference, this allows us to produce multiple forecasts for a target lead time and combine them to obtain better forecast accuracy. On WeatherBench 2, Stormer performs competitively at short to medium-range forecasts and outperforms current methods beyond 7 days, while requiring orders-of-magnitude less training data and compute. Additionally, we demonstrate Stormer's favorable scaling properties, showing consistent improvements in forecast accuracy with increases in model size and training tokens. Code and checkpoints will be made available.
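As a rough illustration of the pressure-weighted loss mentioned above, the sketch below weights the squared error at each pressure level in proportion to its pressure, so near-surface levels contribute more. The function name, tensor layout, and normalization are assumptions for illustration; Stormer's exact loss may differ, for example in how surface variables are weighted.

```python
import torch

# Hypothetical sketch of a pressure-weighted MSE; not Stormer's exact loss.
def pressure_weighted_mse(pred: torch.Tensor, target: torch.Tensor,
                          levels: torch.Tensor) -> torch.Tensor:
    # pred, target: (B, L, H, W) fields of one variable at L pressure levels (hPa).
    w = levels / levels.sum()            # weight each level by its pressure
    err = (pred - target) ** 2           # squared error per grid point
    # Weighted sum over levels, then mean over batch and spatial dimensions.
    return (w.view(1, -1, 1, 1) * err).sum(dim=1).mean()

# Example: levels = torch.tensor([50., 250., 500., 850.]) gives the
# near-surface 850 hPa level the largest weight.
```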
![]()
The key innovation of Stormer is a randomized forecasting objective that trains the model to forecast the weather dynamics over varying time intervals of 6, 12, and 24 hours. The 6- and 12-hour intervals encourage the model to learn and resolve the diurnal (day-night) cycle, one of the most important oscillations in the atmosphere driving short-term dynamics, while the 24-hour interval filters out the effects of the diurnal cycle and allows the model to learn longer, synoptic-scale dynamics. This objective enables a single model, once trained, to generate multiple forecasts for a specified lead time T and combine them to obtain better forecast accuracy, as sketched below.
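The sketch below shows one simple way such inference-time combination could work, assuming a trained `model(state, interval)` that predicts the change in the weather state over a given interval. The names and shapes are illustrative assumptions, and the paper's exact combination scheme may differ; here each ensemble member is a homogeneous rollout built from a single interval choice.

```python
import torch

INTERVALS = (6, 12, 24)  # hours, matching the randomized training intervals

@torch.no_grad()
def ensemble_forecast(model, x0: torch.Tensor, lead_time: int) -> torch.Tensor:
    """Combine rollouts built from different interval choices into one forecast."""
    forecasts = []
    for dt in INTERVALS:
        if lead_time % dt != 0:
            continue  # this interval cannot tile the target lead time exactly
        x = x0
        for _ in range(lead_time // dt):
            x = x + model(x, dt)  # the model forecasts the dynamics (residual)
        forecasts.append(x)
    # Averaging the member forecasts tends to reduce error at long lead times.
    return torch.stack(forecasts).mean(dim=0)

# Example: a 5-day (120-hour) forecast averages the 20x6h, 10x12h, and
# 5x24h rollouts produced by the same trained model.
```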
![]()
Stormer's architecture consists of two components: a weather-specific embedding and the Stormer backbone. The weather-specific embedding module embeds the input X₀ ∈ ℝ^{V×H×W} into a sequence of (H/p) × (W/p) tokens, while modeling the non-linear interactions between climate variables in the input. The tokens are fed to the Stormer backbone together with the time-interval embedding δt, and finally go through a linear and reshape module to produce the dynamics forecast. Each Stormer block employs adaptive layer normalization to condition on δt.
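The adaptive layer normalization mentioned in the caption above can be sketched as follows: a small linear head regresses per-block scale, shift, and gate parameters from the interval embedding and applies them around attention and MLP sub-layers. This is a minimal sketch of the general adaLN pattern, assuming a δt embedding of the same width as the tokens; module names and dimensions are illustrative, not Stormer's exact implementation.

```python
import torch
import torch.nn as nn

class AdaLNBlock(nn.Module):
    """Transformer block conditioned on a time-interval embedding via adaLN."""
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim, elementwise_affine=False)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        # Regress per-block scale/shift/gate parameters from the dt embedding.
        self.ada = nn.Linear(dim, 6 * dim)

    def forward(self, x: torch.Tensor, dt_emb: torch.Tensor) -> torch.Tensor:
        # x: (B, N, D) tokens; dt_emb: (B, D) embedding of the time interval.
        s1, b1, g1, s2, b2, g2 = self.ada(dt_emb).chunk(6, dim=-1)
        h = self.norm1(x) * (1 + s1.unsqueeze(1)) + b1.unsqueeze(1)
        x = x + g1.unsqueeze(1) * self.attn(h, h, h)[0]
        h = self.norm2(x) * (1 + s2.unsqueeze(1)) + b2.unsqueeze(1)
        x = x + g2.unsqueeze(1) * self.mlp(h)
        return x
```

Conditioning through normalization parameters, rather than by appending δt as an extra token, lets every block modulate its features according to the requested forecast interval.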
We evaluate the performance of Stormer on WeatherBench 2 using the RMSE (top) and ACC (bottom) metrics. For short-range, 1–5 day forecasts, Stormer's accuracy is on par with Pangu-Weather and GraphCast. At longer lead times, Stormer excels, consistently outperforming both baseline methods from day 6 onwards by a large margin, and the performance gap widens as the lead time increases. At 14-day forecasts, Stormer outperforms GraphCast by 10%–20% across all 9 key variables. Stormer is also the only model in this comparison that performs better than Climatology at long lead times, while the other methods approach or even fall below this simple baseline.
![]()
Comparison with the deep learning baselines on the Root Mean Squared Error (RMSE) metric. Lower RMSE is better.
![]()
Comparison with the deep learning baselines on the Anomaly Correlation Coefficient (ACC) metric. Higher ACC is better.