Daily Papers

by AK and the research community

Dec 10

PuYun: Medium-Range Global Weather Forecasting Using Large Kernel Attention Convolutional Networks

Accurate weather forecasting is essential for understanding and mitigating weather-related impacts. In this paper, we present PuYun, an autoregressive cascade model that leverages large kernel attention convolutional networks. The model's design inherently supports extended weather prediction horizons while broadening the effective receptive field. The integration of large kernel attention mechanisms within the convolutional layers enhances the model's capacity to capture fine-grained spatial details, thereby improving its predictive accuracy for meteorological phenomena. We introduce PuYun, comprising PuYun-Short for 0-5 day forecasts and PuYun-Medium for 5-10 day predictions. This approach enhances the accuracy of 10-day weather forecasting. Through evaluation, we demonstrate that PuYun-Short alone surpasses the performance of both GraphCast and FuXi-Short in generating accurate 10-day forecasts. Specifically, on the 10th day, PuYun-Short reduces the RMSE for Z500 to 720 m^2/s^2, compared to 732 m^2/s^2 for GraphCast and 740 m^2/s^2 for FuXi-Short. Additionally, the RMSE for T2M is reduced to 2.60 K, compared to 2.63 K for GraphCast and 2.65 K for FuXi-Short. Furthermore, when employing a cascaded approach by integrating PuYun-Short and PuYun-Medium, our method achieves superior results compared to the combined performance of FuXi-Short and FuXi-Medium. On the 10th day, the RMSE for Z500 is further reduced to 638 m^2/s^2, compared to 641 m^2/s^2 for FuXi. These findings underscore the effectiveness of our model ensemble in advancing medium-range weather prediction. Our training code and model will be open-sourced.

  • 10 authors
·
Sep 1, 2024
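
The cascade described in the PuYun abstract (one model for days 0-5, a second for days 5-10, each applied autoregressively) can be sketched as follows. This is only an illustrative sketch, not the authors' code: `cascade_rollout`, the toy models, and the 6-hour step (`steps_per_day=4`) are assumptions that do not appear in the abstract.

```python
from typing import Callable, List

State = List[float]                 # stand-in for a gridded atmospheric state
Model = Callable[[State], State]    # maps the state at time t to the state at t + one step


def cascade_rollout(
    initial: State,
    short_model: Model,
    medium_model: Model,
    steps_per_day: int = 4,   # assumed 6-hour step; not stated in the abstract
    switch_day: int = 5,      # short model covers days 0-5
    total_days: int = 10,     # medium model covers days 5-10
) -> List[State]:
    """Roll a forecast forward autoregressively, switching models at switch_day."""
    if not 0 <= switch_day <= total_days:
        raise ValueError("switch_day must lie between 0 and total_days")
    switch_step = switch_day * steps_per_day
    state = initial
    trajectory: List[State] = []
    for step in range(total_days * steps_per_day):
        model = short_model if step < switch_step else medium_model
        state = model(state)          # each output becomes the next input
        trajectory.append(state)
    return trajectory


# Toy stand-ins so the sketch runs: each "model" just shifts the state.
def toy_short(state: State) -> State:
    return [x + 1.0 for x in state]


def toy_medium(state: State) -> State:
    return [x + 10.0 for x in state]


if __name__ == "__main__":
    forecast = cascade_rollout([0.0], toy_short, toy_medium)
    print(len(forecast), forecast[19], forecast[-1])
```

Passing the same model for both arguments gives the single-model 10-day rollout that the abstract reports for PuYun-Short alone.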

LaDCast: A Latent Diffusion Model for Medium-Range Ensemble Weather Forecasting

Accurate probabilistic weather forecasting demands both high accuracy and efficient uncertainty quantification, challenges that overburden both ensemble numerical weather prediction (NWP) and recent machine-learning methods. We introduce LaDCast, the first global latent-diffusion framework for medium-range ensemble forecasting, which generates hourly ensemble forecasts entirely in a learned latent space. An autoencoder compresses high-dimensional ERA5 reanalysis fields into a compact representation, and a transformer-based diffusion model produces sequential latent updates with arbitrary hour initialization. The model incorporates Geometric Rotary Position Embedding (GeoRoPE) to account for the Earth's spherical geometry, a dual-stream attention mechanism for efficient conditioning, and sinusoidal temporal embeddings to capture seasonal patterns. LaDCast achieves deterministic and probabilistic skill close to that of the European Centre for Medium-Range Weather Forecasts (ECMWF) IFS-ENS, without any explicit perturbations. Notably, LaDCast demonstrates superior performance in tracking rare extreme events such as cyclones, capturing their trajectories more accurately than established models. By operating in latent space, LaDCast reduces storage and compute by orders of magnitude, demonstrating a practical path toward forecasting at kilometer-scale resolution in real time. We open-source our code and models and provide the training and evaluation pipelines at: https://github.com/tonyzyl/ladcast.

  • 2 authors
·
Jun 10

FuXi-ENS: A machine learning model for medium-range ensemble weather forecasting

Ensemble forecasting is crucial for improving weather predictions, especially for forecasts of extreme events. Constructing an ensemble prediction system (EPS) based on conventional NWP models is highly computationally expensive. ML models have emerged as valuable tools for deterministic weather forecasts, providing forecasts with significantly reduced computational requirements and even surpassing the forecast performance of traditional NWP models. However, challenges arise when applying ML models to ensemble forecasting. Recent ML models, such as GenCast and SEEDS model, rely on the ERA5 EDA or operational NWP ensemble members for forecast generation. Their spatial resolution is also considered too coarse for many applications. To overcome these limitations, we introduce FuXi-ENS, an advanced ML model designed to deliver 6-hourly global ensemble weather forecasts up to 15 days. This model runs at a significantly increased spatial resolution of 0.25°, incorporating 5 atmospheric variables at 13 pressure levels, along with 13 surface variables. By leveraging the inherent probabilistic nature of Variational AutoEncoder (VAE), FuXi-ENS optimizes a loss function that combines the CRPS and the KL divergence between the predicted and target distribution, facilitating the incorporation of flow-dependent perturbations in both initial conditions and forecast. This innovative approach makes FuXi-ENS an advancement over the traditional ones that use L1 loss combined with the KL loss in standard VAE models for ensemble weather forecasting. Results demonstrate that FuXi-ENS outperforms ensemble forecasts from the ECMWF, a world leading NWP model, in the CRPS of 98.1% of 360 variable and forecast lead time combinations. This achievement underscores the potential of the FuXi-ENS model to enhance ensemble weather forecasts, offering a promising direction for further development in this field.

  • 10 authors
·
May 9, 2024
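
The FuXi-ENS abstract describes a loss that combines the CRPS with a KL divergence between predicted and target distributions. A minimal sketch of those two ingredients for a single scalar variable is given below. It assumes an empirical ensemble CRPS estimator and univariate Gaussian distributions for the KL term; the function names and the `kl_weight` parameter are hypothetical, and the model's actual loss may be formulated differently.

```python
import math
from typing import Sequence


def ensemble_crps(members: Sequence[float], observation: float) -> float:
    """Empirical CRPS of an ensemble: E|X - y| - 0.5 * E|X - X'|."""
    m = len(members)
    if m == 0:
        raise ValueError("ensemble must contain at least one member")
    # Mean absolute error of the members against the observation.
    skill = sum(abs(x - observation) for x in members) / m
    # Mean absolute difference between all pairs of members (ensemble spread).
    spread = sum(abs(a - b) for a in members for b in members) / (m * m)
    return skill - 0.5 * spread


def gaussian_kl(mu_p: float, sigma_p: float, mu_q: float, sigma_q: float) -> float:
    """KL(N(mu_p, sigma_p^2) || N(mu_q, sigma_q^2)) for univariate Gaussians."""
    if sigma_p <= 0 or sigma_q <= 0:
        raise ValueError("standard deviations must be positive")
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
        - 0.5
    )


def combined_loss(
    members: Sequence[float],
    observation: float,
    mu_p: float,
    sigma_p: float,
    mu_q: float,
    sigma_q: float,
    kl_weight: float = 1.0,   # hypothetical weighting, not taken from the abstract
) -> float:
    """CRPS term plus a weighted KL term."""
    return ensemble_crps(members, observation) + kl_weight * gaussian_kl(
        mu_p, sigma_p, mu_q, sigma_q
    )


if __name__ == "__main__":
    print(ensemble_crps([1.0, 3.0], 2.0))
    print(gaussian_kl(1.0, 1.0, 0.0, 1.0))
```

Unlike an L1 term, the CRPS term rewards an ensemble whose spread matches its error, which is presumably why the abstract contrasts it with the L1-plus-KL loss of standard VAEs.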

oMEGACat. VII. Tracing Interstellar and Intracluster Medium of ω Centauri using Sodium Absorptions

We investigate the foreground interstellar medium along the line of sight and intracluster medium of omega Centauri (omega Cen) by measuring the equivalent width of Na I D absorptions from MUSE observations. The large line-of-sight velocity difference between omega Cen and the foreground enables us to separate Na I D absorption contributed from atomic gas in the interstellar and intracluster medium. We find that small-scale substructures in the foreground Na I D distribution correlate with differential reddening derived from photometric methods. Using an empirical Na I D equivalent width-reddening relation, we determine an average reddening of E(B-V) = 0.153 ± 0.003 mag within the half-light radius of omega Cen. However, the Na I D-inferred differential reddening is significantly larger than photometric estimates. This is likely due to scatter in the Na I D-reddening relation. We find no evidence for intracluster atomic gas from spectra of horizontal branch stars, as there is no significant Na I D absorption at omega Cen's systemic velocity. Given this non-detection, we place the strongest upper limit to date on the intracluster atomic gas column density in omega Cen of ≲ 2.17 × 10^18 cm^-2. We also estimate the ionized gas density from pulsar dispersion measure variations, which exceed the atomic gas limit by ~50 times. Nevertheless, the strong correlation between dispersion measure and foreground Na I D suggests that much or all of this ionized gas resides in the foreground. Given ongoing mass loss from bright giant stars, our findings imply that the intracluster gas accumulation timescale is short, and gas removal in the cluster is likely not tied to stripping as omega Cen passes through the Galactic disk.

  • 17 authors
·
Sep 30

Scaling transformer neural networks for skillful and reliable medium-range weather forecasting

Weather forecasting is a fundamental problem for anticipating and mitigating the impacts of climate change. Recently, data-driven approaches for weather forecasting based on deep learning have shown great promise, achieving accuracies that are competitive with operational systems. However, those methods often employ complex, customized architectures without sufficient ablation analysis, making it difficult to understand what truly contributes to their success. Here we introduce Stormer, a simple transformer model that achieves state-of-the-art performance on weather forecasting with minimal changes to the standard transformer backbone. We identify the key components of Stormer through careful empirical analyses, including weather-specific embedding, randomized dynamics forecast, and pressure-weighted loss. At the core of Stormer is a randomized forecasting objective that trains the model to forecast the weather dynamics over varying time intervals. During inference, this allows us to produce multiple forecasts for a target lead time and combine them to obtain better forecast accuracy. On WeatherBench 2, Stormer performs competitively at short to medium-range forecasts and outperforms current methods beyond 7 days, while requiring orders-of-magnitude less training data and compute. Additionally, we demonstrate Stormer's favorable scaling properties, showing consistent improvements in forecast accuracy with increases in model size and training tokens. Code and checkpoints are available at https://github.com/tung-nd/stormer.

  • 9 authors
·
Dec 6, 2023
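
The Stormer abstract notes that training over varying time intervals lets the model produce several forecasts for one target lead time and combine them. The sketch below illustrates that idea under stated assumptions: the interval set `(6, 12, 24)` hours, the use of a single interval per rollout, and combination by a plain average are all assumptions for illustration, and `toy_model` is a hypothetical stand-in for the trained network.

```python
from typing import Callable, List, Sequence

# (state, interval_hours) -> state after interval_hours
Model = Callable[[float, int], float]


def rollout(model: Model, state: float, interval_hours: int, lead_hours: int) -> float:
    """Reach lead_hours by repeatedly stepping with one fixed interval."""
    steps, remainder = divmod(lead_hours, interval_hours)
    if remainder:
        raise ValueError("lead time must be a multiple of the interval")
    for _ in range(steps):
        state = model(state, interval_hours)
    return state


def combined_forecast(
    model: Model,
    state: float,
    lead_hours: int,
    intervals: Sequence[int] = (6, 12, 24),   # assumed interval set
) -> float:
    """Average the forecasts from every interval that divides the lead time."""
    usable: List[int] = [dt for dt in intervals if lead_hours % dt == 0]
    if not usable:
        raise ValueError("no interval divides the requested lead time")
    forecasts = [rollout(model, state, dt, lead_hours) for dt in usable]
    return sum(forecasts) / len(forecasts)


def toy_model(state: float, interval_hours: int) -> float:
    """Hypothetical dynamics: the state grows by one unit per hour."""
    return state + float(interval_hours)


if __name__ == "__main__":
    print(combined_forecast(toy_model, 0.0, lead_hours=24))
```

With a real model, the rollouts for different intervals would generally disagree, and averaging them acts like a small ensemble.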

A Model RRNet for Spectral Information Exploitation and LAMOST Medium-resolution Spectrum Parameter Estimation

This work proposes a Residual Recurrent Neural Network (RRNet) for synthetically extracting spectral information, and estimating stellar atmospheric parameters together with 15 chemical element abundances for medium-resolution spectra from Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST). The RRNet consists of two fundamental modules: a residual module and a recurrent module. The residual module extracts spectral features based on the longitudinally driving power from parameters, while the recurrent module recovers spectral information and restrains the negative influences from noises based on Cross-band Belief Enhancement. RRNet is trained by the spectra from common stars between LAMOST DR7 and APOGEE-Payne catalog. The 17 stellar parameters and their uncertainties for 2.37 million medium-resolution spectra from LAMOST DR7 are predicted. For spectra with S/N >= 10, the precision of estimations Teff and log g are 88 K and 0.13 dex respectively, elements C, Mg, Al, Si, Ca, Fe, Ni are 0.05 dex to 0.08 dex, and N, O, S, K, Ti, Cr, Mn are 0.09 dex to 0.14 dex, while that of Cu is 0.19 dex. Compared with StarNet and SPCANet, RRNet shows higher accuracy and robustness. In comparison to Apache Point Observatory Galactic Evolution Experiment and Galactic Archaeology with HERMES surveys, RRNet manifests good consistency within a reasonable range of bias. Finally, this work releases a catalog for 2.37 million medium-resolution spectra from the LAMOST DR7, the source code, the trained model and the experimental data respectively for astronomical science exploration and data processing algorithm research reference.

  • 3 authors
·
May 30, 2022

A New Approach for Constraining Large-Scale Temperature Fluctuations in the Intergalactic Medium

The reionization of helium is thought to occur at 2.5 ≲ z ≲ 4, marking the last phase transition and final global heating event of the intergalactic medium (IGM). Since it is driven by rare quasars, helium reionization should give rise to strong temperature fluctuations in the IGM between neutral and recently-ionized regions of order σ(ln T) ~ ΔT/T = 20-50%. We introduce a novel method to search for reionization-induced temperature fluctuations in the IGM by using the effective optical depths of the Lyman-alpha forest towards a large number of background quasars. Higher IGM temperatures give rise to lower effective optical depths in the Lyman-alpha forest, implying that temperature fluctuations will broaden the observed optical depth distribution. We measured the distributions of effective Lyman-alpha forest optical depths across 71 X-Shooter spectra from the XQ-100 survey in four redshift bins from z=3.76 to z=4.19 and compared them to a large-volume cosmological hydrodynamical simulation. A good agreement is found between the observations and the simulation, which does not include temperature fluctuations; therefore, we do not detect a signature of helium reionization. We then post-process the simulations to include an increasing amount of temperature fluctuations until the model becomes inconsistent with the observations. We obtain tight constraints on σ(ln T) < 0.29 (< 0.40) at 2σ (3σ) at z=3.76 when averaging over scales of 100 comoving Mpc, and weaker constraints for higher redshifts and smaller scales. Our constraints are the tightest to date, and imply that either the IGM temperature contrast caused by helium reionization is less than ~30%, or that the process has not yet significantly started at z=3.76.

  • 3 authors
·
Jan 9
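
The intergalactic-medium abstract above relies on the effective optical depth of the Lyman-alpha forest, conventionally defined as tau_eff = -ln(<F>), where <F> is the mean transmitted flux over a chosen spectral segment. The sketch below computes tau_eff per sightline and the width of the resulting distribution, which is the quantity the abstract says temperature fluctuations would broaden. The function names and the toy flux values are hypothetical and are not taken from the paper.

```python
import math
import statistics
from typing import List, Sequence


def effective_optical_depth(flux: Sequence[float]) -> float:
    """tau_eff = -ln(mean transmitted flux) over one spectral segment."""
    if len(flux) == 0:
        raise ValueError("flux array is empty")
    mean_flux = sum(flux) / len(flux)
    if mean_flux <= 0.0:
        raise ValueError("mean flux must be positive to take the logarithm")
    return -math.log(mean_flux)


def tau_distribution(sightlines: Sequence[Sequence[float]]) -> List[float]:
    """Effective optical depth for each sightline segment."""
    return [effective_optical_depth(flux) for flux in sightlines]


def distribution_width(taus: Sequence[float]) -> float:
    """Population standard deviation of the tau_eff values."""
    return statistics.pstdev(taus)


if __name__ == "__main__":
    # Toy normalized fluxes for three sightlines (made-up numbers).
    toy = [[0.30, 0.40, 0.50], [0.35, 0.45, 0.40], [0.20, 0.30, 0.25]]
    taus = tau_distribution(toy)
    print(taus, distribution_width(taus))
```

Comparing such a width between observed spectra and simulated spectra with injected temperature fluctuations is the kind of test the abstract describes, although the paper's statistical procedure is more involved than a single standard deviation.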

Aircrew rostering workload patterns and associated fatigue and sleepiness scores in short/medium haul flights under RBAC 117 rules in Brazil

The relationships between workload and fatigue or sleepiness are investigated through the analysis of rosters and responses to questionnaires from Brazilian aircrews, taken from Fadigômetro database. The approach includes temporal markers - coinciding with Samn-Perelli (SP) and Karolinska Sleepiness Scale (KSS) responses - where SAFTE-FAST model outcomes are calculated. The model results follow the increase of fatigue and sleepiness perceptions during the dawn (0h00 to 05h59), but underestimate the self-rated scores during the evening (18h00 to 23h59). On the other hand, the KSS scores fit the relative risk of pilot errors, representing a reasonable proxy for risk assessment. Linear relationships obtained between workload metrics, computed within 168-hours prior to the responses, and self-rated SP and KSS scores provide a consistent method to estimate accumulated fatigue and sleepiness. Considering 7149 rosters of 2023, the duty time (DT), the number of flight sectors (N_CREW) and the sum of flight sectors with sit periods longer than one hour (N_CREW + N_SIT) are associated with 70.1%/60.6% of the highest predicted scores of SP/KSS. Applying the mitigations DT ≤ 44 h, N_CREW ≤ 15 and N_CREW + N_SIT ≤ 19 for every 168-hour interval yields a significant decrease in the higher values of SP/KSS with minimal impact on aircrew productivity.

  • 8 authors
·
Aug 5, 2024
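
The three mitigation limits quoted in the aircrew rostering abstract (DT ≤ 44 h, N_CREW ≤ 15, N_CREW + N_SIT ≤ 19 per 168-hour interval) can be checked mechanically. The sketch below assumes the workload totals for a 168-hour window have already been computed; the `WindowWorkload` type and the `violations` helper are hypothetical names, not part of the study.

```python
from dataclasses import dataclass
from typing import List

# Limits quoted in the abstract, applied to every 168-hour interval.
MAX_DUTY_HOURS = 44.0
MAX_N_CREW = 15
MAX_N_CREW_PLUS_N_SIT = 19


@dataclass(frozen=True)
class WindowWorkload:
    """Workload totals accumulated over one 168-hour window."""
    duty_hours: float   # DT: duty time in hours
    n_crew: int         # N_CREW: number of flight sectors
    n_sit: int          # N_SIT: sit periods longer than one hour


def violations(window: WindowWorkload) -> List[str]:
    """Return the names of the limits that this window exceeds."""
    exceeded: List[str] = []
    if window.duty_hours > MAX_DUTY_HOURS:
        exceeded.append("DT")
    if window.n_crew > MAX_N_CREW:
        exceeded.append("N_CREW")
    if window.n_crew + window.n_sit > MAX_N_CREW_PLUS_N_SIT:
        exceeded.append("N_CREW+N_SIT")
    return exceeded


if __name__ == "__main__":
    print(violations(WindowWorkload(duty_hours=44.0, n_crew=15, n_sit=4)))
    print(violations(WindowWorkload(duty_hours=45.0, n_crew=16, n_sit=4)))
```

A roster would be screened by sliding the 168-hour window across it and calling `violations` on each window's totals.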

The challenge of simulating the star cluster population of dwarf galaxies with resolved interstellar medium

We present results on the star cluster properties from a series of high resolution smoothed particle hydrodynamics (SPH) simulations of isolated dwarf galaxies as part of the GRIFFIN project. The simulations at sub-parsec spatial resolution and a minimum particle mass of 4 M_⊙ incorporate non-equilibrium heating, cooling and chemistry processes, and realise individual massive stars. All the simulations follow feedback channels of massive stars that include the interstellar-radiation field, that is variable in space and time, the radiation input by photo-ionisation and supernova explosions. Varying the star formation efficiency per free-fall time in the range ε_ff = 0.2-50% neither changes the star formation rates nor the outflow rates. While the environmental densities at star formation change significantly with ε_ff, the ambient densities of supernovae are independent of ε_ff indicating a decoupling of the two processes. At low ε_ff, more massive, and increasingly more bound star clusters are formed, which are typically not destroyed. With increasing ε_ff there is a trend for shallower cluster mass functions and the cluster formation efficiency Γ for young bound clusters decreases from 50% to ~1% showing evidence for cluster disruption. However, none of our simulations form low mass (< 10^3 M_⊙) clusters with structural properties in perfect agreement with observations. Traditional star formation models used in galaxy formation simulations based on local free-fall times might therefore not be able to capture low mass star cluster properties without significant fine-tuning.

  • 7 authors
·
Sep 16, 2021