NSF AI Institute for Artificial Intelligence and Fundamental Interactions
MIT Improbable AI Lab researchers developed an end-to-end reinforcement learning controller that enabled the Mini Cheetah robot to achieve rapid, agile, and robust locomotion, setting a new indoor speed record of 3.9 m/s and demonstrating high agility (Froude number 5.1) on diverse natural terrains.
Obtaining high-precision predictions of nuclear masses, or equivalently nuclear binding energies, $E_b$, remains an important goal in nuclear-physics research. Recently, many AI-based tools have shown promising results on this task, some achieving precision that surpasses the best physics models. However, the utility of these AI models remains in question given that predictions are only useful where measurements do not exist, which inherently requires extrapolation away from the training (and testing) samples. Since AI models are largely black boxes, the reliability of such an extrapolation is difficult to assess. We present an AI model that not only achieves cutting-edge precision for $E_b$, but does so in an interpretable manner. For example, we find that (and explain why) the most important dimensions of its internal representation form a double helix, where the analog of the hydrogen bonds in DNA here link the number of protons and neutrons found in the most stable nucleus of each isotopic chain. Furthermore, we show that the AI prediction of $E_b$ can be factorized and ordered hierarchically, with the most important terms corresponding to well-known symbolic models (such as the famous liquid drop). Remarkably, the improvement of the AI model over symbolic ones can almost entirely be attributed to an observation made by Jaffe in 1969 based on the structure of most known nuclear ground states. The end result is a fully interpretable data-driven model of nuclear masses based on physics deduced by AI.
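For reference, the liquid-drop baseline mentioned above is the classic Bethe-Weizsacker semi-empirical mass formula; one standard form is shown below as context only (the fitted coefficients $a_i$ and the exact terms recovered by the AI model are not taken from the paper).

```latex
% One standard form of the liquid-drop (Bethe-Weizsacker) binding energy,
% shown only as context for the "well-known symbolic models" referenced above.
E_b(Z, N) \simeq a_V A - a_S A^{2/3} - a_C \frac{Z(Z-1)}{A^{1/3}}
          - a_A \frac{(N-Z)^2}{A} + \delta(A, Z), \qquad A = Z + N,
```

with the $a_i$ fitted to data and $\delta(A,Z)$ the pairing term.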
Enhanced emission in the months to years preceding explosion has been detected for several core-collapse supernovae (SNe). Though the physical mechanisms driving the emission remain hotly debated, the light curves of detected events show long-lived ($\geq$50 days), plateau-like behavior, suggesting hydrogen recombination may significantly contribute to the total energy budget. The Vera C. Rubin Observatory's Legacy Survey of Space and Time (LSST) will provide a decade-long photometric baseline to search for this emission, both in binned pre-explosion observations after an SN is detected and in single-visit observations prior to the SN explosion. In anticipation of these searches, we simulate a range of eruptive precursor models to core-collapse SNe and forecast the discovery rates of these phenomena in LSST data. We find a detection rate of ~40-130 yr$^{-1}$ for SN IIP/IIL precursors and ~110 yr$^{-1}$ for SN IIn precursors in single-epoch photometry. Considering the first three years of observations with the effects of rolling and observing triplets included, this number grows to a total of 150-400 in binned photometry, with the highest number recovered when binning in 100-day bins for 2020tlf-like precursors and in 20-day bins for other recombination-driven models from the literature. We quantify the impact of using templates contaminated by residual light (from either long-lived or separate precursor emission) on these detection rates, and explore strategies for estimating baseline flux to mitigate these issues. Spectroscopic follow-up of the eruptions preceding core-collapse SNe and detected with LSST will offer important clues to the underlying drivers of terminal-stage mass loss in massive stars.
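As a sketch of the binned pre-explosion search described above, the snippet below stacks forced-photometry points in fixed-width bins with inverse-variance weights and flags bins above a 5-sigma threshold. The bin width, threshold, and array names are illustrative assumptions, not the survey-strategy choices studied in the paper.

```python
import numpy as np

def binned_detections(mjd, flux, flux_err, bin_days=100.0, nsigma=5.0):
    """Inverse-variance-weighted binning of pre-explosion forced photometry.

    Returns bin centers, stacked fluxes, their uncertainties, and a boolean
    mask of bins whose stacked significance exceeds `nsigma`.
    """
    edges = np.arange(mjd.min(), mjd.max() + bin_days, bin_days)
    centers, stacked, errors = [], [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = (mjd >= lo) & (mjd < hi)
        if not sel.any():
            continue
        w = 1.0 / flux_err[sel] ** 2                 # inverse-variance weights
        centers.append(0.5 * (lo + hi))
        stacked.append(np.sum(w * flux[sel]) / np.sum(w))
        errors.append(np.sqrt(1.0 / np.sum(w)))
    centers, stacked, errors = map(np.array, (centers, stacked, errors))
    return centers, stacked, errors, stacked / errors > nsigma
```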
It has recently been argued that noisy intermediate-scale quantum computers may be used to optimize interpolating operator constructions for lattice quantum field theory (LQFT) calculations on classical computers. Here, two concrete realizations of the method are developed and implemented. The first approach is to maximize the overlap, or fidelity, of the state created by an interpolating operator acting on the vacuum state to the target eigenstate. The second is to instead minimize the energy expectation value of the interpolated state. These approaches are implemented in a proof-of-concept calculation in (1+1)-dimensions for a single-flavor massive Schwinger model to obtain quantum-optimized interpolating operator constructions for a vector meson state in the theory. Although fidelity maximization is preferable in the absence of noise due to quantum gate errors, it is found that energy minimization is more robust to these effects in the proof-of-concept calculation. This work serves as a concrete demonstration of how quantum computers in the intermediate term might be used to accelerate classical LQFT calculations.
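A minimal classical sketch of the two cost functions compared above, using a random Hermitian matrix as a stand-in for the lattice Hamiltonian (not the Schwinger model) and a generic orthogonal rotation as the "interpolating operator". In the actual proposal the state preparation runs on quantum hardware and the operator is restricted to the quantum numbers of the target state; here the restriction is omitted purely for illustration.

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

rng = np.random.default_rng(0)
dim = 8
npar = dim * (dim - 1) // 2

# Toy stand-in for the lattice Hamiltonian (NOT the Schwinger model).
A = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
H = (A + A.conj().T) / 2
vac = np.eye(dim)[0]                       # "vacuum" reference state
target = np.linalg.eigh(H)[1][:, 1]        # stand-in for the target eigenstate

def interpolated_state(theta):
    """Parametrized 'interpolating operator' acting on the vacuum."""
    G = np.zeros((dim, dim))
    G[np.triu_indices(dim, 1)] = theta
    return expm(G - G.T) @ vac             # orthogonal rotation preserves the norm

def energy(theta):                         # cost 1: energy minimization
    psi = interpolated_state(theta)
    return float(np.real(psi.conj() @ H @ psi))

def infidelity(theta):                     # cost 2: fidelity maximization
    psi = interpolated_state(theta)
    return 1.0 - abs(target.conj() @ psi) ** 2

theta0 = 0.01 * rng.normal(size=npar)
print(minimize(energy, theta0).fun, minimize(infidelity, theta0).fun)
```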
Partial differential equations (PDEs) are instrumental for modeling dynamical systems in science and engineering. The advent of neural networks has initiated a significant shift in tackling these complexities, though challenges in accuracy persist, especially for initial value problems. In this paper, we introduce the \textit{Time-Evolving Natural Gradient (TENG)}, which generalizes time-dependent variational principles and optimization-based time integration, leveraging natural-gradient optimization to obtain high accuracy in neural-network-based PDE solutions. Our comprehensive development includes algorithms like TENG-Euler and its high-order variants, such as TENG-Heun, tailored for enhanced precision and efficiency. TENG's effectiveness is further validated through its performance, surpassing current leading methods and achieving \textit{machine precision} in step-by-step optimizations across a spectrum of PDEs, including the heat equation, Allen-Cahn equation, and Burgers' equation.
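To make the optimization-based time stepping concrete, here is a minimal sketch for the heat equation with a linear-in-parameters Fourier ansatz, for which the natural-gradient step reduces to a least-squares projection of the PDE right-hand side onto the model's tangent space. A neural-network ansatz would obtain the Jacobian by automatic differentiation, and TENG-Heun would replace the explicit Euler update with a higher-order one; all numerical choices below are illustrative.

```python
import numpy as np

# Collocation grid and a small Fourier ansatz u_theta(x) = sum_k theta_k * phi_k(x)
# on [0, 2*pi) with periodic boundary conditions.
x = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
ks = np.arange(1, 6)
phi = np.concatenate([np.cos(np.outer(x, ks)), np.sin(np.outer(x, ks))], axis=1)
phi_xx = np.concatenate([-ks**2 * np.cos(np.outer(x, ks)),
                         -ks**2 * np.sin(np.outer(x, ks))], axis=1)

theta = np.zeros(phi.shape[1])
theta[0] = 1.0                       # initial condition u(x, 0) = cos(x)

dt, n_steps = 1e-3, 1000
for _ in range(n_steps):
    rhs = phi_xx @ theta             # heat equation: du/dt = u_xx at collocation points
    # Natural-gradient / least-squares step: project rhs onto the tangent space
    # spanned by d u_theta / d theta (here simply the basis functions phi).
    dtheta, *_ = np.linalg.lstsq(phi, rhs, rcond=None)
    theta = theta + dt * dtheta      # TENG-Euler-style explicit update

# Exact solution for this initial condition: u(x, t) = exp(-t) * cos(x)
print(np.max(np.abs(phi @ theta - np.exp(-dt * n_steps) * np.cos(x))))
```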
In this study, we investigate the application of the New Physics Learning Machine (NPLM) algorithm as an alternative to the standard CWoLa method with Boosted Decision Trees (BDTs), particularly for scenarios with rare signal events. NPLM offers an end-to-end approach to anomaly detection and hypothesis testing by utilizing an in-sample evaluation of a binary classifier to estimate a log-density ratio, which can improve detection performance without prior assumptions on the signal model. We examine two approaches: (1) an end-to-end NPLM application in cases with reliable background modelling and (2) an NPLM-based classifier used for signal selection when accurate background modelling is unavailable, with subsequent performance enhancement through a hyper-test on multiple values of the selection threshold. Our findings show that NPLM-based methods outperform BDT-based approaches in detection performance, particularly in low signal injection scenarios, while significantly reducing epistemic variance due to hyperparameter choices. This work highlights the potential of NPLM for robust resonant anomaly detection in particle physics, setting a foundation for future methods that enhance sensitivity and consistency under signal variability.
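The in-sample log-density-ratio idea can be illustrated with the textbook density-ratio trick: a classifier trained to separate a reference sample from data estimates the log ratio of their densities through its logit. The sketch below uses a scikit-learn MLP on an invented one-dimensional toy dataset; the summary statistic at the end is a simplification, not the NPLM extended-likelihood test statistic, and all sample sizes are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
reference = rng.normal(0.0, 1.0, size=(20_000, 1))            # background-only reference
data = np.concatenate([rng.normal(0.0, 1.0, size=(4_950, 1)),
                       rng.normal(3.0, 0.3, size=(50, 1))])   # data with a rare injected signal

X = np.concatenate([reference, data])
y = np.concatenate([np.zeros(len(reference)), np.ones(len(data))])

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=300).fit(X, y)

# Density-ratio trick: the classifier logit estimates log p_data(x)/p_ref(x)
# up to the constant log(N_data / N_ref).
p = np.clip(clf.predict_proba(data)[:, 1], 1e-6, 1 - 1e-6)
log_ratio = np.log(p / (1.0 - p)) - np.log(len(data) / len(reference))

# Simplified in-sample summary; the actual NPLM statistic comes from an
# extended maximum-likelihood fit, not this plain sum.
print(2.0 * log_ratio.sum())
```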
Time series data and their time-frequency representation from gravitational-wave interferometers present multiple opportunities for the use of artificial intelligence methods associated with signal and image processing. Closely connected with this is the real-time aspect associated with gravitational-wave interferometers and the astrophysical observations they perform; the discovery potential of these instruments can be significantly enhanced when data processing can be achieved on O(1 s) timescales. In this work, we introduce a novel signal and noise identification tool based on the YOLO (You Only Look Once) object detection framework. For its application to gravitational waves, we refer to it as GW-YOLO. This tool can provide scene identification capabilities and essential information regarding whether an observed transient is any combination of noise and signal. Additionally, it supplies detailed time-frequency coordinates of the detected objects in the form of pixel masks, an essential property that can be used to understand and characterize astrophysical sources, as well as instrumental noise. The simultaneous identification of noise and signal, combined with precise pixel-level localization, represents a significant advancement in gravitational-wave data analysis. Our approach yields a 50% detection efficiency for binary black hole signals at a signal-to-noise ratio (SNR) of 15 when such signals overlap with transient noise artifacts. When noise artifacts overlap with binary neutron star signals, our algorithm attains 50% detection efficiency at an SNR of 30. This presents the first quantitative assessment of the ability to detect astrophysical events overlapping with realistic instrument noise present in gravitational-wave interferometers.
In order to be in control of the $\alpha'$ derivative expansion, geometric string compactifications are understood in the context of a large volume approximation. In this letter, we consider the reduction of these higher derivative terms, and propose an improved estimate on the large volume approximation using numerical Calabi-Yau metrics obtained via machine learning methods. Further to this, we consider the $\alpha'^3$ corrections to numerical Calabi-Yau metrics in the context of IIB string theory. This correction represents one of several important contributions for realistic string compactifications -- alongside, for example, the backreaction of fluxes and local sources -- all of which have important consequences for string phenomenology. As a simple application of the corrected metric, we compute the change to the spectrum of the scalar Laplacian.
We consider the problem of generating samples via Flow Matching (FM) with an additional requirement that the generated samples must satisfy given constraints. We consider two scenarios: (a) when a differentiable distance function to the constraint set is given, and (b) when the constraint set is only available via queries to a membership oracle. For case (a), we propose a simple adaptation of the FM objective with an additional term that penalizes the distance between the constraint set and the generated samples. For case (b), we propose to employ randomization and learn a mean flow that is numerically shown to have a high likelihood of satisfying the constraints. This approach deviates significantly from existing works that require simple convex constraints, knowledge of a barrier function, or a reflection mechanism to constrain the probability flow. Furthermore, in the proposed setting we show that a two-stage approach, where both stages approximate the same original flow but with only the second stage probing the constraints via randomization, is more computationally efficient. Through several synthetic cases of constrained generation, we numerically show that the proposed approaches achieve significant gains in terms of constraint satisfaction while matching the target distributions. As a showcase for a practical oracle-based constraint, we show how our approach can be used for training an adversarial example generator, using queries to a hard-label black-box classifier. We conclude with several future research directions. Our code is available at this https URL.
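For scenario (a), a minimal sketch of a penalized training objective is given below: a standard conditional flow-matching loss on a linear path plus a penalty on the distance of a one-step extrapolation of the sample to an assumed constraint set (the unit disk here). The toy target, the penalty placement, and the weight `lam` are illustrative assumptions, not the paper's exact objective.

```python
import torch
import torch.nn as nn

# Toy 2-D setup; all shapes, the target distribution, and `lam` are assumptions.
torch.manual_seed(0)
v = nn.Sequential(nn.Linear(3, 64), nn.SiLU(),
                  nn.Linear(64, 64), nn.SiLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(v.parameters(), lr=1e-3)

def dist_to_constraint(x):
    """Differentiable distance to an assumed constraint set: the unit disk."""
    return torch.relu(torch.linalg.norm(x, dim=-1) - 1.0)

lam = 1.0
for step in range(2_000):
    x1 = torch.randn(256, 2) * 0.3 + torch.tensor([0.8, 0.0])   # toy target samples
    x0 = torch.randn(256, 2)                                     # base (noise) samples
    t = torch.rand(256, 1)
    xt = (1 - t) * x0 + t * x1                                   # linear FM path
    pred = v(torch.cat([xt, t], dim=-1))
    x1_hat = xt + (1 - t) * pred        # one-step extrapolation of the generated sample
    loss = ((pred - (x1 - x0)) ** 2).mean() + lam * dist_to_constraint(x1_hat).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```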
Though type-Ia supernovae (SNe Ia) are found in all types of galaxies, recent local Hubble constant measurements have disfavored using SNe Ia in early-type or quiescent galaxies, aiming instead for better consistency with SNe Ia in star-forming, late-type host galaxies calibrated by Cepheid distances. Here we investigate the feasibility of a parallel distance ladder using SNe Ia exclusively in quiescent, massive ($\log M_*/M_{\odot} \geq 10$) host galaxies, calibrated by tip of the red giant branch (TRGB) distances. We present TRGB measurements to four galaxies: three measured from the Hubble Space Telescope with the ACS F814W filter, and one measured from the JWST NIRCam F090W filter. Combined with literature measurements, we define a TRGB calibrator sample of five high-mass, early-type galaxies that hosted well-measured SNe Ia: NGC 1316 (SN 2006dd), NGC 1380 (SN 1992A), NGC 1404 (SN 2007on, SN 2011iv), NGC 4457 (SN 2020nvb), and NGC 4636 (SN 2020ue). We jointly standardize these calibrators with a fiducial sample of 124 Hubble-flow SNe Ia from the Zwicky Transient Facility that are matched in host-galaxy and light-curve properties. Our results with this homogenized subsample show a Hubble residual scatter of under 0.11 mag, lower than usually observed in cosmological samples of the full SN Ia distribution. We obtain a measurement of the Hubble constant, $H_0 = 75.3 \pm 2.9$ km s$^{-1}$ Mpc$^{-1}$, including statistical and estimated systematic uncertainties, and discuss the potential to further improve the precision of this approach. As calibrator and supernova samples grow, we advocate that future cosmological applications of SNe Ia use subsamples matched in host-galaxy and supernova properties across redshift.
Recent applications of machine-learned normalizing flows to sampling in lattice field theory suggest that such methods may be able to mitigate critical slowing down and topological freezing. However, these demonstrations have been at the scale of toy models, and it remains to be determined whether they can be applied to state-of-the-art lattice quantum chromodynamics calculations. Assessing the viability of sampling algorithms for lattice field theory at scale has traditionally been accomplished using simple cost scaling laws, but as we discuss in this work, their utility is limited for flow-based approaches. We conclude that flow-based approaches to sampling are better thought of as a broad family of algorithms with different scaling properties, and that scalability must be assessed experimentally.
We propose the linear barycentric coding model (LBCM) which utilizes the linear optimal transport (LOT) metric for analysis and synthesis of probability measures. We provide a closed-form solution to the variational problem characterizing the probability measures in the LBCM and establish equivalence of the LBCM to the set of 2-Wasserstein barycenters in the special case of compatible measures. Computational methods for synthesizing and analyzing measures in the LBCM are developed with finite sample guarantees. One of our main theoretical contributions is to identify an LBCM, expressed in terms of a simple family, which is sufficient to express all probability measures on the closed unit interval. We show that a natural analogous construction of an LBCM in 2 dimensions fails, and we leave it as an open problem to identify the proper extension in more than 1 dimension. We conclude by demonstrating the utility of LBCM for covariance estimation and data imputation.
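Schematically, the construction underlying the LBCM can be written in terms of optimal transport maps from a fixed reference measure; the notation below is chosen here and suppresses the regularity assumptions spelled out in the paper.

```latex
% Schematic of the LOT barycentric coding construction (notation chosen here):
% sigma is a fixed reference measure and T_i is the optimal transport map from sigma to mu_i.
\mu_\lambda \;=\; \Big( \sum_{i=1}^{m} \lambda_i \, T_i \Big)_{\#} \sigma ,
\qquad \lambda \in \Delta^{m-1} = \Big\{ \lambda \in \mathbb{R}^m_{\ge 0} : \textstyle\sum_{i} \lambda_i = 1 \Big\},
```

so that analysis amounts to finding the weights $\lambda$ that best reconstruct a given measure, and synthesis amounts to pushing the reference forward along a convex combination of the maps.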
Cosmological analyses of redshift space clustering data are primarily based on using luminous "red" galaxies (LRGs) and "blue" emission line galaxies (ELGs) to trace the underlying dark matter. Using the large high-fidelity, high-resolution MillenniumTNG (MTNG) and Astrid simulations, we study these galaxies with the effective field theory (EFT)-based field-level forward model. We confirm that both red and blue galaxies can be accurately modeled with EFT at the field level and that their parameters match those of the phenomenological halo-based models. Specifically, we consider the state-of-the-art Halo Occupation Distribution (HOD) and High Mass Quenched (HMQ) models for the red and blue galaxies, respectively. Our results explicitly confirm the validity of the halo-based models on large scales beyond the two-point statistics. In addition, we validate the field-level HOD/HMQ-based priors for EFT full-shape analysis. We find that the local bias parameters of the ELGs are in tension with the predictions of the LRG-like HOD models and present a simple analytic argument explaining this phenomenology. We also confirm that ELGs exhibit weaker non-linear redshift-space distortions ("fingers-of-God"), suggesting that a significant fraction of their data should be perturbative. We find that the response of EFT parameters to galaxy selection is sensitive to assumptions about baryonic feedback, suggesting that a detailed understanding of feedback processes is necessary for robust predictions of EFT parameters. Finally, using neural density estimation based on paired HOD-EFT parameter samples, we obtain optimal HOD models that reproduce the clustering of Astrid and MTNG galaxies.
The Lipschitz constant of the map between the input and output space represented by a neural network is a natural metric for assessing the robustness of the model. We present a new method to constrain the Lipschitz constant of dense deep learning models that can also be generalized to other architectures. The method relies on a simple weight normalization scheme during training that ensures the Lipschitz constant of every layer is below an upper limit specified by the analyst. A simple monotonic residual connection can then be used to make the model monotonic in any subset of its inputs, which is useful in scenarios where domain knowledge dictates such dependence. Examples can be found in algorithmic fairness requirements or, as presented here, in the classification of the decays of subatomic particles produced at the CERN Large Hadron Collider. Our normalization is minimally constraining and allows the underlying architecture to maintain higher expressiveness compared to other techniques which aim to either control the Lipschitz constant of the model or ensure its monotonicity. We show how the algorithm was used to train a powerful, robust, and interpretable discriminator for heavy-flavor-quark decays, which has been adopted for use as the primary data-selection algorithm in the LHCb real-time data-processing system in the current LHC data-taking period known as Run 3. In addition, our algorithm has also achieved state-of-the-art performance on benchmarks in medicine, finance, and other applications.
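A minimal sketch of the two ingredients described above, with assumptions flagged: per-layer weight normalization that clips an infinity-norm bound (the paper may use a different norm and normalization schedule), plus a residual term whose slope dominates the bounded gradient of the network in the selected inputs, making the output non-decreasing in those features.

```python
import torch
import torch.nn as nn

class LipschitzLinear(nn.Linear):
    """Linear layer whose weights are rescaled at every forward pass so that an
    assumed infinity-norm bound on the layer never exceeds `lip`."""
    def __init__(self, n_in, n_out, lip=1.0):
        super().__init__(n_in, n_out)
        self.lip = lip

    def forward(self, x):
        norm = self.weight.abs().sum(dim=1).max()      # max absolute row sum
        w = self.weight * (self.lip / torch.clamp(norm, min=self.lip))
        return nn.functional.linear(x, w, self.bias)

class MonotonicLipschitzNet(nn.Module):
    """Lipschitz-bounded MLP plus a residual term enforcing monotonicity in the
    inputs listed in `monotonic_idx` (illustrative sketch, not the exact scheme)."""
    def __init__(self, n_in, monotonic_idx, lam=1.0, width=64):
        super().__init__()
        per_layer = lam ** (1.0 / 3.0)      # split the overall bound across 3 layers
        self.net = nn.Sequential(
            LipschitzLinear(n_in, width, per_layer), nn.ReLU(),
            LipschitzLinear(width, width, per_layer), nn.ReLU(),
            LipschitzLinear(width, 1, per_layer),
        )
        self.idx, self.lam = monotonic_idx, lam

    def forward(self, x):
        # The residual slope lam dominates the network's bounded gradient (<= lam),
        # so the output is non-decreasing in the selected features.
        return self.net(x) + self.lam * x[:, self.idx].sum(dim=1, keepdim=True)

# Example usage (hypothetical feature layout): MonotonicLipschitzNet(8, [0, 2], lam=2.0)
```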
Anomaly detection with convolutional autoencoders is a popular method to search for new physics in a model-agnostic manner. These techniques are powerful, but they are still a "black box," since we do not know what high-level physical observables determine how anomalous an event is. To address this, we adapt a recently proposed technique by Faucett et al., which maps out the physical observables learned by a neural network classifier, to the case of anomaly detection. We propose two different strategies that use a small number of high-level observables to mimic the decisions made by the autoencoder on background events, one designed to directly learn the output of the autoencoder, and the other designed to learn the difference between the autoencoder's outputs on a pair of events. Despite the underlying differences in their approaches, we find that both strategies have ordering performance similar to that of the autoencoder and independently use the same six high-level observables. From there, we compare the performance of these networks as anomaly detectors. We find that both strategies perform similarly to the autoencoder across a variety of signals, giving a nontrivial demonstration that learning to order background events transfers to ordering a variety of signal events.
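As a schematic of the second (pairwise) strategy, the sketch below fits a small network on high-level features so that differences of its outputs on pairs of background events match differences of the autoencoder score. The tensor names `hlf` and `ae_score`, the architecture, and the squared-error pairing loss are assumptions for illustration, not the paper's exact setup.

```python
import torch
import torch.nn as nn

def train_pairwise_mimic(hlf, ae_score, epochs=50, lr=1e-3):
    """Fit a small network on high-level features (`hlf`, shape [N, n_features]) so
    that differences of its outputs on pairs of background events match differences
    of the autoencoder score (`ae_score`, shape [N]). Schematic pairwise strategy."""
    g = nn.Sequential(nn.Linear(hlf.shape[1], 32), nn.ReLU(), nn.Linear(32, 1))
    opt = torch.optim.Adam(g.parameters(), lr=lr)
    n = len(hlf)
    for _ in range(epochs):
        i, j = torch.randint(n, (4096,)), torch.randint(n, (4096,))
        pred_diff = g(hlf[i]).squeeze(1) - g(hlf[j]).squeeze(1)
        true_diff = ae_score[i] - ae_score[j]
        loss = ((pred_diff - true_diff) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return g
```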
The upcoming Vera C. Rubin Observatory's Legacy Survey of Space and Time (LSST) is expected to discover nearly a million Type Ia supernovae (SNe Ia), offering an unprecedented opportunity to constrain dark energy. The vast majority of these events will lack spectroscopic classification and redshifts, necessitating a fully photometric approach to maximize cosmology constraining power. We present detailed simulations based on the Extended LSST Astronomical Time Series Classification Challenge (ELAsTiCC), and a cosmological analysis using photometrically classified SNe Ia with host galaxy photometric redshifts. This dataset features realistic multi-band light curves, non-SN Ia contamination, host mis-associations, and transient-host correlations across the high-redshift Deep Drilling Fields (DDF) (~50 deg$^2$). We also include a spectroscopically confirmed low-redshift sample based on the Wide Fast Deep (WFD) fields. We employ a joint SN+host photometric redshift fit, a neural network based photometric classifier (SCONE), and BEAMS with Bias Corrections (BBC) methodology to construct a bias-corrected Hubble diagram. We produce statistical + systematic covariance matrices, and perform cosmology fitting with a prior using Cosmic Microwave Background constraints. We fit and present results for the wCDM dark energy model, and the more general Chevallier-Polarski-Linder (CPL) $w_0w_a$ model. With a simulated sample of ~6000 events, we achieve a Figure of Merit (FoM) value of about 150, which is significantly larger than the DESVYR FoM of 54. Averaging analysis results over 25 independent samples, we find small but significant biases indicating a need for further analysis testing and development.
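For context, the per-supernova distances entering a BBC-style Hubble diagram are typically built from the SALT light-curve fit via the Tripp relation, shown schematically below; the exact parameterization and bias-correction terms used in this analysis are not reproduced here.

```latex
% Schematic Tripp standardization behind a BBC-style Hubble diagram
% (the exact terms and bias corrections of this analysis are not reproduced here):
\mu_{\rm obs} = m_B + \alpha\, x_1 - \beta\, c - M_0 - \Delta\mu_{\rm bias},
```

where $m_B$, $x_1$, and $c$ are the fitted light-curve amplitude, stretch, and color, $\alpha$, $\beta$, and $M_0$ are nuisance parameters, and $\Delta\mu_{\rm bias}$ is the simulation-based bias correction.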
Simulation-based inference (SBI) has emerged as a powerful tool for extracting cosmological information from galaxy surveys deep into the non-linear regime. Despite its great promise, its application is limited by the computational cost of running simulations that can describe the increasingly large cosmological datasets. Recent work proposed a hybrid SBI framework (HySBI), which combines SBI on small scales with perturbation theory (PT) on large scales, allowing information to be extracted from high-resolution observations without large-volume simulations. In this work, we lay out the HySBI framework for galaxy clustering, a key step towards its application to next-generation datasets. We study the choice of priors on the parameters for modeling galaxies in PT analysis and in simulation-based analyses, as well as investigate their cosmology dependence. By jointly modeling large- and small-scale statistics and their associated nuisance parameters, we show that HySBI can obtain 20% and 60% tighter constraints on $\Omega_m$ and $\sigma_8$, respectively, compared to traditional PT analyses, thus demonstrating the efficacy of this approach to maximally extract information from upcoming spectroscopic datasets.
In this work, we address the question of how to enhance signal-agnostic searches by leveraging multiple testing strategies. Specifically, we consider hypothesis tests relying on machine learning, where model selection can introduce a bias towards specific families of new physics signals. We show that it is beneficial to combine different tests, characterised by distinct choices of hyperparameters, and that performance comparable to the best available test is generally achieved while providing a more uniform response to various types of anomalies. Focusing on the New Physics Learning Machine, a methodology to perform a signal-agnostic likelihood-ratio test, we explore a number of approaches to multiple testing, such as combining p-values and aggregating test statistics.
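Two textbook ways of combining individual tests are sketched below as a baseline for the aggregation strategies explored in the work; treating the tests as independent is an assumption, and in practice correlated NPLM tests would be calibrated on a reference ensemble rather than with these closed-form formulas.

```python
import numpy as np
from scipy import stats

def combine_tests(pvalues, method="fisher"):
    """Two standard p-value combinations for multiple (assumed independent) tests."""
    p = np.asarray(pvalues, dtype=float)
    if method == "fisher":           # T = -2 sum log p_i ~ chi^2 with 2k degrees of freedom
        T = -2.0 * np.log(p).sum()
        return stats.chi2.sf(T, df=2 * len(p))
    if method == "min-p":            # Bonferroni-corrected smallest p-value
        return min(1.0, len(p) * p.min())
    raise ValueError(method)

print(combine_tests([0.30, 0.02, 0.11], "fisher"),
      combine_tests([0.30, 0.02, 0.11], "min-p"))
```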
Researchers from the NSF AI Institute for Artificial Intelligence and Fundamental Interactions, MIT, Harvard, and SLAC developed Product Manifold Machine Learning, representing particle physics data on combined Euclidean, hyperbolic, and spherical geometries. This approach improves classification performance, especially for hierarchical data and in low-parameter models, by better capturing the data's intrinsic structure.