Rotary Masked Autoencoders (RoMAE) extends the MAE framework by integrating continuous Rotary Positional Embeddings (RoPE), creating a versatile Transformer model capable of learning representations from irregular, multi-dimensional time-series data, images, and audio. The model achieved an F-score of 0.6770 on the DESC ELAsTiCC Challenge and an RMSE of 0.0183 on the Spirals 2D interpolation task, outperforming specialized architectures.
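As a rough illustration of why rotary embeddings suit irregularly sampled data, the sketch below applies a standard RoPE rotation using real-valued positions (for example timestamps) instead of integer indices. The function name `rotary_embed`, the `base` value, and the toy vectors are assumptions made for this example, not details from the RoMAE paper.

```python
import numpy as np

def rotary_embed(x, pos, base=10000.0):
    """Rotate feature pairs of x by angles proportional to a continuous position.

    x:   array of shape (n_tokens, d), d even
    pos: array of shape (n_tokens,), real-valued positions (e.g. timestamps)
    """
    n, d = x.shape
    # One rotation frequency per feature pair, geometrically spaced.
    freqs = base ** (-np.arange(0, d, 2) / d)      # shape (d/2,)
    angles = pos[:, None] * freqs[None, :]         # shape (n, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

rng = np.random.default_rng(0)
q = rng.normal(size=(1, 8))
k = rng.normal(size=(1, 8))

# Attention score for two irregular timestamps ...
s1 = rotary_embed(q, np.array([0.37])) @ rotary_embed(k, np.array([2.91])).T
# ... and for the same pair shifted by a common offset.
s2 = rotary_embed(q, np.array([10.37])) @ rotary_embed(k, np.array([12.91])).T
```

Because each pair of features is rotated by an angle linear in the position, the query-key score depends only on the position difference, so `s1` and `s2` agree up to floating-point error even though the positions are not integers.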
Researchers from SISSA and ICTP quantitatively demonstrate how label noise impacts the information content of neural network representations, showing that while overparameterized hidden layers remain largely robust, the final classification layer loses significant information, a phenomenon measurable by the Information Imbalance metric which also reveals a double descent in representation quality.
Axions, among the most promising dark matter candidates, can be detected through narrow radio lines emitted by magnetic white dwarf stars. Owing to the strong magnetic field, an axion may resonantly convert into a radio photon (Primakoff effect) when it passes through a narrow region in the corona of the magnetic white dwarf where the plasma frequency equals the axion mass. We show that for the magnetic white dwarf WD 2010+310, the future experiment SKA phase 1 with 100 hours of observation can effectively probe the parameter space of the axion-photon coupling $g_{a\gamma}$ up to $\sim 10^{-12}\,\mathrm{GeV}^{-1}$ for the axion mass range of $0.2$ to $3.7\,\mu\mathrm{eV}$. Note that in the low mass region ($m_a \lesssim 1.5\,\mu\mathrm{eV}$), WD 2010+310 could give greater sensitivity than the neutron star RX J0806.4-4123.
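To connect the quoted mass range to observable quantities, the following sketch converts an axion mass into the frequency of the expected radio line and into the electron density at which the plasma frequency matches that mass. It assumes non-relativistic axions (photon energy equal to the axion rest energy) and a cold, unmagnetized plasma frequency formula; the function names are illustrative.

```python
import math

H_EV_S = 4.135667696e-15   # Planck constant in eV s
EPS0 = 8.8541878128e-12    # vacuum permittivity in F/m
M_E = 9.1093837015e-31     # electron mass in kg
Q_E = 1.602176634e-19      # elementary charge in C

def line_frequency_hz(m_a_ev):
    """Radio line frequency for an axion of rest energy m_a_ev (in eV)."""
    return m_a_ev / H_EV_S

def resonant_electron_density_cm3(m_a_ev):
    """Electron density (cm^-3) where the plasma frequency equals the axion mass."""
    omega = 2.0 * math.pi * line_frequency_hz(m_a_ev)
    n_m3 = EPS0 * M_E * omega**2 / Q_E**2   # from omega_p^2 = n e^2 / (eps0 m_e)
    return n_m3 * 1e-6

for m_a in (0.2e-6, 1.5e-6, 3.7e-6):
    print(f"{m_a*1e6:.1f} ueV: {line_frequency_hz(m_a)/1e6:.1f} MHz, "
          f"n_e = {resonant_electron_density_cm3(m_a):.2e} cm^-3")
```

Under these assumptions the quoted mass range corresponds to lines from a few tens of MHz to just under 1 GHz.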
Neural-network interatomic potentials (NNIPs) have transformed atomistic simulations by enabling molecular dynamics simulations with near ab initio accuracy at reduced computational costs and improved scalability. Despite these advances, crafting NNIPs remains complex, demanding specialized expertise in both machine learning and electronic-structure calculations. Here, we introduce an automated, open-source, and user-friendly workflow that streamlines the creation of accurate NNIPs. Our approach integrates density-functional theory, data augmentation strategies and classical molecular dynamics to systematically explore the potential energy landscape. Our active-learning strategy leverages on-the-fly calibration of committee disagreement against true errors to ensure reliable uncertainty estimates. We use electronic-structure descriptors and dimensionality reduction to analyze the efficiency of our active learning strategy, which is shown to minimize both false positives and false negatives when deciding what to relabel with ab initio calculations. The method is validated on the fully automated training of an NNIP for a diverse set of carbon allotropes, reaching state-of-the-art accuracy and data efficiency. This platform democratizes NNIP development, empowering users to achieve high-precision simulations with minimal human intervention.
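The idea of calibrating committee disagreement against true errors can be sketched as follows. This is a minimal illustration, assuming a single linear scale factor fitted by least squares through the origin; the abstract does not specify the calibration actually used, and all function names here are hypothetical.

```python
import numpy as np

def committee_stats(preds):
    """Mean and standard deviation over committee members.

    preds: array of shape (n_models, n_configs), e.g. predicted energies.
    """
    return preds.mean(axis=0), preds.std(axis=0, ddof=1)

def calibration_factor(sigma, abs_err):
    """Scale alpha such that abs_err ~ alpha * sigma (least squares through the origin)."""
    return float(np.dot(sigma, abs_err) / np.dot(sigma, sigma))

def select_for_labeling(sigma, alpha, threshold):
    """Indices of configurations whose calibrated uncertainty exceeds the threshold."""
    return np.flatnonzero(alpha * sigma > threshold)

# Toy labeled set: the true errors happen to be twice the raw disagreement.
sigma_labeled = np.array([0.1, 0.2, 0.4])
abs_err_labeled = 2.0 * sigma_labeled
alpha = calibration_factor(sigma_labeled, abs_err_labeled)

# Apply the calibration to new, unlabeled configurations.
to_label = select_for_labeling(np.array([0.1, 0.2, 0.4]), alpha, threshold=0.5)
```

Only configurations whose calibrated uncertainty is above the threshold would be sent for ab initio relabeling in this sketch.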
We study the projected clustering of photometric luminous red galaxies from the DESI Legacy Survey, combining their angular power spectrum, bispectrum, and cross-correlation with maps of the CMB lensing convergence from the Planck satellite. We employ a perturbative bias expansion in Eulerian space to describe the clustering of galaxies, modelling the power spectrum and bispectrum at one-loop and tree level, respectively. This allows us to use the power spectrum to self-consistently calibrate the perturbative bias parameters. We validate this model against an N-body simulation, and show that it can be used up to scales of at least $k_{\max}^{P} \simeq 0.2\,h\,\mathrm{Mpc}^{-1}$ and $k_{\max}^{B} \simeq 0.08\,h\,\mathrm{Mpc}^{-1}$, saturating the information recovered from the data. We obtain constraints on the amplitude of matter fluctuations $\sigma_8 = 0.761 \pm 0.020$ and the non-relativistic matter fraction $\Omega_m = 0.307 \pm 0.015$, as well as the combination $S_8 \equiv \sigma_8 \sqrt{\Omega_m/0.3} = 0.769 \pm 0.020$. Including the galaxy bispectrum leads to a 10-20% improvement on the cosmological constraints, which are also in good agreement with previous analyses of the same data, and in mild tension with Planck at the $\sim 2.5\sigma$ level. This tension is largely present in the standard two-point function dataset, and the addition of the bispectrum increases it slightly, marginally shifting $\sigma_8$ downwards and $\Omega_m$ upwards. Finally, using the bispectrum allows for a substantially more precise measurement of the bias parameters of this sample, which are in reasonable agreement with existing coevolution relations.
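As a quick consistency check on the quoted numbers, the snippet below evaluates the combination S8 from the central values of σ8 and Ωm, assuming the standard definition with a square root, S8 = σ8 sqrt(Ωm/0.3). The function name is illustrative, and a plug-in value from central estimates need not coincide exactly with the posterior mean reported by the authors.

```python
import math

def s8(sigma8, omega_m):
    """S8 = sigma8 * sqrt(omega_m / 0.3), the standard lensing-motivated combination."""
    return sigma8 * math.sqrt(omega_m / 0.3)

s8_plugin = s8(0.761, 0.307)
print(f"S8 from central values: {s8_plugin:.4f}")
```

The plug-in value lies well within the quoted uncertainty of the reported S8 constraint.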
The paper forecasts the Hongmeng 21cm experiment's ability to constrain scattering dark matter (SDM)-baryon interactions, predicting a 21-fold improvement over current CMB limits on the SDM cross-section for a five-year mission. This enhanced sensitivity offers a decisive test for the scattering dark matter explanation of the EDGES anomaly.
Fast radio bursts (FRBs) are millisecond-duration radio transients of
extragalactic origin, with diverse time-frequency patterns and emission
properties that require explanation. With one possible exception, FRBs are
detected only in the radio, so analyzing their dynamic spectra is
crucial to disentangling the physical processes governing their generation and
propagation. Furthermore, comparing FRB morphologies provides insights into
possible differences among their progenitors and environments. This study
applies unsupervised learning and deep learning techniques to investigate FRB
dynamic spectra, focusing on two approaches: Principal Component Analysis (PCA)
and a Convolutional Autoencoder (CAE) enhanced by an Information-Ordered
Bottleneck (IOB) layer. PCA served as a computationally efficient baseline,
capturing broad trends, identifying outliers, and providing valuable insights
into large datasets. However, its linear nature limited its ability to
reconstruct complex FRB structures. In contrast, the IOB-augmented CAE excelled
at capturing intricate features, with high reconstruction accuracy and
effective denoising at modest signal-to-noise ratios. The IOB layer's ability
to prioritize relevant features enabled efficient data compression, preserving
key morphological characteristics with minimal latent variables. When applied
to real FRBs from CHIME, the IOB-CAE generalized effectively, revealing a
latent space that highlighted the continuum of FRB morphologies and the
potential for distinguishing intrinsic differences between burst types. This
framework demonstrates that while FRBs may not naturally cluster into discrete
groups, advanced representation learning techniques can uncover meaningful
structures, offering new insights into the diversity and origins of these
bursts.
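The PCA baseline described above can be sketched with a plain singular value decomposition. The example uses synthetic Gaussian bursts as a stand-in for real dynamic spectra; the array sizes, noise level, and helper names are assumptions for illustration only, and the IOB-augmented autoencoder is not reproduced here.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the data mean and the leading principal directions of X (rows = samples)."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_reconstruct(X, mean, components):
    """Project onto the components and map back to the original space."""
    Z = (X - mean) @ components.T
    return Z @ components + mean, Z

# Synthetic "dynamic spectra": broadband Gaussian pulses with random width and arrival time.
rng = np.random.default_rng(1)
n_bursts, n_freq, n_time = 200, 16, 32
t = np.arange(n_time)
widths = rng.uniform(1.0, 5.0, n_bursts)
centers = rng.uniform(10.0, 22.0, n_bursts)
profiles = np.exp(-0.5 * ((t[None, :] - centers[:, None]) / widths[:, None]) ** 2)
spectra = np.repeat(profiles[:, None, :], n_freq, axis=1)      # (n_bursts, n_freq, n_time)
X = spectra.reshape(n_bursts, -1) + 0.05 * rng.normal(size=(n_bursts, n_freq * n_time))

def reconstruction_mse(k):
    mean, comps = pca_fit(X, k)
    X_hat, _ = pca_reconstruct(X, mean, comps)
    return float(np.mean((X - X_hat) ** 2))

mse_2, mse_10 = reconstruction_mse(2), reconstruction_mse(10)
```

Comparing `mse_2` and `mse_10` shows how many linear components are needed before the residual approaches the noise floor, which is the kind of limitation a nonlinear autoencoder is meant to overcome.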
Hierarchical Bayesian models of perception and learning feature prominently
in contemporary cognitive neuroscience where, for example, they inform
computational concepts of mental disorders. This includes predictive coding and
hierarchical Gaussian filtering (HGF), which differ in the nature of
hierarchical representations. Predictive coding assumes that higher levels in a
given hierarchy influence the state (value) of lower levels. In HGF, however,
higher levels determine the rate of change at lower levels. Here, we extend the
space of generative models underlying HGF to include a form of nonlinear
hierarchical coupling between state values akin to predictive coding and
artificial neural networks in general. We derive the update equations
corresponding to this generalization of HGF and conceptualize them as
connecting a network of (belief) nodes where parent nodes either predict the
state of child nodes or their rate of change. This enables us to (1) create
modular architectures with generic computational steps in each node of the
network, and (2) disclose the hierarchical message passing implied by
generalized HGF models and compare it with analogous schemes under
predictive coding. We find that the algorithmic architecture instantiated by
the generalized HGF is largely compatible with that of predictive coding but
extends it with some unique predictions which arise from precision and
volatility related computations. Our developments enable highly flexible
implementations of hierarchical Bayesian models for empirical data analysis and
are available as open source software.
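The two kinds of coupling between a parent and a child node can be illustrated by simulating a toy generative model. This is a simplified sketch under stated assumptions: Gaussian random walks, a volatility parent that sets the child's step variance through an exponential, and a value parent that shifts the child's mean through a nonlinear function `g`. The parameter names (`kappa`, `omega`, `alpha`, `theta`) and the function `simulate_pair` are illustrative; the update equations derived in the paper are not reproduced.

```python
import numpy as np

def simulate_pair(n_steps, coupling, kappa=1.0, omega=-2.0, alpha=0.5,
                  theta=0.05, g=np.tanh, seed=0):
    """Simulate one parent node and one child node of a toy hierarchical model."""
    rng = np.random.default_rng(seed)
    parent = np.zeros(n_steps)
    child = np.zeros(n_steps)
    for t in range(1, n_steps):
        # The parent performs a Gaussian random walk with step variance theta.
        parent[t] = parent[t - 1] + rng.normal(0.0, np.sqrt(theta))
        if coupling == "volatility":
            # The parent determines the child's rate of change (step variance).
            var = np.exp(kappa * parent[t] + omega)
            child[t] = child[t - 1] + rng.normal(0.0, np.sqrt(var))
        elif coupling == "value":
            # The parent predicts the child's state via a nonlinear function g.
            drift = alpha * g(parent[t])
            child[t] = child[t - 1] + drift + rng.normal(0.0, np.sqrt(np.exp(omega)))
        else:
            raise ValueError("coupling must be 'volatility' or 'value'")
    return parent, child

parent_vol, child_vol = simulate_pair(200, "volatility")
parent_val, child_val = simulate_pair(200, "value")
```

In the first case the parent modulates how quickly the child fluctuates, while in the second it shifts where the child is expected to be, which mirrors the distinction drawn above between classic HGF and predictive-coding-style coupling.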
An investigation into how neural networks learn hierarchical compositional data using the Random Hierarchy Model reveals that learning progresses in distinct stages, with Convolutional Neural Networks (CNNs) achieving faster learning rates and scaling laws (exponent $\alpha = \log f / \log m$) than Transformers (exponent $\alpha = \log f / (2 \log m)$) on this type of data.
This collection of perspective pieces captures recent advancements and
reflections from a dynamic research community dedicated to bridging quantum
gravity, hydrodynamics, and emergent cosmology. It explores four key research
areas: (a) the interplay between hydrodynamics and cosmology, including analog
gravity systems; (b) phase transitions, continuum limits and emergent geometry
in quantum gravity; (c) relational perspectives in gravity and quantum gravity;
and (d) the emergence of cosmological models rooted in quantum gravity
frameworks. Each contribution presents the distinct perspectives of its
respective authors. Additionally, the introduction by the editors proposes an
integrative view, suggesting how these thematic units could serve as
foundational pillars for a novel theoretical cosmology framework termed
"hydrodynamics on superspace".
We investigate traveling wave solutions in the two-species reaction-diffusion Lotka-Volterra competition system under weak competition. For the strict weak competition regime (b0), we construct refined upper and lower solutions combined with the Schauder fixed point theorem to establish the existence of traveling waves for all wave speeds $s \ge s^* := \max\{2, 2\sqrt{ad}\}$, and provide verifiable sufficient conditions for the emergence of non-monotone waves. Such conditions for non-monotone waves have not been explicitly addressed in previous studies. It is interesting to point out that our results for non-monotone waves also hold for the critical speed case $s = s^*$. In addition, in the critical weak competition case (b0), we rigorously prove, for the first time, the existence of front-pulse traveling waves.
Based on one million arXiv papers submitted from May 2018 to January 2024, we
assess the textual density of ChatGPT's writing style in their abstracts
through a statistical analysis of word frequency changes. Our model is
calibrated and validated on a mixture of real abstracts and ChatGPT-modified
abstracts (simulated data) after a careful noise analysis. The words used for
estimation are not fixed but adaptive, including those with decreasing
frequency. We find that large language models (LLMs), represented by ChatGPT,
are having an increasing impact on arXiv abstracts, especially in the field of
computer science, where the fraction of LLM-style abstracts is estimated to be
approximately 35%, if we take the responses of GPT-3.5 to one simple prompt,
"revise the following sentences", as a baseline. We conclude with an analysis
of both positive and negative aspects of the penetration of LLMs into
academics' writing style.
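One simple way to turn word-frequency shifts into an estimated fraction of LLM-style abstracts is a two-component mixture fitted by least squares. The sketch below is an assumption-laden illustration, not the estimator used in the study: it treats the observed frequency of each word as a linear mix of a "human" frequency and an "LLM-revised" frequency, and the function name and synthetic data are hypothetical.

```python
import numpy as np

def estimate_llm_fraction(f_obs, f_human, f_llm):
    """Least-squares eta in f_obs ~ (1 - eta) * f_human + eta * f_llm, clipped to [0, 1]."""
    delta = f_llm - f_human
    eta = np.dot(f_obs - f_human, delta) / np.dot(delta, delta)
    return float(np.clip(eta, 0.0, 1.0))

# Synthetic word frequencies: some words become more common after revision, others less.
rng = np.random.default_rng(0)
f_human = rng.uniform(1e-4, 1e-2, 50)
f_llm = f_human * rng.uniform(0.5, 2.0, 50)

# A corpus in which 35% of abstracts follow the LLM-style frequencies.
f_obs = 0.65 * f_human + 0.35 * f_llm
eta_hat = estimate_llm_fraction(f_obs, f_human, f_llm)
```

Words whose frequency decreases contribute to the fit just like words whose frequency increases, which is in the spirit of including words with decreasing frequency in the estimation.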
In the present work, we consider the industrial problem of estimating in real time the mold-steel heat flux in a continuous casting mold. We approach this problem by first considering the mold modeling problem (direct problem). Then, we pose the heat flux estimation problem as the inverse problem of estimating a Neumann boundary condition, using as data pointwise temperature measurements in the interior of the mold domain. We also consider the case of having a total heat flux measurement together with the temperature measurements. We develop two methodologies for solving this inverse problem. The first is the traditional Alifanov's regularization; the second exploits a parameterization of the heat flux. We develop the latter method to have an offline-online decomposition with a computationally efficient online part that can be performed in real time. In the last part of this work, we test these methods on academic and industrial benchmarks. The results show that the parameterization method outperforms Alifanov's regularization in both accuracy and computational cost. Moreover, it proves to be robust with respect to measurement noise. Finally, the tests confirm that the computational cost is suitable for real-time estimation of the heat flux.
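The offline-online split of the parameterization approach can be illustrated with a toy linear model. The sketch assumes that the sensor temperatures depend affinely on the weights of a small set of flux basis functions, so that the expensive direct solves can be done once offline; a random matrix stands in for those precomputed sensor responses, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sensors, n_basis = 12, 4

# Offline stage (done once). In a real setting each column of A would come from one
# direct solve of the mold model with a unit basis flux; random values are a stand-in.
T_baseline = rng.normal(300.0, 1.0, n_sensors)   # sensor temperatures with zero flux weights
A = rng.normal(size=(n_sensors, n_basis))        # sensor response to each basis flux
A_pinv = np.linalg.pinv(A)                       # precomputed least-squares operator

def estimate_flux_weights(T_measured):
    """Online stage: a single matrix-vector product per set of measurements."""
    return A_pinv @ (T_measured - T_baseline)

# Noise-free check: the weights used to generate the data are recovered.
w_true = np.array([1.0, -0.5, 2.0, 0.3])
T_meas = T_baseline + A @ w_true
w_est = estimate_flux_weights(T_meas)
```

Because the online step involves only a small dense product, its cost is independent of the size of the underlying mold discretization in this sketch.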
Myriad viruses use positive-strand RNA molecules as their genomes. Far from being only a repository of genetic material, viral RNA performs numerous other functions mediated by its physical structure and chemical properties. In this chapter, we focus on its structure and discuss how long RNA molecules can be treated as branched polymers through planar graphs. We describe the major results that can be obtained by this approach, in particular the observation that viral RNA genomes have a characteristic compactness that sets them aside from similar random RNAs. We also discuss how different parameters used in the current RNA folding software influence the resulting structures and how they can be related to experimentally observable quantities. Finally, we show how the connection to branched polymers can be extended to take advantage of known results from polymer physics and can be further moulded to include additional interactions, such as excluded volume or electrostatics.
The rapid increase in multimodal data availability has sparked significant interest in cross-modal knowledge distillation (KD) techniques, where richer "teacher" modalities transfer information to weaker "student" modalities during model training to improve performance. However, despite successes across various applications, cross-modal KD does not always result in improved outcomes, primarily due to a limited theoretical understanding that could inform practice. To address this gap, we introduce the Cross-modal Complementarity Hypothesis (CCH): we propose that cross-modal KD is effective when the mutual information between teacher and student representations exceeds the mutual information between the student representation and the labels. We theoretically validate the CCH in a joint Gaussian model and further confirm it empirically across diverse multimodal datasets, including image, text, video, audio, and cancer-related omics data. Our study establishes a novel theoretical framework for understanding cross-modal KD and offers practical guidelines based on the CCH criterion to select optimal teacher modalities for improving the performance of weaker modalities.
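In the simplest jointly Gaussian setting, the criterion stated above can be evaluated in closed form, since the mutual information between two scalar Gaussian variables with correlation ρ is −½ ln(1 − ρ²). The sketch below assumes scalar representations and known correlations, which is a strong simplification of the setting studied in the paper; the function names are illustrative.

```python
import math

def gaussian_mi(rho):
    """Mutual information (in nats) of two jointly Gaussian scalars with correlation rho."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return -0.5 * math.log(1.0 - rho ** 2)

def cch_favors_distillation(rho_teacher_student, rho_student_label):
    """True when I(teacher; student) exceeds I(student; label), as the CCH requires."""
    return gaussian_mi(rho_teacher_student) > gaussian_mi(rho_student_label)

print(cch_favors_distillation(0.8, 0.5))
print(cch_favors_distillation(0.3, 0.6))
```

In practice the mutual information terms would have to be estimated from learned representations, so this closed form serves only to make the inequality concrete.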
We carry out a comprehensive comparison between the exact modular Hamiltonian
and the lattice version of the Bisognano-Wichmann (BW) one in one-dimensional
critical quantum spin chains. As a warm-up, we first illustrate how the trace
distance provides a more informative means of comparison between reduced density
matrices when compared to any other Schatten n-distance, normalized or not.
In particular, as noticed in earlier works, it provides a way to bound other
correlation functions in a precise manner, i.e., providing both lower and upper
bounds. Additionally, we show that two close reduced density matrices, i.e.
with zero trace distance for large sizes, can have very different modular
Hamiltonians. This means that, in terms of describing how two states are close
to each other, it is more informative to compare their reduced density matrices
rather than the corresponding modular Hamiltonians. After setting this
framework, we consider the ground states for infinite and periodic XX spin
chain and critical Ising chain. We provide robust numerical evidence that the
trace distance between the lattice BW reduced density matrix and the exact one
goes to zero as $\ell^{-2}$ for large interval length $\ell$. This
provides strong constraints on the difference between the corresponding
entanglement entropies and correlation functions. Our results indicate that
discretized BW reduced density matrices reproduce exact entanglement entropies
and correlation functions of local operators in the limit of large subsystem
sizes. Finally, we show that the BW reduced density matrices fall short of
reproducing the exact behavior of the logarithmic emptiness formation
probability in the ground state of the XX spin chain.
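The difference between the trace distance and higher Schatten distances can be seen in a small numerical example. The sketch below assumes one common normalization, $D_n = (\mathrm{Tr}|\rho-\sigma|^n)^{1/n} / 2^{1/n}$, so that n = 1 gives the trace distance; the helper names and the toy states are assumptions for illustration.

```python
import numpy as np

def schatten_distance(rho, sigma, n=1):
    """Schatten n-distance between two density matrices, divided by 2**(1/n)."""
    eigvals = np.linalg.eigvalsh(rho - sigma)   # rho - sigma is Hermitian
    return float(np.sum(np.abs(eigvals) ** n) ** (1.0 / n) / 2 ** (1.0 / n))

def trace_distance(rho, sigma):
    """Trace distance, i.e. the n = 1 case: (1/2) Tr|rho - sigma|."""
    return schatten_distance(rho, sigma, n=1)

# Two maximally mixed states supported on orthogonal subspaces of dimension d.
d = 64
rho = np.diag(np.r_[np.ones(d), np.zeros(d)]) / d
sigma = np.diag(np.r_[np.zeros(d), np.ones(d)]) / d

d1 = trace_distance(rho, sigma)            # perfectly distinguishable states
d2 = schatten_distance(rho, sigma, n=2)    # shrinks as d grows
```

The two states are perfectly distinguishable, yet the 2-distance decreases with the dimension of the support, which illustrates why a small Schatten distance with n > 1 does not by itself imply that two reduced density matrices are close.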
We study the statistics of branching polymers with excluded-volume interactions, by modeling them as single self-avoiding trees on a generic regular periodic lattice with coordination number q. Each lattice site can be occupied by at most one tree node, and the fraction of occupied sites can vary from dilute to dense conditions. By adopting the statistics of directed trees as a proxy for that of undirected trees without internal loops and by an exact mapping of the model into a field theory, we compute the entropy and the mean number of branch-nodes within a mean field approximation and in the thermodynamic limit. In particular, we find that the mean number of branch-nodes is independent of both the lattice details and the lattice occupation, depending only on the associated chemical potential. Monte Carlo simulations in d=2,3,4 show that the mean field theory is remarkably accurate, and increasingly so in higher dimensions.
Two-dimensional (2D) magnets host a wide range of exotic magnetic textures, whose low-energy excitations and finite-temperature properties are typically described by effective spin models based on Heisenberg-like Hamiltonians. A key challenge in this framework is the reliable determination, from ab initio calculations, of exchange parameters and their anisotropic components, crucial for stabilising long-range order. Among the different strategies proposed for this task, the energy-mapping method -- based on total-energy calculations within Density Functional Theory (DFT) -- is the most widely adopted, but it typically requires laborious, multi-step procedures. To overcome this limitation, we introduce AMaRaNTA (Automating Magnetic paRAmeters iN a Tensorial Approach), a computational package that systematically automates the energy-mapping method, specifically through its "four-state" formulation, to extract exchange and anisotropy parameters in 2D magnets. In its current implementation, AMaRaNTA returns the nearest-neighbour exchange tensor, complemented by scalar parameters for second- and third-nearest-neighbour exchange interactions as well as single-ion anisotropy. Together, these provide a minimal yet sufficient set of parameters to capture magnetic frustration and anisotropies, essential for stabilising several observed magnetic states in 2D materials. Applied to a representative subset of the Materials Cloud 2D Structure database, AMaRaNTA demonstrates robust, automated and reproducible screening of magnetic interactions, with clear potential for high-throughput simulations.
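The core arithmetic of a four-state energy mapping can be sketched for a single scalar exchange constant. The example assumes a pair energy of the form E = E0 + J s_i s_j + h_i s_i + h_j s_j, with each pair counted once and s = ±S along a fixed axis; sign and double-counting conventions differ between codes, and this is not AMaRaNTA's interface. The function name and the synthetic energies are hypothetical.

```python
def four_state_exchange(e_uu, e_ud, e_du, e_dd, spin=1.0):
    """Exchange constant J from four total energies of a spin pair.

    e_uu, e_ud, e_du, e_dd: total energies with the two selected spins set to
    (up, up), (up, down), (down, up), (down, down); all other spins are fixed.
    """
    return (e_uu + e_dd - e_ud - e_du) / (4.0 * spin ** 2)

# Synthetic check: build the four energies from a known J and verify it is recovered.
E0, J_true, S = -10.0, 2.5, 1.5
h_i, h_j = 0.3, -0.7   # couplings of each selected spin to the rest of the lattice

def pair_energy(s_i, s_j):
    return E0 + J_true * s_i * s_j + h_i * s_i + h_j * s_j

J_est = four_state_exchange(pair_energy(S, S), pair_energy(S, -S),
                            pair_energy(-S, S), pair_energy(-S, -S), spin=S)
```

The combination of four energies cancels both the constant term and the couplings of each spin to its environment, which is why four total-energy calculations per pair are enough under these assumptions.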
Hierarchies feature prominently in anatomical accounts of cortical organisation. An open question is which computational (algorithmic) processes are implemented by these hierarchies. One renowned hypothesis is that cortical hierarchies implement a model of the world's causal structure and serve to infer environmental states from sensory inputs. This view, which casts perception as hierarchical Bayesian inference, has become a highly influential concept in both basic and clinical neuroscience. So far, however, a direct correspondence between the predicted order of hierarchical Bayesian computations and the sequence of evoked neuronal activity has not been demonstrated. Here, we present evidence for this correspondence from neuroimaging and electrophysiological data in healthy volunteers. Trial-wise sequences of hierarchical computations were inferred from participants' behaviour during a social learning task that required multi-level inference about intentions. We found that the temporal sequence of neuronal activity matched the order of computations as predicted by the theory. These findings provide strong evidence for the operation of hierarchical Bayesian inference in human cortex. Furthermore, our approach offers a novel strategy for the combined computational-physiological phenotyping of patients with disorders of perception, such as schizophrenia or autism.
We present results from the High Energy Stereoscopic System (H.E.S.S.) follow-up observations of Gamma-ray Bursts (GRBs) between 2004 and 2019. We focus on non-detections and provide the most extensive set of very-high-energy (VHE, >100 GeV) upper limits to date. We use this catalogue to constrain the properties of the GRBs not detected at VHE and compare them to those detected at VHE. Our study finds that VHE-detected GRBs are not a distinct population but are instead associated with bright X-ray afterglows and low redshifts. In addition, we model the multi-wavelength emission of a few of the observed GRBs and discuss the results in the context of their obtained microphysical parameters. The results from this work help put current VHE observations into perspective and highlight the capabilities of next-generation instruments in detecting fainter and more distant GRBs at VHE.