Scuola Internazionale Superiore di Studi Avanzati (SISSA)
The Rotary Masked Autoencoder (RoMAE) extends the masked-autoencoder (MAE) framework by integrating continuous Rotary Positional Embeddings (RoPE), creating a versatile Transformer model capable of learning representations from irregular, multi-dimensional time-series data, images, and audio. The model achieved an F-score of 0.6770 on the DESC ELAsTiCC Challenge and an RMSE of 0.0183 on the Spirals 2D interpolation task, outperforming specialized architectures.
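The core idea, rotary embeddings evaluated at arbitrary real-valued positions rather than integer token indices, can be sketched as follows; the function name, the NumPy implementation, and the frequency base are illustrative assumptions, not taken from the RoMAE code.

    import numpy as np

    def rotary_embed(x, positions, base=10000.0):
        """Rotate feature pairs of x by angles proportional to continuous positions.

        x:         (n_tokens, d) array with d even
        positions: (n_tokens,) real-valued, possibly irregular, timestamps
        """
        n, d = x.shape
        assert d % 2 == 0
        freqs = base ** (-np.arange(0, d, 2) / d)          # per-pair frequencies, as in standard RoPE
        angles = positions[:, None] * freqs[None, :]       # (n, d/2)
        cos, sin = np.cos(angles), np.sin(angles)
        x1, x2 = x[:, 0::2], x[:, 1::2]                    # split features into pairs
        out = np.empty_like(x)
        out[:, 0::2] = x1 * cos - x2 * sin                 # 2D rotation of each pair
        out[:, 1::2] = x1 * sin + x2 * cos
        return out

    # Irregularly sampled series: positions need not be integers or evenly spaced
    q = rotary_embed(np.random.randn(5, 8), np.array([0.0, 0.13, 0.9, 2.4, 2.41]))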
Researchers from SISSA and ICTP quantitatively demonstrate how label noise impacts the information content of neural network representations, showing that while overparameterized hidden layers remain largely robust, the final classification layer loses significant information, a phenomenon measurable by the Information Imbalance metric, which also reveals a double descent in representation quality.
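The Information Imbalance can be estimated from nearest-neighbour ranks: roughly, Delta(A -> B) ~ (2/N) times the average rank, computed in space B, of each point's nearest neighbour in space A. The brute-force sketch below illustrates one common form of this estimator; variable names and normalization details are illustrative rather than taken from the paper.

    import numpy as np

    def information_imbalance(A, B):
        """Estimate Delta(A -> B): how well distances in A predict neighbours in B.

        A, B: (N, dA) and (N, dB) arrays holding two representations of the same N points.
        Small values mean A is highly predictive of B.
        """
        N = A.shape[0]
        dA = np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)
        dB = np.linalg.norm(B[:, None, :] - B[None, :, :], axis=-1)
        np.fill_diagonal(dA, np.inf)
        np.fill_diagonal(dB, np.inf)
        nn_A = dA.argmin(axis=1)                      # nearest neighbour of each point in A
        ranks_B = dB.argsort(axis=1).argsort(axis=1)  # rank of every point in B (0 = nearest)
        r = ranks_B[np.arange(N), nn_A] + 1           # rank in B of the A-neighbour
        return 2.0 * r.mean() / N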
The axion, one of the most promising dark matter candidates, can be detected through narrow radio lines emitted from magnetic white dwarf stars. Owing to the strong magnetic field, the axion may resonantly convert into a radio photon (Primakoff effect) when it passes through a narrow region in the corona of the magnetic white dwarf where the plasma frequency is equal to the axion mass. We show that for the magnetic white dwarf WD 2010+310, the future experiment SKA phase 1 with 100 hours of observation can effectively probe the axion-photon coupling $g_{a\gamma}$ down to $\sim 10^{-12}~\text{GeV}^{-1}$ for the axion mass range of $0.2 \sim 3.7~\mu\text{eV}$. Note that in the low mass region ($m_a \lesssim 1.5~\mu\text{eV}$), WD 2010+310 could give greater sensitivity than the neutron star RX J0806.4-4123.
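For orientation, the resonance condition invoked here is simply that the local plasma frequency of the corona matches the axion mass; in natural units, with $n_e$ the electron density, the standard expression is $\omega_p = \sqrt{4\pi\alpha\, n_e/m_e} = m_a$, so the conversion is efficient only in the thin shell where this equality holds (a generic textbook relation, not a result specific to this paper).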
Neural-network interatomic potentials (NNIPs) have transformed atomistic simulations by enabling molecular dynamics simulations with near ab initio accuracy at reduced computational costs and improved scalability. Despite these advances, crafting NNIPs remains complex, demanding specialized expertise in both machine learning and electronic-structure calculations. Here, we introduce an automated, open-source, and user-friendly workflow that streamlines the creation of accurate NNIPs. Our approach integrates density-functional theory, data augmentation strategies and classical molecular dynamics to systematically explore the potential energy landscape. Our active-learning strategy leverages on-the-fly calibration of committee disagreement against true errors to ensure reliable uncertainty estimates. We use electronic-structure descriptors and dimensionality reduction to analyze the efficiency of our active learning strategy, which is shown to minimize both false positives and false negatives when deciding what to relabel with ab initio calculations. The method is validated on the fully automated training of an NNIP for a diverse set of carbon allotropes, reaching state-of-the-art accuracy and data efficiency. This platform democratizes NNIP development, empowering users to achieve high-precision simulations with minimal human intervention.
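A minimal sketch of the kind of on-the-fly calibration described above is shown below: a scale factor mapping committee disagreement onto the errors observed on freshly relabelled configurations. The linear rescaling, the threshold, and all names are assumptions made for illustration, not the workflow's actual implementation.

    import numpy as np

    def committee_disagreement(energies):
        """Std. dev. across an ensemble of NNIP predictions, shape (n_models, n_structures)."""
        return energies.std(axis=0)

    def calibrate(disagreement, true_errors):
        """Least-squares scale alpha such that alpha * disagreement ~ |true error|."""
        return (disagreement * true_errors).sum() / (disagreement ** 2).sum()

    # Toy usage: 4 committee members, 100 structures recently relabelled with DFT
    preds = np.random.randn(4, 100) * 0.02 + np.random.randn(100)
    dft = np.random.randn(100)
    sigma = committee_disagreement(preds)
    alpha = calibrate(sigma, np.abs(preds.mean(axis=0) - dft))
    # Structures whose calibrated uncertainty exceeds a threshold are sent back to DFT
    to_relabel = alpha * sigma > 0.05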
We study the projected clustering of photometric luminous red galaxies from the DESI Legacy Survey, combining their angular power spectrum, bispectrum, and cross-correlation with maps of the CMB lensing convergence from the Planck satellite. We employ a perturbative bias expansion in Eulerian space to describe the clustering of galaxies, modelling the power spectrum and bispectrum at one-loop and tree level, respectively. This allows us to use the power spectrum to self-consistently calibrate the perturbative bias parameters. We validate this model against an $N$-body simulation, and show that it can be used up to scales of at least $k_{\rm max}^P\simeq 0.2\,h{\rm Mpc}^{-1}$ and $k_{\rm max}^B\simeq 0.08\,h{\rm Mpc}^{-1}$, saturating the information recovered from the data. We obtain constraints on the amplitude of matter fluctuations $\sigma_8=0.761\pm 0.020$ and the non-relativistic matter fraction $\Omega_m=0.307\pm 0.015$, as well as the combination $S_8\equiv\sigma_8\sqrt{\Omega_m/0.3}=0.769\pm 0.020$. Including the galaxy bispectrum leads to a 10-20% improvement on the cosmological constraints, which are also in good agreement with previous analyses of the same data, and in mild tension with Planck at the $\sim 2.5\sigma$ level. This tension is largely present in the standard two-point function dataset, and the addition of the bispectrum increases it slightly, marginally shifting $\sigma_8$ downwards and $\Omega_m$ upwards. Finally, using the bispectrum allows for a substantially more precise measurement of the bias parameters of this sample, which are in reasonable agreement with existing coevolution relations.
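As a reminder of what such an Eulerian bias expansion looks like, schematically and only up to second order (the paper uses the full one-loop set of operators), the galaxy overdensity is written as $\delta_g = b_1\,\delta + \tfrac{b_2}{2}\,\delta^2 + b_{s^2}\,s^2 + \dots$, with $\delta$ the matter overdensity and $s^2$ the squared tidal field; the power spectrum and bispectrum both inherit the same bias coefficients, which is what allows the two statistics to calibrate them self-consistently.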
The paper forecasts the Hongmeng 21cm experiment's ability to constrain scattering dark matter (SDM)-baryon interactions, predicting a 21-fold improvement over current CMB limits on the SDM cross-section for a five-year mission. This enhanced sensitivity offers a decisive test for the scattering dark matter explanation of the EDGES anomaly.
Fast radio bursts (FRBs) are millisecond-duration radio transients of extragalactic origin, with diverse time-frequency patterns and emission properties that require explanation. With one possible exception, FRBs are detected only in the radio, so analyzing their dynamic spectra is crucial to disentangling the physical processes governing their generation and propagation. Furthermore, comparing FRB morphologies provides insights into possible differences among their progenitors and environments. This study applies unsupervised learning and deep learning techniques to investigate FRB dynamic spectra, focusing on two approaches: Principal Component Analysis (PCA) and a Convolutional Autoencoder (CAE) enhanced by an Information-Ordered Bottleneck (IOB) layer. PCA served as a computationally efficient baseline, capturing broad trends, identifying outliers, and providing valuable insights into large datasets. However, its linear nature limited its ability to reconstruct complex FRB structures. In contrast, the IOB-augmented CAE excelled at capturing intricate features, with high reconstruction accuracy and effective denoising at modest signal-to-noise ratios. The IOB layer's ability to prioritize relevant features enabled efficient data compression, preserving key morphological characteristics with minimal latent variables. When applied to real FRBs from CHIME, the IOB-CAE generalized effectively, revealing a latent space that highlighted the continuum of FRB morphologies and the potential for distinguishing intrinsic differences between burst types. This framework demonstrates that while FRBs may not naturally cluster into discrete groups, advanced representation learning techniques can uncover meaningful structures, offering new insights into the diversity and origins of these bursts.
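One common way to realize an information-ordered bottleneck, consistent with the behaviour described above, is to randomly truncate the latent vector during training so that earlier dimensions are forced to carry the most information. The PyTorch-style sketch below is an illustrative assumption, not the authors' implementation.

    import torch

    class IOBLayer(torch.nn.Module):
        """Zero out all latent dimensions beyond a randomly drawn cutoff during training."""
        def __init__(self, n_latent):
            super().__init__()
            self.n_latent = n_latent

        def forward(self, z):
            if not self.training:
                return z
            k = torch.randint(1, self.n_latent + 1, (1,)).item()   # keep the first k dims
            mask = torch.zeros_like(z)
            mask[:, :k] = 1.0
            return z * mask

    # Inserted between the encoder and decoder of a convolutional autoencoder, so that
    # reconstructions degrade gracefully as fewer latent variables are retained.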
Hierarchical Bayesian models of perception and learning feature prominently in contemporary cognitive neuroscience where, for example, they inform computational concepts of mental disorders. This includes predictive coding and hierarchical Gaussian filtering (HGF), which differ in the nature of hierarchical representations. Predictive coding assumes that higher levels in a given hierarchy influence the state (value) of lower levels. In HGF, however, higher levels determine the rate of change at lower levels. Here, we extend the space of generative models underlying HGF to include a form of nonlinear hierarchical coupling between state values akin to predictive coding and artificial neural networks in general. We derive the update equations corresponding to this generalization of HGF and conceptualize them as connecting a network of (belief) nodes where parent nodes either predict the state of child nodes or their rate of change. This enables us to (1) create modular architectures with generic computational steps in each node of the network, and (2) disclose the hierarchical message passing implied by generalized HGF models and compare it with analogous schemes under predictive coding. We find that the algorithmic architecture instantiated by the generalized HGF is largely compatible with that of predictive coding but extends it with some unique predictions which arise from precision- and volatility-related computations. Our developments enable highly flexible implementations of hierarchical Bayesian models for empirical data analysis and are available as open source software.
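Schematically, and with notation chosen here only for illustration, the two coupling types differ in where the parent state $x_{i+1}$ enters the generative model of the child state $x_i$ at trial $k$: under volatility coupling (the standard HGF), $x_i^{(k)} \sim \mathcal{N}\big(x_i^{(k-1)},\ \exp(\kappa\, x_{i+1}^{(k)} + \omega_i)\big)$, so the parent sets the child's rate of change; under value coupling (the predictive-coding-like extension), $x_i^{(k)} \sim \mathcal{N}\big(f(x_{i+1}^{(k)}),\ \exp(\omega_i)\big)$, so the parent predicts the child's value through a possibly nonlinear function $f$.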
An investigation into how neural networks learn hierarchical compositional data using the Random Hierarchy Model reveals that learning progresses in distinct stages, with Convolutional Neural Networks (CNNs) achieving faster learning and more favourable scaling laws (exponent $\alpha = \log f/\log m$) than Transformers (exponent $\alpha = \log f/(2\log m)$) on this type of data.
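If the quoted exponents refer, as is conventional for such scaling laws, to a power-law decay of the test error with the number of training examples $P$, the comparison reads $\epsilon(P) \sim P^{-\alpha}$ with $\alpha_{\rm CNN} = \log f/\log m$ and $\alpha_{\rm Transformer} = \log f/(2\log m)$: the CNN exponent is exactly twice the Transformer one, so on this data the CNN's error falls twice as fast with dataset size on a log-log scale ($f$ and $m$ being the structural parameters of the Random Hierarchy Model).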
This collection of perspective pieces captures recent advancements and reflections from a dynamic research community dedicated to bridging quantum gravity, hydrodynamics, and emergent cosmology. It explores four key research areas: (a) the interplay between hydrodynamics and cosmology, including analog gravity systems; (b) phase transitions, continuum limits and emergent geometry in quantum gravity; (c) relational perspectives in gravity and quantum gravity; and (d) the emergence of cosmological models rooted in quantum gravity frameworks. Each contribution presents the distinct perspectives of its respective authors. Additionally, the introduction by the editors proposes an integrative view, suggesting how these thematic units could serve as foundational pillars for a novel theoretical cosmology framework termed "hydrodynamics on superspace".
We investigate traveling wave solutions in the two-species reaction-diffusion Lotka-Volterra competition system under weak competition. For the strict weak competition regime, we construct refined upper and lower solutions combined with the Schauder fixed point theorem to establish the existence of traveling waves for all wave speeds $s \geq s^* := \max\{2, 2\sqrt{ad}\}$, and provide verifiable sufficient conditions for the emergence of non-monotone waves. Such conditions for non-monotone waves have not been explicitly addressed in previous studies. It is interesting to point out that our result for non-monotone waves also holds for the critical speed case $s = s^*$. In addition, in the critical weak competition case, we rigorously prove, for the first time, the existence of front-pulse traveling waves.
Based on one million arXiv papers submitted from May 2018 to January 2024, we assess the textual density of ChatGPT's writing style in their abstracts through a statistical analysis of word frequency changes. Our model is calibrated and validated on a mixture of real abstracts and ChatGPT-modified abstracts (simulated data) after a careful noise analysis. The words used for estimation are not fixed but adaptive, including those with decreasing frequency. We find that large language models (LLMs), represented by ChatGPT, are having an increasing impact on arXiv abstracts, especially in the field of computer science, where the fraction of LLM-style abstracts is estimated to be approximately 35%, if we take the responses of GPT-3.5 to one simple prompt, "revise the following sentences", as a baseline. We conclude with an analysis of both positive and negative aspects of the penetration of LLMs into academics' writing style.
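The general flavour of such an estimate can be illustrated as follows: model the observed word frequencies as a mixture of a human baseline and an LLM-modified distribution, and fit the mixture weight. The one-parameter least-squares fit and all names below are a hypothetical simplification of the calibrated, adaptive estimator described above.

    import numpy as np

    def estimate_llm_fraction(freq_observed, freq_human, freq_llm):
        """Fit eta in: observed ~ (1 - eta) * human + eta * llm, over a fixed word list.

        All inputs are 1D arrays of per-word frequencies on the same vocabulary.
        """
        direction = freq_llm - freq_human
        eta = np.dot(freq_observed - freq_human, direction) / np.dot(direction, direction)
        return float(np.clip(eta, 0.0, 1.0))

    # freq_human / freq_llm would be calibrated on real abstracts and on the same abstracts
    # revised by GPT-3.5 with the prompt "revise the following sentences".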
In the present work, we consider the industrial problem of estimating, in real time, the mold-steel heat flux in a continuous casting mold. We approach this problem by first considering the mold modeling problem (direct problem). Then, we pose the heat flux estimation problem as the inverse problem of estimating a Neumann boundary condition, given pointwise temperature measurements in the interior of the mold domain. We also consider the case of having a total heat flux measurement together with the temperature measurements. We develop two methodologies for solving this inverse problem: the first is the traditional Alifanov regularization; the second exploits a parameterization of the heat flux. We develop the latter method to have an offline-online decomposition, with a computationally efficient online part that can be performed in real time. In the last part of this work, we test these methods on academic and industrial benchmarks. The results show that the parameterization method outperforms Alifanov's regularization both in performance and in computational cost. Moreover, it proves to be robust with respect to measurement noise. Finally, the tests confirm that the computational cost is suitable for real-time estimation of the heat flux.
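To illustrate the flavour of the parameterization approach (not the paper's actual formulation): if the unknown boundary heat flux is expanded on a few basis functions whose temperature responses at the sensor locations are precomputed offline, and the response is assumed linear so that superposition applies, the online stage reduces to a small least-squares problem on the thermocouple readings. All names and shapes below are hypothetical.

    import numpy as np

    def online_flux_estimate(T_measured, T_background, responses, flux_basis):
        """Recover heat-flux coefficients from pointwise temperature measurements.

        T_measured:   (n_sensors,) thermocouple readings
        T_background: (n_sensors,) simulated temperatures with zero applied flux
        responses:    (n_sensors, n_modes) offline-precomputed sensor response to each unit flux mode
        flux_basis:   (n_boundary_points, n_modes) flux modes on the mold boundary
        """
        coeffs, *_ = np.linalg.lstsq(responses, T_measured - T_background, rcond=None)
        return flux_basis @ coeffs   # reconstructed Neumann boundary heat flux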
Myriad viruses use positive-strand RNA molecules as their genomes. Far from being only a repository of genetic material, viral RNA performs numerous other functions mediated by its physical structure and chemical properties. In this chapter, we focus on its structure and discuss how long RNA molecules can be treated as branched polymers through planar graphs. We describe the major results that can be obtained by this approach, in particular the observation that viral RNA genomes have a characteristic compactness that sets them aside from similar random RNAs. We also discuss how different parameters used in the current RNA folding software influence the resulting structures and how they can be related to experimentally observable quantities. Finally, we show how the connection to branched polymers can be extended to take advantage of known results from polymer physics and can be further moulded to include additional interactions, such as excluded volume or electrostatics.
The rapid increase in multimodal data availability has sparked significant interest in cross-modal knowledge distillation (KD) techniques, where richer "teacher" modalities transfer information to weaker "student" modalities during model training to improve performance. However, despite successes across various applications, cross-modal KD does not always result in improved outcomes, primarily due to a limited theoretical understanding that could inform practice. To address this gap, we introduce the Cross-modal Complementarity Hypothesis (CCH): we propose that cross-modal KD is effective when the mutual information between teacher and student representations exceeds the mutual information between the student representation and the labels. We theoretically validate the CCH in a joint Gaussian model and further confirm it empirically across diverse multimodal datasets, including image, text, video, audio, and cancer-related omics data. Our study establishes a novel theoretical framework for understanding cross-modal KD and offers practical guidelines based on the CCH criterion to select optimal teacher modalities for improving the performance of weaker modalities.
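In symbols, with $Z_T$ and $Z_S$ the teacher and student representations and $Y$ the labels (notation chosen here for brevity), the hypothesis states that cross-modal distillation helps when $I(Z_T; Z_S) > I(Z_S; Y)$, i.e. when the teacher representation shares more information with the student representation than the student already shares with the labels.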
We carry out a comprehensive comparison between the exact modular Hamiltonian and the lattice version of the Bisognano-Wichmann (BW) one in one-dimensional critical quantum spin chains. As a warm-up, we first illustrate how the trace distance provides a more informative means of comparison between reduced density matrices than any other Schatten $n$-distance, normalized or not. In particular, as noticed in earlier works, it provides a way to bound other correlation functions in a precise manner, i.e., providing both lower and upper bounds. Additionally, we show that two close reduced density matrices, i.e. with zero trace distance for large sizes, can have very different modular Hamiltonians. This means that, in terms of describing how two states are close to each other, it is more informative to compare their reduced density matrices rather than the corresponding modular Hamiltonians. After setting this framework, we consider the ground states of the infinite and periodic XX spin chain and of the critical Ising chain. We provide robust numerical evidence that the trace distance between the lattice BW reduced density matrix and the exact one goes to zero as $\ell^{-2}$ for large interval length $\ell$. This provides strong constraints on the difference between the corresponding entanglement entropies and correlation functions. Our results indicate that discretized BW reduced density matrices reproduce exact entanglement entropies and correlation functions of local operators in the limit of large subsystem sizes. Finally, we show that the BW reduced density matrices fall short of reproducing the exact behavior of the logarithmic emptiness formation probability in the ground state of the XX spin chain.
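For reference, the distances being compared are the standard ones: the trace distance $D(\rho,\sigma) = \tfrac{1}{2}\,\mathrm{Tr}\,|\rho-\sigma|$ and, more generally, the Schatten $n$-distances $D_n(\rho,\sigma) = \big(\mathrm{Tr}\,|\rho-\sigma|^n\big)^{1/n}$, of which the trace distance is the $n=1$ member; its operational meaning, bounding the difference of expectation values of bounded operators between the two states, is what makes it the more informative choice discussed above.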
We study the statistics of branching polymers with excluded-volume interactions, by modeling them as single self-avoiding trees on a generic regular periodic lattice with coordination number $q$. Each lattice site can be occupied at most by one tree node, and the fraction of occupied sites can vary from dilute to dense conditions. By adopting the statistics of directed trees as a proxy for that of undirected trees without internal loops and by an exact mapping of the model into a field theory, we compute the entropy and the mean number of branch-nodes within a mean field approximation and in the thermodynamic limit. In particular, we find that the mean number of branch-nodes is independent of both the lattice details and the lattice occupation, depending only on the associated chemical potential. Monte Carlo simulations in $d=2,3,4$ provide evidence of the remarkable accuracy of the mean field theory, which becomes more accurate in higher dimensions.
Two-dimensional (2D) magnets host a wide range of exotic magnetic textures, whose low-energy excitations and finite-temperature properties are typically described by effective spin models based on Heisenberg-like Hamiltonians. A key challenge in this framework is the reliable determination, from ab initio calculations, of exchange parameters and their anisotropic components, crucial for stabilising long-range order. Among the different strategies proposed for this task, the energy-mapping method -- based on total-energy calculations within Density Functional Theory (DFT) -- is the most widely adopted, but it typically requires laborious, multi-step procedures. To overcome this limitation, we introduce AMaRaNTA (Automating Magnetic paRAmeters iN a Tensorial Approach), a computational package that systematically automates the energy-mapping method, specifically through its ``four-state'' formulation, to extract exchange and anisotropy parameters in 2D magnets. In its current implementation, AMaRaNTA returns the nearest-neighbour exchange tensor, complemented by scalar parameters for second- and third-nearest-neighbour exchange interactions as well as single-ion anisotropy. Together, these provide a minimal yet sufficient set of parameters to capture magnetic frustration and anisotropies, essential for stabilising several observed magnetic states in 2D materials. Applied to a representative subset of the Materials Cloud 2D Structure database, AMaRaNTA demonstrates robust, automated and reproducible screening of magnetic interactions, with clear potential for high-throughput simulations.
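A compact illustration of the "four-state" energy mapping mentioned above, for a single pair of magnetic sites: with all other spins held fixed, the exchange coupling follows from four collinear total-energy calculations. The sign convention depends on the chosen Heisenberg Hamiltonian, so the sketch below is indicative only and not AMaRaNTA's actual interface.

    def four_state_exchange(E_uu, E_ud, E_du, E_dd, S=1.0):
        """Exchange constant J between sites i and j from four DFT total energies.

        E_uu, E_dd: energies with spins on i and j parallel (both up / both down)
        E_ud, E_du: energies with spins on i and j antiparallel
        Assumes H = sum_{ij} J_ij S_i . S_j with spin length S; sign convention indicative only.
        """
        return (E_uu + E_dd - E_ud - E_du) / (4.0 * S ** 2)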
Hierarchies feature prominently in anatomical accounts of cortical organisation. An open question is which computational (algorithmic) processes are implemented by these hierarchies. One renowned hypothesis is that cortical hierarchies implement a model of the world's causal structure and serve to infer environmental states from sensory inputs. This view, which casts perception as hierarchical Bayesian inference, has become a highly influential concept in both basic and clinical neuroscience. So far, however, a direct correspondence between the predicted order of hierarchical Bayesian computations and the sequence of evoked neuronal activity has not been demonstrated. Here, we present evidence for this correspondence from neuroimaging and electrophysiological data in healthy volunteers. Trial-wise sequences of hierarchical computations were inferred from participants' behaviour during a social learning task that required multi-level inference about intentions. We found that the temporal sequence of neuronal activity matched the order of computations as predicted by the theory. These findings provide strong evidence for the operation of hierarchical Bayesian inference in human cortex. Furthermore, our approach offers a novel strategy for the combined computational-physiological phenotyping of patients with disorders of perception, such as schizophrenia or autism.
We present results from the High Energy Stereoscopic System (H.E.S.S.) follow-up observations of Gamma-ray Bursts (GRBs) between 2004 and 2019. We focus on non-detections, providing the most extensive set of very-high-energy (VHE, >100 GeV) upper limits to date. We use this catalogue to constrain the properties of the observed GRBs and compare them to those detected at VHE. Our study finds that VHE-detected GRBs are not a distinct population but are instead associated with bright X-ray afterglows and low redshifts. In addition, we model the multi-wavelength emission of a few of the observed GRBs and discuss the results in the context of their obtained microphysical parameters. The results from this work help put current VHE observations into perspective and highlight the capabilities of next-generation instruments in detecting fainter and more distant GRBs at VHE.