University of Z
Scientific discoveries are often made by finding a pattern or object that was not predicted by the known rules of science. Oftentimes, these anomalous events or objects that do not conform to the norms are an indication that the rules of science governing the data are incomplete, and something new needs to be present to explain these unexpected outliers. The challenge of finding anomalies can be confounding since it requires codifying a complete knowledge of the known scientific behaviors and then projecting these known behaviors on the data to look for deviations. When utilizing machine learning, this presents a particular challenge since we require that the model not only understands scientific data perfectly but also recognizes when the data is inconsistent and out of the scope of its trained behavior. In this paper, we present three datasets aimed at developing machine learning-based anomaly detection for disparate scientific domains covering astrophysics, genomics, and polar science. We present the different datasets along with a scheme to make machine learning challenges around the three datasets findable, accessible, interoperable, and reusable (FAIR). Furthermore, we present an approach that generalizes to future machine learning challenges, enabling the possibility of large, more compute-intensive challenges that can ultimately lead to scientific discovery.
A unified, byte-level ByT5 model (ByT5-Sanskrit) achieves state-of-the-art results across multiple Sanskrit Natural Language Processing tasks, including word segmentation and dependency parsing, matching lexicon-based methods while demonstrating robustness to noisy data. The model also generalizes effectively to other morphologically rich languages for tasks like lemmatization and dependency parsing.
Self-attention is essential to Transformer architectures, yet how information is embedded in the self-attention matrices and how different objective functions impact this process remains unclear. We present a mathematical framework to analyze self-attention matrices by deriving the structures governing their weight updates. Using this framework, we demonstrate that bidirectional training induces symmetry in the weight matrices, while autoregressive training results in directionality and column dominance. Our theoretical findings are validated across multiple Transformer models - including ModernBERT, GPT, LLaMA3, and Mistral - and input modalities like text, vision, and audio. Finally, we apply these insights by showing that symmetric initialization improves the performance of encoder-only models on language tasks. This mathematical analysis offers a novel theoretical perspective on how information is embedded through self-attention, thereby improving the interpretability of Transformer models.
Many community detection algorithms have been developed to uncover the mesoscopic properties of complex networks. However, how good an algorithm is in terms of accuracy and computing time remains an open question. Testing algorithms on real-world networks has certain restrictions that make the resulting insights potentially biased: the networks are usually small, and the underlying communities are not defined objectively. In this study, we employ the Lancichinetti-Fortunato-Radicchi benchmark graph to test eight state-of-the-art algorithms. We quantify the accuracy using complementary measures, along with the algorithms' computing time. Based on simple network properties and the aforementioned results, we provide guidelines that help to choose the most adequate community detection algorithm for a given network. Moreover, these rules allow uncovering limitations in the use of specific algorithms given macroscopic network properties. Our contribution is threefold: firstly, we provide actual techniques to determine which is the most suited algorithm in most circumstances based on observable properties of the network under consideration. Secondly, we use the mixing parameter as an easily measurable indicator for finding the ranges of reliability of the different algorithms. Finally, we study the dependency on network size, focusing on both the algorithm's predictive power and the effective computing time.
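The mixing parameter mentioned in this abstract is, in the Lancichinetti-Fortunato-Radicchi benchmark, the fraction of a node's links that lead outside its own community. The following is a minimal sketch of how it might be estimated for a given partition by averaging that fraction over all nodes; the function name `mixing_parameter`, the edge-list input format, and the toy graph are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def mixing_parameter(edges, community):
    """Average, over nodes, of the fraction of edges leaving the node's community.

    edges: iterable of undirected (u, v) pairs (hypothetical input format).
    community: dict mapping each node to a community label.
    """
    degree = defaultdict(int)
    external = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] != community[v]:
            # The edge crosses a community boundary for both endpoints.
            external[u] += 1
            external[v] += 1
    fractions = [external[node] / degree[node] for node in degree]
    return sum(fractions) / len(fractions)


# Toy graph (assumed for illustration): two triangles joined by the bridge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
mu = mixing_parameter(edges, community)
```

In this toy graph only nodes 2 and 3 have an external link (one of their three edges each), so the average is small, which corresponds to well-separated communities.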
This research provides rigorous analytical derivations for entanglement scaling in matrix product state (MPS) representations of smooth functions, demonstrating universal exponential decay of entanglement with spatial scale. The study develops an improved MPS-based algorithm that constructs shallow, linear-depth quantum circuits for state preparation, successfully encoding heavy-tailed financial distributions on IBM quantum hardware for up to 25 qubits and showing classical scalability to 64 qubits with Tensor Cross Interpolation.
This paper presents results from the final Dark Energy Survey's Baryon Acoustic Oscillation and Supernova datasets, finding approximately 3.2σ evidence that dark energy is not a cosmological constant but a dynamical field with an evolving equation of state. The analysis also confirms that the Hubble tension persists even when allowing for an evolving dark energy, suggesting additional physics may be required.
Regression-based decoding of continuous movements is essential for human-machine interfaces (HMIs), such as prosthetic control. This study explores a feature-based approach to encoding Surface Electromyography (sEMG) signals, focusing on the role of variability in neural-inspired population encoding. By employing heterogeneous populations of Leaky Integrate-and-Fire (LIF) neurons with varying sizes and diverse parameter distributions, we investigate how population size and variability in encoding parameters, such as membrane time constants and thresholds, influence decoding performance. Using a simple linear readout, we demonstrate that variability improves robustness and generalizability compared to single-neuron encoders. These findings emphasize the importance of optimizing variability and population size for efficient and scalable regression tasks in spiking neural networks (SNNs), paving the way for robust, low-power HMI implementations.
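To make the idea of a heterogeneous LIF population concrete, here is a rough sketch in which each neuron has its own membrane time constant and threshold and the encoded feature is the spike count per neuron. The forward-Euler update, the reset-to-zero rule, the parameter ranges, and the rectified sine used as a stand-in for an sEMG envelope are all assumptions for illustration; the paper's actual encoder and readout may differ.

```python
import math
import random


def lif_spike_counts(signal, taus, thresholds, dt=1e-3):
    """Encode one signal with a population of LIF neurons; return spike counts.

    Each neuron i has its own time constant taus[i] (seconds) and
    firing threshold thresholds[i] (hypothetical parameterization).
    """
    counts = []
    for tau, theta in zip(taus, thresholds):
        v, n_spikes = 0.0, 0
        for x in signal:
            # Forward-Euler step of dv/dt = (-v + x) / tau.
            v += dt * (-v + x) / tau
            if v >= theta:
                n_spikes += 1
                v = 0.0  # reset the membrane potential after a spike
        counts.append(n_spikes)
    return counts


# Heterogeneous population: parameters drawn from assumed uniform ranges.
rng = random.Random(0)
n_neurons = 8
taus = [rng.uniform(0.005, 0.05) for _ in range(n_neurons)]
thresholds = [rng.uniform(0.2, 0.8) for _ in range(n_neurons)]

# One second of a rectified 2 Hz sine, sampled at 1 kHz, as a toy input.
signal = [abs(math.sin(2 * math.pi * 2 * k * 1e-3)) for k in range(1000)]
counts = lif_spike_counts(signal, taus, thresholds)
```

The resulting vector of spike counts could then be fed to a linear readout; neurons with short time constants and low thresholds respond to brief, weak activity, while slower, higher-threshold neurons only fire for sustained input.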
We propose TAMER, a Test-time Adaptive MoE-driven framework for Electronic Health Record (EHR) Representation learning. TAMER introduces a framework where a Mixture-of-Experts (MoE) architecture is co-designed with Test-Time Adaptation (TTA) to jointly mitigate the intertwined challenges of patient heterogeneity and distribution shifts in EHR modeling. The MoE focuses on latent patient subgroups through domain-aware expert specialization, while TTA enables real-time adaptation to evolving health status distributions when new patient samples are introduced. Extensive experiments across four real-world EHR datasets demonstrate that TAMER consistently improves predictive performance for both mortality and readmission risk tasks when combined with diverse EHR modeling backbones. TAMER offers a promising approach for dynamic and personalized EHR-based predictions in practical clinical settings.
Gaze redirection is the task of changing the gaze to a desired direction for a given monocular eye patch image. Many applications such as videoconferencing, films, games, and generation of training data for gaze estimation require redirecting the gaze, without distorting the appearance of the area surrounding the eye and while producing photo-realistic images. Existing methods lack the ability to generate perceptually plausible images. In this work, we present a novel method to alleviate this problem by leveraging generative adversarial training to synthesize an eye image conditioned on a target gaze direction. Our method ensures perceptual similarity and consistency of synthesized images to the real images. Furthermore, a gaze estimation loss is used to control the gaze direction accurately. To attain high-quality images, we incorporate perceptual and cycle consistency losses into our architecture. In extensive evaluations we show that the proposed method outperforms state-of-the-art approaches in terms of both image quality and redirection precision. Finally, we show that generated images can bring significant improvement for the gaze estimation task if used to augment real training data.
The interplay of star formation and supernova (SN) feedback in galaxy formation is a key element for understanding galaxy evolution. Since these processes occur at small scales, it is necessary to have sub-grid models that recover their evolution and environmental effects at the scales reached by cosmological simulations. We simulate the same spiral galaxy inhabiting a Milky Way (MW) size halo in a cosmological environment changing the sub-grid models for SN feedback and star formation. We test combinations of the Schmidt law and a multi-freefall based star formation with delayed cooling feedback or mechanical feedback. We reach a resolution of 35 pc in a zoom-in box of 36 Mpc. For this, we use the code RAMSES with the implementation of gas turbulence in time and trace the local hydrodynamical features of the star-forming gas. Finally, we compare the galaxies at redshift 0 with global and interstellar medium observations in the MW and local spiral galaxies. The simulations show successful comparisons with observations. Nevertheless, diverse galactic morphologies are obtained from different numerical implementations. We highlight the importance of detailed modelling of the star formation and feedback processes, especially when increasing the resolution of simulations. Future improvements could alleviate the degeneracies exhibited in our simulated galaxies under different sub-grid models.
Astrophysical observations reveal a large diversity of radii and masses of exoplanets. It is important to characterize the interiors of exoplanets to understand planetary diversity and further determine how unique, or not, Earth is. Assessing interior structure is challenging because there are few data and large uncertainties. Thus, for a given exoplanet a range of interior structure models can satisfy available data. Typically, interior models aim to constrain the radial structure and composition of the core and mantle, and additionally ice, ocean, and gas layer if appropriate. Constraining the parameters of these layers may also inform us about interior dynamics. However, it remains challenging to constrain interior dynamics using interior structure models because structure models are relatively insensitive to the thermal state of a planet. Nevertheless, elucidating interior dynamics remains a key goal in exoplanetology due to its role in determining surface conditions and hence habitability. Thus far, Earth-like habitability can be excluded for super-Earths that are in close proximity to their stars and therefore have high surface temperatures that promote local magma oceans.
Atomic clock technology is advancing rapidly, now reaching stabilities of $\Delta f/f \sim 10^{-18}$, which corresponds to resolving 1 cm in equivalent geoid height over an integration timescale of about 7 hours. At this level of performance, ground-based atomic clock networks emerge as a tool for monitoring a variety of geophysical processes by directly measuring changes in the gravitational potential. Vertical changes of the clock's position due to magmatic, volcanic, post-seismic or tidal deformations can result in measurable variations in the clock tick rate. As an example, we discuss the geopotential change arising due to an inflating point source (Mogi model), and apply it to the Etna volcano. Its effect on an observer on the Earth's surface can be divided into two different terms: one purely due to uplift and one due to the redistribution of matter. Thus, with the centimetre-level precision of current clocks it is already possible to monitor volcanoes. The matter redistribution term is estimated to be 2-3 orders of magnitude smaller than the uplift term, and should be resolvable when clocks improve their stability to the sub-millimetre level. Additionally, clocks can be compared over distances of thousands of kilometres on a short-term basis (e.g. hourly). These clock networks will improve our ability to monitor long-wavelength periodic effects such as the solid Earth tide.
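The link between clock stability and height resolution follows from the gravitational redshift. Near the Earth's surface, in the weak-field approximation, the fractional frequency shift is Δf/f ≈ gΔh/c², so a given stability maps to a height difference Δh ≈ (Δf/f)·c²/g. The short sketch below evaluates this; the use of standard gravity for g and the function name `height_resolution` are assumptions for illustration and are not stated in the abstract.

```python
G_SURFACE = 9.80665       # m/s^2, standard gravity (assumed value for g)
C_LIGHT = 299_792_458.0   # m/s, speed of light in vacuum


def height_resolution(fractional_stability, g=G_SURFACE):
    """Height difference (metres) whose redshift equals the given Δf/f.

    Uses the near-surface approximation Δf/f ≈ g·Δh / c².
    """
    return fractional_stability * C_LIGHT**2 / g


dh = height_resolution(1e-18)
print(f"A stability of 1e-18 resolves about {dh * 100:.2f} cm")
```

Under these assumptions a stability of 10^{-18} corresponds to roughly one centimetre of height, and because the relation is linear, sub-millimetre resolution would need a stability about ten times better.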
In artificial intelligence, we often specify tasks through a reward function. While this works well in some settings, many tasks are hard to specify this way. In deep reinforcement learning, for example, directly specifying a reward as a function of a high-dimensional observation is challenging. Instead, we present an interface for specifying tasks interactively using demonstrations. Our approach defines a set of increasingly complex policies. The interface allows the user to switch between these policies at fixed intervals to generate demonstrations of novel, more complex, tasks. We train new policies based on these demonstrations and repeat the process. We present a case study of our approach in the Lunar Lander domain, and show that this simple approach can quickly learn a successful landing policy and outperforms an existing comparison-based deep RL method.
Academic fields exhibit substantial levels of gender segregation. To date, most attempts to explain this persistent global phenomenon have relied on limited cross-sections of data from specific countries, fields, or career stages. Here we used a global longitudinal dataset assembled from profiles on ORCID.org to investigate which characteristics of a field predict gender differences among the academics who leave and join that field. Only two field characteristics consistently predicted such differences: (1) the extent to which a field values raw intellectual talent ("brilliance") and (2) whether a field is in Science, Technology, Engineering, and Mathematics (STEM). Women more than men moved away from brilliance-oriented and STEM fields, and men more than women moved toward these fields. Our findings suggest that stereotypes associating brilliance and other STEM-relevant traits with men more than women play a key role in maintaining gender segregation across academia.
The ability to sequentially learn multiple tasks without forgetting is a key skill of biological brains, whereas it represents a major challenge to the field of deep learning. To avoid catastrophic forgetting, various continual learning (CL) approaches have been devised. However, these usually require discrete task boundaries. This requirement seems biologically implausible and often limits the application of CL methods in the real world where tasks are not always well defined. Here, we take inspiration from neuroscience, where sparse, non-overlapping neuronal representations have been suggested to prevent catastrophic forgetting. As in the brain, we argue that these sparse representations should be chosen on the basis of feed forward (stimulus-specific) as well as top-down (context-specific) information. To implement such selective sparsity, we use a bio-plausible form of hierarchical credit assignment known as Deep Feedback Control (DFC) and combine it with a winner-take-all sparsity mechanism. In addition to sparsity, we introduce lateral recurrent connections within each layer to further protect previously learned representations. We evaluate the new sparse-recurrent version of DFC on the split-MNIST computer vision benchmark and show that only the combination of sparsity and intra-layer recurrent connections improves CL performance with respect to standard backpropagation. Our method achieves similar performance to well-known CL methods, such as Elastic Weight Consolidation and Synaptic Intelligence, without requiring information about task boundaries. Overall, we showcase the idea of adopting computational principles from the brain to derive new, task-free learning algorithms for CL.
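A winner-take-all sparsity mechanism can be illustrated with a generic k-winner-take-all rule: only the k most active units in a layer keep their activation and all others are silenced. The sketch below is a plain-Python illustration of that generic rule, not the paper's implementation; how the mechanism is combined with Deep Feedback Control and with the lateral recurrent connections is not specified in the abstract, and the name `k_winner_take_all` is hypothetical.

```python
def k_winner_take_all(activations, k):
    """Keep the k largest activations and set all others to zero.

    Ties are broken in favour of the unit with the lower index.
    """
    if k <= 0:
        return [0.0] * len(activations)
    # Indices sorted by activation, largest first (sorted() is stable).
    ranked = sorted(range(len(activations)),
                    key=lambda i: activations[i], reverse=True)
    winners = set(ranked[:k])
    return [a if i in winners else 0.0 for i, a in enumerate(activations)]


# Toy layer of five units; only the two most active ones survive.
sparse = k_winner_take_all([0.2, 1.5, -0.3, 0.9, 0.1], k=2)
```

Because different inputs and contexts activate different winners, updates touch mostly non-overlapping subsets of units, which is the intuition behind using sparsity to limit interference between tasks.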
Event cameras are advantageous for tasks that require vision sensors with low-latency and sparse output responses. However, the development of deep network algorithms using event cameras has been slow because of the lack of large labelled event camera datasets for network training. This paper reports a method for creating new labelled event datasets by using a text-to-X model, where X is one or multiple output modalities, in the case of this work, events. Our proposed text-to-events model produces synthetic event frames directly from text prompts. It uses an autoencoder which is trained to produce sparse event frames representing event camera outputs. By combining the pretrained autoencoder with a diffusion model architecture, the new text-to-events model is able to generate smooth synthetic event streams of moving objects. The autoencoder was first trained on an event camera dataset of diverse scenes. In the combined training with the diffusion model, the DVS gesture dataset was used. We demonstrate that the model can generate realistic event sequences of human gestures prompted by different text statements. The classification accuracy of the generated sequences, using a classifier trained on the real dataset, ranges from 42% to 92%, depending on the gesture group. The results demonstrate the capability of this method in synthesizing event datasets.
As AI becomes increasingly embedded in daily life, ascertaining whether an agent is human is critical. We systematically benchmark AI's ability to imitate humans in three language tasks (image captioning, word association, conversation) and three vision tasks (color estimation, object detection, attention prediction), collecting data from 636 humans and 37 AI agents. Next, we conducted 72,191 Turing-like tests with 1,916 human judges and 10 AI judges. Current AIs are approaching the ability to convincingly impersonate humans and deceive human judges in both language and vision. Even simple AI judges outperformed humans in distinguishing AI from human responses. Imitation ability showed minimal correlation with conventional AI performance metrics, suggesting that passing as human is an important independent evaluation criterion. The large-scale Turing datasets and metrics introduced here offer valuable benchmarks for assessing human-likeness in AI and highlight the importance of rigorous, quantitative imitation tests for AI development.
Federated Learning (FL) is widely recognized as a privacy-preserving machine learning paradigm due to its model-sharing mechanism that avoids direct data exchange. Nevertheless, model training leaves exploitable traces that can be used to infer sensitive information. In Decentralized FL (DFL), the topology, defining how participants are connected, plays a crucial role in shaping the model's privacy, robustness, and convergence. However, the topology introduces an unexplored vulnerability: attackers can exploit it to infer participant relationships and launch targeted attacks. This work uncovers the hidden risks of DFL topologies by proposing a novel Topology Inference Attack that infers the topology solely from model behavior. A taxonomy of topology inference attacks is introduced, categorizing them by the attacker's capabilities and knowledge. Practical attack strategies are designed for various scenarios, and experiments are conducted to identify key factors influencing attack success. The results demonstrate that analyzing only the model of each node can accurately infer the DFL topology, highlighting a critical privacy risk in DFL systems. These findings offer valuable insights for improving privacy preservation in DFL environments.
Simulations of quantum dynamics are a key application of near-term quantum computing, but are hindered by the twin challenges of noise and small device scale, which limit the executable circuit depths and the number of qubits the algorithm can be run on. Towards overcoming these obstacles, we develop and implement a distributed variant of the projected Variational Quantum Dynamics, which we dub dp-VQD, that simultaneously alleviates circuit depth and width limitations. We employ the wire cutting technique, which can be executed on existing devices without quantum or classical communication. We demonstrate the full variational training on noisy simulators, and execute and perform the reconstruction on real IBM quantum devices. The algorithm makes it possible to execute Hamiltonian evolution simulations for problem sizes exceeding devices' nominal qubit counts, and to combine multiple small devices in a distributed computation. We test our approach on the Heisenberg and Hubbard model dynamics.
Federated learning (FL) has garnered significant attention as a prominent privacy-preserving Machine Learning (ML) paradigm. Decentralized FL (DFL) eschews traditional FL's centralized server architecture, enhancing the system's robustness and scalability. However, these advantages of DFL also create new vulnerabilities for malicious participants to execute adversarial attacks, especially model poisoning attacks. In model poisoning attacks, malicious participants aim to diminish the performance of benign models by creating and disseminating a compromised model. Existing research on model poisoning attacks has predominantly concentrated on undermining global models within the Centralized FL (CFL) paradigm, while DFL has received comparatively little attention. To fill this research gap, this paper proposes an innovative model poisoning attack called DMPA. This attack calculates the differential characteristics of multiple malicious client models and obtains the most effective poisoning strategy, thereby orchestrating a collusive attack by multiple participants. The effectiveness of this attack is validated across multiple datasets, with results indicating that the DMPA approach consistently surpasses existing state-of-the-art FL model poisoning attack strategies.