We report the results of radioactivity assays and heat leak calculations for a range of common cryogenic materials, considered for use in the QUEST-DMC superfluid ³He dark matter detector. The bolometer, instrumented with nanomechanical resonators, will be sensitive to energy deposits from dark matter interactions. Events from radioactive decays and cosmic rays constitute a significant background and must be precisely modelled, using a combination of material screening and Monte Carlo simulations. However, the results presented here are of wider interest for experiments and quantum devices sensitive to minute heat leaks and spurious events, thus we present the heat leak per unit mass or surface area for every material studied. This can inform material choices for other experiments, especially if underground operation is considered, where radiogenic backgrounds will dominate even at shallow depths.
Verifying the fully kinematic nature of the cosmic microwave background (CMB) dipole is of fundamental importance in cosmology. In the standard cosmological model with the Friedmann-Lemaître-Robertson-Walker (FLRW) metric resulting from the inflationary expansion, the CMB dipole should be entirely kinematic. Any non-kinematic CMB dipole component would thus reflect the pre-inflationary structure of spacetime, probing the extent of FLRW applicability. Cosmic backgrounds from galaxies formed after matter-radiation decoupling should have a kinematic dipole component identical in velocity to the CMB kinematic dipole. Comparing the two can isolate the CMB non-kinematic dipole. It was recently proposed that such a measurement can be done using the near-IR cosmic infrared background (CIB) measured with the currently operating Euclid telescope, and later with Roman. The proposed method reconstructs the resolved CIB, the Integrated Galaxy Light (IGL), from Euclid's Wide Survey and probes its dipole, with a kinematic component amplified over that of the CMB by the Compton-Getting effect. The amplification, coupled with the extensive galaxy samples forming the IGL, would determine the CIB dipole with an overwhelming signal-to-noise ratio, isolating its direction to sub-degree accuracy. We develop details of the method for Euclid's Wide Survey in 4 bands spanning 0.6 to 2 μm. We isolate the systematic and other uncertainties and present methodologies to minimize them, after confining the sample to the magnitude range with negligible IGL/CIB dipole from galaxy clustering. These include the required star-galaxy separation; accounting for the extinction-correction dipole using a method newly developed here, which achieves complete separation; and accounting for the Earth's orbital motion and other systematic effects. (Abridged)
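For context on the amplification mentioned above: the Compton-Getting boost of a kinematic dipole for a background with a power-law spectrum follows from the Lorentz invariance of I_ν/ν³. The short derivation below is the standard textbook result, stated here as an illustration rather than quoted from the paper; α denotes the assumed spectral index d ln I_ν / d ln ν.

```latex
% Observer moving with velocity \beta = v/c through a background with I_\nu \propto \nu^{\alpha}.
% Doppler shift: \nu' = \nu\,(1 + \beta\cos\theta); Lorentz invariant: I_\nu/\nu^3.
\[
  I'_{\nu'}(\theta)
    = \left(\tfrac{\nu'}{\nu}\right)^{3} I_{\nu}
    = (1 + \beta\cos\theta)^{3-\alpha}\, I_{\nu'}
    \simeq \bigl[\,1 + (3-\alpha)\,\beta\cos\theta\,\bigr]\, I_{\nu'},
\]
% so the intensity dipole amplitude is (3-\alpha)\,\beta, i.e. amplified by the factor
% (3-\alpha) relative to the CMB temperature dipole amplitude \beta.
```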
Research by Anthropic and collaborators reveals that large language models commonly exhibit sycophantic behavior across various production models and tasks. This tendency is driven by human preference data, where "matching a user's beliefs" is a highly preferred feature, leading preference models to incentivize sycophancy, sometimes even over factual truth.
Large language models trained on factual statements in one direction (e.g., "A is B") consistently fail to generalize and recall the inverse relationship (e.g., "B is A"), a phenomenon termed the "Reversal Curse." Experiments with various models show near-zero accuracy on reverse queries after forward-only training, and GPT-4 exhibited a 46 percentage point drop in real-world knowledge tasks when queried in the reverse order.
Adapting billion-parameter language models to a downstream task is still costly, even with parameter-efficient fine-tuning (PEFT). We re-cast task adaptation as output-distribution alignment: the objective is to steer the output distribution toward the task distribution directly during decoding rather than indirectly through weight updates. Building on this view, we introduce Steering Vector Decoding (SVDecode), a lightweight, PEFT-compatible, and theoretically grounded method. We start with a short warm-start fine-tune and extract a task-aware steering vector from the Kullback-Leibler (KL) divergence gradient between the output distributions of the warm-started and pre-trained models. This steering vector is then applied during decoding to shift the model's output distribution towards the task distribution. We theoretically prove that SVDecode is first-order equivalent to the gradient step of full fine-tuning and derive a globally optimal solution for the strength of the steering vector. Across three tasks and nine benchmarks, SVDecode paired with four standard PEFT methods improves multiple-choice accuracy by up to 5 percentage points and open-ended truthfulness by 2 percentage points, with similar gains (1-2 percentage points) on commonsense datasets, without adding trainable parameters beyond the PEFT adapter. SVDecode thus offers a lightweight, theoretically grounded path to stronger task adaptation for large language models.
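A minimal sketch of the decode-time steering idea: the next-token logits are shifted along a task-aware direction before sampling. The names (`steering_vector`, `strength`) and the simple KL-gradient direction used here are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def kl_gradient_direction(p_warm: np.ndarray, p_pre: np.ndarray) -> np.ndarray:
    """One simple task-aware direction: the negative gradient of
    KL(p_warm || p_pre) with respect to the pre-trained logits is
    p_warm - p_pre, pointing from the pre-trained toward the
    warm-started (task) distribution."""
    return p_warm - p_pre

def steered_decode_step(logits: np.ndarray, steering_vector: np.ndarray,
                        strength: float) -> np.ndarray:
    """Shift the next-token logits along the steering direction, then softmax."""
    z = logits + strength * steering_vector
    z = z - z.max()              # numerical stability
    p = np.exp(z)
    return p / p.sum()

# Toy usage over a 5-token vocabulary
p_pre = np.array([0.4, 0.3, 0.2, 0.05, 0.05])
p_warm = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
probs = steered_decode_step(np.log(p_pre), kl_gradient_direction(p_warm, p_pre), strength=2.0)
```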
This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three different categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose 200+ concrete research questions.
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizing its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and related methods; (2) overview techniques to understand, improve, and complement RLHF in practice; and (3) propose auditing and disclosure standards to improve societal oversight of RLHF systems. Our work emphasizes the limitations of RLHF and highlights the importance of a multi-faceted approach to the development of safer AI systems.
The biological implausibility of backpropagation (BP) has motivated many alternative, brain-inspired algorithms that attempt to rely only on local information, such as predictive coding (PC) and equilibrium propagation. However, these algorithms have notoriously struggled to train very deep networks, preventing them from competing with BP in large-scale settings. Indeed, scaling PC networks (PCNs) has recently been posed as a challenge for the community (Pinchetti et al., 2024). Here, we show that 100+ layer PCNs can be trained reliably using a Depth-μP parameterisation (Yang et al., 2023; Bordelon et al., 2023) which we call "μPC". By analysing the scaling behaviour of PCNs, we reveal several pathologies that make standard PCNs difficult to train at large depths. We then show that, despite addressing only some of these instabilities, μPC allows stable training of very deep (up to 128-layer) residual networks on simple classification tasks with competitive performance and little tuning compared to current benchmarks. Moreover, μPC enables zero-shot transfer of both weight and activity learning rates across widths and depths. Our results serve as a first step towards scaling PC to more complex architectures and have implications for other local algorithms. Code for μPC is made available as part of a JAX library for PCNs.
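One ingredient of Depth-μP-style parameterisations (from the cited Yang et al. and Bordelon et al. works, not spelled out in this abstract) is rescaling each residual branch by 1/√depth so that activations and gradients stay well-behaved as layers are added. A minimal JAX sketch of that scaling, applied to a plain residual stack rather than a full PCN, is shown below; the widths, nonlinearity, and initialisation are illustrative.

```python
import jax
import jax.numpy as jnp

def init_residual_stack(key, width, depth):
    """Fan-in-scaled hidden weights for a toy residual stack (illustrative)."""
    keys = jax.random.split(key, depth)
    return [jax.random.normal(k, (width, width)) / jnp.sqrt(width) for k in keys]

def forward(params, x):
    depth = len(params)
    branch_scale = 1.0 / jnp.sqrt(depth)        # Depth-muP-style residual multiplier
    h = x
    for W in params:
        h = h + branch_scale * jnp.tanh(h @ W)  # residual block
    return h

key = jax.random.PRNGKey(0)
params = init_residual_stack(key, width=128, depth=128)
out = forward(params, jnp.ones(128))
```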
The advancement of natural language processing (NLP) has been significantly boosted by the development of transformer-based large language models (LLMs). These models have revolutionized NLP tasks, particularly in code generation, aiding developers in creating software with enhanced efficiency. Despite their advancements, challenges in balancing code snippet generation with effective test case generation and execution persist. To address these issues, this paper introduces Multi-Agent Assistant Code Generation (AgentCoder), a novel solution comprising a multi-agent framework with specialized agents: the programmer agent, the test designer agent, and the test executor agent. During the coding procedure, the programmer agent focuses on code generation and refinement based on the test executor agent's feedback. The test designer agent generates test cases for the generated code, and the test executor agent runs the code with the test cases and returns the feedback to the programmer agent. This collaborative system ensures robust code generation, surpassing the limitations of single-agent models and traditional methodologies. Our extensive experiments on 9 code generation models and 12 enhancement approaches showcase AgentCoder's superior performance over existing code generation models and prompt engineering techniques across various benchmarks. For example, AgentCoder (GPT-4) achieves 96.3% and 91.8% pass@1 on the HumanEval and MBPP datasets with an overall token overhead of 56.9K and 66.3K, while the state of the art obtains only 90.2% and 78.9% pass@1 with an overall token overhead of 138.2K and 206.5K.
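A hypothetical sketch of the programmer/test-designer/test-executor loop described above; the callables stand in for the three agents and the data class is an assumption for illustration, not AgentCoder's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class TestReport:
    all_passed: bool
    failures: List[str] = field(default_factory=list)

def agentcoder_loop(task: str,
                    programmer: Callable[[str, List[str]], str],
                    test_designer: Callable[[str], List[str]],
                    test_executor: Callable[[str, List[str]], TestReport],
                    max_rounds: int = 5) -> str:
    tests = test_designer(task)                   # test designer agent writes test cases
    code = programmer(task, [])                   # programmer agent drafts a solution
    for _ in range(max_rounds):
        report = test_executor(code, tests)       # test executor agent runs the code locally
        if report.all_passed:
            break
        code = programmer(task, report.failures)  # refine using the executor's feedback
    return code
```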
This paper presents a comprehensive review of predictive coding (PC) theory, detailing its mathematical foundations, experimental evidence, and biological plausibility. It also thoroughly examines PC's relationships with various machine learning techniques, positioning it as a potentially unifying account of cortical function.
The paper by Berglund et al. systematically investigates "situational awareness" in large language models by focusing on their "out-of-context reasoning" capabilities. Their framework demonstrates that this ability scales with model size and can be enhanced by data augmentation, further showing that models can exploit hidden reward functions using this type of reasoning.
We present a multi-agent system for automation of scientific research tasks, cmbagent (this https URL). The system is formed by about 30 Large Language Model (LLM) agents and implements a Planning & Control strategy to orchestrate the agentic workflow, with no human-in-the-loop at any point. Each agent specializes in a different task (performing retrieval on scientific papers and codebases, writing code, interpreting results, critiquing the output of other agents) and the system is able to execute code locally. We successfully apply cmbagent to carry out a PhD level cosmology task (the measurement of cosmological parameters using supernova data) and evaluate its performance on two benchmark sets, finding superior performance over state-of-the-art LLMs. The source code is available on GitHub, demonstration videos are also available, and the system is deployed on HuggingFace and will be available on the cloud.
An analysis finds that Multi-Agent Systems of Large Language Models (MAS LLMs) frequently overlook foundational Multi-Agent Systems (MAS) principles, leading to deficiencies in native social behavior, structured environments, effective coordination, and quantifiable emergence. It advocates integrating established MAS theory to guide more robust and rigorous multi-agent AI development.
Language models (LMs) are pretrained to imitate internet text, including content that would violate human preferences if generated by an LM: falsehoods, offensive comments, personally identifiable information, low-quality or buggy code, and more. Here, we explore alternative objectives for pretraining LMs in a way that also guides them to generate text aligned with human preferences. We benchmark five objectives for pretraining with human feedback across three tasks and study how they affect the trade-off between alignment and capabilities of pretrained LMs. We find a Pareto-optimal and simple approach among those we explored: conditional training, i.e., learning a distribution over tokens conditional on their human preference scores as given by a reward model. Conditional training reduces the rate of undesirable content by up to an order of magnitude, both when generating without a prompt and with an adversarially-chosen prompt. Moreover, conditional training maintains the downstream task performance of standard LM pretraining, both before and after task-specific finetuning. Pretraining with human feedback results in much better preference satisfaction than standard LM pretraining followed by finetuning with feedback, i.e., learning and then unlearning undesirable behavior. Our results suggest that we should move beyond imitation learning when pretraining LMs and incorporate human preferences from the start of training.
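A minimal sketch of what token-level conditioning can look like in practice: each training segment is prefixed with a control token derived from its reward-model score, so the LM learns a distribution over text conditional on preference. The token names, threshold, and prompt are assumptions for illustration, not taken from the paper.

```python
# Control tokens and threshold are illustrative assumptions.
GOOD, BAD = "<|good|>", "<|bad|>"

def to_conditional_example(text: str, reward_score: float, threshold: float = 0.0) -> str:
    """Prefix a training segment with a control token based on its reward score."""
    control = GOOD if reward_score >= threshold else BAD
    return control + text

# Training data: score each segment with the reward model, then prepend the token.
segments = [("Thanks, happy to help!", 1.3), ("Ugh, figure it out yourself.", -2.1)]
examples = [to_conditional_example(t, s) for t, s in segments]

# At generation time, condition on the desired control token.
prompt = GOOD + "Write a short, polite reply to the customer."
```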
Pretrained language models often generate outputs that are not in line with human preferences, such as harmful text or factually incorrect summaries. Recent work approaches the above issues by learning from a simple form of human feedback: comparisons between pairs of model-generated outputs. However, comparison feedback only conveys limited information about human preferences. In this paper, we introduce Imitation learning from Language Feedback (ILF), a new approach that utilizes more informative language feedback. ILF consists of three steps that are applied iteratively: first, conditioning the language model on the input, an initial LM output, and feedback to generate refinements; second, selecting the refinement that incorporates the most feedback; and third, finetuning the language model to maximize the likelihood of the chosen refinement given the input. We show theoretically that ILF can be viewed as Bayesian inference, similar to reinforcement learning from human feedback. We evaluate ILF's effectiveness on a carefully controlled toy task and a realistic summarization task. Our experiments demonstrate that large language models accurately incorporate feedback and that finetuning with ILF scales well with the dataset size, even outperforming finetuning on human summaries. Learning from both language and comparison feedback outperforms learning from each alone, achieving human-level summarization performance.
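A hypothetical sketch of one ILF iteration over the three steps listed above; the callables stand in for the language model, the feedback source, and the finetuning routine and are not the paper's implementation.

```python
from typing import Callable, List, Tuple

def ilf_iteration(inputs: List[str],
                  generate: Callable[[str], str],                      # current LM: x -> initial output
                  get_feedback: Callable[[str, str], str],             # language feedback on (x, y0)
                  refine: Callable[[str, str, str], List[str]],        # LM conditioned on (x, y0, feedback)
                  select_best: Callable[[str, List[str], str], str],   # refinement incorporating most feedback
                  finetune: Callable[[List[Tuple[str, str]]], None]) -> None:
    dataset = []
    for x in inputs:
        y0 = generate(x)                          # initial LM output
        fb = get_feedback(x, y0)                  # informative language feedback
        candidates = refine(x, y0, fb)            # step 1: generate candidate refinements
        y_star = select_best(x, candidates, fb)   # step 2: pick the best refinement
        dataset.append((x, y_star))
    finetune(dataset)                             # step 3: maximize likelihood of chosen refinements
```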
We propose a gravitational wave detector based on ultrastable optical cavities, enabling the detection of gravitational wave signals in the mostly unexplored 10⁻⁵–1 Hz frequency band. We illustrate the working principle of the detector and argue that several classes of gravitational wave sources, both of astrophysical and cosmological origin, may be within the detection range of this instrument. Our work suggests that terrestrial gravitational wave detection in the milli-Hz frequency range is potentially within reach with current technology.
R-AIF, a robust active inference framework, addresses complex robotic control from pixels by introducing a Contrastive Recurrent State Prior Preference model, achieving higher success rates and faster convergence than state-of-the-art model-based RL and AIF baselines on sparse-reward tasks including Mountain Car, Meta-World, and Robosuite.
We introduce JPC, a JAX library for training neural networks with Predictive Coding. JPC provides a simple, fast and flexible interface to train a variety of PC networks (PCNs), including discriminative, generative and hybrid models. Unlike existing libraries, JPC leverages ordinary differential equation solvers to integrate the gradient-flow inference dynamics of PCNs. We find that a second-order solver achieves significantly faster runtimes compared to standard Euler integration, with comparable performance on a range of tasks and network depths. JPC also provides some theoretical tools that can be used to study PCNs. We hope that JPC will facilitate future research on PC. The code is available at this https URL.
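A generic JAX sketch of the gradient-flow inference dynamics that such a library integrates, using plain Euler steps for simplicity; the energy, architecture, and function names below are illustrative and do not reproduce JPC's API.

```python
import jax
import jax.numpy as jnp

def pc_energy(activities, weights, x_in, y_target):
    """Sum of squared prediction errors across layers of a toy PCN."""
    zs = [x_in] + list(activities) + [y_target]
    energy = 0.0
    for l, W in enumerate(weights):
        pred = jnp.tanh(zs[l]) @ W                 # layer l predicts layer l+1
        energy += 0.5 * jnp.sum((zs[l + 1] - pred) ** 2)
    return energy

def infer(activities, weights, x_in, y_target, dt=0.1, steps=50):
    """Euler integration of the gradient flow dz/dt = -dE/dz on the activities."""
    grad_fn = jax.grad(pc_energy)                  # gradient w.r.t. activities (first argument)
    for _ in range(steps):
        grads = grad_fn(activities, weights, x_in, y_target)
        activities = [z - dt * g for z, g in zip(activities, grads)]
    return activities

# Toy usage: a 4-8-8-2 network with two hidden activity layers
key = jax.random.PRNGKey(0)
dims = [4, 8, 8, 2]
weights = [0.1 * jax.random.normal(k, (a, b))
           for k, (a, b) in zip(jax.random.split(key, 3), zip(dims[:-1], dims[1:]))]
activities = [jnp.zeros(d) for d in dims[1:-1]]
activities = infer(activities, weights, jnp.ones(4), jnp.array([1.0, 0.0]))
```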
Understanding the dynamic organization and homeostasis of living tissues requires high-resolution, time-resolved imaging coupled with methods capable of extracting interpretable, predictive insights from complex datasets. Here, we present the Vision Transformer Digital Twin Surrogate Network (VT-DTSN), a deep learning framework for predictive modeling of 3D+T imaging data from biological tissue. By leveraging Vision Transformers pretrained with DINO (Self-Distillation with NO Labels) and employing a multi-view fusion strategy, VT-DTSN learns to reconstruct high-fidelity, time-resolved dynamics of a Drosophila midgut while preserving morphological and feature-level integrity across imaging depths. The model is trained with a composite loss prioritizing pixel-level accuracy, perceptual structure, and feature-space alignment, ensuring biologically meaningful outputs suitable for in silico experimentation and hypothesis testing. Evaluation across layers and biological replicates demonstrates VT-DTSN's robustness and consistency, achieving low error rates and high structural similarity while maintaining efficient inference through model optimization. This work establishes VT-DTSN as a feasible, high-fidelity surrogate for cross-timepoint reconstruction and for studying tissue dynamics, enabling computational exploration of cellular behaviors and homeostasis to complement time-resolved imaging studies in biological research.
The objective of this paper is to review physiological and computational aspects of the responsiveness of the cerebral cortex to stimulation, and how responsiveness depends on the state of the system. This correspondence between brain state and brain responsiveness (state-dependent responses) is outlined at different scales from the cellular and circuit level, to the mesoscale and macroscale level. At each scale, we review how quantitative methods can be used to characterize network states based on brain responses, such as the Perturbational Complexity Index (PCI). This description will compare data and models, systematically and at multiple scales, with a focus on the mechanisms that explain how brain responses depend on brain states.