University of Sheffield
KBASS introduces a robust framework for discovering governing equations from data, combining kernel learning with Bayesian spike-and-slab priors and efficient tensor algebra. This approach consistently recovers ground-truth equations from sparse and noisy data, outperforming state-of-the-art methods like SINDy, PINN-SR, and BSL while providing principled uncertainty quantification and improved computational efficiency.
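As context for the equation-discovery setting, a minimal sparse-regression sketch follows. It is closer in spirit to the SINDy baseline mentioned above than to KBASS itself (which replaces the plain least-squares fit with kernel learning and Bayesian spike-and-slab priors), and the ODE, candidate library, and threshold are illustrative assumptions.

```python
# Minimal sketch of library-based equation discovery (in the spirit of the
# SINDy baseline; KBASS itself uses kernels + Bayesian spike-and-slab priors).
# The ODE, library, and threshold below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Simulate noisy observations of dx/dt = -2x + 0.5x^3 on a time grid.
t = np.linspace(0, 2, 400)
dt = t[1] - t[0]
x = np.empty_like(t)
x[0] = 1.0
for i in range(len(t) - 1):                      # simple Euler integration
    x[i + 1] = x[i] + dt * (-2.0 * x[i] + 0.5 * x[i] ** 3)
x_noisy = x + 1e-3 * rng.standard_normal(x.shape)

# Numerical derivative and a library of candidate terms [1, x, x^2, x^3].
dxdt = np.gradient(x_noisy, dt)
library = np.column_stack([np.ones_like(x_noisy), x_noisy, x_noisy**2, x_noisy**3])
names = ["1", "x", "x^2", "x^3"]

# Sequential thresholded least squares: prune small coefficients, refit.
w = np.linalg.lstsq(library, dxdt, rcond=None)[0]
for _ in range(10):
    small = np.abs(w) < 0.1                      # sparsity threshold (assumed)
    w[small] = 0.0
    active = ~small
    w[active] = np.linalg.lstsq(library[:, active], dxdt, rcond=None)[0]

print({n: round(c, 3) for n, c in zip(names, w) if c != 0.0})
# Expected to recover roughly {'x': -2.0, 'x^3': 0.5}
```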
This survey paper defines and systematically reviews the emerging paradigm of self-evolving AI agents, which bridge static foundation models with dynamic lifelong adaptability. It introduces a unified conceptual framework and a comprehensive taxonomy of evolution techniques, mapping the progression towards continuous self-improvement in AI systems.
California Institute of Technology · University of Oslo · University of Cambridge · University of Victoria · Chinese Academy of Sciences · University of Zurich · Tel Aviv University · University of Oxford · University of Science and Technology of China · Scuola Normale Superiore · University of Copenhagen · University of Edinburgh · The University of Texas at Austin · INFN · ETH Zürich · Yonsei University · University of Crete · Kavli Institute for the Physics and Mathematics of the Universe · Universität Heidelberg · University of Maryland · Universidad Autónoma de Madrid · Université Paris-Saclay · Stockholm University · University of Helsinki · University of Arizona · University of Western Australia · University of Sheffield · Princeton University · University of Geneva · University of Portsmouth · University of Iceland · Università di Genova · Universidade do Porto · University of Sussex · INAF · Aix Marseille University · Niels Bohr Institute · University of Jyväskylä · University of Padova · Jet Propulsion Laboratory · Jagiellonian University · Instituto de Astrofísica de Canarias · University of the Witwatersrand · University of Nottingham · European Space Agency · University of Cape Town · SISSA · Nicolaus Copernicus Astronomical Center · Observatoire de la Côte d’Azur · University of Hawai’i · University of KwaZulu-Natal · Ludwig-Maximilians-Universität · Laboratoire d’Astrophysique de Marseille · INAF-Istituto di Radioastronomia · INAF – Osservatorio Astronomico di Roma · Institut de Física d’Altes Energies (IFAE) · Laboratoire de Physique des 2 Infinis Irène Joliot-Curie · Osservatorio Astronomico della Regione Autonoma Valle d’Aosta · INAF - Osservatorio Astrofisico di Catania · INAF - Osservatorio Astronomico di Arcetri · Institut d’Astrophysique Spatiale · NASA · DTU Space · The Queen’s University of Belfast · Instituto de Astrofísica e Ciências do Espaço, Universidade de Lisboa · IRAP, Université de Toulouse, CNRS, CNES · ETH, Institute for Astronomy · INAF-IASF, Bologna · Cosmic Dawn Center (DAWN) · Università degli Studi di Ferrara · Université de Paris · Université Claude Bernard Lyon 1 · Excellence Cluster ‘Origins’ · Université de Lyon · Università di Pisa · IFCA-CSIC-UC · INAF Osservatorio Astronomico di Padova · Università degli Studi di Firenze · Université de Montpellier · Università degli Studi di Napoli Federico II · Università di Roma Tor Vergata · INAF Osservatorio di Astrofisica e Scienza dello Spazio di Bologna · Università di Bologna · INAF – Osservatorio Astronomico di Trieste · Università degli Studi di Trieste
Verifying the fully kinematic nature of the cosmic microwave background (CMB) dipole is of fundamental importance in cosmology. In the standard cosmological model, with the Friedmann-Lemaître-Robertson-Walker (FLRW) metric arising from the inflationary expansion, the CMB dipole should be entirely kinematic. Any non-kinematic CMB dipole component would thus reflect the preinflationary structure of spacetime, probing the extent of FLRW applicability. Cosmic backgrounds from galaxies formed after matter-radiation decoupling should have a kinematic dipole component identical in velocity to the CMB kinematic dipole. Comparing the two can therefore isolate the CMB non-kinematic dipole. It was recently proposed that such a measurement can be done using the near-IR cosmic infrared background (CIB) measured with the currently operating Euclid telescope, and later with Roman. The proposed method reconstructs the resolved CIB, the Integrated Galaxy Light (IGL), from Euclid's Wide Survey and probes its dipole, whose kinematic component is amplified over that of the CMB by the Compton-Getting effect. The amplification, coupled with the extensive galaxy samples forming the IGL, would determine the CIB dipole with an overwhelming signal-to-noise ratio, isolating its direction to sub-degree accuracy. We develop details of the method for Euclid's Wide Survey in 4 bands spanning 0.6 to 2 microns. We isolate the systematic and other uncertainties and present methodologies to minimize them, after confining the sample to the magnitude range with negligible IGL/CIB dipole from galaxy clustering. These include the required star-galaxy separation, accounting for the extinction-correction dipole using a method newly developed here that achieves total separation, and accounting for the Earth's orbital motion and other systematic effects. (Abridged)
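As context for the amplification mentioned above, the textbook Compton-Getting relation for a background with a power-law spectrum is sketched below; the relation itself is standard, but the spectral index appropriate to the near-IR IGL is not quoted from the paper and would have to come from the measurement itself.

```latex
% Kinematic (Compton-Getting) dipole of a background with power-law spectrum
% I_nu \propto nu^alpha: an observer moving with velocity v sees a dipole
% whose amplitude is amplified by (3 - alpha) relative to v/c, whereas the
% CMB temperature dipole amplitude is simply (v/c) T_0.
\[
  \frac{\delta I_{\nu}}{I_{\nu}} \;=\; (3 - \alpha)\,\frac{v}{c}\,\cos\theta ,
  \qquad I_{\nu} \propto \nu^{\alpha},
\]
% where theta is the angle between the line of sight and the velocity.
```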
ETH Zurich logoETH ZurichKAIST logoKAISTUniversity of Washington logoUniversity of WashingtonRensselaer Polytechnic InstituteGoogle DeepMind logoGoogle DeepMindUniversity of Amsterdam logoUniversity of AmsterdamUniversity of Illinois at Urbana-Champaign logoUniversity of Illinois at Urbana-ChampaignUniversity of Cambridge logoUniversity of CambridgeHeidelberg UniversityUniversity of Waterloo logoUniversity of WaterlooFacebookCarnegie Mellon University logoCarnegie Mellon UniversityUniversity of Southern California logoUniversity of Southern CaliforniaGoogle logoGoogleNew York University logoNew York UniversityUniversity of StuttgartUC Berkeley logoUC BerkeleyNational University of Singapore logoNational University of SingaporeUniversity College London logoUniversity College LondonUniversity of Oxford logoUniversity of OxfordLMU MunichShanghai Jiao Tong University logoShanghai Jiao Tong UniversityUniversity of California, Irvine logoUniversity of California, IrvineTsinghua University logoTsinghua UniversityStanford University logoStanford UniversityUniversity of Michigan logoUniversity of MichiganUniversity of Copenhagen logoUniversity of CopenhagenThe Chinese University of Hong Kong logoThe Chinese University of Hong KongUniversity of MelbourneMeta logoMetaUniversity of EdinburghOpenAI logoOpenAIThe University of Texas at Austin logoThe University of Texas at AustinCornell University logoCornell UniversityUniversity of California, San Diego logoUniversity of California, San DiegoYonsei UniversityMcGill University logoMcGill UniversityBoston University logoBoston UniversityUniversity of BambergNanyang Technological University logoNanyang Technological UniversityMicrosoft logoMicrosoftKU Leuven logoKU LeuvenColumbia University logoColumbia UniversityUC Santa BarbaraAllen Institute for AI logoAllen Institute for AIGerman Research Center for Artificial Intelligence (DFKI)University of Pennsylvania logoUniversity of PennsylvaniaJohns Hopkins University logoJohns Hopkins UniversityArizona State University logoArizona State UniversityUniversity of Maryland logoUniversity of MarylandUniversity of Tokyo logoUniversity of TokyoUniversity of North Carolina at Chapel HillHebrew University of JerusalemAmazonTilburg UniversityUniversity of Massachusetts AmherstUniversity of RochesterUniversity of Duisburg-EssenSapienza University of RomeUniversity of SheffieldPrinceton University logoPrinceton UniversityHKUST logoHKUSTUniversity of TübingenTU BerlinSaarland UniversityTechnical University of DarmstadtUniversity of HaifaUniversity of TrentoUniversity of MontrealBilkent UniversityUniversity of Cape TownBar Ilan UniversityIBMUniversity of MannheimServiceNow logoServiceNowPotsdam UniversityPolish-Japanese Academy of Information TechnologySalesforceASAPPAI21 LabsValencia Polytechnic UniversityUniversity of Trento, Italy
A large-scale and diverse benchmark, BIG-bench, was introduced to rigorously evaluate the capabilities and limitations of large language models across 204 tasks. The evaluation revealed that even state-of-the-art models currently achieve aggregate scores below 20 (on a 0-100 normalized scale), indicating significantly lower performance compared to human experts.
This research redefines dense latents in Sparse Autoencoders (SAEs) from perceived training artifacts to functional features, demonstrating they reflect intrinsic, frequently activating computations within large language models. The study reveals these latents perform diverse, interpretable roles, including tracking token position, binding contextual information, and regulating output entropy, and persist across different model architectures.
This work introduces MedVLM-R1, a medical Vision-Language Model (VLM) that leverages Group Relative Policy Optimization (GRPO) to incentivize explicit natural language reasoning in radiology tasks. The model achieves an average accuracy of 78.22% across MRI, CT, and X-ray modalities, outperforming larger models and demonstrating robust generalization to out-of-distribution data while generating interpretable reasoning steps.
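As a pointer to the training recipe, a minimal sketch of the group-relative advantage computation at the core of GRPO follows; the rewards and group size are illustrative, not values from MedVLM-R1.

```python
# Minimal sketch of GRPO-style group-relative advantages: sample a group of
# completions for the same prompt, score each, and normalize rewards within
# the group. Rewards and group size here are illustrative assumptions.
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Advantage of each completion relative to its own sampling group."""
    mean, std = rewards.mean(), rewards.std()
    return (rewards - mean) / (std + eps)

# e.g. 6 completions for one radiology question: 1.0 = correct answer with a
# well-formed reasoning block, partial credit otherwise.
rewards = np.array([1.0, 0.0, 1.0, 0.2, 0.0, 1.0])
adv = group_relative_advantages(rewards)
print(np.round(adv, 3))   # positive for above-average completions, negative otherwise

# In GRPO these advantages weight the token-level policy-gradient objective
# (with a clipped ratio and a KL penalty to a reference model), avoiding a
# separately trained value network.
```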
Researchers investigate cross-generator image forgery detection, demonstrating that a frozen, vision-only DINOv3 foundation model achieves strong generalization by leveraging global, low-frequency structural inconsistencies between real and fake images. The proposed Fisher-Guided Token Selection (FGTS) framework, built on DINOv3, establishes new state-of-the-art accuracy across multiple benchmarks with minimal supervision.
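A hypothetical sketch of Fisher-guided selection in this spirit: score each patch token by a diagonal-Fisher-style quantity and keep the top-k for a lightweight real/fake head. The scoring rule, the stand-in features, and k are assumptions, not the paper's exact FGTS procedure.

```python
# Hypothetical sketch of Fisher-guided token selection: score each patch token
# by a diagonal-Fisher-style quantity (squared gradient of the loss w.r.t. the
# token embedding) and keep only the top-k tokens for a lightweight real/fake
# head. A random tensor stands in for frozen DINOv3 patch tokens.
import torch

torch.manual_seed(0)
B, N, D, K = 4, 196, 768, 32          # batch, patch tokens, dim, tokens kept

tokens = torch.randn(B, N, D, requires_grad=True)   # stand-in for frozen features
labels = torch.randint(0, 2, (B,)).float()          # 1 = fake, 0 = real
probe = torch.nn.Linear(D, 1)                       # lightweight binary head

# Fisher-style score: squared gradient of the loss w.r.t. each token embedding.
logits = probe(tokens).mean(dim=1).squeeze(-1)      # pool over tokens
loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, labels)
(grad,) = torch.autograd.grad(loss, tokens)
fisher_score = grad.pow(2).sum(dim=-1)              # (B, N) per-token score

# Keep the k highest-scoring tokens per image and classify from those only.
topk = fisher_score.topk(K, dim=1).indices
selected = torch.gather(tokens, 1, topk.unsqueeze(-1).expand(-1, -1, D))
selected_logits = probe(selected.detach()).mean(dim=1).squeeze(-1)
print(selected.shape, selected_logits.shape)        # (4, 32, 768) and (4,)
```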
Researchers identified two types of neurons in large language models, "entropy neurons" and "token frequency neurons," which regulate next-token prediction uncertainty. Entropy neurons modulate output entropy by leveraging LayerNorm and the unembedding matrix's null space, while token frequency neurons adjust the output distribution relative to empirical token frequencies.
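The entropy-neuron mechanism can be illustrated with a small numpy sketch: a write along a direction in the unembedding's null space carries no token-specific information, yet the final normalization rescales the whole residual stream, shrinking every logit and raising entropy. The toy dimensions and unembedding below are assumptions, and RMS normalization stands in for LayerNorm for brevity.

```python
# Minimal numpy sketch of the entropy-neuron mechanism described above: a
# neuron writes along a direction in the (effective) null space of the
# unembedding matrix W_U; this leaves logit differences untouched before
# normalization, but the final norm rescales the residual stream, shrinking
# all logits and raising the entropy of the next-token distribution.
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 16, 50

# Build an unembedding with an explicit null direction: the last residual
# dimension maps to zero logits.
W_U = rng.standard_normal((d_model, vocab))
W_U[-1, :] = 0.0
null_dir = np.zeros(d_model); null_dir[-1] = 1.0

def rmsnorm(x):                      # LayerNorm without mean/bias, for brevity
    return x / np.sqrt((x ** 2).mean())

def entropy(logits):
    p = np.exp(logits - logits.max()); p /= p.sum()
    return -(p * np.log(p)).sum()

x = rng.standard_normal(d_model)                 # residual stream before the final norm
for scale in [0.0, 5.0, 20.0]:                   # "entropy neuron" activation strength
    logits = rmsnorm(x + scale * null_dir) @ W_U
    print(f"activation={scale:5.1f}  entropy={entropy(logits):.3f}")
# Entropy grows with the activation even though the null-space write carries
# no token-specific information.
```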
Language models (LMs) may memorize personally identifiable information (PII) from training data, enabling adversaries to extract it during inference. Existing defense mechanisms such as differential privacy (DP) reduce this leakage but incur large drops in utility. Based on a comprehensive study using circuit discovery to identify the computational circuits responsible for PII leakage in LMs, we hypothesize that specific PII leakage circuits should be responsible for this behavior. We therefore propose PATCH (Privacy-Aware Targeted Circuit PatcHing), a novel approach that first identifies and then directly edits PII circuits to reduce leakage. PATCH achieves a better privacy-utility trade-off than existing defenses, e.g., reducing recall of PII leakage from LMs by up to 65%. Furthermore, PATCH can be combined with DP to reduce the recall of residual leakage of an LM to as low as 0.01%. Our analysis shows that PII leakage circuits persist even after existing defense mechanisms are applied; in contrast, PATCH can effectively mitigate their impact.
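A hypothetical sketch of what directly editing a circuit can look like: zero the output-projection slice of attention heads that circuit discovery flagged, so they can no longer write into the residual stream. The toy module, the flagged heads, and the editing rule are illustrative, not PATCH's exact procedure.

```python
# Hypothetical sketch of targeted circuit patching: given attention heads that
# circuit discovery flagged as carrying PII, edit the model by zeroing the
# slice of the output projection corresponding to those heads, so their
# contribution to the residual stream is removed.
import torch
import torch.nn as nn

n_heads, d_head = 8, 32
d_model = n_heads * d_head
attn_out = nn.Linear(d_model, d_model, bias=False)   # per-layer W_O

# Suppose circuit discovery flagged these heads as PII-leaking in this layer.
flagged_heads = [2, 5]

with torch.no_grad():
    for h in flagged_heads:
        # Columns h*d_head:(h+1)*d_head of W_O read from head h's output.
        attn_out.weight[:, h * d_head:(h + 1) * d_head] = 0.0

# The flagged heads can no longer write anything into the residual stream.
x = torch.randn(1, 4, d_model)                   # concatenated head outputs
x_poisoned = x.clone()
x_poisoned[..., 2 * d_head:3 * d_head] = 1e6     # extreme activity in a patched head
print(torch.allclose(attn_out(x), attn_out(x_poisoned)))   # True: head 2 is silenced
```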
Before executing a quantum algorithm, one must first decompose the algorithm into machine-level instructions compatible with the architecture of the quantum computer, a process known as quantum compiling. There are many different quantum circuit decompositions for the same algorithm, but it is desirable to compile leaner circuits. A fundamentally important cost metric is the T count -- the number of T gates in a circuit. For the single-qubit case, optimal compiling is essentially a solved problem. However, multi-qubit compiling is a harder problem, with optimal algorithms requiring classical runtime exponential in the number of qubits. Here, we present and compare several efficient quantum compilers for multi-qubit Clifford+T circuits. We implemented our compilers in C++ and benchmarked them on random circuits, from which we determine that our TODD compiler yields the lowest T counts on average. We also benchmarked TODD on a library of reversible logic circuits that appear in quantum algorithms and found that it reduced the T count for 97% of the circuits, with an average T-count saving of 20% when compared against the best of all previous circuit decompositions.
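To make the cost metric concrete, a small sketch follows that counts T gates and applies a trivial peephole reduction (two consecutive T gates on a qubit merge into the Clifford gate S, and T followed by Tdg cancels). The gate-list representation is an assumption, and real compilers such as TODD rely on far stronger phase-polynomial optimisations than this toy pass.

```python
# Minimal sketch of T-count accounting and a trivial peephole reduction.
def t_count(circuit):
    return sum(1 for gate, _ in circuit if gate in ("T", "Tdg"))

def peephole(circuit):
    """Merge T.T -> S and cancel T.Tdg on each qubit when nothing intervenes."""
    pending = {}          # qubit -> index in `out` of an unmatched T/Tdg
    out = []
    for gate, qubits in circuit:
        qs = qubits if isinstance(qubits, tuple) else (qubits,)
        if gate in ("T", "Tdg"):
            q = qs[0]
            if q in pending:
                i = pending.pop(q)
                prev_gate, _ = out[i]
                if prev_gate == gate:            # T.T = S (and Tdg.Tdg = Sdg)
                    out[i] = ("S" if gate == "T" else "Sdg", q)
                else:                            # T.Tdg = identity
                    out[i] = None
                continue
            pending[q] = len(out)
        else:
            for q in qs:
                pending.pop(q, None)             # another gate touches q: no merge
        out.append((gate, qubits))
    return [g for g in out if g is not None]

circ = [("T", 0), ("H", 1), ("T", 0), ("T", 1), ("CNOT", (0, 1)), ("T", 1), ("Tdg", 1)]
print(t_count(circ), "->", t_count(peephole(circ)))   # 5 -> 1
```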
MERT introduces a general-purpose, computationally affordable, self-supervised acoustic music understanding model that employs a novel multi-task framework with both acoustic and music-specific teachers. It achieves state-of-the-art performance across 14 diverse Music Information Retrieval tasks while being significantly more efficient than prior large generative models.
Autoformalisation, the task of expressing informal mathematical statements in formal language, is often viewed as a direct translation process. This, however, disregards a critical preceding step: conjecturing. Many mathematical problems cannot be formalised directly without first conjecturing a conclusion, such as an explicit answer or a specific bound. Since Large Language Models (LLMs) already struggle with autoformalisation, and the evaluation of their conjecturing ability is limited and often entangled with autoformalisation or proof, it is particularly challenging to understand its effect. To address this gap, we augment existing datasets to create ConjectureBench, and redesign the evaluation framework and metric specifically to measure the conjecturing capabilities of LLMs both as a distinct task and within the autoformalisation pipeline. Our evaluation of foundational models, including GPT-4.1 and DeepSeek-V3.1, reveals that their autoformalisation performance is substantially overestimated when the conjecture is accounted for during evaluation. However, the conjecture should not be assumed to be provided. We design an inference-time method, Lean-FIRe, to improve conjecturing and autoformalisation, which, to the best of our knowledge, achieves the first successful end-to-end autoformalisation of 13 PutnamBench problems with GPT-4.1 and 7 with DeepSeek-V3.1. We demonstrate that while LLMs possess the requisite knowledge to generate accurate conjectures, improving autoformalisation performance requires treating conjecturing as an independent task and investigating further how to integrate it correctly within autoformalisation. Finally, we provide forward-looking guidance to steer future research toward improving conjecturing, an overlooked step of formal mathematical reasoning.
This survey provides a comprehensive overview of how large multimodal language models are transforming scientific discovery, experimentation, content generation, and evaluation. It maps current advancements, limitations, and ethical considerations across five stages of the research cycle, identifying specific AI applications and their impact on scientific workflows.
Researchers introduce RAGate, an adaptive mechanism that dynamically determines when to augment conversational system responses with external knowledge, addressing the limitations of constant retrieval-augmented generation. This approach maintains response quality while significantly reducing the generation confidence drop, which was 10.43% for constant augmentation but only 0.36% for RAGate-MHA.
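A hypothetical sketch of the gating pattern: a binary gate inspects the dialogue context and decides whether to retrieve external knowledge before generating. The heuristic gate and the retrieve/generate placeholders below are stand-ins, not RAGate's learned gate or any real API.

```python
# Hypothetical sketch of adaptive retrieval gating in the spirit of RAGate:
# a binary gate looks at the dialogue context and decides whether to fetch
# external knowledge before generating. The gate here is a stub (a trained
# classifier in the paper, e.g. the attention-based RAGate-MHA variant);
# `retrieve` and `generate` are placeholder callables, not a real API.
from typing import Callable, List

def needs_knowledge(context: List[str]) -> bool:
    """Stub gate: in practice a learned classifier over the conversation."""
    last_turn = context[-1].lower()
    return any(cue in last_turn for cue in ("recommend", "where", "opening hours"))

def respond(context: List[str],
            retrieve: Callable[[str], List[str]],
            generate: Callable[[List[str], List[str]], str]) -> str:
    snippets = retrieve(context[-1]) if needs_knowledge(context) else []
    return generate(context, snippets)

# Toy usage with trivial stand-ins for the retriever and generator.
fake_retrieve = lambda q: [f"[KB entry about: {q}]"]
fake_generate = lambda ctx, kb: ("Using " + kb[0] if kb else "No retrieval needed") + " -> reply"
print(respond(["Hi!", "Can you recommend a quiet cafe nearby?"], fake_retrieve, fake_generate))
print(respond(["Hi!", "Thanks, that was helpful."], fake_retrieve, fake_generate))
```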
A Collision Clustering (CC) decoder is introduced, optimized for hardware implementation to enable real-time, scalable, fast, and resource-efficient quantum error correction. The ASIC implementation decodes a 1057-qubit surface code in 240 ns while consuming 7.85 mW and occupying 0.06 mm², achieving a 0.78% threshold with a circuit-level noise model.
Researchers at the University of Sheffield systematically analyze the core design principles of Transformer attention, identifying which components are essential for effective language modeling. Their work demonstrates that while token mixing is crucial, principles like the mathematical form and QK derivation can be significantly simplified, particularly when combined with standard attention in hybrid architectures, achieving comparable or improved predictive performance.
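To make the kind of simplification discussed above concrete, here is a minimal sketch contrasting standard scaled dot-product attention with a variant in which queries and keys come from a single shared projection; the dimensions and the particular simplification are illustrative assumptions, not the paper's exact ablation.

```python
# Minimal sketch: standard attention vs. a simplified QK derivation in which
# queries and keys share one projection. The paper studies such ablations
# inside full language models; this only shows the shapes and mechanics.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, T, d = 2, 6, 32
x = torch.randn(B, T, d)

W_q, W_k, W_v = (torch.randn(d, d) / d**0.5 for _ in range(3))
W_shared = torch.randn(d, d) / d**0.5

def attention(q, k, v):
    scores = q @ k.transpose(-2, -1) / d**0.5          # token mixing
    return F.softmax(scores, dim=-1) @ v

# Standard: separate query/key projections.
out_standard = attention(x @ W_q, x @ W_k, x @ W_v)

# Simplified: queries and keys come from the same projection of x.
qk = x @ W_shared
out_simplified = attention(qk, qk, x @ W_v)

print(out_standard.shape, out_simplified.shape)        # both (2, 6, 32)
```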
Laws and their interpretations, legal arguments and agreements are typically expressed in writing, leading to the production of vast corpora of legal text. Their analysis, which is at the center of legal practice, becomes increasingly elaborate as these collections grow in size. Natural language understanding (NLU) technologies can be a valuable tool to support legal practitioners in these endeavors. Their usefulness, however, largely depends on whether current state-of-the-art models can generalize across various tasks in the legal domain. To answer this currently open question, we introduce the Legal General Language Understanding Evaluation (LexGLUE) benchmark, a collection of datasets for evaluating model performance across a diverse set of legal NLU tasks in a standardized way. We also provide an evaluation and analysis of several generic and legal-oriented models, demonstrating that the latter consistently offer performance improvements across multiple tasks.
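For readers who want to try the benchmark, a minimal evaluation sketch follows, assuming LexGLUE's Hugging Face release (dataset id "lex_glue" with configs such as "scotus", "ledgar", or "unfair_tos") and a generic encoder checkpoint as a placeholder; neither the model nor the preprocessing matches the paper's experiments.

```python
# Sketch of running a model over one LexGLUE task, assuming the benchmark's
# Hugging Face release; "bert-base-uncased" is a generic placeholder, not one
# of the paper's legal-oriented models.
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

dataset = load_dataset("lex_glue", "scotus")          # issue-area classification
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

num_labels = dataset["train"].features["label"].num_classes
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=num_labels
)

batch = tokenizer(dataset["test"]["text"][:4], truncation=True, max_length=512,
                  padding=True, return_tensors="pt")
logits = model(**batch).logits                        # (4, num_labels), untrained head
print(logits.shape)
```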
Neuro-symbolic NLP methods aim to leverage the complementary strengths of large language models and formal logical solvers. However, current approaches are mostly static in nature, i.e., the integration of a target solver is predetermined at design time, hindering the ability to employ diverse formal inference strategies. To address this, we introduce an adaptive, multi-paradigm, neuro-symbolic inference framework that: (1) automatically identifies formal reasoning strategies from problems expressed in natural language; and (2) dynamically selects and applies specialized formal logical solvers via autoformalization interfaces. Extensive experiments on individual and multi-paradigm reasoning tasks support the following conclusions: LLMs are effective at predicting the necessary formal reasoning strategies with an accuracy above 90 percent. This enables flexible integration with formal logical solvers, resulting in our framework outperforming competing baselines by 27 percent and 6 percent compared to GPT-4o and DeepSeek-V3.1, respectively. Moreover, adaptive reasoning can even positively impact pure LLM methods, yielding gains of 10, 5, and 6 percent on zero-shot, CoT, and symbolic CoT settings with GPT-4o. Finally, although smaller models struggle with adaptive neuro-symbolic reasoning, post-training offers a viable path to improvement. Overall, this work establishes the foundations for adaptive LLM-symbolic reasoning, offering a path forward for unifying material and formal inferences on heterogeneous reasoning challenges.
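A hypothetical sketch of the adaptive dispatch described above: an LLM first labels the formal reasoning strategy a problem needs, then the problem is autoformalised and routed to the matching solver. The strategy labels, the `llm_classify` stub, and the solver backends are illustrative assumptions, not the paper's interfaces.

```python
# Hypothetical sketch of adaptive neuro-symbolic dispatch: predict a strategy
# label, then route the problem to the matching formal solver.
from typing import Callable, Dict

STRATEGIES = ("first_order_logic", "constraint_satisfaction", "arithmetic")

def llm_classify(problem: str) -> str:
    """Placeholder for prompting an LLM to pick a strategy label."""
    if any(w in problem.lower() for w in ("schedule", "assign", "seating")):
        return "constraint_satisfaction"
    if any(ch.isdigit() for ch in problem):
        return "arithmetic"
    return "first_order_logic"

SOLVERS: Dict[str, Callable[[str], str]] = {
    # Each entry would wrap an autoformalisation step plus a real backend
    # (e.g. a FOL prover, a CSP solver, an equation solver); stubs keep the
    # sketch self-contained.
    "first_order_logic": lambda p: f"[FOL prover on] {p}",
    "constraint_satisfaction": lambda p: f"[CSP solver on] {p}",
    "arithmetic": lambda p: f"[equation solver on] {p}",
}

def solve(problem: str) -> str:
    strategy = llm_classify(problem)
    assert strategy in STRATEGIES
    return SOLVERS[strategy](problem)

print(solve("If all philosophers are mortal and Socrates is a philosopher, is Socrates mortal?"))
print(solve("Assign four talks to three rooms so no two overlap in the same room."))
```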
Large-scale quantum computers have the potential to deliver computational capabilities beyond those of conventional computers for certain problems. However, the physical qubits within a quantum computer are prone to noise and decoherence, which must be corrected in order to perform reliable, fault-tolerant quantum computations. Quantum Error Correction (QEC) provides the path for realizing such computations. QEC generates a continuous stream of data that decoders must process at the rate it is received, which can be as fast as 1 MHz in superconducting quantum computers. A little-known fact of QEC is that if the decoder infrastructure cannot keep up, a data backlog problem is encountered and the quantum computer runs exponentially slower. Today's leading approaches to quantum error correction are not scalable, as existing decoders typically run slower as the problem size is increased, inevitably hitting the backlog problem. That is: the current leading proposal for fault-tolerant quantum computation is not scalable. Here, we show how to parallelize decoding to achieve almost arbitrary speed, removing this roadblock to scalability. Our parallelization requires some classical feed-forward decisions to be delayed, leading to a slow-down of the logical clock speed. However, the slow-down is now only polynomial in code size, averting the exponential slowdown. We numerically demonstrate our parallel decoder for the surface code, showing no noticeable reduction in logical fidelity compared to previous decoders and demonstrating the parallelization speedup.
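The parallelisation idea can be sketched at a high level as windowed decoding over the syndrome stream; the stub inner decoder, the window size, and the correction format below are simplified assumptions, not the paper's surface-code decoder.

```python
# Illustrative sketch of the parallelisation idea: split the syndrome stream
# into windows and hand the windows to a pool of decoder workers so throughput
# scales with the number of workers. The stub `decode_window` stands in for a
# real inner decoder (e.g. minimum-weight perfect matching).
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def decode_window(args):
    """Stub inner decoder: pair up defects within one window of rounds."""
    window_id, syndrome = args                 # syndrome: (rounds, n_checks) of 0/1
    defects = int(syndrome.sum())
    return window_id, defects // 2             # pretend each pair yields one correction

def parallel_decode(syndrome_stream: np.ndarray, rounds_per_window: int, workers: int = 4):
    windows = [
        (i, syndrome_stream[start:start + rounds_per_window])
        for i, start in enumerate(range(0, len(syndrome_stream), rounds_per_window))
    ]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(decode_window, windows))
    # pool.map preserves submission order; sort by window id anyway before stitching.
    return [corrections for _, corrections in sorted(results)]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    stream = (rng.random((10_000, 50)) < 0.01).astype(np.uint8)   # toy syndrome stream
    print(parallel_decode(stream, rounds_per_window=1_000))
```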