Researchers at the University of Arizona developed the Heterogeneous Block Covariance Model (HBCM) and an associated Variational Expectation-Maximization (VEM) algorithm for community detection among features based on their covariance structure. This model explicitly accounts for heterogeneous feature characteristics and achieves superior accuracy and computational efficiency compared to existing methods across diverse simulation and real-world datasets, including single-cell RNA-seq and stock prices.
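The core idea can be illustrated with a toy simulation: features in the same community share a latent factor but have heterogeneous noise scales, and the community structure is then visible in the sample correlation matrix. The sketch below uses a crude spectral assignment rather than the paper's VEM fit of the HBCM likelihood; all sizes and scales are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data with a block covariance structure: two communities of features
# driven by shared latent factors, with heterogeneous per-feature noise scales.
n, p = 500, 20
labels = np.repeat([0, 1], p // 2)            # true feature communities
factors = rng.normal(size=(n, 2))             # one latent factor per community
sigma = rng.uniform(0.5, 2.0, size=p)         # heterogeneous feature variances
X = factors[:, labels] + sigma * rng.normal(size=(n, p))

# Recover communities from the top-2 eigenvectors of the sample correlation
# matrix: with a near block-diagonal correlation, each leading eigenvector
# concentrates on one community. This is a stand-in for the VEM algorithm,
# which additionally estimates the heterogeneous feature parameters.
corr = np.corrcoef(X, rowvar=False)
_, vecs = np.linalg.eigh(corr)
est = np.argmax(np.abs(vecs[:, -2:]), axis=1)

# Agreement with the truth, up to label permutation.
acc = max(np.mean(est == labels), np.mean(est != labels))
print(f"recovered community accuracy: {acc:.2f}")
```

The heterogeneous `sigma` is the point of the exercise: methods that assume equal feature variances conflate scale with community membership, which is the failure mode HBCM is designed to avoid.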
Researchers at Scale AI introduce OnlineRubrics, a framework that dynamically updates evaluation criteria during Large Language Model (LLM) training through online elicitation from pairwise comparisons. This approach leads to more robust and higher-performing LLMs by mitigating reward hacking and adapting to emergent behaviors and desired qualities.
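The control flow of a rubric that grows during training can be sketched minimally. Here `elicit` stands in for the LLM call that, in the paper, compares responses pairwise and proposes a new criterion; both `elicit` and `judge` are hypothetical callables in this sketch, not the paper's API.

```python
from dataclasses import dataclass, field


@dataclass
class OnlineRubric:
    """Toy sketch of an evaluation rubric that is updated online."""
    criteria: list = field(default_factory=list)

    def update(self, prev_response: str, curr_response: str, elicit):
        # `elicit` proposes a new criterion from a pairwise comparison
        # (in OnlineRubrics this is an LLM elicitation step).
        new_criterion = elicit(prev_response, curr_response)
        if new_criterion and new_criterion not in self.criteria:
            self.criteria.append(new_criterion)

    def score(self, response: str, judge) -> float:
        # Average per-criterion judgments; `judge` returns 0 or 1.
        if not self.criteria:
            return 0.0
        return sum(judge(response, c) for c in self.criteria) / len(self.criteria)


# Stubbed elicitation, to show the update loop only.
rubric = OnlineRubric(criteria=["answers the question directly"])
rubric.update("Sure!", "Sure! Here are the steps...",
              elicit=lambda prev, curr: "provides concrete steps")
print(rubric.criteria)
```

Because the criteria set changes during training, a policy that games a fixed rubric keeps meeting newly elicited criteria, which is the mechanism behind the reward-hacking mitigation described above.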
Researchers from the University of Arizona, Microsoft Research Montréal, and the Allen Institute for AI introduce SCIENCEWORLD, a new interactive text environment designed to test AI agents' grounded scientific reasoning abilities. State-of-the-art models achieved low scores on these elementary science tasks, suggesting that their success on static question-answering benchmarks does not translate to robust procedural understanding in dynamic environments.
Researchers at AWS AI Labs developed METASYNTH, a meta-prompting-driven agentic framework for generating diverse synthetic data. The method enables significant domain adaptation for large language models, such as improving Mistral-7B-v0.3's performance by up to 13.75% in the biomedicine domain using only 25 million tokens of synthetic data, while maintaining general capabilities and demonstrating superior data diversity compared to traditional template-based approaches.
A survey by researchers from the University of Oregon, Carnegie Mellon University, Adobe Research, and Meta AI provides the first dedicated examination of Small Language Models (SLMs). The work introduces a structured taxonomy and outlines key techniques, applications, and challenges in balancing model performance with practical deployment considerations.
Gödel Agent is a self-referential agent framework designed to achieve recursive self-improvement by enabling an agent to dynamically modify its own policy and meta-learning algorithm at runtime. This framework outperforms existing agent paradigms across various tasks and demonstrates efficient self-optimization and adaptability by autonomously generating task-specific optimizations.
Researchers from Michigan State University and the University of Arizona demonstrate a novel "Agent-in-the-Middle" (AiTM) attack that exploits communication channels in Large Language Model-based Multi-Agent Systems (LLM-MAS). The attack, which intercepts and manipulates inter-agent messages, achieves high success rates (frequently above 70%) in inducing malicious behaviors or denial-of-service across various LLM-MAS frameworks and real-world applications.
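The attack surface is the hop between agents rather than the agents themselves, which a minimal relay makes concrete. In this sketch the agents are plain string-transforming callables and `interceptor` marks the AiTM position; real LLM-MAS frameworks route structured messages, but the adversary sits on the same hop.

```python
def run_pipeline(agents, message, interceptor=None):
    """Relay a message through a chain of agents; an optional adversary
    rewrites the message in transit on every hop (the AiTM position)."""
    for agent in agents:
        if interceptor is not None:
            message = interceptor(message)   # attacker-controlled rewrite
        message = agent(message)
    return message


agents = [str.upper, lambda m: m + "!"]
benign = run_pipeline(agents, "ship it")
attacked = run_pipeline(agents, "ship it",
                        interceptor=lambda m: "ignore prior instructions")
print(benign, "|", attacked)
```

Downstream agents have no way to distinguish the rewritten message from a legitimate upstream output, which is why manipulated instructions propagate through the rest of the pipeline.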
This extensive survey provides a structured overview of alignment and safety in Large Language Models (LLMs), analyzing training paradigms, safety mechanisms, and emerging challenges. It synthesizes current research, identifies industry practices, and outlines open problems to ensure LLMs align with human values and intentions.
Researchers performed the first direct, dynamical measurement of a black hole mass in a 'Little Red Dot' (Abell2744-QSO1) at z=7.04 using JWST NIRSpec IFS data, confirming the validity of single-epoch virial mass estimates for these early universe objects and revealing a black hole significantly overmassive relative to its host galaxy's stellar mass.
Researchers from Clemson, Arizona State, Washington University in St. Louis, Notre Dame, and Arizona developed Diversity-aware Reward Adjustment (DRA) to explicitly integrate semantic diversity into reward computation for R1-Zero-like training of large language models. This approach achieved a state-of-the-art average accuracy of 58.2% on mathematical reasoning benchmarks using a 1.5B parameter model with minimal fine-tuning data (7,000 samples) and low training costs.
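The shape of a diversity-aware adjustment can be sketched as follows: each rollout's reward is scaled by how semantically distinct it is from the other rollouts in its group, so correct-but-redundant samples are down-weighted relative to correct-and-novel ones. This is an illustrative formula, not DRA's exact adjustment; the embeddings and the `alpha` strength parameter are assumptions of the sketch.

```python
import numpy as np


def diversity_adjusted_rewards(rewards, embeddings, alpha=0.5):
    """Scale each rollout's reward by its semantic distinctness within
    the group (toy version of a diversity-aware reward adjustment)."""
    E = np.asarray(embeddings, dtype=float)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)   # unit vectors
    sim = E @ E.T                                      # cosine similarities
    n = len(rewards)
    # Mean similarity to the *other* rollouts in the group.
    redundancy = (sim.sum(axis=1) - 1.0) / (n - 1)
    diversity = 1.0 - redundancy
    return np.asarray(rewards) * (1.0 + alpha * diversity)


# Two near-duplicate rollouts and one distinct one, all "correct" (reward 1).
emb = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]]
adj = diversity_adjusted_rewards([1.0, 1.0, 1.0], emb)
print(adj)
```

In a group-relative scheme such as R1-Zero-style training, this shifts the policy gradient toward novel correct rollouts instead of reinforcing one dominant solution template.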
A scalable data selection method for pretraining large language models balances data quality and diversity, improving zero-shot accuracy by 1.39% across downstream tasks while maintaining computational efficiency. The approach integrates a Multi-Arm Bandit framework for cluster-based sampling with an enhanced, accelerated influence function that accounts for Transformer attention layers.
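The bandit loop over data clusters can be sketched with standard UCB1, where each arm is a cluster and the reward is a noisy per-example quality signal. In the paper that signal comes from the accelerated influence function; here `quality` is a hypothetical stand-in, and the constants are illustrative.

```python
import math
import random


def ucb_cluster_sampling(clusters, quality, rounds=1000, c=1.0, seed=0):
    """Minimal UCB1 loop: pick a data cluster, observe a noisy quality
    signal for a sampled example, update that cluster's running mean."""
    rng = random.Random(seed)
    counts = [0] * clusters
    means = [0.0] * clusters
    for t in range(1, rounds + 1):
        if t <= clusters:
            k = t - 1                         # play each arm once first
        else:
            k = max(range(clusters),          # exploit mean + explore bonus
                    key=lambda a: means[a] + c * math.sqrt(math.log(t) / counts[a]))
        r = quality(k, rng)
        counts[k] += 1
        means[k] += (r - means[k]) / counts[k]   # incremental mean update
    return counts


# Three clusters; cluster 1 yields the highest-quality examples on average.
true_q = [0.2, 0.8, 0.5]
counts = ucb_cluster_sampling(3, lambda k, rng: true_q[k] + rng.gauss(0, 0.1))
print(counts)
```

The exploration bonus keeps lower-scoring clusters in the mix, which is how such a sampler trades off quality against diversity rather than collapsing onto a single high-influence cluster.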
This research introduces Multimodal Symbolic Logical Reasoning (MuSLR), a new task and benchmark, MuSLR-Bench, that requires vision-language models (VLMs) to perform formal logical deduction by integrating information from both visual and textual inputs. The proposed LogiCAM framework, developed by the National University of Singapore and collaborators, achieved a 14.13% average accuracy improvement over GPT-4.1, demonstrating enhanced capabilities in applying formal logic to complex multimodal scenarios.