alphaXiv

Computer Vision Center Universitat Autònoma de Barcelona

1,263

09 Oct 2025

astrophysics-of-galaxies physics

A Comprehensive Characterization of Galaxy-cool CGM Connections at $z<0.4$ with DESI Year 1 Data

Academia Sinica National Astronomical Observatory of Japan

UC Berkeley

University College London National Taiwan University

University of Michigan

Boston University Kavli Institute for the Physics and Mathematics of the Universe The University of Texas at Dallas

Lawrence Berkeley National Laboratory

Sorbonne Université Fermi National Accelerator Laboratory Universitat Politècnica de Catalunya University of Portsmouth

The Ohio State University Sejong University Universidad Nacional Autónoma de México Universitat Autònoma de Barcelona

University of California, Santa Cruz NSF NOIRLab Universidad de Los Andes University of Wyoming CIEMAT Institut de Física d’Altes Energies (IFAE) Institució Catalana de Recerca i Estudis Avançats Siena College Instituto de Astrofísica de Canarias Institute of Space Sciences (ICE–CSIC) Università degli Studi di Milano INAF Osservatorio Astronomico di Brera

This study comprehensively characterizes the cool circumgalactic medium (CGM) around galaxies at redshifts below 0.4 using data from the Dark Energy Spectroscopic Instrument (DESI) Year 1 survey. It reveals persistent correlations between cool gas absorption and galaxy properties such as stellar mass and star formation rate, along with an unexpected absence of azimuthal anisotropy, indicating a possible evolution in CGM dynamics at lower redshifts.

508

30 Oct 2025

agent-based-systems agentic-frameworks cloud-computing

The Denario project: Deep knowledge AI agents for scientific discovery

Google DeepMind

University of Cambridge

Harvard University

Tel Aviv University

University of Oxford LMU Munich

the University of Tokyo

The University of Texas at Austin

Cornell University Harvard Medical School

Johns Hopkins University

University of Arizona

MIT

Princeton University ICREA Universitat de Barcelona

Flatiron Institute

University of Virginia The University of Chicago SISSA — International School for Advanced Studies Universitat Autònoma de Barcelona Donostia International Physics Center University of the Basque Country Computer Vision Center ICSC - Centro Nazionale di Ricerca in High Performance Computing, Big Data e Quantum Computing Kavli Institute for Cosmology Steward Observatory Institut de Ciències del Cosmos Infosys Ltd. Big Data Institute INFN National Institute for Nuclear Physics Boston Children's Hospital Ragon Institute of Mass General MCML - Munich Center for Machine Learning IFPU Institute for fundamental physics of the Universe INAF - Osservatorio Astronomico di Trieste

We present Denario, an AI multi-agent system designed to serve as a scientific research assistant. Denario can perform many different tasks, such as generating ideas, checking the literature, developing research plans, writing and executing code, making plots, and drafting and reviewing a scientific paper. The system has a modular architecture, allowing it to handle specific tasks, such as generating an idea, or to carry out end-to-end scientific analysis using Cmbagent as a deep-research backend. In this work, we describe Denario and its modules in detail, and illustrate its capabilities by presenting multiple papers it generated in many different scientific disciplines such as astrophysics, biology, biophysics, biomedical informatics, chemistry, material science, mathematical physics, medicine, neuroscience and planetary science. Denario also excels at combining ideas from different disciplines, and we illustrate this by showing a paper that applies methods from quantum physics and machine learning to astrophysical data. We report the evaluations performed on these papers by domain experts, who provided both numerical scores and review-like feedback. We then highlight the strengths, weaknesses, and limitations of the current system. Finally, we discuss the ethical implications of AI-driven research and reflect on how such technology relates to the philosophy of science. We publicly release the code at this https URL. A Denario demo can also be run directly on the web at this https URL, and the full app will be deployed on the cloud.

196

20 Oct 2025

computer-science artificial-intelligence computer-vision-and-pattern-recognition

Accurate and Efficient Low-Rank Model Merging in Core Space

University of Florence Warsaw University of Technology IDEAS NCBR University of Modena and Reggio Emilia Universitat Autònoma de Barcelona IDEAS Research Institute

The Core Space Merging framework is introduced for efficiently combining low-rank adaptations (LoRAs) of large neural networks. This method achieves up to 600x faster merging by performing operations in a compact, lossless subspace, while reaching state-of-the-art accuracy, including 94.16% normalized accuracy on Llama 3 8B for NLI tasks and 76.3% on ViT-B/32 for vision tasks.

291

06 Aug 2025

computer-science continual-learning computer-vision-and-pattern-recognition

Continual Learning for VLMs: A Survey and Taxonomy Beyond Forgetting

Nankai University

Peking University Universitat Autònoma de Barcelona

Yuyang Liu

Vision-language models (VLMs) have achieved impressive performance across diverse multimodal tasks by leveraging large-scale pre-training. However, enabling them to learn continually from non-stationary data remains a major challenge, as their cross-modal alignment and generalization capabilities are particularly vulnerable to catastrophic forgetting. Unlike traditional unimodal continual learning (CL), VLMs face unique challenges such as cross-modal feature drift, parameter interference due to shared architectures, and zero-shot capability erosion. This survey offers the first focused and systematic review of continual learning for VLMs (VLM-CL). We begin by identifying the three core failure modes that degrade performance in VLM-CL. Based on these, we propose a challenge-driven taxonomy that maps solutions to their target problems: (1) Multi-Modal Replay Strategies address cross-modal drift through explicit or implicit memory mechanisms; (2) Cross-Modal Regularization preserves modality alignment during updates; and (3) Parameter-Efficient Adaptation mitigates parameter interference with modular or low-rank updates. We further analyze current evaluation protocols, datasets, and metrics, highlighting the need for better benchmarks that capture VLM-specific forgetting and compositional generalization. Finally, we outline open problems and future directions, including continual pre-training and compositional zero-shot learning. This survey aims to serve as a comprehensive and diagnostic reference for researchers developing lifelong vision-language systems. All resources are available at: this https URL.

420

11 Jun 2025

computer-science machine-learning model-merging

No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces

University of Florence Warsaw University of Technology Universitat Autònoma de Barcelona IDEAS NCBR Gdańsk University of Technology Computer Vision Center

Researchers from Warsaw University of Technology and other European institutions developed Isotropic Model Merging, a framework that integrates specialized model weights into a single multi-task model by focusing on subspace alignment and isotropic singular value spectra. The approach achieves state-of-the-art performance across diverse vision and language tasks while maintaining efficiency.

632

05 Jan 2021

computer-science computer-vision-and-pattern-recognition information-retrieval

DocVQA: A Dataset for VQA on Document Images

Amazon IIIT Hyderabad Computer Vision Center

Researchers from IIIT Hyderabad and CVC UAB created DocVQA, a large-scale dataset featuring 50,000 question-answer pairs over 12,767 real document images to foster holistic document understanding. Baseline evaluations show that BERT-based models achieve 55.77% accuracy, while human performance reaches 94.36%, highlighting the remaining challenges in integrating visual and textual reasoning for complex document layouts.

221

11 Jul 2025

agentic-frameworks agents instrumentation-and-methods-for-astrophysics

Open Source Planning & Control System with Language Agents for Autonomous Scientific Discovery

Google DeepMind

California Institute of Technology

University of Cambridge LMU Munich

University of California, San Diego

Columbia University

Princeton University Haverford College

Flatiron Institute University of Sussex Universitat Autònoma de Barcelona Télécom SudParis Computer Vision Center Kavli Institute for Cosmology Indian Institute of Science Education and Research (IISER) Infosys Ltd. RWTH Aachen University

We present a multi-agent system for automation of scientific research tasks, cmbagent (this https URL). The system is formed by about 30 Large Language Model (LLM) agents and implements a Planning & Control strategy to orchestrate the agentic workflow, with no human-in-the-loop at any point. Each agent specializes in a different task (performing retrieval on scientific papers and codebases, writing code, interpreting results, critiquing the output of other agents) and the system is able to execute code locally. We successfully apply cmbagent to carry out a PhD level cosmology task (the measurement of cosmological parameters using supernova data) and evaluate its performance on two benchmark sets, finding superior performance over state-of-the-art LLMs. The source code is available on GitHub, demonstration videos are also available, and the system is deployed on HuggingFace and will be available on the cloud.

161

29 Jul 2024

computer-science continual-learning artificial-intelligence

MagMax: Leveraging Model Merging for Seamless Continual Learning

Warsaw University of Technology IDEAS NCBR Gdańsk University of Technology Computer Vision Center Autonomous University of Barcelona Tooploox

Daniel Marczak

MagMax introduces a model merging strategy for continual learning by combining sequential fine-tuning with maximum magnitude weight selection to mitigate catastrophic forgetting in large pre-trained models. The method consistently outperforms existing continual learning approaches and other merging strategies, achieving an average 2.1% improvement in class-incremental learning, while also demonstrating that sequential fine-tuning universally enhances model merging performance.
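The maximum magnitude selection mentioned above can be illustrated with a minimal sketch. It assumes weights are stored as flat lists of floats, that each checkpoint's change is measured relative to the pre-trained weights, and that a single `scale` factor is applied to the selected changes. The function name `max_magnitude_merge`, the `scale` parameter, and the toy checkpoints are hypothetical and not taken from the paper.

```python
from typing import List


def max_magnitude_merge(pretrained: List[float],
                        finetuned: List[List[float]],
                        scale: float = 1.0) -> List[float]:
    """Per parameter, keep the largest-magnitude change across checkpoints."""
    merged = []
    for i, base in enumerate(pretrained):
        # Change of parameter i in each checkpoint, relative to the pre-trained value.
        deltas = [checkpoint[i] - base for checkpoint in finetuned]
        # Select the change with the largest absolute value, keeping its sign.
        chosen = max(deltas, key=abs)
        merged.append(base + scale * chosen)
    return merged


# Hypothetical toy weights: one pre-trained model and two fine-tuned checkpoints.
pretrained = [0.0, 1.0, -1.0]
after_task_1 = [0.5, 1.0, -1.25]
after_task_2 = [0.25, 0.0, -1.5]

merged = max_magnitude_merge(pretrained, [after_task_1, after_task_2])
print(merged)
```

In a sequential fine-tuning setting, each checkpoint would be fine-tuned starting from the previous one rather than from the pre-trained weights; the selection step in this sketch is the same either way.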

17 Oct 2025

attention-mechanisms computer-science computer-vision-and-pattern-recognition

Scope: Selective Cross-modal Orchestration of Visual Perception Experts

University of Waterloo

Université de Montréal

McGill University

University of British Columbia York University École de Technologie Supérieure Universitat Autònoma de Barcelona

ServiceNow

Vision-language models (VLMs) benefit from multiple vision encoders, but naively stacking them yields diminishing returns while multiplying inference costs. We propose SCOPE, a Mixture-of-Encoders (MoEnc) framework that dynamically selects one specialized encoder per image-text pair via instance-level routing, unlike token-level routing in traditional MoE. SCOPE maintains a shared encoder and a pool of routed encoders. A lightweight router uses cross-attention between text prompts and shared visual features to select the optimal encoder from the routed encoders. To train this router, we introduce dual entropy regularization with auxiliary losses to balance dataset-level load distribution with instance-level routing confidence. Remarkably, SCOPE with one shared plus one routed encoder outperforms models using all four extra encoders simultaneously, while reducing compute by 24-49%. This demonstrates that intelligent encoder selection beats brute-force aggregation, challenging the prevailing paradigm in multi-encoder VLMs.

11 Nov 2025

cosmology-and-nongalactic-astrophysics astrophysics-of-galaxies physics

Backlighting extended gas halos around luminous red galaxies: kinematic Sunyaev-Zel'dovich effect from DESI Y1 x ACT

The gas density profile around galaxies, shaped by feedback and affecting the galaxy lensing signal, is imprinted on the cosmic microwave background (CMB) by the kinematic Sunyaev-Zel'dovich effect (kSZ). We precisely measure this effect ($S/N \approx 10$) via velocity stacking with more than 800,000 spectroscopically confirmed luminous red galaxies (LRG) from the Dark Energy Spectroscopic Instrument (DESI) Y1 survey, which overlap with the Atacama Cosmology Telescope (ACT) Data Release 6 temperature maps over $\geq 4{,}000$ deg$^2$. We explore the kSZ dependence with various galaxy parameters and find no significant trend with redshift, but clear trends with stellar mass and absolute magnitude in $g$, $r$, and $z$ bands. Our analysis suggests that the gas extends beyond the dark matter halo (99.5% confidence, i.e. PTE = 0.005). We find a tentative preference for hydrodynamical simulation models with stronger feedback that drives gas further out (Illustris $z=0.5$, PTE = 0.37) over weaker-feedback cases (IllustrisTNG $z=0.8$, PTE = 0.045), though with limited statistical significance. In all cases, a free multiplicative amplitude was fit to the simulated profiles, and further modeling work is required to firm up these conclusions. We find consistency between kSZ profiles around spectroscopic and photometric LRG, with comparable statistical power, thus increasing our confidence in the photometric analysis. Additionally, we present the first kSZ measurement around DESI Y1 bright galaxy sample (BGS) and emission-line galaxies (ELG), whose features match qualitative expectations. Finally, we forecast $S/N \sim 50$ for future stacked kSZ measurements using data from ACT, DESI Y3, and Rubin Observatory. These measurements will serve as an input for galaxy formation models and baryonic uncertainties in galaxy lensing.

196

11 Nov 2024

attention-mechanisms computer-science artificial-intelligence

Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis

Chinese Academy of Sciences Nankai University Mohamed bin Zayed University of AI Universitat Autònoma de Barcelona Linköping University

This research introduces Token Merging (ToMe), a training-free inference-time method that enhances semantic binding in text-to-image synthesis by intelligently manipulating text embeddings. The approach achieves a GPT-4o Object Binding score of 0.9549, demonstrating improved accuracy in associating attributes with objects and binding sub-objects to main objects compared to prior methods.

18 Jul 2024

computer-science computer-vision-and-pattern-recognition

MVSBoost: An Efficient Point Cloud-based 3D Reconstruction

Universitat de Barcelona Computer Vision Center Institut de Neurosciències

Efficient and accurate 3D reconstruction is crucial for various applications, including augmented and virtual reality, medical imaging, and cinematic special effects. While traditional Multi-View Stereo (MVS) systems have been fundamental in these applications, using neural implicit fields in implicit 3D scene modeling has introduced new possibilities for handling complex topologies and continuous surfaces. However, neural implicit fields often suffer from computational inefficiencies, overfitting, and heavy reliance on data quality, limiting their practical use. This paper presents an enhanced MVS framework that integrates multi-view 360-degree imagery with robust camera pose estimation via Structure from Motion (SfM) and advanced image processing for point cloud densification, mesh reconstruction, and texturing. Our approach significantly improves upon traditional MVS methods, offering superior accuracy and precision as validated using Chamfer distance metrics on the Realistic Synthetic 360 dataset. The developed MVS technique enhances the detail and clarity of 3D reconstructions and demonstrates superior computational efficiency and robustness in complex scene reconstruction, effectively handling occlusions and varying viewpoints. These improvements suggest that our MVS framework can compete with and potentially exceed current state-of-the-art neural implicit field methods, especially in scenarios requiring real-time processing and scalability.
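For readers unfamiliar with the Chamfer distance used for validation above, the following is a minimal sketch of one common symmetric variant: the mean squared nearest-neighbor distance from each cloud to the other, summed over both directions. Conventions differ (squared versus unsquared distances, sums versus means), and the abstract does not say which one the authors use, so the choice here is an assumption. The brute-force search is only suitable for small clouds.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def _squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def chamfer_distance(cloud_a: Sequence[Point], cloud_b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance (mean squared nearest-neighbor variant)."""
    if not cloud_a or not cloud_b:
        raise ValueError("both point clouds must be non-empty")
    # Average squared distance from each point in A to its nearest point in B.
    a_to_b = sum(min(_squared_distance(p, q) for q in cloud_b) for p in cloud_a) / len(cloud_a)
    # Same in the other direction, so that the measure is symmetric.
    b_to_a = sum(min(_squared_distance(q, p) for p in cloud_a) for q in cloud_b) / len(cloud_b)
    return a_to_b + b_to_a


# Hypothetical toy clouds: a reference pair of points and a copy shifted by 1 along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reconstruction = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(chamfer_distance(reference, reconstruction))
```

Lower values indicate that the reconstructed cloud lies closer to the reference; identical clouds give zero.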

4,410

12 Oct 2025

computer-science computer-vision-and-pattern-recognition

Free-Lunch Color-Texture Disentanglement for Stylized Image Generation

Harbin Institute of Technology Nankai University Universitat Autònoma de Barcelona Universitat de Valencia Computer Vision Center SoftBank

Shiqi Yang

Recent advances in Text-to-Image (T2I) diffusion models have transformed image generation, enabling significant progress in stylized generation using only a few style reference images. However, current diffusion-based methods struggle with fine-grained style customization due to challenges in controlling multiple style attributes, such as color and texture. This paper introduces the first tuning-free approach to achieve free-lunch color-texture disentanglement in stylized T2I generation, addressing the need for independently controlled style elements for the Disentangled Stylized Image Generation (DisIG) problem. Our approach leverages the Image-Prompt Additivity property in the CLIP image embedding space to develop techniques for separating and extracting Color-Texture Embeddings (CTE) from individual color and texture reference images. To ensure that the color palette of the generated image aligns closely with the color reference, we apply a whitening and coloring transformation to enhance color consistency. Additionally, to prevent texture loss due to the signal-leak bias inherent in diffusion training, we introduce a noise term that preserves textural fidelity during the Regularized Whitening and Coloring Transformation (RegWCT). Through these methods, our Style Attributes Disentanglement approach (SADis) delivers a more precise and customizable solution for stylized image generation. Experiments on images from the WikiArt and StyleDrop datasets demonstrate that, both qualitatively and quantitatively, SADis surpasses state-of-the-art stylization methods in the DisIG task. Code is released at this https URL.

125

13 Dec 2024

computer-science computer-vision-and-pattern-recognition generative-models

The Art of Deception: Color Visual Illusions and Diffusion Models

Universitat Autònoma de Barcelona IDEAS NCBR Universitat de Valencia Computer Vision Center

Visual illusions in humans arise when interpreting out-of-distribution stimuli: if the observer is adapted to certain statistics, perception of outliers deviates from reality. Recent studies have shown that artificial neural networks (ANNs) can also be deceived by visual illusions. This revelation raises profound questions about the nature of visual information. Why are two independent systems, both human brains and ANNs, susceptible to the same illusions? Should any ANN be capable of perceiving visual illusions? Are these perceptions a feature or a flaw? In this work, we study how visual illusions are encoded in diffusion models. Remarkably, we show that they present human-like brightness/color shifts in their latent space. We use this fact to demonstrate that diffusion models can predict visual illusions. Furthermore, we also show how to generate new unseen visual illusions in realistic images using text-to-image diffusion models. We validate this ability through psychophysical experiments that show how our model-generated illusions also fool humans.

18 Sep 2025

computer-science computation-and-language human-ai-interaction

Large Language Model probabilities cannot distinguish between possible and impossible language

Humboldt-Universität zu Berlin Universitat Autònoma de Barcelona Institució Catalana de Recerca i Estudis Avançats (ICREA) Ecovadis

A controversial test for Large Language Models concerns the ability to discern possible from impossible language. While some evidence attests to the models' sensitivity to what crosses the limits of grammatically impossible language, this evidence has been contested on the grounds of the soundness of the testing material. We use model-internal representations to tap directly into the way Large Language Models represent the 'grammatical-ungrammatical' distinction. In a novel benchmark, we elicit probabilities from 4 models and compute minimal-pair surprisal differences, juxtaposing probabilities assigned to grammatical sentences to probabilities assigned to (i) lower frequency grammatical sentences, (ii) ungrammatical sentences, (iii) semantically odd sentences, and (iv) pragmatically odd sentences. The prediction is that if string-probabilities can function as proxies for the limits of grammar, the ungrammatical condition will stand out among the conditions that involve linguistic violations, showing a spike in the surprisal rates. Our results do not reveal a unique surprisal signature for ungrammatical prompts, as the semantically and pragmatically odd conditions consistently show higher surprisal. We thus demonstrate that probabilities do not constitute reliable proxies for model-internal representations of syntactic knowledge. Consequently, claims about models being able to distinguish possible from impossible language need verification through a different methodology.
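The minimal-pair surprisal difference described above can be sketched as follows, assuming sentence surprisal is the sum of per-token surprisals in bits. The per-token probabilities are invented for illustration and do not come from any model or from the paper; the function names are likewise hypothetical.

```python
import math
from typing import Sequence


def surprisal_bits(token_probs: Sequence[float]) -> float:
    """Total surprisal of a sentence in bits: -sum(log2 p) over its tokens."""
    return -sum(math.log2(p) for p in token_probs)


def surprisal_difference(baseline_probs: Sequence[float],
                         variant_probs: Sequence[float]) -> float:
    """Surprisal of the variant minus surprisal of the grammatical baseline."""
    return surprisal_bits(variant_probs) - surprisal_bits(baseline_probs)


# Hypothetical per-token probabilities for one minimal pair set.
grammatical = [0.5, 0.25, 0.5]
ungrammatical = [0.5, 0.25, 0.125]
semantically_odd = [0.5, 0.25, 0.0625]

diff_ungrammatical = surprisal_difference(grammatical, ungrammatical)
diff_semantic = surprisal_difference(grammatical, semantically_odd)
print(diff_ungrammatical, diff_semantic)
```

In this invented example the semantically odd variant shows the larger difference, which mirrors the pattern the abstract reports: the ungrammatical condition does not stand out with a unique surprisal spike.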

28 Aug 2025

computer-science computer-vision-and-pattern-recognition generative-models

Enhancing Document VQA Models via Retrieval-Augmented Generation

Universitat Autònoma de Barcelona Computer Vision Center

Document Visual Question Answering (Document VQA) must cope with documents that span dozens of pages, yet leading systems still concatenate every page or rely on very large vision-language models, both of which are memory-hungry. Retrieval-Augmented Generation (RAG) offers an attractive alternative, first retrieving a concise set of relevant segments before generating answers from this selected evidence. In this paper, we systematically evaluate the impact of incorporating RAG into Document VQA through different retrieval variants - text-based retrieval using OCR tokens and purely visual retrieval without OCR - across multiple models and benchmarks. Evaluated on the multi-page datasets MP-DocVQA, DUDE, and InfographicVQA, the text-centric variant improves the "concatenate-all-pages" baseline by up to +22.5 ANLS, while the visual variant achieves +5.0 ANLS improvement without requiring any text extraction. An ablation confirms that retrieval and reranking components drive most of the gain, whereas the layout-guided chunking strategy - proposed in several recent works to leverage page structure - fails to help on these datasets. Our experiments demonstrate that careful evidence selection consistently boosts accuracy across multiple model sizes and multi-page benchmarks, underscoring its practical value for real-world Document VQA.
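The retrieve-then-generate idea above can be illustrated with a minimal sketch of the retrieval step alone. It ranks text segments by bag-of-words cosine similarity to the question, which is a simple stand-in chosen for illustration and not the retriever or reranker evaluated in the paper. The function name `retrieve_top_k` and the example segments are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def _bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a crude substitute for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(question: str, segments: List[str], k: int = 2) -> List[int]:
    """Return indices of the k segments most similar to the question."""
    query = _bag_of_words(question)
    scored = [(_cosine(query, _bag_of_words(seg)), idx) for idx, seg in enumerate(segments)]
    # Highest score first; ties are broken by original segment order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]


# Hypothetical OCR text from three pages of a document.
pages = [
    "Invoice number 4821 issued to Acme Corp",
    "Terms and conditions of sale",
    "Total amount due: 310 euros, payable within 30 days",
]
top = retrieve_top_k("What is the total amount due?", pages, k=1)
print(top)
```

Only the selected segments would then be passed to the answer generator, instead of concatenating every page.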

23 Oct 2025

computer-science computer-vision-and-pattern-recognition few-shot-learning

EchoDistill: Bidirectional Concept Distillation for One-Step Diffusion Personalization

City University of Hong Kong Nankai University Harbin Institute of Technology (Shenzhen) Universitat Autònoma de Barcelona Computer Vision Center City University of Hong Kong (Dongguan)

Recent advances in accelerating text-to-image (T2I) diffusion models have enabled the synthesis of high-fidelity images even in a single step. However, personalizing these models to incorporate novel concepts remains a challenge due to the limited capacity of one-step models to capture new concept distributions effectively. We propose a bidirectional concept distillation framework, EchoDistill, to enable one-step diffusion personalization (1-SDP). Our approach involves an end-to-end training process where a multi-step diffusion model (teacher) and a one-step diffusion model (student) are trained simultaneously. The concept is first distilled from the teacher model to the student, and then echoed back from the student to the teacher. During EchoDistill, we share the text encoder between the two models to ensure consistent semantic understanding. Following this, the student model is optimized with adversarial losses to align with the real image distribution and with alignment losses to maintain consistency with the teacher's output. Furthermore, we introduce the bidirectional echoing refinement strategy, wherein the student model leverages its faster generation capability to provide feedback to the teacher model. This bidirectional concept distillation mechanism not only enhances the student's ability to personalize novel concepts but also improves the generative quality of the teacher model. Our experiments demonstrate that this collaborative framework significantly outperforms existing personalization methods in the 1-SDP setup, establishing a novel paradigm for rapid and effective personalization in T2I diffusion models.

23 Oct 2025

computer-science computer-vision-and-pattern-recognition generative-models

GenColorBench: A Color Evaluation Benchmark for Text-to-Image Generation Models

Computer Vision Center

Recent years have seen impressive advances in text-to-image generation, with image generative or unified models producing high-quality images from text. Yet these models still struggle with fine-grained color controllability, often failing to accurately match colors specified in text prompts. While existing benchmarks evaluate compositional reasoning and prompt adherence, none systematically assess color precision. Color is fundamental to human visual perception and communication, critical for applications from art to design workflows requiring brand consistency. However, current benchmarks either neglect color or rely on coarse assessments, missing key capabilities such as interpreting RGB values or aligning with human expectations. To this end, we propose GenColorBench, the first comprehensive benchmark for text-to-image color generation, grounded in color systems like ISCC-NBS and CSS3/X11, including numerical colors which are absent elsewhere. With 44K color-focused prompts covering 400+ colors, it reveals models' true capabilities via perceptual and automated assessments. Evaluations of popular text-to-image models using GenColorBench show performance variations, highlighting which color conventions models understand best and identifying failure modes. Our GenColorBench assessments will guide improvements in precise color generation. The benchmark will be made public upon acceptance.

139

30 Jul 2025

computer-science computer-vision-and-pattern-recognition

ComicsPAP: understanding comic strips by picking the correct panel

University of Florence Universitat Autònoma de Barcelona

Researchers from the Computer Vision Center at the Autonomous University of Barcelona and Media Integration and Communication Center at the University of Florence developed ComicsPAP, a large-scale benchmark designed to evaluate Large Multimodal Models (LMMs) on complex comic strip comprehension tasks. The benchmark reveals that LMMs perform near random chance in zero-shot settings but achieve substantial accuracy improvements through targeted fine-tuning, with a smaller 7B parameter model surpassing a 72B parameter model's zero-shot performance.

15 Sep 2025

computer-science machine-learning mathematics

The Morgan-Pitman Test of Equality of Variances and its Application to Machine Learning Model Evaluation and Selection

Universitat Politècnica de Catalunya Universitat Autònoma de Barcelona Centro de Matemática Centro de Matemática, Facultad de Ciencias

Researchers developed an enhanced Morgan-Pitman test to compare the equality of prediction error variances for machine learning models, offering a statistically sound criterion for model evaluation and selection. This robust method, which handles non-normal errors and dependent residuals, helps identify simpler, more generalizable models that exhibit equivalent predictive stability.
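As background, the classical Morgan-Pitman test (not the enhanced variant the summary refers to) rests on the identity Cov(X + Y, X - Y) = Var(X) - Var(Y) for paired samples, so equal variances correspond to zero correlation between the sums and the differences. The sketch below computes that correlation and the associated t statistic, which under the null hypothesis and bivariate normality follows a Student's t distribution with n - 2 degrees of freedom. The function name and the error values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def morgan_pitman_statistic(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (r, t) for the classical Morgan-Pitman test on paired samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two paired samples of equal length, at least 3")
    # Sums and differences of the paired observations.
    u = [a + b for a, b in zip(x, y)]
    v = [a - b for a, b in zip(x, y)]
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    ss_u = sum((ui - mean_u) ** 2 for ui in u)
    ss_v = sum((vi - mean_v) ** 2 for vi in v)
    if ss_u == 0.0 or ss_v == 0.0:
        raise ValueError("sums or differences are constant; statistic undefined")
    # Pearson correlation between sums and differences.
    r = cov / math.sqrt(ss_u * ss_v)
    if abs(r) >= 1.0:
        raise ValueError("perfect correlation; t statistic undefined")
    # t statistic with n - 2 degrees of freedom under the null of equal variances.
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return r, t


# Hypothetical prediction errors of two models on the same six test points.
errors_a = [2.0, -1.5, 1.0, -2.5, 0.5, 1.5]
errors_b = [0.3, 0.1, -0.4, 0.2, -0.1, -0.2]

r, t = morgan_pitman_statistic(errors_a, errors_b)
print(r, t)
```

A positive `r` means the first sample has the larger sample variance. Turning `t` into a p-value requires a Student's t distribution function, which the Python standard library does not provide.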
