alphaXiv

History

Papers Benchmarks

Universitat Politècnica de Catalunya

732

10 Oct 2025

agentic-frameworks agents chain-of-thought

Fundamentals of Building Autonomous LLM Agents

Universitat Politècnica de Catalunya Technische Universität München

This paper offers a comprehensive review of the fundamental architectural components—Perception, Reasoning, Memory, and Execution—necessary for building autonomous Large Language Model (LLM) agents. It synthesizes state-of-the-art techniques to enhance the execution of complex automation tasks, providing a structured framework for future development.

1,263

09 Oct 2025

astrophysics-of-galaxies physics

A Comprehensive Characterization of Galaxy-cool CGM Connections at $z<0.4$ with DESI Year 1 Data

Academia Sinica National Astronomical Observatory of Japan

UC Berkeley

University College London National Taiwan University

University of Michigan

Boston University Kavli Institute for the Physics and Mathematics of the Universe The University of Texas at Dallas

Lawrence Berkeley National Laboratory

Sorbonne Université Fermi National Accelerator Laboratory Universitat Politècnica de Catalunya University of Portsmouth

The Ohio State University Sejong University Universidad Nacional Autónoma de México Universitat Autònoma de Barcelona

University of California, Santa Cruz NSF NOIRLab Universidad de Los Andes University of Wyoming CIEMAT Institut de Física d’Altes Energies (IFAE)Institució Catalana de Recerca i Estudis Avançats Siena College Instituto Astrofisica de Canarias Institute of Space Sciences (ICE–CSIC)Universit degli Studi di Milano INAF Osservatorio Astronomico di Brera

This study comprehensively characterized the cool circumgalactic medium (CGM) around galaxies at redshifts below 0.4 using data from the Dark Energy Spectroscopic Instrument (DESI) Year 1 survey. It reveals persistent correlations between cool gas absorption and galaxy properties like stellar mass and star formation rate, along with an unexpected absence of azimuthal anisotropy, indicating a possible evolution in CGM dynamics at lower redshifts.

1,506

13 Oct 2024

computer-science computation-and-language explainable-ai

A Primer on the Inner Workings of Transformer-based Language Models

Meta Universitat Politècnica de Catalunya

University of Groningen

Gabriele Sarti

Javi Ferrando

This primer provides a comprehensive technical introduction to interpreting transformer-based language models, particularly generative decoder-only architectures, by consolidating current techniques and systematically mapping discovered internal mechanisms and behaviors across model components.

757

21 Apr 2025

computer-science machine-learning deep-reinforcement-learning

Simplifying Deep Temporal Difference Learning

University of Oxford Universitat Politècnica de Catalunya Barcelona Supercomputing Center Institut de Ciències del Mar

This paper accelerates and simplifies deep Temporal Difference (TD) learning by theoretically demonstrating that LayerNormalization and L2 regularization enable provably convergent off-policy Q-learning without target networks or large replay buffers. The resulting Parallelized Q-Network (PQN) achieves competitive performance with state-of-the-art algorithms on single and multi-agent benchmarks, while being up to 50x faster due to its design for end-to-end GPU training.

186

5,048

26 Mar 2025

computer-science hardware-architecture

Analyzing Modern NVIDIA GPU cores

Universitat Politècnica de Catalunya

A detailed reverse engineering analysis of modern NVIDIA GPU cores uncovers key microarchitectural details and provides an updated simulation model in Accel-sim, reducing mean absolute percentage error in execution cycle predictions while revealing compiler-guided scheduling policies and software-based dependency management approaches used in current GPU designs.

271

30 Jan 2024

computer-science machine-learning neural-and-evolutionary-computing

ENN: A Neural Network with DCT Adaptive Activation Functions

Universitat Politècnica de Catalunya Centre Tecnològic de Telecomunicacions de Catalunya ICREA Acadèmia

The Expressive Neural Network (ENN) introduces a neural network architecture where activation functions are dynamically adapted during training using the Discrete Cosine Transform (DCT). This approach enables the network to achieve significantly higher accuracy in complex classification tasks (up to 40% improvement in some scenarios) and orders of magnitude lower Mean Squared Error in regression compared to models with fixed activation functions.

1,980

12 Feb 2025

computer-science artificial-intelligence evolutionary-algorithms

Metaheuristics and Large Language Models Join Forces: Toward an Integrated Optimization Approach

Universitat Politècnica de Catalunya Artificial Intelligence Research Institute (IIIA-CSIC)

Camilo Chacón Sartori

Christian Blum

Since the rise of Large Language Models (LLMs) a couple of years ago, researchers in metaheuristics (MHs) have wondered how to use their power in a beneficial way within their algorithms. This paper introduces a novel approach that leverages LLMs as pattern recognition tools to improve MHs. The resulting hybrid method, tested in the context of a social network-based combinatorial optimization problem, outperforms existing state-of-the-art approaches that combine machine learning with MHs regarding the obtained solution quality. By carefully designing prompts, we demonstrate that the output obtained from LLMs can be used as problem knowledge, leading to improved results. Lastly, we acknowledge LLMs' potential drawbacks and limitations and consider it essential to examine them to advance this type of research further. Our method can be reproduced using a tool available at: this https URL

18 Aug 2025

agentic-frameworks agents computer-science

Dissecting the SWE-Bench Leaderboards: Profiling Submitters and Architectures of LLM- and Agent-Based Repair Systems

Universitat Politècnica de Catalunya

Researchers from Universitat Politècnica de Catalunya conducted a detailed empirical analysis of the SWE-Bench leaderboards, profiling 80 unique LLM- and agent-based repair systems submitted by diverse entities. The study reveals a dominance of industrial contributions and proprietary LLMs for top performance, along with insights into prevalent architectural patterns and pipeline strategies for automated program repair.

08 Jul 2025

computer-science software-engineering

Multi-Agent Debate Strategies to Enhance Requirements Engineering with Large Language Models

Universitat Politècnica de Catalunya

Researchers at Universitat Politècnica de Catalunya introduce Multi-Agent Debate (MAD) strategies to improve Large Language Model performance in Requirements Engineering tasks. Their empirical evaluation demonstrated that MAD statistically outperforms single-agent LLM approaches in requirements classification, achieving F1-score improvements of over 10 points, despite incurring significantly higher computational costs.

148

11 Jul 2025

computer-science distributed-parallel-and-cluster-computing machine-learning

Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference

IBM Research Universitat Politècnica de Catalunya Barcelona Supercomputing Center

Large-batch LLM inference for smaller models on GPUs remains memory-bound due to DRAM bandwidth saturation in the attention mechanism, not compute-bound as commonly assumed, leading to significant GPU underutilization. The research introduces a Batching Configuration Advisor (BCA) and model replication, achieving up to 33.7% throughput increase for OPT-1.3B by efficiently utilizing freed memory.

13 Jan 2023

computer-science artificial-intelligence machine-learning

TransfQMix: Transformers for Leveraging the Graph Structure of Multi-Agent Reinforcement Learning Problems

CSIC Universitat Politècnica de Catalunya

Coordination is one of the most difficult aspects of multi-agent reinforcement learning (MARL). One reason is that agents normally choose their actions independently of one another. In order to see coordination strategies emerging from the combination of independent policies, the recent research has focused on the use of a centralized function (CF) that learns each agent's contribution to the team reward. However, the structure in which the environment is presented to the agents and to the CF is typically overlooked. We have observed that the features used to describe the coordination problem can be represented as vertex features of a latent graph structure. Here, we present TransfQMix, a new approach that uses transformers to leverage this latent structure and learn better coordination policies. Our transformer agents perform a graph reasoning over the state of the observable entities. Our transformer Q-mixer learns a monotonic mixing-function from a larger graph that includes the internal and external states of the agents. TransfQMix is designed to be entirely transferable, meaning that same parameters can be used to control and train larger or smaller teams of agents. This enables to deploy promising approaches to save training time and derive general policies in MARL, such as transfer learning, zero-shot transfer, and curriculum learning. We report TransfQMix's performances in the Spread and StarCraft II environments. In both settings, it outperforms state-of-the-art Q-Learning models, and it demonstrates effectiveness in solving problems that other methods can not solve.

480

16 Apr 2024

attention-mechanisms computer-science artificial-intelligence

Information Flow Routes: Automatically Interpreting Language Models at Scale

Meta Universitat Politècnica de Catalunya

This research introduces an attribution-based method for interpreting Large Language Models by tracing "information flow routes," enabling automated, scalable analysis of internal computations. The approach provides a 100x speedup over existing patching methods and uncovers general and domain-specific component specialization in Llama 2-7B, offering a more comprehensive understanding of its internal mechanisms.

840

29 Mar 2025

computer-science artificial-intelligence computation-and-language

Seeing Eye to AI: Human Alignment via Gaze-Based Response Rewards for Large Language Models

Amazon Universitat Politècnica de Catalunya Telefónica Scientific Research

Advancements in Natural Language Processing (NLP), have led to the emergence of Large Language Models (LLMs) such as GPT, Llama, Claude, and Gemini, which excel across a range of tasks but require extensive fine-tuning to align their outputs with human expectations. A widely used method for achieving this alignment is Reinforcement Learning from Human Feedback (RLHF), which, despite its success, faces challenges in accurately modelling human preferences. In this paper, we introduce GazeReward, a novel framework that integrates implicit feedback -- and specifically eye-tracking (ET) data -- into the Reward Model (RM). In addition, we explore how ET-based features can provide insights into user preferences. Through ablation studies we test our framework with different integration methods, LLMs, and ET generator models, demonstrating that our approach significantly improves the accuracy of the RM on established human preference datasets. This work advances the ongoing discussion on optimizing AI alignment with human values, exploring the potential of cognitive data for shaping future NLP research.

125

09 Sep 2023

computer-science computation-and-language efficient-transformers

Neurons in Large Language Models: Dead, N-gram, Positional

Meta Universitat Politècnica de Catalunya

Analyzing feed-forward network (FFN) neurons in large language models using a lightweight, single-GPU approach, this research categorizes neuron types and demonstrates that FFN neurons explicitly suppress information about their triggering tokens, a mechanism distinct from previous understandings of information addition. The study found many "dead" neurons in early layers and identified neurons dedicated to positional information, with behaviors evolving as models scale.

15 Sep 2025

computer-science machine-learning mathematics

The Morgan-Pitman Test of Equality of Variances and its Application to Machine Learning Model Evaluation and Selection

Universitat Politècnica de Catalunya Universitat Autònoma de Barcelona Centro de Matemática Centro de Matemática, Facultad de Ciencias

Researchers developed an enhanced Morgan-Pitman test to compare the equality of prediction error variances for machine learning models, offering a statistically sound criterion for model evaluation and selection. This robust method, which handles non-normal errors and dependent residuals, helps identify simpler, more generalizable models that exhibit equivalent predictive stability.

200

07 Oct 2022

computer-science machine-learning networking-and-internet-architecture

Deep Reinforcement Learning meets Graph Neural Networks: exploring a routing optimization use case

Universitat Politècnica de Catalunya AGH University of Science and Technology

Deep Reinforcement Learning (DRL) has shown a dramatic improvement in decision-making and automated control problems. Consequently, DRL represents a promising technique to efficiently solve many relevant optimization problems (e.g., routing) in self-driving networks. However, existing DRL-based solutions applied to networking fail to generalize, which means that they are not able to operate properly when applied to network topologies not observed during training. This lack of generalization capability significantly hinders the deployment of DRL technologies in production networks. This is because state-of-the-art DRL-based networking solutions use standard neural networks (e.g., fully connected, convolutional), which are not suited to learn from information structured as graphs. In this paper, we integrate Graph Neural Networks (GNN) into DRL agents and we design a problem specific action space to enable generalization. GNNs are Deep Learning models inherently designed to generalize over graphs of different sizes and structures. This allows the proposed GNN-based DRL agent to learn and generalize over arbitrary network topologies. We test our DRL+GNN agent in a routing optimization use case in optical networks and evaluate it on 180 and 232 unseen synthetic and real-world network topologies respectively. The results show that the DRL+GNN agent is able to outperform state-of-the-art solutions in topologies never seen during training.

21 Aug 2025

attention-mechanisms computer-science contrastive-learning

GALA: Guided Attention with Language Alignment for Open Vocabulary Gaussian Splatting

ETH Zurich

Google

Technical University of Munich Universitat Politècnica de Catalunya Munich Center for Machine Learning VisualAIs University of T¨ubingen

3D scene reconstruction and understanding have gained increasing popularity, yet existing methods still struggle to capture fine-grained, language-aware 3D representations from 2D images. In this paper, we present GALA, a novel framework for open-vocabulary 3D scene understanding with 3D Gaussian Splatting (3DGS). GALA distills a scene-specific 3D instance feature field via self-supervised contrastive learning. To extend to generalized language feature fields, we introduce the core contribution of GALA, a cross-attention module with two learnable codebooks that encode view-independent semantic embeddings. This design not only ensures intra-instance feature similarity but also supports seamless 2D and 3D open-vocabulary queries. It reduces memory consumption by avoiding per-Gaussian high-dimensional feature learning. Extensive experiments on real-world datasets demonstrate GALA's remarkable open-vocabulary performance on both 2D and 3D.

27 Sep 2025

physics quantum-physics

Do quantum linear solvers offer advantage for networks-based system of linear equations?

Universitat Politècnica de Catalunya Academy of Scientific and Innovative Research (AcSIR)Qilimanjaro Quantum Tech National Institute of Technology Agartala Centre for Quantum Engineering, Research and Education, TCG CREST Deloitte Tohmatsu Financial Advisory LLC

In this exploratory numerical study, we assess the suitability of Quantum Linear Solvers (QLSs) toward providing a quantum advantage for Networks-based Linear System Problems (NLSPs). NLSPs are of importance as they are naturally connected to real-world applications. In an NLSP, one starts with a graph and arrives at a system of linear equations. The advantage that one may obtain with a QLS for an NLSP is determined by the interplay between three variables: the scaling of condition number and sparsity functions of matrices associated with the graphs considered, as well as the function describing the system size growth. We recommend graph families that can offer potential for an exponential advantage (best graph families) and those that offer sub-exponential but at least polynomial advantage (better graph families), with the HHL algorithm considered relative to an efficient classical linear solver. Within the scope of our analyses, we observe that only 4% of the 50 considered graph families offer prospects for an exponential advantage, whereas about 20% of the considered graph families show a polynomial advantage. Furthermore, we observe and report some interesting cases where some graph families not only fare better with improved algorithms such as the Childs-Kothari-Somma algorithm but also graduate from offering no advantage to promising a polynomial advantage, graph families that exhibit futile exponential advantage, etc. Given the limited number of graph families that one can survey through numerical studies, we discuss an interesting case where we unify several graph families into one superfamily, and show the existence of infinite best and better graphs in it. Lastly, we very briefly touch upon some practical issues that one may face even if the aforementioned graph theoretic requirements are satisfied, including quantum hardware challenges.

07 Aug 2025

computer-science computation-and-language efficient-transformers

Towards Pareto Optimal Throughput in Small Language Model Serving

IBM Research Universitat Politècnica de Catalunya Barcelona Supercomputing Center

This research comprehensively benchmarks Small Language Models (SLMs) to reveal that they achieve Pareto-optimal throughput on a single high-end GPU, unlike larger models that require distributed systems. The study demonstrates that replicating SLMs on a single accelerator significantly boosts overall throughput and GPU utilization, offering a new approach to efficient serving.

23 Jul 2024

computer-science machine-learning neural-and-evolutionary-computing

Towards a "universal translator" for neural dynamics at single-cell, single-spike resolution

Georgia Institute of Technology

Stanford University

Columbia University Universitat Politècnica de Catalunya Champalimaud Foundation Mila, McGill University

Neuroscience research has made immense progress over the last decade, but our understanding of the brain remains fragmented and piecemeal: the dream of probing an arbitrary brain region and automatically reading out the information encoded in its neural activity remains out of reach. In this work, we build towards a first foundation model for neural spiking data that can solve a diverse set of tasks across multiple brain areas. We introduce a novel self-supervised modeling approach for population activity in which the model alternates between masking out and reconstructing neural activity across different time steps, neurons, and brain regions. To evaluate our approach, we design unsupervised and supervised prediction tasks using the International Brain Laboratory repeated site dataset, which is comprised of Neuropixels recordings targeting the same brain locations across 48 animals and experimental sessions. The prediction tasks include single-neuron and region-level activity prediction, forward prediction, and behavior decoding. We demonstrate that our multi-task-masking (MtM) approach significantly improves the performance of current state-of-the-art population models and enables multi-task learning. We also show that by training on multiple animals, we can improve the generalization ability of the model to unseen animals, paving the way for a foundation model of the brain at single-cell, single-spike resolution.

There are no more papers matching your filters at the moment.

Events

Personalize Your Feed

Install Browser Extension

We're hiring

alphaXiv

Explore

State of the Art

Sign In

Labs

Feedback

Dark mode

Fundamentals of Building Autonomous LLM Agents

A Comprehensive Characterization of Galaxy-cool CGM Connections at $z<0.4$ with DESI Year 1 Data

A Primer on the Inner Workings of Transformer-based Language Models

Simplifying Deep Temporal Difference Learning

Analyzing Modern NVIDIA GPU cores

ENN: A Neural Network with DCT Adaptive Activation Functions

Metaheuristics and Large Language Models Join Forces: Toward an Integrated Optimization Approach

Dissecting the SWE-Bench Leaderboards: Profiling Submitters and Architectures of LLM- and Agent-Based Repair Systems

Multi-Agent Debate Strategies to Enhance Requirements Engineering with Large Language Models

Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference

TransfQMix: Transformers for Leveraging the Graph Structure of Multi-Agent Reinforcement Learning Problems

Information Flow Routes: Automatically Interpreting Language Models at Scale

Seeing Eye to AI: Human Alignment via Gaze-Based Response Rewards for Large Language Models

Neurons in Large Language Models: Dead, N-gram, Positional

The Morgan-Pitman Test of Equality of Variances and its Application to Machine Learning Model Evaluation and Selection

Deep Reinforcement Learning meets Graph Neural Networks: exploring a routing optimization use case

GALA: Guided Attention with Language Alignment for Open Vocabulary Gaussian Splatting

Do quantum linear solvers offer advantage for networks-based system of linear equations?

Towards Pareto Optimal Throughput in Small Language Model Serving

Towards a "universal translator" for neural dynamics at single-cell, single-spike resolution

Events

AI for Law

Personalize Your Feed

alphaXiv

Explore

State of the Art

Sign In

Labs

Feedback

Dark mode

Fundamentals of Building Autonomous LLM Agents

A Comprehensive Characterization of Galaxy-cool CGM Connections at z&lt;0.4 with DESI Year 1 Data

A Primer on the Inner Workings of Transformer-based Language Models

Simplifying Deep Temporal Difference Learning

Analyzing Modern NVIDIA GPU cores

ENN: A Neural Network with DCT Adaptive Activation Functions

Metaheuristics and Large Language Models Join Forces: Toward an Integrated Optimization Approach

Dissecting the SWE-Bench Leaderboards: Profiling Submitters and Architectures of LLM- and Agent-Based Repair Systems

Multi-Agent Debate Strategies to Enhance Requirements Engineering with Large Language Models

Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference

TransfQMix: Transformers for Leveraging the Graph Structure of Multi-Agent Reinforcement Learning Problems

Information Flow Routes: Automatically Interpreting Language Models at Scale

Seeing Eye to AI: Human Alignment via Gaze-Based Response Rewards for Large Language Models

Neurons in Large Language Models: Dead, N-gram, Positional

The Morgan-Pitman Test of Equality of Variances and its Application to Machine Learning Model Evaluation and Selection

Deep Reinforcement Learning meets Graph Neural Networks: exploring a routing optimization use case

GALA: Guided Attention with Language Alignment for Open Vocabulary Gaussian Splatting

Do quantum linear solvers offer advantage for networks-based system of linear equations?

Towards Pareto Optimal Throughput in Small Language Model Serving

Towards a "universal translator" for neural dynamics at single-cell, single-spike resolution

Events

AI for Law

Personalize Your Feed

A Comprehensive Characterization of Galaxy-cool CGM Connections at $z<0.4$ with DESI Year 1 Data