Idiap Research Institute
This paper introduces linear transformers, which reduce the computational and memory complexity of the Transformer architecture from quadratic to linear in sequence length by reformulating self-attention with kernel feature maps and exploiting matrix associativity. This approach enables autoregressive inference that is thousands of times faster and efficient processing of very long sequences, while maintaining competitive performance on tasks such as image generation and speech recognition.
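A minimal sketch of the reordering that makes attention linear (the elu(x)+1 feature map and the running-sum formulation below are common choices for kernelized attention, not necessarily the paper's exact implementation):

```python
import numpy as np

def feature_map(x):
    # Positive feature map elu(x) + 1, a common choice for kernelized attention.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Causal kernelized attention in O(N) time by exploiting associativity:
    compute phi(Q) (phi(K)^T V) instead of (phi(Q) phi(K)^T) V, keeping
    running sums for the autoregressive (causal) case."""
    Qf, Kf = feature_map(Q), feature_map(K)        # (N, d)
    S = np.zeros((Qf.shape[1], V.shape[1]))        # running sum of k_i v_i^T
    z = np.zeros(Qf.shape[1])                      # running sum of k_i
    out = np.zeros_like(V)
    for i in range(Q.shape[0]):                    # O(1) work per generated token
        S += np.outer(Kf[i], V[i])
        z += Kf[i]
        out[i] = Qf[i] @ S / (Qf[i] @ z + 1e-6)
    return out

# Toy usage: 8 tokens, head dimension 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8, 4))
print(linear_attention(Q, K, V).shape)  # (8, 4)
```

Because the running sums S and z are updated in place, each autoregressive step costs constant time and memory instead of attending over the whole prefix.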
Researchers at Idiap Research Institute, EPFL, and Harvard University introduce Geometry-aware Policy Imitation (GPI), a method that interprets robotic demonstrations as geometric curves to directly synthesize control policies. It achieves significantly faster inference times and lower memory footprints than diffusion-based policies, maintaining high success rates across various complex robotic tasks in both simulation and real-world settings.
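As a rough illustration of treating demonstrations as geometric curves (the attraction/progression decomposition and the gains below are assumptions, not the authors' algorithm):

```python
import numpy as np

def curve_following_policy(state, demo, k_attract=1.0, k_progress=1.0):
    """Toy geometric policy: pull the state toward the nearest point on a
    demonstration curve and push it along the curve's local tangent.
    `demo` is a (T, d) array of waypoints from one demonstration."""
    dists = np.linalg.norm(demo - state, axis=1)
    i = int(np.argmin(dists))                      # closest waypoint on the curve
    attract = demo[i] - state                      # move back onto the curve
    nxt = demo[min(i + 1, len(demo) - 1)]
    progress = nxt - demo[i]                       # advance along the curve
    return k_attract * attract + k_progress * progress

# Toy usage: follow a semicircular demonstration from a perturbed state
t = np.linspace(0, np.pi, 50)
demo = np.stack([np.cos(t), np.sin(t)], axis=1)
print(curve_following_policy(np.array([1.1, 0.2]), demo))
```

A policy of this form needs no network inference at run time, which is consistent with the fast inference and low memory footprint reported above.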
This work introduces σ-GPTs, a class of autoregressive models that relax the fixed-order generation assumption by training on randomly shuffled sequences. The models leverage double positional encodings and a dynamic token-based rejection sampling mechanism to achieve efficient, sub-linear inference and enable natural infilling capabilities, demonstrating comparable or improved performance against traditional GPTs across various tasks.
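A hedged sketch of how a shuffled-order training example might be assembled, with the two positional encodings (position of the observed token and position of the token to predict) simply added to the token embedding; whether the encodings are added or concatenated, and all layer sizes, are assumptions:

```python
import torch

def shuffled_order_example(tokens, d_model=64, vocab_size=1000):
    """Build one shuffled-order training example: the sequence is permuted, and
    each input position carries two positional encodings -- the position of the
    observed token and the position of the token to be predicted next."""
    n = tokens.size(0)
    perm = torch.randperm(n)
    shuffled = tokens[perm]
    pos_emb = torch.nn.Embedding(n, d_model)
    tok_emb = torch.nn.Embedding(vocab_size, d_model)
    src_pos = perm[:-1]            # original positions of observed tokens
    tgt_pos = perm[1:]             # original positions of tokens to predict
    x = tok_emb(shuffled[:-1]) + pos_emb(src_pos) + pos_emb(tgt_pos)
    y = shuffled[1:]               # targets, in the permuted order
    return x, y

x, y = shuffled_order_example(torch.randint(0, 1000, (10,)))
print(x.shape, y.shape)  # torch.Size([9, 64]) torch.Size([9])
```

Training on many random permutations is what lets the model later generate tokens in any order, including infilling.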
Diffusion models have emerged as a powerful paradigm for generation, obtaining strong performance in various continuous domains. However, applying continuous diffusion models to natural language remains challenging due to its discrete nature and the need for a large number of diffusion steps to generate text, making diffusion-based generation expensive. In this work, we propose Text-to-text Self-conditioned Simplex Diffusion (TESS), a text diffusion model that is fully non-autoregressive, employs a new form of self-conditioning, and applies the diffusion process on the logit simplex space rather than the learned embedding space. Through extensive experiments on natural language understanding and generation tasks including summarization, text simplification, paraphrase generation, and question generation, we demonstrate that TESS outperforms state-of-the-art non-autoregressive models, requires fewer diffusion steps with minimal drop in performance, and is competitive with pretrained autoregressive sequence-to-sequence models. We publicly release our codebase at this https URL.
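A hedged sketch of diffusion over the logit simplex: tokens are mapped to near-one-hot logits and progressively noised; the scale k and the noise schedule below are illustrative choices, not TESS's exact formulation:

```python
import torch
import torch.nn.functional as F

def to_simplex_logits(token_ids, vocab_size, k=5.0):
    """Map discrete tokens to 'almost one-hot' logits: +k for the true token,
    -k elsewhere, so diffusion runs in logit space rather than embedding space."""
    onehot = F.one_hot(token_ids, vocab_size).float()
    return k * (2.0 * onehot - 1.0)

def noised_logits(logits0, t, T=1000):
    """Forward process: interpolate toward Gaussian noise at step t
    (a simple linear alpha-bar schedule; the real schedule is a design choice)."""
    alpha_bar = 1.0 - t / T
    noise = torch.randn_like(logits0)
    return alpha_bar ** 0.5 * logits0 + (1 - alpha_bar) ** 0.5 * noise

ids = torch.tensor([3, 17, 42])
x0 = to_simplex_logits(ids, vocab_size=100)
xt = noised_logits(x0, t=500)
print(F.softmax(xt, dim=-1).argmax(-1))  # noisy reconstruction of the tokens
```

Self-conditioning would additionally feed the model's own previous logit prediction back in at each denoising step.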
HyperMixer introduces an MLP-based neural network architecture that offers a cost-effective alternative to Transformers for natural language understanding. This model achieves competitive or superior performance across several GLUE benchmark tasks while significantly reducing computational expense, data requirements, and tuning effort by using hypernetworks to dynamically generate an attention-like token mixing function.
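A minimal sketch of hypernetwork-generated token mixing (tied hypernetworks, the absence of positional information, and the layer sizes are assumptions):

```python
import torch
import torch.nn as nn

class HyperTokenMixer(nn.Module):
    """Sketch of hypernetwork-generated token mixing: an MLP maps each token
    embedding to one row of the mixing matrices, so the attention-like mixing
    adapts to the input and to variable sequence lengths."""
    def __init__(self, d_model=64, d_hidden=128):
        super().__init__()
        self.hyper = nn.Sequential(nn.Linear(d_model, d_hidden),
                                   nn.GELU(),
                                   nn.Linear(d_hidden, d_hidden))
        self.act = nn.GELU()

    def forward(self, x):               # x: (N, d_model), one sequence
        w = self.hyper(x)               # (N, d_hidden), one row per token
        mixed = w @ self.act(w.T @ x)   # mix information across tokens
        return mixed                    # (N, d_model)

x = torch.randn(10, 64)
print(HyperTokenMixer()(x).shape)  # torch.Size([10, 64])
```

Because the mixing matrices are produced from the tokens themselves, the layer costs time linear in sequence length rather than quadratic.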
Neuro-symbolic NLP methods aim to leverage the complementary strengths of large language models and formal logical solvers. However, current approaches are mostly static in nature, i.e., the integration of a target solver is predetermined at design time, hindering the ability to employ diverse formal inference strategies. To address this, we introduce an adaptive, multi-paradigm, neuro-symbolic inference framework that: (1) automatically identifies formal reasoning strategies from problems expressed in natural language; and (2) dynamically selects and applies specialized formal logical solvers via autoformalization interfaces. Extensive experiments on individual and multi-paradigm reasoning tasks support the following conclusions: LLMs are effective at predicting the necessary formal reasoning strategies with an accuracy above 90 percent. This enables flexible integration with formal logical solvers, resulting in our framework outperforming competing baselines by 27 percent and 6 percent compared to GPT-4o and DeepSeek-V3.1, respectively. Moreover, adaptive reasoning can even positively impact pure LLM methods, yielding gains of 10, 5, and 6 percent on zero-shot, CoT, and symbolic CoT settings with GPT-4o. Finally, although smaller models struggle with adaptive neuro-symbolic reasoning, post-training offers a viable path to improvement. Overall, this work establishes the foundations for adaptive LLM-symbolic reasoning, offering a path forward for unifying material and formal inferences on heterogeneous reasoning challenges.
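A hedged sketch of the two-step adaptive loop described above; call_llm, autoformalize, the strategy list, and the solver names are placeholders, not the framework's actual interface:

```python
# (1) ask an LLM which formal reasoning strategy a problem needs,
# (2) autoformalize the problem and dispatch it to a matching solver.
STRATEGIES = {"SAT": "sat_solver", "FOL": "first_order_prover",
              "CSP": "constraint_solver", "LP": "linear_programming"}

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def autoformalize(problem: str, strategy: str) -> str:
    return call_llm(f"Translate this problem into a {strategy} specification:\n{problem}")

def solve(problem: str):
    strategy = call_llm(
        "Which formal reasoning strategy fits this problem? "
        f"Answer with one of {list(STRATEGIES)}.\n{problem}").strip()
    spec = autoformalize(problem, strategy)
    solver = STRATEGIES.get(strategy, "first_order_prover")
    return solver, spec   # hand the specification to the selected external solver
```

The key design point is that the solver is chosen per problem at inference time rather than fixed when the system is built.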
Researchers from IIT, Queen Mary University of London, and Idiap/EPFL developed a unified formulation for visual affordance prediction and introduced the "Affordance Sheet" to promote reporting standards. This work systematically reviews existing methodologies and datasets, critically identifying pervasive reproducibility challenges across the field.
FantasyID introduces a publicly available dataset designed for detecting digital manipulations of ID documents, featuring genuinely pristine bonafide cards, real human faces, and diverse digital forgeries across multiple languages. It provides a robust and unbiased benchmark to develop and evaluate algorithms for combating the growing threat of forged identity documents in applications like KYC.
A study from Idiap, EPFL, and NVIDIA rigorously assessed how well current speech enhancement (SE) models perform on pathological speech, revealing that pre-training on large neurotypical datasets followed by fine-tuning with smaller pathological datasets substantially enhances performance, improving generative model intelligibility (∆E-STOI) from 0.01 to 0.17.
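∆E-STOI is commonly computed as the extended STOI of the enhanced signal minus that of the unprocessed noisy input, both scored against the clean reference; a sketch using the pystoi package (the study's exact evaluation protocol may differ):

```python
import numpy as np
from pystoi import stoi  # pip install pystoi

def delta_estoi(clean, noisy, enhanced, fs=16000):
    """Intelligibility gain of the enhanced signal over the noisy input,
    measured with extended STOI against the clean reference."""
    return (stoi(clean, enhanced, fs, extended=True)
            - stoi(clean, noisy, fs, extended=True))

# Toy usage with synthetic signals (real evaluation uses paired recordings)
rng = np.random.default_rng(0)
clean = rng.normal(size=32000)
noisy = clean + 0.5 * rng.normal(size=32000)
enhanced = clean + 0.1 * rng.normal(size=32000)
print(round(delta_estoi(clean, noisy, enhanced), 3))
```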
Researchers from the University of Manchester and Idiap Research Institute developed Explanation-Refiner, a neuro-symbolic framework for verifying and refining natural language explanations (NLEs) in Natural Language Inference (NLI) tasks using Large Language Models (LLMs) and a symbolic theorem prover. This approach iteratively improved GPT-4's logical validity on the e-SNLI dataset from 36% to 84% and substantially reduced syntactic errors in formalisations.
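A hedged sketch of the iterative verify-and-refine loop; the llm and prove callables stand in for the framework's LLM client and theorem-prover interface, and the prompts are illustrative:

```python
def refine_explanation(premise, hypothesis, explanation, llm, prove, max_iters=5):
    """Formalize the explanation, check it with a theorem prover, and feed prover
    errors back to the LLM until the proof succeeds or the budget is exhausted."""
    for _ in range(max_iters):
        theory = llm(f"Formalize as axioms and a theorem:\n"
                     f"Premise: {premise}\nHypothesis: {hypothesis}\n"
                     f"Explanation: {explanation}")
        ok, error = prove(theory)           # (proof succeeded?, prover feedback)
        if ok:
            return explanation, theory      # logically valid explanation
        explanation = llm(f"The proof failed with:\n{error}\n"
                          f"Rewrite the explanation so the inference holds:\n{explanation}")
    return explanation, None                # still unverified after the budget
```

The prover's error messages are what give the LLM a concrete, symbolic signal about where the explanation breaks down.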
Multimodal large language models (MLLMs) have shown remarkable performance in vision-language tasks. However, existing MLLMs are primarily trained on generic datasets, limiting their ability to reason on domain-specific visual cues such as those in facial images. In particular, tasks that require detailed understanding of facial structure, expression, emotion, and demographic features remain underexplored by MLLMs due to the lack of large-scale annotated face image-text datasets. In this work, we introduce FaceLLM, a multimodal large language model trained specifically for facial image understanding. To construct the training data, we propose a novel weakly supervised pipeline that uses ChatGPT with attribute-aware prompts to generate high-quality question-answer pairs based on images from the FairFace dataset. The resulting corpus, called FairFaceGPT, covers a diverse set of attributes including expression, pose, skin texture, and forensic information. Our experiments demonstrate that FaceLLM improves the performance of MLLMs on various face-centric tasks and achieves state-of-the-art performance. This work highlights the potential of synthetic supervision via language models for building domain-specialized MLLMs, and sets a precedent for trustworthy, human-centric multimodal AI systems. The FairFaceGPT dataset and pretrained FaceLLM models are publicly available on the project page.
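A hedged illustration of what an attribute-aware prompt in such a weakly supervised pipeline could look like; the wording, the chat client, and the attribute fields shown are assumptions rather than the released FairFaceGPT pipeline:

```python
def attribute_aware_prompt(attributes: dict, n_pairs: int = 5) -> str:
    """Fold an image's attribute labels into the prompt so the language model
    produces question-answer pairs grounded in verified attributes."""
    attr_text = ", ".join(f"{k}: {v}" for k, v in attributes.items())
    return (f"You are shown a face image with these verified attributes: {attr_text}. "
            f"Write {n_pairs} question-answer pairs about expression, pose, skin "
            f"texture, and demographic cues, answering only from the attributes.")

def generate_qa(image_attributes: dict, chat) -> str:
    # `chat` is a placeholder for whichever LLM client generates the pairs.
    return chat(attribute_aware_prompt(image_attributes))

example = {"age": "30-39", "gender": "Female", "race": "East Asian"}
print(attribute_aware_prompt(example))
```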
Researchers from EPFL, Idiap, USI, and University of Amsterdam systematically compared creative short story generation between 60 diverse LLMs and 60 average human writers. Their analysis revealed that LLMs generated more linguistically complex but less novel, surprising, and diverse stories than humans, and also exposed biases in non-expert and LLM evaluators who often mistook complexity for creativity.
In this paper, we introduce CoRet, a dense retrieval model designed for code-editing tasks that integrates code semantics, repository structure, and call graph dependencies. The model focuses on retrieving relevant portions of a code repository based on natural language queries such as requests to implement new features or fix bugs. These retrieved code chunks can then be presented to a user or to a second code-editing model or agent. To train CoRet, we propose a loss function explicitly designed for repository-level retrieval. On SWE-bench and Long Code Arena's bug localisation datasets, we show that our model substantially improves retrieval recall by at least 15 percentage points over existing models, and ablate the design choices to show their importance in achieving these results.
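The paper defines its own repository-level retrieval loss; as a hedged stand-in, an InfoNCE-style objective whose negatives come from chunks of the same repository looks like this:

```python
import torch
import torch.nn.functional as F

def repo_level_contrastive_loss(query_emb, chunk_embs, positive_idx, temperature=0.05):
    """Score a natural-language query against all chunks of one repository and
    apply softmax cross-entropy over them, so negatives are drawn from the
    repository itself rather than from unrelated code."""
    q = F.normalize(query_emb, dim=-1)            # (d,)
    c = F.normalize(chunk_embs, dim=-1)           # (num_chunks, d)
    logits = (c @ q) / temperature                # cosine similarities
    return F.cross_entropy(logits.unsqueeze(0),
                           torch.tensor([positive_idx]))

query = torch.randn(256)
chunks = torch.randn(128, 256)                    # all chunks of one repository
print(repo_level_contrastive_loss(query, chunks, positive_idx=7))
```

Restricting negatives to the same repository forces the model to discriminate between files and functions that share naming conventions and context, which is the hard part of bug localisation.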
The alignment of reasoning abilities between smaller and larger language models is largely achieved via Supervised Fine-Tuning (SFT) on demonstrations generated by strong Large Language Models (LLMs). Although these approaches deliver more performant models, they do not generalize sufficiently well because training relies only on the provided demonstrations. In this paper, we propose the Self-refine Instruction-tuning method, which elicits Smaller Language Models to self-refine their abilities. Our approach is based on a two-stage process: reasoning abilities are first transferred from LLMs to Small Language Models (SLMs) via Instruction-tuning on demonstrations provided by LLMs, and the instructed models then self-refine their abilities through preference optimization. In particular, the second phase applies refinement heuristics based on the Direct Preference Optimization algorithm, in which the SLMs are elicited to produce a series of reasoning paths by sampling their own generated responses, with rewards assigned using ground truths from the LLMs. Results on commonsense and math reasoning tasks show that this approach significantly outperforms Instruction-tuning in both in-domain and out-of-domain scenarios, aligning the reasoning abilities of smaller and larger language models.
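A hedged sketch of how the second stage's preference data could be assembled before running DPO; sample_paths and extract_answer are hypothetical helpers, and the pair selection here is simplified:

```python
def build_dpo_pairs(questions, gold_answers, sample_paths, extract_answer, n_samples=8):
    """For each question, sample several reasoning paths from the instructed SLM,
    check their final answers against the teacher-provided ground truth, and keep
    (chosen, rejected) pairs for Direct Preference Optimization."""
    pairs = []
    for q, gold in zip(questions, gold_answers):
        paths = sample_paths(q, n=n_samples)                 # SLM-generated reasoning paths
        correct = [p for p in paths if extract_answer(p) == gold]
        wrong = [p for p in paths if extract_answer(p) != gold]
        if correct and wrong:
            # reward signal = agreement with the LLM teacher's ground truth
            pairs.append({"prompt": q, "chosen": correct[0], "rejected": wrong[0]})
    return pairs   # fed to a DPO trainer in the second stage
```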
Multimodal large language models (MLLMs) have achieved remarkable performance across diverse vision-and-language tasks. However, their potential in face recognition remains underexplored. In particular, the performance of open-source MLLMs needs to be evaluated and compared with existing face recognition models on standard benchmarks under comparable protocols. In this work, we present a systematic benchmark of state-of-the-art MLLMs for face recognition on several face recognition datasets, including LFW, CALFW, CPLFW, CFP, AgeDB and RFW. Experimental results reveal that while MLLMs capture rich semantic cues useful for face-related tasks, they lag behind specialized models in high-precision recognition scenarios in zero-shot applications. This benchmark provides a foundation for advancing MLLM-based face recognition, offering insights for the design of next-generation models with higher accuracy and generalization. The source code of our benchmark is publicly available on the project page.
This work proposes an industry-level omni-modal large language model (LLM) pipeline that integrates auditory, visual, and linguistic modalities to overcome challenges such as limited tri-modal datasets, high computational costs, and complex feature alignments. Our pipeline consists of three main components: First, a modular framework enabling flexible configuration of various encoder-LLM-decoder architectures. Second, a lightweight training strategy that pre-trains audio-language alignment on the state-of-the-art vision-language model Qwen2.5-VL, thus avoiding the costly pre-training of vision-specific modalities. Third, an audio synthesis pipeline that generates high-quality audio-text data from diverse real-world scenarios, supporting applications such as Automatic Speech Recognition and Speech-to-Speech chat. To this end, we introduce an industry-level omni-modal LLM, Nexus. Extensive experiments validate the efficacy of our pipeline, yielding the following key findings: (1) In the visual understanding task, Nexus exhibits superior performance compared with its backbone model, Qwen2.5-VL-7B, validating the efficiency of our training strategy. (2) In the English Spoken Question-Answering task, the model achieves better accuracy than the same-period competitor (i.e., MiniCPM-o2.6-7B) on the LLaMA Q. benchmark. (3) On our real-world ASR test set, Nexus achieves outstanding performance, indicating its robustness in real scenarios. (4) In the Speech-to-Text Translation task, our model outperforms Qwen2-Audio-Instruct-7B. (5) In the Text-to-Speech task, based on a pretrained vocoder (e.g., Fishspeech1.4 or CosyVoice2.0), Nexus is comparable to its backbone vocoder on the Seed-TTS benchmark. (6) An in-depth analysis of tri-modal alignment reveals that incorporating the audio modality enhances representational alignment between vision and language.
This survey paper from EPFL, USI, and Idiap Research Institute provides a timely overview of the creative capabilities of the latest AI systems, identifying key advancements in linguistic and artistic content generation while highlighting challenges in areas like true originality and abstract reasoning. It systematically reviews machine creativity across diverse domains and advocates for more comprehensive evaluation frameworks.
Our understanding of the generalization capabilities of neural networks (NNs) is still incomplete. Prevailing explanations are based on implicit biases of gradient descent (GD), but they cannot account for the capabilities of models obtained with gradient-free methods, nor for the simplicity bias recently observed in untrained networks. This paper seeks other sources of generalization in NNs. Findings. To understand the inductive biases provided by architectures independently from GD, we examine untrained, random-weight networks. Even simple MLPs show strong inductive biases: uniform sampling in weight space yields a very biased distribution of functions in terms of complexity. But contrary to common wisdom, NNs do not have an inherent "simplicity bias". This property depends on components such as ReLUs, residual connections, and layer normalizations. Alternative architectures can be built with a bias for any level of complexity. Transformers also inherit all these properties from their building blocks. Implications. We provide a fresh explanation for the success of deep learning, independent of gradient-based training, that points at promising avenues for controlling the solutions implemented by trained models.
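One way to reproduce this kind of experiment is to sample random-weight MLPs and inspect how biased the induced distribution over Boolean functions is; the architecture, sampling distribution, and input size below are illustrative choices, not the paper's exact setup:

```python
import numpy as np

def random_mlp(widths, activation=lambda z: np.maximum(z, 0)):
    """Sample an untrained MLP with Gaussian weights and return the Boolean
    function it computes (thresholded scalar output)."""
    rng = np.random.default_rng()
    params = [(rng.normal(size=(m, n)), rng.normal(size=n))
              for m, n in zip(widths[:-1], widths[1:])]
    def f(x):
        for W, b in params[:-1]:
            x = activation(x @ W + b)
        W, b = params[-1]
        return (x @ W + b > 0).astype(int)
    return f

# Estimate the bias of the induced function distribution: sample many random
# networks and count how often each Boolean function on 7 inputs appears.
inputs = np.array([[int(b) for b in f"{i:07b}"] for i in range(128)])
counts = {}
for _ in range(2000):
    f = random_mlp([7, 64, 64, 1])
    key = "".join(map(str, f(inputs).ravel()))    # truth table as a string
    counts[key] = counts.get(key, 0) + 1
print("distinct functions:", len(counts), "| most frequent count:", max(counts.values()))
```

A heavily skewed count distribution indicates an inductive bias toward certain (typically simple) functions; swapping the activation or adding normalization layers changes how skewed it is.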
Recent works have shown the effectiveness of Large Vision Language Models (VLMs or LVLMs) in image manipulation detection. However, text manipulation detection is largely missing in these studies. We bridge this knowledge gap by analyzing closed- and open-source VLMs on different text manipulation datasets. Our results suggest that open-source models are getting closer, but still behind closed-source ones like GPT-4o. Additionally, we benchmark image manipulation detection-specific VLMs for text manipulation detection and show that they suffer from the generalization problem. We benchmark VLMs for manipulations done on in-the-wild scene texts and on fantasy ID cards, where the latter mimic a challenging real-world misuse.
FVAE-LoRA, developed by researchers at Idiap Research Institute and EPFL, introduces a parameter-efficient fine-tuning method that integrates a Factorized Variational Autoencoder into LoRA, explicitly disentangling task-relevant features from other information. This approach enhances robustness to spurious correlations and improves performance, often surpassing standard LoRA and matching full fine-tuning across diverse image, text, and audio tasks.
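Purely as an illustrative guess at how a factorized VAE could sit inside a LoRA branch (this is not the published FVAE-LoRA architecture; all dimensions and the single-factor decoding are assumptions):

```python
import torch
import torch.nn as nn

class FactorizedLoRALayer(nn.Module):
    """Illustrative sketch only: a LoRA branch whose low-rank activation is
    encoded by a small VAE into two latent factors, and only one factor --
    intended to carry task-relevant information -- is decoded back into the
    update added to the frozen layer's output."""
    def __init__(self, d_in, d_out, rank=8, z_dim=4):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=False)
        self.enc_mu = nn.Linear(rank, 2 * z_dim)        # two latent factors
        self.enc_logvar = nn.Linear(rank, 2 * z_dim)
        self.dec = nn.Linear(z_dim, rank)               # decode the task factor only
        self.up = nn.Linear(rank, d_out, bias=False)
        self.z_dim = z_dim

    def forward(self, x, frozen_out):
        h = self.down(x)
        mu, logvar = self.enc_mu(h), self.enc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
        z_task = z[..., : self.z_dim]                   # keep the task-relevant factor
        return frozen_out + self.up(self.dec(z_task))

x = torch.randn(2, 32)
layer = FactorizedLoRALayer(32, 32)
print(layer(x, frozen_out=torch.zeros(2, 32)).shape)    # torch.Size([2, 32])
```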