This paper offers a comprehensive guide to self-supervised learning (SSL), systematizing diverse methods into coherent families and providing practical implementation advice. It aims to make the rapidly evolving field more accessible by distilling historical context, theoretical underpinnings, and empirical best practices for various data modalities.
Treating human motion and camera trajectory generation separately overlooks a core principle of cinematography: the tight interplay between actor performance and camera work in the screen space. In this paper, we are the first to cast this task as text-conditioned joint generation, aiming to maintain consistent on-screen framing while producing two heterogeneous, yet intrinsically linked, modalities: human motion and camera trajectories. We propose a simple, model-agnostic framework that enforces multimodal coherence via an auxiliary modality: the on-screen framing induced by projecting human joints onto the camera. This on-screen framing provides a natural and effective bridge between modalities, promoting consistency and leading to a more precise joint distribution. We first design a joint autoencoder that learns a shared latent space, together with a lightweight linear transform from the human and camera latents to a framing latent. We then introduce auxiliary sampling, which exploits this linear transform to steer generation toward a coherent framing modality. To support this task, we also introduce the PulpMotion dataset, a human-motion and camera-trajectory dataset with rich captions and high-quality human motions. Extensive experiments across DiT- and MAR-based architectures show the generality and effectiveness of our method in generating on-frame coherent human-camera motions, while also achieving gains in textual alignment for both modalities. Our qualitative results yield more cinematographically meaningful framings, setting a new state of the art for this task. Code, models, and data are available on our project page (this https URL).
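As a concrete illustration of the auxiliary framing modality described above, the sketch below projects 3D human joints through a pinhole camera to obtain on-screen coordinates; the function and parameter names (project_joints, fx, fy, cx, cy) and the intrinsics values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def project_joints(joints_world, R, t, fx, fy, cx, cy):
    """Project 3D human joints into the image plane of a camera pose (R, t).

    joints_world: (J, 3) joint positions in world coordinates
    R, t:         camera rotation (3, 3) and translation (3,) for one frame
    Returns (J, 2) pixel coordinates: the on-screen 'framing' signal that
    couples the human-motion and camera-trajectory modalities.
    """
    cam = (R @ joints_world.T).T + t          # world -> camera coordinates
    z = np.clip(cam[:, 2:3], 1e-6, None)      # guard against division by zero
    u = fx * cam[:, 0:1] / z + cx             # perspective projection
    v = fy * cam[:, 1:2] / z + cy
    return np.concatenate([u, v], axis=1)

# Example: one frame with 22 joints, placed ~3 m in front of an identity camera pose.
framing = project_joints(np.random.randn(22, 3) + [0.0, 0.0, 3.0],
                         np.eye(3), np.zeros(3),
                         fx=1000.0, fy=1000.0, cx=512.0, cy=512.0)
```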
Foundation models are designed to serve as versatile embedding machines, with strong zero-shot capabilities and superior generalization performance when fine-tuned on diverse downstream tasks. While this is largely true for language and vision foundation models, we argue that the inherent diversity of time series data makes them less suited for building effective foundation models. We demonstrate this using forecasting as our downstream task. We show that the zero-shot capabilities of a time series foundation model are significantly influenced by, and tied to, the specific domains it has been pretrained on. Furthermore, when applied to unseen real-world time series data, fine-tuned foundation models do not consistently yield substantially better results, relative to their increased parameter count and memory footprint, than smaller, dedicated models tailored to the specific forecasting task at hand.
Quadrotors can carry slung loads to hard-to-reach locations at high speed. Since a single quadrotor has limited payload capacity, using a team of quadrotors to collaboratively manipulate a heavy object is a scalable and promising solution. However, existing control algorithms for multi-lifting systems only enable low-speed and low-acceleration operations due to the complex dynamic coupling between quadrotors and the load, limiting their use in time-critical missions such as search and rescue. In this work, we present a solution that significantly enhances the agility of cable-suspended multi-lifting systems. Unlike traditional cascaded solutions, we introduce a trajectory-based framework that solves the whole-body kinodynamic motion planning problem online, accounting for the dynamic coupling effects and constraints between the quadrotors and the load. The planned trajectory is provided to the quadrotors as a reference in a receding-horizon fashion and is tracked by an onboard controller that observes and compensates for the cable tension. Real-world experiments demonstrate that our framework can achieve at least eight times greater acceleration than state-of-the-art methods when following agile trajectories. Our method can even perform complex maneuvers such as flying through narrow passages at high speed. Additionally, it exhibits high robustness against load uncertainties and does not require adding any sensors to the load, demonstrating strong practicality.
Researchers from Univ. Rennes, Inria, CNRS, IRISA, and LABEL4.AI developed a guidance watermarking framework that enables any differentiable post-hoc watermarking scheme to be intrinsically embedded into diffusion model outputs. This method robustly identifies AI-generated images without retraining the generative model, achieving up to three times greater watermark capacity and significantly improved detectability against diverse attacks.
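The summary above does not detail the mechanism, but gradient-based guidance of a diffusion sampler by a differentiable watermark loss is the generic recipe such a framework suggests. The sketch below illustrates that generic idea with toy stand-ins; denoise_step, watermark_loss, and guidance_scale are assumptions for illustration, not the authors' API.

```python
import torch

# Toy stand-ins: a "denoiser" that slowly contracts the sample, and a
# differentiable "watermark loss" that rewards correlation with a fixed key.
key = torch.randn(3, 64, 64)
def denoise_step(x, step): return x - 0.05 * x
def watermark_loss(x): return ((x * key).mean() - 1.0) ** 2

def guided_sampling(steps=50, guidance_scale=0.5):
    """Classifier-guidance-style loop: at each denoising step, the gradient of
    the watermark loss w.r.t. the current sample nudges the trajectory toward
    images that the (differentiable) watermark detector accepts."""
    x = torch.randn(3, 64, 64)
    for step in range(steps):
        x = x.detach().requires_grad_(True)
        grad, = torch.autograd.grad(watermark_loss(x), x)
        with torch.no_grad():
            x = denoise_step(x, step) - guidance_scale * grad
    return x.detach()

image = guided_sampling()
```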
PAC generalization bounds on the risk, when expressed in terms of the expected loss, are often insufficient to capture imbalances between subgroups in the data. To overcome this limitation, we introduce a new family of risk measures, called constrained f-entropic risk measures, which enable finer control over distributional shifts and subgroup imbalances via f-divergences, and which include the well-known Conditional Value at Risk (CVaR). We derive both classical and disintegrated PAC-Bayesian generalization bounds for this family of risks, providing the first disintegrated PAC-Bayesian guarantees beyond standard risks. Building on this theory, we design a self-bounding algorithm that minimizes our bounds directly, yielding models with guarantees at the subgroup level. Finally, we empirically demonstrate the usefulness of our approach.
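For reference, the CVaR mentioned above is often written in the Rockafellar-Uryasev variational form (conventions for the level differ across papers; here the worst α-fraction of losses is averaged):

\[
\mathrm{CVaR}_{\alpha}(\ell) \;=\; \inf_{\rho \in \mathbb{R}} \Big\{ \rho + \tfrac{1}{\alpha}\, \mathbb{E}\big[(\ell - \rho)_{+}\big] \Big\},
\]

where $\ell$ is the loss random variable and $(x)_{+} = \max(x, 0)$.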
Designing categorical kernels is a major challenge for Gaussian process regression with continuous and categorical inputs. Despite previous studies, it is difficult to identify a preferred method, because the evaluation metrics, the optimization procedure, or the datasets change from one study to the next. In particular, reproducible code is rarely available. The aim of this paper is to provide a reproducible comparative study of all existing categorical kernels on many of the test cases investigated so far. We also propose new evaluation metrics inspired by the optimization community, which provide quantitative rankings of the methods across several tasks. From our results on datasets that exhibit a group structure on the levels of the categorical inputs, it appears that nested kernel methods clearly outperform all competitors. When the group structure is unknown, or when there is no prior knowledge of such a structure, we propose a new clustering-based strategy using target encodings of the categorical variables. We show that on a large panel of datasets, which do not necessarily have a known group structure, this estimation strategy still outperforms other approaches while maintaining low computational cost.
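A minimal sketch of the kind of clustering-based strategy described in the last sentence, assuming target encoding of each level followed by clustering of the encodings to recover candidate groups; the function name and the choice of KMeans are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

def group_levels_by_target_encoding(categories, y, n_groups=2, random_state=0):
    """Estimate a group structure on the levels of a categorical input.

    Each level is encoded by the mean target value over its observations,
    then the one-dimensional encodings are clustered so that levels with
    similar target behaviour fall into the same group (e.g. for a nested
    kernel)."""
    df = pd.DataFrame({"cat": categories, "y": y})
    encoding = df.groupby("cat")["y"].mean()                 # target encoding
    labels = KMeans(n_clusters=n_groups, n_init=10,
                    random_state=random_state).fit_predict(
                        encoding.values.reshape(-1, 1))
    return dict(zip(encoding.index, labels))                 # level -> group id

# Example: six levels whose targets naturally split into two regimes.
rng = np.random.default_rng(0)
cats = rng.choice(list("ABCDEF"), size=300)
y = np.where(np.isin(cats, list("ABC")), 1.0, 5.0) + rng.normal(0, 0.1, 300)
print(group_levels_by_target_encoding(cats, y))
```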
Time Series Foundation Models (TSFMs) have shown promising zero-shot generalization across diverse forecasting tasks. However, their robustness to continual adaptation remains underexplored. In this work, we investigate the extent to which TSFMs suffer from catastrophic forgetting when fine-tuned sequentially on multiple datasets. Using synthetic datasets designed with varying degrees of periodic structure, we measure the trade-off between adaptation to new data and retention of prior knowledge. Our experiments reveal that, while fine-tuning improves performance on new tasks, it often causes significant degradation on previously learned ones, illustrating a fundamental stability-plasticity dilemma.
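A minimal sketch of the kind of forgetting measurement implied by this protocol: record performance on every dataset after each sequential fine-tuning step and compare the final error with the best error ever reached. The metric and the numbers below are illustrative, not the paper's results.

```python
import numpy as np

def forgetting_scores(perf):
    """perf[t, d] = forecasting error of the model after fine-tuning on the
    t-th dataset, evaluated on dataset d (lower is better). Forgetting on d is
    the gap between the final error and the best error ever achieved on d."""
    final = perf[-1]             # errors after the last fine-tuning step
    best = perf.min(axis=0)      # best error each dataset ever reached
    return final - best          # >= 0; larger means more forgetting

# Hypothetical errors over three sequentially fine-tuned synthetic datasets.
perf = np.array([[0.80, 1.40, 1.50],
                 [1.10, 0.70, 1.45],
                 [1.30, 1.05, 0.65]])
print(forgetting_scores(perf))   # -> [0.50, 0.35, 0.00]
```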
This paper offers a comprehensive review of training methodologies for Physical Neural Networks (PNNs), addressing the escalating energy and performance demands of digital AI. It systematically categorizes diverse training approaches, from physics-aware backpropagation to in-situ gradient computation, and evaluates their potential to enable energy-efficient, scalable AI systems.
Watermarking is a technical means to dissuade malfeasant usage of Large Language Models. This paper proposes a novel watermarking scheme, called WaterMax, that enjoys high detectability while sustaining the quality of the text generated by the original LLM. Its new design leaves the LLM untouched (no modification of the weights, logits, temperature, or sampling technique). WaterMax balances robustness against complexity, in contrast to the watermarking techniques in the literature, which inherently provoke a trade-off between quality and robustness. Its performance is both theoretically proven and experimentally validated. It outperforms all SotA techniques under the most complete benchmark suite. Code available at this https URL.
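The abstract does not spell out the mechanism, so the sketch below only illustrates the general family it belongs to: sampling-agnostic watermarking that leaves generation untouched and instead chooses among unmodified LLM outputs using a secret-keyed score. All names and the scoring rule are assumptions for illustration, not the WaterMax algorithm itself.

```python
import hashlib

def keyed_score(text, key="secret-key", window=4):
    """Fraction of sliding word windows whose keyed hash lands in the 'green'
    half; text unrelated to the key scores about 0.5 on average."""
    words = text.split()
    if len(words) < window:
        return 0.5
    hits = sum(
        int(hashlib.sha256((key + " ".join(words[i:i + window])).encode())
            .hexdigest(), 16) % 2
        for i in range(len(words) - window + 1)
    )
    return hits / (len(words) - window + 1)

def pick_candidate(candidates, key="secret-key"):
    """Select, among unmodified LLM generations, the candidate with the highest
    keyed score: the model's weights, logits and sampling stay untouched."""
    return max(candidates, key=lambda t: keyed_score(t, key))

drafts = ["the model answers the question in a short and direct way",
          "here is a short and direct answer to the question from the model",
          "a direct short answer to the question is given by the model here"]
print(pick_candidate(drafts))
```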
Sparfels presents a method for rapidly reconstructing detailed 3D geometry from a few unposed images, combining a 3D foundation model with test-time 2D Gaussian Splatting and novel variance regularization. The approach achieves state-of-the-art reconstruction accuracy on the DTU dataset within approximately three minutes on a consumer GPU while enhancing novel view synthesis quality and camera pose estimation.
BOGausS improves 3D Gaussian Splatting optimization by addressing challenges in parameter tuning, model size reduction, and visual artifacts. It achieves higher quality scene reconstructions with up to ten times fewer Gaussians than prior methods, developed by researchers from Orange Innovation and French academic institutions.
In repeated games, players choose actions concurrently at each step. We consider a parameterized setting of repeated games in which the players form a population of an arbitrary size. Their utility functions encode a reachability objective. The problem is whether there exists a uniform coalition strategy for the players so that they are sure to win independently of the population size. We use algebraic tools to show that the problem can be solved in polynomial space. First we exhibit a finite semigroup whose elements summarize strategies over a finite interval of population sizes. Then, we characterize the existence of winning strategies by the existence of particular elements in this semigroup. Finally, we provide a matching complexity lower bound, to conclude that repeated population games with reachability objectives are PSPACE-complete.
This systematic scoping review synthesizes findings from 49 studies (2018-2025) on unsupervised deep generative models for anomaly detection in neuroimaging, providing a pathology-specific comparison of performance metrics and architectural design choices. The review finds that these models achieve Dice scores up to 0.77 for large lesions like brain tumors but consistently struggle with smaller or sparser abnormalities such as those in multiple sclerosis and stroke, where Dice scores are often below 0.50.
The Bethe-Hessian matrix, introduced by Saade, Krzakala, and Zdeborová (2014), is a Hermitian matrix designed for applying spectral clustering algorithms to sparse networks. Rather than employing a non-symmetric and high-dimensional non-backtracking operator, a spectral method based on the Bethe-Hessian matrix is conjectured to also reach the Kesten-Stigum detection threshold in the sparse stochastic block model (SBM). We provide the first rigorous analysis of the Bethe-Hessian spectral method in the SBM under both the bounded expected degree and the growing degree regimes. Specifically, we demonstrate that: (i) When the expected degree $d \geq 2$, the number of negative outliers of the Bethe-Hessian matrix can consistently estimate the number of blocks above the Kesten-Stigum threshold, thus confirming a conjecture from Saade, Krzakala, and Zdeborová (2014) for $d \geq 2$. (ii) For sufficiently large $d$, its eigenvectors can be used to achieve weak recovery. (iii) As $d \to \infty$, we establish the concentration of the locations of its negative outlier eigenvalues, and weak consistency can be achieved via a spectral method based on the Bethe-Hessian matrix.
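For reference, the Bethe-Hessian of a graph with adjacency matrix $A$ and diagonal degree matrix $D$ is usually written as

\[
H(r) \;=\; (r^{2} - 1)\, I \;-\; r\, A \;+\; D,
\]

with the scalar $r$ typically chosen close to $\pm\sqrt{d}$ for average degree $d$; negative outlier eigenvalues of $H(r)$ are the signatures of community structure exploited by the spectral method.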
Researchers from French institutions introduce the Fused Gromov-Wasserstein (FGW) distance, a novel optimal transport metric for structured data like graphs, which unifies both node-level feature information and graph-level structural information. The FGW distance achieves state-of-the-art performance in graph classification across various benchmarks and enables the computation of meaningful graph barycenters for unsupervised learning tasks.
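As a reminder of the form of this metric (notation and normalizations vary slightly across papers), the FGW distance between two attributed graphs with node features $a_i, b_j$, structure matrices $C_1, C_2$, and node distributions $\mu, \nu$ interpolates a Wasserstein term on features and a Gromov-Wasserstein term on structure:

\[
\mathrm{FGW}_{q,\alpha}(\mu,\nu)
\;=\; \min_{\pi \in \Pi(\mu,\nu)} \sum_{i,j,k,l}
\Big[(1-\alpha)\, d(a_i, b_j)^{q} \;+\; \alpha\, \big|C_1(i,k) - C_2(j,l)\big|^{q}\Big]\, \pi_{i,j}\, \pi_{k,l},
\]

where $\pi$ ranges over couplings of the node distributions and $\alpha \in [0,1]$ trades off feature against structural information.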
A decentralized reinforcement learning framework, LGTC-IPPO, enables multi-agent multi-resource allocation through dynamic cluster agreements, achieving stable, high rewards and successfully reallocating resources on physical drones. This approach integrates a Liquid-Graph Time-Constant (LGTC) neural network to learn dynamic clustering, improving coordination and adaptability in complex environments.
With the rise of AI-based code generation, customizing existing code from natural language instructions to modify visual results (such as figures or images) has become possible, promising to reduce the need for deep programming expertise. However, even experienced developers can struggle with this task, as it requires identifying relevant code regions (feature location), generating valid code variants, and ensuring the modifications reliably align with user intent. In this paper, we introduce vTikZ, the first benchmark designed to evaluate the ability of Large Language Models (LLMs) to customize code while preserving coherent visual outcomes. Our benchmark consists of carefully curated vTikZ editing scenarios, parameterized ground truths, and a reviewing tool that leverages visual feedback to assess correctness. Empirical evaluation with state-of-the-art LLMs shows that existing solutions struggle to reliably modify code in alignment with visual intent, highlighting a gap in current AI-assisted code editing approaches. We argue that vTikZ opens new research directions for integrating LLMs with visual feedback mechanisms to improve code customization tasks in various domains beyond TikZ, including image processing, art creation, Web design, and 3D modeling.
Parallel to the development of advanced deepfake audio generation, audio deepfake detection has also seen significant progress. However, a standardized and comprehensive benchmark is still missing. To address this, we introduce Speech DeepFake (DF) Arena, the first comprehensive benchmark for audio deepfake detection. Speech DF Arena provides a toolkit to uniformly evaluate detection systems, currently across 14 diverse datasets and attack scenarios, with standardized evaluation metrics and protocols for reproducibility and transparency. It also includes a leaderboard to compare and rank the systems, helping researchers and developers enhance their reliability and robustness. We include 14 evaluation sets and 12 state-of-the-art open-source and 3 proprietary detection systems. Our study shows that many systems exhibit high EER in out-of-domain scenarios, highlighting the need for extensive cross-domain evaluation. The leaderboard is hosted on Hugging Face and a toolkit for reproducing results across the listed datasets is available on GitHub.
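A minimal sketch of how the EER reported by such benchmarks is typically computed from detector scores (the operating point where the false positive and false negative rates coincide); the scores below are synthetic and only serve to exercise the function.

```python
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(labels, scores):
    """EER: the point on the ROC curve where the false positive rate equals
    the false negative rate (1 - TPR); returned as their average at the
    threshold where the two are closest."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    idx = np.argmin(np.abs(fpr - fnr))
    return (fpr[idx] + fnr[idx]) / 2.0

# Synthetic example: bona fide (label 0) vs. spoofed (label 1) detector scores.
rng = np.random.default_rng(0)
labels = np.concatenate([np.zeros(500), np.ones(500)])
scores = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(1.5, 1.0, 500)])
print(f"EER = {equal_error_rate(labels, scores):.3f}")
```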
Researchers from IRISA, Univ. Rennes, CNRS, Imatag, and LABEL4.AI developed LatentSeal, an image watermarking system that redefines watermarking as semantic communication to embed full-sentence textual messages into images. It achieves up to 121 times faster decoding compared to baselines and robustly reconstructs text while maintaining high imperceptibility and providing a confidence metric for extracted messages.