CEA LIST
The Proof-of-Training-Steps (PoTS) protocol enables efficient, step-wise verification of Large Language Model training integrity to detect backdoor attacks. On instruct LLMs, verification steps run up to three times faster than full training steps, and the protocol keeps attack success rates low even when part of the training data is poisoned.
In this paper, we show that the category of Mackey-complete, separated, topological convex bornological vector spaces and bornological linear maps is a differential category. Such spaces were introduced by Frölicher and Kriegl, where they were called convenient vector spaces. While much of the structure necessary to demonstrate this observation is already contained in Frölicher and Kriegl's book, we here give a new interpretation of the category of convenient vector spaces as a model of the differential linear logic of Ehrhard and Regnier. Rather than base our proof on the abstract categorical structure presented by Frölicher and Kriegl, we prefer to focus on the bornological structure of convenient vector spaces. We believe bornological structures will ultimately yield a wide variety of models of differential logics.
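For readers coming from logic rather than functional analysis, the key definitions are easy to state; they are recalled below in standard notation, paraphrasing the convenient-calculus literature rather than quoting the paper. A map between convenient vector spaces is smooth exactly when it preserves smooth curves, and Mackey-completeness is what makes the difference quotients converge:

    \[
      f \in C^\infty(E,F)
      \iff
      f \circ c \in C^\infty(\mathbb{R},F)
      \ \text{ for every smooth curve } c \in C^\infty(\mathbb{R},E),
      \qquad
      df(x)(v) \;=\; \lim_{t \to 0} \frac{f(x+tv) - f(x)}{t}.
    \]

In differential-category terms, this derivative is what interprets the deriving transformation $\mathsf{d}_A \colon {!A} \otimes A \to {!A}$ of Blute, Cockett and Seely's differential categories, the categorical counterpart of Ehrhard and Regnier's differential linear logic.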
The combinatorial problem Max-Cut has become a benchmark in the evaluation of local search heuristics for both quantum and classical optimisers. In contrast to local search, which only provides average-case performance guarantees, the convex semidefinite relaxation of Max-Cut by Goemans and Williamson provides worst-case guarantees and is therefore suited both to the construction of benchmarks and to applications in performance-critical scenarios. We show how extended floating-point precision can be incorporated in the algebraic subroutines of convex optimisation, namely in indirect matrix inversion methods such as Conjugate Gradient, which have lower complexity than direct methods and are therefore used within Interior Point Methods for very large problem sizes. We also estimate the expected acceleration of the time to solution on a hardware architecture that runs natively in extended precision: increasing the internal working precision reduces the time to solution by a factor that grows with the system size.
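To make the "internal working precision" knob concrete, here is a minimal Conjugate Gradient sketch in which all arithmetic is carried in a chosen dtype; this is an illustration of the idea, not the paper's solver. Passing np.float32, np.float64 or np.longdouble (80-bit extended on most x86 platforms) lets one observe how the working precision affects the attainable residual:

    import numpy as np

    def conjugate_gradient(A, b, dtype=np.float64, tol=1e-10, max_iter=None):
        # Solve A x = b for symmetric positive definite A,
        # carrying every operation in the requested working precision.
        A = A.astype(dtype)
        b = b.astype(dtype)
        n = b.shape[0]
        max_iter = max_iter or 10 * n
        x = np.zeros(n, dtype=dtype)
        r = b - A @ x
        p = r.copy()
        rs_old = r @ r
        for _ in range(max_iter):
            Ap = A @ p
            alpha = rs_old / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs_old) * p
            rs_old = rs_new
        return x

    # Same system solved at three working precisions (toy sizes):
    rng = np.random.default_rng(0)
    M = rng.normal(size=(200, 200))
    A = M @ M.T + 200 * np.eye(200)      # well-conditioned SPD matrix
    b = rng.normal(size=200)
    for dt in (np.float32, np.float64, np.longdouble):
        x = conjugate_gradient(A, b, dtype=dt)
        print(dt.__name__, np.linalg.norm(A @ x - b.astype(dt)))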
An innovative framework combines Gradual Binary Search and Dimension Expansion with Hadamard matrices to enable accurate 3-bit quantization of LLM weights, activations, and KV caches (WAKV). This method improves the accuracy of models like Mistral 7B by 40% at 3-bit WAKV compared to existing rotation-based quantization techniques.
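The role of the Hadamard rotation can be seen in a few lines: an orthogonal Hadamard transform spreads outlier channels across all coordinates before quantization, shrinking the dynamic range the 3-bit grid must cover. A minimal sketch of this one ingredient (Gradual Binary Search and Dimension Expansion are not shown, and this is not the authors' code):

    import numpy as np
    from scipy.linalg import hadamard

    def quantize_symmetric(x, bits=3):
        # Uniform symmetric quantization to 2**bits levels.
        qmax = 2 ** (bits - 1) - 1               # 3 for 3-bit
        scale = np.abs(x).max() / qmax
        q = np.clip(np.round(x / scale), -qmax - 1, qmax)
        return q * scale

    def quantize_with_rotation(W, bits=3):
        # Rotate with an orthonormal Hadamard matrix, quantize, rotate back.
        n = W.shape[0]                           # n must be a power of two
        H = hadamard(n) / np.sqrt(n)             # H @ H.T == I
        return H.T @ quantize_symmetric(H @ W, bits)

    rng = np.random.default_rng(0)
    W = rng.normal(size=(64, 64))
    W[0, :] *= 50                                # an outlier row, hostile to quantization
    print(np.linalg.norm(W - quantize_symmetric(W)),
          np.linalg.norm(W - quantize_with_rotation(W)))
    # The rotated variant typically shows a much smaller reconstruction error.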
A mixed-precision quantization framework for LLaMA-based models identifies and preserves high-precision calculations specifically in down-projection layers and Beginning-of-Text tokens where activation spikes occur most frequently, achieving superior perplexity scores compared to uniform quantization approaches while maintaining model accuracy through targeted precision allocation.
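A toy rendering of the allocation policy this summary describes, with hypothetical layer naming ("down_proj" follows LLaMA-style checkpoint conventions) and bitwidths chosen purely for illustration:

    def assign_bitwidth(layer_name, token_index):
        # Keep the spike-prone spots in high precision, quantize the rest.
        if "down_proj" in layer_name or token_index == 0:   # index 0: Beginning-of-Text token
            return 16
        return 4

    print(assign_bitwidth("model.layers.0.mlp.down_proj", 5))  # 16
    print(assign_bitwidth("model.layers.0.mlp.up_proj", 0))    # 16 (BoT token)
    print(assign_bitwidth("model.layers.0.mlp.up_proj", 5))    # 4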
This paper explores the combination of neural network quantization and entropy coding for memory footprint minimization. Edge deployment of quantized models is hampered by the harsh Pareto frontier of the accuracy-to-bitwidth tradeoff, which causes dramatic accuracy loss below a certain bitwidth. Mixed-precision quantization can alleviate this loss by allowing more flexible bitwidth allocation. However, the benefits of standard mixed precision remain limited by the 1-bit frontier, which forces each parameter to be encoded on at least 1 bit of data. This paper introduces an approach that combines mixed precision, zero-point quantization and entropy coding to push the compression boundary of ResNets beyond the 1-bit frontier with an accuracy drop below 1% on the ImageNet benchmark. From an implementation standpoint, a compact decoder architecture features reduced latency, thus allowing for inference-compatible decoding.
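Why entropy coding can break the 1-bit frontier is a one-liner of information theory: if zero-point quantization sends most weights to the zero symbol, the empirical entropy, which an ideal entropy coder approaches, drops below 1 bit per parameter. A small self-contained illustration on toy Laplacian weights, not the paper's models:

    import numpy as np
    from collections import Counter

    def avg_bits_per_weight(q):
        # Empirical entropy (bits/parameter) of a quantized tensor:
        # the lower bound an ideal entropy coder can approach.
        counts = Counter(q.flatten().tolist())
        probs = np.array([c / q.size for c in counts.values()])
        return float(-(probs * np.log2(probs)).sum())

    rng = np.random.default_rng(0)
    w = rng.laplace(scale=0.02, size=100_000)    # weights peak sharply at zero
    q = np.round(w / 0.1)                        # coarse zero-point grid: most values hit 0
    print(avg_bits_per_weight(q))                # well below 1 bit on this toy example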
LINAGORA and partners introduce Lucie-7B, a multilingual language model and training dataset focused on French-English balance, achieving performance comparable to Llama-3.1-8B while prioritizing data rights compliance and providing full training transparency through released checkpoints and optimizer states.
Runtime verification is an area of formal methods that studies the dynamic analysis of execution traces against formal specifications. Typically, the two main activities in runtime verification efforts are the process of creating monitors from specifications, and the algorithms for the evaluation of traces against the generated monitors. Other activities involve the instrumentation of the system to generate the trace and the communication between the system under analysis and the monitor. Most of the applications in runtime verification have been focused on the dynamic analysis of software, even though there are many more potential applications to other computational devices and target systems. In this paper we present a collection of challenges for runtime verification extracted from concrete application domains, focusing on the difficulties that must be overcome to tackle these specific challenges. The computational models that characterize these domains require devising new techniques beyond the current state of the art in runtime verification.
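As a concrete micro-example of the two core activities named above (turning a specification into a monitor, then evaluating a trace against it), here is a hand-written monitor for the invented property "every open is matched by a close before the trace ends":

    def make_monitor():
        open_count = 0
        def step(event):
            nonlocal open_count
            if event == "open":
                open_count += 1
            elif event == "close":
                if open_count == 0:
                    return "violation"     # close without a matching open
                open_count -= 1
            return "ok"
        def end():
            # End-of-trace verdict: any unmatched open is a violation.
            return "ok" if open_count == 0 else "violation"
        return step, end

    step, end = make_monitor()
    trace = ["open", "write", "close", "open"]
    print([step(e) for e in trace], end())     # last open never closed -> violation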
In this paper, we consider the intersection of two problems in machine learning: Multi-Source Domain Adaptation (MSDA) and Dataset Distillation (DD). On the one hand, the first considers adapting multiple heterogeneous labeled source domains to an unlabeled target domain. On the other hand, the second addresses the problem of synthesizing a small summary of a dataset that retains its essential information. We thus consider a new problem called MSDA-DD. To solve it, we adapt previous works in the MSDA literature, such as Wasserstein Barycenter Transport and Dataset Dictionary Learning, as well as the DD method Distribution Matching. We thoroughly experiment with this novel problem on four benchmarks (Caltech-Office 10, Tennessee-Eastman Process, Continuous Stirred Tank Reactor, and Case Western Reserve University), where we show that, even with as little as one synthetic sample per class, one achieves state-of-the-art adaptation performance.
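To fix intuition on the DD side, here is a bare-bones stand-in for Distribution Matching that compresses a dataset to a single synthetic point by matching feature means; real Distribution Matching matches embedding distributions under randomly initialized networks, and the MSDA-DD methods in the paper are considerably richer:

    import numpy as np

    def distill_by_mean_matching(X, n_synthetic=1, lr=0.1, steps=500, seed=0):
        # Gradient descent on || mean(S) - mean(X) ||^2 over the synthetic set S.
        rng = np.random.default_rng(seed)
        S = rng.normal(size=(n_synthetic, X.shape[1]))
        target = X.mean(axis=0)
        for _ in range(steps):
            grad = 2 * (S.mean(axis=0) - target) / n_synthetic
            S -= lr * grad                      # same gradient for every row
        return S

    X = np.random.default_rng(1).normal(loc=3.0, size=(1000, 16))
    S = distill_by_mean_matching(X)
    print(np.abs(S.mean(axis=0) - X.mean(axis=0)).max())   # ~0: means aligned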
As Large Language Models (LLMs) gain traction across critical domains, ensuring secure and trustworthy training processes has become a major concern. Backdoor attacks, where malicious actors inject hidden triggers into training data, are particularly insidious and difficult to detect. Existing post-training verification solutions like Proof-of-Learning are impractical for LLMs due to their requirement for full retraining, lack of robustness against stealthy manipulations, and inability to provide early detection during training; early detection would significantly reduce computational costs. To address these limitations, we introduce Proof-of-Training-Steps, a verification protocol that enables an independent auditor (Alice) to confirm that an LLM developer (Bob) has followed the declared training recipe, including data batches, architecture, and hyperparameters. By analyzing the sensitivity of the LLM's language modeling head (LM-Head) to input perturbations, our method can expose subtle backdoor injections or deviations in training. Even with backdoor triggers in up to 10 percent of the training data, our protocol significantly reduces the attacker's ability to achieve a high attack success rate (ASR). Our method enables early detection of attacks at the injection step, with verification steps being 3x faster than training steps. Our results highlight the protocol's potential to enhance the accountability and security of LLM development, especially against insider threats.
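The abstract does not spell out the sensitivity statistic, so the following is only one conceivable shape of such a fingerprint: perturb the hidden states slightly and record how far the LM-head logits move; an auditor could recompute this number at a declared step and compare it with the developer's report. Toy dimensions, hypothetical statistic:

    import numpy as np

    def lm_head_sensitivity(W_head, hidden, eps=1e-3, n_probes=16, seed=0):
        # Mean absolute logit shift under small random input perturbations.
        rng = np.random.default_rng(seed)
        base = hidden @ W_head.T                   # (batch, vocab) logits
        shifts = []
        for _ in range(n_probes):
            noise = eps * rng.normal(size=hidden.shape)
            shifts.append(np.abs((hidden + noise) @ W_head.T - base).mean())
        return float(np.mean(shifts))

    rng = np.random.default_rng(1)
    W_head = 0.02 * rng.normal(size=(1000, 256))   # toy vocab x hidden LM-head
    h = rng.normal(size=(4, 256))                  # hidden states at a checkpoint
    print(lm_head_sensitivity(W_head, h))          # the fingerprint to cross-check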
Operating system kernels are the security keystone of most computer systems, as they provide the core protection mechanisms. Kernels are in particular responsible for their own security, i.e. they must prevent untrusted user tasks from reaching their level of privilege. We demonstrate that proving such absence of privilege escalation is a prerequisite for any definitive security proof of the kernel. While prior OS kernel formal verifications were performed either on source code or on crafted kernels, with manual or semi-automated methods requiring significant human effort in annotations or proofs, we show that it is possible to compute such kernel security proofs using fully-automated methods, starting from the executable code of an existing microkernel with no modification, thus formally verifying absence of privilege escalation with high confidence at a low cost. We applied our method to two embedded microkernels, including the industrial kernel AnonymOS: with only 58 lines of annotation and less than 10 minutes of computation, our method finds a vulnerability in a first (buggy) version of AnonymOS and verifies absence of privilege escalation in a second (secure) version.
Directed fuzzing focuses on automatically testing specific parts of the code by taking advantage of additional information such as (partial) bug stack traces, patches or risky operations. Key applications include bug reproduction, patch testing and static analysis report verification. Although directed fuzzing has received a lot of attention recently, hard-to-detect vulnerabilities such as Use-After-Free (UAF) are still not well addressed, especially at the binary level. We propose UAFuzz, the first (binary-level) directed greybox fuzzer dedicated to UAF bugs. The technique features a fuzzing engine tailored to UAF specifics, lightweight code instrumentation and an efficient bug triage step. Experimental evaluation for bug reproduction on real cases demonstrates that UAFuzz significantly outperforms state-of-the-art directed fuzzers in terms of fault detection rate, time to exposure and bug triaging. UAFuzz has also proven effective in patch testing, leading to the discovery of 30 new bugs (7 CVEs) in programs such as Perl, GPAC and GNU Patch. Finally, we provide the community with a large fuzzing benchmark dedicated to UAF, built on both real code and real bugs.
An accountable distributed system provides means to detect deviations of system components from their expected behavior. It is natural to complement fault detection with a reconfiguration mechanism, so that the system could heal itself, by replacing malfunctioning parts with new ones. In this paper, we describe a framework that can be used to implement a large class of accountable and reconfigurable replicated services. We build atop the fundamental lattice agreement abstraction lying at the core of storage systems and cryptocurrencies. Our asynchronous implementation of accountable lattice agreement ensures that every violation of consistency is followed by an undeniable evidence of misbehavior of a faulty replica. The system can then be seamlessly reconfigured by evicting faulty replicas, adding new ones and merging inconsistent states. We believe that this paper opens a direction towards asynchronous "self-healing" systems that combine accountability and reconfiguration.
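The lattice agreement abstraction the system builds on is easy to picture with set union as the join; the sketch below shows only the safety-relevant algebra (proposals grow, inconsistent states merge), not the accountability or reconfiguration machinery:

    class LatticeAgreementReplica:
        # Toy join-semilattice state with set union as the join. Outputs at
        # different replicas may differ, but each is the join of a subset of
        # the proposed values, so any two states can always be merged.
        def __init__(self):
            self.state = frozenset()

        def propose(self, values):
            self.state = self.state | frozenset(values)   # join with the proposal
            return self.state

        def merge(self, other_state):
            self.state = self.state | other_state          # joins commute: safe to merge
            return self.state

    a, b = LatticeAgreementReplica(), LatticeAgreementReplica()
    a.propose({"tx1"})
    b.propose({"tx2"})
    a.merge(b.state)                  # inconsistent states reconcile by union
    print(sorted(a.state))            # ['tx1', 'tx2']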
General purpose CPUs used in high performance computing (HPC) support a vector instruction set and an out-of-order engine dedicated to increasing instruction-level parallelism. Hence, related optimizations are currently critical to improve the performance of applications requiring numerical computation. Moreover, the use of a Java run-time environment such as the HotSpot Java Virtual Machine (JVM) in high performance computing is a promising alternative. It benefits from Java's programming flexibility and productivity, while performance is ensured by the Just-In-Time (JIT) compiler. However, the JIT compiler suffers from two main drawbacks. First, the JIT is a black box for developers: we have no control over the generated code nor any feedback from its optimization phases such as vectorization. Second, the time constraint narrows down the degree of optimization compared to static compilers like GCC or LLVM. It is therefore compelling to use statically compiled code, since it benefits from additional optimizations that reduce performance bottlenecks. Java enables calling native code from dynamic libraries through the Java Native Interface (JNI). Nevertheless, JNI methods are not inlined and cost more to invoke than Java methods. Therefore, to benefit from better static optimization, this call overhead must be amortized by the amount of computation performed at each JNI invocation. In this paper we tackle this problem and propose to carry out this analysis for a set of micro-kernels. Our goal is to select the most efficient implementation given the amount of computation defined by the calling context. We also investigate the performance impact of several optimization schemes: vectorization, out-of-order optimization, data alignment, method inlining and the use of native memory for JNI methods.
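The amortization argument reduces to simple arithmetic: a JNI call pays off once the per-element native speedup outweighs the fixed invocation overhead. The numbers below are hypothetical, purely to illustrate the break-even computation the paper performs on real micro-kernels:

    def jni_break_even(call_overhead_ns, java_ns_per_elem, native_ns_per_elem):
        # Smallest element count n such that
        # call_overhead + n * native <= n * java.
        gain = java_ns_per_elem - native_ns_per_elem
        if gain <= 0:
            return None            # native code is not faster per element
        return call_overhead_ns / gain

    # Hypothetical timings, not measurements from the paper:
    print(jni_break_even(call_overhead_ns=1000,
                         java_ns_per_elem=2.0,
                         native_ns_per_elem=0.5))
    # -> ~667 elements: below that, the JNI overhead dominates.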
Large language models (LLMs) offer new opportunities for interacting with complex software artifacts, such as software models, through natural language. They present especially promising benefits for large software models that are difficult to grasp in their entirety, making traditional interaction and analysis approaches challenging. This paper investigates two approaches for leveraging LLMs to answer questions over software models: direct prompting, where the whole software model is provided in the context, and an agentic approach combining LLM-based agents with general-purpose file access tools. We evaluate these approaches using an Ecore metamodel designed for timing analysis and software optimization in automotive and embedded domains. Our findings show that while the agentic approach achieves accuracy comparable to direct prompting, it is significantly more efficient in terms of token usage. This efficiency makes the agentic approach particularly suitable for the automotive industry, where the large size of software models makes direct prompting infeasible, establishing LLM agents as not just a practical alternative but the only viable solution. Notably, the evaluation was conducted using small LLMs, which are more feasible to execute locally, an essential advantage for meeting strict requirements around privacy, intellectual property protection, and regulatory compliance. Future work will investigate software models in diverse formats, explore more complex agent architectures, and extend agentic workflows to support not only querying but also modification of software models.
This paper presents a competitive approach to multilingual subjectivity detection using large language models (LLMs) with few-shot prompting. We participated in Task 1: Subjectivity of the CheckThat! 2025 evaluation campaign. We show that LLMs, when paired with carefully designed prompts, can match or outperform fine-tuned smaller language models (SLMs), particularly in noisy or low-quality data settings. Despite experimenting with advanced prompt engineering techniques, such as debating LLMs and various example selection strategies, we found limited benefit beyond well-crafted standard few-shot prompts. Our system achieved top rankings across multiple languages in the CheckThat! 2025 subjectivity detection task, including first place in Arabic and Polish, and top-four finishes in Italian, English, German, and multilingual tracks. Notably, our method proved especially robust on the Arabic dataset, likely due to its resilience to annotation inconsistencies. These findings highlight the effectiveness and adaptability of LLM-based few-shot learning for multilingual sentiment tasks, offering a strong alternative to traditional fine-tuning, particularly when labeled data is scarce or inconsistent.
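For readers unfamiliar with the setup, a standard few-shot prompt for this task looks like the sketch below; the examples and wording are invented here for illustration, not the team's actual prompts:

    FEW_SHOT = """Decide whether the sentence is SUBJ (subjective) or OBJ (objective).

    Sentence: The committee met on Tuesday.
    Label: OBJ

    Sentence: This is the best decision they ever made.
    Label: SUBJ

    Sentence: {sentence}
    Label:"""

    def build_prompt(sentence):
        # The filled-in prompt is sent to the LLM; the completion is the label.
        return FEW_SHOT.format(sentence=sentence)

    print(build_prompt("The new policy is a disaster."))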
Implementing large software, such as software analyzers aimed at industrial use, requires a well-engineered software architecture in order to ease its daily development and its maintenance throughout its lifecycle. If the analyzer is not a single tool but an open, extensible, collaborative framework in which external developers may develop plug-ins that collaborate with each other, such a well-designed architecture becomes even more important. In this experience report, we explain the difficulties of developing and maintaining open extensible collaborative analysis frameworks, through the example of Frama-C, a platform dedicated to the analysis of code written in C. We also present the new upcoming software architecture of Frama-C and how it aims to solve some of these issues.
The presented work continues the line of recent distributed computing community efforts dedicated to the theoretical aspects of blockchains. This paper is the first to specify blockchains as a composition of abstract data types, together with a hierarchy of consistency criteria that formally characterizes the histories admissible for distributed programs that use them. Our work is based on an original oracle-based construction that, along with new consistency definitions, captures the eventual convergence process in blockchain systems. The paper also presents some results on the implementability of the presented abstractions and a mapping of representative existing blockchains, from both academia and industry, into our framework.
PICOSEC Micromegas (MM) is a precise timing gaseous detector based on a Cherenkov radiator coupled with a semi-transparent photocathode and an MM amplifying structure. The detector concept was successfully demonstrated through a single-channel prototype, achieving sub-25 ps time resolution with Minimum Ionizing Particles (MIPs). A series of studies followed, aimed at developing robust, large-area, and scalable detectors with high time resolution, complemented by specialized fast-response readout electronics. This work presents recent advancements towards large-area resistive PICOSEC MM, including 10 × 10 cm² prototypes and a 20 × 20 cm² prototype, which features the joining of four photocathodes. The time resolution of these detector prototypes was evaluated in beam tests, achieving a timing performance of around 25 ps for individual pads with MIPs. Meanwhile, customized electronics dedicated to the high-precision time measurement of the large-area PICOSEC MM have been developed. The performance of the entire system was evaluated in beam tests, demonstrating its capability for large-area integration. These advancements highlight the potential of PICOSEC MM to meet the stringent requirements of future particle physics experiments.
In system monitoring, automatic fault diagnosis seeks to infer the system's state from sensor readings, e.g., through machine learning models. In this context, it is of key importance that, based on historical data, these systems are able to generalize to incoming data. In parallel, many factors may induce changes in the data probability distribution, hindering the ability of such models to generalize. In this sense, domain adaptation is an important framework for adapting models to different probability distributions. In this paper, we propose a new benchmark, based on the Tennessee Eastman Process of Downs and Vogel (1993), for evaluating domain adaptation methods in the context of chemical processes. Besides describing the process and its relevance for domain adaptation, we describe a series of data processing steps for reproducing our benchmark. We then test 11 domain adaptation strategies on this novel benchmark, showing that optimal transport-based techniques outperform other strategies.
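Since the winning strategies are optimal-transport-based, here is a minimal, self-contained Sinkhorn sketch showing the core mechanism (an entropic transport plan followed by a barycentric mapping of source samples onto the target domain); the benchmarked methods are considerably more elaborate:

    import numpy as np

    def sinkhorn_transport(Xs, Xt, reg=0.1, n_iter=200):
        # Entropic OT plan between uniform empirical measures, then a
        # barycentric mapping of the source samples into the target domain.
        ns, nt = len(Xs), len(Xt)
        C = ((Xs[:, None, :] - Xt[None, :, :]) ** 2).sum(-1)  # squared Euclidean cost
        C = C / C.max()                          # normalize to avoid exp underflow
        K = np.exp(-C / reg)
        a, b = np.full(ns, 1.0 / ns), np.full(nt, 1.0 / nt)
        v = np.ones(nt)
        for _ in range(n_iter):                  # Sinkhorn fixed-point iterations
            u = a / (K @ v)
            v = b / (K.T @ u)
        P = u[:, None] * K * v[None, :]          # transport plan with marginals a, b
        return (P / P.sum(axis=1, keepdims=True)) @ Xt

    rng = np.random.default_rng(0)
    Xs = rng.normal(loc=0.0, size=(100, 2))      # source operating condition
    Xt = rng.normal(loc=5.0, size=(100, 2))      # shifted target condition
    print(sinkhorn_transport(Xs, Xt).mean(axis=0))   # close to the target mean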