CEA LIST
The Proof-of-Training-Steps (PoTS) protocol enables efficient, step-wise verification of Large Language Model training integrity to detect backdoor attacks. On instruct LLMs, verification steps run up to three times faster than full training steps, and the protocol keeps attack success rates low even when part of the training data is poisoned.
In this paper, we show that the category of Mackey-complete, separated, topological convex bornological vector spaces and bornological linear maps is a differential category. Such spaces were introduced by Frölicher and Kriegl, where they were called convenient vector spaces. While much of the structure necessary to demonstrate this observation is already contained in Frölicher and Kriegl's book, we here give a new interpretation of the category of convenient vector spaces as a model of the differential linear logic of Ehrhard and Regnier. Rather than base our proof on the abstract categorical structure presented by Frölicher and Kriegl, we prefer to focus on the bornological structure of convenient vector spaces. We believe bornological structures will ultimately yield a wide variety of models of differential logics.
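For readers coming from logic rather than functional analysis, the key definitions are easy to state; they are recalled below in standard notation, paraphrasing the convenient-calculus literature rather than quoting the paper. A map between convenient vector spaces is smooth exactly when it preserves smooth curves, and Mackey-completeness is what makes the difference quotients converge:

    \[
      f \in C^\infty(E,F)
      \iff
      f \circ c \in C^\infty(\mathbb{R},F)
      \ \text{ for every smooth curve } c \in C^\infty(\mathbb{R},E),
      \qquad
      df(x)(v) \;=\; \lim_{t \to 0} \frac{f(x+tv) - f(x)}{t}.
    \]

In differential-category terms, this derivative is what interprets the deriving transformation $\mathsf{d}_A \colon {!A} \otimes A \to {!A}$ of Blute, Cockett and Seely's differential categories, the categorical counterpart of Ehrhard and Regnier's differential linear logic.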
The combinatorial problem Max-Cut has become a benchmark in the evaluation of local search heuristics for both quantum and classical optimisers. In contrast to local search, which only provides average-case performance guarantees, the convex semidefinite relaxation of Max-Cut by Goemans and Williamson provides worst-case guarantees and is therefore suited both to the construction of benchmarks and to applications in performance-critical scenarios. We show how extended floating-point precision can be incorporated in the algebraic subroutines of convex optimisation, namely in indirect matrix inversion methods such as Conjugate Gradient, which have lower complexity than direct methods and are therefore used within Interior Point Methods for very large problem sizes. We also estimate the expected acceleration of the time to solution on a hardware architecture that runs natively in extended precision: increasing the internal working precision reduces the time to solution by a factor that grows with the system size.
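To make the "internal working precision" knob concrete, here is a minimal Conjugate Gradient sketch in which all arithmetic is carried in a chosen dtype; this is an illustration of the idea, not the paper's solver. Passing np.float32, np.float64 or np.longdouble (80-bit extended on most x86 platforms) lets one observe how the working precision affects the attainable residual:

    import numpy as np

    def conjugate_gradient(A, b, dtype=np.float64, tol=1e-10, max_iter=None):
        # Solve A x = b for symmetric positive definite A,
        # carrying every operation in the requested working precision.
        A = A.astype(dtype)
        b = b.astype(dtype)
        n = b.shape[0]
        max_iter = max_iter or 10 * n
        x = np.zeros(n, dtype=dtype)
        r = b - A @ x
        p = r.copy()
        rs_old = r @ r
        for _ in range(max_iter):
            Ap = A @ p
            alpha = rs_old / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs_old) * p
            rs_old = rs_new
        return x

    # Same system solved at three working precisions (toy sizes):
    rng = np.random.default_rng(0)
    M = rng.normal(size=(200, 200))
    A = M @ M.T + 200 * np.eye(200)      # well-conditioned SPD matrix
    b = rng.normal(size=200)
    for dt in (np.float32, np.float64, np.longdouble):
        x = conjugate_gradient(A, b, dtype=dt)
        print(dt.__name__, np.linalg.norm(A @ x - b.astype(dt)))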
An innovative framework combines Gradual Binary Search and Dimension Expansion with Hadamard matrices to enable accurate 3-bit quantization of LLM weights, activations, and KV caches (WAKV). This method improves the accuracy of models like Mistral 7B by 40% at 3-bit WAKV compared to existing rotation-based quantization techniques.
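The role of the Hadamard rotation can be seen in a few lines: an orthogonal Hadamard transform spreads outlier channels across all coordinates before quantization, shrinking the dynamic range the 3-bit grid must cover. A minimal sketch of this one ingredient (Gradual Binary Search and Dimension Expansion are not shown, and this is not the authors' code):

    import numpy as np
    from scipy.linalg import hadamard

    def quantize_symmetric(x, bits=3):
        # Uniform symmetric quantization to 2**bits levels.
        qmax = 2 ** (bits - 1) - 1               # 3 for 3-bit
        scale = np.abs(x).max() / qmax
        q = np.clip(np.round(x / scale), -qmax - 1, qmax)
        return q * scale

    def quantize_with_rotation(W, bits=3):
        # Rotate with an orthonormal Hadamard matrix, quantize, rotate back.
        n = W.shape[0]                           # n must be a power of two
        H = hadamard(n) / np.sqrt(n)             # H @ H.T == I
        return H.T @ quantize_symmetric(H @ W, bits)

    rng = np.random.default_rng(0)
    W = rng.normal(size=(64, 64))
    W[0, :] *= 50                                # an outlier row, hostile to quantization
    print(np.linalg.norm(W - quantize_symmetric(W)),
          np.linalg.norm(W - quantize_with_rotation(W)))
    # The rotated variant typically shows a much smaller reconstruction error.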
A mixed-precision quantization framework for LLaMA-based models identifies and preserves high-precision calculations specifically in down-projection layers and Beginning-of-Text tokens where activation spikes occur most frequently, achieving superior perplexity scores compared to uniform quantization approaches while maintaining model accuracy through targeted precision allocation.
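A toy rendering of the allocation policy this summary describes, with hypothetical layer naming ("down_proj" follows LLaMA-style checkpoint conventions) and bitwidths chosen purely for illustration:

    def assign_bitwidth(layer_name, token_index):
        # Keep the spike-prone spots in high precision, quantize the rest.
        if "down_proj" in layer_name or token_index == 0:   # index 0: Beginning-of-Text token
            return 16
        return 4

    print(assign_bitwidth("model.layers.0.mlp.down_proj", 5))  # 16
    print(assign_bitwidth("model.layers.0.mlp.up_proj", 0))    # 16 (BoT token)
    print(assign_bitwidth("model.layers.0.mlp.up_proj", 5))    # 4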
This paper explores the combination of neural network quantization and entropy coding for memory footprint minimization. Edge deployment of quantized models is hampered by the harsh Pareto frontier of the accuracy-to-bitwidth tradeoff, which causes dramatic accuracy loss below a certain bitwidth. Mixed-precision quantization can alleviate this loss by allowing more flexible bitwidth allocation. However, the benefits of standard mixed precision remain limited by the 1-bit frontier, which forces each parameter to be encoded on at least 1 bit of data. This paper introduces an approach that combines mixed precision, zero-point quantization and entropy coding to push the compression boundary of ResNets beyond the 1-bit frontier with an accuracy drop below 1% on the ImageNet benchmark. From an implementation standpoint, a compact decoder architecture features reduced latency, thus allowing for inference-compatible decoding.
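Why entropy coding can break the 1-bit frontier is a one-liner of information theory: if zero-point quantization sends most weights to the zero symbol, the empirical entropy, which an ideal entropy coder approaches, drops below 1 bit per parameter. A small self-contained illustration on toy Laplacian weights, not the paper's models:

    import numpy as np
    from collections import Counter

    def avg_bits_per_weight(q):
        # Empirical entropy (bits/parameter) of a quantized tensor:
        # the lower bound an ideal entropy coder can approach.
        counts = Counter(q.flatten().tolist())
        probs = np.array([c / q.size for c in counts.values()])
        return float(-(probs * np.log2(probs)).sum())

    rng = np.random.default_rng(0)
    w = rng.laplace(scale=0.02, size=100_000)    # weights peak sharply at zero
    q = np.round(w / 0.1)                        # coarse zero-point grid: most values hit 0
    print(avg_bits_per_weight(q))                # well below 1 bit on this toy example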
LINAGORA and partners introduce Lucie-7B, a multilingual language model and training dataset focused on French-English balance, achieving performance comparable to Llama-3.1-8B while prioritizing data rights compliance and providing full training transparency through released checkpoints and optimizer states.
Runtime verification is an area of formal methods that studies the dynamic analysis of execution traces against formal specifications. Typically, the two main activities in runtime verification efforts are the process of creating monitors from specifications, and the algorithms for the evaluation of traces against the generated monitors. Other activities involve the instrumentation of the system to generate the trace and the communication between the system under analysis and the monitor. Most of the applications in runtime verification have been focused on the dynamic analysis of software, even though there are many more potential applications to other computational devices and target systems. In this paper we present a collection of challenges for runtime verification extracted from concrete application domains, focusing on the difficulties that must be overcome to tackle these specific challenges. The computational models that characterize these domains require devising new techniques beyond the current state of the art in runtime verification.
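As a concrete micro-example of the two core activities named above (turning a specification into a monitor, then evaluating a trace against it), here is a hand-written monitor for the invented property "every open is matched by a close before the trace ends":

    def make_monitor():
        open_count = 0
        def step(event):
            nonlocal open_count
            if event == "open":
                open_count += 1
            elif event == "close":
                if open_count == 0:
                    return "violation"     # close without a matching open
                open_count -= 1
            return "ok"
        def end():
            # End-of-trace verdict: any unmatched open is a violation.
            return "ok" if open_count == 0 else "violation"
        return step, end

    step, end = make_monitor()
    trace = ["open", "write", "close", "open"]
    print([step(e) for e in trace], end())     # last open never closed -> violation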
In this paper, we consider the intersection of two problems in machine learning: Multi-Source Domain Adaptation (MSDA) and Dataset Distillation (DD). On the one hand, the first considers adapting multiple heterogeneous labeled source domains to an unlabeled target domain. On the other hand, the second addresses the problem of synthesizing a small summary of a dataset that retains its essential information. We thus consider a new problem called MSDA-DD. To solve it, we adapt previous works in the MSDA literature, such as Wasserstein Barycenter Transport and Dataset Dictionary Learning, as well as the DD method Distribution Matching. We thoroughly experiment with this novel problem on four benchmarks (Caltech-Office 10, Tennessee-Eastman Process, Continuous Stirred Tank Reactor, and Case Western Reserve University), where we show that, even with as little as one synthetic sample per class, one achieves state-of-the-art adaptation performance.
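To fix intuition on the DD side, here is a bare-bones stand-in for Distribution Matching that compresses a dataset to a single synthetic point by matching feature means; real Distribution Matching matches embedding distributions under randomly initialized networks, and the MSDA-DD methods in the paper are considerably richer:

    import numpy as np

    def distill_by_mean_matching(X, n_synthetic=1, lr=0.1, steps=500, seed=0):
        # Gradient descent on || mean(S) - mean(X) ||^2 over the synthetic set S.
        rng = np.random.default_rng(seed)
        S = rng.normal(size=(n_synthetic, X.shape[1]))
        target = X.mean(axis=0)
        for _ in range(steps):
            grad = 2 * (S.mean(axis=0) - target) / n_synthetic
            S -= lr * grad                      # same gradient for every row
        return S

    X = np.random.default_rng(1).normal(loc=3.0, size=(1000, 16))
    S = distill_by_mean_matching(X)
    print(np.abs(S.mean(axis=0) - X.mean(axis=0)).max())   # ~0: means aligned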
As Large Language Models (LLMs) gain traction across critical domains, ensuring secure and trustworthy training processes has become a major concern. Backdoor attacks, where malicious actors inject hidden triggers into training data, are particularly insidious and difficult to detect. Existing post-training verification solutions like Proof-of-Learning are impractical for LLMs due to their requirement for full retraining, lack of robustness against stealthy manipulations, and inability to provide early detection during training; early detection would significantly reduce computational costs. To address these limitations, we introduce Proof-of-Training-Steps, a verification protocol that enables an independent auditor (Alice) to confirm that an LLM developer (Bob) has followed the declared training recipe, including data batches, architecture, and hyperparameters. By analyzing the sensitivity of the LLM's language modeling head (LM-Head) to input perturbations, our method can expose subtle backdoor injections or deviations in training. Even with backdoor triggers in up to 10 percent of the training data, our protocol significantly reduces the attacker's ability to achieve a high attack success rate (ASR). Our method enables early detection of attacks at the injection step, with verification steps being 3x faster than training steps. Our results highlight the protocol's potential to enhance the accountability and security of LLM development, especially against insider threats.
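The abstract does not spell out the sensitivity statistic, so the following is only one conceivable shape of such a fingerprint: perturb the hidden states slightly and record how far the LM-head logits move; an auditor could recompute this number at a declared step and compare it with the developer's report. Toy dimensions, hypothetical statistic:

    import numpy as np

    def lm_head_sensitivity(W_head, hidden, eps=1e-3, n_probes=16, seed=0):
        # Mean absolute logit shift under small random input perturbations.
        rng = np.random.default_rng(seed)
        base = hidden @ W_head.T                   # (batch, vocab) logits
        shifts = []
        for _ in range(n_probes):
            noise = eps * rng.normal(size=hidden.shape)
            shifts.append(np.abs((hidden + noise) @ W_head.T - base).mean())
        return float(np.mean(shifts))

    rng = np.random.default_rng(1)
    W_head = 0.02 * rng.normal(size=(1000, 256))   # toy vocab x hidden LM-head
    h = rng.normal(size=(4, 256))                  # hidden states at a checkpoint
    print(lm_head_sensitivity(W_head, h))          # the fingerprint to cross-check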
Operating system kernels are the security keystone of most computer systems, as they provide the core protection mechanisms. Kernels are in particular responsible for their own security, i.e. they must prevent untrusted user tasks from reaching their level of privilege. We demonstrate that proving such absence of privilege escalation is a prerequisite for any definitive security proof of the kernel. While prior OS kernel formal verifications were performed either on source code or on crafted kernels, with manual or semi-automated methods requiring significant human effort in annotations or proofs, we show that it is possible to compute such kernel security proofs using fully-automated methods, starting from the executable code of an existing microkernel with no modification, thus formally verifying absence of privilege escalation with high confidence at a low cost. We applied our method to two embedded microkernels, including the industrial kernel AnonymOS: with only 58 lines of annotation and less than 10 minutes of computation, our method finds a vulnerability in a first (buggy) version of AnonymOS and verifies absence of privilege escalation in a second (secure) version.
Directed fuzzing focuses on automatically testing specific parts of the code by taking advantage of additional information such as (partial) bug stack traces, patches or risky operations. Key applications include bug reproduction, patch testing and static analysis report verification. Although directed fuzzing has received a lot of attention recently, hard-to-detect vulnerabilities such as Use-After-Free (UAF) are still not well addressed, especially at the binary level. We propose UAFuzz, the first (binary-level) directed greybox fuzzer dedicated to UAF bugs. The technique features a fuzzing engine tailored to UAF specifics, lightweight code instrumentation and an efficient bug triage step. Experimental evaluation for bug reproduction on real cases demonstrates that UAFuzz significantly outperforms state-of-the-art directed fuzzers in terms of fault detection rate, time to exposure and bug triaging. UAFuzz has also proven effective in patch testing, leading to the discovery of 30 new bugs (7 CVEs) in programs such as Perl, GPAC and GNU Patch. Finally, we provide the community with a large fuzzing benchmark dedicated to UAF, built on both real code and real bugs.
An accountable distributed system provides means to detect deviations of system components from their expected behavior. It is natural to complement fault detection with a reconfiguration mechanism, so that the system could heal itself, by replacing malfunctioning parts with new ones. In this paper, we describe a framework that can be used to implement a large class of accountable and reconfigurable replicated services. We build atop the fundamental lattice agreement abstraction lying at the core of storage systems and cryptocurrencies. Our asynchronous implementation of accountable lattice agreement ensures that every violation of consistency is followed by an undeniable evidence of misbehavior of a faulty replica. The system can then be seamlessly reconfigured by evicting faulty replicas, adding new ones and merging inconsistent states. We believe that this paper opens a direction towards asynchronous "self-healing" systems that combine accountability and reconfiguration.
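The lattice agreement abstraction the system builds on is easy to picture with set union as the join; the sketch below shows only the safety-relevant algebra (proposals grow, inconsistent states merge), not the accountability or reconfiguration machinery:

    class LatticeAgreementReplica:
        # Toy join-semilattice state with set union as the join. Outputs at
        # different replicas may differ, but each is the join of a subset of
        # the proposed values, so any two states can always be merged.
        def __init__(self):
            self.state = frozenset()

        def propose(self, values):
            self.state = self.state | frozenset(values)   # join with the proposal
            return self.state

        def merge(self, other_state):
            self.state = self.state | other_state          # joins commute: safe to merge
            return self.state

    a, b = LatticeAgreementReplica(), LatticeAgreementReplica()
    a.propose({"tx1"})
    b.propose({"tx2"})
    a.merge(b.state)                  # inconsistent states reconcile by union
    print(sorted(a.state))            # ['tx1', 'tx2']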
General purpose CPUs used in high performance computing (HPC) support a vector instruction set and an out-of-order engine dedicated to increasing instruction-level parallelism. Hence, related optimizations are currently critical to improve the performance of applications requiring numerical computation. Moreover, the use of a Java run-time environment such as the HotSpot Java Virtual Machine (JVM) in high performance computing is a promising alternative. It benefits from Java's programming flexibility and productivity, while performance is ensured by the Just-In-Time (JIT) compiler. However, the JIT compiler suffers from two main drawbacks. First, the JIT is a black box for developers: we have no control over the generated code nor any feedback from its optimization phases such as vectorization. Second, the time constraint narrows down the degree of optimization compared to static compilers like GCC or LLVM. It is therefore compelling to use statically compiled code, since it benefits from additional optimizations that reduce performance bottlenecks. Java enables calling native code from dynamic libraries through the Java Native Interface (JNI). Nevertheless, JNI methods are not inlined and cost more to invoke than Java methods. Therefore, to benefit from better static optimization, this call overhead must be amortized by the amount of computation performed at each JNI invocation. In this paper we tackle this problem and propose to carry out this analysis for a set of micro-kernels. Our goal is to select the most efficient implementation given the amount of computation defined by the calling context. We also investigate the performance impact of several optimization schemes: vectorization, out-of-order optimization, data alignment, method inlining and the use of native memory for JNI methods.
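The amortization argument reduces to simple arithmetic: a JNI call pays off once the per-element native speedup outweighs the fixed invocation overhead. The numbers below are hypothetical, purely to illustrate the break-even computation the paper performs on real micro-kernels:

    def jni_break_even(call_overhead_ns, java_ns_per_elem, native_ns_per_elem):
        # Smallest element count n such that
        # call_overhead + n * native <= n * java.
        gain = java_ns_per_elem - native_ns_per_elem
        if gain <= 0:
            return None            # native code is not faster per element
        return call_overhead_ns / gain

    # Hypothetical timings, not measurements from the paper:
    print(jni_break_even(call_overhead_ns=1000,
                         java_ns_per_elem=2.0,
                         native_ns_per_elem=0.5))
    # -> ~667 elements: below that, the JNI overhead dominates.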
Large language models (LLMs) offer new opportunities for interacting with complex software artifacts, such as software models, through natural language. They present especially promising benefits for large software models that are difficult to grasp in their entirety, making traditional interaction and analysis approaches challenging. This paper investigates two approaches for leveraging LLMs to answer questions over software models: direct prompting, where the whole software model is provided in the context, and an agentic approach combining LLM-based agents with general-purpose file access tools. We evaluate these approaches using an Ecore metamodel designed for timing analysis and software optimization in automotive and embedded domains. Our findings show that while the agentic approach achieves accuracy comparable to direct prompting, it is significantly more efficient in terms of token usage. This efficiency makes the agentic approach particularly suitable for the automotive industry, where the large size of software models makes direct prompting infeasible, establishing LLM agents as not just a practical alternative but the only viable solution. Notably, the evaluation was conducted using small LLMs, which are more feasible to execute locally, an essential advantage for meeting strict requirements around privacy, intellectual property protection, and regulatory compliance. Future work will investigate software models in diverse formats, explore more complex agent architectures, and extend agentic workflows to support not only querying but also modification of software models.
This paper presents a competitive approach to multilingual subjectivity detection using large language models (LLMs) with few-shot prompting. We participated in Task 1: Subjectivity of the CheckThat! 2025 evaluation campaign. We show that LLMs, when paired with carefully designed prompts, can match or outperform fine-tuned smaller language models (SLMs), particularly in noisy or low-quality data settings. Despite experimenting with advanced prompt engineering techniques, such as debating LLMs and various example selection strategies, we found limited benefit beyond well-crafted standard few-shot prompts. Our system achieved top rankings across multiple languages in the CheckThat! 2025 subjectivity detection task, including first place in Arabic and Polish, and top-four finishes in Italian, English, German, and multilingual tracks. Notably, our method proved especially robust on the Arabic dataset, likely due to its resilience to annotation inconsistencies. These findings highlight the effectiveness and adaptability of LLM-based few-shot learning for multilingual sentiment tasks, offering a strong alternative to traditional fine-tuning, particularly when labeled data is scarce or inconsistent.
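For readers unfamiliar with the setup, a standard few-shot prompt for this task looks like the sketch below; the examples and wording are invented here for illustration, not the team's actual prompts:

    FEW_SHOT = """Decide whether the sentence is SUBJ (subjective) or OBJ (objective).

    Sentence: The committee met on Tuesday.
    Label: OBJ

    Sentence: This is the best decision they ever made.
    Label: SUBJ

    Sentence: {sentence}
    Label:"""

    def build_prompt(sentence):
        # The filled-in prompt is sent to the LLM; the completion is the label.
        return FEW_SHOT.format(sentence=sentence)

    print(build_prompt("The new policy is a disaster."))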
Implementing large software, such as software analyzers aimed at industrial use, requires a well-engineered software architecture in order to ease its daily development and its maintenance throughout its lifecycle. If the analyzer is not a single tool but an open, extensible, collaborative framework in which external developers may develop plug-ins that collaborate with each other, such a well-designed architecture becomes even more important. In this experience report, we explain the difficulties of developing and maintaining open extensible collaborative analysis frameworks, through the example of Frama-C, a platform dedicated to the analysis of code written in C. We also present the new upcoming software architecture of Frama-C and how it aims to solve some of these issues.
The presented work continues the line of recent distributed computing community efforts dedicated to the theoretical aspects of blockchains. This paper is the first to specify blockchains as a composition of abstract data types, together with a hierarchy of consistency criteria that formally characterizes the histories admissible for distributed programs that use them. Our work is based on an original oracle-based construction that, along with new consistency definitions, captures the eventual convergence process in blockchain systems. The paper also presents some results on the implementability of the presented abstractions and a mapping of representative existing blockchains, from both academia and industry, into our framework.
PICOSEC Micromegas (MM) is a precise timing gaseous detector based on a Cherenkov radiator coupled with a semi-transparent photocathode and an MM amplifying structure. The detector concept was successfully demonstrated through a single-channel prototype, achieving sub-25 ps time resolution with Minimum Ionizing Particles (MIPs). A series of studies followed, aimed at developing robust, large-area, and scalable detectors with high time resolution, complemented by specialized fast-response readout electronics. This work presents recent advancements towards large-area resistive PICOSEC MM, including 10 × 10 cm² prototypes and a 20 × 20 cm² prototype, which features the joining of four photocathodes. The time resolution of these detector prototypes was evaluated in beam tests, achieving a timing performance of around 25 ps for individual pads with MIPs. Meanwhile, customized electronics dedicated to the high-precision time measurement of the large-area PICOSEC MM have been developed. The performance of the entire system was evaluated in beam tests, demonstrating its capability for large-area integration. These advancements highlight the potential of PICOSEC MM to meet the stringent requirements of future particle physics experiments.
In system monitoring, automatic fault diagnosis seeks to infer the system's state from sensor readings, e.g., through machine learning models. In this context, it is of key importance that, based on historical data, these systems are able to generalize to incoming data. In parallel, many factors may induce changes in the data probability distribution, hindering the ability of such models to generalize. In this sense, domain adaptation is an important framework for adapting models to different probability distributions. In this paper, we propose a new benchmark, based on the Tennessee Eastman Process of Downs and Vogel (1993), for evaluating domain adaptation methods in the context of chemical processes. Besides describing the process and its relevance for domain adaptation, we describe a series of data processing steps for reproducing our benchmark. We then test 11 domain adaptation strategies on this novel benchmark, showing that optimal transport-based techniques outperform other strategies.
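Since the winning strategies are optimal-transport-based, here is a minimal, self-contained Sinkhorn sketch showing the core mechanism (an entropic transport plan followed by a barycentric mapping of source samples onto the target domain); the benchmarked methods are considerably more elaborate:

    import numpy as np

    def sinkhorn_transport(Xs, Xt, reg=0.1, n_iter=200):
        # Entropic OT plan between uniform empirical measures, then a
        # barycentric mapping of the source samples into the target domain.
        ns, nt = len(Xs), len(Xt)
        C = ((Xs[:, None, :] - Xt[None, :, :]) ** 2).sum(-1)  # squared Euclidean cost
        C = C / C.max()                          # normalize to avoid exp underflow
        K = np.exp(-C / reg)
        a, b = np.full(ns, 1.0 / ns), np.full(nt, 1.0 / nt)
        v = np.ones(nt)
        for _ in range(n_iter):                  # Sinkhorn fixed-point iterations
            u = a / (K @ v)
            v = b / (K.T @ u)
        P = u[:, None] * K * v[None, :]          # transport plan with marginals a, b
        return (P / P.sum(axis=1, keepdims=True)) @ Xt

    rng = np.random.default_rng(0)
    Xs = rng.normal(loc=0.0, size=(100, 2))      # source operating condition
    Xt = rng.normal(loc=5.0, size=(100, 2))      # shifted target condition
    print(sinkhorn_transport(Xs, Xt).mean(axis=0))   # close to the target mean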