Univ Angers
Researchers from Inria and the University of Bordeaux, with collaborators from Hugging Face, developed GLAM, a method for functionally grounding Large Language Models in interactive textual environments using online Reinforcement Learning. GLAM-trained Flan-T5 models achieved superior sample efficiency and better generalization to novel objects and task compositions compared to traditional RL agents and behavioral cloning baselines.
MAGELLAN equips Large Language Model (LLM) agents with a metacognitive ability to predict their learning progress (LP) and competence in extensive, language-defined goal spaces. This framework facilitates efficient self-organized learning, allowing agents to master diverse goal categories faster than baselines without expert knowledge and generalize effectively to new tasks.
Reinforcement learning (RL) is a promising approach for aligning large language models' (LLMs') knowledge with sequential decision-making tasks. However, few studies have thoroughly investigated how fine-tuning LLM agents with RL in a specific environment affects their capabilities. In this paper, we propose a novel framework to analyze the sensitivity of LLMs to prompt formulations following RL training in a textual environment. Our findings reveal that the performance of LLMs degrades when faced with prompt formulations different from those used during the RL training phase. In addition, we analyze the source of this sensitivity by examining the model's internal representations and salient tokens. Finally, we propose to use a contrastive loss to mitigate this sensitivity and improve the robustness and generalization capabilities of LLMs.
HERAKLES introduces a hierarchical framework for open-ended AI agents that allows for the continuous compilation of mastered complex behaviors into reusable low-level skills, enabling efficient and scalable learning in dynamic environments. This approach, integrating a large language model for high-level planning with a neural network for execution, demonstrates enhanced sample efficiency and robust generalization to novel tasks.
The past years have seen Large Language Models (LLMs) thrive not only as generative models but also as agents solving textual sequential decision-making tasks. When facing complex environments where their zero-shot abilities are insufficient, recent work showed online Reinforcement Learning (RL) could be used for the LLM agent to discover and learn efficient strategies interactively. However, most prior work sticks to on-policy algorithms, which greatly reduces the scope of methods such agents could use for both exploration and exploitation, such as experience replay and hindsight relabeling. Yet, such methods may be key for LLM learning agents, in particular when designing autonomous intrinsically motivated agents that sample and pursue their own goals (i.e. autotelic agents). This paper presents and studies an adaptation of Soft Actor-Critic and hindsight relabeling to LLM agents. Our method not only paves the path towards autotelic LLM agents that learn online but can also outperform on-policy methods in more classic multi-goal RL environments.
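Hindsight relabeling, mentioned in the abstract above, can be illustrated with a minimal sketch. It assumes text goals, a sparse reward, and that the substitute goal was reached at the final step of the trajectory. The `Transition` fields and the function name are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Transition:
    # Hypothetical record layout for a textual multi-goal environment.
    observation: str
    action: str
    next_observation: str
    goal: str
    reward: float


def relabel_with_hindsight(trajectory, achieved_goal):
    """Copy a trajectory, pretending `achieved_goal` was the goal all along."""
    relabeled = []
    last = len(trajectory) - 1
    for i, step in enumerate(trajectory):
        # Assumed sparse reward: 1.0 only on the step that reaches the goal.
        reward = 1.0 if i == last else 0.0
        relabeled.append(replace(step, goal=achieved_goal, reward=reward))
    return relabeled


# A failed attempt at one goal that happened to achieve another outcome.
failed = [
    Transition("You see a door.", "go to door", "You are at the door.",
               "pick up the key", 0.0),
    Transition("You are at the door.", "open door", "The door is open.",
               "pick up the key", 0.0),
]
extra = relabel_with_hindsight(failed, "open the door")
```

Both the original and the relabeled transitions could then be stored in a replay buffer, which is only possible with an off-policy learner.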
A survey by Guérin, Chauvet, and Saubion from the University of Angers systematically categorizes and details advancements in Self-Organizing Map (SOM) research from 2014 to 2024. It highlights improvements across data management, topology, learning techniques, visualization, performance, and hyperparameterization, serving as a resource for researchers and practitioners.
The accurate prediction of chemical reaction outcomes is a major challenge in computational chemistry. Current models rely heavily on either highly specific reaction templates or template-free methods, both of which present limitations. To address these, this work proposes the Broad Reaction Set (BRS), a set featuring 20 generic reaction templates written in SMARTS, a pattern-based notation designed to describe substructures and reactivity. Additionally, we introduce ProPreT5, a T5-based model specifically adapted for chemistry and, to the best of our knowledge, the first language model capable of directly handling and applying SMARTS reaction templates. To further improve generalization, we propose the first augmentation strategy for SMARTS, which injects structural diversity at the pattern level. Trained on augmented templates, ProPreT5 demonstrates strong predictive performance and generalization to unseen reactions. Together, these contributions provide a novel and practical alternative to current methods, advancing the field of template-based reaction prediction.
The efficient exploration of chemical space remains a central challenge, as many generative models still produce unstable or non-synthesizable compounds. To address these limitations, we present EvoMol-RL, a significant extension of the EvoMol evolutionary algorithm that integrates reinforcement learning to guide molecular mutations based on local structural context. By leveraging Extended Connectivity Fingerprints (ECFPs), EvoMol-RL learns context-aware mutation policies that prioritize chemically plausible transformations. This approach significantly improves the generation of valid and realistic molecules, reducing the frequency of structural artifacts and enhancing optimization performance. The results demonstrate that EvoMol-RL consistently outperforms its baseline in molecular pre-filtering realism. These results emphasize the effectiveness of combining reinforcement learning with molecular fingerprints to generate chemically relevant molecular structures.
We construct symmetric monoidal higher categories of iterated Calabi-Yau cospans, which are noncommutative analogs of iterated Lagrangian correspondences. More precisely, we give a general (and functorial) procedure that applies to iterated nondegenerate cospans on certain comma categories. This allows us to factor the AKSZ fully extended TFT associated with the moduli of objects of a Calabi-Yau category (taking values in iterated Lagrangian correspondences) through a fully extended TFT taking values in iterated Calabi-Yau cospans.
Mathematical modeling offers the opportunity to test hypotheses concerning myeloproliferative neoplasm (MPN) emergence and development. We tested different mathematical models on a training cohort (n=264 patients) (Registre de la côte d'Or) to determine the emergence and evolution times before the classical JAK2V617F myeloproliferative disorders (Polycythemia Vera, PV, and Essential Thrombocythemia, ET) are diagnosed. We divided the time before diagnosis into two main periods: the time from embryonic development until the JAK2V617F mutation occurs, persists, and enters proliferation, and a second period corresponding to the expansion of the clonal population until diagnosis. Using progressively more complex models, we demonstrate that the rate of active mutation occurrence is not constant and does not rely on individual variability alone, but rather increases with age, with a median time of 63.1 ± 13 years. By contrast, the expansion time can be considered constant: 8.8 years once the mutation has emerged. Results were validated in an external cohort (national FIMBANK Cohort, n=1248 patients). Comparing JAK2V617F ET with PV, we noticed that the first period (rate of active homozygous mutation occurrence) takes approximately 1.5 years longer for PV than for ET, whereas the expansion time was nearly identical. In conclusion, our multi-step approach and the final time-dependent model of MPN emergence and development demonstrate that the emergence of a JAK2V617F mutation should be linked to an aging mechanism, and indicate a period of 8-9 years to develop a full MPN.
In \cite{BCP}, the authors built and studied an algorithm based on the (self)-interaction of a dynamics with its occupation measure to approximate Quasi-Stationary Distributions (QSD) of general Markov chains conditioned to stay in a compact set. In this paper, we propose to tackle the case of McKean-Vlasov-type dynamics, \emph{i.e.} of dynamics interacting with their marginal distribution (conditioned to not be killed). In this non-linear setting, we exhibit conditions which guarantee that weak limits of these sequences of random measures are QSDs of the given dynamics. We also prove tightness results in the non-compact case. These general conditions are then applied to Euler schemes of McKean-Vlasov SDEs and, in the compact case, the behavior of these QSDs when the step $h$ goes to $0$ is investigated. Our results also allow us to consider some examples in the non-compact case, and some new tightness criteria are provided in this setting. Finally, we illustrate our theoretical results with several simulations.
We investigate the effect of a tunable spectral filter on the dynamics of a passively mode-locked fiber laser in the anomalous dispersion regime. The results show that noise-like pulse emission evolves toward a bound-state regime as the filter bandwidth is reduced. By applying the Shannon entropy to dispersive Fourier transform signals, we demonstrate that the system undergoes a phase transition.
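As a rough illustration of the entropy measure named above, the sketch below computes the Shannon entropy of a sampled intensity profile after normalizing it into a probability distribution. The function name and the toy profiles are assumptions for illustration, not the authors' processing pipeline.

```python
import math


def shannon_entropy(intensities):
    """Shannon entropy (in bits) of a non-negative intensity profile."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("profile must have positive total intensity")
    entropy = 0.0
    for value in intensities:
        if value > 0:
            p = value / total  # normalize to a probability
            entropy -= p * math.log2(p)
    return entropy


# Toy profiles (assumed data): energy spread evenly vs. concentrated in one bin.
spread = [1.0] * 8
concentrated = [0.0, 0.0, 0.0, 8.0, 0.0, 0.0, 0.0, 0.0]
```

A profile whose energy is spread over many bins gives a high entropy, while a concentrated one gives an entropy near zero, so tracking this quantity across filter settings is one way to expose a change of regime.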
We study the propagation of an initial condition in the presence of two topological insulators without magnetic field, where the interface is a smooth, connected, non-compact curve without boundary. The solution is governed by an adiabatic modulation of a Dirac operator with a variable mass. We determine the evolution of the semiclassical measure of the solution with a two-scale Wigner measure method by reducing the Dirac operator to a normal form.
Given a static vertex-selection problem (e.g. independent set, dominating set) on a graph, we can define a corresponding temporally satisfying reconfiguration problem on a temporal graph which asks for a sequence of solutions to the vertex-selection problem at each time such that we can reconfigure from one solution to the next. We can think of each solution in the sequence as a set of vertices with tokens placed on them; our reconfiguration model allows us to slide tokens along active edges of a temporal graph at each time-step. We show that it is possible to efficiently check whether one solution can be reconfigured to another, and show that approximation results on the static vertex-selection problem can be adapted with a lifetime factor to the reconfiguration version. Our main contributions are fixed-parameter tractable algorithms with respect to: enumeration time of the related static problem; the combination of temporal neighbourhood diversity and lifetime of the input temporal graph; and the combination of lifetime and treewidth of the footprint graph.
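The reconfiguration check described above can be sketched as a bipartite matching problem. This assumes that, in one time-step, each token either stays put or slides along a single active edge, and that no two tokens end on the same vertex. The function name and this exact movement rule are assumptions for illustration and may differ from the paper's model.

```python
def can_reconfigure(source, target, active_edges):
    """Return True if tokens on `source` can reach `target` in one time-step."""
    source, target = set(source), set(target)
    if len(source) != len(target):
        return False
    # Adjacency restricted to the edges active at this time-step.
    neighbours = {}
    for u, v in active_edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    # A token may stay where it is or slide along one active edge.
    options = {
        u: list(({u} | neighbours.get(u, set())) & target) for u in source
    }
    matched = {}  # target vertex -> source vertex assigned to it

    def augment(u, seen):
        # Kuhn's augmenting-path search for a perfect matching.
        for v in options[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in matched or augment(matched[v], seen):
                matched[v] = u
                return True
        return False

    return all(augment(u, set()) for u in source)


# Path graph 1 - 2 - 3 with both edges active.
edges = [(1, 2), (2, 3)]
```

With these edges, `can_reconfigure({1, 3}, {2, 3}, edges)` succeeds because the token on 1 slides to 2 while the token on 3 stays, whereas `can_reconfigure({1}, {3}, edges)` fails because no single slide connects 1 to 3.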
The Casas-Alvero conjecture predicts that every univariate polynomial $f$ over a field $K$ of characteristic zero having a common factor with each of its derivatives $H_i(f)$ is a power of a linear polynomial. Let $f=x^d+a_1x^{d-1}+\cdots+a_{d-1}x \in K[a_1,\ldots,a_{d-1}][x]$ and let $R_i = Res(f,H_i(f))\in K[a_1,\ldots,a_{d-1}]$ be the resultant of $f$ and $H_i(f)$, $i \in \{1,\ldots,d-1\}$. The Casas-Alvero Conjecture is equivalent to saying that $R_1,\ldots,R_{d-1}$ are ``independent'' in a certain sense, namely that the height $ht(R_1,\ldots,R_{d-1})=d-1$ in $K[a_1,\ldots,a_{d-1}]$. In this paper we prove a very partial result in this direction: if $i \in \{d-3,d-2,d-1\}$ then $R_i \notin \sqrt{(R_1,\ldots,\breve{R_i},\ldots,R_{d-1})}$.
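For illustration only, the smallest case $d=2$ can be worked by hand, assuming that the first derivative $H_1(f)$ is the ordinary derivative $f'$. This example is an addition for orientation and is not part of the paper's result.

```latex
\[
f = x^2 + a_1 x, \qquad H_1(f) = f' = 2x + a_1,
\]
\[
R_1 = Res(f, f') =
\det\begin{pmatrix}
1 & a_1 & 0 \\
2 & a_1 & 0 \\
0 & 2 & a_1
\end{pmatrix}
= a_1^2 - 2a_1^2 = -a_1^2 .
\]
```

Here $R_1$ vanishes exactly when $a_1 = 0$, that is, when $f = x^2$ is a power of a linear polynomial, and the ideal $(R_1)$ has height $1 = d-1$ in $K[a_1]$, in line with the reformulation stated above.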
07 Sep 2023
In this paper, we considered the problem of dependent censoring models with a positive probability that the times of failure are equal. In this context, we proposed to consider the Marshall-Olkin type model and studied some properties of the associated survival copula in its application to censored data. We also introduced estimators for the marginal distributions and the joint survival probabilities under different schemes and showed their asymptotic normality under appropriate conditions. Finally, we evaluated the finite-sample performance of our approach relying on a small simulation study on synthetic data, and an application to real data.
We study the multiplicative statistics associated with the limiting determinantal point process describing unitary random matrices with a critical edge point, where the limiting density vanishes like a power 5/2. We prove that these statistics are governed by the first three equations of the KdV hierarchy, and study the asymptotic behavior of the relevant solutions.
We report observations of stripe-like features in Enceladus' plumes captured simultaneously by Cassini's VIMS-IR and ISS NAC instruments during flyby E17, with similar patterns seen in VIMS-IR data from flybys E13 and E19. These parallel stripes, inclined at approximately $16^{\circ}$ to the ecliptic and $43^{\circ}$ to Saturn's ring plane, appear continuous across images when projected in the J2000 frame. A bright stripe, most visible at wavelengths around 5 $\mu$m, acts as the zeroth-order diffraction peak of a reflection grating with an estimated groove spacing of 0.12-2.60 mm, while adjacent stripes are attributed to higher-order diffraction peaks. We suggest that this light-dispersing phenomenon originates from an inclined periodic structure within Saturn's E ring. This structure, constrained between Saturn's G ring and Rhea's orbit, likely consists of fresh ice particles supplied by Enceladus' plumes.
We consider a discrete Markov-additive process, that is a Markov chain on a state space $\mathbb{Z}^d \times E$ with invariant jumps along the $\mathbb{Z}^d$ component. In the case where the set $E$ is finite, we derive an asymptotic equivalent of the Green function of the process, providing a new proof of a result obtained by Dussaule in 2020. This result generalizes the famous theorem of Ney and Spitzer of 1966, that deals with the sum of independent and identically distributed random variables, to a spatially non-homogeneous case. In this new proof, we generalize the arguments used in Woess's book Random Walks on Infinite Graphs and Groups to prove Ney and Spitzer's theorem, that consists in establishing an integral formula of the Green function from which we get the asymptotic equivalent. To do so, we use techniques developed by Babillot. In particular, we use dyadic splitting of integrals, a powerful Fourier analysis tool that enables us to control the Fourier transform of a function that has a singularity at the origin.