Research establishes a theoretical link between Group Relative Policy Optimization (GRPO) and Direct Preference Optimization (DPO) by reinterpreting GRPO as a contrastive learning objective. This insight leads to "2-GRPO," a variant that achieves comparable mathematical reasoning performance to standard GRPO while reducing training time by over 70% and requiring only 1/8 of the rollouts.
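For intuition, here is a minimal sketch of the group-relative advantage at the heart of GRPO (assuming the standard mean/std normalization over a group of rollouts; function and variable names are ours, not the paper's code). With a group of two, the normalization collapses to a ±1 contrast between the two rollouts, which is structurally the pairwise preference signal DPO trains on:

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: normalize each rollout's reward within its group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# A typical group (e.g., 8 or 16 rollouts) yields graded advantages:
print(group_relative_advantages([0.1, 0.9, 0.4, 0.7, 0.2, 0.8, 0.5, 0.6]))

# A group of 2 yields only a +/-1 contrast -- the pairwise signal of DPO:
print(group_relative_advantages([0.0, 1.0]))  # approx. [-1., 1.]
```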
This paper offers a comprehensive guide to self-supervised learning (SSL), systematizing diverse methods into coherent families and providing practical implementation advice. It aims to make the rapidly evolving field more accessible by distilling historical context, theoretical underpinnings, and empirical best practices for various data modalities.
The AINSTEIN framework evaluates the capacity of Large Language Models (LLMs) to solve AI research problems using only their parametric knowledge. It demonstrates that leading LLMs, through iterative self-critique, can effectively generalize problems from abstracts and generate novel, valid technical solutions, often proposing alternative approaches rather than simply rediscovering existing ones.
Simplicial Embeddings (SEM) are integrated as an architectural component in deep reinforcement learning to improve sample efficiency and final performance across actor-critic agents. This approach imposes a geometric inductive bias on latent representations, yielding more stable learning dynamics across a variety of continuous and discrete control tasks.
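The SEM layer itself is compact. A sketch assuming the standard formulation (split the latent into groups and softmax each group onto a simplex; argument names are illustrative):

```python
import torch
import torch.nn.functional as F

def simplicial_embedding(z, num_groups, temperature=1.0):
    """Map a latent (B, D) onto a product of simplices: chunk the last
    dimension into `num_groups` groups and softmax each chunk."""
    batch, dim = z.shape
    assert dim % num_groups == 0, "latent dim must split evenly into groups"
    z = z.view(batch, num_groups, dim // num_groups)
    return F.softmax(z / temperature, dim=-1).flatten(1)

# e.g., applied to an actor-critic trunk's features before the policy/value heads
features = torch.randn(32, 256)
sem_features = simplicial_embedding(features, num_groups=16)
```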
Meta-World+ re-engineers the Meta-World benchmark, standardizing its reward functions and updating it for modern reinforcement learning frameworks. The work demonstrates how past undocumented reward changes significantly impacted multi-task reinforcement learning performance and provides a unified, reproducible platform for future research.
Masked Siamese Networks (MSN) is a self-supervised learning framework that integrates masked image modeling with joint-embedding architectures to learn visual representations. It achieves state-of-the-art performance in low-shot image classification, improving top-1 accuracy by 11% over DINO on ImageNet-1K with 5 labels per class, while significantly reducing computational costs through aggressive masking.
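The core objective is prototype matching between the masked and unmasked views. A sketch following the paper's description (the EMA target encoder and exact hyperparameters are omitted; the epsilon guards and names are ours):

```python
import torch
import torch.nn.functional as F

def msn_loss(z_masked, z_target, prototypes,
             tau_anchor=0.1, tau_target=0.025, lambda_me=1.0):
    """Match the masked view's soft prototype assignment to the unmasked
    view's sharper (lower-temperature) assignment, plus a mean-entropy
    maximization (ME-MAX) term that spreads assignments across prototypes."""
    z_m = F.normalize(z_masked, dim=-1)
    z_t = F.normalize(z_target, dim=-1)
    protos = F.normalize(prototypes, dim=-1)

    p_anchor = F.softmax(z_m @ protos.T / tau_anchor, dim=-1)
    with torch.no_grad():  # no gradient through the target assignments
        p_target = F.softmax(z_t @ protos.T / tau_target, dim=-1)

    cross_entropy = -(p_target * torch.log(p_anchor + 1e-8)).sum(-1).mean()
    mean_p = p_anchor.mean(0)
    me_max = (mean_p * torch.log(mean_p + 1e-8)).sum()  # equals -H(mean_p)
    return cross_entropy + lambda_me * me_max
```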
BinaryConnect introduces a method to train deep neural networks by constraining weights to binary values (+1 or -1) during forward and backward propagation, while retaining high-precision real-valued weights for updates. This approach significantly reduces computational cost and memory footprint, achieving near state-of-the-art performance on image classification tasks like MNIST, CIFAR-10, and SVHN.
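The update rule fits in a few lines. A sketch of the deterministic variant (the paper also proposes stochastic binarization; the toy objective below is ours):

```python
import numpy as np

def binarize(w):
    """Deterministic BinaryConnect: binary weights are the sign of the reals."""
    return np.where(w >= 0.0, 1.0, -1.0)

def binaryconnect_step(w_real, grad_at_binary, lr=0.05):
    """One update: the gradient is evaluated at the *binary* weights, but it
    updates the *real-valued* weights, which are clipped to [-1, 1]."""
    w_bin = binarize(w_real)
    grad = grad_at_binary(w_bin)
    return np.clip(w_real - lr * grad, -1.0, 1.0)

# toy usage: pull the binarized weights toward a fixed target sign pattern
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=8)
target = binarize(rng.normal(size=8))
for _ in range(100):
    w = binaryconnect_step(w, lambda wb: 2.0 * (wb - target))
```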
This paper systematically re-evaluates prominent bonus-based exploration methods in deep reinforcement learning using a standardized framework built upon the Rainbow agent across the full Atari 2600 suite. It finds that while these methods excel on specific benchmarks, they often fail to outperform simpler strategies in general and can even hurt performance when combined with a strong base algorithm.
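The recipe shared by the compared methods is simple: add an intrinsic novelty bonus to the environment reward. A minimal count-based illustration (methods like RND, ICM, and pseudo-counts replace the table lookup with learned novelty estimates; names are ours):

```python
from collections import defaultdict
import numpy as np

visit_counts = defaultdict(int)

def count_bonus(state_key):
    """Novelty bonus that decays as 1/sqrt(N(s)) with visitation."""
    visit_counts[state_key] += 1
    return 1.0 / np.sqrt(visit_counts[state_key])

def shaped_reward(r_ext, state_key, beta=0.1):
    # the common template across bonus-based methods: r = r_ext + beta * bonus
    return r_ext + beta * count_bonus(state_key)
```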
Researchers from Mila, Vector Institute, and the University of Toronto developed a deep learning framework, Wasserstein Lagrangian Flows (WLF), that unifies various optimal transport (OT) problems by formulating them as action-minimizing curves on probability density manifolds. The framework, leveraging a dual Hamiltonian formulation and neural networks, achieves superior performance in high-dimensional single-cell RNA-sequencing trajectory inference tasks, particularly when incorporating biological priors like mass changes or external potentials.
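As a concrete special case (a standard identity, not the paper's most general Lagrangian), the Benamou-Brenier dynamical form of optimal transport is exactly such an action minimization over density paths:

```latex
W_2^2(\mu_0, \mu_1) \;=\; \inf_{\rho,\, v} \int_0^1 \!\! \int \|v_t(x)\|^2 \, \rho_t(x) \, dx \, dt,
\qquad \text{s.t.}\;\; \partial_t \rho_t + \nabla \cdot (\rho_t v_t) = 0,\quad \rho_0 = \mu_0,\;\; \rho_1 = \mu_1.
```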
This work formally links continuous flow models from machine learning with the Schrödinger equation via a "continuity Hamiltonian," providing an efficient quantum algorithm to prepare quantum samples (qsamples) for distributions learned by these models, which offers advantages for statistical inference tasks like mean estimation.
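The flavor of this correspondence can be seen in the textbook Madelung substitution (with ℏ = m = 1; the paper's "continuity Hamiltonian" construction is more general): writing the wave function in polar form and separating amplitude from phase recovers the continuity equation that flow models integrate:

```latex
\psi_t = \sqrt{\rho_t}\, e^{i S_t}, \qquad i\, \partial_t \psi_t = \Big( -\tfrac{1}{2} \nabla^2 + V \Big) \psi_t
\;\;\Longrightarrow\;\; \partial_t \rho_t + \nabla \cdot (\rho_t \nabla S_t) = 0.
```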