alphaXiv

Montreal Neurological Institute McGill University logo

McGill University

The Markovian Thinker: Architecture-Agnostic Linear Scaling of Reasoning

13 Nov 2025

Université de Montréal Polytechnique Montréal logo

Polytechnique Montréal

Reinforcement learning (RL) has recently become a strong recipe for training reasoning LLMs that produce long chains of thought (LongCoT). Yet the standard RL "thinking environment", where the state is the prompt plus all prior reasoning tokens, makes the state unbounded and forces attention-based policies to pay quadratic compute as thoughts lengthen. We revisit the environment itself. We propose Markovian Thinking, a paradigm in which the policy advances reasoning while conditioning on a constant-size state, decoupling thinking length from context size. As an immediate consequence, this yields linear compute with constant memory. We instantiate this idea with Delethink, an RL environment that structures reasoning into fixed-size chunks. Within each chunk, the model thinks as usual; at the boundary, the environment resets the context and reinitializes the prompt with a short carryover. Through RL, the policy learns to write a textual state near the end of each chunk sufficient for seamless continuation of reasoning after reset. Trained in this environment, an R1-Distill 1.5B model reasons in 8K-token chunks yet thinks up to 24K tokens, matching or surpassing LongCoT-RL trained with a 24K budget. With test-time scaling, Delethink continues to improve where LongCoT plateaus. The effect of linear compute is substantial: we empirically estimate that, at an average thinking length of 96K tokens, LongCoT-RL costs 27 H100-months vs. 7 for Delethink. Analysis at RL initialization shows off-the-shelf reasoning models (1.5B-120B) often sample Markovian traces zero-shot across diverse benchmarks, providing positive samples that make RL effective at scale. Our results show that redesigning the thinking environment is a powerful lever: it enables very long reasoning without quadratic overhead and opens a path toward efficient, scalable reasoning LLMs.

#agents #chain-of-thought #computer-science
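The chunked loop described in the Markovian Thinker summary can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: `markovian_think`, `toy_policy`, and the `Policy` interface are hypothetical names, the carryover is assumed to be the last few tokens of the previous chunk, and the cost functions count only a crude "tokens attended to" proxy while ignoring the prompt.

```python
from typing import Callable, List, Tuple

Tokens = List[str]
# Hypothetical policy interface: (context, budget) -> (new tokens, finished?)
Policy = Callable[[Tokens, int], Tuple[Tokens, bool]]


def markovian_think(prompt: Tokens, policy: Policy, budget: int,
                    carryover: int, max_chunks: int) -> List[Tokens]:
    """Run chunked reasoning; the context is reset at every chunk boundary."""
    state: Tokens = []            # carryover from the previous chunk
    trace: List[Tokens] = []
    for _ in range(max_chunks):
        context = prompt + state  # bounded size: prompt + carryover
        chunk, done = policy(context, budget)
        trace.append(chunk)
        if done:
            break
        # Assumption: the carryover is the tail of the chunk just written.
        state = chunk[-carryover:] if carryover > 0 else []
    return trace


def toy_policy(context: Tokens, budget: int) -> Tuple[Tokens, bool]:
    """Toy stand-in for a model: keeps counting up to 10 from the last number seen."""
    last = int(context[-1]) if context and context[-1].isdigit() else 0
    out: Tokens = []
    while len(out) < budget and last < 10:
        last += 1
        out.append(str(last))
    return out, last >= 10


def longcot_attention_cost(total_tokens: int) -> int:
    """Each new token attends to all earlier ones: 1 + 2 + ... + n (quadratic)."""
    return total_tokens * (total_tokens + 1) // 2


def chunked_attention_cost(total_tokens: int, chunk_size: int, carryover: int) -> int:
    """Same proxy, but the context shrinks to `carryover` whenever it hits `chunk_size`."""
    cost, context = 0, 0
    for _ in range(total_tokens):
        if context >= chunk_size:
            context = carryover   # context reset at the chunk boundary
        context += 1
        cost += context
    return cost


trace = markovian_think(["count"], toy_policy, budget=4, carryover=1, max_chunks=10)
```

In this toy run the policy produces ten tokens of "thinking" across three chunks while never seeing more than the prompt, one carried-over token, and the current chunk. Under the cost proxy, the chunked total grows roughly linearly with thinking length because the per-token context is capped.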

GraSS: Combining Graph Neural Networks with Expert Knowledge for SAT Solver Selection

17 May 2024

Huawei Noah’s Ark Lab McGill University logo

McGill University

GraSS (Graph Neural Network SAT Solver Selector) is a machine learning approach that automatically selects the most appropriate SAT solver for a given problem instance by combining graph neural networks with domain expertise. The method achieves approximately 40% reduction in average solving time on the LEC dataset compared to traditional selection techniques.

#computer-science #artificial-intelligence #machine-learning

Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture

13 Apr 2023

New York University Mila - Quebec AI Institute logo

Mila - Quebec AI Institute

I-JEPA presents a Joint-Embedding Predictive Architecture (JEPA) for self-supervised image learning that predicts abstract representations of masked image blocks. This approach achieves competitive performance on ImageNet-1K linear evaluation and dense prediction tasks while significantly reducing pretraining computational costs by over 10x compared to prior methods like MAE.

#computer-science #artificial-intelligence #computer-vision-and-pattern-recognition

LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?

13 Jun 2025

Wenhao Chai

University of Washington University of Waterloo logo

University of Waterloo

LiveCodeBench Pro introduces a new benchmark for evaluating Large Language Models (LLMs) in competitive programming using real-time problem collection and expert human analysis. The benchmark provides fine-grained diagnostics, revealing that frontier LLMs still exhibit significant limitations, particularly on hard and observation-heavy problems, despite strong implementation capabilities.

#agents #computer-science #artificial-intelligence

It Takes Two: Your GRPO Is Secretly DPO

01 Oct 2025

Huawei Noah’s Ark Lab Zhejiang University logo

Zhejiang University

Research establishes a theoretical link between Group Relative Policy Optimization (GRPO) and Direct Preference Optimization (DPO) by reinterpreting GRPO as a contrastive learning objective. This insight leads to "2-GRPO," a variant that achieves comparable mathematical reasoning performance to standard GRPO while reducing training time by over 70% and requiring only 1/8 of the rollouts.

#agents #computer-science #contrastive-learning
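To illustrate why a group of two rollouts resembles a preference pair, here is a minimal sketch of group-relative advantage normalization as commonly described for GRPO. The function name, the use of the population standard deviation, and the `eps` guard are assumptions made for this example and may differ from the paper's exact formulation.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward by the mean and standard deviation of its group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)  # population variance (assumed)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


# Group of two with one correct and one incorrect rollout.
pair = group_relative_advantages([1.0, 0.0])
# Group of two where both rollouts receive the same reward.
tie = group_relative_advantages([1.0, 1.0])
```

With two rollouts and binary rewards, a mixed pair yields advantages close to +1 and -1 (one response pushed up, the other pushed down), while a tied pair yields zeros and contributes no gradient signal. That is the contrastive, pairwise structure the summary alludes to.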

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

12 Jun 2023

Allen Nie

Jos Rozen

Sayan Ghosh

ETH Zurich KAIST logo

A large-scale and diverse benchmark, BIG-bench, was introduced to rigorously evaluate the capabilities and limitations of large language models across 204 tasks. The evaluation revealed that even state-of-the-art models currently achieve aggregate scores below 20 (on a 0-100 normalized scale), indicating significantly lower performance compared to human experts.

#computer-science #artificial-intelligence #computation-and-language

Combining Confidence Elicitation and Sample-based Methods for Uncertainty Quantification in Misinformation Mitigation

30 Jan 2024

Université de Montréal McGill University logo

McGill University

Large Language Models have emerged as prime candidates to tackle misinformation mitigation. However, existing approaches struggle with hallucinations and overconfident predictions. We propose an uncertainty quantification framework that leverages both direct confidence elicitation and sample-based consistency methods to provide better calibration for NLP misinformation mitigation solutions. We first investigate the calibration of sample-based consistency methods that exploit distinct features of consistency across sample sizes and stochastic levels. Next, we evaluate the performance and distributional shift of a robust numeric verbalization prompt across single-step vs. two-step confidence elicitation procedures. We also compare the performance of the same prompt with different versions of GPT and different numerical scales. Finally, we combine the sample-based consistency and verbalized methods to propose a hybrid framework that yields a better uncertainty estimation for GPT models. Overall, our work proposes novel uncertainty quantification methods that will improve the reliability of Large Language Models in misinformation mitigation applications.

#computer-science #artificial-intelligence #computation-and-language
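As a rough illustration of the two signals named in the summary above, the sketch below computes a sample-based consistency score (the share of sampled answers that agree with the majority) and blends it with a verbalized confidence. The abstract does not say how the hybrid combines them, so the weighted average, the `weight` parameter, and both function names are purely hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def sample_consistency(answers: List[str]) -> Tuple[str, float]:
    """Return the majority answer and the fraction of samples that agree with it."""
    label, count = Counter(answers).most_common(1)[0]
    return label, count / len(answers)


def hybrid_confidence(answers: List[str], verbalized: float,
                      weight: float = 0.5) -> Tuple[str, float]:
    """Blend consistency with a verbalized confidence in [0, 1].

    Assumption: a simple weighted average; the paper's combination rule may differ.
    """
    label, consistency = sample_consistency(answers)
    return label, weight * consistency + (1.0 - weight) * verbalized


# Four sampled verdicts on a claim, plus a model-stated confidence of 0.25.
samples = ["false", "false", "true", "false"]
majority = sample_consistency(samples)
blended = hybrid_confidence(samples, verbalized=0.25)
```

Here three of four samples agree, so the consistency score is 0.75, and the illustrative blend with a low stated confidence lands at 0.5.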

A Survey on Test-Time Scaling in Large Language Models: What, How, Where, and How Well?

04 May 2025

Fuyuan Lyu

孙泽旭

Test Time Scaling LLM

City University of Hong Kong Renmin University of China logo

Renmin University of China

A comprehensive survey introduces a unified, four-axis taxonomy to systematically organize the rapidly growing field of Test-Time Scaling (TTS) in Large Language Models (LLMs). This work provides a structured framework for classifying methods, offers practical guidelines for deployment, and identifies critical open challenges for future research in enhancing LLM performance during inference.

#agents #chain-of-thought #computer-science

ResearchRubrics: A Benchmark of Prompts and Rubrics For Evaluating Deep Research Agents

10 Nov 2025

University of Chicago McGill University logo

McGill University

Researchers from Scale AI and partner universities created RESEARCHRUBRICS, a human-curated benchmark with 101 prompts and over 2,500 expert-authored rubrics for evaluating Deep Research agents. It reveals that current state-of-the-art agents achieve 60-67% rubric compliance and consistently struggle with implicit reasoning and multi-document synthesis.

#agents #computer-science #artificial-intelligence

An Introduction to Vision-Language Modeling

27 May 2024

Yunyang Xiong

Mark Ibrahim

Zhiqiu Lin

University of Toronto Carnegie Mellon University logo

Carnegie Mellon University

Bordes et al. present a structured introduction to Vision-Language Models (VLMs), classifying them into four architectural families and detailing practical aspects of their training, data curation, and evaluation methodologies, including extensions to video understanding and future research directions.

#computer-science #machine-learning #image-generation

RL Fine-Tuning Heals OOD Forgetting in SFT

01 Nov 2025

Google DeepMind Mila - Quebec AI Institute logo

Mila - Quebec AI Institute

Research from Mila and associated universities identifies a phenomenon termed "OOD forgetting" during Supervised Fine-Tuning (SFT), where out-of-distribution reasoning performance declines after an early peak. Reinforcement Learning (RL) fine-tuning subsequently functions as an "OOD restoration mechanism," recovering lost capabilities by conditionally re-aligning singular vectors, while singular values of parameter matrices remain stable.

#computer-science #artificial-intelligence #machine-learning

A Survey on Vision-Language-Action Models for Autonomous Driving

30 Jun 2025

Sicong Jiang

Tsinghua University McGill University logo

McGill University

This comprehensive survey provides the first structured overview of Vision-Language-Action (VLA) models for Autonomous Driving (VLA4AD), consolidating fragmented research by categorizing over 20 models and their evolutionary stages. It identifies key architectural components, relevant datasets, and pressing challenges, aiming to guide future development of interpretable and robust autonomous vehicles.

#autonomous-vehicles #computer-science #artificial-intelligence

VinePPO: Refining Credit Assignment in RL Training of LLMs

03 Jun 2025

Université de Montréal McGill University logo

McGill University

VinePPO refines credit assignment in reinforcement learning for large language models by replacing PPO's learned value network with unbiased Monte Carlo estimations, leveraging the inherent ability to reset to intermediate states in language environments. This approach consistently outperforms standard PPO and other baselines on mathematical reasoning tasks, achieving higher accuracy in less wall-clock time and demonstrating improved generalization.

#computer-science #computation-and-language #machine-learning
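The idea of replacing a learned value network with Monte Carlo estimates, as summarized for VinePPO above, can be sketched on a toy problem. Everything here is illustrative: `mc_value`, `mc_advantage`, and `toy_rollout` are hypothetical names, the state is reduced to a count of correct steps, and the advantage is assumed to be the step reward plus the difference of two Monte Carlo values with no discounting.

```python
import random
from typing import Callable

# Hypothetical rollout interface: complete the response from `state`, return its final reward.
Rollout = Callable[[int, random.Random], float]


def mc_value(state: int, rollout: Rollout, k: int, rng: random.Random) -> float:
    """Estimate V(state) by resetting to `state` and averaging k independent rollouts."""
    return sum(rollout(state, rng) for _ in range(k)) / k


def mc_advantage(state: int, next_state: int, rollout: Rollout, k: int,
                 rng: random.Random, step_reward: float = 0.0) -> float:
    """Advantage of the step state -> next_state from two Monte Carlo value estimates."""
    return step_reward + mc_value(next_state, rollout, k, rng) - mc_value(state, rollout, k, rng)


def toy_rollout(correct_steps: int, rng: random.Random) -> float:
    """Toy environment: success probability grows with the number of correct steps (out of 4)."""
    return 1.0 if rng.random() < correct_steps / 4 else 0.0


rng = random.Random(0)
v_start = mc_value(0, toy_rollout, k=8, rng=rng)   # no correct steps yet
v_end = mc_value(4, toy_rollout, k=8, rng=rng)     # all four steps correct
adv = mc_advantage(0, 4, toy_rollout, k=8, rng=rng)
```

The sketch relies on the property the summary highlights: in a language environment, an intermediate state is just a text prefix, so rollouts can be restarted from it as often as needed.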

PipelineRL: Faster On-policy Reinforcement Learning for Long Sequence Generation

26 Sep 2025

Université de Montréal McGill University logo

McGill University

PipelineRL accelerates on-policy reinforcement learning for large language models, particularly in long sequence generation tasks, achieving approximately 2x faster learning and throughput compared to conventional methods. This approach, validated on 128 H100 GPUs, maintains comparable sample efficiency and on-policyness, enhancing the scalability of LLM training for complex reasoning tasks.

#computer-science #machine-learning #distributed-learning

GFlowNet Foundations

10 Jul 2023

Université de Montréal Stanford University logo

Stanford University

This paper from Mila establishes a comprehensive theoretical foundation for Generative Flow Networks (GFlowNets), a framework for amortized probabilistic inference over structured, combinatorial spaces. It introduces a novel "detailed balance" training objective and demonstrates how GFlowNets can be used to estimate intractable quantities like free energy, entropy, and mutual information, providing an alternative to MCMC for diverse sampling.

#active-learning #computer-science #artificial-intelligence
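For readers unfamiliar with the "detailed balance" objective mentioned in the GFlowNet Foundations summary, the condition is usually written as below. The notation is an assumption of this note (state flow $F$, forward policy $P_F$, backward policy $P_B$) and may differ in detail from the paper's.

```latex
% Detailed balance for every edge s -> s' of the state graph:
F(s)\, P_F(s' \mid s) \;=\; F(s')\, P_B(s \mid s')

% A squared log-ratio loss that is zero exactly when the condition holds:
\mathcal{L}_{\mathrm{DB}}(s, s') \;=\;
  \left( \log \frac{F(s)\, P_F(s' \mid s)}{F(s')\, P_B(s \mid s')} \right)^{2}
```

Both sides of the first equation express the flow through the edge from $s$ to $s'$, once from the parent's point of view and once from the child's.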

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

02 Apr 2024

Google DeepMind McGill University logo

McGill University

Mixture-of-Depths (MoD) is a novel approach for transformer-based language models that dynamically allocates computational resources based on token importance. This method allows MoD models to match or exceed the performance of vanilla transformers while reducing FLOPs by 20-50%.

#attention-mechanisms #computer-science #computation-and-language
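A minimal sketch of the routing idea behind Mixture-of-Depths: only a fixed number of tokens per layer pass through the expensive block, and the rest skip it through the residual path. The top-k selection by router score and the score-weighted update are assumptions drawn from how the method is usually described, not from the summary above. `mod_layer` and its scalar "tokens" are hypothetical simplifications.

```python
from typing import Callable, List


def mod_layer(tokens: List[float], router_scores: List[float],
              block: Callable[[float], float], capacity: int) -> List[float]:
    """Apply `block` only to the `capacity` tokens with the highest router scores."""
    ranked = sorted(range(len(tokens)), key=lambda i: router_scores[i], reverse=True)
    selected = set(ranked[:capacity])
    out = []
    for i, x in enumerate(tokens):
        if i in selected:
            # Routed token: residual plus the block output scaled by its router score.
            out.append(x + router_scores[i] * block(x))
        else:
            # Skipped token: passes through unchanged, costing no block compute.
            out.append(x)
    return out


# Four scalar "tokens", a toy block, and room for only two tokens in the block.
outputs = mod_layer(
    tokens=[1.0, 2.0, 3.0, 4.0],
    router_scores=[0.1, 0.9, 0.2, 0.8],
    block=lambda x: 10.0 * x,
    capacity=2,
)
```

With `capacity=2` the block runs on half the tokens, which is where the FLOP savings come from. Tokens 1 and 3 (zero-indexed) are transformed and the other two are copied through.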

Inoculation Prompting: Eliciting traits from LLMs during training can suppress them at test-time

03 Nov 2025

University College London McGill University logo

McGill University

Tan et al. introduce "inoculation prompting," a training-time technique that prepends a system prompt describing an undesirable trait to finetuning data, thereby suppressing the expression of that trait at test time. This method effectively mitigates emergent misalignment, defends against backdoor attacks, and prevents subliminal learning, achieving near-zero expression of unwanted behaviors without impacting desired model capabilities.

#adversarial-robustness #ai-for-cybersecurity #computer-science

MMTEB: Massive Multilingual Text Embedding Benchmark

13 Nov 2025

Marek Suppa

University of Washington University of Amsterdam logo

University of Amsterdam

A collaborative effort produced MMTEB, the Massive Multilingual Text Embedding Benchmark, which offers over 500 quality-controlled evaluation tasks across more than 250 languages and 10 categories. The benchmark incorporates significant computational optimizations to enable accessible evaluation and reveals that instruction tuning enhances model performance, with smaller, broadly multilingual models often outperforming larger, English-centric models in low-resource contexts.

#computer-science #artificial-intelligence #computation-and-language

AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding

02 Nov 2025

Nicolas Chapados

Ahmed Masry

University of Waterloo Université de Montréal logo

Université de Montréal

Aligning visual features with language embeddings is a key challenge in vision-language models (VLMs). The performance of such models hinges on having a good connector that maps visual features generated by a vision encoder to a shared embedding space with the LLM while preserving semantic similarity. Existing connectors, such as multilayer perceptrons (MLPs), lack inductive bias to constrain visual features within the linguistic structure of the LLM's embedding space, making them data-hungry and prone to cross-modal misalignment. In this work, we propose a novel vision-text alignment method, AlignVLM, that maps visual features to a weighted average of LLM text embeddings. Our approach leverages the linguistic priors encoded by the LLM to ensure that visual features are mapped to regions of the space that the LLM can effectively interpret. AlignVLM is particularly effective for document understanding tasks, where visual and textual modalities are highly correlated. Our extensive experiments show that AlignVLM achieves state-of-the-art performance compared to prior alignment methods, with larger gains on document understanding tasks and under low-resource setups. We provide further analysis demonstrating its efficiency and robustness to noise.

#computer-science #computation-and-language #embedding-methods
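The AlignVLM abstract describes mapping each visual feature to a weighted average of LLM text embeddings. One plausible reading, sketched below, is a softmax over per-vocabulary scores followed by a weighted sum of the embedding rows. The function names, the use of softmax, and the tiny two-word "vocabulary" are assumptions for illustration only.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def align_to_text_space(vocab_logits: List[float],
                        text_embeddings: List[List[float]]) -> List[float]:
    """Map one visual feature (already projected to vocabulary logits) into the
    text embedding space as a convex combination of the embedding rows."""
    weights = softmax(vocab_logits)
    dim = len(text_embeddings[0])
    return [sum(w * row[d] for w, row in zip(weights, text_embeddings))
            for d in range(dim)]


# Toy vocabulary of two "words" with 2-d embeddings; the visual feature scores both equally.
embeddings = [[1.0, 0.0], [0.0, 1.0]]
aligned = align_to_text_space([0.0, 0.0], embeddings)
```

Because the weights are non-negative and sum to one, the output always lies inside the convex hull of the text embeddings. That matches the abstract's point about keeping visual features in regions the LLM can interpret.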

AgentThink: A Unified Framework for Tool-Augmented Chain-of-Thought Reasoning in Vision-Language Models for Autonomous Driving

28 Sep 2025

Tsinghua University McGill University logo

McGill University

AgentThink unifies Chain-of-Thought reasoning with dynamic, agent-style tool invocation for Vision-Language Models in autonomous driving. The framework achieves state-of-the-art performance on the DriveLMM-o1 benchmark, improving overall reasoning scores by 53.91% and final answer accuracy by 33.54% while enhancing interpretability and robustness.

#agents #autonomous-vehicles #chain-of-thought
