Shanda AI Research Institute
LongPO is a novel alignment method that enables Large Language Models to self-evolve their long-context capabilities by leveraging internal knowledge and self-generated short-to-long preference data. This approach circumvents the need for expensive human annotation of long-context data while simultaneously preserving short-context performance. LongPO-trained Mistral-7B-128K achieved an ∞Bench score of 39.27, surpassing GLM-4-9B and showing comparable or superior performance to GPT-4-128K, while fully retaining the base model's short-context capabilities.
Researchers from SUTD, NUS, The University of Tokyo, MiroMind, and A*STAR developed a scalable strategy for constructing preference data for Direct Preference Optimization (DPO), overcoming the limitations of conventional methods that fail to improve performance with increased sampling. Their approach, which selects a rejected response from a statistically meaningful point (e.g., lowest of 5 samples) and a chosen response from the highest among all available samples, consistently enhances LLM alignment, achieving up to a 3 percentage point gain in AlpacaEval 2 LC win rate for Llama-3-8B-Instruct.
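As a minimal sketch of this selection recipe (the helper name is hypothetical, and the per-response scores are assumed to come from an external reward model or judge), the pair construction might look like:

```python
import random

def build_preference_pair(responses, scores, subset_size=5, seed=0):
    """Sketch of the reported recipe: chosen = best of ALL samples;
    rejected = worst within a fixed-size random subset (e.g., 5 samples),
    a statistically stable "low" point that does not drift toward ever
    more extreme outliers as the total sample count grows."""
    rng = random.Random(seed)
    # Chosen: the highest-scoring response among every available sample.
    chosen_idx = max(range(len(responses)), key=lambda i: scores[i])
    # Rejected: the lowest-scoring response within a subset of 5 samples.
    subset = rng.sample(range(len(responses)), k=min(subset_size, len(responses)))
    rejected_idx = min(subset, key=lambda i: scores[i])
    return responses[chosen_idx], responses[rejected_idx]
```

Under this scheme, adding more samples can only improve the chosen response while the rejected response stays anchored at a fixed quantile-like point, which is why the margin keeps growing with the sampling budget.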
Large Language Models (LLMs) have demonstrated remarkable capabilities through pretraining and alignment. However, superior short-context LLMs may underperform in long-context scenarios due to insufficient long-context alignment. This alignment process remains challenging due to the impracticality of human annotation for extended contexts and the difficulty of balancing short- and long-context performance. To address these challenges, we introduce LongPO, which enables short-context LLMs to self-evolve to excel on long-context tasks by internally transferring short-context capabilities. LongPO harnesses LLMs to learn from self-generated short-to-long preference data, comprising paired responses generated for identical instructions with long-context inputs and their compressed short-context counterparts, respectively. This preference reveals capabilities and potential of LLMs cultivated during short-context alignment that may be diminished in under-aligned long-context scenarios. Additionally, LongPO incorporates a short-to-long KL constraint to mitigate short-context performance decline during long-context alignment. When applied to Mistral-7B-Instruct-v0.2 at context lengths from 128K to 512K, LongPO fully retains short-context performance and largely outperforms naive SFT and DPO on both long- and short-context tasks. Specifically, LongPO-trained models achieve results on long-context benchmarks comparable to, or even surpassing, those of superior LLMs (e.g., GPT-4-128K) that involve extensive long-context annotation and larger parameter scales. Our code is available at this https URL.
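A minimal sketch of what a LongPO-style objective could look like, assuming a standard DPO term over the short-to-long pairs plus a token-level KL penalty on short-context inputs (function name, arguments, and the weighting `lam` are illustrative assumptions, not the authors' implementation):

```python
import torch.nn.functional as F

def longpo_loss(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps,
                policy_short_logits, ref_short_logits,
                beta=0.1, lam=0.1):
    """Assumed form of a LongPO-style objective.
    DPO term: prefer the response generated from the full long context
    (chosen) over the one generated from its compressed short-context
    counterpart (rejected), for the same instruction.
    KL term: keep the policy's short-context distribution close to the
    short-context-aligned reference model, mitigating short-context
    regression during long-context training."""
    # Standard DPO loss on the log-ratio margins of policy vs. reference.
    pi_ratio = policy_chosen_logps - policy_rejected_logps
    ref_ratio = ref_chosen_logps - ref_rejected_logps
    dpo = -F.logsigmoid(beta * (pi_ratio - ref_ratio)).mean()

    # Short-to-long KL constraint, computed on short-context inputs
    # (F.kl_div with these arguments gives KL(reference || policy)).
    kl = F.kl_div(F.log_softmax(policy_short_logits, dim=-1),
                  F.log_softmax(ref_short_logits, dim=-1),
                  log_target=True, reduction="batchmean")
    return dpo + lam * kl
```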
Video Large Language Models (Video LLMs) have recently exhibited remarkable capabilities in general video understanding. However, they mainly focus on holistic comprehension and struggle to capture fine-grained spatial and temporal details. Moreover, the lack of high-quality object-level video instruction data and of a comprehensive benchmark further hinders their advancement. To tackle these challenges, we introduce the VideoRefer Suite to empower Video LLMs with finer-level spatial-temporal video understanding, i.e., enabling perception and reasoning about any object throughout the video. Specifically, we develop the VideoRefer Suite across three essential aspects: dataset, model, and benchmark. First, we introduce a multi-agent data engine to meticulously curate a large-scale, high-quality object-level video instruction dataset, termed VideoRefer-700K. Next, we present the VideoRefer model, which is equipped with a versatile spatial-temporal object encoder to capture precise regional and sequential representations. Finally, we create VideoRefer-Bench to comprehensively assess the spatial-temporal understanding capabilities of Video LLMs across various aspects. Extensive experiments and analyses demonstrate that our VideoRefer model not only achieves promising performance on video referring benchmarks but also strengthens general video understanding capabilities.
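As an illustrative sketch only (layer names, shapes, and the fusion scheme are assumptions, not the released VideoRefer architecture), a spatial-temporal object encoder could mask-pool per-frame region features and fuse them temporally before projecting into the LLM's token space:

```python
import torch
import torch.nn as nn

class SpatialTemporalObjectEncoder(nn.Module):
    """Hypothetical sketch: per frame, an object mask pools region features
    from the frame's patch grid; a temporal layer then fuses the per-frame
    region tokens into one object token for the LLM input sequence."""
    def __init__(self, dim, llm_dim):
        super().__init__()
        # dim must be divisible by nhead; values here are placeholders.
        self.temporal = nn.TransformerEncoderLayer(
            d_model=dim, nhead=8, batch_first=True)
        self.proj = nn.Linear(dim, llm_dim)  # map into the LLM token space

    def forward(self, feats, masks):
        # feats: (T, N, D) patch features per frame; masks: (T, N) in [0, 1].
        w = masks / masks.sum(dim=-1, keepdim=True).clamp(min=1e-6)
        region = torch.einsum("tn,tnd->td", w, feats)  # (T, D) mask pooling
        fused = self.temporal(region.unsqueeze(0))     # (1, T, D) temporal fusion
        return self.proj(fused.mean(dim=1))            # (1, llm_dim) object token
```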