GiGPO introduces a two-level advantage estimation structure to improve Large Language Model agent training in complex, multi-turn environments. This reinforcement learning algorithm consistently surpasses prior RL methods and prompting techniques across diverse benchmarks like ALFWorld and WebShop, demonstrating enhanced success rates and tool efficiency with negligible additional computational cost.
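The core idea is easy to sketch. Below is a minimal, hypothetical illustration of combining an episode-level group-relative advantage with a step-level one; the grouping scheme, the mixing weight `w`, and the reward shapes are assumptions made for illustration, not GiGPO's exact formulation.

```python
import numpy as np

def group_normalize(rewards):
    """Group-relative advantage: center and scale rewards within a group."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + 1e-8)

def two_level_advantages(episode_returns, step_rewards_per_episode, w=0.5):
    """Combine an episode-level group advantage with step-level group advantages.

    episode_returns: one scalar return per sampled trajectory.
    step_rewards_per_episode: per-step rewards for each trajectory (a hypothetical
        dense signal for the second level).
    w: assumed mixing weight between the two levels.
    """
    episode_adv = group_normalize(episode_returns)           # level 1: whole trajectories
    advantages = []
    for ep_adv, steps in zip(episode_adv, step_rewards_per_episode):
        step_adv = group_normalize(steps)                    # level 2: steps within a trajectory
        advantages.append(w * ep_adv + (1 - w) * step_adv)   # broadcast episode signal to steps
    return advantages

# Tiny usage example with three sampled trajectories
A = two_level_advantages([1.0, 0.0, 1.0],
                         [[0.2, 0.8], [0.1, 0.0, 0.3], [0.5, 0.5]])
```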
Recent advances in interactive video generation have demonstrated the potential of diffusion models as world models by capturing complex physical dynamics and interactive behaviors. However, existing interactive world models depend on bidirectional attention and lengthy inference steps, severely limiting real-time performance. Consequently, they struggle to simulate real-world dynamics, where outcomes must update instantaneously based on historical context and current actions. To address this, we present Matrix-Game 2.0, an interactive world model that generates long videos on the fly via few-step auto-regressive diffusion. Our framework consists of three key components: (1) a scalable data production pipeline for Unreal Engine and GTA5 environments that efficiently produces massive amounts (about 1,200 hours) of video data with diverse interaction annotations; (2) an action injection module that enables frame-level mouse and keyboard inputs as interactive conditions; (3) few-step distillation based on the causal architecture for real-time, streaming video generation. Matrix-Game 2.0 can generate high-quality minute-level videos across diverse scenes at an ultra-fast speed of 25 FPS. We open-source our model weights and codebase to advance research in interactive world modeling.
Skywork AI developed Skywork-Reward-V2, a suite of state-of-the-art open reward models, by curating SynPref-40M, the largest preference dataset to date, through a two-stage human-AI synergistic pipeline. The models demonstrate improved alignment with human preferences, objective correctness, and resistance to stylistic biases across seven major benchmarks.
Despite recent progress in multimodal agentic systems, existing approaches often treat image manipulation and web search as disjoint capabilities, rely heavily on costly reinforcement learning, and lack planning grounded in real tool-execution traces. To address these limitations, we present Skywork-R1V4, a 30B (A3B) parameter multimodal agentic model that unifies multimodal planning, active image manipulation ("thinking with images"), deep multimodal search, and, most critically, interleaved reasoning that dynamically alternates between visual operations and external knowledge retrieval. Trained solely via supervised fine-tuning on fewer than 30,000 high-quality, planning-execution-consistent trajectories and validated through stepwise consistency filtering, Skywork-R1V4 achieves state-of-the-art results across perception and multimodal search benchmarks: it scores 66.1 on MMSearch and 67.2 on FVQA, surpassing Gemini 2.5 Flash on all 11 metrics. Skywork-R1V4 exhibits emergent long-horizon reasoning at inference time, successfully orchestrating more than 10 tool calls to solve complex, multi-step tasks. Our results demonstrate that sophisticated agentic multimodal intelligence can be achieved through carefully curated supervised learning alone, without any reliance on reinforcement learning.
Skywork AI's SkyReels-V2 introduces an open-source film generative model that achieves state-of-the-art performance in prompt adherence for cinematic grammar and motion quality. The model uniquely enables infinite-length video synthesis through a diffusion-forcing architecture, surpassing existing open-source benchmarks and approaching the quality of leading closed-source models.
Researchers at Nanyang Technological University and collaborators developed FinAgent, an advanced multimodal foundation agent for financial trading that integrates numerical, textual, and visual market data. This agent consistently outperformed 12 state-of-the-art baselines, achieving over a 36% average improvement in Annual Rate of Return by employing diversified information retrieval, dual-level reflection, and tool augmentation.
Audio-Reasoner, a large audio language model from Nanyang Technological University and National University of Singapore researchers, integrates structured Chain-of-Thought reasoning using a novel 1.2 million-sample CoTA dataset to enhance deep audio reasoning. The model establishes new state-of-the-art performance across diverse audio understanding benchmarks, surpassing leading closed-source models such as GPT-4o and Gemini-1.5-Pro.
Skywork AI developed Matrix-Game, a 17-billion-parameter interactive world foundation model that generates game worlds from a reference image and user actions. The model achieves superior action controllability and physical consistency, outperforming previous models on a new comprehensive benchmark for Minecraft environments.
A novel strategy, Visual Tokens Withdrawal (VTW), accelerates Multimodal Large Language Models (MLLMs) inference by removing vision tokens in deep layers, leveraging observed information migration. This approach reduces computational overhead by over 40% and GPU memory by 35% while maintaining performance across various MLLMs and tasks.
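As a rough illustration of the withdrawal idea, the sketch below drops the vision-token slice of the hidden states once a chosen layer is reached, so deeper layers process only text tokens; the layer index, token layout, and stand-in layers are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class VisualTokenWithdrawal(nn.Module):
    """Toy decoder stack that withdraws vision tokens after a chosen layer.

    `layers` is any iterable of blocks taking (batch, seq, dim) tensors;
    `withdraw_at` and the [vision | text] token layout are assumptions.
    """
    def __init__(self, layers, withdraw_at: int):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.withdraw_at = withdraw_at

    def forward(self, hidden, num_vision_tokens: int):
        for i, layer in enumerate(self.layers):
            if i == self.withdraw_at:
                # Withdraw vision tokens: later layers attend only to text,
                # cutting attention/FFN cost in the deep part of the network.
                hidden = hidden[:, num_vision_tokens:, :]
            hidden = layer(hidden)
        return hidden

# Usage with stand-in identity layers
model = VisualTokenWithdrawal([nn.Identity() for _ in range(8)], withdraw_at=4)
out = model(torch.randn(1, 576 + 32, 64), num_vision_tokens=576)
print(out.shape)  # torch.Size([1, 32, 64])
```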
Skywork-R1V3, an open-source 38B parameter vision-language model by Skywork AI, Kunlun Inc., employs an innovative reinforcement learning-powered post-training approach to achieve visual reasoning capabilities competitive with leading proprietary models. It demonstrates scores of 76.0% on MMMU and 142/150 on the 2025 GAOKAO math exam, showcasing the transfer of mathematical reasoning skills to broader scientific tasks.
This paper introduces Completion Pruning Policy Optimization (CPPO) to accelerate the training of reasoning models based on Group Relative Policy Optimization (GRPO). GRPO, while effective, incurs high training costs due to the need to sample multiple completions for each question. Our experiments and theoretical analysis reveal that the number of completions impacts model accuracy yet increases training time multiplicatively, and that not all completions contribute equally to policy training -- their contribution depends on their relative advantage. To address these issues, we propose CPPO, which prunes completions with low absolute advantages, significantly reducing the number needed for gradient calculation and updates. Additionally, we introduce a dynamic completion allocation strategy to maximize GPU utilization by incorporating additional questions, further enhancing training efficiency. Experiments show that CPPO achieves up to a 7.98× speedup on GSM8K and 3.48× on Math while preserving or even enhancing accuracy compared to the original GRPO. We release our code at \href{this https URL}{this https URL}.
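A minimal sketch of the pruning step described above, assuming GRPO-style group-normalized advantages and an illustrative `keep_ratio` hyperparameter (the paper's exact selection rule and dynamic allocation strategy are not reproduced here):

```python
import numpy as np

def grpo_advantages(rewards):
    """GRPO-style group-relative advantage for one question's completions."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + 1e-8)

def prune_completions(completions, rewards, keep_ratio=0.5):
    """Keep only the completions with the largest |advantage|.

    keep_ratio is an assumed hyperparameter; the idea is that gradients are
    then computed on fewer, higher-signal sequences.
    """
    adv = grpo_advantages(rewards)
    k = max(1, int(len(completions) * keep_ratio))
    keep = np.argsort(-np.abs(adv))[:k]          # indices with highest |A|
    return [completions[i] for i in keep], adv[keep]

# Usage: 8 sampled completions with binary correctness rewards
comps = [f"completion_{i}" for i in range(8)]
kept, kept_adv = prune_completions(comps, [1, 0, 0, 1, 0, 1, 0, 0], keep_ratio=0.5)
```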
This research introduces SeTok, a Semantic-Equivalent Vision Tokenizer, and integrates it into SETOKIM, a Multimodal Large Language Model, to generate semantically complete visual tokens that dynamically adapt to image content. The approach yields superior performance across tasks like visual understanding, generation, editing, and referring segmentation by enhancing fine-grained vision-language alignment.
The role of reinforcement learning (RL) in enhancing the reasoning of large language models (LLMs) is becoming increasingly significant. Despite the success of RL in many scenarios, there are still many challenges in improving the reasoning of LLMs. One challenge is the sparse reward, which makes optimization difficult for RL and necessitates a large number of data samples. Another stems from the inherent instability of RL, particularly when using Actor-Critic (AC) methods to derive optimal policies, which often leads to unstable training processes. To address these issues, we introduce Direct Advantage Policy Optimization (DAPO), a novel step-level offline RL algorithm. Unlike standard alignment methods that rely solely on outcome rewards to optimize policies (such as DPO), DAPO employs a critic function to predict the reasoning accuracy at each step, thereby generating dense signals to refine the generation strategy. Additionally, the Actor and Critic components in DAPO are trained independently, avoiding the co-training instability observed in standard AC algorithms like PPO. We train DAPO on mathematical and code query datasets and then evaluate its performance on multiple benchmarks. Our results show that DAPO can effectively enhance the mathematical and code capabilities of both SFT models and RL models, demonstrating the effectiveness of DAPO.
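A toy sketch of the step-level dense signal described above: a critic scores each reasoning-step prefix, per-step advantages are taken as differences of those scores, and the actor is updated offline with an advantage-weighted log-likelihood. The loss form, tensor shapes, and helper names are assumptions made for illustration, not DAPO's exact objective.

```python
import torch
import torch.nn as nn

def step_values(critic: nn.Module, prefix_states: torch.Tensor) -> torch.Tensor:
    """Critic predicts, for each reasoning-step prefix, the probability of
    eventually reaching a correct final answer (a dense, step-level signal)."""
    with torch.no_grad():
        return torch.sigmoid(critic(prefix_states)).squeeze(-1)  # shape: [num_prefixes]

def step_advantages(values: torch.Tensor) -> torch.Tensor:
    """Advantage of step t = change in predicted accuracy after taking step t."""
    return values[1:] - values[:-1]

def actor_loss(step_logprobs: torch.Tensor, advantages: torch.Tensor) -> torch.Tensor:
    """Offline, advantage-weighted log-likelihood surrogate for the actor update;
    the actor and critic are optimized separately (no co-training)."""
    return -(advantages * step_logprobs).mean()

# Toy usage: a frozen linear critic scores 5 prefixes; the actor supplies 4 step log-probs.
critic = nn.Linear(16, 1)
prefixes = torch.randn(5, 16)
adv = step_advantages(step_values(critic, prefixes))   # shape: [4]
logprobs = torch.randn(4, requires_grad=True)
loss = actor_loss(logprobs, adv)
loss.backward()
```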
Researchers from Peking University, Mila - Québec AI Institute, Tiamat AI, and Skywork AI introduced "world-instructed image editing," a task where generative models understand and execute dynamic commands involving causality and temporal changes. They developed EDITWORLD, a dataset and diffusion-based model capable of simulating complex scenarios within images, achieving superior performance over existing baselines in producing logically consistent and visually plausible edits.
Researchers from Skywork AI and collaborating institutions developed Matrix-3D, a framework that generates geometrically consistent, omnidirectional, and explorable 3D worlds from a single image or text prompt. This approach leverages trajectory-guided panoramic video diffusion and dual 3D reconstruction pipelines, achieving superior visual quality and introducing the Matrix-Pano dataset with detailed camera poses and depth annotations.
A comprehensive survey analyzes the rapid advancements of Large Language Models in mathematical reasoning, structuring the field by cognitive phases of comprehension and answer generation. It consolidates diverse enhancement methods, identifies persistent challenges, and suggests future research, observing that current improvements largely activate latent reasoning capacities rather than instilling entirely new ones.
Research from Skywork AI empirically validates data scaling laws for software engineering tasks, demonstrating that performance of Large Language Models continues to improve with increasing volumes of high-quality training data. The work introduces Skywork-SWE, a dataset of 10,169 real-world Python task instances, and fine-tunes Skywork-SWE-32B, which achieves 38.0% pass@1 on SWE-bench Verified, improving to 47.0% with test-time scaling.
Skywork UniPic introduces a unified autoregressive model, leveraging a 1.5 billion parameter backbone, to seamlessly integrate image understanding, text-to-image generation, and image editing. The model achieves highly competitive performance across diverse multimodal benchmarks while demonstrating exceptional parameter efficiency, enabling deployment on commodity hardware with under 15 GB of GPU memory.
We introduce Skywork R1V, a multimodal reasoning model extending the R1-series large language models (LLMs) to visual modalities via an efficient multimodal transfer method. Leveraging a lightweight visual projector, Skywork R1V facilitates seamless multimodal adaptation without necessitating retraining of either the foundational language model or the vision encoder. To strengthen visual-text alignment, we propose a hybrid optimization strategy that combines Iterative Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO), significantly enhancing cross-modal integration efficiency. Additionally, we introduce an adaptive-length Chain-of-Thought distillation approach for reasoning data generation. This approach dynamically optimizes reasoning chain lengths, thereby enhancing inference efficiency and preventing excessive overthinking. Empirical evaluations demonstrate that Skywork R1V, with only 38B parameters, delivers competitive performance, achieving a score of 69.0 on the MMMU benchmark and 67.5 on MathVista. Meanwhile, it maintains robust textual reasoning performance, evidenced by impressive scores of 72.0 on AIME and 94.0 on MATH500. The Skywork R1V model weights have been publicly released to promote openness and reproducibility.
Q* introduces a deliberative planning framework that enhances large language models' multi-step reasoning by integrating a learned Q-value model as a plug-and-play heuristic. This approach guides LLM generation through an A*-like search, significantly improving accuracy on complex math and code generation tasks without requiring fine-tuning of the base LLM.
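A hedged sketch of the A*-like search loop this describes: frontier states are ranked by accumulated utility plus the learned Q-value heuristic, and the base LLM is only reached through a placeholder `expand` function. All function names and scoring details below are illustrative assumptions, not the paper's exact procedure.

```python
import heapq

def q_star_search(initial_state, expand, g_cost, q_value, is_terminal, max_expansions=100):
    """A*-like best-first search over partial LLM reasoning states.

    expand(state)      -> candidate next states (e.g. next reasoning steps sampled
                          from the frozen LLM); placeholder in this sketch.
    g_cost(state)      -> accumulated utility of the path so far.
    q_value(state)     -> learned heuristic estimating future value (the Q model).
    is_terminal(state) -> True once a complete answer has been produced.
    """
    # Max-heap via negated scores: f(s) = g(s) + q(s); `tie` keeps states incomparable.
    frontier = [(-(g_cost(initial_state) + q_value(initial_state)), 0, initial_state)]
    tie = 1
    for _ in range(max_expansions):
        if not frontier:
            break
        _, _, state = heapq.heappop(frontier)
        if is_terminal(state):
            return state
        for nxt in expand(state):
            f = g_cost(nxt) + q_value(nxt)
            heapq.heappush(frontier, (-f, tie, nxt))
            tie += 1
    return None
```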