MiroMind
DriveDPO: Policy Learning via Safety DPO For End-to-End Autonomous Driving

DriveDPO introduces a two-stage policy learning framework for end-to-end autonomous driving, integrating unified policy distillation with a novel Safety Direct Preference Optimization (DPO). This approach yields a new state-of-the-art PDMS of 90.0 on the NAVSIM benchmark, outperforming DiffusionDrive by 1.9 points and WOTE by 2.0 points.
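DPO-style objectives of this kind score a preferred ("chosen") trajectory against a rejected one under the current policy and a frozen reference policy. Below is a minimal, generic DPO loss sketch in PyTorch; the safety-based scoring DriveDPO uses to pick chosen and rejected trajectories is not shown, and all names are illustrative rather than the paper's implementation.

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Generic DPO loss over log-probabilities of chosen vs. rejected trajectories.

    Each argument is a tensor of summed log-probabilities (one value per pair).
    `beta` scales the implicit reward margin; 0.1 is a common default, not DriveDPO's setting.
    """
    # Implicit rewards are log-probability ratios against the frozen reference policy.
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    # Maximize the margin between chosen and rejected via a logistic loss.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```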

100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models

This survey by MiroMind and affiliated universities synthesizes open-source replication studies of DeepSeek-R1, detailing methodologies for reasoning language models (RLMs) through supervised fine-tuning (SFT) and reinforcement learning from verifiable rewards (RLVR). It highlights the critical role of high-quality data and adapted RL algorithms in achieving strong reasoning capabilities, and identifies emerging research directions and challenges.

Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective

A novel prompt design paradigm demonstrates that pruning in-context learning examples into seemingly incoherent "gibberish" can consistently improve large language model performance across various tasks, challenging conventional prompt engineering wisdom. The PROMPTQUINE evolutionary search framework effectively discovers these unconventional prompts, providing insights into LLM behavior and highlighting vulnerabilities in current AI alignment techniques.
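PROMPTQUINE is described as an evolutionary search that prunes in-context examples; the sketch below shows one plausible hill-climbing variant of such token-pruning search, with a hypothetical `score_prompt` evaluator standing in for task accuracy. It is illustrative only, not the paper's algorithm.

```python
import random

def evolve_prompt(tokens, score_prompt, generations=50, population=8):
    """Evolutionary pruning of an in-context prompt (illustrative sketch).

    `tokens` is the prompt split into tokens; `score_prompt` maps a token list
    to a task score (e.g., validation accuracy) and is an assumed callable.
    """
    best = tokens
    best_score = score_prompt(best)
    for _ in range(generations):
        # Mutate: each candidate drops a small random subset of the surviving tokens.
        candidates = []
        for _ in range(population):
            keep = [t for t in best if random.random() > 0.1]
            candidates.append(keep if keep else best)
        # Select the fittest candidate; keep it only if it does not hurt the score.
        scored = [(score_prompt(c), c) for c in candidates]
        top_score, top = max(scored, key=lambda x: x[0])
        if top_score >= best_score:
            best, best_score = top, top_score
    return best, best_score
```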

Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding

A neural psychoacoustic coding framework called MUFFIN enables high-fidelity audio compression through multi-band spectral quantization and modified snake activation functions, achieving state-of-the-art reconstruction quality at a 12.5 Hz compression rate while preserving critical acoustic details across speech, music, and environmental sound domains.
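The snake activation referenced here is commonly defined as x + (1/α)·sin²(αx); a minimal implementation of that standard form is sketched below. The "modified" variant MUFFIN uses is not specified in this summary, so the code shows only the common baseline.

```python
import torch
import torch.nn as nn

class Snake(nn.Module):
    """Standard snake activation: x + (1/alpha) * sin^2(alpha * x).

    `alpha` is a learnable per-channel frequency; MUFFIN's modification on top
    of this baseline is not reproduced here.
    """
    def __init__(self, channels, alpha_init=1.0):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((1, channels, 1), alpha_init))

    def forward(self, x):
        # x is expected to have shape (batch, channels, time).
        return x + (1.0 / (self.alpha + 1e-9)) * torch.sin(self.alpha * x) ** 2
```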

MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback

This research introduces experiment-guided hypothesis ranking for scientific discovery, developing a high-fidelity simulator (CSX-Sim) and an in-context reinforcement learning (ICRL) framework with the CSX-Rank agent. This system significantly reduces the number of experimental trials needed to identify optimal hypotheses, requiring an average of 15.196 trials versus 28.608 for pre-experiment ranking on the TOMATO-chem dataset, thus enabling more efficient and cost-effective scientific exploration.
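Experiment-guided ranking of this kind alternates between proposing a hypothesis to test and re-ranking the remaining candidates in light of simulated feedback. The loop below is a simplified greedy sketch with hypothetical `simulator` and `rank_hypotheses` callables; it is not the CSX-Rank agent itself.

```python
def experiment_guided_search(hypotheses, rank_hypotheses, simulator, max_trials=30, target=1.0):
    """Greedy experiment-guided hypothesis search (illustrative sketch).

    `rank_hypotheses(history)` returns hypotheses ordered by predicted promise given
    past (hypothesis, feedback) pairs; `simulator(h)` returns a scalar feedback score.
    Both are assumed interfaces, not the paper's API.
    """
    history = []
    tried = set()
    for trial in range(1, max_trials + 1):
        # Re-rank untested hypotheses using the accumulated experimental feedback.
        ranked = [h for h in rank_hypotheses(history) if h not in tried]
        if not ranked:
            break
        candidate = ranked[0]
        feedback = simulator(candidate)
        history.append((candidate, feedback))
        tried.add(candidate)
        if feedback >= target:  # stop once an optimal hypothesis is found
            return candidate, trial
    if not history:
        return None, 0
    return max(history, key=lambda x: x[1])[0], len(history)
```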

On the Role of Difficult Prompts in Self-Play Preference Optimization
Self-play preference optimization has emerged as a prominent paradigm for aligning large language models (LLMs). It typically involves a language model generating on-policy responses to prompts and a reward model (RM) guiding the selection of chosen and rejected responses, on which the model can be further trained with direct preference optimization (DPO). However, the role of prompts remains underexplored, despite being a core component of this pipeline. In this work, we investigate how prompts of varying difficulty influence self-play preference optimization. We first use the mean reward of N sampled responses to a prompt as a proxy for its difficulty. We find that difficult prompts yield substantially worse self-play optimization performance than easy prompts. Moreover, incorporating difficult prompts into training fails to enhance overall performance and, in fact, leads to slight degradation compared to training on easy prompts alone. We also observe that the performance gap between difficult and easy prompts narrows as model capacity increases, suggesting that prompt difficulty interacts with model capacity. Building on these findings, we explore strategies to mitigate the negative effect of difficult prompts on final performance. We demonstrate that selectively removing an appropriate portion of challenging prompts enhances overall self-play performance, and we also report failed attempts and lessons learned.
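As a concrete reading of the difficulty proxy: sample N on-policy responses per prompt, score them with the reward model, and treat a low mean reward as high difficulty; the hardest prompts can then be removed before DPO training. The helper names and the keep ratio below are hypothetical, not the paper's code.

```python
def prompt_difficulty(prompt, generate, reward_model, n=8):
    """Mean reward of n sampled responses, used as an (inverse) difficulty proxy.

    `generate(prompt, n)` returns n on-policy responses and `reward_model(prompt, r)`
    returns a scalar reward; both are assumed interfaces.
    """
    responses = generate(prompt, n)
    rewards = [reward_model(prompt, r) for r in responses]
    return sum(rewards) / len(rewards)

def filter_hard_prompts(prompts, generate, reward_model, keep_ratio=0.8, n=8):
    """Drop the hardest (lowest mean-reward) prompts before self-play DPO training."""
    ranked = sorted(prompts,
                    key=lambda p: prompt_difficulty(p, generate, reward_model, n),
                    reverse=True)  # highest mean reward (easiest) first
    return ranked[: int(len(ranked) * keep_ratio)]
```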
NTU Speechlab LLM-Based Multilingual ASR System for Interspeech MLC-SLM Challenge 2025
This report details the NTU Speechlab system developed for the Interspeech 2025 Multilingual Conversational Speech and Language Model (MLC-SLM) Challenge (Task I), where we achieved 5th place. We present comprehensive analyses of our multilingual automatic speech recognition system, highlighting key advancements in model architecture, data selection, and training strategies. In particular, language-specific prompts and model averaging techniques were instrumental in boosting system performance across diverse languages. Compared to the initial baseline system, our final model reduced the average Mix Error Rate from 20.2% to 10.6%, representing an absolute improvement of 9.6% (a relative improvement of 48%) on the evaluation set. Our results demonstrate the effectiveness of our approach and offer practical insights for future Speech Large Language Models.
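The model averaging mentioned here typically means averaging the parameters of several training checkpoints before evaluation; the PyTorch snippet below shows that standard recipe with hypothetical checkpoint paths, not the exact Speechlab pipeline.

```python
import torch

def average_checkpoints(paths):
    """Average model parameters across saved checkpoints (standard recipe).

    Assumes each file stores a plain state dict of floating-point tensors.
    """
    avg_state = None
    for path in paths:
        state = torch.load(path, map_location="cpu")
        if avg_state is None:
            avg_state = {k: v.clone().float() for k, v in state.items()}
        else:
            for k, v in state.items():
                avg_state[k] += v.float()
    return {k: v / len(paths) for k, v in avg_state.items()}

# Hypothetical usage: average the last three epoch checkpoints before evaluation.
# model.load_state_dict(average_checkpoints(["ep8.pt", "ep9.pt", "ep10.pt"]))
```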
IFEval-Audio: Benchmarking Instruction-Following Capability in Audio-based Large Language Models
Large language models (LLMs) have demonstrated strong instruction-following capabilities in text-based tasks. However, this ability often deteriorates in multimodal models after alignment with non-text modalities such as images or audio. While several recent efforts have investigated instruction-following performance in text and vision-language models, instruction-following in audio-based large language models remains largely unexplored. To bridge this gap, we introduce IFEval-Audio, a novel evaluation dataset designed to assess instruction-following ability in audio LLMs. IFEval-Audio contains 280 audio-instruction-answer triples across six diverse dimensions: Content, Capitalization, Symbol, List Structure, Length, and Format. Each example pairs an audio input with a text instruction, requiring the model to generate an output that follows a specified structure. We benchmark state-of-the-art audio LLMs on their ability to follow instructions involving audio input. The dataset is released publicly to support future research in this emerging area.
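Benchmarks of this kind typically score model outputs with simple rule-based validators per dimension. Below is a hedged sketch of what checks for the Capitalization, Length, and List Structure dimensions could look like; the rules are illustrative guesses, not the released IFEval-Audio scoring code.

```python
import re

def check_capitalization(output: str) -> bool:
    """Example rule: every word fully uppercase (one plausible capitalization constraint)."""
    words = output.split()
    return bool(words) and all(w.upper() == w for w in words)

def check_length(output: str, max_words: int = 50) -> bool:
    """Example rule: response stays within a word budget."""
    return len(output.split()) <= max_words

def check_list_structure(output: str, n_items: int = 3) -> bool:
    """Example rule: response is a numbered list with the requested number of items."""
    items = re.findall(r"^\s*\d+[\.\)]", output, flags=re.MULTILINE)
    return len(items) == n_items
```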
100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models
15 May 2025
The recent development of reasoning language models (RLMs) represents a novel evolution of large language models. In particular, the recent release of DeepSeek-R1 has generated widespread social impact and sparked enthusiasm in the research community for exploring the explicit reasoning paradigm of language models. However, the implementation details of the released models have not been fully open-sourced by DeepSeek, including DeepSeek-R1-Zero, DeepSeek-R1, and the distilled small models. As a result, many replication studies have emerged that aim to reproduce the strong performance of DeepSeek-R1, reaching comparable results through similar training procedures and fully open-source data resources. These works have investigated feasible strategies for supervised fine-tuning (SFT) and reinforcement learning from verifiable rewards (RLVR), focusing on data preparation and method design, and have yielded a variety of valuable insights. In this report, we summarize recent replication studies, focusing primarily on SFT and RLVR as the two main directions and introducing the details of data construction, method design, and training procedures. Moreover, we distill key findings from the implementation details and experimental results reported by these studies. We also discuss additional techniques for enhancing RLMs, highlighting the potential to expand the application scope of these models and the challenges that remain in their development. With this survey, we aim to help researchers and developers of RLMs stay updated with the latest advancements and to inspire new ideas that further enhance RLMs.
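RLVR, as surveyed here, replaces a learned reward model with a programmatic check of the final answer. The snippet below sketches a typical math-style verifiable reward (exact match of an extracted final answer), which is one common instantiation rather than any specific replication's code; the "Answer:" output format is an assumption.

```python
import re

def verifiable_reward(response: str, ground_truth: str) -> float:
    """Rule-based reward for RLVR-style training on math problems (illustrative).

    Extracts the final answer after an 'Answer:' marker (an assumed output format)
    and compares it to the ground truth; returns 1.0 for a match, 0.0 otherwise.
    """
    match = re.search(r"Answer:\s*(.+)", response)
    if match is None:
        return 0.0
    prediction = match.group(1).strip()
    return 1.0 if prediction == ground_truth.strip() else 0.0
```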