DriveDPO introduces a two-stage policy learning framework for end-to-end autonomous driving, integrating unified policy distillation with a novel Safety Direct Preference Optimization (DPO). This approach yields a new state-of-the-art PDMS of 90.0 on the NAVSIM benchmark, outperforming DiffusionDrive by 1.9 points and WOTE by 2.0 points.
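DriveDPO's safety-aware preference stage builds on the standard Direct Preference Optimization objective. As background, here is a minimal sketch of the vanilla DPO loss on a single preference pair; the function name and scalar log-probability inputs are illustrative assumptions, not DriveDPO's exact safety-weighted formulation.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Vanilla DPO loss for one preference pair.

    Inputs are log-probabilities of the preferred (chosen) and
    dispreferred (rejected) outputs under the trained policy (pi_*)
    and a frozen reference policy (ref_*). beta scales how strongly
    the policy is pushed away from the reference.
    """
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # loss = -log(sigmoid(logits)), computed stably as log1p(exp(-logits))
    return math.log1p(math.exp(-logits))
```

With all log-probabilities equal the loss is log 2, and it shrinks as the policy assigns a growing margin to the preferred output over the reference.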
This survey by MiroMind and affiliated universities synthesizes open-source replication studies of DeepSeek-R1, detailing methodologies for reasoning language models (RLMs) through supervised fine-tuning (SFT) and reinforcement learning from verifiable rewards (RLVR). It highlights the critical role of high-quality data and adapted RL algorithms in achieving strong reasoning capabilities, and identifies emerging research directions and open challenges.
A novel prompt design paradigm demonstrates that pruning in-context learning examples into seemingly incoherent "gibberish" can consistently improve large language model performance across various tasks, challenging conventional prompt engineering wisdom. The PROMPTQUINE evolutionary search framework effectively discovers these unconventional prompts, providing insights into LLM behavior and highlighting vulnerabilities in current AI alignment techniques.
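The core idea of evolutionary prompt pruning can be sketched in a few lines: repeatedly drop a random token from the prompt and keep the shorter variant whenever a task score does not degrade. This is a hedged toy illustration, not PROMPTQUINE's actual search; the `score` callable stands in for an LLM-based evaluation and is a pure assumption.

```python
import random

def prune_prompt(tokens, score, iters=100, seed=0):
    """Greedy evolutionary-style pruning of a tokenized prompt.

    `score(tokens)` is a hypothetical fitness function (in practice,
    task accuracy of an LLM given the prompt). Each iteration removes
    one random token and keeps the mutation if fitness does not drop.
    """
    rng = random.Random(seed)
    best = list(tokens)
    best_score = score(best)
    for _ in range(iters):
        if len(best) <= 1:
            break
        cand = list(best)
        cand.pop(rng.randrange(len(cand)))  # mutate: delete one token
        s = score(cand)
        if s >= best_score:                  # select: keep if no worse
            best, best_score = cand, s
    return best
```

Under a fitness function that rewards one key token and penalizes length, this search strips the prompt down to that token, mirroring the paper's observation that heavily pruned, "incoherent" prompts can score at least as well as the originals.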
A neural psychoacoustic coding framework called MUFFIN enables high-fidelity audio compression through multi-band spectral quantization and modified snake activation functions, achieving state-of-the-art reconstruction quality at 12.5 Hz compression rates while preserving critical acoustic details across speech, music, and environmental sound domains.
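For context on the activation MUFFIN modifies: the standard snake function adds a periodic term to the identity, giving audio networks an inductive bias toward oscillatory signals. The sketch below shows the original snake form, x + (1/α)·sin²(αx); MUFFIN's specific modification is not reproduced here.

```python
import math

def snake(x, alpha=1.0):
    """Standard snake activation: x + (1/alpha) * sin^2(alpha * x).

    The sin^2 term injects a learnable periodicity (via alpha) that
    helps waveform models capture oscillatory audio structure.
    """
    return x + (1.0 / alpha) * math.sin(alpha * x) ** 2
```

Note that snake reduces to the identity wherever sin(αx) = 0, so it never suppresses the signal the way saturating activations can.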
This research introduces experiment-guided hypothesis ranking for scientific discovery, developing a high-fidelity simulator (CSX-Sim) and an in-context reinforcement learning (ICRL) framework with the CSX-Rank agent. This system significantly reduces the number of experimental trials needed to identify optimal hypotheses, requiring an average of 15.196 trials versus 28.608 for pre-experiment ranking on the TOMATO-chem dataset, thus enabling more efficient and cost-effective scientific exploration.