Researchers from Nanjing University, China, developed Evidence Pattern Retrieval (EPR) to improve Knowledge Graph Question Answering by explicitly modeling structural dependencies among evidence facts during subgraph extraction. The method achieved a new state of the art for information retrieval (IR) approaches on the ComplexWebQuestions benchmark, reaching 60.6% Hits@1 and a 61.2% F1 score.
Researchers from MMLab, Tsinghua University, Kuaishou Technology, and Shanghai AI Lab developed Flow-GRPO, a framework that integrates online policy-gradient reinforcement learning into flow matching models. The method significantly improves compositional image generation, visual text rendering, and human preference alignment, reaching up to 95% GenEval accuracy on SD3.5-M; it addresses the determinism and sampling-efficiency challenges of flow matching through an ODE-to-SDE conversion and a denoising-step reduction strategy.
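To make the ODE-to-SDE motivation concrete, here is a minimal, hedged sketch (not the paper's exact SDE): a deterministic flow-matching step has no per-step likelihood, whereas a stochastic step with Gaussian transitions does, which is what an online policy-gradient method like GRPO needs. The drift term and the noise schedule `sigma` below are illustrative assumptions.

```python
import torch

def ode_step(x, t, dt, velocity_fn):
    # Deterministic Euler step of the probability-flow ODE: no randomness,
    # so there is no per-step log-probability for a policy gradient to use.
    return x + velocity_fn(x, t) * dt

def sde_step(x, t, dt, velocity_fn, sigma):
    # Stochastic Euler-Maruyama step: the Gaussian transition yields a tractable
    # log-probability, which online policy-gradient methods such as GRPO require.
    mean = x + velocity_fn(x, t) * dt   # illustrative drift; the real conversion adds a correction term
    std = sigma(t) * (dt ** 0.5)
    x_next = mean + std * torch.randn_like(x)
    log_prob = torch.distributions.Normal(mean, std).log_prob(x_next).sum()
    return x_next, log_prob
```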
Researchers from multiple institutions provide a comprehensive analysis of Vertical Federated Learning (VFL), establishing a general framework, identifying its distinct challenges, and evaluating solutions. The work empirically quantifies the trade-offs between privacy, communication efficiency, computational load distribution, and model performance in VFL systems.
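As a point of reference for the setting the survey formalizes, below is a hedged toy sketch of the standard vertical split: two parties hold disjoint feature columns for the same samples, each computes a local embedding, and only the label-holding party computes the loss. The model shapes and aggregation scheme are illustrative assumptions, not a specific system from the paper.

```python
import torch
import torch.nn as nn

class PartyModel(nn.Module):
    # Local "bottom" model: each party keeps its raw features private and
    # only shares the embedding it produces.
    def __init__(self, in_dim, out_dim=8):
        super().__init__()
        self.net = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        return self.net(x)

party_a, party_b = PartyModel(5), PartyModel(3)   # disjoint feature columns
top_model = nn.Linear(16, 2)                      # held by the label-owning party

xa, xb = torch.randn(4, 5), torch.randn(4, 3)     # same 4 samples, different features
y = torch.randint(0, 2, (4,))                     # labels live only at the top party

# Each party transmits only its embedding; the label owner fuses and classifies.
logits = top_model(torch.cat([party_a(xa), party_b(xb)], dim=1))
loss = nn.functional.cross_entropy(logits, y)
loss.backward()                                   # gradients flow back to each local model
```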
This paper identifies and characterizes a universal policy entropy collapse in reinforcement learning for large language models (LLMs), revealing an empirical law that links performance to entropy. It further provides a mechanistic understanding of this phenomenon through covariance analysis and proposes two covariance-aware regularization methods, Clip-Cov and KL-Cov, which successfully maintain higher entropy and improve LLM reasoning performance on math and coding tasks.
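Since Clip-Cov and KL-Cov are only named here, the following is a hedged sketch of the Clip-Cov idea as described: tokens whose contribution to the covariance between log-probability and advantage is largest are excluded from the policy-gradient update, which slows entropy collapse. The clipped fraction and the exact per-token statistic are assumptions.

```python
import torch

def clip_cov_mask(logprobs, advantages, frac=0.002):
    # Per-token contribution to Cov(log pi, A) within the batch
    # (both inputs are flat per-token tensors).
    cov_contrib = (logprobs - logprobs.mean()) * (advantages - advantages.mean())
    # Exclude the top `frac` fraction of tokens by covariance contribution;
    # the fraction is an illustrative choice, not the paper's value.
    k = max(1, int(frac * cov_contrib.numel()))
    thresh = torch.topk(cov_contrib, k).values.min()
    keep = (cov_contrib < thresh).float()
    return keep

# Usage (schematic): per-token policy-gradient loss, masked so that
# high-covariance tokens contribute no gradient.
# loss = -(keep * logprobs * advantages).sum() / keep.sum()
```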
LUFFY introduces a framework that enhances Large Reasoning Models (LRMs) by integrating off-policy guidance into Reinforcement Learning with Verifiable Rewards (RLVR). This approach enables LRMs to acquire new reasoning capabilities from stronger external policies, achieving state-of-the-art performance on math benchmarks, superior generalization on out-of-distribution tasks, and successfully training weaker foundation models where on-policy methods fail.
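A hedged sketch of what off-policy guidance can look like in an RLVR update: part of each group comes from the model's own rollouts and part from a stronger external policy's traces, with a shaping weight on the off-policy term. The mixing and shaping function below are illustrative assumptions rather than LUFFY's exact formulation.

```python
import torch

def mixed_rollout_loss(on_logprobs, on_adv, off_logprobs, off_adv, gamma=0.1):
    # On-policy half of the group: standard policy gradient with
    # verifiable-reward advantages.
    on_loss = -(on_logprobs * on_adv).mean()
    # Off-policy half: traces from a stronger external policy, scored by the
    # current policy. The weight p / (p + gamma) is an illustrative stand-in
    # for a shaping function that keeps low-probability tokens from being ignored.
    p = off_logprobs.exp()
    shaping = (p / (p + gamma)).detach()
    off_loss = -(shaping * off_logprobs * off_adv).mean()
    return on_loss + off_loss
```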
Researchers systematically analyzed reinforcement learning techniques for enhancing large language model reasoning, demonstrating that a minimalist combination of two empirically validated methods can consistently outperform more complex, multi-trick algorithms. The work clarifies the conditional effectiveness of various RL components across different model scales, alignment statuses, and data difficulties.
The SpatialVID dataset provides 7,089 hours of real-world dynamic video annotated with explicit per-frame camera poses, depth maps, dynamic object masks, and detailed semantic descriptions, including structured camera-motion instructions. This large-scale, multimodal dataset bridges the gap between video content and 3D geometry, providing a foundation for training advanced 3D-aware video generation and embodied AI models.
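For readers thinking about how such annotations might be consumed, here is an illustrative per-frame record layout matching the annotation types listed above; the field names and shapes are assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class FrameAnnotation:
    # Hypothetical per-frame record mirroring the annotation types described.
    pose: np.ndarray          # 4x4 camera-to-world extrinsics
    intrinsics: np.ndarray    # 3x3 camera matrix
    depth: np.ndarray         # HxW depth map
    dynamic_mask: np.ndarray  # HxW boolean mask of moving objects
    caption: str              # semantic description of the frame
    motion_instruction: str   # structured camera-motion tag, e.g. "pan_left"
```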
A comprehensive synthesis of Large Language Models for automated software development covers the entire model lifecycle, from data curation to autonomous agents, and offers practical guidance derived from empirical experiments on pre-training, fine-tuning, and reinforcement learning, alongside a detailed analysis of challenges and future directions.
This comprehensive survey from a large multi-institutional collaboration examines "Latent Reasoning" in Large Language Models, an emerging paradigm that performs multi-step inference entirely within the model's high-bandwidth continuous hidden states to overcome the limitations of natural language-based explicit reasoning. It highlights the significant bandwidth advantage of latent representations (approximately 2700x higher) and provides a unified taxonomy of current methodologies.
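The ~2700x figure can be reproduced with a back-of-the-envelope calculation; the specific hidden size and vocabulary below are assumptions chosen only to illustrate how a ratio of that magnitude arises.

```python
# Compare the information carried per reasoning step by a continuous hidden
# state versus a single discrete token (assumed sizes, for illustration only).
hidden_dim = 2560
bits_per_float = 16                        # FP16
latent_bits = hidden_dim * bits_per_float  # 40,960 bits per latent step
token_bits = 15                            # ~log2 of a 32k-entry vocabulary
print(latent_bits / token_bits)            # ~2731, i.e. roughly 2700x
```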
InternVL3 establishes a new native multimodal pre-training paradigm for MLLMs, allowing the model to jointly acquire visual and linguistic capabilities from the outset. This approach achieves state-of-the-art performance among open-source models, reaching 72.2 on the MMMU benchmark, and demonstrates strong competitiveness with leading proprietary models across a wide range of multimodal tasks.
Thyme introduces a paradigm for multimodal large language models (MLLMs) to enhance reasoning and perception by autonomously generating and executing code for image manipulation and computation. This approach achieves substantial performance improvements across nearly 20 benchmarks, frequently outperforming larger models in high-resolution perception tasks.
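A minimal, hedged sketch of the generate-then-execute loop described above: the MLLM emits Python for an image operation (for example, crop-and-zoom on a high-resolution input), a sandbox runs it, and the result is fed back into the conversation. All helper names and the `result` convention are hypothetical.

```python
from PIL import Image

def execute_tool_code(code_str, image_path):
    # Run model-generated code in a restricted namespace; a real system would
    # sandbox this far more aggressively.
    scope = {"Image": Image, "image_path": image_path}
    exec(code_str, scope)
    return scope.get("result")  # convention: generated code stores its output in `result`

generated = """
img = Image.open(image_path)
w, h = img.size
result = img.crop((w // 2, 0, w, h // 2)).resize((w, h))  # zoom into the top-right region
"""
# cropped = execute_tool_code(generated, "example.jpg")  # the output re-enters the MLLM context
```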
Researchers from Alibaba Group provide a comprehensive overview of unified models capable of both understanding and generating multimodal content, primarily focusing on vision and language. The work systematically classifies architectural paradigms (diffusion-based, autoregressive-based, and fused approaches) while detailing current challenges, emerging solutions, and future research opportunities.
AFLOW introduces an automated framework for generating and optimizing agentic workflows for Large Language Models, reformulating workflow optimization as a search problem over code-represented workflows. The system leverages Monte Carlo Tree Search with LLM-based optimization to iteratively refine workflows, yielding a 19.5% average performance improvement over existing automated methods while enabling smaller, more cost-effective LLMs to achieve performance parity with larger models.
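To make "search over code-represented workflows" concrete, here is a hedged, schematic loop in that spirit: workflow variants are plain code, an LLM proposes edits, and a UCT-style rule chooses which variant to expand next. Function names and the scoring scheme are hypothetical placeholders, not AFLOW's implementation.

```python
import math

def select(nodes, c=1.0):
    # UCT-style selection over previously evaluated workflow variants.
    total = sum(n["visits"] for n in nodes) or 1
    return max(
        nodes,
        key=lambda n: n["score"] / max(n["visits"], 1)
        + c * math.sqrt(math.log(total + 1) / max(n["visits"], 1)),
    )

def optimize(seed_code, llm_modify, evaluate, iterations=20):
    nodes = [{"code": seed_code, "score": 0.0, "visits": 0}]
    for _ in range(iterations):
        parent = select(nodes)
        child_code = llm_modify(parent["code"])   # LLM rewrites the workflow code
        score = evaluate(child_code)              # execute the workflow on a validation split
        nodes.append({"code": child_code, "score": score, "visits": 1})
        parent["visits"] += 1
        parent["score"] += score                  # back-propagate the result to the parent
    return max(nodes, key=lambda n: n["score"] / max(n["visits"], 1))["code"]
```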
SegFormer presents an efficient and robust Transformer-based framework for semantic segmentation, outperforming prior methods in accuracy while significantly reducing model size and computational cost. The model achieves state-of-the-art results on ADE20K, Cityscapes, and COCO-Stuff, showcasing superior efficiency and robustness to common corruptions.
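One of SegFormer's efficiency levers is its lightweight all-MLP decoder; the sketch below captures that idea: per-stage encoder features are linearly projected, upsampled to a common resolution, fused, and classified. The channel sizes are illustrative assumptions rather than a released configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLPDecodeHead(nn.Module):
    # Lightweight decoder in the SegFormer spirit: 1x1 projections act as
    # per-pixel MLPs over multi-scale features.
    def __init__(self, in_channels=(32, 64, 160, 256), embed_dim=256, num_classes=150):
        super().__init__()
        self.proj = nn.ModuleList([nn.Conv2d(c, embed_dim, 1) for c in in_channels])
        self.fuse = nn.Conv2d(embed_dim * len(in_channels), embed_dim, 1)
        self.classify = nn.Conv2d(embed_dim, num_classes, 1)

    def forward(self, feats):  # feats: multi-scale encoder outputs, highest resolution first
        size = feats[0].shape[-2:]
        ups = [F.interpolate(p(f), size=size, mode="bilinear", align_corners=False)
               for p, f in zip(self.proj, feats)]
        return self.classify(self.fuse(torch.cat(ups, dim=1)))
```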
A comprehensive survey by researchers from Shanghai AI Lab and various global institutions outlines the intricate relationship between scientific large language models (Sci-LLMs) and their data foundations, tracing their evolution towards autonomous agents for scientific discovery. The paper establishes a taxonomy for scientific data and knowledge, meticulously reviews over 270 datasets and 190 benchmarks, and identifies critical data challenges alongside future paradigms.
MetaGPT introduces a meta-programming framework that simulates a software company with specialized LLM agents following Standardized Operating Procedures (SOPs) and an assembly line paradigm. The system significantly improves the coherence, accuracy, and executability of generated code for complex software development tasks, achieving state-of-the-art results on benchmarks like HumanEval and MBPP, and outperforming other multi-agent systems on a comprehensive software development dataset.
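A schematic of the SOP-driven assembly line described above, where each role consumes the previous role's structured artifact and produces the next; the `llm` call is a hypothetical stub, not MetaGPT's API.

```python
def run_pipeline(requirement, llm):
    # Each stage mirrors a specialized role passing a structured document downstream.
    prd    = llm(role="ProductManager", task="Write a PRD", context=requirement)
    design = llm(role="Architect", task="Design the system and APIs", context=prd)
    plan   = llm(role="ProjectManager", task="Break the design into tasks", context=design)
    code   = llm(role="Engineer", task="Implement the tasks", context=plan)
    report = llm(role="QAEngineer", task="Write and run tests", context=code)
    return code, report
```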