A comprehensive synthesis of Large Language Models for automated software development covers the entire model lifecycle, from data curation to autonomous agents, and offers practical guidance derived from empirical experiments on pre-training, fine-tuning, and reinforcement learning, alongside a detailed analysis of challenges and future directions.
SWE-Exp introduces an experience-enhanced framework that enables Large Language Model (LLM) agents to learn from past software issue resolution attempts, transforming problem-solving into a continuous learning process. This approach achieved a 41.6% Pass@1 score on SWE-bench-Verified with DeepSeek-V3-0324, representing a 7.2% relative improvement over prior methods using the same model.
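To make the experience-reuse idea concrete, here is a minimal sketch assuming a hypothetical `ExperienceBank` that stores past (issue, plan, patch, outcome) records and retrieves the most similar ones to prime a new attempt; the lexical retrieval and prompt format are illustrative, not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Experience:
    issue_text: str      # original issue description
    plan: str            # repair strategy the agent followed
    patch: str           # final diff that was applied
    resolved: bool       # whether the attempt passed the tests

@dataclass
class ExperienceBank:
    records: list[Experience] = field(default_factory=list)

    def add(self, exp: Experience) -> None:
        self.records.append(exp)

    def retrieve(self, issue_text: str, k: int = 3) -> list[Experience]:
        # Toy lexical similarity; a real system would use embeddings or a learned retriever.
        query = set(issue_text.lower().split())
        def score(exp: Experience) -> float:
            words = set(exp.issue_text.lower().split())
            return len(query & words) / (len(query | words) or 1)
        return sorted(self.records, key=score, reverse=True)[:k]

def build_prompt(issue_text: str, bank: ExperienceBank) -> str:
    """Prepend distilled past experiences to the new issue before calling the agent."""
    lessons = "\n".join(
        f"- Past issue: {e.issue_text[:80]}... Strategy: {e.plan} (resolved={e.resolved})"
        for e in bank.retrieve(issue_text)
    )
    return f"Relevant prior experiences:\n{lessons}\n\nNew issue:\n{issue_text}"
```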
This paper introduces "Diffusion of Thought (DoT)", a method for integrating Chain-of-Thought (CoT) reasoning into diffusion language models. The approach demonstrates high accuracy and significant efficiency gains on reasoning tasks, outperforming autoregressive counterparts while offering a flexible computation-accuracy trade-off.
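The parallel draft-and-refine flavor of diffusion-style reasoning can be illustrated with a mask-predict loop: all rationale tokens are proposed at once, and the least confident positions are re-masked at each step. The `denoiser` interface and the remasking schedule below are assumptions for illustration, not DoT's exact procedure.

```python
MASK = "<mask>"

def refine_rationale(denoiser, prompt, length=32, steps=8):
    """Parallel draft-and-refine loop in the spirit of diffusion-style CoT.

    `denoiser(prompt, tokens)` is assumed to return, for every position,
    a (token, confidence) pair in a single parallel pass.
    """
    tokens = [MASK] * length                        # start from a fully masked rationale
    for step in range(steps):
        proposals = denoiser(prompt, tokens)        # predict all positions at once
        remask_frac = 1.0 - (step + 1) / steps      # re-mask fewer positions each round
        ranked = sorted(range(length), key=lambda i: proposals[i][1])
        remask = set(ranked[: int(remask_frac * length)])  # re-mask the least confident slots
        tokens = [MASK if i in remask else proposals[i][0] for i in range(length)]
    return tokens
```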
StreamForest introduces a novel multimodal large language model architecture designed for efficient online video understanding, leveraging a persistent event memory and a fine-grained spatiotemporal window. The model demonstrates state-of-the-art performance on streaming video benchmarks, including a new autonomous driving benchmark, and maintains robust accuracy even under extreme visual token compression.
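A rough sketch of the "fine-grained recent window plus persistent event memory" idea, assuming per-frame token arrays and a placeholder mean-pooling merge rule; the actual model learns its compression rather than using this heuristic.

```python
from collections import deque
import numpy as np

class StreamingMemory:
    """Keep recent frames at full resolution and fold older ones into coarse event summaries."""

    def __init__(self, window: int = 8, max_events: int = 64):
        self.window = deque(maxlen=window)   # dense tokens for the latest frames
        self.events: list[np.ndarray] = []   # compressed summaries of older content
        self.max_events = max_events

    def push(self, frame_tokens: np.ndarray) -> None:
        if len(self.window) == self.window.maxlen:
            evicted = self.window[0]
            self.events.append(evicted.mean(axis=0, keepdims=True))  # coarse event token
            if len(self.events) > self.max_events:
                # Merge the two oldest events to bound memory growth.
                merged = (self.events[0] + self.events[1]) / 2
                self.events = [merged] + self.events[2:]
        self.window.append(frame_tokens)

    def context(self) -> np.ndarray:
        # Concatenate coarse event tokens with the fine-grained recent window.
        return np.concatenate(self.events + list(self.window), axis=0)
```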
SRUM is a post-training framework that enables Unified Multimodal Models (UMMs) to improve their image generation capabilities by using their internal understanding module as a self-sufficient evaluator. It achieves an 88.37 overall score on T2I-CompBench, representing a +3.91 point improvement over the baseline, and demonstrates strong generalization across various complex compositional and reasoning tasks.
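The self-evaluation loop can be sketched as follows, assuming hypothetical `umm.generate`, `umm.understand_score`, and `umm.log_prob` interfaces; the reward-weighted update is a generic policy-gradient-style placeholder rather than the exact SRUM objective.

```python
def self_reward_step(umm, prompt, optimizer, num_samples=4):
    """One post-training step where the model's own understanding head scores its generations."""
    images = [umm.generate(prompt) for _ in range(num_samples)]
    # The understanding module judges how well each image matches the prompt.
    rewards = [umm.understand_score(prompt, img) for img in images]
    baseline = sum(rewards) / len(rewards)
    loss = 0.0
    for img, r in zip(images, rewards):
        # Reinforce generations the model itself judges as better than average.
        loss = loss - (r - baseline) * umm.log_prob(prompt, img)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```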
Researchers from Shanghai Jiao Tong University, Huawei, and Xidian University developed SWE-Debate, a framework that leverages competitive multi-agent debate and graph-guided fault localization for resolving repository-level software issues. This approach achieved a 41.4% Pass@1 success rate on the SWE-Bench-Verified dataset and an 81.67% file-level localization accuracy on SWE-Bench-Lite.
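A toy rendition of graph-guided localization followed by a competitive debate, assuming a hypothetical `llm(prompt)` completion function and a plain dependency-graph traversal for candidate generation; the real protocol is considerably richer than this vote-style reduction.

```python
import networkx as nx

def localize_by_debate(llm, issue, code_graph: nx.DiGraph, rounds: int = 2, agents: int = 3):
    """Toy version of graph-guided fault localization plus a multi-agent debate."""
    # Seed candidates: files whose names appear in the issue text (nodes assumed to be paths).
    seeds = [n for n in code_graph.nodes if n.split("/")[-1] in issue]
    candidates = set(seeds)
    for s in seeds:
        candidates.update(nx.descendants(code_graph, s))  # files reachable from the seeds

    transcript = []
    for r in range(rounds):
        for a in range(agents):
            argument = llm(
                f"Issue:\n{issue}\nCandidate files: {sorted(candidates)}\n"
                f"Previous arguments: {transcript}\n"
                f"Agent {a}, round {r}: argue for the most likely faulty file."
            )
            transcript.append(argument)

    return llm(f"Given these arguments: {transcript}, name the single most likely faulty file.")
```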
The AutoSchemaKG framework constructs knowledge graphs autonomously from web-scale corpora by dynamically inducing schemas, integrating entity, event, and concept nodes. This approach generates billion-scale KGs that enhance the factuality and multi-hop question answering capabilities of large language models.
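The node and triple structures involved can be sketched as below, with a hypothetical `llm` extraction call standing in for the schema-induction prompts; in the actual pipeline the schema emerges from the corpus rather than being fixed in code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    name: str
    kind: str          # "entity", "event", or "concept"

@dataclass(frozen=True)
class Triple:
    head: Node
    relation: str
    tail: Node

def extract_and_conceptualize(llm, passage: str) -> list[Triple]:
    """Sketch of schema-free KG construction: extract triples, then lift nodes to concepts."""
    # `llm` is assumed to return an iterable of (head, head_kind, relation, tail, tail_kind) tuples.
    raw = llm(f"List (head, head_kind, relation, tail, tail_kind) tuples found in:\n{passage}")
    triples = []
    for head, hk, rel, tail, tk in raw:
        h, t = Node(head, hk), Node(tail, tk)
        triples.append(Triple(h, rel, t))
        # Conceptualization: attach each entity/event to an induced abstract concept node.
        for node in (h, t):
            if node.kind != "concept":
                concept = llm(f"Give one abstract concept that '{node.name}' instantiates.")
                triples.append(Triple(node, "is_a", Node(concept, "concept")))
    return triples
```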
dLLM-Var, developed at Shanghai Jiao Tong University's EPIC Lab, enables diffusion-based large language models to generate text with native variable lengths, achieving a 30.1x speedup over traditional dLLMs and a 2.4x speedup over autoregressive models, while maintaining competitive accuracy and demonstrating self-correction capabilities.
VTimeCoT equips multimodal large language models with a "visual clock" to perform precise video temporal grounding and reasoning by allowing them to "think by drawing" directly on video timelines. The framework, developed by researchers from Shanghai Jiao Tong University, Noah’s Ark Lab, and Imperial College London, achieved substantial performance gains on temporal grounding and long-video question answering benchmarks without requiring additional training.
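The "think by drawing" idea can be approximated by rendering a progress-bar clock onto each frame so the model can read elapsed time visually; this Pillow sketch only illustrates the flavor of such a visual prompt, not the paper's exact design.

```python
from PIL import Image, ImageDraw

def overlay_visual_clock(frame: Image.Image, t: float, duration: float) -> Image.Image:
    """Draw a simple progress-bar 'clock' onto a frame so a VLM can read time visually."""
    frame = frame.copy()
    draw = ImageDraw.Draw(frame)
    w, h = frame.size
    bar_y0, bar_y1 = h - 24, h - 12
    draw.rectangle([8, bar_y0, w - 8, bar_y1], outline="white", width=2)   # timeline track
    fill_x = 8 + int((w - 16) * min(max(t / duration, 0.0), 1.0))
    draw.rectangle([8, bar_y0, fill_x, bar_y1], fill="white")              # elapsed portion
    draw.text((8, bar_y0 - 14), f"{t:.1f}s / {duration:.1f}s", fill="white")
    return frame
```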
The OSUM-EChat system developed by the Audio, Speech and Language Processing Group at Northwestern Polytechnical University enhances end-to-end empathetic spoken chatbots by integrating an understanding-driven training strategy and a linguistic-paralinguistic dual think mechanism. It achieved a GPT-4 score of 72.0 on the new EChat-eval benchmark for multi-label empathy, demonstrating improved empathetic responsiveness and efficient speech understanding without relying on massive, proprietary datasets.
DeepFM integrates Factorization Machines (FM) and Deep Neural Networks (DNN) into a single, end-to-end trainable model to predict Click-Through Rate (CTR) by simultaneously capturing both low-order and high-order feature interactions without manual feature engineering. The model consistently outperformed nine baseline models on large datasets, achieving up to 0.48% higher AUC than Wide & Deep variants on a commercial app store dataset.
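Because the architecture itself is compact, a minimal PyTorch sketch captures the key point: the FM and DNN components read from one shared field-embedding table, and their outputs are summed before the sigmoid. The hyperparameters and the single-index-per-field assumption below are placeholders, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DeepFM(nn.Module):
    """Minimal DeepFM: FM and DNN components share one field-embedding table."""

    def __init__(self, field_sizes, embed_dim=8, hidden=(64, 32)):
        super().__init__()
        self.embeds = nn.ModuleList(nn.Embedding(n, embed_dim) for n in field_sizes)
        self.linear = nn.ModuleList(nn.Embedding(n, 1) for n in field_sizes)  # first-order terms
        layers, in_dim = [], embed_dim * len(field_sizes)
        for h in hidden:
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        layers.append(nn.Linear(in_dim, 1))
        self.dnn = nn.Sequential(*layers)

    def forward(self, x):                      # x: LongTensor of shape (batch, num_fields)
        emb = torch.stack([e(x[:, i]) for i, e in enumerate(self.embeds)], dim=1)  # (B, F, D)
        first = sum(l(x[:, i]) for i, l in enumerate(self.linear))                 # (B, 1)
        # FM second-order interactions: 0.5 * ((sum v)^2 - sum v^2), summed over D.
        square_of_sum = emb.sum(dim=1).pow(2)
        sum_of_square = emb.pow(2).sum(dim=1)
        fm = 0.5 * (square_of_sum - sum_of_square).sum(dim=1, keepdim=True)        # (B, 1)
        deep = self.dnn(emb.flatten(start_dim=1))                                  # (B, 1)
        return torch.sigmoid(first + fm + deep)
```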
DiffKV introduces a framework for large language models that optimizes Key-Value (KV) cache memory through differentiated compression and an on-GPU parallel compaction manager. This approach achieves substantial memory compression and throughput improvements with minimal accuracy degradation, particularly for complex reasoning and long-context generation tasks.
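The differentiated-precision idea can be sketched with fake quantization: keys are kept at higher precision than values (attention scores are more sensitive to key error), and low-importance token entries are evicted. The uniform quantizer and the importance heuristic below are illustrative, not DiffKV's actual policy or compaction mechanism.

```python
import numpy as np

def compress_kv(keys, values, importance, keep_ratio=0.5, key_bits=8, value_bits=4):
    """Sketch of differentiated KV-cache compression over one attention head."""
    def fake_quant(x, bits):
        # Uniform fake quantization to `bits` bits over the tensor's value range.
        levels = 2 ** bits - 1
        lo, hi = x.min(), x.max()
        scale = (hi - lo) / levels if hi > lo else 1.0
        return np.round((x - lo) / scale) * scale + lo

    # Keep only the most important tokens; evict the rest from the cache.
    k = max(1, int(keep_ratio * len(importance)))
    kept = np.argsort(importance)[-k:]
    kept.sort()
    return fake_quant(keys[kept], key_bits), fake_quant(values[kept], value_bits), kept
```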
AsyncFlow introduces an asynchronous streaming reinforcement learning framework for efficient large language model post-training, achieving an average throughput gain of 1.59x and up to 2.03x peak improvement over state-of-the-art baselines on Huawei Ascend NPU clusters. The framework maintains algorithmic stability with negligible differences in reward scores compared to synchronous methods.
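A bare-bones sketch of the asynchronous streaming pattern, assuming hypothetical `rollout_worker` and `trainer` callables: rollout generation and policy updates are decoupled by a bounded queue with a staleness check, which stands in for the framework's real scheduling and weight-synchronization machinery.

```python
import queue
import threading

def run_async_rl(rollout_worker, trainer, total_updates, max_staleness=2):
    """Decoupled rollout/training loop in the spirit of asynchronous streaming RL.

    `rollout_worker(version)` returns a batch of trajectories generated with the
    policy at `version`; `trainer(batch)` applies one policy update.
    """
    buffer = queue.Queue(maxsize=4)
    version = {"v": 0}
    stop = threading.Event()

    def produce():
        while not stop.is_set():
            v = version["v"]
            buffer.put((v, rollout_worker(v)))   # stream batches without waiting for the trainer

    threading.Thread(target=produce, daemon=True).start()
    for _ in range(total_updates):
        v, batch = buffer.get()
        if version["v"] - v > max_staleness:
            continue                              # discard batches from a policy that is too old
        trainer(batch)
        version["v"] += 1
    stop.set()
```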