From Code Foundation Models to Agents and Applications: A Comprehensive Survey and Practical Guide to Code Intelligence
BibTeX
@misc{yang2025codefoundationmodels,
title={From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence},
author={Jian Yang and Xianglong Liu and Weifeng Lv and Ken Deng and Shawn Guo and Lin Jing and Yizhi Li and Shark Liu and Xianzhen Luo and Yuyu Luo and Changzai Pan and Ensheng Shi and Yingshui Tan and Renshuai Tao and Jiajun Wu and Xianjie Wu and Zhenhe Wu and Daoguang Zan and Chenchen Zhang and Wei Zhang and He Zhu and Terry Yue Zhuo and Kerui Cao and Xianfu Cheng and Jun Dong and Shengjie Fang and Zhiwei Fei and Xiangyuan Guan and Qipeng Guo and Zhiguang Han and Joseph James and Tianqi Luo and Renyuan Li and Yuhang Li and Yiming Liang and Congnan Liu and Jiaheng Liu and Qian Liu and Ruitong Liu and Tyler Loakman and Xiangxin Meng and Chuang Peng and Tianhao Peng and Jiajun Shi and Mingjie Tang and Boyang Wang and Haowen Wang and Yunli Wang and Fanglin Xu and Zihan Xu and Fei Yuan and Ge Zhang and Jiayi Zhang and Xinhao Zhang and Wangchunshu Zhou and Hualei Zhu and King Zhu and Brown Dai and Aishan Liu and Zhoujun Li and Chenghua Lin and Tianyu Liu and Chao Peng and Kai Shen and Libo Qin and Shuangyong Song and Zizheng Zhan and Jiajun Zhang and Jie Zhang and Zhaoxiang Zhang and Bo Zheng},
  year={2025},
eprint={2511.18538},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2511.18538},
}
AI Audio Lecture + Q&A
Transcript
John: In our course on Advanced Topics in AI for Software Engineering, we've seen a surge of survey papers trying to make sense of the rapid progress in code generation. Today's lecture is on "From Code Foundation Models to Agents and Applications: A Comprehensive Survey and Practical Guide to Code Intelligence."
John: While we've discussed works like "A Survey on Large Language Models for Code Generation," this paper, a massive collaboration involving Beihang University, Alibaba, and Huawei, aims to provide a more holistic roadmap as the field shifts from simple code completion to fully autonomous software engineering agents. It’s an attempt to structure the entire lifecycle. Yes, Noah?
Noah: Excuse me, Professor. You mentioned other surveys. How does this one differentiate itself? Is it just a broader scope or is there a different contribution?
John: That's the central question. It's not just about breadth. The key differentiator is its focus on the complete model lifecycle and its inclusion of original empirical experiments. It aims to bridge what the authors call the 'research-practice gap'—the disconnect between academic benchmarks and the challenges of real-world deployment.
John: The authors systematically trace the entire pipeline, from data curation and pre-training strategies to supervised fine-tuning, reinforcement learning, and finally, safety considerations. They categorize models, from general-purpose LLMs like GPT-4 to code-specialized ones like Code LLaMA, and analyze the architectural trade-offs.
Noah: So it's less of a passive literature review and more of an active guide. What about the empirical side you mentioned? That seems unusual for a survey.
John: Exactly. This is where it becomes a practical guide. The authors conducted their own experiments to derive training recipes. For instance, they investigated scaling laws for pre-training and found significant differences between programming languages. Python, being an interpreted language, benefits more aggressively from increased model and data size but also has a higher irreducible loss compared to compiled languages like Rust or Go.
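The per-language behavior John describes maps onto the usual Chinchilla-style parametric loss, L(N, D) = E + A/N^α + B/D^β, where E is the irreducible loss floor: a higher E for Python, and exponents that reward scale more. A minimal sketch, with purely illustrative constants rather than the paper's fitted values:

```python
def chinchilla_loss(n_params: float, n_tokens: float,
                    e: float, a: float, alpha: float,
                    b: float, beta: float) -> float:
    """Parametric scaling law L(N, D) = E + A/N^alpha + B/D^beta.

    E is the irreducible loss; the survey reportedly finds it differs
    by language (higher for Python than for compiled languages).
    """
    return e + a / n_params ** alpha + b / n_tokens ** beta

# Hypothetical per-language fits (illustrative numbers only):
# Python gets a steeper data exponent (benefits more from scale)
# but a higher irreducible floor than Rust.
PYTHON = dict(e=1.20, a=400.0, alpha=0.34, b=600.0, beta=0.30)
RUST   = dict(e=0.95, a=400.0, alpha=0.34, b=600.0, beta=0.26)

n, d = 7e9, 2e12  # 7B parameters, 2T training tokens
loss_py = chinchilla_loss(n, d, **PYTHON)
loss_rs = chinchilla_loss(n, d, **RUST)
```

The practical consequence is the one Noah raises next: with different exponents per language, the compute-optimal data-to-parameter ratio shifts depending on the language mix.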
Noah: So the optimal data-to-parameter ratio isn't universal across languages. Did their experiments offer similar insights for the fine-tuning stage?
John: They did. For supervised fine-tuning, they found performance is highly sensitive to the global batch size, with smaller batches often preserving the gradient signal better. They also observed that Mixture-of-Experts, or MoE, architectures tend to require more training epochs to stabilize and reach peak performance compared to traditional dense models, which converge faster.
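For concreteness, the "global batch size" in an SFT run is typically the product of per-device micro-batch, gradient-accumulation steps, and data-parallel ranks; shrinking it means more (and noisier) optimizer steps per epoch, which is the gradient-signal effect John mentions. A sketch with made-up numbers:

```python
def global_batch_size(micro_batch: int, grad_accum_steps: int,
                      dp_ranks: int) -> int:
    """Effective batch size consumed per optimizer step."""
    return micro_batch * grad_accum_steps * dp_ranks

# Two hypothetical SFT setups: per-device memory use is identical,
# only the gradient-accumulation factor changes the global batch.
large = global_batch_size(micro_batch=4, grad_accum_steps=32, dp_ranks=8)  # 1024
small = global_batch_size(micro_batch=4, grad_accum_steps=4,  dp_ranks=8)  # 128

dataset = 200_000  # fine-tuning examples (illustrative)
steps_large = dataset // large  # fewer, smoother optimizer steps per epoch
steps_small = dataset // small  # many more, noisier updates per epoch
```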
Noah: Hold on, if MoE models are more sensitive and need longer tuning, does that limit their practical use for teams without massive compute resources for hyperparameter sweeps?
John: That's a very practical consideration and a valid point. It suggests a trade-off: MoE models can be more parameter-efficient during inference, but may demand more careful and prolonged training. Their guide also extends to reinforcement learning, where they found different advantage estimators optimize for different outcomes. For instance, one might be better for single-pass correctness, while another enhances the diversity of generated solutions.
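The transcript doesn't name the specific advantage estimators the authors compared, but one widely used group-relative estimator (GRPO-style) illustrates the trade-off: each sampled solution's reward is normalized against its sibling samples for the same prompt, so the signal rewards being better than the other candidates rather than absolute pass/fail alone.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: z-score each sample's reward against
    the group of K samples drawn for the same prompt. When all
    samples tie (all pass or all fail), the advantage is zero and
    no gradient signal is produced for that prompt."""
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0:
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Rewards for 4 sampled solutions to one coding prompt (1 = tests pass).
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

An estimator with an absolute baseline would instead push all passing samples equally, which tends to favor single-pass correctness over solution diversity.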
John: This all ties into the paper's main implication: it's an effort to shift the field toward a more scientific, reproducible approach to building code models. By providing these actionable training guidelines, it offers a blueprint for development. It directly supports the push towards sophisticated agentic systems, a trend also mapped out in surveys like "AI Agentic Programming: A Survey of Techniques, Challenges, and Opportunities."
Noah: It sounds like they’re trying to standardize the engineering of these models, rather than leaving it as an exploratory art form.
John: That's an excellent way to frame it. A significant part of that engineering is safety. The paper argues that code LLMs are 'insecure by default' because they learn from vast amounts of flawed public code. It proposes a multi-layered safety framework, covering everything from data provenance and auditing during pre-training to secure execution sandboxes and runtime oversight for agents. This emphasis is critical for moving these tools into production environments.
John: So, the main takeaway is that this work functions as both a comprehensive map of the code intelligence landscape and a practical, evidence-based guide for building better, safer systems. It effectively bridges the gap between high-level academic surveys and niche implementation papers by offering both a systematic review and original, actionable training insights.
John: Thanks for listening. If you have any further questions, ask our AI assistant or drop a comment.