Alignment and Safety in Large Language Models: Safety Mechanisms, Training Paradigms, and Emerging Challenges
BibTeX
@misc{wang2025alignmentsafetylarge,
  title={Alignment and Safety in Large Language Models: Safety Mechanisms, Training Paradigms, and Emerging Challenges},
  author={Tao Wang and Wei Zhang and Chao Huang and Ziyu Liu and Jing Zhang and Zhengliang Liu and Tianming Liu and Lin Zhao and Wei Liu and Yifan Zhou and Dajiang Zhu and Lu Zhang and Junhao Chen and Hanqi Jiang and Yi Pan and Jincheng Yu and Weidi Luo and Ruidong Zhang and Peilong Wang and Wei Ruan and Arif Hassan Zidan and Afrar Jahin and Bolun Sun and Zhen Xiang and Haoran Lu and Jiahui Li and Ke Deng and Zeliang Sun and Luyang Fang and Ping Ma and Qin Lu and Xin Xing and Xinliang Li and Rongjie Liu and Yiwen Liu and Fei Dou and Xiaoxiao Sun and Huimin Cheng and Shushan Wu and Lin Tang and Jinwen Xu and Haotian Xiang and Jiazhang Cai and Terry Ma and Wenxuan Zhong and Mengrui Zhang and Xilin Gong and Yongkai Chen and Yingchuan Zhang and Meizhi Yu},
  year={2025},
  eprint={2507.19672},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2507.19672},
}