MiraclePlus
A Chinese Dataset for Evaluating the Safeguards in Large Language Models

Researchers developed an open-source Chinese dataset for evaluating the safeguards of large language models (LLMs). Their evaluation shows that region-specific cultural and political sensitivities are critical determinants of safety performance, particularly for models not aligned with these contexts, and that LLMs are vulnerable to subtly disguised harmful prompts.
