Self-improvement of large language models (LLMs) -- i.e., improving the
performance of an LLM by fine-tuning it on synthetic data generated by the
model itself -- is a promising way to advance the capabilities of LLMs while avoiding
extensive supervision. Existing approaches to self-improvement often rely on
external supervision signals in the form of seed data and/or assistance from
third-party models. This paper presents Crescent -- a simple yet effective
framework for generating high-quality synthetic question-answer data in a fully
autonomous manner. Crescent first elicits raw questions from the LLM via a
bait prompt, then diversifies these questions through rejection
sampling-based self-deduplication, and finally feeds the questions back to the LLM
and collects the corresponding answers by majority voting (see the minimal
sketch below). We show
that Crescent sheds light on the potential of true self-improvement for math
reasoning with zero external supervision signals; in particular,
Crescent-generated question-answer pairs suffice to (i) improve the reasoning
capabilities of an LLM while preserving its general performance (especially in
the 0-shot setting); and (ii) distil LLM knowledge into weaker models more
effectively than existing methods based on seed-dataset augmentation.
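
To make the three-stage pipeline concrete, the following Python sketch outlines
it under stated assumptions: the llm callable, the bait prompt text, the
self-deduplication prompt, and the vote count are all illustrative placeholders
rather than the paper's actual prompts or parameters, and voting here is over
raw completions, whereas in practice one would likely vote over extracted final
answers.

from collections import Counter
from typing import Callable

def crescent(
    llm: Callable[[str], str],  # any text-in/text-out model call (assumed interface)
    num_questions: int,
    votes: int = 8,             # illustrative vote count, not from the paper
) -> list[tuple[str, str]]:
    """Minimal sketch of the Crescent pipeline described above."""
    bait = "Pose one challenging math problem."  # placeholder bait prompt

    def is_duplicate(q: str, accepted: list[str]) -> bool:
        # Self-deduplication: the model itself judges near-duplicates,
        # and flagged samples are discarded (rejection sampling).
        if not accepted:
            return False
        verdict = llm(
            "Does the new question duplicate any question in the list? "
            f"Answer yes or no.\nNew question: {q}\nList: {accepted}"
        )
        return verdict.strip().lower().startswith("yes")

    # Stages 1 + 2: elicit raw questions, keeping only sufficiently novel ones.
    questions: list[str] = []
    while len(questions) < num_questions:
        q = llm(bait)
        if not is_duplicate(q, questions):
            questions.append(q)

    # Stage 3: sample several answers per question and keep the majority one.
    pairs: list[tuple[str, str]] = []
    for q in questions:
        answers = [llm(q) for _ in range(votes)]
        majority, _ = Counter(answers).most_common(1)[0]
        pairs.append((q, majority))
    return pairs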