Task-oriented dialogue systems often assist users with personal or
confidential matters. For this reason, the developers of such a system are
generally prohibited from observing actual usage. So how can they know where
the system is failing and needs more training data or new functionality? In
this work, we study ways in which realistic user utterances can be generated
synthetically, to help increase the linguistic and functional coverage of the
system, without compromising the privacy of actual users. To this end, we
propose a two-stage Differentially Private (DP) generation method which first
generates latent semantic parses, and then generates utterances based on the
parses. Our proposed approach improves MAUVE by 2.5× and parse tree
function type overlap by 1.3× relative to current approaches for private
synthetic data generation, improving both fluency and semantic coverage. We
further validate our approach on a realistic domain adaptation task of adding
new functionality from private user data to a semantic parser, and show overall
gains of 8.5 percentage points in accuracy with the new feature.