With advances in speech communication systems such as online conferencing
applications, we can work seamlessly with people regardless of where they are.
However, during online meetings, speech quality can be significantly affected
by background noise, reverberation, packet loss, network jitter, etc. Owing to
its perceptual nature, speech quality is traditionally assessed in subjective
listening tests in laboratories, and more recently also through crowdsourcing,
following the international standards of the ITU-T Rec. P.800 series. However,
these approaches are costly
and cannot be applied to customer data. Therefore, an effective objective
assessment approach is needed to evaluate or monitor the speech quality of
ongoing conversations. The ConferencingSpeech 2022 challenge targets
non-intrusive deep neural network models for the speech quality assessment
task. We open-sourced a training corpus of more than 86K speech clips in
different languages, covering a wide range of synthesized and live
degradations, together with the corresponding subjective quality scores
obtained through crowdsourcing. Eighteen teams submitted their models for
evaluation in this challenge. The blind test sets included about 4300 clips
covering a wide range of degradations. This paper
describes the challenge, the datasets, and the evaluation methods, and reports
the final results.