alphaXiv

History

Papers Benchmarks

Tencent Ethereal Audio Lab

01 Apr 2022

computer-science sound audio-and-speech-processing

ConferencingSpeech 2022 Challenge: Non-intrusive Objective Speech Quality Assessment (NISQA) Challenge for Online Conferencing Applications

Technical University of Berlin Xidian University

Southern University of Science and Technology Indiana University Bloomington Microsoft Corp.Tencent Ethereal Audio Lab

With the advances in speech communication systems such as online conferencing applications, we can seamlessly work with people regardless of where they are. However, during online meetings, speech quality can be significantly affected by background noise, reverberation, packet loss, network jitter, etc. Because of its nature, speech quality is traditionally assessed in subjective tests in laboratories and lately also in crowdsourcing following the international standards from ITU-T Rec. P.800 series. However, those approaches are costly and cannot be applied to customer data. Therefore, an effective objective assessment approach is needed to evaluate or monitor the speech quality of the ongoing conversation. The ConferencingSpeech 2022 challenge targets the non-intrusive deep neural network models for the speech quality assessment task. We open-sourced a training corpus with more than 86K speech clips in different languages, with a wide range of synthesized and live degradations and their corresponding subjective quality scores through crowdsourcing. 18 teams submitted their models for evaluation in this challenge. The blind test sets included about 4300 clips from wide ranges of degradations. This paper describes the challenge, the datasets, and the evaluation methods and reports the final results.

02 Apr 2021

computer-science sound audio-and-speech-processing

INTERSPEECH 2021 ConferencingSpeech Challenge: Towards Far-field Multi-Channel Speech Enhancement for Video Conferencing

Northwestern Polytechnical University

Carnegie Mellon University

Aalborg University

Johns Hopkins University Beijing Shell Shell Technology Co., Ltd.Tencent Ethereal Audio Lab

The ConferencingSpeech 2021 challenge is proposed to stimulate research on far-field multi-channel speech enhancement for video conferencing. The challenge consists of two separate tasks: 1) Task 1 is multi-channel speech enhancement with single microphone array and focusing on practical application with real-time requirement and 2) Task 2 is multi-channel speech enhancement with multiple distributed microphone arrays, which is a non-real-time track and does not have any constraints so that participants could explore any algorithms to obtain high speech quality. Targeting the real video conferencing room application, the challenge database was recorded from real speakers and all recording facilities were located by following the real setup of conferencing room. In this challenge, we open-sourced the list of open source clean speech and noise datasets, simulation scripts, and a baseline system for participants to develop their own system. The final ranking of the challenge will be decided by the subjective evaluation which is performed using Absolute Category Ratings (ACR) to estimate Mean Opinion Score (MOS), speech MOS (S-MOS), and noise MOS (N-MOS). This paper describes the challenge, tasks, datasets, and subjective evaluation. The baseline system which is a complex ratio mask based neural network and its experimental results are also presented.

04 May 2023

computer-science sound audio-and-speech-processing

Hybrid AHS: A Hybrid of Kalman Filter and Deep Learning for Acoustic Howling Suppression

Tencent AI Lab Tencent Ethereal Audio Lab

Deep learning has been recently introduced for efficient acoustic howling suppression (AHS). However, the recurrent nature of howling creates a mismatch between offline training and streaming inference, limiting the quality of enhanced speech. To address this limitation, we propose a hybrid method that combines a Kalman filter with a self-attentive recurrent neural network (SARNN) to leverage their respective advantages for robust AHS. During offline training, a pre-processed signal obtained from the Kalman filter and an ideal microphone signal generated via teacher-forced training strategy are used to train the deep neural network (DNN). During streaming inference, the DNN's parameters are fixed while its output serves as a reference signal for updating the Kalman filter. Evaluation in both offline and streaming inference scenarios using simulated and real-recorded data shows that the proposed method efficiently suppresses howling and consistently outperforms baselines.

There are no more papers matching your filters at the moment.

Events

Personalize Your Feed

Install Browser Extension

We're hiring

alphaXiv

Explore

State of the Art

Sign In

Labs

Feedback

Dark mode

ConferencingSpeech 2022 Challenge: Non-intrusive Objective Speech Quality Assessment (NISQA) Challenge for Online Conferencing Applications

INTERSPEECH 2021 ConferencingSpeech Challenge: Towards Far-field Multi-Channel Speech Enhancement for Video Conferencing

Hybrid AHS: A Hybrid of Kalman Filter and Deep Learning for Acoustic Howling Suppression

Events

AI for Law

Personalize Your Feed