AppTek GmbH
The VietMed-NER dataset, the first publicly available medical spoken Named Entity Recognition (NER) resource, was developed from real-world Vietnamese medical conversations and features 18 distinct entity types. This work established baseline performance using state-of-the-art models, with XLM-R_large achieving an F1 score of 0.58 on ASR output, and provided detailed error analysis specific to medical entity extraction.
Multi-speaker automatic speech recognition (ASR) is crucial for many real-world applications, but it requires dedicated modeling techniques. Existing approaches can be divided into modular and end-to-end methods. Modular approaches separate speakers and recognize each of them with a single-speaker ASR system. End-to-end models process overlapped speech directly in a single, powerful neural network. This work proposes a middle-ground approach that leverages explicit speech separation similarly to the modular approach but also incorporates mixture speech information directly into the ASR module in order to mitigate the propagation of errors made by the speech separator. We also explore a way to exchange cross-speaker context information through a layer that combines information of the individual speakers. Our system is optimized through separate and joint training stages and achieves a relative improvement of 7% in word error rate over a purely modular setup on the SMS-WSJ task.
As one of the most popular sequence-to-sequence modeling approaches for speech recognition, the RNN-Transducer has achieved steadily improving performance with increasingly sophisticated neural network models of growing size and longer training. While strong computation resources seem to be a prerequisite for training superior models, we try to overcome this by carefully designing a more efficient training pipeline. In this work, we propose an efficient 3-stage progressive training pipeline to build highly-performing neural transducer models from scratch with very limited computation resources in a reasonably short time period. The effectiveness of each stage is experimentally verified on both Librispeech and Switchboard corpora. The proposed pipeline is able to train transducer models approaching state-of-the-art performance with a single GPU in just 2-3 weeks. Our best conformer transducer achieves 4.1% WER on Librispeech test-other with only 35 epochs of training.
In this work, we compare from-scratch sequence-level cross-entropy (full-sum) training of Hidden Markov Model (HMM) and Connectionist Temporal Classification (CTC) topologies for automatic speech recognition (ASR). Besides accuracy, we further analyze their capability for generating high-quality time alignment between the speech signal and the transcription, which can be crucial for many subsequent applications. Moreover, we propose several methods to improve convergence of from-scratch full-sum training by addressing the alignment modeling issue. Systematic comparison is conducted on both Switchboard and LibriSpeech corpora across CTC, posterior HMM with and without transition probabilities, and standard hybrid HMM. We also provide a detailed analysis of both Viterbi forced-alignment and Baum-Welch full-sum occupation probabilities.
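As a concrete reference point for the full-sum criterion discussed above, the following minimal PyTorch sketch trains a CTC topology by summing over all alignments between encoder frames and the target label sequence. All shapes and sizes are illustrative; the HMM variants with transition probabilities would require a custom forward-backward implementation instead of the built-in loss.

```python
import torch

# Illustrative sizes: frames, batch, output labels (incl. blank), target length.
T, N, C, S = 100, 4, 50, 20
logits = torch.randn(T, N, C, requires_grad=True)        # stand-in for encoder outputs
log_probs = logits.log_softmax(-1)
targets = torch.randint(1, C, (N, S))                     # label 0 is reserved for blank

loss = torch.nn.CTCLoss(blank=0, zero_infinity=True)(
    log_probs, targets,
    input_lengths=torch.full((N,), T),
    target_lengths=torch.full((N,), S),
)
loss.backward()   # the gradient w.r.t. the logits reflects soft (Baum-Welch-style) occupation statistics
```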
Neural front-ends are an appealing alternative to traditional, fixed feature extraction pipelines for automatic speech recognition (ASR) systems since they can be directly trained to fit the acoustic model. However, their performance often falls short compared to classical methods, which we show is largely due to their increased susceptibility to overfitting. This work therefore investigates regularization methods for training ASR models with learnable feature extraction front-ends. First, we examine audio perturbation methods and show that larger relative improvements can be obtained for learnable features. Additionally, we identify two limitations in the standard use of SpecAugment for these front-ends and propose masking in the short time Fourier transform (STFT)-domain as a simple but effective modification to address these challenges. Finally, integrating both regularization approaches effectively closes the performance gap between traditional and learnable features.
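A minimal sketch of the proposed STFT-domain masking is given below: the waveform is transformed, SpecAugment-style time and frequency blocks are zeroed in the complex spectrogram, and the result is transformed back so that a learnable raw-waveform front-end still receives a waveform. Mask counts and widths are illustrative assumptions, not the paper's configuration.

```python
import torch

def stft_specaugment(waveform, n_fft=400, hop=160,
                     n_time_masks=2, max_time=30, n_freq_masks=2, max_freq=20):
    """SpecAugment-style masking applied in the STFT domain (illustrative settings)."""
    window = torch.hann_window(n_fft)
    spec = torch.stft(waveform, n_fft=n_fft, hop_length=hop,
                      window=window, return_complex=True)      # (..., freq, frames)
    freq_bins, frames = spec.shape[-2], spec.shape[-1]
    for _ in range(n_time_masks):                               # zero out blocks of frames
        width = int(torch.randint(0, max_time + 1, ()).item())
        start = int(torch.randint(0, max(1, frames - width), ()).item())
        spec[..., start:start + width] = 0
    for _ in range(n_freq_masks):                               # zero out blocks of frequency bins
        width = int(torch.randint(0, max_freq + 1, ()).item())
        start = int(torch.randint(0, max(1, freq_bins - width), ()).item())
        spec[..., start:start + width, :] = 0
    # back to a waveform so that a learnable raw-waveform front-end can consume it
    return torch.istft(spec, n_fft=n_fft, hop_length=hop,
                       window=window, length=waveform.shape[-1])
```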
With the rise of large pre-trained foundation models for automatic speech recognition, new challenges appear. While the performance of these models is good, runtime and cost of inference increase. One approach to make use of their strength while retaining efficiency is to distill their knowledge to smaller models during training. In this work, we explore different CTC-based distillation variants, focusing on blank token handling. We show that common approaches like blank elimination do not always work off the shelf. We explore new blank selection patterns as a potential sweet spot between standard knowledge distillation and blank elimination mechanisms. Through the introduction of a symmetric selection method, we are able to remove the CTC loss during knowledge distillation with minimal to no performance degradation. With this, we make the training independent from target labels, potentially allowing for distillation on untranscribed audio data.
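The sketch below illustrates one ingredient discussed above: frame-level KL distillation between teacher and student CTC distributions with a blank-based frame selection. The threshold and the selection pattern are illustrative assumptions, not the symmetric selection method itself.

```python
import torch
import torch.nn.functional as F

def ctc_kd_loss(student_logits, teacher_logits, blank=0, blank_threshold=0.95):
    """Frame-level KL distillation between CTC output distributions, keeping
    only frames where the teacher is not (almost) certain of blank."""
    t_log_probs = teacher_logits.log_softmax(-1)                      # (B, T, V)
    s_log_probs = student_logits.log_softmax(-1)
    keep = (t_log_probs[..., blank].exp() < blank_threshold).float()  # (B, T) frames to keep
    kl = F.kl_div(s_log_probs, t_log_probs,
                  log_target=True, reduction="none").sum(-1)          # (B, T)
    return (kl * keep).sum() / keep.sum().clamp(min=1.0)
```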
Researchers from RWTH Aachen University and AppTek GmbH challenge the long-standing assumption that CTC-based ASR models operate under strict label context independence. Knowledge distillation experiments reveal that modern CTC models with powerful Conformer encoders implicitly learn context-dependent internal language models (ILMs) despite their theoretical formulation. The work introduces label-level knowledge distillation methods with smoothing and masking regularization, in which a single-layer LSTM student learns to approximate the CTC teacher's internal linguistic biases, achieving over 13% relative WER improvement compared to shallow fusion on cross-domain TED-LIUM evaluation. Context-dependent ILM estimation consistently outperforms traditional frame-level priors and context-independent approximations. Surprisingly, and unlike for external language models, ILM perplexity shows no correlation with final ASR performance, necessitating task-specific evaluation metrics for internal language model quality.
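To make the ILM idea concrete, a hedged sketch follows: a single-layer LSTM label-context model of the kind used as a student, together with the usual shallow-fusion score with ILM subtraction. Layer sizes, scales, and interfaces are assumptions for illustration, not the paper's exact setup.

```python
import torch

class LSTMStudentILM(torch.nn.Module):
    """Single-layer LSTM label-context model, a sketch of the kind of student
    used to approximate the CTC teacher's internal LM (sizes are illustrative)."""
    def __init__(self, vocab_size, dim=512):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab_size, dim)
        self.lstm = torch.nn.LSTM(dim, dim, num_layers=1, batch_first=True)
        self.out = torch.nn.Linear(dim, vocab_size)

    def forward(self, labels):                       # (B, S) label histories
        hidden, _ = self.lstm(self.embed(labels))
        return self.out(hidden).log_softmax(-1)      # (B, S, V) next-label log-probs

def ilm_corrected_score(log_p_asr, log_p_ext_lm, log_p_ilm, lm_scale=0.6, ilm_scale=0.4):
    # Shallow fusion with ILM subtraction; the scales are illustrative, not tuned values.
    return log_p_asr + lm_scale * log_p_ext_lm - ilm_scale * log_p_ilm
```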
We investigate a novel modeling approach for end-to-end neural network training using hidden Markov models (HMM) where the transition probabilities between hidden states are modeled and learned explicitly. Most contemporary sequence-to-sequence models allow for from-scratch training by summing over all possible label segmentations in a given topology. In our approach there are explicit, learnable probabilities for transitions between segments as opposed to a blank label that implicitly encodes duration statistics. We implement a GPU-based forward-backward algorithm that enables the simultaneous training of label and transition probabilities. We investigate recognition results as well as Viterbi alignments of our models. We find that while the transition model training does not improve recognition performance, it has a positive impact on the alignment quality. The generated alignments are shown to be viable targets in state-of-the-art Viterbi trainings.
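A small CPU/autograd sketch of the criterion described above is shown below: a left-to-right HMM over the target segments with explicit, learnable loop/forward transition probabilities, where the full sum over segmentations is computed by a forward recursion. The tensor layout and parameterization are illustrative; the paper uses a dedicated GPU forward-backward implementation.

```python
import torch

def hmm_full_sum_loss(label_log_probs, loop_logits, forward_logits):
    """Negative log-likelihood of a left-to-right HMM with explicit, learnable
    transition probabilities, summed over all segmentations (full-sum).

    label_log_probs: (T, S) log p(segment s emits frame t), gathered from the
                     acoustic model output for the target label sequence.
    loop_logits, forward_logits: (S,) unnormalized scores; a softmax over the
                     two options yields per-segment loop/forward probabilities.
    Autograd provides the gradients for both label and transition parameters.
    """
    T, S = label_log_probs.shape
    trans = torch.stack([loop_logits, forward_logits], dim=-1).log_softmax(-1)
    log_loop, log_forward = trans[:, 0], trans[:, 1]
    log_zero = torch.full((S,), -1.0e30)                  # acts as log(0)
    alpha = log_zero.clone()
    alpha[0] = label_log_probs[0, 0]                      # must start in the first segment
    for t in range(1, T):
        stay = alpha + log_loop                           # remain in the same segment
        move = torch.cat([log_zero[:1], alpha[:-1] + log_forward[:-1]])  # advance by one segment
        alpha = torch.logaddexp(stay, move) + label_log_probs[t]
    return -alpha[-1]                                     # must end in the last segment
```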
Meeting transcription is a field of high relevance and remarkable progress in recent years. Still, challenges remain that limit its performance. In this work, we extend a previously proposed framework for analyzing leakage in speech separation with proper sensitivity to temporal locality. We show that there is significant leakage to the cross channel in areas where only the primary speaker is active. At the same time, the results demonstrate that this does not affect the final performance much as these leaked parts are largely ignored by the voice activity detection (VAD). Furthermore, different segmentations are compared showing that advanced diarization approaches are able to reduce the gap to oracle segmentation by a third compared to a simple energy-based VAD. We additionally reveal what factors contribute to the remaining difference. The results represent state-of-the-art performance on LibriCSS among systems that train the recognition module on LibriSpeech data only.
We study a streamable attention-based encoder-decoder model in which either the decoder, or both the encoder and decoder, operate on pre-defined, fixed-size windows called chunks. A special end-of-chunk (EOC) symbol advances from one chunk to the next chunk, effectively replacing the conventional end-of-sequence symbol. This modification, while minor, situates our model as equivalent to a transducer model that operates on chunks instead of frames, where EOC corresponds to the blank symbol. We further explore the remaining differences between a standard transducer and our model. Additionally, we examine relevant aspects such as long-form speech generalization, beam size, and length normalization. Through experiments on Librispeech and TED-LIUM-v2, and by concatenating consecutive sequences for long-form trials, we find that our streamable model maintains competitive performance compared to the non-streamable variant and generalizes very well to long-form speech.
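The control flow of chunk-wise decoding with an EOC symbol can be sketched as follows; `decoder_step` is an assumed callable standing in for the attention decoder, and greedy search is used only to keep the example short.

```python
def chunked_greedy_decode(encoder_chunks, decoder_step, eoc_id, max_labels_per_chunk=50):
    """Greedy label-synchronous decoding over fixed-size encoder chunks with an
    end-of-chunk (EOC) symbol; `decoder_step(chunk, prefix)` is an assumed
    callable returning log-probabilities over the output vocabulary."""
    prefix = []                                           # running label history across chunks
    for chunk in encoder_chunks:
        for _ in range(max_labels_per_chunk):             # safety bound per chunk
            log_probs = decoder_step(chunk, prefix)
            label = int(log_probs.argmax())
            if label == eoc_id:                           # EOC advances to the next chunk,
                break                                     # replacing the end-of-sequence symbol
            prefix.append(label)
    return prefix
```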
To join the advantages of classical and end-to-end approaches for speech recognition, we present a simple, novel and competitive approach for phoneme-based neural transducer modeling. Different alignment label topologies are compared and word-end-based phoneme label augmentation is proposed to improve performance. Utilizing the local dependency of phonemes, we adopt a simplified neural network structure and a straightforward integration with the external word-level language model to preserve the consistency of seq-to-seq modeling. We also present a simple, stable and efficient training procedure using frame-wise cross-entropy loss. A phonetic context size of one is shown to be sufficient for the best performance. A simplified scheduled sampling approach is applied for further improvement and different decoding approaches are briefly compared. The overall performance of our best model is comparable to state-of-the-art (SOTA) results for the TED-LIUM Release 2 and Switchboard corpora.
While generative methods have progressed rapidly in recent years, generating expressive prosody for an utterance remains a challenging task in text-to-speech synthesis. This is particularly true for systems that model prosody explicitly through parameters such as pitch, energy, and duration, which is commonly done for the sake of interpretability and controllability. In this work, we investigate the effectiveness of stochastic methods for this task, including Normalizing Flows, Conditional Flow Matching, and Rectified Flows. We compare these methods to a traditional deterministic baseline, as well as to real human realizations. Our extensive subjective and objective evaluations demonstrate that stochastic methods produce natural prosody on par with human speakers by capturing the variability inherent in human speech. Further, they open up additional controllability options by allowing the sampling temperature to be tuned.
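As an illustration of one of the stochastic methods compared above, the following sketch shows the Conditional Flow Matching training objective with a straight-line probability path, applied to prosody parameters; the model interface and conditioning are assumptions for the example.

```python
import torch

def conditional_flow_matching_loss(model, x1, cond):
    """Conditional Flow Matching objective with a straight-line path, sketched
    for a prosody predictor: x1 holds target prosody parameters (e.g. pitch,
    energy, duration) and cond the text conditioning. `model(x_t, t, cond)`
    predicting the velocity field is an assumed interface."""
    x0 = torch.randn_like(x1)                              # noise sample
    t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)))   # one time point per example
    x_t = (1.0 - t) * x0 + t * x1                          # point on the straight path
    target_velocity = x1 - x0                              # constant velocity along that path
    return ((model(x_t, t, cond) - target_velocity) ** 2).mean()
```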
We present a complete training pipeline to build a state-of-the-art hybrid HMM-based ASR system on the 2nd release of the TED-LIUM corpus. Data augmentation using SpecAugment is successfully applied to improve performance on top of our best SAT model using i-vectors. By investigating the effect of different maskings, we achieve improvements from SpecAugment on hybrid HMM models without increasing model size and training time. A subsequent sMBR training is applied to fine-tune the final acoustic model, and both LSTM and Transformer language models are trained and evaluated. Our best system achieves a 5.6% WER on the test set, which outperforms the previous state-of-the-art by 27% relative.
We sometimes observe monotonically decreasing cross-attention weights in our Conformer-based global attention-based encoder-decoder (AED) models. Further investigation shows that the Conformer encoder reverses the sequence in the time dimension. We analyze the initial behavior of the decoder cross-attention mechanism and find that it encourages the Conformer encoder self-attention to build a connection between the initial frames and all other informative frames. Furthermore, we show that, at some point in training, the self-attention module of the Conformer starts dominating the output over the preceding feed-forward module, which then only allows the reversed information to pass through. We propose methods and ideas of how this flipping can be avoided and investigate a novel method to obtain label-frame-position alignments by using the gradients of the label log probabilities w.r.t. the encoder input frames.
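The gradient-based alignment idea can be sketched as follows: for each output label, the gradient of its log probability with respect to the encoder input frames is reduced to a per-frame saliency, and the most influential frame is taken as its position. The interface of `log_probs_fn` and the saliency reduction (L1 over the feature dimension, argmax over time) are illustrative assumptions.

```python
import torch

def gradient_label_alignment(log_probs_fn, features, labels):
    """Derive a label-to-frame alignment from the gradients of the label log
    probabilities w.r.t. the encoder input frames. `log_probs_fn(features,
    labels)` is assumed to return one log probability per output label."""
    features = features.clone().requires_grad_(True)       # (T_in, F)
    label_log_probs = log_probs_fn(features, labels)        # (S,)
    alignment = []
    for s in range(label_log_probs.shape[0]):
        grad, = torch.autograd.grad(label_log_probs[s], features, retain_graph=True)
        alignment.append(int(grad.abs().sum(dim=-1).argmax()))   # most influential input frame
    return alignment
```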
This paper summarizes our work on the first track of the ninth Dialog System Technology Challenge (DSTC 9), "Beyond Domain APIs: Task-oriented Conversational Modeling with Unstructured Knowledge Access". The goal of the task is to generate responses to user turns in a task-oriented dialog that require knowledge from unstructured documents. The task is divided into three subtasks: detection, selection and generation. In order to be compute-efficient, we formulate the selection problem in terms of hierarchical classification steps. We achieve our best results with this model. Alternatively, we employ Siamese sequence embedding models, referred to as Dense Knowledge Retrieval, to retrieve relevant documents. This method further reduces the computation time by a factor of more than 100x at the cost of a 5-6% degradation in R@1 compared to the first model. Then, for either approach, we use Retrieval Augmented Generation to generate responses based on multiple selected snippets, and we show how the method can be used to fine-tune trained embeddings.
ASR systems are deployed across diverse environments, each with specific hardware constraints. We use supernet training to jointly train multiple encoders of varying sizes, enabling dynamic model size adjustment to fit hardware constraints without redundant training. Moreover, we introduce a novel method called OrthoSoftmax, which applies multiple orthogonal softmax functions to efficiently identify optimal subnets within the supernet, avoiding resource-intensive search. This approach also enables more flexible and precise subnet selection by allowing selection based on various criteria and levels of granularity. Our results with CTC on Librispeech and TED-LIUM-v2 show that FLOPs-aware component-wise selection achieves the best overall performance. With the same number of training updates from one single job, WERs for all model sizes are comparable to or slightly better than those of individually trained models. Furthermore, we analyze patterns in the selected components and reveal interesting insights.
Automatic speech recognition (ASR) systems typically use handcrafted feature extraction pipelines. To avoid their inherent information loss and to achieve more consistent modeling from speech to transcribed text, neural raw waveform feature extractors (FEs) are an appealing approach. The wav2vec 2.0 model, which has recently gained great popularity, also uses a convolutional FE operating directly on the speech waveform. However, such FEs have not yet been studied extensively in the literature. In this work, we study the wav2vec 2.0 FE's capability to replace the standard feature extraction methods in a connectionist temporal classification (CTC) ASR model and compare it to an alternative neural FE. We show that both are competitive with traditional FEs on the LibriSpeech benchmark and analyze the effect of the individual components. Furthermore, we analyze the learned filters and show that the most important information for the ASR system is obtained by a set of bandpass filters.
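For reference, a minimal wav2vec 2.0-style convolutional feature extractor on the raw waveform looks roughly like the sketch below; the layer widths, kernel sizes, and strides are illustrative rather than the exact configuration studied here.

```python
import torch

def make_conv_frontend():
    """A small stack of strided 1D convolutions over the raw waveform
    (illustrative layer configuration)."""
    layers, in_ch = [], 1
    for out_ch, kernel, stride in [(512, 10, 5), (512, 3, 2), (512, 3, 2), (512, 2, 2)]:
        layers += [torch.nn.Conv1d(in_ch, out_ch, kernel, stride=stride), torch.nn.GELU()]
        in_ch = out_ch
    return torch.nn.Sequential(*layers)

waveform = torch.randn(1, 1, 16000)          # (batch, channel, samples): 1 s at 16 kHz
features = make_conv_frontend()(waveform)    # (batch, 512, frames)
```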
The peaky behavior of CTC models is well known experimentally. However, an understanding of why peaky behavior occurs is missing, as is an answer to whether it is a desirable property. We provide a formal analysis of the peaky behavior and gradient descent convergence properties of the CTC loss and related training criteria. Our analysis provides a deep understanding of why peaky behavior occurs and when it is suboptimal. On a simple example which should be trivial to learn for any model, we prove that a feed-forward neural network trained with CTC from uniform initialization converges towards peaky behavior with a 100% error rate. Our analysis further explains why CTC only works well together with the blank label. We further demonstrate that peaky behavior does not occur with other related losses, including a label prior model, and that this improves convergence.
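One common way to add a label prior to CTC training, in the spirit of the prior-based criterion discussed above, is sketched below: the frame-averaged label posterior serves as a prior that is subtracted (scaled) from the log posteriors before the loss. The scale and the renormalization are illustrative choices, not the exact criterion analyzed in the paper.

```python
import torch
import torch.nn.functional as F

def ctc_loss_with_label_prior(logits, targets, input_lengths, target_lengths,
                              prior_scale=0.3, blank=0):
    """CTC loss with a frame-averaged label prior subtracted from the log
    posteriors (illustrative variant of a label prior model).
    logits: (T, N, C) unnormalized acoustic scores."""
    log_probs = logits.log_softmax(-1)
    prior = log_probs.exp().mean(dim=(0, 1)).clamp_min(1e-8).log()    # (C,) averaged posterior
    adjusted = (log_probs - prior_scale * prior.detach()).log_softmax(-1)
    return F.ctc_loss(adjusted, targets, input_lengths, target_lengths,
                      blank=blank, zero_infinity=True)
```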
This paper summarizes our contributions to the document-grounded dialog tasks at the 9th and 10th Dialog System Technology Challenges (DSTC9 and DSTC10). In both iterations the task consists of three subtasks: first detect whether the current turn is knowledge seeking, second select a relevant knowledge document, and third generate a response grounded on the selected document. For DSTC9 we proposed different approaches to make the selection task more efficient. The best method, Hierarchical Selection, actually improves the results compared to the original baseline and gives a speedup of 24x. In the DSTC10 iteration of the task, the challenge was to adapt systems trained on written dialogs to perform well on noisy automatic speech recognition transcripts. Therefore, we proposed data augmentation techniques to increase the robustness of the models as well as methods to adapt the style of generated responses to fit well into the preceding dialog. Additionally, we proposed a noisy channel model that allows for increasing the factuality of the generated responses. In addition to summarizing our previous contributions, in this work we also report on a few small improvements and reconsider the automatic evaluation metrics for the generation task, which have shown a low correlation with human judgments.
In hybrid HMM based speech recognition, LSTM language models have been widely applied and achieved large improvements. Their theoretical capability of modeling unlimited context suggests that no recombination should be applied in decoding. This motivates reconsidering full summation over the HMM-state sequences instead of the Viterbi approximation in decoding. We explore the potential gain from more accurate probabilities in terms of decision making and apply full-sum decoding with a modified prefix-tree search framework. The proposed full-sum decoding is evaluated on both Switchboard and Librispeech corpora. Models trained with both CE and sMBR criteria are used. Additionally, both MAP and confusion network decoding, as approximated variants of the general Bayes decision rule, are evaluated. Consistent improvements over strong baselines are achieved in almost all cases without extra cost. We also discuss tuning effort, efficiency and some limitations of full-sum decoding.
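The difference between Viterbi and full-sum decoding can be reduced to how a hypothesis is scored over the HMM-state sequences compatible with it, as in the following sketch (the enumeration of state sequences is assumed to be given; in practice the sum is computed inside the prefix-tree search).

```python
import torch

def hypothesis_score(state_sequence_log_scores, full_sum=True):
    """Score of one hypothesis given the log scores of all HMM-state sequences
    compatible with it (one row per state sequence). Viterbi keeps only the
    best sequence, while full-sum decoding sums the probabilities of all of them."""
    if full_sum:
        return torch.logsumexp(state_sequence_log_scores, dim=0)
    return state_sequence_log_scores.max(dim=0).values      # Viterbi approximation
```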