Key Laboratory of Intelligent Information Processing
LLaMA-Omni: Seamless Speech Interaction with Large Language Models

Developed by researchers at ICT/CAS, LLaMA-Omni is an end-to-end model enabling low-latency, high-quality speech interaction with open-source Large Language Models, achieving a response latency of 236ms and strong instruction-following performance while requiring less than 3 days of training on 4 GPUs. It addresses the gap in open-source solutions for simultaneous speech and text generation by employing a non-autoregressive streaming speech decoder and an efficient two-stage training strategy.

LevelRAG: Enhancing Retrieval-Augmented Generation with Multi-hop Logic Planning over Rewriting Augmented Searchers

LevelRAG introduces a hierarchical architecture for Retrieval-Augmented Generation (RAG) systems that decouples high-level retrieval logic from retriever-specific optimizations, enabling flexible multi-hop question answering by combining sparse, dense, and web searchers. It demonstrated strong performance on multi-hop QA datasets, matching larger models while using significantly fewer parameters.
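The decoupling described above can be sketched as a high-level planner that decomposes the question into sub-queries and fans them out to interchangeable low-level searchers. This is a minimal illustrative sketch, not LevelRAG's actual implementation; all function names, the toy corpus, and the hard-coded decomposition are assumptions for demonstration.

```python
def level_rag(question, decompose, searchers, aggregate):
    """High-level searcher: plan sub-queries, fan out to low-level
    searchers (e.g. sparse / dense / web), then aggregate the evidence.
    Retrieval logic lives here; retriever-specific tuning lives inside
    each searcher, so the two can evolve independently."""
    sub_queries = decompose(question)      # multi-hop logic planning
    evidence = []
    for q in sub_queries:
        for search in searchers:           # any mix of searchers plugs in
            evidence.extend(search(q))
    return aggregate(question, evidence)

# Toy components, for illustration only.
corpus = {
    "capital of France": ["Paris is the capital of France."],
    "river in Paris": ["The Seine flows through Paris."],
}

def toy_decompose(question):
    # A real planner would use an LLM; here two hops are hard-coded.
    return ["capital of France", "river in Paris"]

def toy_sparse(q):
    return corpus.get(q, [])

def toy_aggregate(question, evidence):
    return " ".join(evidence)

answer = level_rag("Which river flows through the capital of France?",
                   toy_decompose, [toy_sparse], toy_aggregate)
```

Because the searchers are passed in as plain callables, swapping a dense retriever for a sparse one (or adding a web searcher) requires no change to the multi-hop planning logic.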

Wait-info Policy: Balancing Source and Target at Information Level for Simultaneous Machine Translation
Simultaneous machine translation (SiMT) outputs the translation while receiving the source inputs, and hence needs to balance the received source information against the translated target information to make a reasonable decision between waiting for more inputs and outputting the translation. Previous methods always balance source and target information at the token level, either directly waiting for a fixed number of tokens or adjusting the waiting based on the current token. In this paper, we propose a Wait-info Policy to balance source and target at the information level. We first quantify the amount of information contained in each token, named info. Then during simultaneous translation, the decision of waiting or outputting is made by comparing the total info of the previous target outputs with that of the received source inputs. Experiments show that our method outperforms strong baselines under all latency levels and achieves a better balance via the proposed info.
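The read/write decision described in the abstract can be sketched as a loop comparing cumulative info on the two sides. This is a simplified illustration under assumed inputs: per-token info values and the threshold are made up, and the real policy estimates info with a learned model rather than taking it as given.

```python
def simulate_wait_info(src_info, tgt_info, threshold=1.0):
    """Return the READ/WRITE action sequence of a wait-info-style policy.

    src_info / tgt_info: per-token info amounts (here assumed known).
    WRITE is allowed only when the accumulated source info exceeds the
    info already emitted on the target side by at least `threshold`.
    """
    actions = []
    src_sum = tgt_sum = 0.0
    i = j = 0
    while j < len(tgt_info):
        if i < len(src_info) and src_sum - tgt_sum < threshold:
            src_sum += src_info[i]   # READ: consume one source token
            i += 1
            actions.append("READ")
        else:
            tgt_sum += tgt_info[j]   # WRITE: emit one target token
            j += 1
            actions.append("WRITE")
    return actions
```

With uniform info of 1.0 per token and a threshold of 1.0, the policy degenerates to an alternating read/write schedule (wait-1); uneven info values shift the schedule toward waiting on information-dense source spans, which is the point of moving the balance from the token level to the information level.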
Back Translation for Speech-to-text Translation Without Transcripts
The success of end-to-end speech-to-text translation (ST) is often achieved by utilizing source transcripts, e.g., by pre-training with automatic speech recognition (ASR) and machine translation (MT) tasks, or by introducing additional ASR and MT data. Unfortunately, transcripts are not always available, since numerous unwritten languages exist worldwide. In this paper, we aim to utilize large amounts of target-side monolingual data to enhance ST without transcripts. Motivated by the remarkable success of back translation in MT, we develop a back translation algorithm for ST (BT4ST) to synthesize pseudo ST data from monolingual target data. To ease the challenges posed by short-to-long generation and one-to-many mapping, we introduce self-supervised discrete units and achieve back translation by cascading a target-to-unit model and a unit-to-speech model. With our synthetic ST data, we achieve an average boost of 2.3 BLEU on the MuST-C En-De, En-Fr, and En-Es datasets. Further experiments show that our method is especially effective in low-resource scenarios.
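The cascaded back-translation pipeline above can be sketched as a two-stage mapping from target text to pseudo source speech. This is an illustrative sketch only: the toy target-to-unit and unit-to-speech models below are stand-ins (the real ones are trained neural models over self-supervised discrete units), and all names are assumptions.

```python
def bt4st_synthesize(mono_targets, target_to_unit, unit_to_speech):
    """Synthesize pseudo ST pairs from target-side monolingual text.

    Cascade: target text -> discrete units -> source speech.  Using
    discrete units as the intermediate eases the short-to-long text->
    speech generation and the one-to-many mapping of the reverse task.
    Returns (speech, target_text) pairs for training the forward ST model.
    """
    pairs = []
    for tgt_text in mono_targets:
        units = target_to_unit(tgt_text)   # stage 1: target-to-unit model
        speech = unit_to_speech(units)     # stage 2: unit-to-speech model
        pairs.append((speech, tgt_text))
    return pairs

# Toy stand-ins for the two trained models, for illustration only.
def toy_target_to_unit(text):
    return [ord(c) % 100 for c in text]        # fake discrete unit IDs

def toy_unit_to_speech(units):
    return [float(u) for u in units]           # fake waveform samples

pairs = bt4st_synthesize(["Hallo"], toy_target_to_unit, toy_unit_to_speech)
```

The synthesized pairs are then mixed with any available real ST data to train the forward speech-to-text model, mirroring how back translation augments parallel text in MT.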