Ask or search anything...

History

Events

Watch Recordings

AI for Law01/09 · Joel Niklaus · Hugging Face

Papers Benchmarks

Hot

Qualcomm Korea YH

InfiniPot: Infinite Context Processing on Memory-Constrained LLMs

02 Oct 2024

Minsoo Kim

Hanyang University Qualcomm AI Research

Researchers from Hanyang University and Qualcomm AI Research developed InfiniPot, a framework enabling pre-trained Large Language Models to process arbitrarily long input contexts within strict, fixed memory constraints, without additional training. The method achieved state-of-the-art performance on long-context benchmarks like LongBench and Needle In A Haystack, extending effective context windows by over 30 times while maintaining memory and computational efficiency.

View blog

#computer-science #computation-and-language #machine-learning

Resources

121

Think Straight, Stop Smart: Structured Reasoning for Efficient Multi-Hop RAG

22 Oct 2025

Qualcomm AI Research Qualcomm Korea YH

Qualcomm AI Research introduces Think Straight, Stop Smart (TSSS), a training-free framework designed for efficient multi-hop Retrieval-Augmented Generation (RAG) on on-device Large Language Models (LLMs). This approach achieves state-of-the-art accuracy in multi-hop question answering while substantially reducing inference time and token generation compared to existing RAG methods.

View blog

#agents #chain-of-thought #computer-science

Resources

Orthogonality Constrained Multi-Head Attention For Keyword Spotting

10 Oct 2019

Qualcomm AI Research Qualcomm Korea YH

Multi-head attention mechanism is capable of learning various representations from sequential data while paying attention to different subsequences, e.g., word-pieces or syllables in a spoken word. From the subsequences, it retrieves richer information than a single-head attention which only summarizes the whole sequence into one context vector. However, a naive use of the multi-head attention does not guarantee such richness as the attention heads may have positional and representational redundancy. In this paper, we propose a regularization technique for multi-head attention mechanism in an end-to-end neural keyword spotting system. Augmenting regularization terms which penalize positional and contextual non-orthogonality between the attention heads encourages to output different representations from separate subsequences, which in turn enables leveraging structured information without explicit sequence models such as hidden Markov models. In addition, intra-head contextual non-orthogonality regularization encourages each attention head to have similar representations across keyword examples, which helps classification by reducing feature variability. The experimental results demonstrate that the proposed regularization technique significantly improves the keyword spotting performance for the keyword "Hey Snapdragon".

View blog

#attention-mechanisms #computer-science #machine-learning

Resources

Semantic Token Reweighting for Interpretable and Controllable Text Embeddings in CLIP

16 Oct 2024

Seoul National University Qualcomm AI Research

A text encoder within Vision-Language Models (VLMs) like CLIP plays a crucial role in translating textual input into an embedding space shared with images, thereby facilitating the interpretative analysis of vision tasks through natural language. Despite the varying significance of different textual elements within a sentence depending on the context, efforts to account for variation of importance in constructing text embeddings have been lacking. We propose a framework of Semantic Token Reweighting to build Interpretable text embeddings (SToRI), which incorporates controllability as well. SToRI refines the text encoding process in CLIP by differentially weighting semantic elements based on contextual importance, enabling finer control over emphasis responsive to data-driven insights and user preferences. The efficacy of SToRI is demonstrated through comprehensive experiments on few-shot image classification and image retrieval tailored to user preferences.

View blog

#attention-mechanisms #computer-science #computation-and-language

Resources

Revisiting Architecture-aware Knowledge Distillation: Smaller Models and Faster Search

27 Jun 2022

KAIST Qualcomm Korea YH

Knowledge Distillation (KD) has recently emerged as a popular method for compressing neural networks. In recent studies, generalized distillation methods that find parameters and architectures of student models at the same time have been proposed. Still, this search method requires a lot of computation to search for architectures and has the disadvantage of considering only convolutional blocks in their search space. This paper introduces a new algorithm, coined as Trust Region Aware architecture search to Distill knowledge Effectively (TRADE), that rapidly finds effective student architectures from several state-of-the-art architectures using trust region Bayesian optimization approach. Experimental results show our proposed TRADE algorithm consistently outperforms both the conventional NAS approach and pre-defined architecture under KD training.

View blog

#bayesian-optimization #computer-science #machine-learning

Resources

Knowledge Distillation from Non-streaming to Streaming ASR Encoder using Auxiliary Non-streaming Layer

31 Aug 2023

Qualcomm AI Research Qualcomm Korea YH

Streaming automatic speech recognition (ASR) models are restricted from accessing future context, which results in worse performance compared to the non-streaming models. To improve the performance of streaming ASR, knowledge distillation (KD) from the non-streaming to streaming model has been studied, mainly focusing on aligning the output token probabilities. In this paper, we propose a layer-to-layer KD from the teacher encoder to the student encoder. To ensure that features are extracted using the same context, we insert auxiliary non-streaming branches to the student and perform KD from the non-streaming teacher layer to the non-streaming auxiliary layer. We design a special KD loss that leverages the autoregressive predictive coding (APC) mechanism to encourage the streaming model to predict unseen future contexts. Experimental results show that the proposed method can significantly reduce the word error rate compared to previous token probability distillation methods.

View blog

#computer-science #computation-and-language #audio-and-speech-processing

Resources

Improving Small Footprint Few-shot Keyword Spotting with Supervision on Auxiliary Data

31 Aug 2023

Qualcomm AI Research Qualcomm Korea YH

Few-shot keyword spotting (FS-KWS) models usually require large-scale annotated datasets to generalize to unseen target keywords. However, existing KWS datasets are limited in scale and gathering keyword-like labeled data is costly undertaking. To mitigate this issue, we propose a framework that uses easily collectible, unlabeled reading speech data as an auxiliary source. Self-supervised learning has been widely adopted for learning representations from unlabeled data; however, it is known to be suitable for large models with enough capacity and is not practical for training a small footprint FS-KWS model. Instead, we automatically annotate and filter the data to construct a keyword-like dataset, LibriWord, enabling supervision on auxiliary data. We then adopt multi-task learning that helps the model to enhance the representation power from out-of-domain auxiliary data. Our method notably improves the performance over competitive methods in the FS-KWS benchmark.

View blog

#computer-science #machine-learning #sound

Resources

Dummy Prototypical Networks for Few-Shot Open-Set Keyword Spotting

28 Jun 2022

Seoul National University Qualcomm AI Research

Keyword spotting is the task of detecting a keyword in streaming audio. Conventional keyword spotting targets predefined keywords classification, but there is growing attention in few-shot (query-by-example) keyword spotting, e.g., N-way classification given M-shot support samples. Moreover, in real-world scenarios, there can be utterances from unexpected categories (open-set) which need to be rejected rather than classified as one of the N classes. Combining the two needs, we tackle few-shot open-set keyword spotting with a new benchmark setting, named splitGSC. We propose episode-known dummy prototypes based on metric learning to detect an open-set better and introduce a simple and powerful approach, Dummy Prototypical Networks (D-ProtoNets). Our D-ProtoNets shows clear margins compared to recent few-shot open-set recognition (FSOSR) approaches in the suggested splitGSC. We also verify our method on a standard benchmark, miniImageNet, and D-ProtoNets shows the state-of-the-art open-set detection rate in FSOSR.

View blog

#computer-science #contrastive-learning #machine-learning

Resources

Dynamic Iterative Refinement for Efficient 3D Hand Pose Estimation

11 Nov 2021

Seoul National University Qualcomm AI Research

While hand pose estimation is a critical component of most interactive extended reality and gesture recognition systems, contemporary approaches are not optimized for computational and memory efficiency. In this paper, we propose a tiny deep neural network of which partial layers are recursively exploited for refining its previous estimations. During its iterative refinements, we employ learned gating criteria to decide whether to exit from the weight-sharing loop, allowing per-sample adaptation in our model. Our network is trained to be aware of the uncertainty in its current predictions to efficiently gate at each iteration, estimating variances after each loop for its keypoint estimates. Additionally, we investigate the effectiveness of end-to-end and progressive training protocols for our recursive structure on maximizing the model capacity. With the proposed setting, our method consistently outperforms state-of-the-art 2D/3D hand pose estimation approaches in terms of both accuracy and efficiency for widely used benchmarks.

View blog

#computer-science #computer-vision-and-pattern-recognition #human-ai-interaction

Resources

Domain Generalization on Efficient Acoustic Scene Classification using Residual Normalization

12 Nov 2021

Seoul National University Qualcomm AI Research

It is a practical research topic how to deal with multi-device audio inputs by a single acoustic scene classification system with efficient design. In this work, we propose Residual Normalization, a novel feature normalization method that uses frequency-wise normalization % instance normalization with a shortcut path to discard unnecessary device-specific information without losing useful information for classification. Moreover, we introduce an efficient architecture, BC-ResNet-ASC, a modified version of the baseline architecture with a limited receptive field. BC-ResNet-ASC outperforms the baseline architecture even though it contains the small number of parameters. Through three model compression schemes: pruning, quantization, and knowledge distillation, we can reduce model complexity further while mitigating the performance degradation. The proposed system achieves an average test accuracy of 76.3% in TAU Urban Acoustic Scenes 2020 Mobile, development dataset with 315k parameters, and average test accuracy of 75.3% after compression to 61.0KB of non-zero parameters. The proposed method won the 1st place in DCASE 2021 challenge, TASK1A.

View blog

#computer-science #machine-learning #sound

Resources

There are no more papers matching your filters at the moment.

alphaXiv

Explore

State of the Art

Sign In

Labs

Feedback

Browser Extension

Dark mode

Ask or search anything...

Events