Researchers from Shanghai Jiao Tong University developed ChemDFM-R, a 14B-parameter Large Language Model for chemistry, by integrating functional-group-level "atomized chemical knowledge" and a specialized rationale learning pipeline. This model achieved superior performance on chemical benchmarks and generates interpretable, step-by-step chemical rationales, facilitating human-AI collaboration.
DiSRouter introduces a distributed self-routing framework for Large Language Model (LLM) query selection, where individual LLM agents autonomously decide whether to answer a query or delegate it based on their self-awareness. This framework achieves superior utility, strong generalization, and enhanced modularity compared to centralized routing methods, effectively balancing performance and cost.
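The abstract above describes agents that either answer a query or hand it off based on self-assessed confidence. Below is a minimal Python sketch of such a routing chain; the Agent fields, the confidence threshold, and the fallback behavior are illustrative assumptions rather than DiSRouter's actual interface.

```python
# Hypothetical sketch of distributed self-routing: each agent decides locally
# whether to answer or to delegate the query to the next (larger) model.
# Names, thresholds, and the confidence estimator are illustrative assumptions,
# not the DiSRouter implementation.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    name: str
    cost: float                          # relative inference cost of this model
    answer: Callable[[str], str]         # produces an answer for the query
    confidence: Callable[[str], float]   # self-assessed probability of being correct

def route(query: str, agents: list[Agent], threshold: float = 0.8) -> tuple[str, str, float]:
    """Walk the agent chain from cheapest to most capable.

    Each agent inspects the query and either answers (if its self-assessed
    confidence clears the threshold) or delegates to the next agent.
    """
    total_cost = 0.0
    for agent in agents:
        total_cost += agent.cost
        if agent.confidence(query) >= threshold:
            return agent.name, agent.answer(query), total_cost
    # Fall back to the most capable agent if no one volunteered.
    last = agents[-1]
    return last.name, last.answer(query), total_cost

# Toy usage with two stubbed models of increasing capability and cost.
small = Agent("small-llm", cost=1.0,
              answer=lambda q: "small answer",
              confidence=lambda q: 0.3 if "hard" in q else 0.9)
large = Agent("large-llm", cost=10.0,
              answer=lambda q: "large answer",
              confidence=lambda q: 0.95)
print(route("an easy question", [small, large]))  # answered by small-llm
print(route("a hard question", [small, large]))   # delegated to large-llm
```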
Mass spectrometry (MS) plays a critical role in molecular identification, significantly advancing scientific discovery. However, structure elucidation from MS data remains challenging due to the scarcity of annotated spectra. While large-scale pretraining has proven effective in addressing data scarcity in other domains, applying this paradigm to mass spectrometry is hindered by the complexity and heterogeneity of raw spectral signals. To address this, we propose MS-BART, a unified modeling framework that maps mass spectra and molecular structures into a shared token vocabulary, enabling cross-modal learning through large-scale pretraining on reliably computed fingerprint-molecule datasets. Multi-task pretraining objectives further enhance MS-BART's generalization by jointly optimizing denoising and translation tasks. The pretrained model is subsequently transferred to experimental spectra through finetuning on fingerprint predictions generated with MIST, a pre-trained spectral inference model, thereby enhancing robustness to real-world spectral variability. While finetuning alleviates the distributional difference, MS-BART still suffers from molecular hallucination and requires further alignment. We therefore introduce a chemical feedback mechanism that guides the model toward generating molecules closer to the reference structure. Extensive evaluations demonstrate that MS-BART achieves SOTA performance on 5 of 12 key metrics on MassSpecGym and NPLIB1 and is an order of magnitude faster than competing diffusion-based methods, while comprehensive ablation studies systematically validate the model's effectiveness and robustness.
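A toy sketch of the shared-vocabulary idea mentioned above: fingerprint bits and SMILES characters are placed in one token space so a single BART-style seq2seq model can translate fingerprints into molecules. The token names, vocabulary layout, and character-level SMILES tokenization are assumptions for illustration, not MS-BART's actual scheme.

```python
# Illustrative shared vocabulary: fingerprint-bit tokens and SMILES characters
# are mapped into a single token space for a cross-modal seq2seq model.

def fingerprint_to_tokens(fp_bits: list[int]) -> list[str]:
    """Encode the indices of set fingerprint bits as discrete tokens."""
    return [f"<fp_{i}>" for i, bit in enumerate(fp_bits) if bit]

def smiles_to_tokens(smiles: str) -> list[str]:
    """Character-level SMILES tokenization (real tokenizers merge multi-char atoms)."""
    return list(smiles)

def build_vocab(fp_size: int, smiles_chars: str) -> dict[str, int]:
    """One vocabulary covering both modalities plus special tokens."""
    specials = ["<pad>", "<bos>", "<eos>"]
    fp_tokens = [f"<fp_{i}>" for i in range(fp_size)]
    tokens = list(dict.fromkeys(specials + fp_tokens + list(smiles_chars)))
    return {tok: idx for idx, tok in enumerate(tokens)}

# Example: a toy 8-bit fingerprint as the source sequence, ethanol ("CCO") as the target.
vocab = build_vocab(fp_size=8, smiles_chars="CON()=#123c")
src = ["<bos>"] + fingerprint_to_tokens([1, 0, 0, 1, 0, 1, 0, 0]) + ["<eos>"]
tgt = ["<bos>"] + smiles_to_tokens("CCO") + ["<eos>"]
src_ids = [vocab[t] for t in src]
tgt_ids = [vocab[t] for t in tgt]
print(src_ids, tgt_ids)
```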
Recent advances in automatic speech recognition (ASR) have combined speech encoders with large language models (LLMs) through projection, forming Speech LLMs with strong performance. However, adapting them to new domains remains challenging, especially in low-resource settings where paired speech-text data is scarce. We propose a text-only fine-tuning strategy for Speech LLMs using unpaired target-domain text without requiring additional audio. To preserve speech-text alignment, we introduce a real-time evaluation mechanism during fine-tuning. This enables effective domain adaptation while maintaining source-domain performance. Experiments on LibriSpeech, SlideSpeech, and Medical datasets show that our method achieves competitive recognition performance, with minimal degradation compared to full audio-text fine-tuning. It also improves generalization to new domains without catastrophic forgetting, highlighting the potential of text-only fine-tuning for low-resource domain adaptation of ASR.
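The real-time evaluation mechanism described above can be pictured as a simple safeguard: during text-only updates, a small paired source-domain set is scored periodically, and training stops (or rolls back) once recognition degrades. The monitor class, WER tolerance, and rollback policy below are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal sketch of a real-time alignment safeguard for text-only fine-tuning.
class AlignmentMonitor:
    def __init__(self, tolerance: float = 0.5):
        self.tolerance = tolerance   # allowed absolute WER increase (%) on source-domain audio
        self.baseline_wer = None
        self.best_step = None

    def update(self, step: int, source_domain_wer: float) -> bool:
        """Return True to keep fine-tuning, False to stop and roll back."""
        if self.baseline_wer is None:
            self.baseline_wer = source_domain_wer
            self.best_step = step
            return True
        if source_domain_wer <= self.baseline_wer + self.tolerance:
            self.best_step = step
            return True
        return False  # speech-text alignment has drifted too far

# Usage inside a text-only fine-tuning loop (WER values are placeholders).
monitor = AlignmentMonitor(tolerance=0.5)
for step, wer in enumerate([5.1, 5.2, 5.3, 6.4]):
    if not monitor.update(step, wer):
        print(f"stop at step {step}, keep checkpoint from step {monitor.best_step}")
        break
```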
Contextual speech recognition refers to the ability to bias recognition toward specific content based on contextual information. Recently, leveraging the contextual understanding capabilities of Speech LLMs to achieve contextual biasing by injecting contextual information through prompts has emerged as a research focus. However, direct information injection via prompts relies on the model's internal attention mechanism, making it impossible to explicitly control the extent of information injection. To address this limitation, we propose a joint decoding method to control the contextual information. This approach enables explicit control over the injected contextual information and achieves superior recognition performance. Additionally, our method can also be used for sensitive word suppression. Moreover, experimental results show that even a Speech LLM not pre-trained on long-context data can acquire long-context capabilities through our method.
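Joint decoding with an explicit biasing weight can be sketched as score interpolation at each decode step: the Speech LLM's token log-probability is combined with a contextual score scaled by a weight, and a negative weight suppresses the listed (e.g., sensitive) words. The indicator-style contextual score and the specific weight values below are illustrative assumptions, not the paper's formulation.

```python
import math

def contextual_score(token: str, bias_list: set[str]) -> float:
    """Simple indicator score: 1.0 if the token is in the biasing list."""
    return 1.0 if token in bias_list else 0.0

def joint_decode_step(token_logprobs: dict[str, float],
                      bias_list: set[str],
                      lam: float) -> str:
    """Pick the token maximizing log P_LLM(token) + lam * score_context(token)."""
    return max(token_logprobs,
               key=lambda t: token_logprobs[t] + lam * contextual_score(t, bias_list))

# Example: boosting the rare name "Zhao" (lam > 0) vs. suppressing it (lam < 0).
logprobs = {"Zhao": math.log(0.2), "Joe": math.log(0.6), "show": math.log(0.2)}
print(joint_decode_step(logprobs, {"Zhao"}, lam=1.5))   # -> "Zhao"
print(joint_decode_step(logprobs, {"Zhao"}, lam=-1.5))  # -> "Joe"
```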