alphaXiv

History

Papers Benchmarks

Tencent Jarvis Lab

379

23 May 2025

chain-of-thought computer-science artificial-intelligence

Think or Not? Exploring Thinking Efficiency in Large Reasoning Models via an Information-Theoretic Lens

Westlake University

Renmin University of China Tencent Jarvis Lab

Renmin University of China and Tencent Jarvis Lab researchers apply information theory to analyze reasoning efficiency in Large Reasoning Models, introducing InfoBias and InfoGain metrics to quantify semantic alignment and information contribution of reasoning steps, discovering that longer reasoning chains accumulate deviation from correct paths while initial model confidence serves as a strong predictor of eventual accuracy, leading to their Adaptive Think strategy that dynamically halts reasoning based on entropy thresholds to achieve 58.78% average token reduction while preserving accuracy across mathematical, factual, logical, and commonsense reasoning benchmarks.

10 Oct 2025

agents computer-science artificial-intelligence

Spatiotemporal Forecasting as Planning: A Model-Based Reinforcement Learning Approach with Generative World Models

CUHK

Tsinghua University

OpenAI University of Wisconsin

Nanyang Technological University Tencent Hunyuan Tencent Jarvis Lab SLAI

To address the dual challenges of inherent stochasticity and non-differentiable metrics in physical spatiotemporal forecasting, we propose Spatiotemporal Forecasting as Planning (SFP), a new paradigm grounded in Model-Based Reinforcement Learning. SFP constructs a novel Generative World Model to simulate diverse, high-fidelity future states, enabling an "imagination-based" environmental simulation. Within this framework, a base forecasting model acts as an agent, guided by a beam search-based planning algorithm that leverages non-differentiable domain metrics as reward signals to explore high-return future sequences. These identified high-reward candidates then serve as pseudo-labels to continuously optimize the agent's policy through iterative self-training, significantly reducing prediction error and demonstrating exceptional performance on critical domain metrics like capturing extreme events.

20 Jun 2025

ai-for-health computer-science computer-vision-and-pattern-recognition

IQE-CLIP: Instance-aware Query Embedding for Zero-/Few-shot Anomaly Detection in Medical Domain

University of Notre Dame

Westlake University

Shandong University Simon Fraser University Tencent Jarvis Lab

Recently, the rapid advancements of vision-language models, such as CLIP, leads to significant progress in zero-/few-shot anomaly detection (ZFSAD) tasks. However, most existing CLIP-based ZFSAD methods commonly assume prior knowledge of categories and rely on carefully crafted prompts tailored to specific scenarios. While such meticulously designed text prompts effectively capture semantic information in the textual space, they fall short of distinguishing normal and anomalous instances within the joint embedding space. Moreover, these ZFSAD methods are predominantly explored in industrial scenarios, with few efforts conducted to medical tasks. To this end, we propose an innovative framework for ZFSAD tasks in medical domain, denoted as IQE-CLIP. We reveal that query embeddings, which incorporate both textual and instance-aware visual information, are better indicators for abnormalities. Specifically, we first introduce class-based prompting tokens and learnable prompting tokens for better adaptation of CLIP to the medical domain. Then, we design an instance-aware query module (IQM) to extract region-level contextual information from both text prompts and visual features, enabling the generation of query embeddings that are more sensitive to anomalies. Extensive experiments conducted on six medical datasets demonstrate that IQE-CLIP achieves state-of-the-art performance on both zero-shot and few-shot tasks. We release our code and data at this https URL.

01 Oct 2025

computer-science artificial-intelligence computation-and-language

Copy-Paste to Mitigate Large Language Model Hallucinations

Peking University Tianjin University of Technology Tencent Jarvis Lab

Researchers developed CopyPasteLLM, a two-stage framework that trains Large Language Models to reduce contextual faithfulness hallucinations in Retrieval-Augmented Generation by internalizing a preference for directly copying content from provided contexts. The method achieves 92.8% accuracy on the FaithEval counterfactual dataset, outperforming baselines, and requires only 365 training samples, demonstrating high data efficiency.

12 Sep 2025

agents ai-for-health computer-science

Reframe Your Life Story: Interactive Narrative Therapist and Innovative Moment Assessment with Large Language Models

Tsinghua University Central South University

Beijing Jiaotong University Singapore University of Technology and Design Tencent Jarvis Lab Lingxin AI

Recent progress in large language models (LLMs) has opened new possibilities for mental health support, yet current approaches lack realism in simulating specialized psychotherapy and fail to capture therapeutic progression over time. Narrative therapy, which helps individuals transform problematic life stories into empowering alternatives, remains underutilized due to limited access and social stigma. We address these limitations through a comprehensive framework with two core components. First, INT (Interactive Narrative Therapist) simulates expert narrative therapists by planning therapeutic stages, guiding reflection levels, and generating contextually appropriate expert-like responses. Second, IMA (Innovative Moment Assessment) provides a therapy-centric evaluation method that quantifies effectiveness by tracking "Innovative Moments" (IMs), critical narrative shifts in client speech signaling therapy progress. Experimental results on 260 simulated clients and 230 human participants reveal that INT consistently outperforms standard LLMs in therapeutic quality and depth. We further demonstrate the effectiveness of INT in synthesizing high-quality support conversations to facilitate social applications.

21 Oct 2025

computer-science computer-vision-and-pattern-recognition ensemble-methods

Beyond Single Models: Mitigating Multimodal Hallucinations via Adaptive Token Ensemble Decoding

Westlake University

Renmin University of China

McGill University Tencent Jarvis Lab

Large Vision-Language Models (LVLMs) have recently achieved impressive results in multimodal tasks such as image captioning and visual question answering. However, they remain prone to object hallucination -- generating descriptions of nonexistent or misidentified objects. Prior work has partially mitigated this via auxiliary training objectives or external modules, but challenges remain in terms of scalability, adaptability, and model independence. To address these limitations, we propose Adaptive Token Ensemble Decoding (ATED), a training-free, token-level ensemble framework that mitigates hallucination by aggregating predictions from multiple LVLMs during inference. ATED dynamically computes uncertainty-based weights for each model, reflecting their reliability at each decoding step. It also integrates diverse decoding paths to improve contextual grounding and semantic consistency. Experiments on standard hallucination detection benchmarks demonstrate that ATED significantly outperforms state-of-the-art methods, reducing hallucination without compromising fluency or relevance. Our findings highlight the benefits of adaptive ensembling and point to a promising direction for improving LVLM robustness in high-stakes applications. The code is available at this https URL.

14 Jun 2025

computer-science artificial-intelligence computation-and-language

Training-free LLM Merging for Multi-task Learning

City University of Hong Kong

Westlake University University of Technology Sydney Jilin University University of Queensland Tencent Jarvis Lab

Large Language Models (LLMs) have demonstrated exceptional capabilities across diverse natural language processing (NLP) tasks. The release of open-source LLMs like LLaMA and Qwen has triggered the development of numerous fine-tuned models tailored for various tasks and languages. In this paper, we explore an important question: is it possible to combine these specialized models to create a unified model with multi-task capabilities. We introduces Hierarchical Iterative Merging (Hi-Merging), a training-free method for unifying different specialized LLMs into a single model. Specifically, Hi-Merging employs model-wise and layer-wise pruning and scaling, guided by contribution analysis, to mitigate parameter conflicts. Extensive experiments on multiple-choice and question-answering tasks in both Chinese and English validate Hi-Merging's ability for multi-task learning. The results demonstrate that Hi-Merging consistently outperforms existing merging techniques and surpasses the performance of models fine-tuned on combined datasets in most scenarios. Code is available at: this https URL

103

14 Jun 2025

computer-science continual-learning artificial-intelligence

Model Merging for Knowledge Editing

City University of Hong Kong Tencent Jarvis Lab

Researchers from City University of Hong Kong and Tencent Jarvis Lab propose a two-stage framework for knowledge editing in LLMs, combining Robust Supervised Fine-Tuning with model merging. This approach successfully addresses the challenge of sequential knowledge updates while preserving the model's general capabilities, outperforming existing methods without requiring architectural changes.

11 Sep 2021

image-and-video-processing electrical-engineering

InDuDoNet: An Interpretable Dual Domain Network for CT Metal Artifact Reduction

Macau University of Science and Technology Xian Jiaotong University Beijing Information Science and Technology University Tencent Jarvis Lab

For the task of metal artifact reduction (MAR), although deep learning (DL)-based methods have achieved promising performances, most of them suffer from two problems: 1) the CT imaging geometry constraint is not fully embedded into the network during training, leaving room for further performance improvement; 2) the model interpretability is lack of sufficient consideration. Against these issues, we propose a novel interpretable dual domain network, termed as InDuDoNet, which combines the advantages of model-driven and data-driven methodologies. Specifically, we build a joint spatial and Radon domain reconstruction model and utilize the proximal gradient technique to design an iterative algorithm for solving it. The optimization algorithm only consists of simple computational operators, which facilitate us to correspondingly unfold iterative steps into network modules and thus improve the interpretablility of the framework. Extensive experiments on synthesized and clinical data show the superiority of our InDuDoNet. Code is available in \url{this https URL}.%method on the tasks of MAR and downstream multi-class pelvic fracture segmentation.

17 Jun 2025

computer-science computation-and-language ensemble-methods

A Multi-Expert Structural-Semantic Hybrid Framework for Unveiling Historical Patterns in Temporal Knowledge Graphs

City University of Hong Kong Tencent Jarvis Lab Xi 'an Jiaotong University Xi Explanation of thought process: 1. **Identify Affiliations:** I first scanned the affiliations listed under the author names, which are numbered 1 through 5: * 1: Xi’an Jiaotong University * 2: City University of Hong Kong * 3: Singapore Management University * 4: Tencent Jarvis Lab * 5: Westlake University 2. **Identify Author Emails and Domains:** Next, I looked at the provided author emails and extracted their domains: * dymanne@stu.xjtu.edu.cn -> xjtu.edu.cn * guoshuai.zhao@xjtu.edu.cn -> xjtu.edu.cn * zhuli@xjtu.edu.cn -> xjtu.edu.cn * kevinxwu@tencent.com -> tencent.com * xianzhao@cityu.edu.hk -> cityu.edu.hk 3. **Cross-reference with the Strict Rule:** The prompt explicitly states:

Temporal knowledge graph reasoning aims to predict future events with knowledge of existing facts and plays a key role in various downstream tasks. Previous methods focused on either graph structure learning or semantic reasoning, failing to integrate dual reasoning perspectives to handle different prediction scenarios. Moreover, they lack the capability to capture the inherent differences between historical and non-historical events, which limits their generalization across different temporal contexts. To this end, we propose a Multi-Expert Structural-Semantic Hybrid (MESH) framework that employs three kinds of expert modules to integrate both structural and semantic information, guiding the reasoning process for different events. Extensive experiments on three datasets demonstrate the effectiveness of our approach.

24 May 2021

computer-science artificial-intelligence computation-and-language

OntoEA: Ontology-guided Entity Alignment via Joint Knowledge Graph Embedding

University of Oxford Tencent Jarvis Lab

Semantic embedding has been widely investigated for aligning knowledge graph (KG) entities. Current methods have explored and utilized the graph structure, the entity names and attributes, but ignore the ontology (or ontological schema) which contains critical meta information such as classes and their membership relationships with entities. In this paper, we propose an ontology-guided entity alignment method named OntoEA, where both KGs and their ontologies are jointly embedded, and the class hierarchy and the class disjointness are utilized to avoid false mappings. Extensive experiments on seven public and industrial benchmarks have demonstrated the state-of-the-art performance of OntoEA and the effectiveness of the ontologies.

18 Jul 2022

attention-mechanisms computer-science computer-vision-and-pattern-recognition

Dense Cross-Query-and-Support Attention Weighted Mask Aggregation for Few-Shot Segmentation

Tencent Jarvis Lab Southeast Univ.

The DCAMA framework introduces a novel approach for few-shot semantic segmentation by directly aggregating support mask values using dense, multi-scale cross-attention between query and all support pixels. It achieves state-of-the-art performance on PASCAL-5i (69.3% mIoU, 1-shot Swin-B), COCO-20i (50.9% mIoU, 1-shot Swin-B), and FSS-1000 benchmarks, demonstrating improved robustness and an efficient one-pass n-shot inference strategy.

25 May 2021

computer-science artificial-intelligence computation-and-language

Guiding the Growth: Difficulty-Controllable Question Generation through Step-by-Step Rewriting

Carnegie Mellon University

Université de Montréal

Peking University The University of Sheffield Tencent Jarvis Lab

Bang Liu

A framework from Tencent Jarvis Lab, CMU, and Université de Montréal redefines question difficulty by the number of inference steps, utilizing a step-by-step rewriting approach to generate questions with controlled reasoning complexity. This method demonstrates the generation of questions with specified inference hops and boosts multi-hop QA system performance when used for data augmentation.

21 Jul 2020

ai-for-health computer-science computer-vision-and-pattern-recognition

GREEN: a Graph REsidual rE-ranking Network for Grading Diabetic Retinopathy

Northwestern Polytechnical University Tencent Jarvis Lab

The automatic grading of diabetic retinopathy (DR) facilitates medical diagnosis for both patients and physicians. Existing researches formulate DR grading as an image classification problem. As the stages/categories of DR correlate with each other, the relationship between different classes cannot be explicitly described via a one-hot label because it is empirically estimated by different physicians with different outcomes. This class correlation limits existing networks to achieve effective classification. In this paper, we propose a Graph REsidual rE-ranking Network (GREEN) to introduce a class dependency prior into the original image classification network. The class dependency prior is represented by a graph convolutional network with an adjacency matrix. This prior augments image classification pipeline by re-ranking classification results in a residual aggregation manner. Experiments on the standard benchmarks have shown that GREEN performs favorably against state-of-the-art approaches.

12 Nov 2022

ai-for-health computer-science computer-vision-security

DeltaNet:Conditional Medical Report Generation for COVID-19 Diagnosis

Chinese Academy of Sciences

University of Science and Technology of China Tencent Jarvis Lab The First Affiliated Hospital of Anhui Medical University

Fast screening and diagnosis are critical in COVID-19 patient treatment. In addition to the gold standard RT-PCR, radiological imaging like X-ray and CT also works as an important means in patient screening and follow-up. However, due to the excessive number of patients, writing reports becomes a heavy burden for radiologists. To reduce the workload of radiologists, we propose DeltaNet to generate medical reports automatically. Different from typical image captioning approaches that generate reports with an encoder and a decoder, DeltaNet applies a conditional generation process. In particular, given a medical image, DeltaNet employs three steps to generate a report: 1) first retrieving related medical reports, i.e., the historical reports from the same or similar patients; 2) then comparing retrieved images and current image to find the differences; 3) finally generating a new report to accommodate identified differences based on the conditional report. We evaluate DeltaNet on a COVID-19 dataset, where DeltaNet outperforms state-of-the-art approaches. Besides COVID-19, the proposed DeltaNet can be applied to other diseases as well. We validate its generalization capabilities on the public IU-Xray and MIMIC-CXR datasets for chest-related diseases. Code is available at \url{this https URL}.

719

25 Aug 2023

computer-science artificial-intelligence computer-vision-and-pattern-recognition

MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning

University of Oxford Peng Cheng Laboratory

Peking University Tencent Jarvis Lab

MultiCapCLIP introduces a zero-shot multilingual visual captioning framework that generates high-quality descriptions in multiple languages without requiring labeled vision-caption pairs. It achieves superior performance compared to existing zero-shot methods on image and video captioning datasets in English, Chinese, German, and French.

22 Dec 2020

ai-for-health computer-science conversational-ai

Graph-Evolving Meta-Learning for Low-Resource Medical Dialogue Generation

Sun Yat-Sen University Salesforce Research Tencent Jarvis Lab DarkMatter AI Inc.

Human doctors with well-structured medical knowledge can diagnose a disease merely via a few conversations with patients about symptoms. In contrast, existing knowledge-grounded dialogue systems often require a large number of dialogue instances to learn as they fail to capture the correlations between different diseases and neglect the diagnostic experience shared among them. To address this issue, we propose a more natural and practical paradigm, i.e., low-resource medical dialogue generation, which can transfer the diagnostic experience from source diseases to target ones with a handful of data for adaptation. It is capitalized on a commonsense knowledge graph to characterize the prior disease-symptom relations. Besides, we develop a Graph-Evolving Meta-Learning (GEML) framework that learns to evolve the commonsense graph for reasoning disease-symptom correlations in a new disease, which effectively alleviates the needs of a large number of dialogues. More importantly, by dynamically evolving disease-symptom graphs, GEML also well addresses the real-world challenges that the disease-symptom correlations of each disease may vary or evolve along with more diagnostic cases. Extensive experiment results on the CMDD dataset and our newly-collected Chunyu dataset testify the superiority of our approach over state-of-the-art approaches. Besides, our GEML can generate an enriched dialogue-sensitive knowledge graph in an online manner, which could benefit other tasks grounded on knowledge graph.

03 May 2023

ai-for-health computer-science computation-and-language

Exploring Social Media for Early Detection of Depression in COVID-19 Patients

Harvard University

Zhejiang University Tencent Jarvis Lab

The COVID-19 pandemic has caused substantial damage to global health. Even though three years have passed, the world continues to struggle with the virus. Concerns are growing about the impact of COVID-19 on the mental health of infected individuals, who are more likely to experience depression, which can have long-lasting consequences for both the affected individuals and the world. Detection and intervention at an early stage can reduce the risk of depression in COVID-19 patients. In this paper, we investigated the relationship between COVID-19 infection and depression through social media analysis. Firstly, we managed a dataset of COVID-19 patients that contains information about their social media activity both before and after infection. Secondly,We conducted an extensive analysis of this dataset to investigate the characteristic of COVID-19 patients with a higher risk of depression. Thirdly, we proposed a deep neural network for early prediction of depression risk. This model considers daily mood swings as a psychiatric signal and incorporates textual and emotional characteristics via knowledge distillation. Experimental results demonstrate that our proposed framework outperforms baselines in detecting depression risk, with an AUROC of 0.9317 and an AUPRC of 0.8116. Our model has the potential to enable public health organizations to initiate prompt intervention with high-risk patients

21 Jul 2024

agentic-frameworks ai-for-health computer-science

Retrieval-Augmented and Knowledge-Grounded Language Models for Faithful Clinical Medicine

University of Oxford

Peking University

Yale University Shanghai Jiao Tong University School of Medicine Oxford Suzhou Centre for Advanced Research Tencent Jarvis Lab

Chenyu You

邦杨

Researchers developed Re3Writer, a retrieval-augmented and knowledge-grounded framework that enhances language models to generate faithful and comprehensive patient instructions (PIs) from clinical records. This method improved PI generation across various models, notably increasing physician helpfulness from 32% to 74% and achieving up to 20% relative improvement in BLEU-4 scores on a new benchmark dataset.

09 Oct 2022

computer-science conversational-ai computation-and-language

Improving Multi-turn Emotional Support Dialogue Generation with Lookahead Strategy Planning

Université de Montréal

Sun Yat-Sen University Hong Kong Polytechnic University Baidu Inc Tencent Jarvis Lab

Bang Liu

Providing Emotional Support (ES) to soothe people in emotional distress is an essential capability in social interactions. Most existing researches on building ES conversation systems only considered single-turn interactions with users, which was over-simplified. In comparison, multi-turn ES conversation systems can provide ES more effectively, but face several new technical challenges, including: (1) how to adopt appropriate support strategies to achieve the long-term dialogue goal of comforting the user's emotion; (2) how to dynamically model the user's state. In this paper, we propose a novel system MultiESC to address these issues. For strategy planning, drawing inspiration from the A* search algorithm, we propose lookahead heuristics to estimate the future user feedback after using particular strategies, which helps to select strategies that can lead to the best long-term effects. For user state modeling, MultiESC focuses on capturing users' subtle emotional expressions and understanding their emotion causes. Extensive experiments show that MultiESC significantly outperforms competitive baselines in both dialogue generation and strategy planning. Our codes are available at this https URL

There are no more papers matching your filters at the moment.

Events

Personalize Your Feed

Install Browser Extension

We're hiring

alphaXiv

Explore

State of the Art

Sign In

Labs

Feedback

Dark mode

Think or Not? Exploring Thinking Efficiency in Large Reasoning Models via an Information-Theoretic Lens

Spatiotemporal Forecasting as Planning: A Model-Based Reinforcement Learning Approach with Generative World Models

IQE-CLIP: Instance-aware Query Embedding for Zero-/Few-shot Anomaly Detection in Medical Domain

Copy-Paste to Mitigate Large Language Model Hallucinations

Reframe Your Life Story: Interactive Narrative Therapist and Innovative Moment Assessment with Large Language Models

Beyond Single Models: Mitigating Multimodal Hallucinations via Adaptive Token Ensemble Decoding

Training-free LLM Merging for Multi-task Learning

Model Merging for Knowledge Editing

InDuDoNet: An Interpretable Dual Domain Network for CT Metal Artifact Reduction

A Multi-Expert Structural-Semantic Hybrid Framework for Unveiling Historical Patterns in Temporal Knowledge Graphs

OntoEA: Ontology-guided Entity Alignment via Joint Knowledge Graph Embedding

Dense Cross-Query-and-Support Attention Weighted Mask Aggregation for Few-Shot Segmentation

Guiding the Growth: Difficulty-Controllable Question Generation through Step-by-Step Rewriting

GREEN: a Graph REsidual rE-ranking Network for Grading Diabetic Retinopathy

DeltaNet:Conditional Medical Report Generation for COVID-19 Diagnosis

MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning

Graph-Evolving Meta-Learning for Low-Resource Medical Dialogue Generation

Exploring Social Media for Early Detection of Depression in COVID-19 Patients

Retrieval-Augmented and Knowledge-Grounded Language Models for Faithful Clinical Medicine

Improving Multi-turn Emotional Support Dialogue Generation with Lookahead Strategy Planning

Events

AI for Law

Personalize Your Feed