The transport of water and protons in the cathode catalyst layer (CCL) of proton exchange membrane (PEM) fuel cells is critical to cell performance, but the underlying mechanism remains unclear. Herein, the ionomer structure and the distribution and transport characteristics of water and protons in CCLs are investigated via all-atom molecular dynamics simulations. The results show that at low water contents, isolated water clusters form in ionomer pores, and proton transport proceeds mainly via the charged sites of the ionomer side chains and the Grotthuss mechanism. With increasing water content, water clusters interconnect to form continuous water channels, which provide effective paths for proton transfer via both the vehicular and Grotthuss mechanisms. Increasing the ionomer mass content promotes denser packing of the ionomer, which raises the density of charged sites and improves proton transport efficiency. At high ionomer mass contents, however, the clustering effect reduces the space available for water diffusion, lengthens the proton transport paths, and ultimately decreases proton transport efficiency. By providing physical insight into the proton transport mechanism, this study informs the structural design and performance improvement of CCLs in PEM fuel cells.
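Diffusion claims like these are typically quantified from MD trajectories via the Einstein relation; the sketch below is that standard post-processing step, not the authors' code: the long-time slope of the mean-squared displacement (MSD) gives the (vehicular) diffusion coefficient.

```python
import numpy as np

def diffusion_coefficient(positions: np.ndarray, dt: float) -> float:
    """positions: (T, N, 3) unwrapped coordinates of N particles over T
    frames; dt: time between frames. Einstein relation in 3D:
    MSD(t) ~ 6 D t at long times, so D is the long-time slope over 6."""
    disp = positions - positions[0]               # displacement from t = 0
    msd = (disp ** 2).sum(axis=-1).mean(axis=1)   # (T,) mean-squared displacement
    t = np.arange(len(msd)) * dt
    half = len(t) // 2                            # fit only the diffusive regime
    slope = np.polyfit(t[half:], msd[half:], 1)[0]
    return slope / 6.0
```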
Existing deep learning-based shadow removal methods still produce images with shadow remnants. These remnants typically occur in homogeneous regions with low intensity values, making them untraceable in the existing image-to-image mapping paradigm. We observe that shadows mainly degrade images at the image-structure level (at which humans perceive object shapes and continuous colors). Hence, in this paper, we propose to remove shadows at the image-structure level. Based on this idea, we propose a novel structure-informed shadow removal network (StructNet) that leverages image-structure information to address the shadow-remnant problem. Specifically, StructNet first reconstructs the structure information of the input image without shadows and then uses this restored shadow-free structure prior to guide image-level shadow removal. StructNet contains two main novel modules: (1) a mask-guided shadow-free extraction (MSFE) module, which extracts image structural features in a non-shadow-to-shadow directional manner, and (2) a multi-scale feature & residual aggregation (MFRA) module, which leverages the shadow-free structure information to regularize feature consistency. In addition, we extend StructNet to exploit multi-level structure information (MStructNet), further boosting shadow removal performance with minimal computational overhead. Extensive experiments on three shadow removal benchmarks demonstrate that our method outperforms existing shadow removal methods, and that StructNet can be integrated with existing methods to improve them further.
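As a loose illustration of the "non-shadow-to-shadow" direction in MSFE, the sketch below lets information flow only from lit pixels into the masked shadow region; the single-mean fill is a deliberately crude stand-in for the paper's learned extraction, and all names are assumptions.

```python
import torch

def shadow_free_structure_prior(feat: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """feat: (C, H, W) structure features; mask: (1, H, W) with 1 inside
    the shadow. Information flows only from lit pixels into the shadow:
    shadow-region features are replaced by the mean of lit-region features."""
    lit = feat * (1 - mask)
    lit_mean = lit.sum(dim=(1, 2), keepdim=True) / (1 - mask).sum().clamp(min=1)
    return lit + lit_mean * mask
```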
A fully unsupervised method called EMPO enhances Large Language Model reasoning by minimizing semantic entropy as an intrinsic reward signal. The approach improves mathematical reasoning accuracy on models like Qwen2.5-Math-7B from 30.7% to 48.1% and boosts natural reasoning on benchmarks like MMLU-Pro for Qwen2.5-7B from 32.1% to 50.1%, achieving performance comparable to or exceeding supervised methods.
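As a rough illustration of the intrinsic signal EMPO minimizes, the sketch below computes semantic entropy over a batch of sampled answers: responses are grouped into meaning clusters and the entropy of the cluster distribution is the quantity to reduce. The `cluster_fn` here (whitespace normalization) is a toy stand-in; the paper's actual semantic clustering is assumed to be more sophisticated.

```python
from collections import Counter
import math

def semantic_entropy(answers, cluster_fn):
    """Group sampled answers into semantic clusters and return the
    Shannon entropy of the resulting cluster distribution."""
    clusters = Counter(cluster_fn(a) for a in answers)
    total = sum(clusters.values())
    return -sum((n / total) * math.log(n / total) for n in clusters.values())

# Toy clustering: answers identical up to whitespace share a cluster.
# EMPO-style training would use the negative entropy as the reward.
answers = ["x = 4", "x=4", "x = 7"]
reward = -semantic_entropy(answers, cluster_fn=lambda a: a.replace(" ", ""))
```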
Embodied-R1 introduces a unified 'pointing' representation for general robotic manipulation, leveraging a reinforced fine-tuning curriculum to bridge high-level vision-language understanding with low-level action primitives. The model achieves state-of-the-art results on 11 diverse spatial and pointing benchmarks and demonstrates robust zero-shot generalization with an 87.5% success rate on real-world robot tasks.
Researchers from Nanyang Technological University and collaborators developed HOUYI, a systematic black-box prompt injection attack framework for LLM-integrated applications, demonstrating 86.1% susceptibility across 36 real-world commercial applications. The study revealed severe consequences, including intellectual property theft via prompt leaking and financial losses from prompt abusing, with an estimated daily financial loss of $259.20 for a single compromised application.
Researchers from a global consortium, including Tianjin University and Huawei Noah’s Ark Lab, developed Embodied Arena, a comprehensive platform for evaluating Embodied AI agents, featuring a systematic capability taxonomy and an automated, LLM-driven data generation pipeline. This platform integrates over 22 benchmarks and 30 models, revealing that specialized embodied models often outperform general models on targeted tasks and identifying object and spatial perception as key performance bottlenecks.
Autonomous agents have recently achieved remarkable progress across diverse domains, yet most evaluations focus on short-horizon, fully observable tasks. In contrast, many critical real-world tasks, such as large-scale software development, commercial investment, and scientific discovery, unfold in long-horizon and partially observable scenarios where success hinges on sustained reasoning, planning, memory management, and tool use. Existing benchmarks rarely capture these long-horizon challenges, leaving a gap in systematic evaluation. To bridge this gap, we introduce UltraHorizon, a novel benchmark that measures the foundational capabilities essential for complex real-world challenges. We use exploration as a unifying task across three distinct environments to validate these core competencies. Agents are placed in long-horizon discovery tasks where they must iteratively uncover hidden rules through sustained reasoning, planning, memory and tool management, and interaction with their environments. Under the heaviest-scale setting, trajectories average 200k+ tokens and 400+ tool calls, whereas in standard configurations they still exceed 35k tokens and involve more than 60 tool calls on average. Our extensive experiments reveal that LLM agents consistently underperform in these settings, whereas human participants achieve higher scores, underscoring a persistent gap in agents' long-horizon abilities. We also observe that simple scaling fails on our tasks. To better illustrate agents' failures, we conduct an in-depth analysis of the collected trajectories, identifying eight error types and attributing them to two primary causes: in-context locking and gaps in fundamental functional capabilities. Our code will be available at this https URL.
GeoVLA introduces an end-to-end framework that integrates 3D point cloud information into Vision-Language-Action (VLA) models, allowing robots to better perceive their physical environment for manipulation tasks. This architecture consistently achieves state-of-the-art performance on manipulation benchmarks and exhibits robust generalization across various real-world spatial, scale, and viewpoint changes.
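A minimal sketch of one way 3D geometry could enter a VLA token stream: point-cloud features are projected into the token space and appended to the vision-language sequence. GeoVLA's actual fusion architecture is not specified in this summary, so the class, names, and concatenation strategy are all assumptions.

```python
import torch
import torch.nn as nn

class GeometryFusion(nn.Module):
    """Project point-cloud features into the VLA token space and append
    them to the vision-language token sequence."""
    def __init__(self, pc_dim: int, token_dim: int):
        super().__init__()
        self.proj = nn.Linear(pc_dim, token_dim)

    def forward(self, vl_tokens: torch.Tensor, pc_feats: torch.Tensor) -> torch.Tensor:
        # vl_tokens: (num_vl_tokens, token_dim); pc_feats: (num_points, pc_dim)
        return torch.cat([vl_tokens, self.proj(pc_feats)], dim=0)
```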
Human speech goes beyond the mere transfer of information; it is a profound exchange of emotions and a connection between individuals. While Text-to-Speech (TTS) models have made substantial progress, they still struggle to control the emotional expression of the generated speech. In this work, we propose EmoVoice, a novel emotion-controllable TTS model that exploits large language models (LLMs) to enable fine-grained, freestyle natural-language emotion control, together with a phoneme-boost variant that outputs phoneme tokens and audio tokens in parallel to enhance content consistency, inspired by chain-of-thought (CoT) and chain-of-modality (CoM) techniques. In addition, we introduce EmoVoice-DB, a high-quality 40-hour English emotion dataset featuring expressive speech and fine-grained emotion labels with natural-language descriptions. EmoVoice achieves state-of-the-art performance on the English EmoVoice-DB test set using only synthetic training data, and on the Chinese Secap test set using our in-house data. We further investigate the reliability of existing emotion evaluation metrics and their alignment with human perceptual preferences, and explore using the SOTA multimodal LLMs GPT-4o-audio and Gemini to assess emotional speech. Dataset, code, checkpoints, and demo samples are available at this https URL.
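The phoneme-boost variant's parallel output could look roughly like the sketch below, where a single decoder hidden state feeds two heads so phoneme and audio tokens are predicted in step; head names and shapes are assumptions for illustration.

```python
import torch.nn as nn

class ParallelHeads(nn.Module):
    """One decoder hidden state feeds two output heads, so phoneme and
    audio tokens are predicted in parallel at every decoding step."""
    def __init__(self, dim: int, n_phonemes: int, n_audio_tokens: int):
        super().__init__()
        self.phoneme_head = nn.Linear(dim, n_phonemes)
        self.audio_head = nn.Linear(dim, n_audio_tokens)

    def forward(self, hidden):  # hidden: (batch, seq, dim)
        return self.phoneme_head(hidden), self.audio_head(hidden)
```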
This research introduces AUTOTRITON, the first large language model for automatic Triton kernel generation powered by reinforcement learning. The 8B-parameter model demonstrates improved correctness on established benchmarks, outperforming larger general-purpose LLMs and achieving comparable runtime performance.
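RL for kernel generation needs a verifiable reward. A common recipe, sketched here as an assumption about AUTOTRITON's setup rather than its actual implementation, scores a generated kernel by whether it compiles and whether its output matches a reference operator.

```python
def kernel_reward(compiled_ok: bool, outputs_match: bool) -> float:
    """Verifiable reward for a generated Triton kernel: zero if it fails
    to compile, partial credit for compiling, full credit only when its
    output matches the reference operator within tolerance.
    outputs_match would come from e.g.
    torch.allclose(generated_out, reference_out, atol=1e-3)."""
    if not compiled_ok:
        return 0.0
    return 1.0 if outputs_match else 0.1
```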
Researchers from Tianjin University, Beijing University of Posts and Telecommunications, and A*STAR introduce Logits-induced Token Uncertainty (LogTokU), a framework that estimates LLM uncertainty by leveraging raw logits as evidence, providing a more accurate and efficient alternative to existing methods. LogTokU significantly improves dynamic decoding strategies by up to 11.4% and enhances response reliability estimation by up to 7.5% on various LLMs.
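The summary does not give LogTokU's exact formulation, but a common evidential reading of "logits as evidence" looks like the sketch below: non-negative evidence is derived from raw logits and mapped to Dirichlet concentrations, whose total mass sets a vacuity-style uncertainty. Treat the function and the ReLU evidence map as illustrative assumptions.

```python
import torch

def logit_token_uncertainty(logits: torch.Tensor) -> torch.Tensor:
    """logits: (vocab_size,) raw logits at one decoding step.
    Evidence is the positive part of the logits; with Dirichlet
    concentrations alpha = evidence + 1, the vacuity K / sum(alpha)
    lies in (0, 1]: little total evidence -> high uncertainty."""
    evidence = torch.relu(logits)
    alpha = evidence + 1.0
    return alpha.numel() / alpha.sum()

u = logit_token_uncertainty(torch.tensor([5.2, 0.1, -3.0, 2.4]))
```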
This survey provides a comprehensive review of dexterous manipulation using imitation learning, categorizing algorithmic advancements, analyzing end-effector designs and data acquisition methods, and outlining key challenges and future research directions. It consolidates a rapidly evolving interdisciplinary field to guide researchers and practitioners toward more capable robotic systems.
Recently, image editing based on Diffusion Transformer (DiT) models has undergone rapid development. However, existing editing methods often lack effective control over the degree of editing, limiting their ability to achieve more customized results. To address this limitation, we investigate the MM-Attention mechanism within the DiT model and observe that the Query and Key tokens share a bias vector that is only layer-dependent. We interpret this bias as representing the model's inherent editing behavior, while the delta between each token and its corresponding bias encodes content-specific editing signals. Based on this insight, we propose Group Relative Attention Guidance (GRAG), a simple yet effective method that reweights the delta values of different tokens to modulate the model's focus on the input image relative to the editing instruction, enabling continuous and fine-grained control over editing intensity without any tuning. Extensive experiments on existing image editing frameworks demonstrate that GRAG can be integrated with as few as four lines of code, consistently enhancing editing quality. Moreover, compared with the commonly used Classifier-Free Guidance, GRAG achieves smoother and more precise control over the degree of editing. Our code will be released at this https URL.
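A minimal sketch of the delta-reweighting idea described above, assuming access to a layer's Query or Key tokens and their shared bias vector; the grouping and exact placement inside MM-Attention are omitted, and the function name is an assumption.

```python
import torch

def grag_reweight(tokens: torch.Tensor, bias: torch.Tensor, scale: float) -> torch.Tensor:
    """tokens: (seq_len, dim) Query or Key tokens in one attention layer;
    bias: (dim,) the shared, layer-dependent bias vector. Scaling the
    per-token delta around the bias modulates editing intensity:
    scale > 1 strengthens the edit signal, scale < 1 weakens it."""
    delta = tokens - bias          # content-specific editing signal
    return bias + scale * delta    # scale = 1.0 recovers the original tokens
```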
The survey by Peng et al. presents the first comprehensive categorization of LLM-powered agents in recommender systems, classifying them into recommender-oriented, interaction-oriented, and simulation-oriented paradigms. It also proposes a unified four-module architecture (Profile, Memory, Planning, Action) to structure the understanding of these systems, addressing challenges like nuanced user understanding and transparency.
MC-LLaVA introduces the first multi-concept personalization paradigm for Vision-Language Models, enabling them to understand and generate responses involving multiple user-defined concepts simultaneously. The model achieves superior recognition accuracy (0.845 for multi-concept) and competitive VQA performance (BLEU 0.658) on a newly contributed dataset, outperforming prior single-concept approaches.
TRITONBENCH introduces the first comprehensive benchmark to evaluate large language models' capabilities in generating high-performance Triton operators for GPUs. The benchmark reveals that current LLMs generally struggle to produce efficient Triton code, though domain-specific fine-tuning and one-shot examples can improve execution accuracy for certain models and tasks.
GlimpsePrune introduces a dynamic visual token pruning framework for Large Vision-Language Models, enabling efficient processing of high-resolution visual inputs. The method achieves an average 92.6% visual token pruning rate while fully retaining baseline performance, and reduces peak GPU memory usage by 72.8% during generation, demonstrating robustness in free-form visual question answering.
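A toy version of importance-based token pruning, assuming a per-token relevance score is available (e.g., from text-conditioned attention); a keep ratio of 0.074 corresponds to the reported 92.6% average pruning rate. GlimpsePrune's actual dynamic, glimpse-driven mechanism is assumed to be more involved.

```python
import torch

def prune_visual_tokens(tokens: torch.Tensor, scores: torch.Tensor,
                        keep_ratio: float = 0.074) -> torch.Tensor:
    """tokens: (N, dim) visual tokens; scores: (N,) per-token importance.
    Keep only the top keep_ratio fraction, preserving original order;
    0.074 matches the reported 92.6% average pruning rate."""
    k = max(1, int(tokens.size(0) * keep_ratio))
    keep = scores.topk(k).indices.sort().values
    return tokens[keep]
```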
A study systematically explores entropy dynamics in Reinforcement Learning with Verifiable Rewards (RLVR) for large language models, identifying that positive advantages drive entropy collapse and proposing the Progressive Advantage Reweighting method to regulate it. The findings show a strong correlation between entropy and response diversity, while also improving model calibration and achieving competitive performance.
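The summary does not specify Progressive Advantage Reweighting's exact form; one plausible reading, sketched below, shrinks positive advantages (the driver of entropy collapse identified by the study) while leaving negative ones intact, with the shrink factor presumably annealed over training.

```python
import torch

def reweight_advantages(adv: torch.Tensor, pos_weight: float) -> torch.Tensor:
    """Shrink positive advantages while leaving negative ones untouched.
    pos_weight in (0, 1] would presumably be scheduled over training
    ('progressive') to keep policy entropy from collapsing."""
    return torch.where(adv > 0, pos_weight * adv, adv)
```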
Researchers from Tsinghua University and Huawei Noah’s Ark Lab developed iVideoGPT, a scalable autoregressive transformer that functions as an interactive world model for embodied AI. The model leverages a novel compressive video tokenization technique and large-scale pre-training on diverse manipulation datasets, demonstrating strong performance in video prediction, visual planning, and sample-efficient model-based reinforcement learning.
The TFPS framework addresses time series forecasting under patch-level distribution shifts by employing a dual-domain encoder and dynamically identified pattern-specific experts. It achieves top-1 performance in 57 out of 72 experimental configurations, significantly reducing MSE by an average of 9.5% over time-domain methods and 16.9% over frequency-domain methods.
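A minimal sketch of the dual-domain idea: each patch is represented by both its raw values and its FFT magnitudes, giving a downstream router the information needed to assign it to a pattern-specific expert. The function name and simple concatenation are illustrative assumptions.

```python
import torch

def dual_domain_features(patch: torch.Tensor) -> torch.Tensor:
    """patch: (L,) one time-series patch. Concatenate time-domain values
    with frequency-domain magnitudes so a router can match the patch to
    a pattern-specific expert."""
    freq = torch.fft.rfft(patch).abs()   # (L // 2 + 1,) spectral magnitudes
    return torch.cat([patch, freq])      # (L + L // 2 + 1,) fused feature
```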