Researchers from Nanjing University, China, developed Evidence Pattern Retrieval (EPR) to improve Knowledge Graph Question Answering by explicitly modeling structural dependencies among evidence facts during subgraph extraction. The method achieved a new state of the art for information retrieval (IR) approaches on the ComplexWebQuestions benchmark, reaching 60.6% Hits@1 and a 61.2% F1 score.
Researchers from MMLab, Tsinghua University, Kuaishou Technology, and Shanghai AI Lab developed Flow-GRPO, a framework that integrates online policy-gradient reinforcement learning into flow matching models. The method significantly improves compositional image generation, visual text rendering, and human preference alignment, reaching up to 95% GenEval accuracy on SD3.5-M; it addresses the determinism and sampling-efficiency challenges of flow matching through an ODE-to-SDE conversion and a denoising-step reduction strategy.
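To make the ODE-to-SDE motivation concrete, here is a minimal, hedged sketch (not the paper's exact SDE): a deterministic flow-matching step has no per-step likelihood, whereas a stochastic step with Gaussian transitions does, which is what an online policy-gradient method like GRPO needs. The drift term and the noise schedule `sigma` below are illustrative assumptions.

```python
import torch

def ode_step(x, t, dt, velocity_fn):
    # Deterministic Euler step of the probability-flow ODE: no randomness,
    # so there is no per-step log-probability for a policy gradient to use.
    return x + velocity_fn(x, t) * dt

def sde_step(x, t, dt, velocity_fn, sigma):
    # Stochastic Euler-Maruyama step: the Gaussian transition yields a tractable
    # log-probability, which online policy-gradient methods such as GRPO require.
    mean = x + velocity_fn(x, t) * dt   # illustrative drift; the real conversion adds a correction term
    std = sigma(t) * (dt ** 0.5)
    x_next = mean + std * torch.randn_like(x)
    log_prob = torch.distributions.Normal(mean, std).log_prob(x_next).sum()
    return x_next, log_prob
```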
Researchers from multiple institutions provide a comprehensive analysis of Vertical Federated Learning (VFL), establishing a general framework, identifying its distinct challenges, and evaluating solutions. The work empirically quantifies the trade-offs between privacy, communication efficiency, computational load distribution, and model performance in VFL systems.
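As a point of reference for the setting the survey formalizes, below is a hedged toy sketch of the standard vertical split: two parties hold disjoint feature columns for the same samples, each computes a local embedding, and only the label-holding party computes the loss. The model shapes and aggregation scheme are illustrative assumptions, not a specific system from the paper.

```python
import torch
import torch.nn as nn

class PartyModel(nn.Module):
    # Local "bottom" model: each party keeps its raw features private and
    # only shares the embedding it produces.
    def __init__(self, in_dim, out_dim=8):
        super().__init__()
        self.net = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        return self.net(x)

party_a, party_b = PartyModel(5), PartyModel(3)   # disjoint feature columns
top_model = nn.Linear(16, 2)                      # held by the label-owning party

xa, xb = torch.randn(4, 5), torch.randn(4, 3)     # same 4 samples, different features
y = torch.randint(0, 2, (4,))                     # labels live only at the top party

# Each party transmits only its embedding; the label owner fuses and classifies.
logits = top_model(torch.cat([party_a(xa), party_b(xb)], dim=1))
loss = nn.functional.cross_entropy(logits, y)
loss.backward()                                   # gradients flow back to each local model
```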
This paper identifies and characterizes a universal policy entropy collapse in reinforcement learning for large language models (LLMs), revealing an empirical law that links performance to entropy. It further provides a mechanistic understanding of this phenomenon through covariance analysis and proposes two covariance-aware regularization methods, Clip-Cov and KL-Cov, which successfully maintain higher entropy and improve LLM reasoning performance on math and coding tasks.
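Since Clip-Cov and KL-Cov are only named here, the following is a hedged sketch of the Clip-Cov idea as described: tokens whose contribution to the covariance between log-probability and advantage is largest are excluded from the policy-gradient update, which slows entropy collapse. The clipped fraction and the exact per-token statistic are assumptions.

```python
import torch

def clip_cov_mask(logprobs, advantages, frac=0.002):
    # Per-token contribution to Cov(log pi, A) within the batch
    # (both inputs are flat per-token tensors).
    cov_contrib = (logprobs - logprobs.mean()) * (advantages - advantages.mean())
    # Exclude the top `frac` fraction of tokens by covariance contribution;
    # the fraction is an illustrative choice, not the paper's value.
    k = max(1, int(frac * cov_contrib.numel()))
    thresh = torch.topk(cov_contrib, k).values.min()
    keep = (cov_contrib < thresh).float()
    return keep

# Usage (schematic): per-token policy-gradient loss, masked so that
# high-covariance tokens contribute no gradient.
# loss = -(keep * logprobs * advantages).sum() / keep.sum()
```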
LUFFY introduces a framework that enhances Large Reasoning Models (LRMs) by integrating off-policy guidance into Reinforcement Learning with Verifiable Rewards (RLVR). This approach enables LRMs to acquire new reasoning capabilities from stronger external policies, achieving state-of-the-art performance on math benchmarks, superior generalization on out-of-distribution tasks, and successfully training weaker foundation models where on-policy methods fail.
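A hedged sketch of what off-policy guidance can look like in an RLVR update: part of each group comes from the model's own rollouts and part from a stronger external policy's traces, with a shaping weight on the off-policy term. The mixing and shaping function below are illustrative assumptions rather than LUFFY's exact formulation.

```python
import torch

def mixed_rollout_loss(on_logprobs, on_adv, off_logprobs, off_adv, gamma=0.1):
    # On-policy half of the group: standard policy gradient with
    # verifiable-reward advantages.
    on_loss = -(on_logprobs * on_adv).mean()
    # Off-policy half: traces from a stronger external policy, scored by the
    # current policy. The weight p / (p + gamma) is an illustrative stand-in
    # for a shaping function that keeps low-probability tokens from being ignored.
    p = off_logprobs.exp()
    shaping = (p / (p + gamma)).detach()
    off_loss = -(shaping * off_logprobs * off_adv).mean()
    return on_loss + off_loss
```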
Researchers systematically analyzed reinforcement learning techniques for enhancing large language model reasoning, demonstrating that a minimalist combination of two empirically validated methods can consistently outperform more complex, multi-trick algorithms. The work clarifies the conditional effectiveness of various RL components across different model scales, alignment statuses, and data difficulties.
The SpatialVID dataset provides 7,089 hours of real-world dynamic video annotated with explicit per-frame camera poses, depth maps, dynamic object masks, and detailed semantic descriptions, including structured camera-motion instructions. This large-scale, multimodal dataset bridges the gap between video content and 3D geometry, providing a foundation for training advanced 3D-aware video generation and embodied AI models.
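For readers thinking about how such annotations might be consumed, here is an illustrative per-frame record layout matching the annotation types listed above; the field names and shapes are assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class FrameAnnotation:
    # Hypothetical per-frame record mirroring the annotation types described.
    pose: np.ndarray          # 4x4 camera-to-world extrinsics
    intrinsics: np.ndarray    # 3x3 camera matrix
    depth: np.ndarray         # HxW depth map
    dynamic_mask: np.ndarray  # HxW boolean mask of moving objects
    caption: str              # semantic description of the frame
    motion_instruction: str   # structured camera-motion tag, e.g. "pan_left"
```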
A comprehensive synthesis of Large Language Models for automated software development covers the entire model lifecycle, from data curation to autonomous agents, and offers practical guidance derived from empirical experiments on pre-training, fine-tuning, and reinforcement learning, alongside a detailed analysis of challenges and future directions.
This comprehensive survey from a large multi-institutional collaboration examines "Latent Reasoning" in Large Language Models, an emerging paradigm that performs multi-step inference entirely within the model's high-bandwidth continuous hidden states to overcome the limitations of natural language-based explicit reasoning. It highlights the significant bandwidth advantage of latent representations (approximately 2700x higher) and provides a unified taxonomy of current methodologies.
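The ~2700x figure can be reproduced with a back-of-the-envelope calculation; the specific hidden size and vocabulary below are assumptions chosen only to illustrate how a ratio of that magnitude arises.

```python
# Compare the information carried per reasoning step by a continuous hidden
# state versus a single discrete token (assumed sizes, for illustration only).
hidden_dim = 2560
bits_per_float = 16                        # FP16
latent_bits = hidden_dim * bits_per_float  # 40,960 bits per latent step
token_bits = 15                            # ~log2 of a 32k-entry vocabulary
print(latent_bits / token_bits)            # ~2731, i.e. roughly 2700x
```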
InternVL3 establishes a new native multimodal pre-training paradigm for MLLMs, allowing the model to jointly acquire visual and linguistic capabilities from the outset. This approach achieves state-of-the-art performance among open-source models, reaching 72.2 on the MMMU benchmark, and demonstrates strong competitiveness with leading proprietary models across a wide range of multimodal tasks.
Thyme introduces a paradigm for multimodal large language models (MLLMs) to enhance reasoning and perception by autonomously generating and executing code for image manipulation and computation. This approach achieves substantial performance improvements across nearly 20 benchmarks, frequently outperforming larger models in high-resolution perception tasks.
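A minimal, hedged sketch of the generate-then-execute loop described above: the MLLM emits Python for an image operation (for example, crop-and-zoom on a high-resolution input), a sandbox runs it, and the result is fed back into the conversation. All helper names and the `result` convention are hypothetical.

```python
from PIL import Image

def execute_tool_code(code_str, image_path):
    # Run model-generated code in a restricted namespace; a real system would
    # sandbox this far more aggressively.
    scope = {"Image": Image, "image_path": image_path}
    exec(code_str, scope)
    return scope.get("result")  # convention: generated code stores its output in `result`

generated = """
img = Image.open(image_path)
w, h = img.size
result = img.crop((w // 2, 0, w, h // 2)).resize((w, h))  # zoom into the top-right region
"""
# cropped = execute_tool_code(generated, "example.jpg")  # the output re-enters the MLLM context
```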
Researchers from Alibaba Group provide a comprehensive overview of unified models capable of both understanding and generating multimodal content, primarily focusing on vision and language. The work systematically classifies architectural paradigms (diffusion-based, autoregressive-based, and fused approaches) while detailing current challenges, emerging solutions, and future research opportunities.
AFLOW introduces an automated framework for generating and optimizing agentic workflows for Large Language Models, reformulating workflow optimization as a search problem over code-represented workflows. The system leverages Monte Carlo Tree Search with LLM-based optimization to iteratively refine workflows, yielding a 19.5% average performance improvement over existing automated methods while enabling smaller, more cost-effective LLMs to achieve performance parity with larger models.
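To make "search over code-represented workflows" concrete, here is a hedged, schematic loop in that spirit: workflow variants are plain code, an LLM proposes edits, and a UCT-style rule chooses which variant to expand next. Function names and the scoring scheme are hypothetical placeholders, not AFLOW's implementation.

```python
import math

def select(nodes, c=1.0):
    # UCT-style selection over previously evaluated workflow variants.
    total = sum(n["visits"] for n in nodes) or 1
    return max(
        nodes,
        key=lambda n: n["score"] / max(n["visits"], 1)
        + c * math.sqrt(math.log(total + 1) / max(n["visits"], 1)),
    )

def optimize(seed_code, llm_modify, evaluate, iterations=20):
    nodes = [{"code": seed_code, "score": 0.0, "visits": 0}]
    for _ in range(iterations):
        parent = select(nodes)
        child_code = llm_modify(parent["code"])   # LLM rewrites the workflow code
        score = evaluate(child_code)              # execute the workflow on a validation split
        nodes.append({"code": child_code, "score": score, "visits": 1})
        parent["visits"] += 1
        parent["score"] += score                  # back-propagate the result to the parent
    return max(nodes, key=lambda n: n["score"] / max(n["visits"], 1))["code"]
```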
SegFormer presents an efficient and robust Transformer-based framework for semantic segmentation, outperforming prior methods in accuracy while significantly reducing model size and computational cost. The model achieves state-of-the-art results on ADE20K, Cityscapes, and COCO-Stuff, showcasing superior efficiency and robustness to common corruptions.
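One of SegFormer's efficiency levers is its lightweight all-MLP decoder; the sketch below captures that idea: per-stage encoder features are linearly projected, upsampled to a common resolution, fused, and classified. The channel sizes are illustrative assumptions rather than a released configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLPDecodeHead(nn.Module):
    # Lightweight decoder in the SegFormer spirit: 1x1 projections act as
    # per-pixel MLPs over multi-scale features.
    def __init__(self, in_channels=(32, 64, 160, 256), embed_dim=256, num_classes=150):
        super().__init__()
        self.proj = nn.ModuleList([nn.Conv2d(c, embed_dim, 1) for c in in_channels])
        self.fuse = nn.Conv2d(embed_dim * len(in_channels), embed_dim, 1)
        self.classify = nn.Conv2d(embed_dim, num_classes, 1)

    def forward(self, feats):  # feats: multi-scale encoder outputs, highest resolution first
        size = feats[0].shape[-2:]
        ups = [F.interpolate(p(f), size=size, mode="bilinear", align_corners=False)
               for p, f in zip(self.proj, feats)]
        return self.classify(self.fuse(torch.cat(ups, dim=1)))
```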
A comprehensive survey by researchers from Shanghai AI Lab and various global institutions outlines the intricate relationship between scientific large language models (Sci-LLMs) and their data foundations, tracing their evolution towards autonomous agents for scientific discovery. The paper establishes a taxonomy for scientific data and knowledge, meticulously reviews over 270 datasets and 190 benchmarks, and identifies critical data challenges alongside future paradigms.
MetaGPT introduces a meta-programming framework that simulates a software company with specialized LLM agents following Standardized Operating Procedures (SOPs) and an assembly line paradigm. The system significantly improves the coherence, accuracy, and executability of generated code for complex software development tasks, achieving state-of-the-art results on benchmarks like HumanEval and MBPP, and outperforming other multi-agent systems on a comprehensive software development dataset.
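A schematic of the SOP-driven assembly line described above, where each role consumes the previous role's structured artifact and produces the next; the `llm` call is a hypothetical stub, not MetaGPT's API.

```python
def run_pipeline(requirement, llm):
    # Each stage mirrors a specialized role passing a structured document downstream.
    prd    = llm(role="ProductManager", task="Write a PRD", context=requirement)
    design = llm(role="Architect", task="Design the system and APIs", context=prd)
    plan   = llm(role="ProjectManager", task="Break the design into tasks", context=design)
    code   = llm(role="Engineer", task="Implement the tasks", context=plan)
    report = llm(role="QAEngineer", task="Write and run tests", context=code)
    return code, report
```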