Researchers at the ZJU-UIUC Institute developed SDiT, a Spiking Diffusion Model with a Transformer backbone, which substantially advances the quality and efficiency of SNN-based image generation. The model achieves significantly improved FID scores on datasets like MNIST and Fashion-MNIST while demonstrating reduced computational cost compared to ANN counterparts.
Researchers at Zhejiang University developed Tram, a token-level retrieval-augmented mechanism that guides the decoder's generation of source code summaries by incorporating external knowledge. This approach established new state-of-the-art BLEU scores on four public benchmarks, including a 1.39-point improvement on Java and 1.53 points on Python, while also enhancing the generation of low-frequency, domain-specific terms.
This paper presents a comprehensive conceptual framework for addressing privacy, security, and trustworthiness challenges in Distributed Wireless Large AI Models (WLAM). It synthesizes existing knowledge and proposes layered protection strategies and integrated approaches across various technical and ethical dimensions to enable robust and responsible deployment.
LightMem introduces a lightweight, three-stage memory system for LLM agents, inspired by human cognition, to manage long-term interactions efficiently. This system achieves state-of-the-art accuracy on conversational benchmarks while significantly reducing computational costs, including up to a 38x reduction in total token usage and a 55x reduction in API calls across various LLM backbones.
Researchers from Shanghai AI Lab and Zhejiang University developed OmniWorld, a multi-domain, multi-modal dataset for 4D world modeling, offering over 300 million frames with comprehensive annotations including RGB, depth, and camera poses. The dataset combines a novel synthetic game environment with curated real-world data. Its accompanying benchmark shows that current 3D geometric and camera-controlled video generation models struggle in complex dynamic scenarios, while fine-tuning on OmniWorld consistently improves their performance across tasks.
M3-Agent, developed by ByteDance Seed, introduces a multimodal AI agent framework that processes continuous video and audio streams, constructs an entity-centric long-term memory, and performs multi-turn reasoning. The system achieves 30.7% accuracy on the M3-Bench-robot dataset and 48.9% on M3-Bench-web, demonstrating enhanced person understanding and cross-modal reasoning capabilities.
Researchers from Beijing University of Posts and Telecommunications, Westlake University, and Zhejiang University, along with the OpenHelix Team, introduce VLA-Adapter, an efficient method for bridging vision-language representations to robotic actions. The approach achieves state-of-the-art performance with a tiny 0.5B-parameter backbone and no robotic data pre-training, reaching a 97.3% average success rate on the LIBERO benchmark with 3x faster inference (219.2 Hz) than comparable methods.
Researchers from DAMO Academy, Hupan Lab, and Zhejiang University developed WorldVLA, an autoregressive framework that unifies robot action generation and environmental state forecasting. This approach integrates vision, language, and action modeling with a world model, yielding superior performance on robotic manipulation tasks and improving both action execution and future state prediction.
A comprehensive synthesis of Large Language Models for automated software development covers the entire model lifecycle, from data curation to autonomous agents, and offers practical guidance derived from empirical experiments on pre-training, fine-tuning, and reinforcement learning, alongside a detailed analysis of challenges and future directions.
Researchers from Zhejiang University and ByteDance introduced CodeVision, a "code-as-tool" framework that equips Multimodal Large Language Models (MLLMs) to programmatically interact with images. The approach significantly improves MLLM robustness by correcting common image corruptions and enables state-of-the-art multi-tool reasoning through emergent tool use and error recovery.
Research establishes a theoretical link between Group Relative Policy Optimization (GRPO) and Direct Preference Optimization (DPO) by reinterpreting GRPO as a contrastive learning objective. This insight leads to "2-GRPO," a variant that achieves comparable mathematical reasoning performance to standard GRPO while reducing training time by over 70% and requiring only 1/8 of the rollouts.
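The intuition behind the GRPO-DPO link can be sketched with the standard group-relative advantage computation (a minimal illustration, not the paper's code; whether population or sample standard deviation is used varies by implementation, and population std is assumed here):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantage: z-score each rollout's reward within its group."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards)  # population std; some implementations use sample std
    if sigma == 0:
        return [0.0] * len(rewards)     # all rollouts tied: no learning signal
    return [(r - mu) / sigma for r in rewards]

# A standard group (here 8 rollouts with binary correctness rewards): each
# advantage is scaled by how surprising the outcome is given the group's
# success rate.
print(grpo_advantages([1, 0, 1, 0, 0, 1, 0, 0]))

# With only 2 rollouts ("2-GRPO"), any reward gap collapses to a single
# preferred/dispreferred pair with advantages +1/-1, structurally the same
# contrastive signal that DPO extracts from a preference pair.
print(grpo_advantages([1, 0]))  # -> [1.0, -1.0]
```

With a group of two, the normalization can only output +1/-1 (or zero on a tie), so each update pushes the policy toward one rollout and away from the other, which is why far fewer rollouts per prompt suffice.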
URSA presents a uniform discrete diffusion framework that incorporates a metric probability path for video generation, enabling iterative global refinement in discrete token space. This framework achieves performance competitive with state-of-the-art continuous diffusion models across text-to-video, image-to-video, and text-to-image benchmarks, while enhancing scalability and multi-task capabilities.
The ReSearch framework enables Large Language Models to integrate multi-step reasoning with external search, learning interactively via reinforcement learning without supervised intermediate steps. It yields substantial performance gains on complex multi-hop question answering benchmarks and reveals emergent self-correction capabilities.
MemOS, a memory operating system for AI systems, redefines memory as a first-class system resource to address current Large Language Model limitations in long-context reasoning, continuous personalization, and knowledge evolution. This framework unifies heterogeneous memory types (plaintext, activation, parameter) using a standardized MemCube unit, achieving superior performance on benchmarks like LoCoMo and PreFEval, and demonstrating robust, low-latency memory operations.