Matrix models, as quantum mechanical systems without explicit spatial dependence, provide valuable insights into higher-dimensional gauge and gravitational theories, especially within the framework of string theory, where they can describe quantum black holes via the holographic principle. Simulating these models allows for exploration of their kinematic and dynamic properties, particularly in parameter regimes that are analytically intractable. In this study, we examine the potential of tensor network techniques for such simulations. Specifically, we construct ground states as matrix product states and analyse features such as their entanglement structure.
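As a minimal illustration of the tensor-network machinery involved, the sketch below (plain NumPy, with a random vector standing in for a matrix-model ground state) splits a state into matrix-product-state tensors by successive SVDs and reads off the entanglement entropy at each bond cut from the Schmidt spectrum. The system size and local dimension are arbitrary choices for illustration, not the paper's setup.

```python
import numpy as np

# Toy stand-in for a ground state of a truncated matrix model:
# n_sites modes, each truncated to d=2 levels (illustrative assumption).
n_sites, d = 8, 2
rng = np.random.default_rng(0)
psi = rng.normal(size=d**n_sites) + 1j * rng.normal(size=d**n_sites)
psi /= np.linalg.norm(psi)

# Sweep left to right, splitting one site off at a time with an SVD.
# The singular values at each cut are the Schmidt coefficients.
tensors, entropies = [], []
rest = psi.reshape(1, -1)
for site in range(n_sites - 1):
    chi_left = rest.shape[0]
    rest = rest.reshape(chi_left * d, -1)
    u, s, vh = np.linalg.svd(rest, full_matrices=False)
    tensors.append(u.reshape(chi_left, d, -1))        # MPS tensor for this site
    p = s**2 / np.sum(s**2)                           # Schmidt spectrum at this cut
    entropies.append(-np.sum(p * np.log(p + 1e-16)))  # von Neumann entropy
    rest = np.diag(s) @ vh                            # absorb into the remainder
tensors.append(rest.reshape(rest.shape[0], d, 1))

print("entanglement entropy across each cut:", np.round(entropies, 3))
```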
The Open X-Embodiment Collaboration released the Open X-Embodiment (OXE) Dataset, a consolidated collection of over 1 million real robot trajectories spanning 22 robot embodiments. This work demonstrates that large RT-X models trained on such diverse data achieve positive transfer and emergent skills across different robot platforms.
This study provides a theoretical foundation for the Representation Misdirection for Unlearning (RMU) method, explaining its impact on token confidence and adversarial robustness. It introduces Adaptive RMU, which overcomes the original method's ineffectiveness in deeper LLM layers by dynamically adjusting the unlearning target, leading to improved and more consistent unlearning performance across various layers.
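A minimal PyTorch-style sketch of the objective being analyzed may help. The coefficient values and the exact adaptive scaling rule below are assumptions for illustration, not the paper's verbatim formulation.

```python
import torch
import torch.nn.functional as F

def rmu_losses(h_update_f, h_update_r, h_frozen_f, h_frozen_r,
               u, beta=1.0, alpha=100.0, adaptive=True):
    """Sketch of an RMU-style objective on one layer's hidden states.

    h_update_*: activations of the model being unlearned (forget/retain batch)
    h_frozen_*: activations of the frozen reference model
    u: fixed random unit vector (the misdirection target direction)
    """
    if adaptive:
        # Adaptive-RMU-style idea: scale the random target by the norm of the
        # frozen model's activation, so the target stays reachable deep in
        # the network (details assumed here).
        c = beta * h_frozen_f.norm(dim=-1, keepdim=True)
    else:
        c = beta  # fixed steering coefficient, as in the original RMU
    forget = F.mse_loss(h_update_f, c * u.expand_as(h_update_f))
    retain = F.mse_loss(h_update_r, h_frozen_r)
    return forget + alpha * retain

# toy usage with random activations (batch=4, seq=16, dim=64)
B, T, D = 4, 16, 64
u = F.normalize(torch.randn(D), dim=0)
hf, hr = torch.randn(B, T, D), torch.randn(B, T, D)
loss = rmu_losses(hf.clone().requires_grad_(), hr.clone().requires_grad_(),
                  hf.detach(), hr.detach(), u)
loss.backward()
```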
Researchers from JAIST, VNU-UET, Monash University, and RIKEN demonstrate that unlearned Large Language Models exhibit fragility, misbehaving when benign queries inadvertently contain forget-tokens. They introduce Random Noise Augmentation (RNA), a solution that recovers an average of 66.3% and 51.7% accuracy for Representation Misdirection and Preference Optimization unlearning methods, respectively, on perturbed evaluation tasks while preserving core model performance.
Researchers at Sakana AI and collaborating institutions introduced Temporally Adaptive Interpolated Distillation (TAID), a method that dynamically interpolates student and teacher distributions to overcome challenges in knowledge transfer. This approach enabled the creation of TAID-LLM-1.5B, which achieved a new state-of-the-art score (52.27 on LightEval) among models under 2B parameters, and TAID-VLM-2B, outperforming larger vision-language models.
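The interpolation idea can be sketched in a few lines of PyTorch. The probability-space interpolation and the linear schedule below are simplifying assumptions; TAID adapts the schedule dynamically during training.

```python
import torch
import torch.nn.functional as F

def taid_loss(student_logits, teacher_logits, t, T=1.0):
    """Sketch of a temporally interpolated distillation target.

    t in [0, 1] is training progress; the target slides from the student's
    own distribution (t=0) toward the teacher's (t=1). Interpolating in
    probability space and a linear schedule are assumptions here.
    """
    lam = t  # simple monotone schedule; TAID adjusts this adaptively
    p_student = F.softmax(student_logits.detach() / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    p_target = (1 - lam) * p_student + lam * p_teacher
    log_q = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_q, p_target, reduction="batchmean")  # KL(target || student)

# toy usage
s = torch.randn(8, 1000, requires_grad=True)
te = torch.randn(8, 1000)
loss = taid_loss(s, te, t=0.3)
loss.backward()
```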
A theoretical framework demonstrates how the quantum metric, a core concept in quantum geometry, modifies Liouville's theorem and the resulting dynamics of chiral kinetic theory, with implications for a broad range of physical systems.
Researchers at the Beijing Institute of Technology and collaborators developed Frequency Dynamic Convolution (FDConv), a method for adaptive deep learning models that constructs diverse convolution kernel weights directly in the Fourier domain. FDConv achieves competitive or superior performance across object detection, instance segmentation, and semantic segmentation benchmarks while significantly reducing the parameter overhead compared to prior dynamic convolution techniques.
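A minimal sketch of the core idea, parameterizing a convolution kernel by learnable Fourier coefficients and recovering spatial weights with an inverse FFT, is shown below. FDConv's frequency-band partitioning, which yields multiple diverse kernels from one parameter budget, is omitted, so treat this as a simplification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FourierKernelConv2d(nn.Module):
    """Sketch: learn a conv kernel as Fourier coefficients (assumed form)."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.k = k
        # rfft2 of a (k x k) kernel has shape (k, k//2 + 1), complex-valued
        self.spec = nn.Parameter(
            torch.randn(out_ch, in_ch, k, k // 2 + 1, 2) * 0.1)

    def forward(self, x):
        spec = torch.view_as_complex(self.spec)
        weight = torch.fft.irfft2(spec, s=(self.k, self.k))  # spatial kernel
        return F.conv2d(x, weight, padding=self.k // 2)

x = torch.randn(2, 16, 32, 32)
y = FourierKernelConv2d(16, 32)(x)
print(y.shape)  # torch.Size([2, 32, 32, 32])
```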
A similarity-based method leverages penultimate feature representations to detect and rectify noisy labels in deep learning datasets. This post-hoc, model-agnostic approach demonstrates superior robustness compared to confidence and gradient-based methods, particularly against systematic ambiguity and concentrated noise, leading to improved model generalization.
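One simple instantiation of the idea, flagging a sample when its label disagrees with most of its nearest neighbors in penultimate-feature space, is sketched below; the paper's actual similarity score and rectification rule may differ.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def flag_noisy_labels(features, labels, k=10):
    """Flag samples whose label disagrees with most of their neighbors.

    features: (N, D) penultimate-layer representations from a trained net
    labels:   (N,) observed (possibly noisy) labels
    """
    nn_index = NearestNeighbors(n_neighbors=k + 1).fit(features)
    _, idx = nn_index.kneighbors(features)      # idx[:, 0] is the point itself
    neighbor_labels = labels[idx[:, 1:]]        # (N, k)
    agree = (neighbor_labels == labels[:, None]).mean(axis=1)
    suspected = agree < 0.5                     # likely mislabeled
    # simple rectification: replace with the neighbors' majority label
    corrected = labels.copy()
    for i in np.where(suspected)[0]:
        vals, counts = np.unique(neighbor_labels[i], return_counts=True)
        corrected[i] = vals[np.argmax(counts)]
    return suspected, corrected

# toy usage: two Gaussian blobs with 10% flipped labels
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 8)), rng.normal(4, 1, (100, 8))])
y = np.array([0] * 100 + [1] * 100)
noisy = y.copy(); flip = rng.choice(200, 20, replace=False); noisy[flip] ^= 1
suspected, corrected = flag_noisy_labels(X, noisy)
print("flagged:", suspected.sum(), "accuracy after fix:", (corrected == y).mean())
```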
SLTrain proposes a reparameterization for pretraining large language models (LLMs) by representing weight matrices as a sum of low-rank and sparse components. This method, developed by researchers from RIKEN AIP, University of Minnesota, and Microsoft, reduces memory requirements by up to 73% for LLaMA 7B models and halves trainable parameters for LLaMA 1B while maintaining perplexity performance comparable to full-rank pretraining.
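The reparameterization itself is compact. Below is a hedged sketch of a low-rank-plus-sparse linear layer; the support selection and initialization scales are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LowRankPlusSparseLinear(nn.Module):
    """Sketch of an SLTrain-style layer: W = B @ A + S.

    A, B are dense low-rank factors; S is sparse with a fixed random
    support, so only its nonzero values are trained.
    """
    def __init__(self, d_in, d_out, rank=32, sparsity=0.03):
        super().__init__()
        self.B = nn.Parameter(torch.randn(d_out, rank) / rank**0.5)
        self.A = nn.Parameter(torch.randn(rank, d_in) / d_in**0.5)
        nnz = int(sparsity * d_in * d_out)
        idx = torch.randperm(d_in * d_out)[:nnz]      # fixed random support
        self.register_buffer("rows", idx // d_in)
        self.register_buffer("cols", idx % d_in)
        self.values = nn.Parameter(torch.zeros(nnz))  # trainable nonzeros

    def forward(self, x):
        out = x @ self.A.T @ self.B.T                  # low-rank path
        S = torch.zeros(self.B.shape[0], self.A.shape[1], device=x.device)
        S[self.rows, self.cols] = self.values          # scatter sparse part
        return out + x @ S.T

layer = LowRankPlusSparseLinear(512, 512)
y = layer(torch.randn(4, 512))
print(y.shape)  # torch.Size([4, 512])
```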
We introduce a hybrid quantum-classical framework for efficiently implementing approximate unitary dilations of non-unitary operators with enhanced noise resilience. The method embeds a target non-unitary operator into a subblock of a unitary matrix generated by a parameterized quantum circuit with universal expressivity, while a classical optimizer adjusts circuit parameters under the global unitary constraint. As a representative application, we consider the non-unitary propagator of a Lindbladian superoperator acting on the vectorized density matrix, which is relevant for simulating open quantum systems. We further validate the approach experimentally on superconducting devices in the Quafu quantum cloud computing cluster. Compared with standard dilation protocols, our method significantly reduces quantum resource requirements and improves robustness against device noise, achieving high-fidelity simulation. Its generality also enables compatibility with non-Markovian dynamics and Kraus-operator-based evolutions, providing a practical pathway for the noise-resilient simulation of non-unitary processes on near-term quantum hardware.
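For context, the standard one-ancilla dilation that the variational method is benchmarked against can be written down exactly. The sketch below constructs it classically and verifies unitarity; the parameterized circuit in the paper is instead trained so that its top-left block approximates the target in the same sense.

```python
import numpy as np
from scipy.linalg import sqrtm

def exact_dilation(A):
    """Standard one-ancilla unitary dilation of a contraction A (||A|| <= 1):
        U = [[A,              sqrt(I - A A†)],
             [sqrt(I - A† A),        -A†   ]]
    """
    n = A.shape[0]
    I = np.eye(n)
    D1 = sqrtm(I - A @ A.conj().T)
    D2 = sqrtm(I - A.conj().T @ A)
    return np.block([[A, D1], [D2, -A.conj().T]])

# toy non-unitary operator, rescaled to be a contraction
rng = np.random.default_rng(1)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
A /= np.linalg.norm(A, 2) * 1.1
U = exact_dilation(A)
print(np.allclose(U @ U.conj().T, np.eye(4), atol=1e-10))  # unitary
print(np.allclose(U[:2, :2], A))                            # embeds A
```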
PromptKD introduces an unsupervised prompt distillation framework for Vision-Language Models like CLIP, enabling knowledge transfer from large teachers to lightweight students using unlabeled domain images. It achieves state-of-the-art performance, improving harmonic mean accuracy by 3.76% over PromptSRC across 11 datasets, while significantly reducing inference costs by pre-storing teacher text features.
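A hedged sketch of the distillation step: teacher text features are computed once and cached, and the student's class distribution over unlabeled images is matched to the teacher's via KL divergence. The loss form and temperature below are assumptions.

```python
import torch
import torch.nn.functional as F

def prompt_distill_loss(student_img_feat, teacher_img_feat,
                        teacher_text_feat, tau=1.0):
    """Sketch of a PromptKD-style objective (details assumed).

    teacher_text_feat is computed once and cached, so only image encoders
    run during distillation. Both models score unlabeled images against
    the same cached class-text features.
    """
    t_img = F.normalize(teacher_img_feat, dim=-1)
    s_img = F.normalize(student_img_feat, dim=-1)
    txt = F.normalize(teacher_text_feat, dim=-1)
    p_teacher = F.softmax(t_img @ txt.T / tau, dim=-1)
    log_q = F.log_softmax(s_img @ txt.T / tau, dim=-1)
    return F.kl_div(log_q, p_teacher, reduction="batchmean")

# toy usage: batch of 32 images, 100 classes, 512-d embeddings
loss = prompt_distill_loss(torch.randn(32, 512, requires_grad=True),
                           torch.randn(32, 512), torch.randn(100, 512))
loss.backward()
```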
The Mixture of Experts (MoE) architecture reduces the training and inference cost significantly compared to a dense model of equivalent capacity. Upcycling is an approach that initializes and trains an MoE model using a pre-trained dense model. While upcycling leads to initial performance gains, the training progresses slower than when trained from scratch, leading to suboptimal performance in the long term. We propose Drop-Upcycling - a method that effectively addresses this problem. Drop-Upcycling combines two seemingly contradictory approaches: utilizing the knowledge of pre-trained dense models while statistically re-initializing some parts of the weights. This approach strategically promotes expert specialization, significantly enhancing the MoE model's efficiency in knowledge acquisition. Extensive large-scale experiments demonstrate that Drop-Upcycling significantly outperforms previous MoE construction methods in the long term, specifically when training on hundreds of billions of tokens or more. As a result, our MoE model with 5.9B active parameters achieves comparable performance to a 13B dense model in the same model family, while requiring approximately 1/4 of the training FLOPs. All experimental resources, including source code, training data, model checkpoints and logs, are publicly available to promote reproducibility and future research on MoE.
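The initialization can be sketched as follows; re-initializing along the intermediate dimension with mean/std-matched resampling is an assumption here, meant only to convey the shape of the idea.

```python
import torch

def drop_upcycle(dense_w_in, dense_w_out, n_experts=8, drop_ratio=0.5):
    """Sketch of a Drop-Upcycling-style initialization for one FFN.

    dense_w_in:  (d_ff, d_model) up-projection of the pre-trained dense FFN
    dense_w_out: (d_model, d_ff) down-projection
    Each expert copies the dense weights, then re-initializes a random
    drop_ratio slice of the intermediate (d_ff) dimension by resampling
    from the dense weights' empirical mean/std.
    """
    d_ff = dense_w_in.shape[0]
    experts = []
    for _ in range(n_experts):
        w_in, w_out = dense_w_in.clone(), dense_w_out.clone()
        drop = torch.randperm(d_ff)[: int(drop_ratio * d_ff)]  # per-expert slice
        w_in[drop] = torch.randn(len(drop), w_in.shape[1]) \
            * dense_w_in.std() + dense_w_in.mean()
        w_out[:, drop] = torch.randn(w_out.shape[0], len(drop)) \
            * dense_w_out.std() + dense_w_out.mean()
        experts.append((w_in, w_out))
    return experts

experts = drop_upcycle(torch.randn(2048, 512), torch.randn(512, 2048))
print(len(experts), experts[0][0].shape)  # 8 torch.Size([2048, 512])
```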
Simulating response properties of molecules is crucial for interpreting experimental spectroscopies and accelerating materials design. However, it remains a long-standing computational challenge for electronic structure methods on classical computers. While quantum computers hold the promise to solve this problem more efficiently in the long run, existing quantum algorithms requiring deep quantum circuits are infeasible for near-term noisy quantum processors. Here, we introduce a pragmatic variational quantum response (VQR) algorithm for response properties, which circumvents the need for deep quantum circuits. Using this algorithm, we report the first simulation of linear response properties of molecules including dynamic polarizabilities and absorption spectra on a superconducting quantum processor. Our results indicate that a large class of important dynamical properties such as Green's functions are within the reach of near-term quantum hardware using this algorithm in combination with suitable error mitigation techniques.
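For reference, the linear-response target here is the textbook sum-over-states dynamic polarizability (written for real transition dipoles). This is the quantity VQR approximates variationally, not the algorithm itself:

```latex
% Textbook sum-over-states form of the dynamic polarizability
% (atomic units; \hat{\mu} the dipole operator):
\alpha_{ab}(\omega) = \sum_{n \neq 0}
  \frac{2\,\omega_{n0}\,
        \langle 0|\hat{\mu}_a|n\rangle \langle n|\hat{\mu}_b|0\rangle}
       {\omega_{n0}^{2} - \omega^{2}},
\qquad \omega_{n0} = E_n - E_0 .
```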
This work establishes a classification for the typical entanglement spectra of symmetric quantum states by providing a physical realization for the Laguerre Symplectic Ensemble (LSE) using the concept of symmetry fractionalization. It demonstrates that the entanglement spectrum of any such state universally decomposes into blocks governed by the three fundamental Laguerre ensembles, based on properties of the symmetry group's irreducible representations.
This work introduces 3D Question Answering (3D-QA), a task where models answer free-form questions about 3D indoor scenes and localize relevant objects. Researchers at Kyoto University, ATR, and RIKEN AIP created ScanQA, a large-scale human-annotated dataset of over 41,000 question-answer pairs with 3D object groundings, and proposed an end-to-end baseline model that outperforms 2D and pipeline-based 3D approaches.
The paper establishes a holographic duality for the quantum work distribution and the Tasaki-Crooks fluctuation theorem (TC-FT) within the AdS/CFT correspondence. It provides a bulk prescription for non-equilibrium work by mapping boundary characteristic functions to gravitational Schwinger-Keldysh path integrals, verifying the TC-FT in a perturbative AdS$_3$/CFT$_2$ model where average work correlates with Brown-York energy changes.
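For orientation, the boundary statements being dualized are standard and not specific to the holographic construction:

```latex
% Tasaki-Crooks fluctuation theorem: forward and backward work
% distributions are related by
\frac{P_F(W)}{P_B(-W)} = e^{\beta (W - \Delta F)} ;
% equivalently, for the characteristic functions
% G(u) = \langle e^{iuW} \rangle,
G_F(u) = e^{-\beta \Delta F}\, G_B(-u + i\beta) .
```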
In-context Learning (ICL) is an emerging few-shot learning paradigm based on modern Language Models (LMs), yet its inner mechanism remains unclear. In this paper, we investigate the mechanism through a novel perspective of information removal. Specifically, we demonstrate that in the zero-shot scenario, LMs encode queries into non-selective representations in hidden states that contain information for all possible tasks, so the model produces arbitrary outputs without focusing on the intended task and achieves near-zero accuracy. Meanwhile, we find that selectively removing specific information from the hidden states with a low-rank filter effectively steers LMs toward the intended task. Building on these findings, and by measuring the hidden states with carefully designed metrics, we observe that few-shot ICL effectively simulates such a task-oriented information-removal process: it selectively removes redundant information from the entangled non-selective representations and improves the output based on the demonstrations, which constitutes a key mechanism underlying ICL. Moreover, we identify essential attention heads that induce the removal operation, termed Denoising Heads. Ablation experiments that block the information-removal operation during inference significantly degrade ICL accuracy, especially when the correct label is absent from the few-shot demonstrations, confirming the critical role of both the information-removal mechanism and the Denoising Heads.
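The removal operation itself is just a subspace projection; the sketch below shows it with an arbitrary orthonormal basis U, while the paper's contribution lies in how the low-rank filter is identified.

```python
import torch

def remove_subspace(hidden, U):
    """Project a low-rank subspace out of hidden states: h' = h - U U^T h.

    hidden: (..., D) hidden states; U: (D, r) orthonormal basis of the
    directions to remove. How the paper selects U is more involved;
    this shows only the removal operation itself.
    """
    return hidden - (hidden @ U) @ U.T

# toy usage: remove a rank-4 subspace from 768-d hidden states
D, r = 768, 4
U, _ = torch.linalg.qr(torch.randn(D, r))  # orthonormal columns
h = torch.randn(2, 10, D)
h_filtered = remove_subspace(h, U)
print(torch.allclose(h_filtered @ U, torch.zeros(2, 10, r), atol=1e-5))
```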
Why and when is deep better than shallow? We answer this question in a framework that is agnostic to network implementation. We formulate a deep model as an abstract state-transition semigroup acting on a general metric space, and separate the implementation (e.g., ReLU nets, transformers, and chain-of-thought) from the abstract state transition. We prove a bias-variance decomposition in which the variance depends only on the abstract depth-$k$ network and not on the implementation (Theorem 1). We further split the bounds into output and hidden parts to tie the depth dependence of the variance to the metric entropy of the state-transition semigroup (Theorem 2). We then investigate implementation-free conditions under which the variance grows polynomially or logarithmically with depth (Section 4). Combining these with exponential or polynomial bias decay identifies four canonical bias-variance trade-off regimes (EL/EP/PL/PP) and produces explicit optimal depths $k^\ast$. Across regimes, $k^\ast>1$ typically holds, giving a rigorous form of depth supremacy. The lowest generalization error bound is achieved under the EL regime (exp-decay bias + log-growth variance), explaining why and when deep is better, especially for iterative or hierarchical concept classes such as neural ODEs, diffusion/score models, and chain-of-thought reasoning.
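An illustrative EL-regime calculation (constants assumed, not the paper's bounds) shows why the optimal depth exceeds one:

```latex
% Illustrative EL-regime trade-off: bias decays exponentially in depth k,
% variance grows logarithmically,
\mathrm{err}(k) \;\lesssim\;
  \underbrace{C_b\, e^{-a k}}_{\text{bias}}
  \;+\; \underbrace{C_v \log (1+k)}_{\text{variance}} .
% The derivative at k = 1 is  -a C_b e^{-a} + C_v / 2,  which is negative
% whenever  a C_b e^{-a} > C_v / 2:  the bound is still decreasing at
% depth one, so the optimal depth satisfies  k^\ast > 1.
```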
Dilated convolution, which expands the receptive field by inserting gaps between its consecutive elements, is widely employed in computer vision. In this study, we propose three strategies to improve individual phases of dilated convolution from the viewpoint of spectral analysis. Departing from the conventional practice of fixing a global dilation rate as a hyperparameter, we introduce Frequency-Adaptive Dilated Convolution (FADC), which dynamically adjusts dilation rates spatially based on local frequency components. Subsequently, we design two plug-in modules to directly enhance effective bandwidth and receptive field size. The Adaptive Kernel (AdaKern) module decomposes convolution weights into low-frequency and high-frequency components, dynamically adjusting the ratio between these components on a per-channel basis. By increasing the high-frequency part of convolution weights, AdaKern captures more high-frequency components, thereby improving effective bandwidth. The Frequency Selection (FreqSelect) module optimally balances high- and low-frequency components in feature representations through spatially variant reweighting. It suppresses high frequencies in the background to encourage FADC to learn a larger dilation, thereby enlarging the receptive field. Extensive experiments on segmentation and object detection consistently validate the efficacy of our approach. The code is publicly available at this https URL
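A minimal sketch of the AdaKern idea, using the simplest possible low/high-frequency split (the spatial DC component versus the residual) with a learned per-channel high-frequency gain; the paper's decomposition and its dynamic adjustment are richer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaKern(nn.Module):
    """Sketch of AdaKern-style kernel reweighting (details assumed).

    The kernel is split into a low-frequency (DC) part and a
    high-frequency remainder; a learned per-channel ratio rescales the
    high-frequency part before the kernel is reassembled.
    """
    def __init__(self, out_ch):
        super().__init__()
        self.hf_gain = nn.Parameter(torch.ones(out_ch, 1, 1, 1))

    def forward(self, weight):                         # weight: (O, I, k, k)
        low = weight.mean(dim=(-2, -1), keepdim=True)  # DC component
        high = weight - low                            # high-frequency residue
        return low + self.hf_gain * high

conv = nn.Conv2d(16, 32, 3, padding=1)
adakern = AdaKern(32)
x = torch.randn(2, 16, 32, 32)
y = F.conv2d(x, adakern(conv.weight), conv.bias, padding=1)
print(y.shape)  # torch.Size([2, 32, 32, 32])
```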