Institute of Science Tokyo
We present LatentCSI, a novel method for generating images of the physical environment from WiFi CSI measurements that leverages a pretrained latent diffusion model (LDM). Unlike prior approaches that rely on complex and computationally intensive techniques such as GANs, our method employs a lightweight neural network to map CSI amplitudes directly into the latent space of an LDM. We then apply the LDM's denoising diffusion model to the latent representation with text-based guidance before decoding using the LDM's pretrained decoder to obtain a high-resolution image. This design bypasses the challenges of pixel-space image generation and avoids the explicit image encoding stage typically required in conventional image-to-image pipelines, enabling efficient and high-quality image synthesis. We validate our approach on two datasets: a wide-band CSI dataset we collected with off-the-shelf WiFi devices and cameras; and a subset of the publicly available MM-Fi dataset. The results demonstrate that LatentCSI outperforms baselines of comparable complexity trained directly on ground-truth images in both computational efficiency and perceptual quality, while additionally providing practical advantages through its unique capacity for text-guided controllability.
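A minimal sketch of how such a pipeline might be wired on top of a pretrained Stable-Diffusion-style LDM (e.g. loaded via Hugging Face diffusers); the CSI encoder architecture, its dimensions, and the strength schedule are our assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class CSIEncoder(nn.Module):
    """Lightweight MLP mapping CSI amplitudes to a 4x64x64 SD latent
    (sizes illustrative, not the paper's exact network)."""
    def __init__(self, csi_dim: int = 2048):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(csi_dim, 1024), nn.ReLU(),
                                 nn.Linear(1024, 4 * 64 * 64))

    def forward(self, csi_amp):
        return self.net(csi_amp).view(-1, 4, 64, 64)

@torch.no_grad()
def csi_to_image(csi_amp, encoder, unet, vae, scheduler, text_emb,
                 steps=50, strength=0.6):
    z = encoder(csi_amp)                    # CSI -> latent; no image encoder
    scheduler.set_timesteps(steps)
    t0 = steps - int(strength * steps)      # img2img-style partial chain
    timesteps = scheduler.timesteps[t0:]
    z = scheduler.add_noise(z, torch.randn_like(z), timesteps[:1])
    for t in timesteps:                     # text-guided denoising
        eps = unet(z, t, encoder_hidden_states=text_emb).sample
        z = scheduler.step(eps, t, z).prev_sample
    return vae.decode(z / vae.config.scaling_factor).sample
```

Here `unet`, `vae`, and `scheduler` would come from a pretrained LDM checkpoint and `text_emb` from its text encoder; the key point is that the CSI-predicted latent replaces the VAE-encoded image an ordinary image-to-image pipeline would start from.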
By reformulating multi-head attention to reveal an intrinsic FFN-like structure, UMoE introduces a unified Mixture-of-Experts architecture that integrates shared experts across both attention and FFN layers. This approach consistently improves language modeling perplexity and zero-shot performance across various tasks, while enhancing parameter efficiency in large language models.
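An illustrative sketch of the core idea as we read it: softmax token mixing happens first, and the subsequent FFN-like transform on the mixed tokens is the slot a shared, routed expert can fill. The exact parameterization below is our assumption, not UMoE's:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedExpertFFN(nn.Module):
    """A two-layer FFN 'expert' reusable by attention and FFN layers."""
    def __init__(self, d, hidden):
        super().__init__()
        self.fc1, self.fc2 = nn.Linear(d, hidden), nn.Linear(hidden, d)
    def forward(self, x):
        return self.fc2(F.relu(self.fc1(x)))

class UMoEAttentionSketch(nn.Module):
    def __init__(self, d, hidden, n_experts, top_k=2):
        super().__init__()
        self.q, self.k = nn.Linear(d, d), nn.Linear(d, d)
        self.experts = nn.ModuleList(SharedExpertFFN(d, hidden)
                                     for _ in range(n_experts))
        self.router = nn.Linear(d, n_experts)
        self.top_k = top_k

    def forward(self, x):                                  # x: (B, S, d)
        attn = torch.softmax(self.q(x) @ self.k(x).transpose(-2, -1)
                             / x.shape[-1] ** 0.5, dim=-1)
        mixed = attn @ x                                   # token mixing first
        gate = torch.softmax(self.router(mixed), dim=-1)
        val, idx = gate.topk(self.top_k, dim=-1)           # top-k routing
        out = torch.zeros_like(mixed)
        for j in range(self.top_k):                        # dispatch to experts
            for e, expert in enumerate(self.experts):
                mask = idx[..., j] == e
                if mask.any():
                    out[mask] += val[..., j][mask].unsqueeze(-1) * expert(mixed[mask])
        return out
```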
TrustJudge introduces a probabilistic framework to systematically mitigate two fundamental inconsistencies—score-comparison and pairwise transitivity—within LLM-as-a-judge evaluation. The method significantly reduces conflict ratios and non-transitivity rates by employing distribution-sensitive scoring and likelihood-aware aggregation, while maintaining or enhancing evaluation accuracy across various large language models and tasks.
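A toy illustration of distribution-sensitive scoring (our simplification of the idea, not TrustJudge's full aggregation): weight each rating by the judge's probability mass over it instead of taking the single most likely rating token.

```python
import math

def expected_score(rating_logprobs: dict) -> float:
    """`rating_logprobs` maps rating strings (e.g. "1".."5") to
    log-probabilities from the judge LLM; obtaining them, e.g. via an
    API's top-logprobs, is assumed."""
    probs = {r: math.exp(lp) for r, lp in rating_logprobs.items()}
    z = sum(probs.values())                 # renormalize over rating tokens
    return sum(int(r) * p / z for r, p in probs.items())

# A judge split 40/60 between ratings 3 and 4 yields 3.6, not a hard 4,
# so two such responses no longer collapse into a spurious tie or flip.
print(expected_score({"3": math.log(0.4), "4": math.log(0.6)}))  # -> 3.6
```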
AdaBlock-dLLM introduces a training-free scheduler that dynamically adjusts block sizes in diffusion-based Large Language Models (dLLMs) during inference. This approach improves generation accuracy by up to 5.3% while maintaining or enhancing throughput, particularly when integrated with Key-Value caching.
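A toy scheduler in the spirit of that description: commit a larger semi-autoregressive block when the denoiser is confident about a long run of upcoming tokens, a smaller one otherwise. The confidence signal, threshold, and bounds are our assumptions, not the paper's exact rule:

```python
import torch

def adaptive_block_size(token_confidence: torch.Tensor,
                        lo: int = 4, hi: int = 32, tau: float = 0.9) -> int:
    """Length of the leading confident run, clamped to [lo, hi]."""
    confident = (token_confidence >= tau).int().cumprod(dim=0)
    run = int(confident.sum().item())
    return max(lo, min(hi, run))

# Example: the first 6 positions clear the threshold, so the next block
# spans 6 tokens instead of a fixed size.
conf = torch.tensor([0.99, 0.97, 0.95, 0.93, 0.96, 0.92, 0.40, 0.80])
print(adaptive_block_size(conf))  # -> 6
```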
Researchers at Institute of Science Tokyo and AIST introduced a "transform-and-retain" paradigm for LLM pre-training data, actively rewriting existing corpora with LLMs to enhance quality. This approach led to a 17.0 pass@1 point increase on HumanEval for code and a 12.4 accuracy point increase on GSM8K for math in continual pre-training of Llama-3.1-8B.
Researchers at Sakana AI developed Transformer-Squared, a framework enabling large language models to self-adapt dynamically to diverse tasks in real-time. It leverages Singular Value Fine-tuning (SVF) to create highly efficient, composable "expert" vectors and employs a two-pass inference mechanism, demonstrating performance gains over LoRA and the ability to transfer experts across different base models.
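A minimal sketch of the SVF idea: freeze the SVD factors of a pretrained weight and learn only a per-singular-value scale vector, so each "expert" is a single vector. Anything beyond W' = U diag(z * S) V^T below is our scaffolding:

```python
import torch
import torch.nn as nn

class SVFLinear(nn.Module):
    def __init__(self, weight: torch.Tensor):
        super().__init__()
        U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
        self.register_buffer("U", U)        # frozen factors
        self.register_buffer("S", S)
        self.register_buffer("Vh", Vh)
        self.z = nn.Parameter(torch.ones_like(S))  # the expert vector

    def forward(self, x):
        w = self.U @ torch.diag(self.z * self.S) @ self.Vh
        return x @ w.T

layer = SVFLinear(torch.randn(64, 128))
print(layer(torch.randn(2, 128)).shape)             # torch.Size([2, 64])
print(sum(p.numel() for p in layer.parameters()))   # 64 trainable scales
```

Because an expert is just `z`, experts are cheap to store, composable by interpolation, and plausibly transferable wherever the frozen factors are compatible.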
Kawamura et al. introduce PowerCLIP, a pre-training framework that aligns combinations of image regions with structured textual phrases to enhance compositional understanding in vision-language models. It achieves state-of-the-art performance, including a 7.1% average Top-1 accuracy gain over CLIP on zero-shot classification and a 4.3% average Recall@1 gain on image-text retrieval benchmarks.
FreeRet is a training-free framework that transforms any off-the-shelf Multimodal Large Language Model (MLLM) into a competitive two-stage retriever, achieving state-of-the-art performance on multimodal benchmarks without requiring additional training or data. The approach demonstrates that MLLMs can efficiently serve both as embedders for candidate search and as precise rerankers.
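A skeleton of the two-stage flow as the summary describes it; how the MLLM yields embeddings and the `rerank_fn(query, candidate) -> float` judgment call are both stand-ins:

```python
import numpy as np

def two_stage_retrieve(query, query_emb, cand_embs, candidates,
                       rerank_fn, k=10):
    """Stage 1: coarse cosine-similarity search over MLLM embeddings.
    Stage 2: precise reranking of the shortlist by the same MLLM."""
    sims = cand_embs @ query_emb / (
        np.linalg.norm(cand_embs, axis=1) * np.linalg.norm(query_emb) + 1e-8)
    shortlist = np.argsort(-sims)[:k]                  # stage 1: recall
    scored = [(int(i), rerank_fn(query, candidates[i])) for i in shortlist]
    scored.sort(key=lambda t: -t[1])                   # stage 2: precision
    return [i for i, _ in scored]
```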
This study proposes a new approach to quantum state recovery following measurement. Specifically, we introduce a special operation that transfers the probability amplitude of the quantum state into its orthogonal complement. This operation is followed by a measurement performed on that orthogonal subspace, enabling the undisturbed original quantum state to be regained. Remarkably, the recovery is achieved without the post-measurement operation depending on the measurement outcome, thus allowing recovery without historical dependence. This constitutes a highly nontrivial phenomenon. From an operational perspective, since the no-cloning theorem forbids perfect and probabilistic cloning of arbitrary quantum states, and traditional post-measurement reversal methods typically rely on operations contingent on the measurement outcome, this result questions fundamental assumptions about the necessity of historical dependence. From an informational perspective, since this recovery method erases the information about the measurement outcome, it is intriguing that the information can be erased without accessing the outcome itself. These results establish the operational and informational non-triviality of the scheme, formulated in a direct-sum Hilbert space framework.
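A toy picture of the direct-sum setting, in our own notation and subject to suitable normalization (not necessarily the paper's construction):

```latex
% The system lives in H = H_0 \oplus H_1; an operation T moves amplitude
% into the orthogonal complement H_1 before measuring there.
\[
  \mathcal{H} = \mathcal{H}_0 \oplus \mathcal{H}_1, \qquad
  T\bigl(\lvert\psi\rangle \oplus 0\bigr)
    = \sqrt{1-\epsilon}\,\lvert\psi\rangle \oplus \sqrt{\epsilon}\,\lvert\psi\rangle .
\]
```

If the subsequent measurement has Kraus operators of the form $I_{\mathcal{H}_0} \oplus M_i$, i.e. it acts only on $\mathcal{H}_1$, then the $\mathcal{H}_0$ component stays proportional to $\lvert\psi\rangle$ for every outcome $i$, so a single outcome-independent map back onto $\mathcal{H}_0$ suffices to recover the state.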
FlashGMM presents a redesigned entropy coding algorithm for learned image compression that resolves the computational bottleneck of Gaussian Mixture Models (GMMs). This approach eliminates the need for CDF lookup tables, achieving up to a 90x speedup over prior GMM implementations while slightly improving rate-distortion performance by 0.26% BD-Rate.
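A sketch of the underlying idea: evaluate the mixture CDF on the fly, which is exactly the quantity a range coder needs, instead of materializing per-symbol lookup tables. FlashGMM's exact numerics and quantization are not reproduced here:

```python
import math

def gmm_cdf(x, weights, means, scales):
    """CDF of a Gaussian mixture at x, via the error function."""
    return sum(w * 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
               for w, m, s in zip(weights, means, scales))

# A range coder encodes integer symbol y with the interval
# [gmm_cdf(y - 0.5, ...), gmm_cdf(y + 0.5, ...)).
lo = gmm_cdf(-0.5, [0.6, 0.4], [0.0, 3.0], [1.0, 2.0])
hi = gmm_cdf(+0.5, [0.6, 0.4], [0.0, 3.0], [1.0, 2.0])
print(hi - lo)  # probability mass assigned to symbol 0
```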
MixtureVitae introduces an open, web-scale pretraining dataset that minimizes legal and ethical risks by using permissive-first text sources, augmented with high-quality instruction and reasoning data. Models trained on this corpus achieve performance competitive with those trained on non-permissive data, and demonstrate an order-of-magnitude improvement in math and coding abilities over other permissive datasets.
Preconditioning is widely used in machine learning to accelerate convergence on the empirical risk, yet its role on the expected risk remains underexplored. In this work, we investigate how preconditioning affects feature learning and generalization performance. We first show that the input information available to the model is conveyed solely through the Gram matrix defined by the preconditioner's metric, thereby inducing a controllable spectral bias on feature learning. Concretely, instantiating the preconditioner as the p-th power of the input covariance matrix within a single-index teacher model, we prove that the exponent p and the alignment between the teacher and the input spectrum are crucial factors for generalization. We further investigate how the interplay between these factors influences feature learning from three complementary perspectives: (i) robustness to noise, (ii) out-of-distribution generalization, and (iii) forward knowledge transfer. Our results indicate that the learned feature representations closely mirror the spectral bias introduced by the preconditioner -- favoring components that are emphasized and exhibiting reduced sensitivity to those that are suppressed. Crucially, we demonstrate that generalization is significantly enhanced when this spectral bias is aligned with that of the teacher.
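A minimal numerical sketch of the setup: gradient descent preconditioned by the p-th power of the input covariance in a single-index teacher problem. The tanh link, learning rate, and p value are our illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 32))              # inputs
w_star = rng.normal(size=32)                 # single-index teacher direction
y = np.tanh(X @ w_star)                      # teacher outputs (link assumed)

Sigma = X.T @ X / len(X)                     # input covariance
eigval, eigvec = np.linalg.eigh(Sigma)
p = -0.5                                     # illustrative exponent
P = eigvec @ np.diag(eigval ** p) @ eigvec.T # preconditioner Sigma^p

w, lr = np.zeros(32), 0.1
for _ in range(200):
    t = np.tanh(X @ w)
    grad = X.T @ ((1 - t ** 2) * (t - y)) / len(X)
    w -= lr * P @ grad                       # step in the Sigma^p metric
print(np.corrcoef(np.tanh(X @ w), y)[0, 1])  # fit to the teacher
```

Varying p reweights which covariance directions the model learns first, which is the spectral bias the abstract refers to.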
Researchers from Shanghai AI Laboratory, Institute of Science Tokyo, and Nanjing University developed EXPVID, the first benchmark for scientific experiment video understanding and reasoning, leveraging JoVE videos and peer-reviewed papers. The benchmark evaluates Multimodal Large Language Models (MLLMs) across perception, procedural understanding, and scientific reasoning tasks, revealing that proprietary models like GPT-5 and Gemini-2.5 significantly outperform open-source counterparts in complex scientific contexts.
The Mixture of Experts (MoE) architecture reduces the training and inference cost significantly compared to a dense model of equivalent capacity. Upcycling is an approach that initializes and trains an MoE model using a pre-trained dense model. While upcycling leads to initial performance gains, the training progresses slower than when trained from scratch, leading to suboptimal performance in the long term. We propose Drop-Upcycling - a method that effectively addresses this problem. Drop-Upcycling combines two seemingly contradictory approaches: utilizing the knowledge of pre-trained dense models while statistically re-initializing some parts of the weights. This approach strategically promotes expert specialization, significantly enhancing the MoE model's efficiency in knowledge acquisition. Extensive large-scale experiments demonstrate that Drop-Upcycling significantly outperforms previous MoE construction methods in the long term, specifically when training on hundreds of billions of tokens or more. As a result, our MoE model with 5.9B active parameters achieves comparable performance to a 13B dense model in the same model family, while requiring approximately 1/4 of the training FLOPs. All experimental resources, including source code, training data, model checkpoints and logs, are publicly available to promote reproducibility and future research on MoE.
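A sketch of the weight-construction step as we read it: every expert starts from the dense FFN weight, but a random fraction of intermediate units is statistically re-initialized per expert to break symmetry. The row-wise choice and the init scale below are assumptions:

```python
import torch

def drop_upcycle(dense_ffn_weight: torch.Tensor, n_experts: int,
                 drop_ratio: float = 0.5):
    """Return n_experts copies of the dense weight, each with a random
    drop_ratio fraction of rows re-sampled at the original weight scale."""
    experts, std = [], dense_ffn_weight.std().item()
    for _ in range(n_experts):
        w = dense_ffn_weight.clone()
        n_drop = int(drop_ratio * w.shape[0])
        rows = torch.randperm(w.shape[0])[:n_drop]   # units to re-initialize
        w[rows] = torch.randn(n_drop, w.shape[1]) * std
        experts.append(w)
    return experts

experts = drop_upcycle(torch.randn(1024, 256), n_experts=8)
print(len(experts), experts[0].shape)  # 8 torch.Size([1024, 256])
```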
Observations of microlensed gravitational waves (GWs) emitted by compact binary coalescences (CBCs) are essential for studying the mass density distribution in the universe, including black holes and dark matter halos. However, no confident detection of microlensed GWs has been reported to date. There are two important challenges in the identification of microlensed GWs. The first is that the source waveform and lens structure models are not known a priori. The second is that certain classes of unlensed GWs can mimic microlensed GWs, resulting in undesirable false alarms. In this work, we propose to use the Kramers-Kronig (KK) relation for gravitational lensing systems. We argue that such systems are essentially linear response systems obeying causality, in which the KK relation must hold. The power of this method lies in the fact that microlensed GWs, regardless of the lens structure, must obey the KK relation, while unlensed GW events are not in general expected to obey it. This, in principle, allows us to identify microlensed GWs while dismissing microlensing mimickers. We provide the first important steps towards a methodology that exploits the KK relation, and test its usefulness under idealized conditions.
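For reference, the standard dispersion relations that a causal linear-response transfer function must satisfy take the familiar form below (written for a generic F(f) under suitable analyticity and fall-off conditions; the paper's precise conditions on the lensing amplification factor are not reproduced here):

```latex
% Kramers-Kronig relations; \mathcal{P} denotes the principal value.
\[
  \operatorname{Re} F(f) = \frac{1}{\pi}\,
    \mathcal{P}\!\int_{-\infty}^{\infty}
      \frac{\operatorname{Im} F(f')}{f' - f}\, df',
  \qquad
  \operatorname{Im} F(f) = -\frac{1}{\pi}\,
    \mathcal{P}\!\int_{-\infty}^{\infty}
      \frac{\operatorname{Re} F(f')}{f' - f}\, df'.
\]
```

The real and imaginary parts of the frequency-domain amplification are thus not independent; checking this consistency is what separates genuine lensing from mimickers.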
Empirical scaling laws have driven the evolution of large language models (LLMs), yet their coefficients shift whenever the model architecture or data pipeline changes. Mixture-of-Experts (MoE) models, now standard in state-of-the-art systems, introduce a new sparsity dimension that current dense-model frontiers overlook. We investigate how MoE sparsity influences two distinct capability regimes: memorization skills and reasoning skills. By training MoE families that vary total parameters, active parameters, and top-k routing under fixed compute budgets, we disentangle pre-training loss from downstream accuracy. Our results reveal two principles. First, Active FLOPs: models with identical training loss but greater active compute achieve higher reasoning accuracy. Second, Total tokens per parameter (TPP): memorization tasks improve with more parameters, while reasoning tasks benefit from optimal TPP, indicating that reasoning is data-hungry. Neither reinforcement learning post-training (GRPO) nor increased test-time compute alters these trends. We therefore argue that optimal MoE sparsity must be determined jointly by active FLOPs and TPP, revising the classical picture of compute-optimal scaling. Our model checkpoints, code and logs are open-source at this https URL.
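Back-of-envelope versions of the two quantities the principles are stated in, using the common 6ND transformer FLOP rule of thumb and reading TPP as training tokens over total parameters (our interpretation of the summary):

```python
def moe_scaling_quantities(total_params, active_params, train_tokens):
    active_flops = 6 * active_params * train_tokens  # compute actually spent
    tpp = train_tokens / total_params                # tokens per parameter
    return active_flops, tpp

# Example: a 5B-active / 60B-total MoE trained on 1T tokens.
flops, tpp = moe_scaling_quantities(60e9, 5e9, 1e12)
print(f"active FLOPs ~ {flops:.2e}, TPP ~ {tpp:.1f}")  # ~3.00e+22, ~16.7
```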
The DP-SynRAG framework generates a differentially private synthetic RAG database using LLMs and a multi-stage process involving private clustering and text generation. This approach enables RAG systems to process an unlimited number of queries under a fixed privacy budget, outperforming previous per-query DP methods in scalability and achieving robust privacy against leakage while maintaining high accuracy for RAG tasks.
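A skeleton of the multi-stage flow as the summary describes it; all three callables are placeholders, and the DP mechanisms and privacy accounting live inside them rather than in this sketch:

```python
def dp_synrag_pipeline(private_docs, dp_cluster, dp_generate, build_index):
    """Cluster private texts under DP, generate synthetic per-cluster
    texts with an LLM, then serve every future query from the synthetic
    index. The privacy budget is spent once, up front."""
    clusters = dp_cluster(private_docs)          # DP clustering (assumed)
    synthetic = [doc for c in clusters for doc in dp_generate(c)]
    return build_index(synthetic)                # queries are now budget-free
```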
Generating physically plausible human motion is crucial for applications such as character animation and virtual reality. Existing approaches often incorporate a simulator-based motion projection layer to the diffusion process to enforce physical plausibility. However, such methods are computationally expensive due to the sequential nature of the simulator, which prevents parallelization. We show that simulator-based motion projection can be interpreted as a form of guidance, either classifier-based or classifier-free, within the diffusion process. Building on this insight, we propose SimDiff, a Simulator-constrained Diffusion Model that integrates environment parameters (e.g., gravity, wind) directly into the denoising process. By conditioning on these parameters, SimDiff generates physically plausible motions efficiently, without repeated simulator calls at inference, and also provides fine-grained control over different physical coefficients. Moreover, SimDiff successfully generalizes to unseen combinations of environmental parameters, demonstrating compositional generalization.
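A sketch of environment-conditioned guidance in the classifier-free style the summary alludes to; the denoiser signature and the null-condition convention are our assumptions:

```python
import torch

@torch.no_grad()
def guided_denoise_step(denoiser, x_t, t, env_params, motion_cond, w=2.0):
    """One guided epsilon prediction: steer the sample toward motions
    consistent with env_params (e.g. gravity, wind) without any
    simulator call at inference time."""
    eps_uncond = denoiser(x_t, t, motion_cond, env=None)      # env dropped
    eps_cond = denoiser(x_t, t, motion_cond, env=env_params)  # env provided
    return eps_uncond + w * (eps_cond - eps_uncond)           # guided epsilon
```

Raising or lowering `w` gives the fine-grained control over physical coefficients the summary mentions, and conditioning on parameter combinations unseen in training is what the compositional-generalization claim refers to.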
Recent progress in quantum computing has enabled systems with tens of reliable logical qubits, built from thousands of noisy physical qubits. However, many impactful applications demand quantum computations with millions of logical qubits, necessitating highly scalable quantum error correction. In classical information theory, low-density parity-check (LDPC) codes can approach channel capacity efficiently. Yet, no quantum error-correcting codes with efficient decoding have been shown to approach the hashing bound - a fundamental limit on quantum capacity - despite decades of research. Here, we present quantum LDPC codes that not only approach the hashing bound but also allow decoding with computational cost linear in the number of physical qubits. This breakthrough paves the way for large-scale, fault-tolerant quantum computation. Combined with emerging hardware that manages many qubits, our approach brings quantum solutions to important real-world problems significantly closer to reality.
Camellia introduces a new benchmark to quantify entity-centric cultural biases in Large Language Models across nine Asian languages and six distinct Asian cultures. The evaluation reveals that current LLMs exhibit a 30-40% preference for Western entities in culturally-grounded contexts, demonstrate varied sentiment associations, and show significant performance disparities (12-20% accuracy gaps) in extracting Asian-associated entities.