Matrix models, as quantum mechanical systems without explicit spatial dependence, provide valuable insights into higher-dimensional gauge and gravitational theories, especially within the framework of string theory, where they can describe quantum black holes via the holographic principle. Simulating these models allows for exploration of their kinematic and dynamic properties, particularly in parameter regimes that are analytically intractable. In this study, we examine the potential of tensor network techniques for such simulations. Specifically, we construct ground states as matrix product states and analyse features such as their entanglement structure.
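As a minimal illustration of the tensor-network machinery involved, the sketch below (plain NumPy, with a random vector standing in for a matrix-model ground state) splits a state into matrix-product-state tensors by successive SVDs and reads off the entanglement entropy at each bond cut from the Schmidt spectrum. The system size and local dimension are arbitrary choices for illustration, not the paper's setup.

```python
import numpy as np

# Toy stand-in for a ground state of a truncated matrix model:
# n_sites modes, each truncated to d=2 levels (illustrative assumption).
n_sites, d = 8, 2
rng = np.random.default_rng(0)
psi = rng.normal(size=d**n_sites) + 1j * rng.normal(size=d**n_sites)
psi /= np.linalg.norm(psi)

# Sweep left to right, splitting one site off at a time with an SVD.
# The singular values at each cut are the Schmidt coefficients.
tensors, entropies = [], []
rest = psi.reshape(1, -1)
for site in range(n_sites - 1):
    chi_left = rest.shape[0]
    rest = rest.reshape(chi_left * d, -1)
    u, s, vh = np.linalg.svd(rest, full_matrices=False)
    tensors.append(u.reshape(chi_left, d, -1))        # MPS tensor for this site
    p = s**2 / np.sum(s**2)                           # Schmidt spectrum at this cut
    entropies.append(-np.sum(p * np.log(p + 1e-16)))  # von Neumann entropy
    rest = np.diag(s) @ vh                            # absorb into the remainder
tensors.append(rest.reshape(rest.shape[0], d, 1))

print("entanglement entropy across each cut:", np.round(entropies, 3))
```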
The Open X-Embodiment Collaboration released the Open X-Embodiment (OXE) Dataset, a consolidated collection of over 1 million real robot trajectories spanning 22 robot embodiments. This work demonstrates that large RT-X models trained on such diverse data achieve positive transfer and emergent skills across different robot platforms.
This study provides a theoretical foundation for the Representation Misdirection for Unlearning (RMU) method, explaining its impact on token confidence and adversarial robustness. It introduces Adaptive RMU, which overcomes the original method's ineffectiveness in deeper LLM layers by dynamically adjusting the unlearning target, leading to improved and more consistent unlearning performance across various layers.
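A minimal PyTorch-style sketch of the objective being analyzed may help. The coefficient values and the exact adaptive scaling rule below are assumptions for illustration, not the paper's verbatim formulation.

```python
import torch
import torch.nn.functional as F

def rmu_losses(h_update_f, h_update_r, h_frozen_f, h_frozen_r,
               u, beta=1.0, alpha=100.0, adaptive=True):
    """Sketch of an RMU-style objective on one layer's hidden states.

    h_update_*: activations of the model being unlearned (forget/retain batch)
    h_frozen_*: activations of the frozen reference model
    u: fixed random unit vector (the misdirection target direction)
    """
    if adaptive:
        # Adaptive-RMU-style idea: scale the random target by the norm of the
        # frozen model's activation, so the target stays reachable deep in
        # the network (details assumed here).
        c = beta * h_frozen_f.norm(dim=-1, keepdim=True)
    else:
        c = beta  # fixed steering coefficient, as in the original RMU
    forget = F.mse_loss(h_update_f, c * u.expand_as(h_update_f))
    retain = F.mse_loss(h_update_r, h_frozen_r)
    return forget + alpha * retain

# toy usage with random activations (batch=4, seq=16, dim=64)
B, T, D = 4, 16, 64
u = F.normalize(torch.randn(D), dim=0)
hf, hr = torch.randn(B, T, D), torch.randn(B, T, D)
loss = rmu_losses(hf.clone().requires_grad_(), hr.clone().requires_grad_(),
                  hf.detach(), hr.detach(), u)
loss.backward()
```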
Researchers from JAIST, VNU-UET, Monash University, and RIKEN demonstrate that unlearned Large Language Models exhibit fragility, misbehaving when benign queries inadvertently contain forget-tokens. They introduce Random Noise Augmentation (RNA), a solution that recovers an average of 66.3% and 51.7% accuracy for Representation Misdirection and Preference Optimization unlearning methods, respectively, on perturbed evaluation tasks while preserving core model performance.
Researchers at Sakana AI and collaborating institutions introduced Temporally Adaptive Interpolated Distillation (TAID), a method that dynamically interpolates student and teacher distributions to overcome challenges in knowledge transfer. This approach enabled the creation of TAID-LLM-1.5B, which achieved a new state-of-the-art score (52.27 on LightEval) among models under 2B parameters, and TAID-VLM-2B, outperforming larger vision-language models.
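The interpolation idea can be sketched in a few lines of PyTorch. The probability-space interpolation and the linear schedule below are simplifying assumptions; TAID adapts the schedule dynamically during training.

```python
import torch
import torch.nn.functional as F

def taid_loss(student_logits, teacher_logits, t, T=1.0):
    """Sketch of a temporally interpolated distillation target.

    t in [0, 1] is training progress; the target slides from the student's
    own distribution (t=0) toward the teacher's (t=1). Interpolating in
    probability space and a linear schedule are assumptions here.
    """
    lam = t  # simple monotone schedule; TAID adjusts this adaptively
    p_student = F.softmax(student_logits.detach() / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    p_target = (1 - lam) * p_student + lam * p_teacher
    log_q = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_q, p_target, reduction="batchmean")  # KL(target || student)

# toy usage
s = torch.randn(8, 1000, requires_grad=True)
te = torch.randn(8, 1000)
loss = taid_loss(s, te, t=0.3)
loss.backward()
```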
A theoretical framework demonstrates how the quantum metric, a core concept in quantum geometry, modifies Liouville's theorem and the resulting dynamics of chiral kinetic theory, with implications for a broad range of physical systems.
Researchers at the Beijing Institute of Technology and collaborators developed Frequency Dynamic Convolution (FDConv), a method for adaptive deep learning models that constructs diverse convolution kernel weights directly in the Fourier domain. FDConv achieves competitive or superior performance across object detection, instance segmentation, and semantic segmentation benchmarks while significantly reducing the parameter overhead compared to prior dynamic convolution techniques.
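A minimal sketch of the core idea, parameterizing a convolution kernel by learnable Fourier coefficients and recovering spatial weights with an inverse FFT, is shown below. FDConv's frequency-band partitioning, which yields multiple diverse kernels from one parameter budget, is omitted, so treat this as a simplification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FourierKernelConv2d(nn.Module):
    """Sketch: learn a conv kernel as Fourier coefficients (assumed form)."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.k = k
        # rfft2 of a (k x k) kernel has shape (k, k//2 + 1), complex-valued
        self.spec = nn.Parameter(
            torch.randn(out_ch, in_ch, k, k // 2 + 1, 2) * 0.1)

    def forward(self, x):
        spec = torch.view_as_complex(self.spec)
        weight = torch.fft.irfft2(spec, s=(self.k, self.k))  # spatial kernel
        return F.conv2d(x, weight, padding=self.k // 2)

x = torch.randn(2, 16, 32, 32)
y = FourierKernelConv2d(16, 32)(x)
print(y.shape)  # torch.Size([2, 32, 32, 32])
```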
A similarity-based method leverages penultimate feature representations to detect and rectify noisy labels in deep learning datasets. This post-hoc, model-agnostic approach demonstrates superior robustness compared to confidence and gradient-based methods, particularly against systematic ambiguity and concentrated noise, leading to improved model generalization.
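One simple instantiation of the idea, flagging a sample when its label disagrees with most of its nearest neighbors in penultimate-feature space, is sketched below; the paper's actual similarity score and rectification rule may differ.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def flag_noisy_labels(features, labels, k=10):
    """Flag samples whose label disagrees with most of their neighbors.

    features: (N, D) penultimate-layer representations from a trained net
    labels:   (N,) observed (possibly noisy) labels
    """
    nn_index = NearestNeighbors(n_neighbors=k + 1).fit(features)
    _, idx = nn_index.kneighbors(features)      # idx[:, 0] is the point itself
    neighbor_labels = labels[idx[:, 1:]]        # (N, k)
    agree = (neighbor_labels == labels[:, None]).mean(axis=1)
    suspected = agree < 0.5                     # likely mislabeled
    # simple rectification: replace with the neighbors' majority label
    corrected = labels.copy()
    for i in np.where(suspected)[0]:
        vals, counts = np.unique(neighbor_labels[i], return_counts=True)
        corrected[i] = vals[np.argmax(counts)]
    return suspected, corrected

# toy usage: two Gaussian blobs with 10% flipped labels
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 8)), rng.normal(4, 1, (100, 8))])
y = np.array([0] * 100 + [1] * 100)
noisy = y.copy(); flip = rng.choice(200, 20, replace=False); noisy[flip] ^= 1
suspected, corrected = flag_noisy_labels(X, noisy)
print("flagged:", suspected.sum(), "accuracy after fix:", (corrected == y).mean())
```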
SLTrain proposes a reparameterization for pretraining large language models (LLMs) by representing weight matrices as a sum of low-rank and sparse components. This method, developed by researchers from RIKEN AIP, University of Minnesota, and Microsoft, reduces memory requirements by up to 73% for LLaMA 7B models and halves trainable parameters for LLaMA 1B while maintaining perplexity performance comparable to full-rank pretraining.
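The reparameterization itself is compact. Below is a hedged sketch of a low-rank-plus-sparse linear layer; the support selection and initialization scales are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LowRankPlusSparseLinear(nn.Module):
    """Sketch of an SLTrain-style layer: W = B @ A + S.

    A, B are dense low-rank factors; S is sparse with a fixed random
    support, so only its nonzero values are trained.
    """
    def __init__(self, d_in, d_out, rank=32, sparsity=0.03):
        super().__init__()
        self.B = nn.Parameter(torch.randn(d_out, rank) / rank**0.5)
        self.A = nn.Parameter(torch.randn(rank, d_in) / d_in**0.5)
        nnz = int(sparsity * d_in * d_out)
        idx = torch.randperm(d_in * d_out)[:nnz]      # fixed random support
        self.register_buffer("rows", idx // d_in)
        self.register_buffer("cols", idx % d_in)
        self.values = nn.Parameter(torch.zeros(nnz))  # trainable nonzeros

    def forward(self, x):
        out = x @ self.A.T @ self.B.T                  # low-rank path
        S = torch.zeros(self.B.shape[0], self.A.shape[1], device=x.device)
        S[self.rows, self.cols] = self.values          # scatter sparse part
        return out + x @ S.T

layer = LowRankPlusSparseLinear(512, 512)
y = layer(torch.randn(4, 512))
print(y.shape)  # torch.Size([4, 512])
```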
We introduce a hybrid quantum-classical framework for efficiently implementing approximate unitary dilations of non-unitary operators with enhanced noise resilience. The method embeds a target non-unitary operator into a subblock of a unitary matrix generated by a parameterized quantum circuit with universal expressivity, while a classical optimizer adjusts circuit parameters under the global unitary constraint. As a representative application, we consider the non-unitary propagator of a Lindbladian superoperator acting on the vectorized density matrix, which is relevant for simulating open quantum systems. We further validate the approach experimentally on superconducting devices in the Quafu quantum cloud computing cluster. Compared with standard dilation protocols, our method significantly reduces quantum resource requirements and improves robustness against device noise, achieving high-fidelity simulation. Its generality also enables compatibility with non-Markovian dynamics and Kraus-operator-based evolutions, providing a practical pathway for the noise-resilient simulation of non-unitary processes on near-term quantum hardware.
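For context, the standard one-ancilla dilation that the variational method is benchmarked against can be written down exactly. The sketch below constructs it classically and verifies unitarity; the parameterized circuit in the paper is instead trained so that its top-left block approximates the target in the same sense.

```python
import numpy as np
from scipy.linalg import sqrtm

def exact_dilation(A):
    """Standard one-ancilla unitary dilation of a contraction A (||A|| <= 1):
        U = [[A,              sqrt(I - A A†)],
             [sqrt(I - A† A),        -A†   ]]
    """
    n = A.shape[0]
    I = np.eye(n)
    D1 = sqrtm(I - A @ A.conj().T)
    D2 = sqrtm(I - A.conj().T @ A)
    return np.block([[A, D1], [D2, -A.conj().T]])

# toy non-unitary operator, rescaled to be a contraction
rng = np.random.default_rng(1)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
A /= np.linalg.norm(A, 2) * 1.1
U = exact_dilation(A)
print(np.allclose(U @ U.conj().T, np.eye(4), atol=1e-10))  # unitary
print(np.allclose(U[:2, :2], A))                            # embeds A
```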
PromptKD introduces an unsupervised prompt distillation framework for Vision-Language Models like CLIP, enabling knowledge transfer from large teachers to lightweight students using unlabeled domain images. It achieves state-of-the-art performance, improving harmonic mean accuracy by 3.76% over PromptSRC across 11 datasets, while significantly reducing inference costs by pre-storing teacher text features.
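A hedged sketch of the distillation step: teacher text features are computed once and cached, and the student's class distribution over unlabeled images is matched to the teacher's via KL divergence. The loss form and temperature below are assumptions.

```python
import torch
import torch.nn.functional as F

def prompt_distill_loss(student_img_feat, teacher_img_feat,
                        teacher_text_feat, tau=1.0):
    """Sketch of a PromptKD-style objective (details assumed).

    teacher_text_feat is computed once and cached, so only image encoders
    run during distillation. Both models score unlabeled images against
    the same cached class-text features.
    """
    t_img = F.normalize(teacher_img_feat, dim=-1)
    s_img = F.normalize(student_img_feat, dim=-1)
    txt = F.normalize(teacher_text_feat, dim=-1)
    p_teacher = F.softmax(t_img @ txt.T / tau, dim=-1)
    log_q = F.log_softmax(s_img @ txt.T / tau, dim=-1)
    return F.kl_div(log_q, p_teacher, reduction="batchmean")

# toy usage: batch of 32 images, 100 classes, 512-d embeddings
loss = prompt_distill_loss(torch.randn(32, 512, requires_grad=True),
                           torch.randn(32, 512), torch.randn(100, 512))
loss.backward()
```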
The Mixture of Experts (MoE) architecture reduces the training and inference cost significantly compared to a dense model of equivalent capacity. Upcycling is an approach that initializes and trains an MoE model using a pre-trained dense model. While upcycling leads to initial performance gains, the training progresses slower than when trained from scratch, leading to suboptimal performance in the long term. We propose Drop-Upcycling - a method that effectively addresses this problem. Drop-Upcycling combines two seemingly contradictory approaches: utilizing the knowledge of pre-trained dense models while statistically re-initializing some parts of the weights. This approach strategically promotes expert specialization, significantly enhancing the MoE model's efficiency in knowledge acquisition. Extensive large-scale experiments demonstrate that Drop-Upcycling significantly outperforms previous MoE construction methods in the long term, specifically when training on hundreds of billions of tokens or more. As a result, our MoE model with 5.9B active parameters achieves comparable performance to a 13B dense model in the same model family, while requiring approximately 1/4 of the training FLOPs. All experimental resources, including source code, training data, model checkpoints and logs, are publicly available to promote reproducibility and future research on MoE.
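The initialization can be sketched as follows; re-initializing along the intermediate dimension with mean/std-matched resampling is an assumption here, meant only to convey the shape of the idea.

```python
import torch

def drop_upcycle(dense_w_in, dense_w_out, n_experts=8, drop_ratio=0.5):
    """Sketch of a Drop-Upcycling-style initialization for one FFN.

    dense_w_in:  (d_ff, d_model) up-projection of the pre-trained dense FFN
    dense_w_out: (d_model, d_ff) down-projection
    Each expert copies the dense weights, then re-initializes a random
    drop_ratio slice of the intermediate (d_ff) dimension by resampling
    from the dense weights' empirical mean/std.
    """
    d_ff = dense_w_in.shape[0]
    experts = []
    for _ in range(n_experts):
        w_in, w_out = dense_w_in.clone(), dense_w_out.clone()
        drop = torch.randperm(d_ff)[: int(drop_ratio * d_ff)]  # per-expert slice
        w_in[drop] = torch.randn(len(drop), w_in.shape[1]) \
            * dense_w_in.std() + dense_w_in.mean()
        w_out[:, drop] = torch.randn(w_out.shape[0], len(drop)) \
            * dense_w_out.std() + dense_w_out.mean()
        experts.append((w_in, w_out))
    return experts

experts = drop_upcycle(torch.randn(2048, 512), torch.randn(512, 2048))
print(len(experts), experts[0][0].shape)  # 8 torch.Size([2048, 512])
```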
Simulating response properties of molecules is crucial for interpreting experimental spectroscopies and accelerating materials design. However, it remains a long-standing computational challenge for electronic structure methods on classical computers. While quantum computers hold the promise to solve this problem more efficiently in the long run, existing quantum algorithms requiring deep quantum circuits are infeasible for near-term noisy quantum processors. Here, we introduce a pragmatic variational quantum response (VQR) algorithm for response properties, which circumvents the need for deep quantum circuits. Using this algorithm, we report the first simulation of linear response properties of molecules including dynamic polarizabilities and absorption spectra on a superconducting quantum processor. Our results indicate that a large class of important dynamical properties such as Green's functions are within the reach of near-term quantum hardware using this algorithm in combination with suitable error mitigation techniques.
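For reference, the linear-response target here is the textbook sum-over-states dynamic polarizability (written for real transition dipoles). This is the quantity VQR approximates variationally, not the algorithm itself:

```latex
% Textbook sum-over-states form of the dynamic polarizability
% (atomic units; \hat{\mu} the dipole operator):
\alpha_{ab}(\omega) = \sum_{n \neq 0}
  \frac{2\,\omega_{n0}\,
        \langle 0|\hat{\mu}_a|n\rangle \langle n|\hat{\mu}_b|0\rangle}
       {\omega_{n0}^{2} - \omega^{2}},
\qquad \omega_{n0} = E_n - E_0 .
```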
This work establishes a classification for the typical entanglement spectra of symmetric quantum states by providing a physical realization for the Laguerre Symplectic Ensemble (LSE) using the concept of symmetry fractionalization. It demonstrates that the entanglement spectrum of any such state universally decomposes into blocks governed by the three fundamental Laguerre ensembles, based on properties of the symmetry group's irreducible representations.
This work introduces 3D Question Answering (3D-QA), a task where models answer free-form questions about 3D indoor scenes and localize relevant objects. Researchers at Kyoto University, ATR, and RIKEN AIP created ScanQA, a large-scale human-annotated dataset of over 41,000 question-answer pairs with 3D object groundings, and proposed an end-to-end baseline model that outperforms 2D and pipeline-based 3D approaches.
The paper establishes a holographic duality for the quantum work distribution and the Tasaki-Crooks fluctuation theorem (TC-FT) within the AdS/CFT correspondence. It provides a bulk prescription for non-equilibrium work by mapping boundary characteristic functions to gravitational Schwinger-Keldysh path integrals, verifying the TC-FT in a perturbative AdS$_3$/CFT$_2$ model where average work correlates with Brown-York energy changes.
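For orientation, the boundary statements being dualized are standard and not specific to the holographic construction:

```latex
% Tasaki-Crooks fluctuation theorem: forward and backward work
% distributions are related by
\frac{P_F(W)}{P_B(-W)} = e^{\beta (W - \Delta F)} ;
% equivalently, for the characteristic functions
% G(u) = \langle e^{iuW} \rangle,
G_F(u) = e^{-\beta \Delta F}\, G_B(-u + i\beta) .
```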
In-context Learning (ICL) is an emerging few-shot learning paradigm based on modern Language Models (LMs), yet its inner mechanism remains unclear. In this paper, we investigate the mechanism through a novel perspective of information removal. Specifically, we demonstrate that in the zero-shot scenario, LMs encode queries into non-selective representations in hidden states that contain information for all possible tasks, so the model produces arbitrary outputs without focusing on the intended task and achieves near-zero accuracy. Meanwhile, we find that selectively removing specific information from the hidden states with a low-rank filter effectively steers LMs toward the intended task. Building on these findings, and by measuring the hidden states with carefully designed metrics, we observe that few-shot ICL effectively simulates such a task-oriented information-removal process: it selectively removes redundant information from the entangled non-selective representations and improves the output based on the demonstrations, which constitutes a key mechanism underlying ICL. Moreover, we identify essential attention heads that induce the removal operation, termed Denoising Heads. Ablation experiments that block the information-removal operation during inference significantly degrade ICL accuracy, especially when the correct label is absent from the few-shot demonstrations, confirming the critical role of both the information-removal mechanism and the Denoising Heads.
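The removal operation itself is just a subspace projection; the sketch below shows it with an arbitrary orthonormal basis U, while the paper's contribution lies in how the low-rank filter is identified.

```python
import torch

def remove_subspace(hidden, U):
    """Project a low-rank subspace out of hidden states: h' = h - U U^T h.

    hidden: (..., D) hidden states; U: (D, r) orthonormal basis of the
    directions to remove. How the paper selects U is more involved;
    this shows only the removal operation itself.
    """
    return hidden - (hidden @ U) @ U.T

# toy usage: remove a rank-4 subspace from 768-d hidden states
D, r = 768, 4
U, _ = torch.linalg.qr(torch.randn(D, r))  # orthonormal columns
h = torch.randn(2, 10, D)
h_filtered = remove_subspace(h, U)
print(torch.allclose(h_filtered @ U, torch.zeros(2, 10, r), atol=1e-5))
```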
Why and when is deep better than shallow? We answer this question in a framework that is agnostic to network implementation. We formulate a deep model as an abstract state-transition semigroup acting on a general metric space, and separate the implementation (e.g., ReLU nets, transformers, and chain-of-thought) from the abstract state transition. We prove a bias-variance decomposition in which the variance depends only on the abstract depth-$k$ network and not on the implementation (Theorem 1). We further split the bounds into output and hidden parts to tie the depth dependence of the variance to the metric entropy of the state-transition semigroup (Theorem 2). We then investigate implementation-free conditions under which the variance grows polynomially or logarithmically with depth (Section 4). Combining these with exponential or polynomial bias decay identifies four canonical bias-variance trade-off regimes (EL/EP/PL/PP) and produces explicit optimal depths $k^\ast$. Across regimes, $k^\ast>1$ typically holds, giving a rigorous form of depth supremacy. The lowest generalization error bound is achieved under the EL regime (exp-decay bias + log-growth variance), explaining why and when deep is better, especially for iterative or hierarchical concept classes such as neural ODEs, diffusion/score models, and chain-of-thought reasoning.
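An illustrative EL-regime calculation (constants assumed, not the paper's bounds) shows why the optimal depth exceeds one:

```latex
% Illustrative EL-regime trade-off: bias decays exponentially in depth k,
% variance grows logarithmically,
\mathrm{err}(k) \;\lesssim\;
  \underbrace{C_b\, e^{-a k}}_{\text{bias}}
  \;+\; \underbrace{C_v \log (1+k)}_{\text{variance}} .
% The derivative at k = 1 is  -a C_b e^{-a} + C_v / 2,  which is negative
% whenever  a C_b e^{-a} > C_v / 2:  the bound is still decreasing at
% depth one, so the optimal depth satisfies  k^\ast > 1.
```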
Dilated convolution, which expands the receptive field by inserting gaps between its consecutive elements, is widely employed in computer vision. In this study, we propose three strategies to improve individual phases of dilated convolution from the viewpoint of spectral analysis. Departing from the conventional practice of fixing a global dilation rate as a hyperparameter, we introduce Frequency-Adaptive Dilated Convolution (FADC), which dynamically adjusts dilation rates spatially based on local frequency components. Subsequently, we design two plug-in modules to directly enhance effective bandwidth and receptive field size. The Adaptive Kernel (AdaKern) module decomposes convolution weights into low-frequency and high-frequency components, dynamically adjusting the ratio between these components on a per-channel basis. By increasing the high-frequency part of convolution weights, AdaKern captures more high-frequency components, thereby improving effective bandwidth. The Frequency Selection (FreqSelect) module optimally balances high- and low-frequency components in feature representations through spatially variant reweighting. It suppresses high frequencies in the background to encourage FADC to learn a larger dilation, thereby enlarging the receptive field. Extensive experiments on segmentation and object detection consistently validate the efficacy of our approach. The code is publicly available at this https URL
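A minimal sketch of the AdaKern idea, using the simplest possible low/high-frequency split (the spatial DC component versus the residual) with a learned per-channel high-frequency gain; the paper's decomposition and its dynamic adjustment are richer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaKern(nn.Module):
    """Sketch of AdaKern-style kernel reweighting (details assumed).

    The kernel is split into a low-frequency (DC) part and a
    high-frequency remainder; a learned per-channel ratio rescales the
    high-frequency part before the kernel is reassembled.
    """
    def __init__(self, out_ch):
        super().__init__()
        self.hf_gain = nn.Parameter(torch.ones(out_ch, 1, 1, 1))

    def forward(self, weight):                         # weight: (O, I, k, k)
        low = weight.mean(dim=(-2, -1), keepdim=True)  # DC component
        high = weight - low                            # high-frequency residue
        return low + self.hf_gain * high

conv = nn.Conv2d(16, 32, 3, padding=1)
adakern = AdaKern(32)
x = torch.randn(2, 16, 32, 32)
y = F.conv2d(x, adakern(conv.weight), conv.bias, padding=1)
print(y.shape)  # torch.Size([2, 32, 32, 32])
```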