Oak Ridge National Laboratory
Researchers systematically investigated how initial experimental choices and targeted in-loop interventions affect the learning dynamics of Deep Kernel Learning (DKL) in autonomous Scanning Probe Microscopy, focusing on "exploratory stagnation." They found that while initial sampling impacts early DKL uncertainty, dynamic interventions, such as regional exclusion or prioritization within the latent space, are crucial for escaping stagnation, with their success depending on the chosen acquisition function.
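For readers unfamiliar with these in-loop interventions, here is a minimal sketch of regional exclusion applied at the acquisition step of an active-learning loop. The grid, the stand-in acquisition values, and the exclusion radius are illustrative assumptions, not the study's actual DKL setup.

```python
# Hedged sketch: exclude a stagnant region from the acquisition step.
# Acquisition values here are a synthetic stand-in for DKL uncertainty.
import numpy as np

def next_measurement(candidates, acquisition_values, exclude_mask):
    """Pick the best-scoring candidate outside the excluded region."""
    scores = np.where(exclude_mask, -np.inf, acquisition_values)
    return int(np.argmax(scores))

# Candidate locations on a 2-D scan grid.
grid = np.stack(np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 50)), -1).reshape(-1, 2)
acq = np.exp(-np.sum((grid - 0.3) ** 2, axis=1) / 0.01)   # stand-in acquisition surface
stagnant = np.linalg.norm(grid - 0.3, axis=1) < 0.1        # region the loop keeps revisiting
print(grid[next_measurement(grid, acq, stagnant)])          # first point outside the exclusion zone
```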
The topological Hall effect (THE) is a hallmark of scalar spin chirality, which is found in static skyrmion lattices. Recent theoretical works have shown that scalar spin chirality can also emerge dynamically from thermal spin fluctuations. Evidence of such a mechanism was found in the kagome magnet YMn6Sn6, where fluctuations arise from frustrated exchange interactions between Mn kagome layers. In YMn6Sn6, the rare-earth ion Y3+ is non-magnetic. When it is replaced by a magnetic ion (Gd3+-Ho3+), the intrinsically antiferromagnetic Mn-Mn interlayer coupling is overwhelmed by the indirect ferromagnetic Mn-R-Mn one, relieving frustration. This generates interesting anomalous Hall conductivity, but not THE. Here we show that Er lies in an intermediate regime where direct and indirect interactions closely compete, so that ErMn6Sn6 can switch from one regime to the other with temperature, i.e., from a collinear ferrimagnetic ground state to a spiral antiferromagnet at 78 K. The AFM phase forms a dome in the temperature-field phase diagram. Close to the boundary of this dome, we find a sizable fluctuation-driven THE, thus underscoring the universality of this chiral-fluctuation mechanism for generating non-zero scalar spin chirality.
Artificial Intelligence is reshaping America's $9.4 trillion labor market, with cascading effects that extend far beyond visible technology sectors. When AI transforms quality control tasks in automotive plants, consequences spread through logistics networks, supply chains, and local service economies. Yet traditional workforce metrics cannot capture these ripple effects: they measure employment outcomes after disruption occurs, not where AI capabilities overlap with human skills before adoption crystallizes. Project Iceberg addresses this gap using Large Population Models to simulate the human-AI labor market, representing 151 million workers as autonomous agents executing over 32,000 skills and interacting with thousands of AI tools. It introduces the Iceberg Index, a skills-centered metric that measures the wage value of skills AI systems can perform within each occupation. The Index captures technical exposure, where AI can perform occupational tasks, not displacement outcomes or adoption timelines. Analysis shows that visible AI adoption concentrated in computing and technology (2.2% of wage value, approx. $211 billion) represents only the tip of the iceberg. Technical capability extends far below the surface through cognitive automation spanning administrative, financial, and professional services (11.7%, approx. $1.2 trillion). This exposure is fivefold larger and geographically distributed across all states rather than confined to coastal hubs. Traditional indicators such as GDP, income, and unemployment explain less than 5% of this skills-based variation, underscoring why new indices are needed to capture exposure in the AI economy. By simulating how these capabilities may spread under scenarios, Iceberg enables policymakers and business leaders to identify exposure hotspots, prioritize investments, and test interventions before committing billions to implementation.
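To make the skills-centered construction concrete, here is a minimal sketch of an Iceberg-style exposure index. Every field name (wage_share, ai_capable, etc.) and the toy two-occupation economy are illustrative assumptions, not the project's actual schema, data, or simulation machinery.

```python
# Hedged sketch of a skills-centered exposure index in the spirit of the
# Iceberg Index: the wage value of AI-performable skills, aggregated over
# occupations. All fields and numbers below are illustrative only.
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    wage_share: float   # fraction of the occupation's wage value tied to this skill
    ai_capable: bool    # whether current AI tools can perform the skill

@dataclass
class Occupation:
    title: str
    employment: int     # number of workers
    mean_wage: float    # annual mean wage in dollars
    skills: list

def occupation_exposure(occ: Occupation) -> float:
    """Fraction of the occupation's wage value attached to AI-performable skills."""
    return sum(s.wage_share for s in occ.skills if s.ai_capable)

def iceberg_style_index(occupations: list) -> float:
    """Economy-wide share of wage value technically exposed to AI."""
    exposed = sum(occupation_exposure(o) * o.employment * o.mean_wage for o in occupations)
    total = sum(o.employment * o.mean_wage for o in occupations)
    return exposed / total

# Toy two-occupation economy.
occs = [
    Occupation("Data entry clerk", 1000, 38_000,
               [Skill("transcription", 0.6, True), Skill("client calls", 0.4, False)]),
    Occupation("Electrician", 800, 62_000,
               [Skill("wiring", 0.9, False), Skill("scheduling", 0.1, True)]),
]
print(f"Exposed share of wages: {iceberg_style_index(occs):.1%}")
```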
NVIDIA, in collaboration with several national labs and universities, introduces NVQLink, a platform architecture designed for tightly integrating high-performance classical computing with quantum processors. This work demonstrates sub-4 microsecond latency for the critical interconnect and extends the CUDA-Q programming model to enable real-time, online classical control workloads essential for scalable quantum error correction and QPU operation.
We propose an ensemble score filter (EnSF) for solving high-dimensional nonlinear filtering problems with superior accuracy. A major drawback of existing filtering methods, e.g., particle filters or ensemble Kalman filters, is their low accuracy on high-dimensional and highly nonlinear problems. EnSF attacks this challenge by exploiting a score-based diffusion model, defined in a pseudo-temporal domain, to characterize the evolution of the filtering density. EnSF stores the information of the recursively updated filtering density function in the score function, instead of in a finite set of Monte Carlo samples (as used in particle filters and ensemble Kalman filters). Unlike existing diffusion models that train neural networks to approximate the score function, we develop a training-free score estimation that uses a mini-batch-based Monte Carlo estimator to directly approximate the score function at any pseudo-spatio-temporal location, which provides sufficient accuracy for high-dimensional nonlinear problems while saving the tremendous amount of time otherwise spent training neural networks. High-dimensional Lorenz-96 systems are used to demonstrate the performance of our method. EnSF provides surprising performance, compared with the state-of-the-art Local Ensemble Transform Kalman Filter, in reliably and efficiently tracking extremely high-dimensional Lorenz systems (up to 1,000,000 dimensions) with highly nonlinear observation processes.
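The following is a minimal sketch of a training-free, mini-batch Monte Carlo score estimate in the spirit of EnSF, assuming a variance-preserving-style forward process x_t = a(t) x_0 + s(t) ε so that the diffused density is a Gaussian mixture over ensemble members with a closed-form score. The schedules a(t), s(t) and the ensemble below are illustrative choices, not the paper's exact construction.

```python
# Hedged sketch: training-free score estimation from an ensemble.
import numpy as np

def score_estimate(x, t, ensemble, a=lambda t: 1.0 - t, s=lambda t: t):
    """Estimate grad_x log p_t(x) from a (mini-batch of the) prior ensemble.

    x        : (d,) query point in pseudo-space
    t        : pseudo-time in (0, 1]
    ensemble : (n, d) samples representing the filtering density at t = 0
    """
    at, st = a(t), max(s(t), 1e-6)
    diff = x[None, :] - at * ensemble                # (n, d)
    logw = -0.5 * np.sum(diff**2, axis=1) / st**2
    w = np.exp(logw - logw.max())
    w /= w.sum()                                     # mixture responsibilities
    # Score of a Gaussian mixture = weighted sum of per-component scores.
    return -(w[:, None] * diff).sum(axis=0) / st**2

# Usage with a mini-batch of the ensemble, as in the training-free estimator.
rng = np.random.default_rng(0)
ens = rng.normal(size=(500, 10))                     # 500 members, 10-dim state
batch = ens[rng.choice(500, size=64, replace=False)]
print(score_estimate(np.zeros(10), t=0.5, ensemble=batch))
```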
A dataset and framework called HypoGen enables training of large language models for scientific hypothesis generation through structured paper data extraction, achieving comparable feasibility scores to human-generated hypotheses while providing explainable chains of reasoning derived from academic publications.
This survey establishes a unified framework for Prompt-based Adaptation (PA) in large-scale vision models, meticulously distinguishing between Visual Prompting (VP) and Visual Prompt Tuning (VPT) based on injection granularity and generation mechanisms. It provides a comprehensive overview of methodologies, practical efficiencies, diverse applications, and theoretical underpinnings, aiming to standardize terminology and guide future research.
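To illustrate the VP/VPT distinction the survey formalizes, a compact sketch is shown below: Visual Prompting injects a learnable signal at the pixel level, while Visual Prompt Tuning prepends learnable tokens to the patch sequence of a frozen backbone. The shapes and module names are illustrative assumptions, not any specific method from the survey.

```python
# Hedged sketch of the two injection granularities discussed above.
import torch
import torch.nn as nn

class VisualPrompt(nn.Module):
    """VP: input-space, padding-style prompt added to the image border."""
    def __init__(self, image_size=224, pad=16):
        super().__init__()
        self.delta = nn.Parameter(torch.zeros(3, image_size, image_size))
        mask = torch.ones(1, image_size, image_size)
        mask[:, pad:-pad, pad:-pad] = 0        # learn only the border region
        self.register_buffer("mask", mask)
    def forward(self, x):
        return x + self.delta * self.mask

class PromptTokens(nn.Module):
    """VPT: learnable token-space prompts for a frozen ViT."""
    def __init__(self, n_prompts=10, dim=768):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(1, n_prompts, dim) * 0.02)
    def forward(self, patch_tokens):            # (batch, n_patches, dim)
        b = patch_tokens.shape[0]
        return torch.cat([self.prompts.expand(b, -1, -1), patch_tokens], dim=1)

x = torch.randn(2, 3, 224, 224)
print(VisualPrompt()(x).shape)                       # pixel-level prompt, same image shape
print(PromptTokens()(torch.randn(2, 196, 768)).shape)  # 10 extra tokens prepended
```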
Deployment of neural networks on resource-constrained devices demands models that are both compact and robust to adversarial inputs. However, compression and adversarial robustness often conflict. In this work, we introduce a dynamical low-rank training scheme enhanced with a novel spectral regularizer that controls the condition number of the low-rank core in each layer. This approach mitigates the sensitivity of compressed models to adversarial perturbations without sacrificing accuracy on clean data. The method is model- and data-agnostic, computationally efficient, and supports rank adaptivity to automatically compress the network at hand. Extensive experiments across standard architectures, datasets, and adversarial attacks show the regularized networks can achieve over 94% compression while recovering or improving adversarial accuracy relative to uncompressed baselines.
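Below is a minimal sketch of the kind of spectral regularizer described, assuming each layer stores a low-rank factorization W = U S V^T and the penalty targets the condition number of the small core S; the exact penalty and rank-adaptive machinery used in the paper may differ.

```python
# Hedged sketch (PyTorch): penalize the condition number of a low-rank core.
import torch

def condition_number_penalty(S: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Differentiable surrogate for log kappa(S) = log(sigma_max / sigma_min)."""
    sigma = torch.linalg.svdvals(S)              # singular values, descending
    return torch.log(sigma[0] + eps) - torch.log(sigma[-1] + eps)

def regularized_loss(model_loss, cores, lam=1e-3):
    """Add the spectral penalty over every low-rank core in the network."""
    return model_loss + lam * sum(condition_number_penalty(S) for S in cores)

# Example: one low-rank layer with a rank-8 core.
U, S, V = torch.randn(256, 8), torch.randn(8, 8, requires_grad=True), torch.randn(128, 8)
x = torch.randn(32, 128)
y = x @ V @ S.T @ U.T                            # forward pass through W = U S V^T
loss = regularized_loss(y.pow(2).mean(), [S])
loss.backward()                                  # gradients flow through svdvals
```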
Hybrid quantum-high-performance computing (Q-HPC) workflows are emerging as a key strategy for running quantum applications at scale on current noisy intermediate-scale quantum (NISQ) devices. These workflows must operate seamlessly across diverse simulators and hardware backends, since no single simulator offers the best performance for every circuit type. Simulation efficiency depends strongly on circuit structure, entanglement, and depth, making a flexible, backend-agnostic execution model essential for fair benchmarking, informed platform selection, and ultimately the identification of quantum-advantage opportunities. In this work, we extend the Quantum Framework (QFw), a modular and HPC-aware orchestration layer, to integrate multiple local backends (Qiskit Aer, NWQ-Sim, QTensor, and TN-QVM) and a cloud-based quantum backend (IonQ) under a unified interface. Using this integration, we execute a number of non-variational as well as variational workloads. The results highlight workload-specific backend advantages: while Qiskit Aer's matrix product state method excels for large Ising models, NWQ-Sim not only leads on large-scale entanglement and Hamiltonian simulation workloads but also shows the benefits of concurrent, distributed subproblem execution for optimization problems. These findings demonstrate that simulator-agnostic, HPC-aware orchestration is a practical path toward scalable, reproducible, and portable Q-HPC ecosystems, thereby accelerating progress toward demonstrating quantum advantage.
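A minimal sketch of what a unified, backend-agnostic execution interface can look like is given below; the class and method names are hypothetical, not QFw's real API, and only the Qiskit Aer calls correspond to an actual library.

```python
# Hedged sketch: workloads are written once and dispatched to whichever
# backend suits the circuit structure (MPS for low entanglement, state
# vector or tensor networks otherwise). Class names are illustrative.
from abc import ABC, abstractmethod

class QuantumBackend(ABC):
    @abstractmethod
    def run(self, circuit, shots: int) -> dict:
        """Execute a circuit and return a counts dictionary."""

class AerBackend(QuantumBackend):
    def run(self, circuit, shots=1024):
        from qiskit import transpile
        from qiskit_aer import AerSimulator
        sim = AerSimulator(method="matrix_product_state")
        return sim.run(transpile(circuit, sim), shots=shots).result().get_counts()

class RemoteBackend(QuantumBackend):
    def __init__(self, submit_fn):
        self.submit_fn = submit_fn          # e.g. a wrapper around a cloud provider's SDK
    def run(self, circuit, shots=1024):
        return self.submit_fn(circuit, shots)

def execute(workload, backend: QuantumBackend, shots=1024):
    """The workload never references a specific simulator or device directly."""
    return backend.run(workload, shots)
```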
We explore the performance and portability of the novel Mojo language for scientific computing workloads on GPUs. As the first language built on LLVM's Multi-Level Intermediate Representation (MLIR) compiler infrastructure, Mojo aims to close performance and productivity gaps by combining Python interoperability with CUDA-like syntax for compile-time-portable GPU programming. We target four scientific workloads: a seven-point stencil (memory-bound), BabelStream (memory-bound), miniBUDE (compute-bound), and Hartree-Fock (compute-bound with atomic operations); and compare their performance against vendor baselines on NVIDIA H100 and AMD MI300A GPUs. We show that Mojo's performance is competitive with CUDA and HIP for memory-bound kernels, whereas gaps remain on AMD GPUs for atomic operations and, on both AMD and NVIDIA GPUs, for fast-math compute-bound kernels. Although the programming model is still fairly low-level and carries a learning curve, Mojo can close significant gaps in the fragmented Python ecosystem at the convergence of scientific computing and AI.
Emerging expert-specialized Mixture-of-Experts (MoE) architectures, such as DeepSeek-MoE, deliver strong model quality through fine-grained expert segmentation and large top-k routing. However, their scalability is limited by substantial activation memory overhead and costly all-to-all communication. Furthermore, current MoE training systems - primarily optimized for NVIDIA GPUs - perform suboptimally on non-NVIDIA platforms, leaving significant computational potential untapped. In this work, we present X-MoE, a novel MoE training system designed to deliver scalable training performance for next-generation MoE architectures. X-MoE achieves this via several novel techniques, including efficient padding-free MoE training with cross-platform kernels, redundancy-bypassing dispatch, and hybrid parallelism with sequence-sharded MoE blocks. Our evaluation on the Frontier supercomputer, powered by AMD MI250X GPUs, shows that X-MoE scales DeepSeek-style MoEs up to 545 billion parameters across 1024 GPUs - 10x larger than the largest trainable model with existing methods under the same hardware budget, while maintaining high training throughput. The source code of X-MoE is available at this https URL.
Approximating the ground state of many-body systems is a key computational bottleneck underlying important applications in physics and chemistry. The most widely known quantum algorithm for ground-state approximation, quantum phase estimation, is out of reach of current quantum processors due to its high circuit depth. Subspace-based quantum diagonalization methods offer a viable alternative for pre- and early-fault-tolerant quantum computers. Here, we introduce a quantum diagonalization algorithm which combines two key ideas on quantum subspaces: a classical diagonalization based on quantum samples, and subspaces constructed with quantum Krylov states. We prove that our algorithm converges in polynomial time under the working assumptions of Krylov quantum diagonalization and sparseness of the ground state. We then demonstrate the scalability of our approach by performing the largest ground-state quantum simulation of impurity models, using a Heron quantum processor and the Frontier supercomputer. We consider both the single-impurity Anderson model with 41 bath sites and a system with 4 impurities and 7 bath sites per impurity. Our results are in excellent agreement with Density Matrix Renormalization Group calculations.
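The classical projection step shared by sample-based diagonalization methods can be sketched as follows: restrict the Hamiltonian to the subspace spanned by sampled computational-basis configurations and diagonalize there. The toy Hamiltonian and "sampled" configurations below are illustrative, and the Krylov-state construction of the actual algorithm is omitted.

```python
# Hedged sketch of the classical post-processing in sample-based diagonalization.
import numpy as np

def diagonalize_in_sampled_subspace(H, sampled_states):
    """Project H onto the sampled configurations and diagonalize the restriction."""
    idx = sorted(set(sampled_states))
    H_sub = H[np.ix_(idx, idx)]                  # principal submatrix on sampled configs
    evals, evecs = np.linalg.eigh(H_sub)
    return evals[0], idx, evecs[:, 0]            # variational (upper-bound) estimate

# Toy 4-site problem: full 16 x 16 Hamiltonian vs a 6-configuration subspace.
rng = np.random.default_rng(1)
A = rng.normal(size=(16, 16))
H = (A + A.T) / 2
e_sub, _, _ = diagonalize_in_sampled_subspace(H, rng.choice(16, size=6, replace=False).tolist())
e_full = np.linalg.eigvalsh(H)[0]
print(f"subspace estimate {e_sub:.3f} >= exact {e_full:.3f}")   # Cauchy interlacing
```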
Linear systems arise in generating samples and in calculating observables in lattice quantum chromodynamics (QCD). Solving these Hermitian positive definite systems, which are sparse but ill-conditioned, relies on iterative methods such as Conjugate Gradient (CG), which are time-consuming and computationally expensive. Preconditioners can effectively accelerate this process, with the state of the art being multigrid preconditioners. However, constructing useful preconditioners can be challenging and adds computational overhead, especially for large linear systems. We propose a framework, leveraging operator learning techniques, to construct linear maps that act as effective preconditioners. The method does not rely on explicit matrices for either the original linear systems or the produced preconditioners, allowing efficient model training and application within the CG solver. In the context of the Schwinger model (U(1) gauge theory in 1+1 spacetime dimensions with two degenerate-mass fermions), this preconditioning scheme effectively decreases the condition number of the linear systems and approximately halves the number of iterations required for convergence in relevant parameter ranges. We further demonstrate that the framework learns a general mapping dependent on the lattice structure, which enables zero-shot application to Dirac operators constructed from gauge field configurations of different sizes.
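A matrix-free preconditioned CG sketch in the spirit of this framework is shown below: both the system and the preconditioner are supplied only as operators, and the `learned_map` is a stand-in for the trained operator-learning model, not the paper's actual network.

```python
# Hedged sketch: preconditioned CG where A and M^{-1} are functions only.
import numpy as np

def pcg(apply_A, apply_Minv, b, tol=1e-8, max_iter=1000):
    """Preconditioned conjugate gradient for a Hermitian positive definite operator."""
    x = np.zeros_like(b)
    r = b - apply_A(x)
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    for k in range(max_iter):
        Ap = apply_A(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            return x, k + 1
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

# Toy usage: an ill-conditioned SPD system and a (pretend) learned preconditioner.
n = 200
d = np.linspace(1.0, 1e4, n)
apply_A = lambda v: d * v                        # diagonal SPD operator
learned_map = lambda r: r / d                    # stand-in for the trained linear map
x_plain, it_plain = pcg(apply_A, lambda r: r, np.ones(n))
x_prec, it_prec = pcg(apply_A, learned_map, np.ones(n))
print(f"CG iterations: {it_plain} unpreconditioned vs {it_prec} preconditioned")
```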
Hamiltonian simulation is one of the most promising candidates for the demonstration of quantum advantage within the next ten years, and several studies have proposed end-to-end resource estimates for executing such algorithms on fault-tolerant quantum processors. Usually, these resource estimates are based upon the assumption that quantum error correction is implemented using the surface code, and that the best surface code compilation scheme involves serializing input circuits by eliminating all Clifford gates. This transformation is thought to make best use of the native multi-body measurement (lattice surgery) instruction set available to surface codes. Some work, however, has suggested that direct compilation from Clifford+T to lattice surgery operations may be beneficial for circuits that have high degrees of logical parallelism. In this study, we analyze the resource costs for implementing Hamiltonian simulation using example approaches from each of these leading surface code compilation families. The Hamiltonians whose dynamics we consider are those of the transverse-field Ising model in several geometries, the Kitaev honeycomb model, and the $\alpha$-RuCl$_3$ complex under a time-varying magnetic field. We show, among other things, that the optimal scheme depends on whether Hamiltonian simulation is implemented using the quantum signal processing or Trotter-Suzuki algorithms, with Trotterization benefiting by orders of magnitude from direct Clifford+T compilation for these applications. Our results suggest that surface code quantum computers should not have a one-size-fits-all compilation scheme, but that smart compilers should predict the optimal scheme based upon high-level quantities from logical circuits such as average circuit density, numbers of logical qubits, and T fraction.
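As a toy illustration of the "smart compiler" idea, the sketch below picks a compilation family from coarse logical-circuit statistics; the thresholds and the decision rule are invented for illustration and are not taken from the paper.

```python
# Hedged sketch: choose a surface-code compilation family from circuit statistics.
from dataclasses import dataclass

@dataclass
class CircuitStats:
    n_logical_qubits: int
    t_fraction: float        # fraction of gates that are T gates
    avg_density: float       # average number of gates acting per logical layer

def choose_compilation(stats: CircuitStats) -> str:
    # High logical parallelism tends to favor direct Clifford+T -> lattice-surgery
    # compilation; otherwise fall back to Clifford elimination. Thresholds are invented.
    if stats.avg_density / stats.n_logical_qubits > 0.5 and stats.t_fraction < 0.3:
        return "direct Clifford+T lattice surgery"
    return "Clifford elimination / Pauli-based computation"

print(choose_compilation(CircuitStats(n_logical_qubits=100, t_fraction=0.1, avg_density=80)))
```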
Artificial Intelligence (AI) technologies have profoundly transformed the field of remote sensing, revolutionizing data collection, processing, and analysis. Traditionally reliant on manual interpretation and task-specific models, remote sensing research has been significantly enhanced by the advent of foundation models: large-scale, pre-trained AI models capable of performing a wide array of tasks with unprecedented accuracy and efficiency. This paper provides a comprehensive survey of foundation models in the remote sensing domain. We categorize these models based on their architectures, pre-training datasets, and methodologies. Through detailed performance comparisons, we highlight emerging trends and the significant advancements achieved by these foundation models. Additionally, we discuss technical challenges, practical implications, and future research directions, addressing the need for high-quality data, computational resources, and improved model generalization. Our research also finds that pre-training methods, particularly self-supervised learning techniques like contrastive learning and masked autoencoders, remarkably enhance the performance and robustness of foundation models. This survey aims to serve as a resource for researchers and practitioners by providing a panorama of advances and promising pathways for continued development and application of foundation models in remote sensing.
The research from Oak Ridge National Laboratory, University of Tennessee, and UNSW Sydney introduces an agentic system that transforms urban digital twins into autonomous cognitive platforms for optimizing urban freight logistics. It leverages generative AI, multi-agent systems, and a Model Context Protocol to orchestrate scientific tools, achieving optimal intermodal delivery plans for 250 containers in 0.06 seconds and completing the full workflow in under 15 seconds.
Researchers developed Straight Variational Flow Matching (S-VFM), a method that resolves the curvature problem in Flow Matching models by integrating a variational latent code and an explicit straightness objective. This enables the generation of high-quality samples with significantly fewer ODE integration steps, including efficient one-step generation, across various datasets.
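A hedged sketch of an explicit straightness penalty is given below, using the standard rectified-flow notion that the predicted velocity should match the constant displacement x1 - x0 along the whole path; S-VFM's actual variational objective with a latent code is richer than this toy version.

```python
# Hedged sketch (PyTorch): straightness penalty for a flow-matching velocity field.
import torch

def straightness_penalty(velocity_net, x0, x1, n_times=4):
    """Average squared deviation of the predicted velocity from the
    straight-line displacement, at random points along each path."""
    loss = 0.0
    for _ in range(n_times):
        t = torch.rand(x0.shape[0], 1)
        xt = (1 - t) * x0 + t * x1               # point on the straight path
        v = velocity_net(xt, t)
        loss = loss + ((v - (x1 - x0)) ** 2).mean()
    return loss / n_times

# Toy velocity network (input: 2-D point plus time) and usage.
net = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.SiLU(), torch.nn.Linear(64, 2))
velocity = lambda x, t: net(torch.cat([x, t], dim=1))
x0, x1 = torch.randn(16, 2), torch.randn(16, 2)
print(straightness_penalty(velocity, x0, x1))
```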
Identifying useful sorbent materials for direct air capture (DAC) from humid air remains a challenge. We present the Open DAC 2025 (ODAC25) dataset, a significant expansion and improvement upon ODAC23 (Sriram et al., ACS Central Science, 10 (2024) 923), comprising nearly 60 million DFT single-point calculations for CO$_2$, H$_2$O, N$_2$, and O$_2$ adsorption in 15,000 MOFs. ODAC25 introduces chemical and configurational diversity through functionalized MOFs, high-energy GCMC-derived placements, and synthetically generated frameworks. ODAC25 also significantly improves upon the accuracy of DFT calculations and the treatment of flexible MOFs in ODAC23. Along with the dataset, we release new state-of-the-art machine-learned interatomic potentials trained on ODAC25 and evaluate them on adsorption energy and Henry's law coefficient predictions.
Sparse observations and coarse-resolution climate models limit effective regional decision-making, underscoring the need for robust downscaling. However, existing AI methods struggle with generalization across variables and geographies and are constrained by the quadratic complexity of Vision Transformer (ViT) self-attention. We introduce ORBIT-2, a scalable foundation model for global, hyper-resolution climate downscaling. ORBIT-2 incorporates two key innovations: (1) Residual Slim ViT (Reslim), a lightweight architecture with residual learning and Bayesian regularization for efficient, robust prediction; and (2) TILES, a tile-wise sequence scaling algorithm that reduces self-attention complexity from quadratic to linear, enabling long-sequence processing and massive parallelism. ORBIT-2 scales to 10 billion parameters across 65,536 GPUs, achieving up to 4.1 exaFLOPS sustained throughput and 74-98% strong scaling efficiency. It supports downscaling to 0.9 km global resolution and processes sequences up to 4.2 billion tokens. On 7 km resolution benchmarks, ORBIT-2 achieves high accuracy with $R^2$ scores in the range of 0.98-0.99 against observational data.
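To illustrate why tiling makes attention cost linear in sequence length, here is a generic block-local attention sketch; it is not ORBIT-2's actual TILES algorithm, whose details go beyond this toy example.

```python
# Hedged sketch: tile-wise (block-local) self-attention with O(n * tile) cost.
import torch
import torch.nn.functional as F

def tiled_attention(q, k, v, tile: int = 256):
    """q, k, v: (batch, seq, dim) with seq divisible by tile."""
    b, n, d = q.shape
    # Reshape so each tile attends only to itself.
    qt = q.view(b, n // tile, tile, d)
    kt = k.view(b, n // tile, tile, d)
    vt = v.view(b, n // tile, tile, d)
    scores = qt @ kt.transpose(-1, -2) / d**0.5      # (b, n/tile, tile, tile)
    out = F.softmax(scores, dim=-1) @ vt
    return out.view(b, n, d)

x = torch.randn(1, 4096, 64)
y = tiled_attention(x, x, x, tile=256)               # 16 independent 256-token tiles
print(y.shape)                                       # torch.Size([1, 4096, 64])
```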