Washington University in Saint Louis
A major effort in modern high-dimensional statistics has been devoted to the analysis of linear predictors trained on nonlinear feature embeddings via empirical risk minimization (ERM). Gaussian equivalence theory (GET) has emerged as a powerful universality principle in this context: it states that the behavior of high-dimensional, complex features can be captured by Gaussian surrogates, which are more amenable to analysis. Despite its remarkable successes, numerical experiments show that this equivalence can fail even for simple embeddings -- such as polynomial maps -- under general scaling regimes. We investigate this breakdown in the setting of random feature (RF) models in the quadratic scaling regime, where both the number of features and the sample size grow quadratically with the data dimension. We show that when the target function depends on a low-dimensional projection of the data, such as generalized linear models, GET yields incorrect predictions. To capture the correct asymptotics, we introduce a Conditional Gaussian Equivalent (CGE) model, which can be viewed as appending a low-dimensional non-Gaussian component to an otherwise high-dimensional Gaussian model. This hybrid model retains the tractability of the Gaussian framework and accurately describes RF models in the quadratic scaling regime. We derive sharp asymptotics for the training and test errors in this setting, which continue to agree with numerical simulations even when GET fails. Our analysis combines general central limit theorems for Wiener chaos expansions with a careful two-phase Lindeberg swapping argument. Beyond RF models and quadratic scaling, our work hints at a rich landscape of universality phenomena in high-dimensional ERM.
We study denoising of a third-order tensor when the ground-truth tensor is not necessarily Tucker low-rank. Specifically, we observe $Y = X^\ast + Z \in \mathbb{R}^{p_1 \times p_2 \times p_3}$, where $X^\ast$ is the ground-truth tensor and $Z$ is the noise tensor. We propose a simple variant of the higher-order tensor SVD estimator $\widetilde{X}$. We show that, uniformly over all user-specified Tucker ranks $(r_1, r_2, r_3)$,
$$\| \widetilde{X} - X^\ast \|_{\mathrm{F}}^2 = O\Big( \kappa^2 \Big\{ r_1 r_2 r_3 + \sum_{k=1}^{3} p_k r_k \Big\} \;+\; \xi_{(r_1, r_2, r_3)}^2 \Big) \quad \text{with high probability.}$$
Here, the bias term $\xi_{(r_1, r_2, r_3)}$ corresponds to the best achievable approximation error of $X^\ast$ over the class of tensors with Tucker ranks $(r_1, r_2, r_3)$; $\kappa^2$ quantifies the noise level; and the variance term $\kappa^2 \{ r_1 r_2 r_3 + \sum_{k=1}^{3} p_k r_k \}$ scales with the effective number of free parameters in the estimator $\widetilde{X}$. Our analysis achieves a clean rank-adaptive bias--variance tradeoff: as we increase the ranks of the estimator $\widetilde{X}$, the bias $\xi_{(r_1, r_2, r_3)}$ decreases and the variance increases. As a byproduct, we also obtain a convenient bias--variance decomposition for vanilla low-rank SVD matrix estimators.
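The estimator in the abstract above is a variant of the higher-order tensor SVD. As a rough illustration of the vanilla truncated HOSVD it builds on (not the paper's exact variant), one can project each mode of the observation onto its top singular subspace:

```python
import numpy as np

def hosvd_truncate(Y, ranks):
    """Project a 3-way tensor onto the span of its top-r singular vectors
    along each mode: a simple truncated higher-order SVD sketch."""
    X = Y.copy()
    for k, r in enumerate(ranks):
        # Mode-k unfolding of Y: move axis k to the front, flatten the rest.
        M = np.moveaxis(Y, k, 0).reshape(Y.shape[k], -1)
        U, _, _ = np.linalg.svd(M, full_matrices=False)
        Uk = U[:, :r]
        # Project mode k of X onto the estimated top-r subspace.
        X = np.moveaxis(
            np.tensordot(Uk @ Uk.T, np.moveaxis(X, k, 0), axes=(1, 0)), 0, k)
    return X

rng = np.random.default_rng(0)
# Synthetic ground truth with Tucker ranks (2, 2, 2), plus small noise.
G = rng.standard_normal((2, 2, 2))
A = [rng.standard_normal((20, 2)) for _ in range(3)]
Xstar = np.einsum('abc,ia,jb,kc->ijk', G, *A)
Y = Xstar + 0.05 * rng.standard_normal(Xstar.shape)
Xhat = hosvd_truncate(Y, (2, 2, 2))
# Truncation discards most of the noise while keeping the low-rank signal.
print(np.linalg.norm(Xhat - Xstar) < np.linalg.norm(Y - Xstar))
```

Increasing the user-specified ranks in this sketch shrinks the approximation bias while letting more noise through, mirroring the abstract's rank-adaptive bias--variance tradeoff.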
In this paper we present a computation of the rates of strangeness-changing processes and the resultant bulk viscosity in matter at the densities and temperatures typical of neutron star mergers. To deal with the high temperature in this environment we go beyond the Fermi surface approximation in our rate calculations and numerically evaluate the full phase space integral. We include processes where quarks move between baryons via meson exchange: these have generally been omitted in previous analyses but provide the dominant contribution to the rates of strangeness-changing processes and the bulk viscosity. The calculation of these rates is an essential step towards any calculation of dissipation mechanisms in hyperonic matter in mergers. As one application, we calculate the dissipation times for density oscillations at the frequencies seen in merger simulations. We find that hyperon bulk viscosity for temperatures in the MeV regime can probably be neglected in this context, but becomes highly relevant for keV-range temperatures.
There has been a recent surge of interest in time series modeling using the Transformer architecture. However, forecasting multivariate time series with Transformers presents a unique challenge, as it requires modeling both temporal (cross-time) and variate (cross-variate) dependencies. While Transformer-based models have gained popularity for their flexibility in capturing both sequential and cross-variate relationships, it is unclear how to best integrate these two sources of information in the context of the Transformer architecture while optimizing for both performance and efficiency. We re-purpose the Transformer architecture to effectively model both cross-time and cross-variate dependencies. Our approach begins by embedding each variate independently into a variate-wise representation that captures its cross-time dynamics, and then models cross-variate dependencies through attention mechanisms on these learned embeddings. Gating operations in both the cross-time and cross-variate modeling phases regulate information flow, allowing the model to focus on the most relevant features for accurate predictions. Our method achieves state-of-the-art performance across 13 real-world datasets and can be seamlessly integrated into other Transformer-based and LLM-based forecasters, delivering performance improvements of up to 20.7% over the original models. Code is available at this https URL.
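The pipeline described above (independent per-variate embeddings, attention across variates, gating on the output) can be caricatured in a few lines of numpy. All shapes, weight initializations, and the gating form here are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

V, T, d = 5, 24, 8                        # variates, window length, embed dim
series = rng.standard_normal((V, T))      # one length-T series per variate
W_embed = rng.standard_normal((T, d))     # cross-time embedding, per variate
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
W_gate = rng.standard_normal((d, d))

tokens = series @ W_embed                 # (V, d): one token per variate
Q, K, Vmat = tokens @ Wq, tokens @ Wk, tokens @ Wv
attn = softmax(Q @ K.T / np.sqrt(d))      # (V, V): cross-variate attention
mixed = attn @ Vmat                       # each variate attends to the others
gate = 1 / (1 + np.exp(-(tokens @ W_gate)))   # sigmoid gate on information flow
out = gate * mixed + (1 - gate) * tokens      # (V, d) gated representation
print(out.shape)  # (5, 8)
```

The point of the sketch is the factorization: cross-time structure is absorbed into the per-variate token before any variate ever attends to another.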
Stereoelectroencephalography (SEEG) is a neurosurgical method to survey electrophysiological activity within the brain to treat disorders such as epilepsy. In this stereotactic approach, leads are implanted through straight trajectories to survey both cortical and sub-cortical activity. Visualizing the recorded locations covering sulcal and gyral activity while staying true to the cortical architecture is challenging due to the folded, three-dimensional nature of the human cortex. To overcome this challenge, we developed a novel visualization concept, allowing investigators to dynamically morph between the subject's cortical reconstruction and an inflated cortex representation. This inflated view, in which gyri and sulci are viewed on a smooth surface, allows better visualization of electrodes buried within the sulcus while staying true to the underlying cortical architecture.
Exploring the equation of state of dense matter is an essential part of interpreting the observable properties of neutron stars. We present here the first results for dense matter in the zero-temperature limit generated by the MUSES Calculation Engine, a composable workflow management system that orchestrates calculation and data processing stages comprising a collection of software modules designed within the MUSES framework. The modules presented in this work calculate equations of state using algorithms spanning three different theories/models: (1) Crust Density Functional Theory, valid starting at low densities, (2) Chiral Effective Field Theory, valid around saturation density, and (3) the Chiral Mean Field model, valid beyond saturation density. Lepton contributions are added through the Lepton module to each equation of state, ensuring charge neutrality and the possibility of β-equilibrium. Using the Synthesis module, we match the three equations of state using different thermodynamic variables and different methods. We then couple the complete equation of state to a novel full-general-relativity solver (QLIMR) module that calculates neutron star properties. We find that the matching performed using different thermodynamic variables differently affects the range obtained for neutron star masses and radii (although never beyond a few percent difference). We also investigate the universality of equation-of-state-independent relations for our matched stars. Finally, for the first time, we use the Flavor Equilibration module to estimate bulk viscosity and flavor relaxation charge fraction and rates (at low temperature) for Chiral Effective Field Theory and the Chiral Mean Field model.
The performance of Large Language Models (LLMs) on multiple-choice question (MCQ) benchmarks is frequently cited as proof of their medical capabilities. We hypothesized that LLM performance on medical MCQs may in part be illusory and driven by factors beyond medical content knowledge and reasoning capabilities. To assess this, we created a novel benchmark of free-response questions with paired MCQs (FreeMedQA). Using this benchmark, we evaluated three state-of-the-art LLMs (GPT-4o, GPT-3.5, and LLama-3-70B-instruct) and found an average absolute deterioration of 39.43% in performance on free-response questions relative to multiple-choice (p = 1.3 × 10^-5), which was greater than the human performance decline of 22.29%. To isolate the role of the MCQ format on performance, we performed a masking study, iteratively masking out parts of the question stem. At 100% masking, the average LLM multiple-choice performance was 6.70% greater than random chance (p = 0.002), with one LLM (GPT-4o) obtaining an accuracy of 37.34%. Notably, for all LLMs the free-response performance was near zero. Our results highlight the shortcomings of medical MCQ benchmarks, which can overestimate the capabilities of LLMs in medicine, and, more broadly, the potential for improving both human and machine assessments using LLM-evaluated free-response questions.
Decision-making under uncertainty is a fundamental problem encountered frequently and can be formulated as a stochastic multi-armed bandit problem. In the problem, the learner interacts with an environment by choosing an action at each round, where a round is an instance of an interaction. In response, the environment reveals a reward, which is sampled from a stochastic process, to the learner. The goal of the learner is to maximize cumulative reward. In this work, we assume that the rewards are the inner product of an action vector and a state vector generated by a linear Gaussian dynamical system. To predict the reward for each action, we propose a method that takes a linear combination of previously observed rewards for predicting each action's next reward. We show that, regardless of the sequence of previous actions chosen, the reward sampled for any previously chosen action can be used for predicting another action's future reward, i.e., the reward sampled for action 1 at round $t-1$ can be used for predicting the reward for action 2 at round $t$. This is accomplished by designing a modified Kalman filter with a matrix representation that can be learned for reward prediction. Numerical evaluations are carried out on a set of linear Gaussian dynamical systems and are compared with two other well-known stochastic multi-armed bandit algorithms.
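A plain Kalman filter already illustrates the key point in the abstract above: a scalar reward observed under one action refines the shared latent-state estimate, and hence the predicted reward of every other action. The sketch below is a standard filter on a synthetic system (all system matrices are assumed for illustration), not the paper's modified, learned-matrix variant:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
A = 0.9 * np.eye(d)              # state transition of the dynamical system
Q = 0.01 * np.eye(d)             # process noise covariance
R = 0.01                         # observation (reward) noise variance
actions = np.eye(d)              # one action per coordinate, for illustration

x = rng.standard_normal(d)       # true latent state
mu, P = np.zeros(d), np.eye(d)   # filter mean and covariance

for t in range(200):
    x = A @ x + rng.multivariate_normal(np.zeros(d), Q)
    a = actions[t % d]                        # round-robin action choice
    reward = a @ x + np.sqrt(R) * rng.standard_normal()
    # Kalman predict step.
    mu, P = A @ mu, A @ P @ A.T + Q
    # Kalman update step with the scalar observation a @ x_t.
    S = a @ P @ a + R
    K = P @ a / S
    mu = mu + K * (reward - a @ mu)
    P = P - np.outer(K, a @ P)

# The predicted reward of ANY action a at the next round is simply a @ (A @ mu),
# regardless of which actions produced the past observations.
print(np.linalg.norm(mu - x))
```

Because every reward is a linear measurement of the same latent state, observations gathered under action 1 tighten the posterior used to predict action 2's reward, which is the property the abstract exploits.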
There remains an important need for the development of image reconstruction methods that can produce diagnostically useful images from undersampled measurements. In magnetic resonance imaging (MRI), for example, such methods can facilitate reductions in data-acquisition times. Deep learning-based methods hold potential for learning object priors or constraints that can serve to mitigate the effects of data-incompleteness on image reconstruction. One line of emerging research involves formulating an optimization-based reconstruction method in the latent space of a generative deep neural network. However, when generative adversarial networks (GANs) are employed, such methods can result in image reconstruction errors if the sought-after solution does not reside within the range of the GAN. To circumvent this problem, in this work, a framework for reconstructing images from incomplete measurements is proposed that is formulated in the latent space of invertible neural network-based generative models. A novel regularization strategy is introduced that takes advantage of the multiscale architecture of certain invertible neural networks, which can result in improved reconstruction performance over classical methods in terms of traditional metrics. The proposed method is investigated for reconstructing images from undersampled MRI data. The method is shown to achieve comparable performance to a state-of-the-art generative model-based reconstruction method while benefiting from a deterministic reconstruction procedure and easier control over regularization parameters.
Flavor-changing charged current ("Urca") processes are of central importance in the astrophysics of neutron stars. Standard calculations approximate the Urca rate as the sum of two contributions, direct Urca and modified Urca. Attempts to make modified Urca calculations more accurate have been impeded by an unphysical divergence at the direct Urca threshold density. In this paper we describe a systematically improvable approach where, in the simplest approximation, instead of modified Urca we include an imaginary part of the nucleon mass (nucleon width). The total Urca rate is then obtained via a straightforward generalization of the direct Urca calculation, yielding results that agree with both direct and modified Urca at the densities where those approximations are valid. At low densities, we observe an enhancement of the rate by more than an order of magnitude, with important ramifications for neutron star cooling and other transport properties.
This article surveys blockchain-based approaches for several security services, including authentication, confidentiality, privacy, access control lists (ACLs), data and resource provenance, and integrity assurance. All these services are critical for current distributed applications, especially given the large amount of data being processed over networks and the use of cloud computing. Authentication ensures that a user is who he or she claims to be. Confidentiality guarantees that data cannot be read by unauthorized users. Privacy gives users the ability to control who can access their data. Provenance allows efficient tracking of data and resources, along with their ownership and utilization, over the network. Integrity helps verify that data has not been modified or altered. These services are currently managed by centralized controllers, for example, a certificate authority. Therefore, the services are prone to attacks on the centralized controller. Blockchain, on the other hand, is a secure and distributed ledger that can help resolve many of the problems associated with centralization. The objectives of this paper are to give insights into the use of security services for current applications, to highlight the state-of-the-art techniques that are currently used to provide these services, to describe their challenges, and to discuss how blockchain technology can resolve these challenges. Further, several blockchain-based approaches providing such security services are compared thoroughly. Challenges associated with using blockchain-based security services are also discussed, to spur further research in this area.
Sakharov's 1967 notion of ``induced gravity'' is currently enjoying a significant resurgence. The basic idea, originally presented in a very brief 3-page paper with a total of 4 formulas, is that gravity is not ``fundamental'' in the sense of particle physics. Instead it was argued that gravity (general relativity) emerges from quantum field theory in roughly the same sense that hydrodynamics or continuum elasticity theory emerges from molecular physics. In this article I will translate the key ideas into modern language, and explain the various versions of Sakharov's idea currently on the market.
An algorithmic framework, MedG–KRP, was developed to generate and evaluate medical knowledge graphs from large language models, structured around single concepts. It demonstrates that generalist LLMs show stronger human-rated causal accuracy, while specialized medical LLMs align more closely with structured biomedical ontologies in terms of precision and recall.
The observation of neutrino oscillations and hence non-zero neutrino masses provided a milestone in the search for physics beyond the Standard Model. But even though we now know that neutrinos are massive, the nature of neutrino masses, i.e., whether they are Dirac or Majorana, remains an open question. A smoking-gun signature of Majorana neutrinos is the observation of neutrinoless double-beta decay, a process that violates the lepton-number conservation of the Standard Model. This white paper focuses on the theoretical aspects of the neutrinoless double-beta decay program and lays out a roadmap for future developments. The roadmap is a multi-scale path starting from high-energy models of neutrinoless double-beta decay all the way to the low-energy nuclear many-body problem that needs to be solved to supplement measurements of the decay rate. The path goes through a systematic effective-field-theory description of the underlying processes at various scales and needs to be supplemented by lattice quantum chromodynamics input. The white paper also discusses the interplay between neutrinoless double-beta decay, experiments at the Large Hadron Collider and results from astrophysics and cosmology in probing simplified models of lepton-number violation at the TeV scale, and the generation of the matter-antimatter asymmetry via leptogenesis. This white paper is prepared for the topical groups TF11 (Theory of Neutrino Physics), TF05 (Lattice Gauge Theory), RF04 (Baryon and Lepton Number Violating Processes), NF03 (Beyond the Standard Model) and NF05 (Neutrino Properties) within the Theory Frontier, Rare Processes and Precision Frontier, and Neutrino Physics Frontier of the U.S. Community Study on the Future of Particle Physics (Snowmass 2021).
Intrusion detection systems (IDS) for the Internet of Things (IoT) systems can use AI-based models to ensure secure communications. IoT systems tend to have many connected devices producing massive amounts of data with high dimensionality, which requires complex models. Complex models have notorious problems such as overfitting, low interpretability, and high computational complexity. Adding model complexity penalty (i.e., regularization) can ease overfitting, but it barely helps interpretability and computational efficiency. Feature engineering can solve these issues; hence, it has become critical for IDS in large-scale IoT systems to reduce the size and dimensionality of data, resulting in less complex models with excellent performance, smaller data storage, and fast detection. This paper proposes a new feature engineering method called LEMDA (Light feature Engineering based on the Mean Decrease in Accuracy). LEMDA applies exponential decay and an optional sensitivity factor to select and create the most informative features. The proposed method has been evaluated and compared to other feature engineering methods using three IoT datasets and four AI/ML models. The results show that LEMDA improves the F1 score performance of all the IDS models by an average of 34% and reduces the average training and detection times in most cases.
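As a hedged sketch of the Mean Decrease in Accuracy (MDA) idea that LEMDA builds on (the exponential decay and optional sensitivity factor described in the abstract are not reproduced here), one can permute each feature in turn and measure how much a fitted model's accuracy drops; features with the largest drop are the most informative:

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 2000, 5
X = rng.standard_normal((n, p))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # only features 0 and 1 matter

# A tiny least-squares classifier standing in for the paper's AI/ML models.
w = np.linalg.lstsq(X, 2 * y - 1, rcond=None)[0]
acc = lambda Xm: ((Xm @ w > 0).astype(int) == y).mean()

base = acc(X)
mda = []
for j in range(p):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])        # break feature j's signal
    mda.append(base - acc(Xp))                  # accuracy drop = importance

print(np.argmax(mda))  # feature 0, the most informative, scores highest
```

Keeping only the top-MDA features is what shrinks the data and the downstream IDS model, in the spirit of the reduction the abstract describes.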
Leading vision-language models (VLMs) are trained on general Internet content, overlooking scientific journals' rich, domain-specific knowledge. Training on specialty-specific literature could yield high-performance, task-specific tools, enabling generative AI to match generalist models in specialty publishing, educational, and clinical tasks. We created NeuroPubs, a multimodal dataset of 23,000 Neurosurgery Publications articles (134M words, 78K image-caption pairs). Using NeuroPubs, VLMs generated publication-ready graphical abstracts (70% of 100 abstracts) and board-style questions indistinguishable from human-written ones (54% of 89,587 questions). We used these questions to train CNS-Obsidian, a 34B-parameter VLM. In a blinded, randomized controlled trial, our model demonstrated non-inferiority to then state-of-the-art GPT-4o in neurosurgical differential diagnosis (clinical utility, 40.62% upvotes vs. 57.89%, p=0.1150; accuracy, 59.38% vs. 65.79%, p=0.3797). Our pilot study demonstrates how training generative AI models on specialty-specific journal content - without large-scale internet data - results in high-performance academic and clinical tools, enabling domain-tailored AI across diverse fields.
Accurate deformable 4-dimensional (4D; 3-dimensional in space plus time) medical image registration is essential in a variety of medical applications. Deep learning-based methods have recently gained popularity in this area for their significantly lower inference time. However, they suffer from the drawbacks of non-optimal accuracy and the requirement of a large amount of training data. A new method named GroupRegNet is proposed to address both limitations. The deformation fields that warp all images in the group into a common template are obtained through one-shot learning. The use of an implicit template reduces the bias and accumulated error associated with a specified reference image. The one-shot learning strategy is similar to the conventional iterative optimization method, but the motion model and parameters are replaced with a convolutional neural network (CNN) and the weights of the network. GroupRegNet also features a simpler network design and a more straightforward registration process, which eliminates the need to break up the input image into patches. The proposed method was quantitatively evaluated on two public respiratory-binned 4D-CT datasets. The results suggest that GroupRegNet outperforms the latest published deep learning-based methods and is comparable to the top conventional method pTVreg. To facilitate future research, the source code is available at this https URL
This study analyzes crop yield prediction in India from 1997 to 2020, focusing on various crops and key environmental factors. It aims to predict agricultural yields by utilizing advanced machine learning techniques such as Linear Regression, Decision Tree, KNN, Naïve Bayes, K-Means Clustering, and Random Forest. The models, particularly Naïve Bayes and Random Forest, demonstrate high effectiveness, as shown through data visualizations. The research concludes that integrating these analytical methods significantly enhances the accuracy and reliability of crop yield predictions, offering vital contributions to agricultural data science.
Registration of longitudinal brain MRI scans containing pathologies is challenging due to dramatic changes in tissue appearance. Although there has been progress in developing general-purpose medical image registration techniques, they have not yet attained the requisite precision and reliability for this task, highlighting its inherent complexity. Here we describe the Brain Tumor Sequence Registration (BraTS-Reg) challenge, the first public benchmark environment for deformable registration algorithms focusing on estimating correspondences between pre-operative and follow-up scans of the same patient diagnosed with a diffuse brain glioma. The BraTS-Reg data comprise de-identified multi-institutional multi-parametric MRI (mpMRI) scans, curated for size and resolution according to a canonical anatomical template, and divided into training, validation, and testing sets. Clinical experts annotated ground truth (GT) landmark points of anatomical locations distinct across the temporal domain. Quantitative evaluation and ranking were based on the Median Euclidean Error (MEE), Robustness, and the determinant of the Jacobian of the displacement field. The top-ranked methodologies yielded similar performance across all evaluation metrics and shared several methodological commonalities, including pre-alignment, deep neural networks, inverse consistency analysis, and per-case test-time instance optimization as a post-processing step. The top-ranked method attained an MEE at or below the inter-rater variability for approximately 60% of the evaluated landmarks, underscoring the scope for further accuracy and robustness improvements, especially relative to human experts. The aim of BraTS-Reg is to continue to serve as an active resource for research, with the data and online evaluation tools accessible at this https URL
Learning-in-memory (LIM) is a recently proposed paradigm to overcome fundamental memory bottlenecks in training machine learning systems. While compute-in-memory (CIM) approaches can address the so-called memory-wall (i.e., energy dissipated due to repeated memory read access), they are agnostic to the energy dissipated due to repeated memory writes at the precision required for training (the update-wall), and they do not account for the energy dissipated when transferring information between short-term and long-term memories (the consolidation-wall). The LIM paradigm proposes that these bottlenecks, too, can be overcome if the energy barrier of physical memories is adaptively modulated such that the dynamics of memory updates and consolidation match the Lyapunov dynamics of gradient-descent training of an AI model. In this paper, we derive new theoretical lower bounds on energy dissipation when training AI systems using different LIM approaches. The analysis presented here is model-agnostic and highlights the trade-off between energy efficiency and the speed of training. The resulting non-equilibrium energy-efficiency bounds have a similar flavor to Landauer's energy-dissipation bounds. We also extend these limits by taking into account the number of floating-point operations (FLOPs) used for training, the size of the AI model, and the precision of the training parameters. Our projections suggest that the energy-dissipation lower bound to train a brain-scale AI system (comprising $10^{15}$ parameters) using LIM is $10^8 \sim 10^9$ Joules, which is of the same magnitude as Landauer's adiabatic lower bound and 6 to 7 orders of magnitude lower than the projections obtained using state-of-the-art AI accelerator hardware lower bounds.
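As a back-of-envelope sanity check of the orders of magnitude quoted above (the operation count below is an assumed, illustrative figure, not taken from the paper), one can compare against the Landauer limit per irreversible bit operation:

```python
import math

kB, T = 1.380649e-23, 300.0       # Boltzmann constant (J/K), room temperature
landauer = kB * T * math.log(2)   # minimum energy per irreversible bit op, ~2.9e-21 J

# ASSUMED figure for illustration: suppose training a 1e15-parameter model
# requires on the order of 1e29 irreversible bit operations. The adiabatic
# (Landauer) floor for the total training energy would then be:
E_min = 1e29 * landauer
print(f"{E_min:.1e} J")           # lands in the 1e8-1e9 J range the abstract cites
```

Under that assumed operation count, the floor sits in the $10^8 \sim 10^9$ Joule range, consistent with the scale the abstract reports for LIM.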