Multimodal survival methods combining gigapixel histology whole-slide images (WSIs) and transcriptomic profiles are particularly promising for patient prognostication and stratification. Current approaches involve tokenizing the WSIs into smaller patches (>10,000 patches) and transcriptomics into gene groups, which are then integrated using a Transformer for predicting outcomes. However, this process generates many tokens, which leads to high memory requirements for computing attention and complicates post-hoc interpretability analyses. Instead, we hypothesize that we can: (1) effectively summarize the morphological content of a WSI by condensing its constituent tokens using morphological prototypes, achieving more than 300x compression; and (2) accurately characterize cellular functions by encoding the transcriptomic profile with biological pathway prototypes, all in an unsupervised fashion. The resulting multimodal tokens are then processed by a fusion network, either with a Transformer or an optimal transport cross-alignment, which now operates with a small and fixed number of tokens without approximations. Extensive evaluation on six cancer types shows that our framework outperforms state-of-the-art methods with much less computation while unlocking new interpretability analyses.
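A minimal sketch of the token-compression idea described above, assuming patch embeddings from a pretrained encoder; k-means stands in for the paper's unsupervised morphological prototyping, and the function name and sizes are illustrative rather than the authors' implementation:

```python
import numpy as np
from sklearn.cluster import KMeans

def prototype_tokens(patch_embeddings: np.ndarray, n_prototypes: int = 16) -> np.ndarray:
    """Cluster patch embeddings and pool each cluster into one summary token.

    patch_embeddings: (n_patches, dim) features from a pretrained patch encoder.
    Returns (n_prototypes, dim) prototype tokens.
    """
    km = KMeans(n_clusters=n_prototypes, n_init=10, random_state=0).fit(patch_embeddings)
    return np.stack([
        patch_embeddings[km.labels_ == c].mean(axis=0)  # average pooling per prototype
        for c in range(n_prototypes)
    ])

# e.g. 10,000 patches with 768-dim features -> 16 tokens (>600x fewer tokens)
wsi = np.random.randn(10_000, 768).astype(np.float32)
print(prototype_tokens(wsi).shape)  # (16, 768)
```

With a fixed, small token count like this, the downstream fusion Transformer needs no attention approximations.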
Medical imaging plays a pivotal role in modern healthcare, with computed tomography pulmonary angiography (CTPA) being a critical tool for diagnosing pulmonary embolism and other thoracic conditions. However, the complexity of interpreting CTPA scans and generating accurate radiology reports remains a significant challenge. This paper introduces Abn-BLIP (Abnormality-aligned Bootstrapping Language-Image Pretraining), an advanced diagnosis model designed to align abnormal findings with generated text, improving the accuracy and comprehensiveness of radiology reports. By leveraging learnable queries and cross-modal attention mechanisms, our model demonstrates superior performance in detecting abnormalities, reducing missed findings, and generating structured reports compared to existing methods. Our experiments show that Abn-BLIP outperforms state-of-the-art medical vision-language models and 3D report generation methods in both accuracy and clinical relevance. These results highlight the potential of integrating multimodal learning strategies for improving radiology reporting. The source code is available at this https URL.
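The core mechanism, a fixed set of learnable queries cross-attending to visual tokens (Q-Former style), can be sketched as follows; all names, dimensions, and the feed-forward head are assumptions, not the Abn-BLIP architecture:

```python
import torch
import torch.nn as nn

class AbnormalityQueries(nn.Module):
    def __init__(self, n_queries=32, dim=256, n_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim))

    def forward(self, visual_tokens):                 # (B, n_patches, dim)
        q = self.queries.unsqueeze(0).expand(visual_tokens.size(0), -1, -1)
        out, _ = self.cross_attn(q, visual_tokens, visual_tokens)
        return self.ff(out)                           # (B, n_queries, dim)

feats = torch.randn(2, 1024, 256)                     # placeholder CTPA volume tokens
print(AbnormalityQueries()(feats).shape)              # torch.Size([2, 32, 256])
```

Each query output can then be decoded into an abnormality-specific finding, which is the alignment the abstract describes.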
We explore whether survival model performance in underrepresented high- and low-risk subgroups - regions of the prognostic spectrum where clinical decisions are most consequential - can be improved through targeted restructuring of the training dataset. Rather than modifying model architecture, we propose a novel risk-stratified sampling method that addresses imbalances in prognostic subgroup density to support more reliable learning in underrepresented tail strata: patients are partitioned by baseline prognostic risk, and matching is applied within each stratum to equalize representation across the risk distribution. We implement this framework on a cohort of 1,799 patients with resected colorectal liver metastases (CRLM), including 1,197 who received adjuvant chemotherapy and 602 who did not. All models used in this study are Cox proportional hazards models trained on the same set of selected variables. Model performance is assessed via Harrell's C-index, time-dependent AUC, and the Integrated Calibration Index (ICI), with internal validation using Efron's bias-corrected bootstrapping. External validation is conducted on two independent CRLM datasets. Cox models trained on risk-balanced cohorts showed consistent improvements in internal validation compared to models trained on the full dataset, while noticeably enhancing stratified C-index values in underrepresented high- and low-risk strata of the external cohorts. Our findings suggest that survival model performance in observational oncology cohorts can be meaningfully improved through targeted rebalancing of the training data across prognostic risk strata. This approach offers a practical and model-agnostic complement to existing methods, especially in applications where predictive reliability across the full risk continuum is critical to downstream clinical decisions.
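A toy version of the rebalancing step, assuming a precomputed baseline risk score; quantile strata and oversampling to the largest stratum stand in for the paper's within-stratum matching, and all column names are hypothetical:

```python
import pandas as pd
from lifelines import CoxPHFitter

def risk_balanced_resample(df, risk_col="baseline_risk", n_strata=4, seed=0):
    """Equalize representation across baseline-risk quantile strata."""
    strata = pd.qcut(df[risk_col], q=n_strata, labels=False, duplicates="drop")
    target = strata.value_counts().max()          # size of the largest stratum
    parts = [df[strata == s].sample(n=target, replace=True, random_state=seed)
             for s in sorted(strata.unique())]
    return pd.concat(parts, ignore_index=True)

# df: duration, event indicator, covariates, plus a precomputed risk score
# balanced = risk_balanced_resample(df)
# cph = CoxPHFitter().fit(balanced.drop(columns=["baseline_risk"]),
#                         duration_col="time", event_col="event")
# print(cph.concordance_index_)                   # Harrell's C on training data
```

Because only the training data changes, any survival model can be dropped in, which is the model-agnostic property the abstract emphasizes.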
Segmentation is a fundamental problem in surgical scene analysis using artificial intelligence. However, the inherent data scarcity in this domain makes it challenging to adapt traditional segmentation techniques for this task. To tackle this issue, current research employs pretrained models and finetunes them on the given data. Even so, these approaches require training deep networks with millions of parameters every time new data becomes available. A recently published foundation model, Segment-Anything (SAM), generalizes well to a large variety of natural images, hence tackling this challenge to a reasonable extent. However, SAM does not generalize well to the medical domain as-is without substantial compute for fine-tuning and task-specific prompting. Moreover, these prompts are in the form of bounding boxes or foreground/background points that need to be annotated explicitly for every image, making this solution increasingly tedious as dataset size grows. In this work, we propose AdaptiveSAM - an adaptive modification of SAM that can adjust to new datasets quickly and efficiently, while enabling text-prompted segmentation. For finetuning AdaptiveSAM, we propose an approach called bias-tuning that requires a significantly smaller number of trainable parameters than SAM (less than 2%). At the same time, AdaptiveSAM requires negligible expert intervention since it uses free-form text as its prompt and can segment the object of interest with just the label name as the prompt. Our experiments show that AdaptiveSAM outperforms current state-of-the-art methods on various medical imaging datasets including surgery, ultrasound and X-ray. Code is available at this https URL
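Bias-tuning as described can be sketched in a few lines: freeze the pretrained weights and leave only bias terms trainable. This minimal sketch ignores any newly added text-prompt layers, which would also remain trainable:

```python
import torch

def apply_bias_tuning(model: torch.nn.Module) -> None:
    """Freeze all pretrained parameters except bias terms."""
    for name, param in model.named_parameters():
        param.requires_grad = name.endswith("bias")

# trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
# total = sum(p.numel() for p in model.parameters())
# print(f"trainable fraction: {trainable / total:.2%}")   # expected well under 2%
```

Since biases are a tiny fraction of a vision Transformer's parameters, this keeps the optimizer state and gradient memory small, matching the abstract's sub-2% figure.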
Physiologically based pharmacokinetic (PBPK) models provide a mechanistic framework for simulating radiopharmaceutical kinetics and estimating patient-specific absorbed doses (ADs). PBPK models incorporate prior knowledge of patient physiology and drug-specific properties, which can enhance the model's predictive performance. PBPK models can ultimately be used to predict treatment response and thereby enable theranostic digital twins (TDTs) for personalized treatment planning in radiopharmaceutical therapies (RPTs). To achieve this potential of precision RPT, however, the reliability of the underlying modeling, including the PBPK-based dosimetry, must be established through rigorous verification, validation, and uncertainty quantification (VVUQ). This review outlines the role of VVUQ in ensuring the credibility and clinical applicability of PBPK models in radiotheranostics. Key methodologies for PBPK model VVUQ are discussed, including goodness-of-fit (GOF) assessment, prediction evaluation, and uncertainty propagation.
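As a toy illustration of one VVUQ ingredient, uncertainty propagation, the sketch below samples parameters of a hypothetical one-compartment washout model and propagates them to a time-integrated activity; the model, priors, and numbers are stand-ins for a full PBPK system, not anything from the review:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
k_elim = rng.lognormal(mean=np.log(0.05), sigma=0.2, size=n)   # 1/h, assumed prior
A0 = rng.normal(loc=100.0, scale=5.0, size=n)                  # injected activity, MBq

t = np.linspace(0, 72, 145)                                    # hours
activity = A0[:, None] * np.exp(-k_elim[:, None] * t[None, :]) # MBq over time
tia = np.trapz(activity, t, axis=1)                            # time-integrated activity

lo, hi = np.percentile(tia, [2.5, 97.5])
print(f"TIA 95% interval: [{lo:.0f}, {hi:.0f}] MBq*h")         # dose uncertainty driver
```

Propagating parameter distributions through the kinetic model in this way yields credible intervals on absorbed dose rather than a single point estimate.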
Application of machine learning techniques enables segmentation of functional tissue units in histology whole-slide images (WSIs). We built a pipeline to apply previously validated segmentation models of kidney structures and extract quantitative features from these structures. Such quantitative analysis also requires qualitative inspection of results for quality control, exploration, and communication. We extend the Vitessce web-based visualization tool to enable visualization of segmentations of multiple types of functional tissue units, such as glomeruli, tubules, and arteries/arterioles in the kidney. Moreover, we propose a standard representation for files containing multiple segmentation bitmasks, which we define polymorphically, such that existing formats including OME-TIFF, OME-NGFF, AnnData, MuData, and SpatialData can be used. We demonstrate that these methods enable researchers and the broader public to interactively explore datasets containing multiple segmented entities and associated features, including for exploration of renal morphometry of biopsies from the Kidney Precision Medicine Project (KPMP) and the Human Biomolecular Atlas Program (HuBMAP).
Focused ultrasound (FUS) therapy is a promising tool for optimally targeted treatment of spinal cord injuries (SCI), offering submillimeter precision to enhance blood flow at injury sites while minimizing impact on surrounding tissues. However, its efficacy is highly sensitive to the placement of the ultrasound source, as the spinal cord's complex geometry and acoustic heterogeneity distort and attenuate the FUS signal. Current approaches rely on computer simulations to solve the governing wave propagation equations and compute patient-specific pressure maps using ultrasound images of the spinal cord anatomy. While accurate, these high-fidelity simulations are computationally intensive, taking up to hours to complete parameter sweeps, which is impractical for real-time surgical decision-making. To address this bottleneck, we propose a convolutional deep operator network (DeepONet) to rapidly predict FUS pressure fields in patient spinal cords. Unlike conventional neural networks, DeepONets are well equipped to approximate the solution operator of the parametric partial differential equations (PDEs) that govern the behavior of FUS waves with varying initial and boundary conditions (i.e., new transducer locations or spinal cord geometries) without requiring extensive simulations. Trained on simulated pressure maps across diverse patient anatomies, this surrogate model achieves real-time predictions with only a 2% loss on the test set, significantly accelerating the modeling of nonlinear physical systems in heterogeneous domains. By facilitating rapid parameter sweeps in surgical settings, this work provides a crucial step toward precise and individualized solutions in neurosurgical treatments.
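A minimal DeepONet, showing the branch/trunk factorization that lets one network approximate the PDE solution operator across input functions; the fully connected layers and all sizes below are placeholders (the paper uses a convolutional branch over ultrasound-derived anatomy):

```python
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    def __init__(self, n_sensors=128, coord_dim=2, p=64):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(n_sensors, 128), nn.Tanh(),
                                    nn.Linear(128, p))
        self.trunk = nn.Sequential(nn.Linear(coord_dim, 128), nn.Tanh(),
                                   nn.Linear(128, p))

    def forward(self, u, y):
        # u: (B, n_sensors) samples of the input function (e.g., transducer
        #    placement / geometry parameters); y: (B, n_points, coord_dim)
        #    query coordinates in the spinal cord domain
        b = self.branch(u)                        # (B, p)
        t = self.trunk(y)                         # (B, n_points, p)
        return torch.einsum("bp,bnp->bn", b, t)   # pressure at each query point

net = DeepONet()
pressure = net(torch.randn(4, 128), torch.rand(4, 400, 2))
print(pressure.shape)                             # torch.Size([4, 400])
```

Once trained on simulated pressure maps, a forward pass like this replaces an hours-long solver run, enabling the rapid parameter sweeps the abstract targets.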
Purpose: Dynamic glucose enhanced (DGE) MRI studies employ chemical exchange saturation transfer (CEST) or spin lock (CESL) to study glucose uptake. Currently, these methods are hampered by low effect size and sensitivity to motion. To overcome this, we propose to utilize exchange-based linewidth (LW) broadening of the direct water saturation (DS) curve of the water saturation spectrum (Z-spectrum) during and after glucose infusion (DS-DGE MRI). Methods: To estimate the glucose-infusion-induced LW changes (ΔLW), Bloch-McConnell simulations were performed for normoglycemia and hyperglycemia in blood, gray matter (GM), white matter (WM), CSF, and malignant tumor tissue. Whole-brain DS-DGE imaging was implemented at 3 tesla using dynamic Z-spectral acquisitions (1.2 s per offset frequency, 38 s per spectrum) and assessed on four brain tumor patients using infusion of 35 g of D-glucose. To assess ΔLW, a deep learning-based Lorentzian fitting approach was employed on voxel-based DS spectra acquired before, during, and post-infusion. Area-under-the-curve (AUC) images, obtained from the dynamic ΔLW time curves, were compared qualitatively to perfusion-weighted imaging (PWI). Results: In simulations, ΔLW was 1.3%, 0.30%, 0.29/0.34%, 7.5%, and 13% in arterial blood, venous blood, GM/WM, malignant tumor tissue, and CSF, respectively. In vivo, ΔLW was approximately 1% in GM/WM, 5-20% for different tumor types, and 40% in CSF. The resulting DS-DGE AUC maps clearly outlined lesion areas. Conclusions: DS-DGE MRI is highly promising for assessing D-glucose uptake. Initial results in brain tumor patients show high-quality AUC maps of glucose-induced line broadening and DGE-based lesion enhancement similar and/or complementary to PWI.
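For intuition, linewidth extraction from a DS spectrum can be illustrated with an ordinary least-squares Lorentzian fit; the paper itself uses a deep-learning-based Lorentzian fitting approach, and the data below are synthetic:

```python
import numpy as np
from scipy.optimize import curve_fit

def lorentzian(offset_ppm, amp, center, lw):
    """Direct water saturation line: a Lorentzian dip in the Z-spectrum."""
    return 1.0 - amp * (lw / 2) ** 2 / ((offset_ppm - center) ** 2 + (lw / 2) ** 2)

offsets = np.linspace(-2, 2, 41)                      # Z-spectrum offsets (ppm)
z = lorentzian(offsets, 0.9, 0.0, 1.0) \
    + 0.01 * np.random.default_rng(0).normal(size=offsets.size)

(amp, center, lw), _ = curve_fit(lorentzian, offsets, z, p0=(0.8, 0.0, 0.8))
print(f"fitted linewidth: {lw:.3f} ppm")              # ΔLW = LW_during/post - LW_baseline
```

Repeating such fits voxel-wise before, during, and after infusion yields the dynamic ΔLW curves from which the AUC maps are computed.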
Content-based image retrieval (CBIR) systems are an emerging technology that supports reading and interpreting medical images. Since 3D brain MR images are high dimensional, dimensionality reduction is necessary for CBIR using machine learning techniques. In addition, for a reliable CBIR system, each dimension in the resulting low-dimensional representation must be associated with a neurologically interpretable region. We propose a localized variational autoencoder (Loc-VAE) that provides neuroanatomically interpretable low-dimensional representation from 3D brain MR images for clinical CBIR. Loc-VAE is based on β-VAE with the additional constraint that each dimension of the low-dimensional representation corresponds to a local region of the brain. The proposed Loc-VAE is capable of acquiring representation that preserves disease features and is highly localized, even at a high compression ratio (4096:1). The low-dimensional representation obtained by Loc-VAE improved the locality measure of each dimension by 4.61 points compared to naive β-VAE, while maintaining comparable brain reconstruction capability and information about the diagnosis of Alzheimer's disease.
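The underlying β-VAE objective is standard and is sketched below; Loc-VAE's per-dimension locality constraint is reduced here to an unspecified placeholder term, since the abstract does not give its exact form:

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, x_hat, mu, logvar, beta=4.0, locality_penalty=None):
    """Reconstruction + beta-weighted KL; Loc-VAE adds a locality term."""
    recon = F.mse_loss(x_hat, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    loss = recon + beta * kld
    if locality_penalty is not None:   # placeholder for Loc-VAE's constraint
        loss = loss + locality_penalty
    return loss
```

The locality constraint is what ties each latent dimension to one brain region, giving the interpretable axes needed for clinical CBIR.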
Electrical potential scalp recordings (electroencephalograms, EEGs) are a common tool used to investigate brain activity. EEG is routinely used in clinical applications as well as in research studies thanks to its noninvasive nature, relatively inexpensive equipment, and high temporal resolution. However, EEG is prone to contamination from movement artifacts and signals from external sources, and thus requires advanced signal processing and mathematical analysis in tasks requiring brain state identification. Recently, tools from topological data analysis have been used successfully across many domains, including brain research; however, these uses have been limited to fMRI datasets. We introduce the topological tool MapperEEG (M-EEG) and provide an example of its ability to separate different brain states during a simple finger-tapping teaming task without any pre-labeling or prior knowledge. M-EEG uses the power spectral density applied to traditional EEG frequency bands combined with the Mapper algorithm from topological data analysis to capture the underlying structure of the data and represent that structure as a graph in two-dimensional space. This tool provides clear separation (clustering) of states during different conditions of the experiment (syncopated vs. synchronized), and we demonstrate that M-EEG outperforms other clustering methods when applied to EEG data.
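The pipeline can be paraphrased as band-power features fed into Mapper; the sketch below uses Welch PSDs and the kmapper package, with band edges and cover parameters chosen arbitrarily rather than taken from the paper:

```python
import numpy as np
from scipy.signal import welch
import kmapper as km

bands = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def band_powers(eeg, fs=256):
    """eeg: (n_windows, n_samples) -> (n_windows, n_bands) mean band power."""
    f, pxx = welch(eeg, fs=fs, nperseg=fs)
    return np.column_stack([pxx[:, (f >= lo) & (f < hi)].mean(axis=1)
                            for lo, hi in bands.values()])

X = band_powers(np.random.randn(200, 512))        # placeholder EEG windows
mapper = km.KeplerMapper()
graph = mapper.map(mapper.fit_transform(X), X, cover=km.Cover(n_cubes=10))
print(len(graph["nodes"]), "Mapper nodes")
```

Nodes of the resulting graph group windows with similar spectral signatures, so separate task conditions appear as separate graph components without any labels.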
We introduce Dreamento (Dream engineering toolbox), an open-source Python package for dream engineering using sleep electroencephalography (EEG) wearables. Dreamento's main functions are (1) real-time recording, monitoring, analysis, and sensory stimulation, and (2) offline post-processing of the resulting data, both in a graphical user interface (GUI). In real time, Dreamento is capable of (1) data recording, visualization, and navigation, (2) power-spectrum analysis, (3) automatic sleep scoring, (4) sensory stimulation (visual, auditory, tactile), (5) establishing text-to-speech communication, and (6) managing annotations of automatic and manual events. The offline functions aid in post-processing the acquired data, with features to reformat the wearable data and integrate it with non-wearable recorded modalities such as electromyography (EMG). While Dreamento was primarily developed for (lucid) dreaming studies, its applications can be extended to other areas of sleep research such as closed-loop auditory stimulation and targeted memory reactivation.
The observational ubiquity of inverse power law (IPL) spectra in complex phenomena calls for a theory of dynamic fractal phenomena that captures their fractal dimension, dynamics, and statistics. These and other properties are consequences of the complexity arising from nonlinear dynamic networks, summarized for biomedical phenomena as the Network Effect (NE) or, more narrowly, as Network Physiology. Herein we address the measurable consequences of the NE on time series generated by different parts of the brain, heart, and lung organ networks, consequences directly related to their inter-network and intra-network interactions. These same physiologic organ networks have been shown to generate crucial event (CE) time series, and herein, using modified diffusion entropy analysis (MDEA), they are shown to have scaling indices with quasiperiodic changes in complexity over time. The results do not depend on the underlying coherence properties of the associated time series but instead demonstrate a generalized synchronization of complexity. This high-order synchrony among the scaling indices of EEG (brain), ECG (heart), and respiratory time series is governed by the quantitative interdependence of the multifractal behavior of the various physiological organ networks' dynamics. This consequence of the NE opens the door to an entirely general characterization of complex network dynamics in terms of complexity synchronization (CS), independently of the scientific, engineering, or technological context.
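For readers unfamiliar with diffusion entropy analysis, a sketch of the ordinary (unmodified) algorithm follows; the paper's MDEA variant, which defines events via stripes, is not reproduced here:

```python
import numpy as np

def dea_scaling(xi, window_sizes):
    """Estimate the scaling index delta from S(l) ~ const + delta * ln(l)."""
    x = np.cumsum(xi)                              # diffusion trajectory
    S = []
    for l in window_sizes:
        disp = x[l:] - x[:-l]                      # displacements over window l
        p, edges = np.histogram(disp, bins=50, density=True)
        dx = edges[1] - edges[0]
        p = p[p > 0]
        S.append(-np.sum(p * np.log(p)) * dx)      # Shannon entropy of the PDF
    delta, _ = np.polyfit(np.log(window_sizes), S, 1)
    return delta

print(dea_scaling(np.random.randn(100_000), np.arange(10, 1000, 10)))  # ~0.5
```

Tracking delta in sliding windows over EEG, ECG, and respiratory series is what reveals the quasiperiodic, mutually synchronized complexity changes the abstract reports.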
While deep learning has catalyzed breakthroughs across numerous domains, its broader adoption in clinical settings is inhibited by the costly and time-intensive nature of data acquisition and annotation. To further facilitate medical machine learning, we present an ultrasound dataset of 10,223 Brightness-mode (B-mode) images consisting of sagittal slices of porcine spinal cords (N=25) before and after a contusion injury. We additionally benchmark the performance metrics of several state-of-the-art object detection algorithms to localize the site of injury and semantic segmentation models to label the anatomy for comparison and creation of task-specific architectures. Finally, we evaluate the zero-shot generalization capabilities of the segmentation models on human ultrasound spinal cord images to determine whether training on our porcine dataset is sufficient for accurately interpreting human data. Our results show that the YOLOv8 detection model outperforms all evaluated models for injury localization, achieving a mean Average Precision (mAP50-95) score of 0.606. Segmentation metrics indicate that the DeepLabv3 segmentation model achieves the highest accuracy on unseen porcine anatomy, with a Mean Dice score of 0.587, while SAMed achieves the highest Mean Dice score generalizing to human anatomy (0.445). To the best of our knowledge, this is the largest annotated dataset of spinal cord ultrasound images made publicly available to researchers and medical professionals, as well as the first public report of object detection and segmentation architectures to assess anatomical markers in the spinal cord for methodology development and clinical applications.
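For reference, the Mean Dice metric reported above in minimal form (the detection and segmentation pipelines themselves are not reproduced):

```python
import numpy as np

def dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient between two binary masks."""
    inter = np.logical_and(pred, target).sum()
    return float((2 * inter + eps) / (pred.sum() + target.sum() + eps))

a = np.zeros((64, 64), bool); a[10:30, 10:30] = True
b = np.zeros((64, 64), bool); b[15:35, 15:35] = True
print(round(dice(a, b), 3))   # overlap of two shifted squares
```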
Self-supervised foundation models have recently been successfully extended to encode three-dimensional (3D) computed tomography (CT) images, with excellent performance across several downstream tasks, such as intracranial hemorrhage detection and lung cancer risk forecasting. However, as self-supervised models learn from complex data distributions, questions arise concerning whether these embeddings capture demographic information, such as age, sex, or race. Using the National Lung Screening Trial (NLST) dataset, which contains 3D CT images and demographic data, we evaluated a range of models (softmax regression, linear regression, linear support vector machine, random forest, and decision tree) to predict the sex, race, and age of the patients in the images. Our results indicate that the embeddings effectively encoded age and sex information, with a linear regression model achieving a root mean square error (RMSE) of 3.8 years for age prediction and a softmax regression model attaining an AUC of 0.998 for sex classification. Race prediction was less effective, with an AUC of 0.878. These findings suggest that a detailed exploration of the information encoded in self-supervised learning frameworks is needed to help ensure fair, responsible, and patient privacy-protected healthcare AI.
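The probing setup is straightforward to emulate; the sketch below uses random placeholders for the frozen embeddings and labels, so the printed metrics are meaningless except as a template:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import mean_squared_error, roc_auc_score
from sklearn.model_selection import train_test_split

# Placeholders: X would be frozen CT embeddings; age/sex the NLST labels.
X = np.random.randn(500, 128)
age = np.random.uniform(55, 74, size=500)
sex = np.random.randint(0, 2, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, age, random_state=0)
rmse = mean_squared_error(y_te, LinearRegression().fit(X_tr, y_tr).predict(X_te)) ** 0.5
print(f"age RMSE: {rmse:.1f} years")

X_tr, X_te, s_tr, s_te = train_test_split(X, sex, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, s_tr)   # binary softmax regression
print(f"sex AUC: {roc_auc_score(s_te, clf.predict_proba(X_te)[:, 1]):.3f}")
```

That such simple linear probes succeed is the point: the demographic signal sits close to the surface of the embeddings.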
Multi-slice magnetic resonance images of the fetal brain are usually contaminated by severe and arbitrary fetal and maternal motion. Hence, stable and robust motion correction is necessary to reconstruct high-resolution 3D fetal brain volumes for clinical diagnosis and quantitative analysis. However, conventional registration-based correction has a limited capture range and is insufficient for detecting relatively large motions. Here, we present AFFIRM, a novel Affinity Fusion-based Framework for Iteratively Random Motion correction of multi-slice fetal brain MRI. It learns the sequential motion from multiple stacks of slices and integrates the features between 2D slices and the reconstructed 3D volume using affinity fusion, which resembles the iterations between slice-to-volume registration and volumetric reconstruction in the regular pipeline. The method accurately estimates the motion regardless of brain orientation and outperforms other state-of-the-art learning-based methods on simulated motion-corrupted data, with a 48.4% reduction of mean absolute error for rotation and 61.3% for displacement. We then incorporated AFFIRM into the multi-resolution slice-to-volume registration pipeline and tested it on real-world fetal MRI scans at different gestational stages. The results indicated that adding AFFIRM to the conventional pipeline improved the success rate of fetal brain super-resolution reconstruction from 77.2% to 91.9%.
Cerebrovascular disease is a leading cause of death globally. Prevention and early intervention are known to be the most effective forms of its management. Non-invasive imaging methods hold great promise for early stratification, but at present lack the sensitivity for personalized prognosis. Resting-state functional magnetic resonance imaging (rs-fMRI), a powerful tool previously used for mapping neural activity, is available in most hospitals. Here we show that rs-fMRI can be used to map cerebral hemodynamic function and delineate impairment. By exploiting time variations in breathing pattern during rs-fMRI, deep learning enables reproducible mapping of cerebrovascular reactivity (CVR) and bolus arrival time (BAT) of the human brain using resting-state CO2 fluctuations as a natural 'contrast medium'. The deep-learning network was trained with CVR and BAT maps obtained with a reference method of CO2-inhalation MRI, which included data from young and older healthy subjects and patients with Moyamoya disease and brain tumors. We demonstrate the performance of deep-learning cerebrovascular mapping in the detection of vascular abnormalities, evaluation of revascularization effects, and vascular alterations in normal aging. In addition, cerebrovascular maps obtained with the proposed method exhibited excellent reproducibility in both healthy volunteers and stroke patients. Deep-learning resting-state vascular imaging has the potential to become a useful tool in clinical cerebrovascular imaging.
Urinary tract infections (UTIs) are a common condition that can lead to serious complications including kidney injury, altered mental status, sepsis, and death. Laboratory tests such as urinalysis and urine culture are the mainstays of UTI diagnosis, whereby a urine specimen is collected and processed to reveal its cellular and chemical composition. This process requires precise specimen collection, handling infectious human waste, controlled urine storage, and timely transportation to modern laboratory equipment for analysis. Holographic lens free imaging (LFI) can measure large volumes of urine via a simple and compact optical setup, potentially enabling automatic urine analysis at the patient bedside. We introduce an LFI system capable of resolving important urine clinical biomarkers such as red blood cells, white blood cells, crystals, casts, and E. coli in urine phantoms. This approach is sensitive to the particulate concentrations relevant for detecting several clinical urine abnormalities such as hematuria, pyuria, and bacteriuria. We show bacteria concentrations across eight orders of magnitude can be estimated by analyzing LFI measurements. LFI measurements of blood cell concentrations are relatively insensitive to changes in bacteria concentrations of over seven orders of magnitude. Lastly, LFI reveals clear differences between UTI-positive and UTI-negative urine from human patients. Together, these results show promise for LFI as a tool for urine screening, potentially offering early, point-of-care detection of UTI and other pathological processes.
Neurobiological theories of spatial cognition developed with respect to recording data from relatively small and/or simplistic environments compared to animals' natural habitats. It has been unclear how to extend theoretical models to large or complex spaces. Complementarily, in autonomous systems technology, applications have been growing for distributed control methods that scale to large numbers of low-footprint mobile platforms. Animals and many-robot groups must solve common problems of navigating complex and uncertain environments. Here, we introduce the 'NeuroSwarms' control framework to investigate whether adaptive, autonomous swarm control of minimal artificial agents can be achieved by direct analogy to neural circuits of rodent spatial cognition. NeuroSwarms analogizes agents to neurons and swarming groups to recurrent networks. We implemented neuron-like agent interactions in which mutually visible agents operate as if they were reciprocally-connected place cells in an attractor network. We attributed a phase state to agents to enable patterns of oscillatory synchronization similar to hippocampal models of theta-rhythmic (5-12 Hz) sequence generation. We demonstrate that multi-agent swarming and reward-approach dynamics can be expressed as a mobile form of Hebbian learning and that NeuroSwarms supports a single-entity paradigm that directly informs theoretical models of animal cognition. We present emergent behaviors including phase-organized rings and trajectory sequences that interact with environmental cues and geometry in large, fragmented mazes. Thus, NeuroSwarms is a model artificial spatial system that integrates autonomous control and theoretical neuroscience to potentially uncover common principles to advance both domains.
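A highly simplified sketch of two NeuroSwarms ingredients, theta-band phase coupling among mutually visible agents and Hebbian updating of their pairwise couplings; all constants, the visibility rule, and the update forms are assumptions, not the published model:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
pos = rng.uniform(0, 10, size=(n, 2))              # agent positions (arbitrary units)
phase = rng.uniform(0, 2 * np.pi, size=n)          # oscillator phase per agent
W = np.full((n, n), 0.1)                           # place-cell-like coupling weights

dt, freq, k, eta = 0.01, 8.0, 0.5, 0.01            # 8 Hz carrier, coupling, learning rate
visible = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1) < 3.0

for _ in range(1000):
    diff = np.sin(phase[None, :] - phase[:, None])                # Kuramoto interaction
    phase += dt * (2 * np.pi * freq + k * (W * visible * diff).sum(axis=1))
    W += eta * visible * np.cos(phase[None, :] - phase[:, None])  # Hebbian: co-phase strengthens
    W = np.clip(W, 0.0, 1.0)

print("mean phase coherence:", np.abs(np.exp(1j * phase).mean()).round(3))
```

In the full framework, agent motion feeds back on visibility and hence on the learned weights, which is what produces the emergent rings and trajectory sequences described above.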
The diversity and heterogeneity of biomarkers has made the development of general methods for single-step quantification of analytes difficult. For individual biomarkers, electrochemical methods that detect a conformational change in an affinity binder upon analyte binding have shown promise. However, because the conformational change must operate within a nanometer-scale working distance, an entirely new sensor, with a unique conformational change, must be developed for each analyte. Here, we demonstrate a modular electrochemical biosensor, built from DNA origami, which is easily adapted to diverse molecules by merely replacing its analyte binding domains. Instead of relying on a unique nanometer-scale movement of a single redox reporter, all sensor variants rely on the same 100-nanometer scale conformational change, which brings dozens of reporters close enough to a gold electrode surface that a signal can be measured via square wave voltammetry, a standard electrochemical technique. To validate our sensor's mechanism, we used single-stranded DNA as an analyte, and optimized the number of redox reporters and various linker lengths. Adaptation of the sensor to streptavidin and PDGF-BB analytes was achieved by simply adding biotin or anti-PDGF aptamers to appropriate DNA linkers. Geometrically-optimized streptavidin sensors exhibited signal gain and limit of detection markedly better than comparable reagentless electrochemical sensors. After use, the same sensors could be regenerated under mild conditions: performance was largely maintained over four cycles of DNA strand displacement and rehybridization. By leveraging the modularity of DNA nanostructures, our work provides a straightforward route to the single-step quantification of arbitrary nucleic acids and proteins.
While Shannon's mutual information has widespread applications in many disciplines, for practical applications it is often difficult to calculate its value accurately for high-dimensional variables because of the curse of dimensionality. This paper is focused on effective approximation methods for evaluating mutual information in the context of neural population coding. For large but finite neural populations, we derive several information-theoretic asymptotic bounds and approximation formulas that remain valid in high-dimensional spaces. We prove that optimizing the population density distribution based on these approximation formulas is a convex optimization problem which allows efficient numerical solutions. Numerical simulation results confirmed that our asymptotic formulas were highly accurate for approximating mutual information for large neural populations. In special cases, the approximation formulas are exactly equal to the true mutual information. We also discuss techniques of variable transformation and dimensionality reduction to facilitate computation of the approximations.
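For orientation, a classic large-population approximation in this literature (Brunel and Nadal, 1998) ties mutual information to the Fisher information matrix J(x) of the population response; the paper's bounds and formulas refine approximations of this general type:

```latex
I(X;R) \;\approx\; H(X) \;-\; \mathbb{E}_{x}\!\left[\frac{1}{2}\,
\log\frac{(2\pi e)^{d}}{\det J(x)}\right]
```

Here d is the stimulus dimension and H(X) the stimulus entropy; the approximation tightens as the population grows and the posterior over the stimulus concentrates.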