Erasmus Medical Center · Erasmus University
Electrocardiogram (ECG) interpretation is essential for cardiovascular disease diagnosis, but current automated systems often struggle with transparency and generalization to unseen conditions. To address this, we introduce ZETA, a zero-shot multimodal framework designed for interpretable ECG diagnosis aligned with clinical workflows. ZETA uniquely compares ECG signals against structured positive and negative clinical observations, which are curated through an LLM-assisted, expert-validated process, thereby mimicking differential diagnosis. Our approach leverages a pre-trained multimodal model to align ECG and text embeddings without disease-specific fine-tuning. Empirical evaluations demonstrate ZETA's competitive zero-shot classification performance and, importantly, provide qualitative and quantitative evidence of enhanced interpretability, grounding predictions in specific, clinically relevant positive and negative diagnostic features. ZETA underscores the potential of aligning ECG analysis with structured clinical knowledge for building more transparent, generalizable, and trustworthy AI diagnostic systems. We will release the curated observation dataset and code to facilitate future research.
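As a rough illustration of the comparison step the abstract describes, the sketch below scores a diagnosis by contrasting an ECG embedding's similarity to positive versus negative observation texts in a shared embedding space. The encoders, embedding size, and the difference-of-means scoring rule are placeholder assumptions for illustration, not ZETA's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder encoders standing in for a pre-trained multimodal model that aligns
# ECG and text embeddings; random projections are used here purely for illustration.
def encode_ecg(signal):
    return rng.standard_normal(128)

def encode_text(sentence):
    return rng.standard_normal(128)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def zero_shot_score(ecg_signal, positive_obs, negative_obs):
    """Score a candidate diagnosis by contrasting similarity to positive vs. negative observations."""
    z = encode_ecg(ecg_signal)
    pos = np.mean([cosine(z, encode_text(o)) for o in positive_obs])
    neg = np.mean([cosine(z, encode_text(o)) for o in negative_obs])
    return pos - neg   # higher = more evidence; per-observation terms support interpretability

score = zero_shot_score(
    np.zeros((12, 5000)),                                   # 12-lead ECG placeholder
    positive_obs=["ST-segment elevation in leads V1-V4"],
    negative_obs=["normal sinus rhythm with no ST changes"],
)
```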
Researchers at Erasmus University and IE University developed the Autonomous Prompt Engineering Toolbox (APET), a system that enables GPT-4 to autonomously optimize its own prompts, reducing human dependency in prompt design. This approach led to accuracy improvements of up to 6.8% on tasks like geometric shapes and 4.4% on word sorting, while revealing limitations in highly specialized strategic domains.
Automated volumetric segmentation of the pancreas on cross-sectional imaging is needed for diagnosis and follow-up of pancreatic diseases. While CT-based pancreatic segmentation is more established, MRI-based segmentation methods are understudied, largely due to a lack of publicly available datasets, benchmarking research efforts, and domain-specific deep learning methods. In this retrospective study, we collected a large dataset (767 scans from 499 participants) of T1-weighted (T1W) and T2-weighted (T2W) abdominal MRI series from five centers between March 2004 and November 2022. We also collected CT scans of 1,350 patients from publicly available sources for benchmarking purposes. We developed a new pancreas segmentation method, called PanSegNet, combining the strengths of nnUNet and a Transformer network with a new linear attention module enabling volumetric computation. We tested PanSegNet's accuracy in cross-modality (a total of 2,117 scans) and cross-center settings with Dice and Hausdorff distance (HD95) evaluation metrics. We used Cohen's kappa statistics for intra- and inter-rater agreement evaluation and paired t-tests for volume and Dice comparisons, respectively. For segmentation accuracy, we achieved Dice coefficients of 88.3% (std: 7.2%, at case level) with CT, 85.0% (std: 7.9%) with T1W MRI, and 86.3% (std: 6.4%) with T2W MRI. There was a high correlation for pancreas volume prediction with R^2 of 0.91, 0.84, and 0.85 for CT, T1W, and T2W, respectively. We found moderate inter-observer (0.624 and 0.638 for T1W and T2W MRI, respectively) and high intra-observer agreement scores. All MRI data is made available at this https URL. Our source code is available at this https URL.
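The "linear attention module enabling volumetric computation" can be illustrated with a generic kernelized linear-attention sketch (feature map phi = ELU + 1), whose cost grows linearly rather than quadratically with the number of tokens and thus stays tractable for dense 3D token grids. This is a standard formulation and not necessarily PanSegNet's exact module.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized linear attention (generic sketch, not PanSegNet's exact module).

    q, k, v: (batch, tokens, dim). Cost is O(tokens * dim^2) instead of O(tokens^2 * dim),
    which is what makes attention over dense volumetric token grids affordable.
    """
    q = F.elu(q) + 1                                   # positive feature map phi(.)
    k = F.elu(k) + 1
    kv = torch.einsum("btd,bte->bde", k, v)            # sum_t phi(k_t) v_t^T
    z = 1.0 / (torch.einsum("btd,bd->bt", q, k.sum(dim=1)) + eps)
    return torch.einsum("btd,bde,bt->bte", q, kv, z)

# A 64x64x64 volume patch tokenized into 4x4x4 cubes -> 4096 tokens of width 96.
q = k = v = torch.randn(1, 4096, 96)
out = linear_attention(q, k, v)                        # (1, 4096, 96)
```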
Understanding the decisions made and actions taken by increasingly complex AI systems remains a key challenge. This has led to an expanding field of research in explainable artificial intelligence (XAI), highlighting the potential of explanations to enhance trust, support adoption, and meet regulatory standards. However, what constitutes a "good" explanation depends on the goals, stakeholders, and context. At a high level, psychological insights such as the concept of mental model alignment can offer guidance, but success in practice is challenging due to social and technical factors. Because the problem is ill-defined, explanations can be of poor quality (e.g. unfaithful, irrelevant, or incoherent), potentially leading to substantial risks. Instead of fostering trust and safety, poorly designed explanations can actually cause harm, including wrong decisions, privacy violations, manipulation, and even reduced AI adoption. Therefore, we caution stakeholders to beware of explanations of AI: while they can be vital, they are not automatically a remedy for transparency or responsible AI adoption, and their misuse or limitations can exacerbate harm. Attention to these caveats can help guide future research to improve the quality and impact of AI explanations.
Leveraging big data for patient care is promising in many medical fields such as cardiovascular health. For example, hemodynamic biomarkers like wall shear stress could be assessed from patient-specific medical images via machine learning algorithms, bypassing the need for time-intensive computational fluid simulation. However, it is extremely challenging to amass large-enough datasets to effectively train such models. We could address this data scarcity by means of self-supervised pre-training and foundation models, given large datasets of geometric artery models. In the context of coronary arteries, leveraging learned representations to improve hemodynamic biomarker assessment has not yet been well studied. In this work, we address this gap by investigating whether a large dataset (8449 shapes) consisting of geometric models of 3D blood vessels can benefit wall shear stress assessment in coronary artery models from a small-scale clinical trial (49 patients). We create a self-supervised target for the 3D blood vessels by computing the heat kernel signature, a quantity obtained via Laplacian eigenvectors, which captures the very essence of the shapes. We show how geometric representations learned from this dataset can boost segmentation of coronary arteries into regions of low, mid and high (time-averaged) wall shear stress even when trained on limited data.
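The heat kernel signature has a simple closed form in terms of the Laplacian eigenpairs, HKS(x, t) = sum_i exp(-lambda_i t) phi_i(x)^2. The sketch below computes it from a generic normalized graph Laplacian of the vessel surface; the paper's exact Laplacian construction, number of eigenpairs, and diffusion times are assumptions here.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import eigsh

def heat_kernel_signature(adjacency, times, k=64):
    """HKS(x, t) = sum_i exp(-lambda_i * t) * phi_i(x)^2 over the k smallest Laplacian eigenpairs.

    `adjacency` is a sparse vertex adjacency/weight matrix of the surface mesh or graph
    (k must be smaller than the number of vertices); the paper's exact Laplacian
    (e.g., cotangent weights) may differ from this generic normalized graph Laplacian.
    """
    L = laplacian(adjacency, normed=True)
    evals, evecs = eigsh(L, k=k, which="SM")           # k smallest eigenpairs
    # rows: vertices, columns: diffusion times
    return (evecs ** 2) @ np.exp(-np.outer(evals, times))

# Example usage: hks = heat_kernel_signature(A, times=np.geomspace(1e-2, 1e1, 16))
```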
This survey identifies and analyzes 23 pioneering metaheuristic algorithms from over 150 new proposals introduced between 2019 and 2024, evaluating them based on criteria such as citation metrics, problem diversity, and innovative mechanisms, and mapping their state-of-the-art applications across various domains.
Objective Reticular pseudodrusen (RPD), a key feature of age-related macular degeneration (AMD), are poorly detected by human experts on standard color fundus photography (CFP) and typically require advanced imaging modalities such as fundus autofluorescence (FAF). The objective was to develop and evaluate the performance of a novel 'M3' deep learning framework on RPD detection. Materials and Methods A deep learning framework M3 was developed to detect RPD presence accurately using CFP alone, FAF alone, or both, employing >8000 CFP-FAF image pairs obtained prospectively (Age-Related Eye Disease Study 2). The M3 framework includes multi-modal (detection from single or multiple image modalities), multi-task (training different tasks simultaneously to improve generalizability), and multi-attention (improving ensembled feature representation) operation. Performance on RPD detection was compared with state-of-the-art deep learning models and 13 ophthalmologists; performance on detection of two other AMD features (geographic atrophy and pigmentary abnormalities) was also evaluated. Results For RPD detection, M3 achieved area under receiver operating characteristic (AUROC) 0.832, 0.931, and 0.933 for CFP alone, FAF alone, and both, respectively. M3 performance on CFP was substantially superior to that of human retinal specialists (median F1-score 0.644 versus 0.350). External validation (on Rotterdam Study, Netherlands) demonstrated high accuracy on CFP alone (AUROC 0.965). The M3 framework also accurately detected geographic atrophy and pigmentary abnormalities (AUROC 0.909 and 0.912, respectively), demonstrating its generalizability. Conclusion This study demonstrates the successful development, robust evaluation, and external validation of a novel deep learning framework that enables accessible, accurate, and automated AMD diagnosis and prognosis.
We study decision problems under uncertainty, where the decision-maker has access to K data sources that carry biased information about the underlying risk factors. The biases are measured by the mismatch between the risk factor distribution and the K data-generating distributions with respect to an optimal transport (OT) distance. In this situation the decision-maker can exploit the information contained in the biased samples by solving a distributionally robust optimization (DRO) problem, where the ambiguity set is defined as the intersection of K OT neighborhoods, each of which is centered at the empirical distribution on the samples generated by a biased data source. We show that if the decision-maker has a prior belief about the biases, then the out-of-sample performance of the DRO solution can improve with K -- irrespective of the magnitude of the biases. We also show that, under standard convexity assumptions, the proposed DRO problem is computationally tractable if either K or the dimension of the risk factors is kept constant.
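For concreteness, the ambiguity set described above can be written as an intersection of OT balls around the K empirical distributions; the notation below (loss ell, radii epsilon_k) is generic and not necessarily the paper's.

```latex
\min_{x \in \mathcal{X}} \; \sup_{\mathbb{Q} \in \mathcal{A}} \; \mathbb{E}_{\mathbb{Q}}\!\left[\ell(x,\xi)\right],
\qquad
\mathcal{A} \;=\; \bigcap_{k=1}^{K} \Big\{ \mathbb{Q} : \, W\big(\mathbb{Q}, \widehat{\mathbb{P}}_k\big) \le \varepsilon_k \Big\},
```

where W is the OT distance, \widehat{\mathbb{P}}_k the empirical distribution of the samples from source k, and \varepsilon_k a radius reflecting the believed magnitude of that source's bias.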
Accurate classification of Intraductal Papillary Mucinous Neoplasms (IPMN) is essential for identifying high-risk cases that require timely intervention. In this study, we develop a federated learning framework for multi-center IPMN classification utilizing a comprehensive pancreas MRI dataset. This dataset includes 652 T1-weighted and 655 T2-weighted MRI images, accompanied by corresponding IPMN risk scores from 7 leading medical institutions, making it the largest and most diverse dataset for IPMN classification to date. We assess the performance of DenseNet-121 in both centralized and federated settings for training on distributed data. Our results demonstrate that the federated learning approach achieves high classification accuracy comparable to centralized learning while ensuring data privacy across institutions. This work marks a significant advancement in collaborative IPMN classification, facilitating secure and high-accuracy model training across multiple centers.
Objectives: Marginal donor kidneys are increasingly used for transplantation to overcome organ shortage. This study aims to investigate the additional value of Power Doppler (PD) imaging in kidney quality assessment during normothermic machine perfusion (NMP). Methods: Porcine kidneys (n=22) retrieved from a local slaughterhouse underwent 2 hours of NMP. Based on creatinine clearance (CrCl) and oxygen consumption (VO2), the kidneys were classified as functional (n=7) and non-functional (n=15) kidneys. PD imaging was performed at 30, 60, and 120 minutes, and PD metrics, including vascularization index (VI), flow index (FI), and vascularization flow index (VFI) were calculated. Renal blood flow (RBF), CrCl, and VO2 were measured at the same time points during NMP. The metrics were compared utilizing correlation analysis. Results: FI and VFI moderately correlated with CrCl (r=0.537, p<0.0001; r=0.536, p<0.0001, respectively), while VI strongly correlated with VO2 (r=0.839, p<0.0001). At 120 minutes, PD metrics demonstrated the highest diagnostic accuracy for distinguishing functional from non-functional kidneys, with an area under the curve (AUC) of 0.943 for VI, 0.924 for FI, and 0.943 for VFI. Cutoff values of 17% for VI, 50 a.u. for FI, and 9 a.u. for VFI provided 100% specificity and 73% sensitivity to identify non-functional kidneys, with an overall diagnostic accuracy of 82%. Baseline kidney biopsies showed moderate acute tubular necrosis in both groups with no significant differences. Conclusions: PD metrics strongly correlate with renal viability and effectively differentiate functional from non-functional kidneys. PD imaging can be a valuable alternative to RBF during NMP for kidney assessment.
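The PD metrics referenced above follow the standard vascularization/flow index definitions; a minimal sketch of how they could be computed from a thresholded power Doppler amplitude map is shown below. The threshold, ROI handling, and acquisition details used in the study are not reproduced here.

```python
import numpy as np

def power_doppler_indices(power_map, threshold):
    """Generic VI/FI/VFI computation from a power Doppler amplitude map (a.u.).

    VI  = percentage of voxels with Doppler signal above threshold (vascularization)
    FI  = mean amplitude within those voxels (flow)
    VFI = VI * FI / 100 (vascularization-flow)
    The study's exact ROI definition and thresholding are not specified here.
    """
    colored = power_map > threshold
    vi = 100.0 * colored.mean()
    fi = power_map[colored].mean() if colored.any() else 0.0
    vfi = vi * fi / 100.0
    return vi, fi, vfi

vi, fi, vfi = power_doppler_indices(np.random.rand(256, 256) * 100, threshold=20)
```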
Pancreatic cancer is projected to become the second-deadliest malignancy in Western countries by 2030, highlighting the urgent need for better early detection. Intraductal papillary mucinous neoplasms (IPMNs), key precursors to pancreatic cancer, are challenging to assess with current guidelines, often leading to unnecessary surgeries or missed malignancies. We present Cyst-X, an AI framework that predicts IPMN malignancy using multicenter MRI data, leveraging MRI's superior soft tissue contrast over CT. Trained on 723 T1- and 738 T2-weighted scans from 764 patients across seven institutions, our models (AUC=0.82) significantly outperform both Kyoto guidelines (AUC=0.75) and expert radiologists. The AI-derived imaging features align with known clinical markers and offer biologically meaningful insights. We also demonstrate strong performance in a federated learning setting, enabling collaborative training without sharing patient data. To promote privacy-preserving AI development and improve IPMN risk stratification, the Cyst-X dataset is released as the first large-scale, multi-center pancreatic cysts MRI dataset.
Federated learning (FL) enables collaborative model training across institutions without sharing sensitive data, making it an attractive solution for medical imaging tasks. However, traditional FL methods, such as Federated Averaging (FedAvg), face difficulties in generalizing across domains due to variations in imaging protocols and patient demographics across institutions. This challenge is particularly evident in pancreas MRI segmentation, where anatomical variability and imaging artifacts significantly impact performance. In this paper, we conduct a comprehensive evaluation of FL algorithms for pancreas MRI segmentation and introduce a novel approach that incorporates adaptive aggregation weights. By dynamically adjusting the contribution of each client during model aggregation, our method accounts for domain-specific differences and improves generalization across heterogeneous datasets. Experimental results demonstrate that our approach enhances segmentation accuracy and reduces the impact of domain shift compared to conventional FL methods while maintaining privacy-preserving capabilities. Significant performance improvements are observed across multiple hospitals (centers).
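To make the idea of adaptive aggregation weights concrete, the sketch below combines the usual FedAvg data-size weights with a softmax over clients' validation losses so that poorly fitting clients are down-weighted during aggregation. This particular weighting rule is an illustrative assumption; the paper's actual scheme may differ.

```python
import copy
import torch

def aggregate(client_states, client_sizes, client_val_losses, temperature=1.0):
    """Weighted model aggregation with adaptive per-client weights (illustrative scheme only).

    Weights combine the FedAvg data-size term with a softmax over negative validation
    losses, so clients that fit the current global model poorly contribute less.
    """
    sizes = torch.tensor(client_sizes, dtype=torch.float32)
    losses = torch.tensor(client_val_losses, dtype=torch.float32)
    w = (sizes / sizes.sum()) * torch.softmax(-losses / temperature, dim=0)
    w = w / w.sum()

    global_state = copy.deepcopy(client_states[0])
    for key in global_state:
        mixed = sum(wi * state[key].float() for wi, state in zip(w.tolist(), client_states))
        global_state[key] = mixed.to(global_state[key].dtype)
    return global_state
```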
We present the results and conclusions from the citizen science competition 'Observing Dark Worlds', where we asked participants to calculate the positions of dark matter halos from 120 catalogues of simulated weak lensing galaxy data, using computational methods. In partnership with Kaggle (this http URL), 357 users participated in the competition, which saw 2278 downloads of the data and 3358 submissions. We found that the best algorithms improved on the benchmark code, LENSTOOL, by > 30% and could measure the positions of > 3x10^14 MSun halos to less than 5'' and < 10^14 MSun halos to within 1'. In this paper, we present a brief overview of the winning algorithms with links to available code. We also discuss the implications of the experiment for future citizen science competitions.
Operations and Supply Chain Management (OSCM) has continually evolved, incorporating a broad array of strategies, frameworks, and technologies to address complex challenges across industries. This encyclopedic article provides a comprehensive overview of contemporary strategies, tools, methods, principles, and best practices that define the field's cutting-edge advancements. It also explores the diverse environments where OSCM principles have been effectively implemented. The article is meant to be read in a nonlinear fashion. It should be used as a point of reference or first-port-of-call for a diverse pool of readers: academics, researchers, students, and practitioners.
A method for automatically quantifying emphysema regions using High-Resolution Computed Tomography (HRCT) scans of patients with chronic obstructive pulmonary disease (COPD) that does not require manually annotated scans for training is presented. HRCT scans of controls and of COPD patients with diverse disease severity are acquired at two different centers. Textural features from co-occurrence matrices and Gaussian filter banks are used to characterize the lung parenchyma in the scans. Two robust versions of multiple instance learning (MIL) classifiers, miSVM and MILES, are investigated. The classifiers are trained with the weak labels extracted from the forced expiratory volume in one second (FEV1) and the diffusing capacity of the lungs for carbon monoxide (DLCO). At test time, the classifiers output a patient label indicating overall COPD diagnosis and local labels indicating the presence of emphysema. The classifier performance is compared with manual annotations by two radiologists, a classical density-based method, and pulmonary function tests (PFTs). The miSVM classifier performed better than MILES on both patient and emphysema classification. The classifier has a stronger correlation with PFT than the density-based method, the percentage of emphysema in the intersection of annotations from both radiologists, and the percentage of emphysema annotated by one of the radiologists. The correlation between the classifier and the PFT is only outperformed by the second radiologist. The method is therefore promising for facilitating assessment of emphysema and reducing inter-observer variability.
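For reference, the standard miSVM heuristic underlying such weakly supervised training alternates between fitting an instance-level SVM and re-labeling the instances inside positive bags, while keeping at least one positive instance per positive bag. The sketch below shows that generic loop on precomputed texture features; the robust variants and the feature extraction used in the study are not reproduced here.

```python
import numpy as np
from sklearn.svm import SVC

def misvm(bags, bag_labels, max_iter=10):
    """Standard miSVM heuristic (illustrative sketch, not the paper's robust variant).

    Bags are lists of (n_i, d) feature arrays; bag_labels are in {0, 1}. Instances in
    negative bags keep label 0; instances in positive bags are re-labeled each round.
    """
    X = np.vstack(bags)
    idx = np.cumsum([0] + [len(b) for b in bags])
    y = np.concatenate([np.full(len(b), lab) for b, lab in zip(bags, bag_labels)]).astype(float)

    clf = SVC(kernel="rbf", gamma="scale")
    for _ in range(max_iter):
        clf.fit(X, y)
        scores = clf.decision_function(X)
        new_y = y.copy()
        for i, lab in enumerate(bag_labels):
            s, e = idx[i], idx[i + 1]
            if lab == 1:
                new_y[s:e] = (scores[s:e] > 0).astype(float)
                if new_y[s:e].sum() == 0:               # enforce >= 1 positive instance per positive bag
                    new_y[s + int(np.argmax(scores[s:e]))] = 1.0
        if np.array_equal(new_y, y):
            break
        y = new_y
    return clf

# Bag (patient) label at test time: positive if any instance (region) is predicted positive.
```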
When Artificial Intelligence (AI) is used to replace consumers (e.g., synthetic data), it is often assumed that AI emulates established consumers, and more generally human behaviors. Ten experiments with Large Language Models (LLMs) investigate if this is true in the domain of well-documented biases and heuristics. Across studies we observe four distinct types of deviations from human-like behavior. First, in some cases, LLMs reduce or correct biases observed in humans. Second, in other cases, LLMs amplify these same biases. Third, and perhaps most intriguingly, LLMs sometimes exhibit biases opposite to those found in humans. Fourth, LLMs' responses to the same (or similar) prompts tend to be inconsistent (a) within the same model after a time delay, (b) across models, and (c) among independent research studies. Such inconsistencies can be uncharacteristic of humans and suggest that, at least at one point, LLMs' responses differed from humans. Overall, unhuman-like responses are problematic when LLMs are used to mimic or predict consumer behavior. These findings complement research on synthetic consumer data by showing that sources of bias are not necessarily human-centric. They also contribute to the debate about the tasks for which consumers, and more generally humans, can be replaced by AI.
In 1979 Stockman introduced the SSS* minimax search algorithm that dominates Alpha-Beta in the number of leaf nodes expanded. Further investigation of the algorithm showed that it had three serious drawbacks, which prevented its use by practitioners: it is difficult to understand, it has large memory requirements, and it is slow. This paper presents an alternate formulation of SSS*, in which it is implemented as a series of Alpha-Beta calls that use a transposition table (AB-SSS*). The reformulation solves all three perceived drawbacks of SSS*, making it a practical algorithm. Further, because the search is now based on Alpha-Beta, the extensive research on minimax search enhancements can be easily integrated into AB-SSS*. To test AB-SSS* in practice, it has been implemented in three state-of-the-art programs for checkers, Othello, and chess. AB-SSS* is comparable in performance to Alpha-Beta on leaf node count in all three games, making it a viable alternative to Alpha-Beta in practice. Whereas SSS* has usually been regarded as being entirely different from Alpha-Beta, it turns out to be just an Alpha-Beta enhancement, like null-window searching. This runs counter to published simulation results. Our research leads to the surprising result that iterative deepening versions of Alpha-Beta can expand fewer leaf nodes than iterative deepening versions of SSS* due to dynamic move re-ordering.
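The reformulation described above (SSS* as a sequence of null-window Alpha-Beta searches sharing a transposition table) can be sketched as a driver loop that repeatedly tightens an upper bound on the minimax value until it converges. The node interface, negamax convention, and transposition-table details below are assumptions for illustration; the paper's AB-SSS* additionally uses iterative deepening and game-specific enhancements.

```python
INF = 10 ** 9                      # stands in for +infinity; assumes integer-valued evaluations

def alpha_beta_tt(node, alpha, beta, depth, tt):
    """Fail-soft Alpha-Beta (negamax form) with a transposition table storing value bounds."""
    key = (node.key, depth)
    if key in tt:
        low, high = tt[key]
        if low >= beta:
            return low
        if high <= alpha:
            return high
        alpha, beta = max(alpha, low), min(beta, high)
    if depth == 0 or not node.children:
        g = node.evaluate()         # static value from the side to move's perspective
    else:
        g, a = -INF, alpha
        for child in node.children:
            g = max(g, -alpha_beta_tt(child, -beta, -a, depth - 1, tt))
            a = max(a, g)
            if g >= beta:           # cutoff
                break
    low, high = tt.get(key, (-INF, INF))
    if g <= alpha:
        tt[key] = (low, min(high, g))   # g is an upper bound
    elif g >= beta:
        tt[key] = (max(low, g), high)   # g is a lower bound
    else:
        tt[key] = (g, g)                # exact value
    return g

def ab_sss_star(root, depth):
    """SSS* driven as repeated null-window Alpha-Beta calls that share one transposition table."""
    tt = {}
    g = INF
    while True:
        upper = g
        g = alpha_beta_tt(root, upper - 1, upper, depth, tt)
        if g == upper:              # the upper bound stopped improving: minimax value found
            return g
```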
Functional ultrasound (fUS) indirectly measures brain activity by recording changes in cerebral blood volume and flow in response to neural activation. Conventional approaches model such functional neuroimaging data as the convolution between an impulse response, known as the hemodynamic response function (HRF), and a binarized representation of the input (i.e., source) signal based on the stimulus onsets, the so-called experimental paradigm (EP). However, the EP may not be enough to characterize the whole complexity of the underlying source signals that evoke the hemodynamic changes, such as in the case of spontaneous resting state activity. Furthermore, the HRF varies across brain areas and stimuli. To achieve an adaptable framework that can capture such dynamics and unknowns of the brain function, we propose a deconvolution method for multivariate fUS time-series that reveals both the region-specific HRFs, and the source signals that induce the hemodynamic responses in the studied regions. We start by modeling the fUS time-series as convolutive mixtures and use a tensor-based approach for deconvolution based on two assumptions: (1) HRFs are parametrizable, and (2) source signals are uncorrelated. We test our approach on fUS data acquired during a visual experiment on a mouse subject, focusing on three regions within the mouse brain's colliculo-cortical, image-forming pathway: the lateral geniculate nucleus, superior colliculus and visual cortex. The estimated HRFs in each region are in agreement with prior works, whereas the estimated source signal is observed to closely follow the EP. Yet, we note a few deviations from the EP in the estimated source signal that most likely arise due to the trial-by-trial variability of the neural response across different repetitions of the stimulus observed in the selected regions.
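In symbols, the conventional model and the convolutive-mixture model adopted here can be contrasted as follows (generic notation; the paper's exact parametrization of the HRFs is not shown):

```latex
\text{conventional:}\quad y(t) = (h * u)(t) + n(t),
\qquad
\text{convolutive mixture:}\quad y_m(t) = \sum_{r=1}^{R} (h_{mr} * s_r)(t) + n_m(t),
```

where u is the binarized experimental paradigm, y_m the fUS time-series of region m, h_{mr} a parametrized region-specific HRF, s_r mutually uncorrelated source signals, and n additive noise.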
Background: Risk prediction models are useful tools in clinical decision-making that help with risk stratification and resource allocation and may lead to better health care for patients. AutoScore is a machine learning-based automatic clinical score generator for binary outcomes. This study aims to expand the AutoScore framework to provide a tool for interpretable risk prediction for ordinal outcomes. Methods: The AutoScore-Ordinal framework is generated using the same 6 modules of the original AutoScore algorithm, including variable ranking, variable transformation, score derivation (from proportional odds models), model selection, score fine-tuning, and model evaluation. To illustrate the AutoScore-Ordinal performance, the method was applied to electronic health record data from the emergency department at Singapore General Hospital from 2008 to 2017. The model was trained on 70% of the data, validated on 10%, and tested on the remaining 20%. Results: This study included 445,989 inpatient cases, where the distribution of the ordinal outcome was 80.7% alive without 30-day readmission, 12.5% alive with 30-day readmission, and 6.8% died inpatient or by day 30 post discharge. Two point-based risk prediction models were developed using two sets of 8 predictor variables identified by the flexible variable selection procedure. The two models indicated reasonably good performance measured by mean area under the receiver operating characteristic curve (0.785 and 0.793) and generalized c-index (0.737 and 0.760), which were comparable to alternative models. Conclusion: AutoScore-Ordinal provides an automated and easy-to-use framework for development and validation of risk prediction models for ordinal outcomes, which can systematically identify potential predictors from high-dimensional data.
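The score-derivation module turns proportional-odds regression coefficients into integer points; the sketch below shows a generic point-based scoring step in that spirit (shift each variable's category coefficients to start at zero, rescale so the maximum achievable total is 100, round). The exact normalization and fine-tuning used by AutoScore-Ordinal may differ, and the coefficients shown are hypothetical.

```python
import numpy as np

def coefficients_to_points(category_coefs, max_total=100):
    """Convert per-category regression coefficients to integer points (generic sketch).

    `category_coefs` maps each variable to {category: coefficient} from a fitted
    proportional-odds model. Each variable's categories are shifted so its smallest
    coefficient scores 0, then all scores are rescaled so the maximum achievable total
    equals `max_total` and rounded to integers.
    """
    shifted = {
        var: {cat: c - min(cats.values()) for cat, c in cats.items()}
        for var, cats in category_coefs.items()
    }
    max_sum = sum(max(cats.values()) for cats in shifted.values())
    scale = max_total / max_sum if max_sum > 0 else 0.0
    return {
        var: {cat: int(round(c * scale)) for cat, c in cats.items()}
        for var, cats in shifted.items()
    }

points = coefficients_to_points({
    "age":        {"<65": 0.0, "65-80": 0.45, ">80": 0.90},   # hypothetical coefficients
    "creatinine": {"normal": 0.0, "elevated": 0.60},
})
```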
Next-generation personalized healthcare devices are undergoing extreme miniaturization in order to improve user acceptability. However, such developments make it difficult to incorporate cryptographic primitives using available target technologies since these algorithms are notorious for their energy consumption. Besides, strengthening these schemes against side-channel attacks further adds to the device overheads. Therefore, viable alternatives among emerging technologies are being sought. In this work, we investigate the possibility of using memristors for implementing lightweight encryption. We propose a 40-nm RRAM-based GIFT-cipher implementation using a 1T1R configuration with promising results; it exhibits roughly half the energy consumption of a CMOS-only implementation. More importantly, its non-volatile and reconfigurable substitution boxes offer an energy-efficient protection mechanism against side-channel attacks. The complete cipher takes 0.0034 mm^2 of area, and encrypting a 128-bit block consumes a mere 242 pJ.