Amsterdam University Medical Centers
Machine learning (ML) has the potential to revolutionize many domains, but its adoption is often hindered by the disconnect between the needs of domain experts and the translation of those needs into robust and valid ML tools. Despite recent advances in LLM-based co-pilots that aim to democratize ML for non-technical domain experts, these systems remain predominantly focused on model-centric aspects while overlooking critical data-centric challenges. This limitation is problematic in real-world settings, where raw data often contain complex issues such as missing values, label noise, and domain-specific nuances requiring tailored handling. To address this, we introduce CliMB-DC, a human-guided, data-centric framework for LLM co-pilots that combines advanced data-centric tools with LLM-driven reasoning to enable robust, context-aware data processing. At its core, CliMB-DC introduces a novel multi-agent reasoning system that combines a strategic coordinator for dynamic planning and adaptation with a specialized worker agent for precise execution. Domain expertise is systematically incorporated to guide the reasoning process using a human-in-the-loop approach. To guide development, we formalize a taxonomy of key data-centric challenges that co-pilots must address. To address the dimensions of this taxonomy, we integrate state-of-the-art data-centric tools into an extensible, open-source architecture that facilitates the addition of new tools from the research community. Empirically, using real-world healthcare datasets, we demonstrate CliMB-DC's ability to transform uncurated datasets into ML-ready formats, significantly outperforming existing co-pilot baselines in handling data-centric challenges. CliMB-DC promises to empower domain experts from diverse domains -- healthcare, finance, social sciences and more -- to actively participate in driving real-world impact using ML.
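As an illustration of the coordinator-worker pattern with a human in the loop described above, the following Python sketch mimics a planner proposing data-centric steps, a worker executing them, and an expert confirming each step. All class and function names are hypothetical; this is a schematic of the pattern, not CliMB-DC's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    description: str          # e.g. "impute missing values in lab columns"
    done: bool = False

@dataclass
class Coordinator:
    """Strategic planner: maintains and revises a list of data-centric steps."""
    plan: list = field(default_factory=list)

    def next_step(self):
        return next((s for s in self.plan if not s.done), None)

class Worker:
    """Specialized executor: runs one concrete data-processing step."""
    def execute(self, step, data):
        print(f"executing: {step.description}")   # placeholder for a tool invocation
        step.done = True
        return data

def run(coordinator, worker, data, ask_user=input):
    """Human-in-the-loop driver: the expert approves or skips each proposed step."""
    while (step := coordinator.next_step()) is not None:
        if ask_user(f"Run '{step.description}'? [y/n] ").lower() != "y":
            step.done = True                       # expert skips this step
            continue
        data = worker.execute(step, data)
    return data
```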
We developed and validated TRisk, a Transformer-based AI model predicting 36-month mortality in heart failure patients by analysing temporal patient journeys from UK electronic health records (EHR). Our study included 403,534 heart failure patients (ages 40-90) from 1,418 English general practices, with 1,063 practices for model derivation and 355 for external validation. TRisk was compared against the MAGGIC-EHR model across various patient subgroups. With a median follow-up of 9 months, TRisk achieved a concordance index of 0.845 (95% confidence interval: 0.841 to 0.849), significantly outperforming MAGGIC-EHR's 0.728 (0.723 to 0.733) for predicting 36-month all-cause mortality. TRisk showed more consistent performance across sex, age, and baseline characteristics, suggesting less bias. We successfully adapted TRisk to US hospital data through transfer learning, achieving a C-index of 0.802 (0.789 to 0.816) in a cohort of 21,767 patients. Explainability analyses revealed that TRisk captured established risk factors while identifying underappreciated predictors, such as cancers and hepatic failure, that were important across both cohorts. Notably, cancers maintained strong prognostic value even a decade after diagnosis. TRisk demonstrated well-calibrated mortality prediction across both healthcare systems. Our findings highlight the value of tracking longitudinal health profiles and reveal risk factors not included in previous expert-driven models.
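The headline metric above is the concordance index. As a reference for how such a number is computed, here is a minimal Python sketch of Harrell's C-index for right-censored data (ties in survival time are ignored for simplicity); it illustrates the evaluation metric only, not the TRisk model itself, and the toy inputs are made up.

```python
import numpy as np

def harrell_c_index(time, event, risk):
    """Harrell's concordance index for right-censored survival data.
    A pair (i, j) is comparable if the subject with the shorter time had an event;
    it is concordant if that subject also has the higher predicted risk."""
    time, event, risk = map(np.asarray, (time, event, risk))
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue
        for j in range(n):
            if time[j] > time[i]:          # j outlived i, and i had an event
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

# Toy usage
print(harrell_c_index([5, 8, 12, 3, 9], [1, 0, 1, 1, 0], [0.9, 0.4, 0.2, 0.8, 0.5]))
```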
Recent advances in single cell sequencing and multi-omics techniques have significantly improved our understanding of biological phenomena and our capacity to model them. Although combined capture of data modalities, notably single cell transcriptomics and proteomics, has shown similar progress, simultaneous probing at the multi-omics level remains challenging. As an alternative to combined capture of biological data, this review explores current and upcoming methods for post-hoc network inference and integration, with an emphasis on single cell transcriptomics and proteomics. By examining approaches ranging from probabilistic models to graph-based algorithms, we outline the challenges and potential strategies for effectively combining biological data types, while highlighting the importance of model validation. With this review, we aim to inform readers of the breadth of tools currently available for the purpose-specific generation of heterogeneous multi-layer networks.
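To make the notion of post-hoc network inference concrete, the following toy Python sketch builds the simplest possible ingredient of a multi-layer network: a gene-gene co-expression graph from a single-cell-like expression matrix, using an arbitrary correlation cutoff. The data and threshold are made up, and the methods surveyed in the review are considerably more sophisticated.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
expr = rng.normal(size=(200, 30))             # toy matrix: 200 cells x 30 genes
genes = [f"g{i}" for i in range(expr.shape[1])]

corr = np.corrcoef(expr, rowvar=False)        # gene-gene co-expression
G = nx.Graph()
G.add_nodes_from(genes)
for i in range(len(genes)):
    for j in range(i + 1, len(genes)):
        if abs(corr[i, j]) > 0.3:             # arbitrary cutoff for illustration
            G.add_edge(genes[i], genes[j], weight=corr[i, j])

print(G.number_of_nodes(), "genes,", G.number_of_edges(), "edges")
```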
Cardiovascular disease (CVD) remains a leading cause of death, and primary prevention through personalized interventions is crucial. This paper introduces MyDigiTwin, a framework that integrates health digital twins with personal health environments to empower patients in exploring personalized health scenarios while ensuring data privacy. MyDigiTwin uses federated learning to train predictive models across distributed datasets without transferring raw data, and a novel data harmonization framework addresses semantic and format inconsistencies in health data. A proof-of-concept demonstrates the feasibility of harmonizing and using cohort data to train privacy-preserving CVD prediction models. This framework offers a scalable solution for proactive, personalized cardiovascular care and sets the stage for future applications in real-world healthcare settings.
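The abstract relies on federated learning to keep raw data at each site. Below is a generic federated-averaging (FedAvg) sketch in Python for a logistic-regression risk model across three simulated cohorts; it is a minimal illustration of the principle with synthetic data, not MyDigiTwin's actual training protocol.

```python
import numpy as np

def local_sgd(w, X, y, lr=0.1, epochs=5):
    """A few epochs of logistic-regression gradient descent on one site's data."""
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-X @ w))
        w = w - lr * X.T @ (p - y) / len(y)
    return w

def fed_avg(sites, rounds=20, dim=5):
    """Federated averaging: only model weights leave each site, never raw data."""
    w = np.zeros(dim)
    for _ in range(rounds):
        local = [local_sgd(w.copy(), X, y) for X, y in sites]
        sizes = np.array([len(y) for _, y in sites], dtype=float)
        w = np.average(local, axis=0, weights=sizes)   # weight sites by sample size
    return w

rng = np.random.default_rng(1)
true_w = rng.normal(size=5)
sites = []
for _ in range(3):                                     # three simulated cohorts
    X = rng.normal(size=(200, 5))
    y = (rng.random(200) < 1 / (1 + np.exp(-X @ true_w))).astype(float)
    sites.append((X, y))
print(fed_avg(sites))
```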
Pathogen genomic data can be used to reconstruct transmission events and characterize population-level sources of infection, helping to stop the spread of infectious diseases. Most approaches for identifying transmission pairs do not account for the time elapsed since the divergence of pathogen variants in individuals, which is problematic for viruses with high within-host evolutionary rates. This prompted us to consider possible transmission pairs in terms of phylogenetic data combined with additional estimates of time since infection derived from clinical biomarkers. We develop Bayesian mixture models with an evolutionary clock as the signal component and with mixed effects or covariate random functions describing the mixing weights, to classify potential pairs into likely and unlikely transmission pairs. We demonstrate that, although sources cannot be identified with certainty at the individual level even with the additional data on time elapsed, inferences about population-level sources of transmission are possible and are more accurate than those based on phylogenetic data alone, without time since infection estimates. We apply the approach to estimate age-specific sources of HIV infection in Amsterdam MSM transmission networks between 2010 and 2021. This study demonstrates that infection time estimates provide informative data for characterizing transmission sources, and shows how phylogenetic source attribution can be performed with multi-dimensional mixture models.
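As a rough, non-Bayesian analogue of the mixture idea, the sketch below fits a two-component Gaussian mixture to toy patristic distances and reads off the posterior probability that a candidate pair belongs to the low-distance "signal" component. The real model additionally uses an evolutionary clock, time-since-infection estimates, and covariate-dependent mixing weights, none of which are reproduced here.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Toy patristic distances: a low-distance "signal" cluster of likely pairs
# and a higher-distance background of unlikely pairs.
signal = rng.gamma(shape=2.0, scale=0.002, size=100)
background = rng.gamma(shape=5.0, scale=0.02, size=400)
dist = np.concatenate([signal, background]).reshape(-1, 1)

gm = GaussianMixture(n_components=2, random_state=0).fit(dist)
low = np.argmin(gm.means_.ravel())            # component with the smaller mean distance
p_pair = gm.predict_proba(dist)[:, low]       # posterior prob. of "likely pair"
print(p_pair[:5])
```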
Purpose: The coronary artery calcium (CAC) score, i.e. the amount of CAC quantified in CT, is a strong and independent predictor of coronary heart disease (CHD) events. However, CAC scoring suffers from limited interscan reproducibility, mainly because the clinical definition requires applying a fixed intensity threshold to segment calcifications. This limitation is especially pronounced in non-ECG-synchronized CT, where lesions are more affected by cardiac motion and partial volume effects. We therefore propose a CAC quantification method that does not require a threshold for segmentation of CAC. Approach: Our method uses a generative adversarial network in which a CT with CAC is decomposed into an image without CAC and an image showing only CAC. The method, using a CycleGAN, was trained on 626 low-dose chest CTs and 514 radiotherapy treatment planning CTs. Interscan reproducibility was compared to clinical calcium scoring in radiotherapy treatment planning CTs of 1,662 patients, each with two scans. Results: The proposed method achieved a lower relative interscan difference in CAC mass: 47%, compared to 89% for manual clinical calcium scoring. The intraclass correlation coefficient of Agatston scores was 0.96 for the proposed method, compared to 0.91 for automatic clinical calcium scoring. Conclusions: The increased interscan reproducibility achieved by our method may lead to more reliable CHD risk categorization and improved accuracy of CHD event prediction.
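For reference, the two reproducibility metrics reported above can be computed from paired scores along the following lines. This is a generic Python sketch (a one-way ICC(1,1) and a median relative difference, assuming strictly positive paired scores); the paper's exact metric definitions may differ.

```python
import numpy as np

def relative_difference(a, b):
    """Median relative interscan difference between paired scores (assumes a + b > 0)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.median(np.abs(a - b) / ((a + b) / 2))

def icc_oneway(a, b):
    """One-way random-effects ICC(1,1) for two repeated measurements per subject."""
    x = np.column_stack([a, b]).astype(float)
    n, k = x.shape
    grand = x.mean()
    msb = k * np.sum((x.mean(axis=1) - grand) ** 2) / (n - 1)       # between subjects
    msw = np.sum((x - x.mean(axis=1, keepdims=True)) ** 2) / (n * (k - 1))  # within
    return (msb - msw) / (msb + (k - 1) * msw)
```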
While shrinkage is essential in high-dimensional settings, its use for low-dimensional, regression-based prediction has been debated. Shrinkage reduces variance, often leading to improved prediction accuracy. However, it inevitably introduces bias, which may harm two other measures of predictive performance: calibration and the coverage of confidence intervals. Much of the criticism stems from the use of standard shrinkage methods, such as lasso and ridge with a single, cross-validated penalty. Our aim is to show that readily available alternatives can strongly improve predictive performance in terms of accuracy, calibration, or coverage. For linear regression, we use small sample splits of a large, fairly typical epidemiological data set to illustrate this. We show that the use of differential ridge penalties for covariate groups may enhance prediction accuracy, while calibration and coverage benefit from additional shrinkage of the penalties. In the logistic setting, we use an external simulation to demonstrate that local shrinkage improves calibration relative to global shrinkage, while providing better prediction accuracy than other solutions such as Firth's correction. The benefits of the alternative shrinkage methods are easily accessible via example implementations using \texttt{mgcv} and \texttt{r-stan}, including the estimation of multiple penalties. A synthetic copy of the large data set is shared for reproducibility.
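A minimal Python sketch of the core idea, differential ridge penalties per covariate group, is given below for fixed penalties; the paper estimates the penalties themselves (via mgcv and r-stan), which is not shown here, and the toy data and penalty values are assumptions.

```python
import numpy as np

def group_ridge(X, y, groups, lambdas):
    """Ridge with a separate penalty per covariate group (differential shrinkage).
    groups[j] gives the group index of covariate j; lambdas[g] its penalty."""
    penalty = np.diag([lambdas[g] for g in groups])
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 6))
beta_true = np.array([2.0, 1.5, 1.0, 0.0, 0.0, 0.0])   # group 0 strong, group 1 noise
y = X @ beta_true + rng.normal(size=100)
groups = [0, 0, 0, 1, 1, 1]
print(group_ridge(X, y, groups, lambdas={0: 1.0, 1: 50.0}))
```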
Standardization of data items collected in paediatric clinical trials is an important but challenging issue. The Clinical Data Interchange Standards Consortium (CDISC) data standards are well understood by the pharmaceutical industry but lack the implementation of some paediatric-specific concepts. When a paediatric concept is absent from CDISC standards, companies and research institutions take multiple approaches to collecting paediatric data, leading to different implementations of the standards and potentially limited utility for reuse. To overcome these challenges, the conect4children consortium has developed a cross-cutting paediatric data dictionary (CCPDD). The dictionary was built over three phases: scoping (including a survey sent to ten industrial and 34 academic partners to gauge interest), creation of a longlist, and consensus building for the final set of terms. The dictionary was finalized during a workshop with attendees from academia, hospitals, industry and CDISC. The attendees held detailed discussions on each data item and participated in the final vote on its inclusion in the CCPDD. Nine industrial and 34 academic partners responded to the survey, which showed overall interest in the development of the CCPDD. Following the final vote on 27 data items, three were rejected, six were deferred to the next version, and a final opinion was sought from CDISC. The first version of the CCPDD, with 25 data items, was released in August 2019. Continued use of the dictionary has the potential to ensure the collection of standardized data that is interoperable and can later be pooled and reused for other applications. The dictionary is already being used for case report form creation in three clinical trials. The CCPDD will also serve as one of the inputs to the Paediatric User Guide being developed by CDISC.
Extent of resection after surgery is one of the main prognostic factors for patients diagnosed with glioblastoma. Estimating it requires accurate segmentation and classification of residual tumor from post-operative MR images. The current standard method for estimating extent of resection is subject to high inter- and intra-rater variability, and an automated method for segmenting residual tumor in early post-operative MRI could enable a more accurate estimation. In this study, two state-of-the-art neural network architectures for pre-operative segmentation were trained for this task. The models were extensively validated on a multicenter dataset of nearly 1,000 patients from 12 hospitals in Europe and the United States. The best segmentation performance was a Dice score of 61%, and the best classification performance was about 80% balanced accuracy, with a demonstrated ability to generalize across hospitals. In addition, the segmentation performance of the best models was on par with that of human expert raters. The predicted segmentations can be used to accurately classify patients into those with residual tumor and those with gross total resection.
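For reference, the Dice score used to report segmentation performance can be computed from two binary masks as in this short Python sketch; the toy masks are made up.

```python
import numpy as np

def dice(pred, truth, eps=1e-8):
    """Dice score between two binary segmentation masks."""
    pred = np.asarray(pred).astype(bool)
    truth = np.asarray(truth).astype(bool)
    inter = np.logical_and(pred, truth).sum()
    return 2.0 * inter / (pred.sum() + truth.sum() + eps)

# Toy usage with two small synthetic masks
a = np.zeros((10, 10), dtype=int); a[2:6, 2:6] = 1
b = np.zeros((10, 10), dtype=int); b[3:7, 3:7] = 1
print(dice(a, b))
```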
We address a classical problem in statistics: adding two-way interaction terms to a regression model. Because the number of interaction terms grows quadratically with the number of covariates, we develop an estimator that adapts well to this increase while providing accurate estimates and appropriate inference. Existing strategies overcome the dimensionality problem by only allowing interactions between relevant main effects. Building on this philosophy, we implement a softer link between the two types of effects using a local shrinkage model. We empirically show that borrowing strength between the amount of shrinkage for main effects and that for their interactions can strongly improve estimation of the regression coefficients. Moreover, we evaluate the potential of the model for inference, which is notoriously hard for selection strategies. Large-scale cohort data are used to provide realistic illustrations and evaluations, and comparisons with other methods are provided. Evaluating variable importance is not trivial in regression models with many interaction terms; we therefore derive a new analytical formula for the Shapley value, which enables rapid assessment of individual-specific variable importance scores and their uncertainties. Finally, while not targeting prediction, we show that our models can be very competitive with a more advanced machine learner, such as random forest, even for fairly large sample sizes. The implementation of our method in RStan is fairly straightforward, allowing for adjustments to specific needs.
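The quadratic growth of the interaction dimension is easy to see numerically; the sketch below expands 10 covariates into all pairwise interaction terms with scikit-learn (10 main effects plus 45 interactions). This only illustrates the dimensionality issue, not the local shrinkage model or the paper's Shapley formula.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.random.default_rng(4).normal(size=(50, 10))
inter = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_int = inter.fit_transform(X)             # 10 main effects + 45 pairwise interactions
print(X.shape[1], "->", X_int.shape[1])    # 10 -> 55
```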
Medical prediction applications often need to deal with small sample sizes compared to the number of covariates. Such data pose problems for prediction and variable selection, especially when the covariate-response relationship is complicated. To address these challenges, we propose to incorporate co-data, i.e. external information on the covariates, into Bayesian additive regression trees (BART), a sum-of-trees prediction model that utilizes priors on the tree parameters to prevent overfitting. To incorporate co-data, an empirical Bayes (EB) framework is developed that estimates, assisted by a co-data model, prior covariate weights in the BART model. The proposed method can handle multiple types of co-data simultaneously. Furthermore, the proposed EB framework enables the estimation of the other hyperparameters of BART as well, rendering an appealing alternative to cross-validation. We show that the method finds relevant covariates and that it improves prediction compared to default BART in simulations. If the covariate-response relationship is nonlinear, the method benefits from the flexibility of BART to outperform regression-based co-data learners. Finally, the use of co-data enhances prediction in an application to diffuse large B-cell lymphoma prognosis based on clinical covariates, gene mutations, DNA translocations, and DNA copy number data.
Keywords: Bayesian additive regression trees; Empirical Bayes; Co-data; High-dimensional data; Omics; Prediction
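The following loose Python schematic illustrates the general co-data idea only: external information on the covariates is used to turn provisional effect sizes into prior covariate weights. It is not the paper's empirical Bayes procedure for BART; the data, the co-data, and the linear co-data model are all toy assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge, LinearRegression

rng = np.random.default_rng(5)
n, p = 100, 200
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:20] = 1.0               # first 20 covariates are active
y = X @ beta + rng.normal(size=n)
codata = (np.arange(p) < 20).astype(float).reshape(-1, 1)   # toy external information

prelim = Ridge(alpha=10.0).fit(X, y).coef_        # provisional effect sizes
codata_model = LinearRegression().fit(codata, prelim**2)    # simple co-data model
weights = np.clip(codata_model.predict(codata), 1e-6, None)
weights = weights / weights.sum() * p             # prior covariate weights, mean 1
print(weights[:5], weights[-5:])
```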
Prediction models are used, among other purposes, to inform medical decisions on interventions. Typically, individuals at high risk of adverse outcomes are advised to undergo an intervention, while those at low risk are advised to refrain from it. Standard prediction models do not always provide risks that are relevant for such decisions: for example, an individual may be estimated to be at low risk because similar individuals in the past received an intervention that lowered their risk. Prediction models supporting decisions should therefore target risks under defined intervention strategies. Previous work on prediction under interventions assumed that the prediction model is used at a single time point to make an intervention decision. In clinical practice, however, intervention decisions are rarely made only once: they may be repeated, deferred, and re-evaluated. This requires estimated risks under interventions that can be reconsidered at several potential decision moments. In the current work, we highlight key considerations for formulating estimands in sequential prediction under interventions that can inform such intervention decisions. We illustrate these considerations with examples of estimands for a case study on choosing between vaginal delivery and cesarean section for women giving birth. Our formalization of prediction tasks in a sequential, causal, and estimand context provides guidance for future studies, ensuring that the right question is answered and that appropriate causal estimation approaches are chosen to develop sequential prediction models that can inform intervention decisions.
For high-dimensional linear regression models, we review and compare several estimators of variances $\tau^2$ and $\sigma^2$ of the random slopes and errors, respectively. These variances relate directly to ridge regression penalty $\lambda$ and heritability index $h^2$, often used in genetics. Direct and indirect estimators of these, either based on cross-validation (CV) or maximum marginal likelihood (MML), are also discussed. The comparisons include several cases of covariate matrix $\mathbf{X}_{n \times p}$, with $p \gg n$, such as multi-collinear covariates and data-derived ones. In addition, we study robustness against departures from the model such as sparse instead of dense effects and non-Gaussian errors. An example on weight gain data with genomic covariates confirms the good performance of MML compared to CV. Several extensions are presented. First, to the high-dimensional linear mixed effects model, with REML as an alternative to MML. Second, to the conjugate Bayesian setting, which proves to be a good alternative. Third, and most prominently, to generalized linear models for which we derive a computationally efficient MML estimator by re-writing the marginal likelihood as an $n$-dimensional integral. For Poisson and Binomial ridge regression, we demonstrate the superior accuracy of the resulting MML estimator of $\lambda$ as compared to CV. Software is provided to enable reproduction of all results presented here.
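A compact Python sketch of the MML idea for the Gaussian case is shown below: the marginal covariance $\tau^2 \mathbf{X}\mathbf{X}^\top + \sigma^2 \mathbf{I}_n$ is diagonalized once, the two variances are found by maximizing the $n$-dimensional marginal likelihood, and $\lambda = \sigma^2/\tau^2$ follows as a by-product. It assumes a centred response and synthetic data, and is only a simplified illustration of the estimators compared in the paper.

```python
import numpy as np
from scipy.optimize import minimize

def mml_variances(X, y):
    """Maximum marginal likelihood estimates of (tau^2, sigma^2) in the random-slope
    model y ~ N(0, tau^2 X X^T + sigma^2 I), assuming a centred response y."""
    d, U = np.linalg.eigh(X @ X.T)           # eigendecomposition of the n x n kernel
    z2 = (U.T @ y) ** 2

    def neg_loglik(log_params):
        tau2, sig2 = np.exp(log_params)
        v = tau2 * d + sig2                  # eigenvalues of the marginal covariance
        return 0.5 * (np.sum(np.log(v)) + np.sum(z2 / v))

    res = minimize(neg_loglik, x0=np.log([0.1, 1.0]), method="Nelder-Mead")
    tau2, sig2 = np.exp(res.x)
    return tau2, sig2, sig2 / tau2           # the last value is the ridge penalty

rng = np.random.default_rng(6)
n, p = 100, 1000
X = rng.normal(size=(n, p))
beta = rng.normal(scale=np.sqrt(1.0 / p), size=p)   # dense effects, tau^2 = 1/p
y = X @ beta + rng.normal(size=n)                    # sigma^2 = 1
print(mml_variances(X, y))
```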
Cancer prognosis is often based on a set of omics covariates and a set of established clinical covariates such as age and tumor stage. Combining these two sets poses challenges. First, dimension difference: clinical covariates should be favored because they are low-dimensional and usually have stronger prognostic ability than high-dimensional omics covariates. Second, interactions: genetic profiles and their prognostic effects may vary across patient subpopulations. Last, redundancy: a (set of) gene(s) may encode similar prognostic information as a clinical covariate. To address these challenges, we combine regression trees, employing clinical covariates only, with a fusion-like penalized regression framework in the leaf nodes for the omics covariates. The fusion penalty controls the variability in genetic profiles across subpopulations. We prove that the shrinkage limit of the proposed method equals a benchmark model: a ridge regression with penalized omics covariates and unpenalized clinical covariates. Furthermore, the proposed method allows researchers to evaluate, for different subpopulations, whether the overall omics effect enhances prognosis compared to only employing clinical covariates. In an application to colorectal cancer prognosis based on established clinical covariates and 20,000+ gene expressions, we illustrate the features of our method.
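The benchmark model mentioned above, the shrinkage limit of the proposed method, is a ridge regression with unpenalized clinical covariates and penalized omics covariates; a minimal Python sketch of that benchmark is shown below, with toy data. The tree-based partitioning and the fusion penalty of the proposed method are not reproduced here.

```python
import numpy as np

def partial_ridge(C, X, y, lam):
    """Ridge with unpenalized clinical covariates C and penalized omics covariates X."""
    Z = np.hstack([C, X])
    P = np.zeros((Z.shape[1], Z.shape[1]))
    P[C.shape[1]:, C.shape[1]:] = lam * np.eye(X.shape[1])   # penalize the omics block only
    coef = np.linalg.solve(Z.T @ Z + P, Z.T @ y)
    return coef[:C.shape[1]], coef[C.shape[1]:]              # (clinical, omics) effects

rng = np.random.default_rng(9)
C = rng.normal(size=(150, 3))                 # clinical covariates (unpenalized)
X = rng.normal(size=(150, 500))               # omics covariates (penalized)
y = C @ np.array([1.0, -0.5, 0.3]) + 0.1 * X[:, :10].sum(axis=1) + rng.normal(size=150)
gamma, beta = partial_ridge(C, X, y, lam=100.0)
print(gamma, beta[:5])
```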
We present BayesPIM, a Bayesian prevalence-incidence mixture model for estimating time- and covariate-dependent disease incidence from screening and surveillance data. The method is particularly suited to settings where some individuals may have the disease at baseline, baseline tests may be missing or incomplete, and the screening test has imperfect test sensitivity. This setting was present in data from high-risk colorectal cancer (CRC) surveillance through colonoscopy, where adenomas, precursors of CRC, were already present at baseline and remained undetected due to imperfect test sensitivity. By including covariates, the model can quantify heterogeneity in disease risk, thereby informing personalized screening strategies. Internally, BayesPIM uses a Metropolis-within-Gibbs sampler with data augmentation and weakly informative priors on the incidence and prevalence model parameters. In simulations based on the real-world CRC surveillance data, we show that BayesPIM estimates model parameters without bias while handling latent prevalence and imperfect test sensitivity. However, informative priors on the test sensitivity are needed to stabilize estimation and mitigate non-convergence issues. We also show how conditioning incidence and prevalence estimates on covariates explains heterogeneity in adenoma risk and how model fit is assessed using information criteria and a non-parametric estimator.
Machine learning models used in medical applications often face challenges due to covariate shift, which occurs when the distributions of training and target data differ. This can lead to decreased predictive accuracy, especially when outcomes are unknown in the target data. This paper introduces a semi-supervised classification and regression tree (CART) that uses importance weighting to address these distributional discrepancies. Our method improves the predictive performance of the CART model by assigning greater weights to training samples that more accurately represent the target distribution, especially in cases of covariate shift without target outcomes. In addition to CART, we extend this weighted approach to generalized linear model trees and tree ensembles, creating a versatile framework for managing covariate shift in complex datasets. Through simulation studies and applications to real-world medical data, we demonstrate significant improvements in predictive accuracy. These findings suggest that our weighted approach can enhance reliability in medical applications and other fields where covariate shift poses challenges to model performance across data distributions.
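A minimal Python sketch of importance-weighted tree fitting under covariate shift: density-ratio weights are estimated with a domain classifier and passed to a scikit-learn decision tree via sample weights. The data are synthetic and the paper's exact weighting scheme may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(7)
X_train = rng.normal(loc=0.0, size=(500, 3))
y_train = (X_train[:, 0] + rng.normal(size=500) > 0).astype(int)
X_target = rng.normal(loc=0.7, size=(300, 3))     # shifted covariates, no outcomes

# Estimate importance weights w(x) ~ p_target(x) / p_train(x) with a domain classifier.
X_dom = np.vstack([X_train, X_target])
d = np.r_[np.zeros(len(X_train)), np.ones(len(X_target))]
dom = LogisticRegression().fit(X_dom, d)
p = dom.predict_proba(X_train)[:, 1]
w = p / (1 - p) * len(X_train) / len(X_target)

tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_train, y_train, sample_weight=w)       # importance-weighted CART
```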
4D Flow MRI is the state-of-the-art technique for measuring blood flow and provides valuable information for inverse problems in the cardiovascular system. However, 4D Flow MRI has a very long acquisition time, straining healthcare resources and inconveniencing patients. As a result, usually only part of the frequency space is acquired, and further assumptions are then needed to reconstruct an image. Inverse problems based on 4D Flow MRI data have the potential to compute clinically relevant quantities without invasive procedures and/or to expand the set of biomarkers for a more accurate diagnosis. However, reconstructing MRI measurements with compressed sensing techniques introduces artifacts and inaccuracies, which can compromise the results of the inverse problems. Additionally, many different sampling patterns are available, and it is often unclear which is preferable. Here, we present a parameter estimation problem that directly uses highly undersampled frequency-space measurements and is solved numerically with a Reduced-Order Unscented Kalman Filter (ROUKF). We show that this yields considerably more accurate estimates of boundary-condition parameters in a synthetic aortic blood flow than using velocity measurements reconstructed with compressed sensing. We also compare different sampling patterns, demonstrating how the quality of the parameter estimation depends on this choice. Finally, we confirm these findings on real MRI data from a mechanical phantom.
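To illustrate what acquiring only part of the frequency space means, the toy Python sketch below retrospectively undersamples the 2D Fourier transform of a synthetic image with a random sampling pattern and shows the error of a naive zero-filled reconstruction. The image, mask, and sampling fraction are made up, and the ROUKF-based parameter estimation itself is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(8)
img = np.zeros((128, 128)); img[40:90, 50:80] = 1.0     # toy "flow" image

k = np.fft.fftshift(np.fft.fft2(img))                   # full frequency space
mask = rng.random(k.shape) < 0.25                       # random sampling pattern, 25%
mask[56:72, :] = True                                   # always keep low frequencies
recon = np.abs(np.fft.ifft2(np.fft.ifftshift(k * mask)))  # naive zero-filled reconstruction

err = np.linalg.norm(recon - img) / np.linalg.norm(img)
print(f"relative reconstruction error: {err:.2f}")
```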
Brachytherapy involves bringing a radioactive source close to tumor tissue using implanted needles. Image-guided brachytherapy planning requires, among other things, reconstruction of these needles. Manually annotating the needles on patient images is a challenging and time-consuming task for medical professionals. For automatic needle reconstruction, a two-stage pipeline is commonly adopted, comprising a segmentation stage followed by a post-processing stage. While deep learning models are effective for segmentation, their results often contain errors, and no existing post-processing technique is robust to all possible segmentation errors. We therefore propose adaptations to existing post-processing techniques, aimed mainly at handling segmentation errors and thereby improving reconstruction accuracy. Experiments on a prostate cancer dataset of MRI scans annotated by medical professionals demonstrate that our proposed adaptations can effectively manage segmentation errors: the best adapted post-processing technique achieves median needle-tip and needle-bottom point localization errors of 1.07 (IQR ±1.04) mm and 0.43 (IQR ±0.46) mm, respectively, and a median shaft error of 0.75 (IQR ±0.69) mm, with zero false positive and zero false negative needles on a test set of 261 needles.
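For reference, tip, bottom, and a simple shaft localization error between a reconstructed and a reference needle polyline can be computed as in the Python sketch below; the paper's exact error definitions may differ, and the assumption that the first point of each polyline is the tip is ours.

```python
import numpy as np

def needle_errors(recon, ref):
    """Tip, bottom, and a simple shaft error (mean distance from reconstructed points
    to the closest reference point) between two needle polylines of shape (k, 3)."""
    recon, ref = np.asarray(recon, float), np.asarray(ref, float)
    tip_err = np.linalg.norm(recon[0] - ref[0])       # assumes first point is the tip
    bottom_err = np.linalg.norm(recon[-1] - ref[-1])
    d = np.linalg.norm(recon[:, None, :] - ref[None, :, :], axis=-1)
    shaft_err = d.min(axis=1).mean()                  # mean closest-point distance
    return tip_err, bottom_err, shaft_err
```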
Magnetic resonance (MR) imaging is essential for evaluating central nervous system (CNS) tumors, guiding surgical planning and treatment decisions, and assessing postoperative outcomes and complication risks. While recent work has advanced automated tumor segmentation and report generation, most efforts have focused on preoperative data, with limited attention to postoperative imaging analysis. This study introduces a comprehensive pipeline for standardized postsurgical reporting in CNS tumors. Using the Attention U-Net architecture, segmentation models were trained for the preoperative (non-enhancing) tumor core, postoperative contrast-enhancing residual tumor, and resection cavity. Additionally, MR sequence classification and tumor type identification for contrast-enhancing lesions were explored using the DenseNet architecture. The models were integrated into a reporting pipeline following the RANO 2.0 guidelines. Training was conducted on multicentric datasets comprising 2,000 to 7,000 patients, using 5-fold cross-validation. Evaluation included patient-, voxel-, and object-wise metrics, with benchmarking against the latest BraTS challenge results. The segmentation models achieved average voxel-wise Dice scores of 87%, 66%, 70%, and 77% for the tumor core, non-enhancing tumor core, contrast-enhancing residual tumor, and resection cavity, respectively. The classification models reached 99.5% balanced accuracy for MR sequence classification and 80% for tumor type classification. The presented pipeline enables robust, automated segmentation, MR sequence classification, and standardized report generation aligned with the RANO 2.0 guidelines, enhancing postoperative evaluation and clinical decision-making. The proposed models and methods were integrated into Raidionics, an open-source software platform for CNS tumor analysis, which now includes a dedicated module for postsurgical analysis.
The high-dimensional nature of genomics data complicates feature selection, in particular in low-sample-size studies, which are not uncommon in clinical prediction settings. It is widely recognized that complementary information on the features, 'co-data', may improve results; examples are prior feature groups or p-values from a related study. Such co-data are ubiquitous in genomics settings due to the availability of public repositories, yet the uptake of learning methods that structurally use such co-data is limited. We review guided adaptive shrinkage methods: a class of regression-based learners that use co-data to adapt the shrinkage parameters, which are crucial for the performance of those learners. We discuss technical aspects, but also applicability in terms of the types of co-data that can be handled. This class of methods is contrasted with several others; in particular, group-adaptive shrinkage is compared with the better-known sparse group-lasso by evaluating feature selection. Finally, we demonstrate the versatility of the guided shrinkage methodology by showing how to 'do it yourself': we integrate implementations of a co-data learner and the spike-and-slab prior for the purpose of improving feature selection in genetics studies.