Smart Data Analysis and Statistics

Publications

Filter

Topic

History

Showing 10 of 21 publications

Measuring the performance of survival models to personalize treatment choices

Various statistical and machine learning algorithms can be used to predict treatment effects at the patient level using data from randomized clinical trials (RCTs). Such predictions can facilitate individualized treatment decisions. Recently, a range of methods and metrics were developed for assessing the accuracy of such predictions. Here, we extend these methods, focusing on the case of survival (time-to-event) outcomes. We start by providing alternative definitions of the participant-level treatment benefit; subsequently, we summarize existing and propose new measures for assessing the performance of models estimating participant-level treatment benefits. We explore metrics assessing discrimination and calibration for benefit and decision accuracy. These measures can be used to assess the performance of statistical as well as machine learning models and can be useful during model development (i.e., for model selection or for internal validation) or when testing a model in new settings (i.e., in an external validation). We illustrate methods using simulated data and real data from the OPERAM trial, an RCT in multimorbid older people, which randomized participants to either standard care or a pharmacotherapy optimization intervention. We provide R codes for implementing all models and measures.

Journal: Stat Med |

Year: 2025

The potential benefit of statin prescription based on prediction of treatment responsiveness in older individuals: An application to the PROSPER randomised controlled trial

Aims: Clinical guidelines often recommend treating individuals based on their cardiovascular risk. We revisit this paradigm and quantify the efficacy of three treatment strategies: (i) overall prescription, i.e. treatment to all individuals sharing the eligibility criteria of a trial; (ii) risk-stratified prescription, i.e. treatment only to those at an elevated outcome risk; and (iii) prescription based on predicted treatment responsiveness.

Methods and results: We reanalysed the PROSPER randomized controlled trial, which included individuals aged 70–82 years with a history of, or risk factors for, vascular diseases. We conducted the derivation and internal–external validation of a model predicting treatment responsiveness. We compared with placebo (n = 2913): (i) pravastatin (n = 2891); (ii) pravastatin in the presence of previous vascular diseases and placebo in the absence thereof (n = 2925); and (iii) pravastatin in the presence of a favourable prediction of treatment response and placebo in the absence thereof (n = 2890). We found an absolute difference in primary outcome events composed of coronary death, non-fatal myocardial infarction, and fatal or non-fatal stroke, per 10 000 person-years equal to: −78 events (95% CI, −144 to −12) when prescribing pravastatin to all participants; −66 events (95% CI, −114 to −18) when treating only individuals with an elevated vascular risk; and −103 events (95% CI, −162 to −44) when restricting pravastatin to individuals with a favourable prediction of treatment response.

Conclusion: Pravastatin prescription based on predicted responsiveness may have an encouraging potential for cardiovascular prevention. Further external validation of our results and clinical experiments are needed.

Journal: Eur J Prev Cardiol |

Year: 2023

Developing clinical prediction models: a step-by-step guide

Predicting future outcomes of patients is essential to clinical practice, with many prediction models published each year. Empirical evidence suggests that published studies often have severe methodological limitations, which undermine their usefulness. This article presents a step-by-step guide to help researchers develop and evaluate a clinical prediction model. The guide covers best practices in defining the aim and users, selecting data sources, addressing missing data, exploring alternative modelling options, and assessing model performance. The steps are illustrated using an example from relapsing-remitting multiple sclerosis. Comprehensive R code is also provided.

Journal: BMJ |

Year: 2024

Evaluating individualized treatment effect predictions: A model‐based perspective on discrimination and calibration assessment

In recent years, there has been a growing interest in the prediction of individualized treatment effects. While there is a rapidly growing literature on the development of such models, there is little literature on the evaluation of their performance. In this paper, we aim to facilitate the validation of prediction models for individualized treatment effects. The estimands of interest are defined based on the potential outcomes framework, which facilitates a comparison of existing and novel measures. In particular, we examine existing measures of discrimination for benefit (variations of the c-for-benefit), and propose model-based extensions to the treatment effect setting for discrimination and calibration metrics that have a strong basis in outcome risk prediction. The main focus is on randomized trial data with binary endpoints and on models that provide individualized treatment effect predictions and potential outcome predictions. We use simulated data to provide insight into the characteristics of the examined discrimination and calibration statistics under consideration, and further illustrate all methods in a trial of acute ischemic stroke treatment. The results show that the proposed model-based statistics had the best characteristics in terms of bias and accuracy. While resampling methods adjusted for the optimism of performance estimates in the development data, they had a high variance across replications that limited their accuracy. Therefore, individualized treatment effect models are best validated in independent data. To aid implementation, a software implementation of the proposed methods was made available in R.

Journal: Stat Med |

Year: 2024

Predicting personalised absolute treatment effects in individual participant data meta-analysis: an introduction to splines

Background: Modelling non-linear associations between an outcome and continuous patient characteristics, whilst investigating heterogeneous treatment effects, is one of the opportunities offered by individual participant data meta-analysis (IPD-MA). Splines offer great flexibility, but guidance is lacking.

Objective: To introduce modelling of nonlinear associations using restricted cubic splines (RCS), natural B-splines, P-splines, and smoothing splines in IPD-MA to estimate absolute treatment effects.

Methods: We describe the pooling of spline-based models using pointwise and multivariate meta-analysis (two-stage methods) and one-stage generalised additive mixed effects models (GAMMs). We illustrate their performance on three IPD-MA scenarios of five studies each: one where only the associations differ across studies, one where only the ranges of the effect modifier differ and one where both differ. We also evaluated the approaches in an empirical example, modelling the risk of fever and/or ear pain in children with acute otitis media conditional on age.

Results: In the first scenario, all pooling methods showed similar results. In the second and third scenario, pointwise meta-analysis was flexible but showed non-smooth results and wide confidence intervals; multivariate meta-analysis failed to converge with RCS, but was efficient with natural B-splines. GAMMs produced smooth pooled regression curves in all settings. In the empirical example, results were similar to the second and third scenario, except for multivariate meta-analysis with RCS, which now converged.

Conclusion: We provide guidance on the use of splines in IPD-MA, to capture heterogeneous treatment effects in presence of non-linear associations, thereby facilitating estimation of absolute treatment effects to enhance personalized healthcare.

Journal: Res Synth Methods |

Year: 2022

Citation: 4

A few things to consider when deciding whether or not to conduct underpowered research

Hernán, using a hypothetical example, argues that policies that prevent researchers from conducting underpowered observational studies using existing databases are misguided explaining that "[w]hen a causal question is important, it is preferable to have multiple studies with imprecise estimates than having no study at all." While we do not disagree with the sentiment expressed, caution is warranted. Small observational studies are a major cause of distrust in science, mainly because their results are often selectively reported. The hypothetical example used to justify Hernán's position is too simplistic and overly optimistic. In this short response, we reconsider Hernán's hypothetical example and offer a list of other factors - beyond simply the importance of the question - that are relevant when deciding whether or not to pursue underpowered research.

Journal: J Clin Epidemiol |

Year: 2021

Minimum sample size for external validation of a clinical prediction model with a binary outcome

In prediction model research, external validation is needed to examine an exist-ing model's performance using data independent to that for model development. Current external validation studies often suffer from small sample sizes and consequently imprecise predictive performance estimates. To address this, we propose how to determine the minimum sample size needed for a new external validation study of a prediction model for a binary outcome. Our calculations aim to precisely estimate calibration (Observed/Expected and calibration slope),discrimination (C-statistic), and clinical utility (net benefit). For each measure, we propose closed-form and iterative solutions for calculating the minimum sample size required. These require specifying: (i) target SEs (confidence interval widths) for each estimate of interest, (ii) the anticipated outcome event proportion in the validation population, (iii) the prediction model's anticipated (mis)calibration and variance of linear predictor values in the validation population, and (iv) potential risk thresholds for clinical decision-making. The calculations can also be used to inform whether the sample size of an existing (already collected) dataset is adequate for external validation. We illustrate our proposal for external validation of a prediction model for mechanical heart valve failure with an expected outcome event proportion of 0.018. Calculations suggest at least 9835 participants (177 events) are required to precisely estimate thecalibration and discrimination measures, with this number driven by the calibration slope criterion, which we anticipate will often be the case. Also, 6443 participants (116 events) are required to precisely estimate net benefit at a risk threshold of 8%. Software code is provided.

Journal: Stat Med |

Year: 2021

Citation: 110

Developing more generalizable prediction models from pooled studies and large clustered data sets

Prediction models often yield inaccurate predictions for new individuals. Large data sets from pooled studies or electronic healthcare records may alleviate this with an increased sample size and variability in sample characteristics. However, existing strategies for prediction model development generally do not account for heterogeneity in predictor-outcome associations between different settings and populations. This limits the generalizability of developed models (even from large, combined, clustered data sets) and necessitates local revisions. We aim to develop methodology for producing prediction models that require less tailoring to different settings and populations. We adopt internal-external cross-validation to assess and reduce heterogeneity in models' predictive performance during the development. We propose a predictor selection algorithm that optimizes the (weighted) average performance while minimizing its variability across the hold-out clusters (or studies). Predictors are added iteratively until the estimated generalizability is optimized. We illustrate this by developing a model for predicting the risk of atrial fibrillation and updating an existing one for diagnosing deep vein thrombosis, using individual participant data from 20 cohorts (N = 10 873) and 11 diagnostic studies (N = 10 014), respectively. Meta-analysis of calibration and discrimination performance in each hold-out cluster shows that trade-offs between average and heterogeneity of performance occurred. Our methodology enables the assessment of heterogeneity of prediction model performance during model development in multiple or clustered data sets, thereby informing researchers on predictor selection to improve the generalizability to different settings and populations, and reduce the need for model tailoring. Our methodology has been implemented in the R package metamisc.

Journal: Stat Med |

Year: 2021

Citation: 17

Individual participant data meta-analysis for external validation, recalibration and updating of a flexible parametric prognostic model

Individual participant data (IPD) from multiple sources allows external validation of a prognostic model across multiple populations. Often this reveals poor calibration, potentially causing poor predictive performance in some populations. However, rather than discarding the model outright, it may be possible to modify the model to improve performance using recalibration techniques. We use IPD meta-analysis to identify the simplest method to achieve good model performance. We examine four options for recalibrating an existing time-to-event model across multiple populations: (i) shifting the baseline hazard by a constant, (ii) re-estimating the shape of the baseline hazard, (iii) adjusting the prognostic index as a whole, and (iv) adjusting individual predictor effects. For each strategy, IPD meta-analysis examines (heterogeneity in) model performance across populations. Additionally, the probability of achieving good performance in a new population can be calculated allowing ranking of recalibration methods. In an applied example, IPD meta-analysis reveals that the existing model had poor calibration in some populations, and large heterogeneity across populations. However, re-estimation of the intercept substantially improved the expected calibration in new populations, and reduced between-population heterogeneity. Comparing recalibration strategies showed that re-estimating both the magnitude and shape of the baseline hazard gave the highest predicted probability of good performance in a new population. In conclusion, IPD meta-analysis allows a prognostic model to be externally validated in multiple settings, and enables recalibration strategies to be compared and ranked to decide on the least aggressive recalibration strategy to achieve acceptable external model performance without discarding existing model information.

Journal: Stat Med |

Year: 2021

Citation: 9

Real-time imputation of missing predictor values in clinical practice

Aims: Use of prediction models is widely recommended by clinical guidelines, but usually requires complete information on all predictors, which is not always available in daily practice. We aim to describe two methods for real-time handling of missing predictor values when using prediction models in practice.

Methods and results: We compare the widely used method of mean imputation (M-imp) to a method that personalizes the imputations by taking advantage of the observed patient characteristics. These characteristics may include both prediction model variables and other characteristics (auxiliary variables). The method was implemented using imputation from a joint multivariate normal model of the patient characteristics (joint modelling imputation; JMI). Data from two different cardiovascular cohorts with cardiovascular predictors and outcome were used to evaluate the real-time imputation methods. We quantified the prediction model's overall performance [mean squared error (MSE) of linear predictor], discrimination (c-index), calibration (intercept and slope), and net benefit (decision curve analysis). When compared with mean imputation, JMI substantially improved the MSE (0.10 vs. 0.13), c-index (0.70 vs. 0.68), and calibration (calibration-in-the-large: 0.04 vs. 0.06; calibration slope: 1.01 vs. 0.92), especially when incorporating auxiliary variables. When the imputation method was based on an external cohort, calibration deteriorated, but discrimination remained similar.

Conclusions: We recommend JMI with auxiliary variables for real-time imputation of missing values, and to update imputation models when implementing them in new settings or (sub)populations.

Journal: EHJ Digital Health |

Year: 2020

Citation: 7

Books

Systematic Reviews in Health Research

Individual Participant Data Meta-Analysis

Handbook of Meta-Analysis

Prognosis Research in Healthcare