Welcome to our research page featuring recent publications in biostatistics and epidemiology! These fields play a crucial role in understanding the causes, prevention, and treatment of a wide range of health conditions, and our team advances them through innovative studies and cutting-edge statistical analyses. On this page you will find our collection of research publications describing the development of new statistical methods and their application to real-world data. Please feel free to contact us with any questions or comments.
Predicting future outcomes of patients is essential to clinical practice, with many prediction models published each year. Empirical evidence suggests that published studies often have severe methodological limitations, which undermine their usefulness. This article presents a step-by-step guide to help researchers develop and evaluate a clinical prediction model. The guide covers best practices in defining the aim and users, selecting data sources, addressing missing data, exploring alternative modelling options, and assessing model performance. The steps are illustrated using an example from relapsing-remitting multiple sclerosis. Comprehensive R code is also provided.
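To give a flavour of what such a workflow involves, here is a minimal sketch (not the article's own R code) of fitting a logistic prediction model and computing its apparent c-statistic and calibration slope; the simulated data and variable names (relapse, age, edss, prior_relapses) are purely illustrative.

```r
# Minimal, illustrative sketch: develop a logistic prediction model and compute
# apparent discrimination and calibration. Simulated data; not the article's code.
set.seed(1)
n   <- 500
dev <- data.frame(age = rnorm(n, 40, 10),
                  edss = runif(n, 0, 6),
                  prior_relapses = rpois(n, 1))
dev$relapse <- rbinom(n, 1, plogis(-2 + 0.03 * dev$age + 0.3 * dev$edss))

# Develop the model on the development data
fit   <- glm(relapse ~ age + edss + prior_relapses, family = binomial, data = dev)
dev$p <- predict(fit, type = "response")

# Apparent c-statistic (probability that a case receives a higher prediction than a non-case)
pairs  <- outer(dev$p[dev$relapse == 1], dev$p[dev$relapse == 0], "-")
c_stat <- mean(pairs > 0) + 0.5 * mean(pairs == 0)

# Apparent calibration slope (equals 1 by construction in the development data;
# internal validation, e.g. bootstrapping, is needed to estimate optimism)
cal_slope <- coef(glm(relapse ~ qlogis(p), family = binomial, data = dev))[2]
c(c_statistic = c_stat, calibration_slope = cal_slope)
```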
In recent years, there has been growing interest in the prediction of individualized treatment effects. While the literature on developing such models is growing rapidly, far less has been written on evaluating their performance. In this paper, we aim to facilitate the validation of prediction models for individualized treatment effects. The estimands of interest are defined based on the potential outcomes framework, which facilitates a comparison of existing and novel measures. In particular, we examine existing measures of discrimination for benefit (variations of the c-for-benefit), and propose model-based extensions to the treatment effect setting for discrimination and calibration metrics that have a strong basis in outcome risk prediction. The main focus is on randomized trial data with binary endpoints and on models that provide individualized treatment effect predictions and potential outcome predictions. We use simulated data to provide insight into the characteristics of the discrimination and calibration statistics under consideration, and further illustrate all methods in a trial of acute ischemic stroke treatment. The results show that the proposed model-based statistics had the best characteristics in terms of bias and accuracy. While resampling methods adjusted for the optimism of performance estimates in the development data, their high variance across replications limited their accuracy. Therefore, individualized treatment effect models are best validated in independent data. To facilitate uptake, an R implementation of the proposed methods is available.
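To make the objects being validated concrete, the sketch below (an illustration under our own assumptions, not the paper's methods) derives individualized treatment effect predictions as the difference between two predicted potential outcomes from a risk model with treatment-covariate interactions fitted to simulated randomized-trial data; metrics such as the c-for-benefit would then compare these predictions against observed outcomes.

```r
# Illustrative only: individualized treatment effect (ITE) predictions as the
# difference between predicted potential outcomes. Simulated trial data.
set.seed(2)
n <- 1000
d <- data.frame(trt = rbinom(n, 1, 0.5),     # randomized treatment
                x1  = rnorm(n),
                x2  = rnorm(n))
d$y <- rbinom(n, 1, plogis(-1 + 0.8 * d$x1 + 0.4 * d$x2 - 0.6 * d$trt + 0.5 * d$trt * d$x1))

# Outcome risk model with treatment-covariate interactions
fit <- glm(y ~ trt * (x1 + x2), family = binomial, data = d)

# Predicted potential outcomes under treatment and under control, and their difference
p1  <- predict(fit, newdata = transform(d, trt = 1), type = "response")
p0  <- predict(fit, newdata = transform(d, trt = 0), type = "response")
ite <- p0 - p1          # predicted benefit (absolute risk reduction) per individual
summary(ite)
```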
Background: Modelling non-linear associations between an outcome and continuous patient characteristics, whilst investigating heterogeneous treatment effects, is one of the opportunities offered by individual participant data meta-analysis (IPD-MA). Splines offer great flexibility, but guidance is lacking.
Objective: To introduce modelling of non-linear associations using restricted cubic splines (RCS), natural B-splines, P-splines, and smoothing splines in IPD-MA to estimate absolute treatment effects.
Methods: We describe the pooling of spline-based models using pointwise and multivariate meta-analysis (two-stage methods) and one-stage generalised additive mixed-effects models (GAMMs). We illustrate their performance in three IPD-MA scenarios of five studies each: one where only the associations differ across studies, one where only the ranges of the effect modifier differ, and one where both differ. We also evaluate the approaches in an empirical example, modelling the risk of fever and/or ear pain in children with acute otitis media conditional on age.
Results: In the first scenario, all pooling methods showed similar results. In the second and third scenarios, pointwise meta-analysis was flexible but produced non-smooth curves with wide confidence intervals; multivariate meta-analysis failed to converge with RCS but was efficient with natural B-splines. GAMMs produced smooth pooled regression curves in all settings. In the empirical example, results were similar to the second and third scenarios, except that multivariate meta-analysis with RCS now converged.
Conclusion: We provide guidance on the use of splines in IPD-MA to capture heterogeneous treatment effects in the presence of non-linear associations, thereby facilitating estimation of absolute treatment effects to enhance personalized healthcare.
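As a rough illustration of the one-stage approach only (assuming the mgcv package; the simulated data, variable names, and model specification below are ours, not the paper's), a GAMM with arm-specific smooths for age and a random study intercept could look like this:

```r
# Illustrative one-stage GAMM: smooth, arm-specific age effects with random
# study intercepts. Simulated IPD; assumes the mgcv package.
library(mgcv)
set.seed(3)
n <- 1500
d <- data.frame(study = factor(sample(1:5, n, replace = TRUE)),
                age   = runif(n, 0, 10),
                trt   = factor(rbinom(n, 1, 0.5)))
d$fever <- rbinom(n, 1, plogis(-1 + sin(d$age / 2) - 0.5 * (d$trt == 1) +
                               0.3 * (d$trt == 1) * (d$age > 5)))

fit <- gam(fever ~ trt + s(age, by = trt) + s(study, bs = "re"),
           family = binomial, method = "REML", data = d)

# Pooled absolute risk curves per arm, setting the study random effect to zero
newd <- expand.grid(age   = seq(0, 10, 0.5),
                    trt   = factor(0:1),
                    study = factor(1, levels = levels(d$study)))
newd$risk <- predict(fit, newdata = newd, type = "response", exclude = "s(study)")
```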
Hernán, using a hypothetical example, argues that policies preventing researchers from conducting underpowered observational studies using existing databases are misguided, explaining that "[w]hen a causal question is important, it is preferable to have multiple studies with imprecise estimates than having no study at all." While we do not disagree with the sentiment expressed, caution is warranted. Small observational studies are a major cause of distrust in science, mainly because their results are often selectively reported. The hypothetical example used to justify Hernán's position is too simplistic and overly optimistic. In this short response, we reconsider Hernán's hypothetical example and offer a list of other factors - beyond simply the importance of the question - that are relevant when deciding whether or not to pursue underpowered research.
In prediction model research, external validation is needed to examine an existing model's performance using data independent of those used for model development. Current external validation studies often suffer from small sample sizes and, consequently, imprecise estimates of predictive performance. To address this, we propose how to determine the minimum sample size needed for a new external validation study of a prediction model for a binary outcome. Our calculations aim to precisely estimate calibration (Observed/Expected ratio and calibration slope), discrimination (C-statistic), and clinical utility (net benefit). For each measure, we propose closed-form and iterative solutions for calculating the minimum sample size required. These require specifying: (i) target SEs (confidence interval widths) for each estimate of interest, (ii) the anticipated outcome event proportion in the validation population, (iii) the prediction model's anticipated (mis)calibration and the variance of linear predictor values in the validation population, and (iv) potential risk thresholds for clinical decision-making. The calculations can also be used to assess whether the sample size of an existing (already collected) dataset is adequate for external validation. We illustrate our proposal for external validation of a prediction model for mechanical heart valve failure with an expected outcome event proportion of 0.018. Calculations suggest at least 9835 participants (177 events) are required to precisely estimate the calibration and discrimination measures, with this number driven by the calibration slope criterion, which we anticipate will often be the case. Also, 6443 participants (116 events) are required to precisely estimate net benefit at a risk threshold of 8%. Software code is provided.
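For intuition about the form these calculations take, here is a rough sketch of a closed-form calculation for the O/E criterion, based on the standard large-sample approximation var(ln(O/E)) ≈ (1 − φ)/(nφ); the approximation, the target CI width, and the numbers below are our own assumptions for illustration, not the paper's worked example.

```r
# Illustrative closed-form sample size for precisely estimating O/E, assuming
# var(ln(O/E)) ~ (1 - phi) / (n * phi). The targets below are assumptions.
n_for_oe <- function(phi, ci_width_lnOE) {
  se_target <- ci_width_lnOE / (2 * qnorm(0.975))   # target SE of ln(O/E)
  ceiling((1 - phi) / (phi * se_target^2))
}

# Example: anticipated outcome proportion 0.018, target 95% CI width of 0.5 on the ln(O/E) scale
n_for_oe(phi = 0.018, ci_width_lnOE = 0.5)
```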
Prediction models often yield inaccurate predictions for new individuals. Large data sets from pooled studies or electronic healthcare records may alleviate this through increased sample size and variability in sample characteristics. However, existing strategies for prediction model development generally do not account for heterogeneity in predictor-outcome associations between different settings and populations. This limits the generalizability of developed models (even from large, combined, clustered data sets) and necessitates local revisions. We aim to develop methodology for producing prediction models that require less tailoring to different settings and populations. We adopt internal-external cross-validation to assess and reduce heterogeneity in models' predictive performance during model development. We propose a predictor selection algorithm that optimizes the (weighted) average performance while minimizing its variability across the hold-out clusters (or studies). Predictors are added iteratively until the estimated generalizability is optimized. We illustrate this by developing a model for predicting the risk of atrial fibrillation and by updating an existing one for diagnosing deep vein thrombosis, using individual participant data from 20 cohorts (N = 10 873) and 11 diagnostic studies (N = 10 014), respectively. Meta-analysis of calibration and discrimination performance in each hold-out cluster shows that trade-offs occurred between the average performance and its heterogeneity. Our methodology enables the assessment of heterogeneity of prediction model performance during model development in multiple or clustered data sets, thereby informing researchers on predictor selection to improve generalizability to different settings and populations and to reduce the need for model tailoring. Our methodology has been implemented in the R package metamisc.
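A bare-bones sketch of the internal-external cross-validation loop itself (leaving out the paper's iterative predictor selection, and not using metamisc) might look as follows; the simulated clustered data, the fixed predictor set, and the rank-based c-statistic helper are illustrative assumptions.

```r
# Illustrative IECV loop: leave one cluster out, fit on the rest, evaluate on the
# held-out cluster; summarize average performance and its spread. Simulated data.
set.seed(4)
n <- 3000
d <- data.frame(cluster = sample(1:6, n, replace = TRUE),
                x1 = rnorm(n), x2 = rnorm(n))
d$y <- rbinom(n, 1, plogis(-1 + 0.7 * d$x1 + 0.3 * d$x2 + 0.2 * (d$cluster == 1)))

cstat <- function(p, y) {                    # rank-based c-statistic
  r <- rank(p)
  (mean(r[y == 1]) - (sum(y) + 1) / 2) / sum(y == 0)
}

iecv <- sapply(sort(unique(d$cluster)), function(k) {
  fit  <- glm(y ~ x1 + x2, family = binomial, data = subset(d, cluster != k))
  test <- subset(d, cluster == k)
  cstat(predict(fit, newdata = test, type = "response"), test$y)
})
c(mean_c = mean(iecv), sd_c = sd(iecv))      # average performance and its heterogeneity
```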
Individual participant data (IPD) from multiple sources allows external validation of a prognostic model across multiple populations. Often this reveals poor calibration, potentially causing poor predictive performance in some populations. However, rather than discarding the model outright, it may be possible to modify it to improve performance using recalibration techniques. We use IPD meta-analysis to identify the simplest method to achieve good model performance. We examine four options for recalibrating an existing time-to-event model across multiple populations: (i) shifting the baseline hazard by a constant, (ii) re-estimating the shape of the baseline hazard, (iii) adjusting the prognostic index as a whole, and (iv) adjusting individual predictor effects. For each strategy, IPD meta-analysis examines (heterogeneity in) model performance across populations. Additionally, the probability of achieving good performance in a new population can be calculated, allowing the recalibration methods to be ranked. In an applied example, IPD meta-analysis reveals that the existing model had poor calibration in some populations, and large heterogeneity across populations. However, re-estimation of the intercept substantially improved the expected calibration in new populations and reduced between-population heterogeneity. Comparing recalibration strategies showed that re-estimating both the magnitude and shape of the baseline hazard gave the highest predicted probability of good performance in a new population. In conclusion, IPD meta-analysis allows a prognostic model to be externally validated in multiple settings, and enables recalibration strategies to be compared and ranked to decide on the least aggressive strategy that achieves acceptable external performance without discarding existing model information.
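As an illustration of what two of these recalibration strategies look like in a single external cohort (assuming the survival package; the simulated data and the existing model's linear predictor lp are hypothetical), one could fit the following; in an IPD meta-analysis the same steps would be repeated per population and the estimates pooled.

```r
# Illustrative sketch of two recalibration strategies in one external cohort.
# Simulated data; assumes the survival package. Not the paper's code.
library(survival)
set.seed(5)
n   <- 800
ext <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
ext$lp     <- 0.5 * ext$x1 + 0.3 * ext$x2          # existing model's prognostic index
t_true     <- rexp(n, rate = 0.1 * exp(0.4 * ext$x1))
ext$status <- as.integer(t_true < 5)               # administrative censoring at 5 years
ext$time   <- pmin(t_true, 5)

# (iii) adjust the prognostic index as a whole: its coefficient is the calibration slope
slope_fit <- coxph(Surv(time, status) ~ lp, data = ext)
coef(slope_fit)

# (iv) re-estimate individual predictor effects in the external data
refit <- coxph(Surv(time, status) ~ x1 + x2, data = ext)
coef(refit)
```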
Aims: Use of prediction models is widely recommended by clinical guidelines, but usually requires complete information on all predictors, which is not always available in daily practice. We aim to describe two methods for real-time handling of missing predictor values when using prediction models in practice.
Methods and results: We compare the widely used method of mean imputation (M-imp) to a method that personalizes the imputations by taking advantage of the observed patient characteristics. These characteristics may include both prediction model variables and other characteristics (auxiliary variables). The method was implemented using imputation from a joint multivariate normal model of the patient characteristics (joint modelling imputation; JMI). Data from two different cardiovascular cohorts with cardiovascular predictors and outcome were used to evaluate the real-time imputation methods. We quantified the prediction model's overall performance [mean squared error (MSE) of linear predictor], discrimination (c-index), calibration (intercept and slope), and net benefit (decision curve analysis). When compared with mean imputation, JMI substantially improved the MSE (0.10 vs. 0.13), c-index (0.70 vs. 0.68), and calibration (calibration-in-the-large: 0.04 vs. 0.06; calibration slope: 1.01 vs. 0.92), especially when incorporating auxiliary variables. When the imputation method was based on an external cohort, calibration deteriorated, but discrimination remained similar.
Conclusions: We recommend JMI with auxiliary variables for real-time imputation of missing values, and to update imputation models when implementing them in new settings or (sub)populations.
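A minimal sketch of the idea behind joint modelling imputation follows; as a simplification, the missing value is replaced by its conditional mean under a multivariate normal model estimated in a development cohort (the actual method may instead draw from the conditional distribution), and all variable names and data are hypothetical.

```r
# Illustrative real-time imputation from a joint multivariate normal model:
# estimate the mean vector and covariance in a development cohort, then impute a
# new patient's missing predictor from its conditional distribution (conditional
# mean shown here as a simplification). Hypothetical variables and data.
set.seed(6)
dev <- data.frame(sbp  = rnorm(1000, 140, 20),
                  chol = rnorm(1000, 5.5, 1),
                  age  = rnorm(1000, 60, 10))
dev$chol <- dev$chol + 0.01 * (dev$sbp - 140)      # induce some correlation

mu  <- colMeans(dev)
Sig <- cov(dev)

impute_conditional <- function(x_new, mu, Sig, miss) {
  obs <- setdiff(names(mu), miss)
  mu[miss] + Sig[miss, obs, drop = FALSE] %*%
    solve(Sig[obs, obs], x_new[obs] - mu[obs])
}

# New patient with cholesterol missing but blood pressure and age observed
new_patient <- c(sbp = 170, chol = NA, age = 72)
impute_conditional(new_patient, mu, Sig, miss = "chol")
```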
This commentary describes the creation of the Zika Virus Individual Participant Data Consortium, a global collaboration to address outstanding questions in Zika virus (ZIKV) epidemiology through conducting an individual participant data meta-analysis (IPD-MA). The aims of the IPD-MA are to (1) estimate the absolute and relative risks of miscarriage, fetal loss, and short- and long-term sequelae of fetal exposure; (2) identify and quantify the relative importance of different sources of heterogeneity (e.g., immune profiles, concurrent flavivirus infection) for the risk of adverse fetal, infant, and child outcomes among infants exposed to ZIKV in utero; and (3) develop and validate a prognostic model for the early identification of high risk pregnancies and inform communication between health care providers and their patients and public health interventions (e.g., vector control strategies, antenatal care, and family planning programs). By leveraging data from a diversity of populations across the world, the IPD-MA will provide a more precise estimate of the risk of adverse ZIKV-related outcomes within clinically relevant subgroups and a quantitative assessment of the generalizability of these estimates across populations and settings. The ZIKV IPD Consortium effort is indicative of the growing recognition that data sharing is a central component of global health security and outbreak response.
Background: Clinical trials have traditionally been analysed at the aggregate level, assuming that the group average would be applicable to all eligible and similar patients. We re-analysed a mega-trial of antidepressant therapy for major depression to explore whether a multivariable prediction model may lead to different treatment recommendations for individual participants.
Methods: The trial compared the second-line treatment strategies of continuing sertraline, combining it with mirtazapine, or switching to mirtazapine after initial failure to remit on sertraline among 1,544 patients with major depression. The outcome was the Patient Health Questionnaire-9 (PHQ-9) at week 9: the original analyses showed that both combining and switching resulted in a 1.0-point greater reduction in PHQ-9 than continuing. We considered several models of penalized regression or machine learning.
Results: Models using support vector machines (SVMs) provided the best performance. Using SVMs, continuing sertraline was predicted to be the best treatment for 123 patients, combining for 696 patients, and switching for 725 patients. In the last two subgroups, both combining and switching were equally superior to continuing by 1.2 to 1.4 points, resulting in the same treatment recommendations as the original aggregate-level analyses; in the first subgroup, however, switching was substantially inferior to combining (-3.1, 95% CI: -5.4 to -0.5).
Limitations: Stronger predictors are needed to make more precise predictions.
Conclusions: The multivariable prediction models led to improved treatment recommendations for a minority of participants compared with the group-average approach in this mega-trial.
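To convey the counterfactual-prediction idea in miniature (not a reproduction of the trial analysis; assumes the e1071 package, and the three-arm data below are simulated), an SVM regression can be used to predict each patient's week-9 score under every strategy and recommend the one with the best predicted outcome.

```r
# Illustrative sketch: predict each patient's outcome under every treatment arm
# with an SVM and recommend the arm with the lowest predicted PHQ-9 score.
# Simulated data; assumes the e1071 package. Not the trial's analysis code.
library(e1071)
set.seed(7)
n    <- 900
arms <- c("continue", "combine", "switch")
d <- data.frame(arm = factor(sample(arms, n, replace = TRUE), levels = arms),
                baseline_phq9 = rnorm(n, 15, 4),
                age = rnorm(n, 40, 12))
d$phq9_wk9 <- with(d, baseline_phq9 - 4 - 1 * (arm != "continue") + rnorm(n, 0, 3))

# SVM regression of the week-9 score on treatment arm and baseline covariates
fit <- svm(phq9_wk9 ~ arm + baseline_phq9 + age, data = d)

# Counterfactual predictions under each arm; recommend the arm with the lowest score
pred <- sapply(arms, function(a)
  predict(fit, newdata = transform(d, arm = factor(a, levels = arms))))
table(recommended = arms[max.col(-pred)])
```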