Smart Data Analysis and Statistics

Publications

Applying the Principal Stratum Strategy in Equivalence Trials: A Case Study

The estimand framework, introduced in the ICH E9 (R1) Addendum, provides a structured approach for defining precise research questions in randomised clinical trials. It describes five strategies for addressing intercurrent events (ICEs). This case study examines the principal stratum strategy, highlighting its potential for estimating causal treatment effects in specific subpopulations and the challenges involved. The occurrence of anti-drug antibodies (ADAs) and their potential clinical impact are important factors in evaluating biosimilars. Typically, analyses focus on the subgroup of patients who develop ADAs during the study. However, subgroup analyses based on post-randomisation variables such as immunogenicity can introduce substantial bias into treatment effect estimates and are therefore methodologically suboptimal. The principal stratum strategy provides a statistical pathway for estimating treatment effects in subpopulations that cannot be identified at baseline. Because it is defined through counterfactual outcomes with and without the occurrence of ICEs, this approach can be implemented from a missing data perspective. We demonstrate the implementation of the principal stratum strategy in a phase 3 equivalence trial of a biosimilar for the treatment of rheumatoid arthritis. Using a multiple imputation approach, we leverage longitudinal measurements to create analysis datasets for the subpopulation defined by the development of ADAs as the ICE. Our results highlight the principal stratum strategy's potential and challenges, emphasising its reliance on unobserved ICE states and the need for complex and rigorous modelling. This study contributes to a nuanced understanding and practical implementation of the principal stratum strategy within the ICH E9 (R1) framework.
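A minimal sketch of the per-imputation step, in Python with NumPy. It assumes that each completed dataset (after multiple imputation of the unobserved ICE states) stores, for every patient, the randomised arm, a hypothetical 0/1 column `in_stratum` marking membership of the principal stratum of interest (for example, patients who would develop ADAs), and a continuous outcome. The field names and the simple difference-in-means estimator are illustrative assumptions, not the estimator used in the paper. The per-imputation estimates would then be combined across imputations (for example with Rubin's rules) and the pooled confidence interval compared with the pre-specified equivalence margin.

```python
import numpy as np


def stratum_effect(arm, in_stratum, outcome):
    """Difference in mean outcome (arm 1 minus arm 0) among patients in the
    principal stratum, with its large-sample (unequal-variance) variance."""
    arm = np.asarray(arm)
    keep = np.asarray(in_stratum, dtype=bool)  # imputed stratum membership
    y = np.asarray(outcome, dtype=float)
    y1 = y[keep & (arm == 1)]
    y0 = y[keep & (arm == 0)]
    if y1.size < 2 or y0.size < 2:
        raise ValueError("need at least two stratum members per arm")
    est = y1.mean() - y0.mean()
    var = y1.var(ddof=1) / y1.size + y0.var(ddof=1) / y0.size
    return est, var


def per_imputation_effects(completed_datasets):
    """Apply stratum_effect to each completed dataset (hypothetical dict layout)."""
    return [
        stratum_effect(d["arm"], d["in_stratum"], d["outcome"])
        for d in completed_datasets
    ]
```

Because stratum membership is itself imputed, the set of patients analysed can differ between completed datasets; this is one source of the between-imputation variability that the pooling step has to reflect.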

Journal: Pharmaceutical Statistics

Year: 2025

The use of imputation in clinical decision support systems: a cardiovascular risk management pilot vignette study among clinicians

Introduction: A major challenge in using prediction models in clinical care is missing data. Real-time imputation may alleviate this, but the extent to which clinicians accept this solution remains unknown. We aimed to assess acceptance of real-time imputation of missing patient data in a clinical decision support system (CDSS) that estimates the individual patient's 10-year absolute cardiovascular risk.

Methods: We performed a vignette study, extending an existing CDSS with the real-time imputation method Joint Modelling Imputation (JMI). Seventeen clinicians used the CDSS with three vignettes describing potential use cases (missing data, no risk estimate; imputed values, risk estimate based on imputed data; complete information). In each vignette, missing data were introduced to mimic situations that could occur in clinical practice. End-user acceptance was assessed on three axes: clinical realism, comfortableness and added clinical value.

Results: Overall, the imputed predictor values were found to be clinically reasonable and in line with expectations. However, for binary variables, expressing uncertainty on a probability scale was deemed inconvenient. Perceived comfortableness with imputed risk predictions was low, and confidence intervals were deemed too wide for reliable decision making. Clinicians acknowledged the added value of JMI in clinical practice when used for educational, research or informative purposes.

Conclusion: Handling missing data in a CDSS via JMI is useful, but more accurate imputations are needed before clinicians are comfortable using it in routine care. Only then can a CDSS create clinical value by improving decision making.

Journal: EHJ Digital Health

Year: 2024

Dealing with missing data using the Heckman selection model: methods primer for epidemiologists

Journal: Int. J. Epidemiol.

Year: 2023

Citation: 1

Predicting personalised absolute treatment effects in individual participant data meta-analysis: an introduction to splines

Background: Modelling non-linear associations between an outcome and continuous patient characteristics, whilst investigating heterogeneous treatment effects, is one of the opportunities offered by individual participant data meta-analysis (IPD-MA). Splines offer great flexibility, but guidance is lacking.

Objective: To introduce modelling of non-linear associations using restricted cubic splines (RCS), natural B-splines, P-splines and smoothing splines in IPD-MA to estimate absolute treatment effects.

Methods: We describe the pooling of spline-based models using pointwise and multivariate meta-analysis (two-stage methods) and one-stage generalised additive mixed effects models (GAMMs). We illustrate their performance in three IPD-MA scenarios of five studies each: one where only the associations differ across studies, one where only the ranges of the effect modifier differ, and one where both differ. We also evaluate the approaches in an empirical example, modelling the risk of fever and/or ear pain in children with acute otitis media conditional on age.

Results: In the first scenario, all pooling methods showed similar results. In the second and third scenarios, pointwise meta-analysis was flexible but showed non-smooth results and wide confidence intervals; multivariate meta-analysis failed to converge with RCS but was efficient with natural B-splines. GAMMs produced smooth pooled regression curves in all settings. In the empirical example, results were similar to those of the second and third scenarios, except for multivariate meta-analysis with RCS, which now converged.

Conclusion: We provide guidance on the use of splines in IPD-MA to capture heterogeneous treatment effects in the presence of non-linear associations, thereby facilitating estimation of absolute treatment effects to enhance personalised healthcare.
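For readers new to restricted cubic splines, the sketch below builds an RCS basis in the common unscaled Harrell parameterisation with NumPy: the fitted curve is cubic between knots and constrained to be linear below the first and beyond the last knot. The knot positions are an assumption (quantiles of the effect modifier, such as age, are a frequent default), and this is not the code used in the paper. The resulting columns can be used as covariates, and interacted with treatment, in a study-level or one-stage regression model.

```python
import numpy as np


def rcs_basis(x, knots):
    """Restricted cubic spline basis (Harrell's unscaled parameterisation).

    Returns an (n, k-1) matrix: the linear term x followed by k-2 non-linear
    terms. Any linear combination is cubic between knots and linear outside.
    """
    x = np.asarray(x, dtype=float)
    t = np.sort(np.asarray(knots, dtype=float))
    k = t.size
    if k < 3:
        raise ValueError("a restricted cubic spline needs at least 3 knots")

    def pos_cube(u):
        # truncated power function (u)_+^3
        return np.where(u > 0.0, u, 0.0) ** 3

    d = t[-1] - t[-2]
    cols = [x]
    for j in range(k - 2):
        # the last two terms cancel the cubic and quadratic parts beyond t_k
        cols.append(
            pos_cube(x - t[j])
            - pos_cube(x - t[-2]) * (t[-1] - t[j]) / d
            + pos_cube(x - t[-1]) * (t[-2] - t[j]) / d
        )
    return np.column_stack(cols)


# Illustrative use with hypothetical knots on an age scale (years)
B = rcs_basis(np.linspace(0.5, 12.0, 5), knots=[1.0, 4.0, 8.0])
```

With three knots the basis has two columns (a linear term and one non-linear term), so a model with an intercept spends only two degrees of freedom on the curve.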

Journal: Res Synth Methods

Year: 2022

Citation: 4

Individual participant data meta-analysis for external validation, recalibration and updating of a flexible parametric prognostic model

Individual participant data (IPD) from multiple sources allow external validation of a prognostic model across multiple populations. Often this reveals poor calibration, potentially causing poor predictive performance in some populations. However, rather than discarding the model outright, it may be possible to modify the model to improve performance using recalibration techniques. We use IPD meta-analysis to identify the simplest method to achieve good model performance. We examine four options for recalibrating an existing time-to-event model across multiple populations: (i) shifting the baseline hazard by a constant, (ii) re-estimating the shape of the baseline hazard, (iii) adjusting the prognostic index as a whole, and (iv) adjusting individual predictor effects. For each strategy, IPD meta-analysis examines (heterogeneity in) model performance across populations. Additionally, the probability of achieving good performance in a new population can be calculated, allowing recalibration methods to be ranked. In an applied example, IPD meta-analysis revealed that the existing model had poor calibration in some populations and large heterogeneity across populations. However, re-estimation of the intercept substantially improved the expected calibration in new populations and reduced between-population heterogeneity. Comparing recalibration strategies showed that re-estimating both the magnitude and shape of the baseline hazard gave the highest predicted probability of good performance in a new population. In conclusion, IPD meta-analysis allows a prognostic model to be externally validated in multiple settings, and enables recalibration strategies to be compared and ranked to decide on the least aggressive recalibration strategy that achieves acceptable external model performance without discarding existing model information.
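The four recalibration options can be written out for a flexible parametric model on the log cumulative hazard scale. The notation below assumes a Royston-Parmar type model, with a spline function s of log time for the baseline and a prognostic index η = β'x; the symbols α, δ, γ* and β* for re-estimated quantities are our own labels and may not match the paper. In (i) and (ii) the prognostic index enters as a fixed offset; in (iii) and (iv) we assume the baseline is only shifted, although these options could also be combined with re-estimating γ.

```latex
\begin{aligned}
\text{original:}\quad & \ln H(t \mid x) = s(\ln t;\, \gamma) + \eta, \qquad \eta = \beta^{\top} x \\
\text{(i) shift baseline:}\quad & \ln H(t \mid x) = s(\ln t;\, \gamma) + \alpha + \eta \\
\text{(ii) re-estimate baseline:}\quad & \ln H(t \mid x) = s(\ln t;\, \gamma^{*}) + \eta \\
\text{(iii) recalibrate index:}\quad & \ln H(t \mid x) = s(\ln t;\, \gamma) + \alpha + \delta\,\eta \\
\text{(iv) re-estimate effects:}\quad & \ln H(t \mid x) = s(\ln t;\, \gamma) + \alpha + \beta^{*\top} x
\end{aligned}
```

Under this notation, α plays the role of the re-estimated intercept, and option (ii), which re-estimates all spline coefficients including the constant, changes both the magnitude and the shape of the baseline hazard.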

Journal: Stat Med

Year: 2021

Citation: 9

Real-time imputation of missing predictor values in clinical practice

Aims: Use of prediction models is widely recommended by clinical guidelines, but usually requires complete information on all predictors, which is not always available in daily practice. We aim to describe two methods for real-time handling of missing predictor values when using prediction models in practice.

Methods and results: We compare the widely used method of mean imputation (M-imp) to a method that personalizes the imputations by taking advantage of the observed patient characteristics. These characteristics may include both prediction model variables and other characteristics (auxiliary variables). The method was implemented using imputation from a joint multivariate normal model of the patient characteristics (joint modelling imputation; JMI). Data from two different cardiovascular cohorts with cardiovascular predictors and outcome were used to evaluate the real-time imputation methods. We quantified the prediction model's overall performance [mean squared error (MSE) of linear predictor], discrimination (c-index), calibration (intercept and slope), and net benefit (decision curve analysis). When compared with mean imputation, JMI substantially improved the MSE (0.10 vs. 0.13), c-index (0.70 vs. 0.68), and calibration (calibration-in-the-large: 0.04 vs. 0.06; calibration slope: 1.01 vs. 0.92), especially when incorporating auxiliary variables. When the imputation method was based on an external cohort, calibration deteriorated, but discrimination remained similar.

Conclusions: We recommend JMI with auxiliary variables for real-time imputation of missing values, and recommend updating imputation models when implementing them in new settings or (sub)populations.
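The core idea behind joint modelling imputation can be illustrated with the conditional mean of a multivariate normal distribution. Given a mean vector μ and covariance matrix Σ estimated in a development cohort (possibly including auxiliary variables), the missing entries of a new patient's vector are replaced by μ_m + Σ_mo Σ_oo^{-1}(x_o − μ_o), whereas mean imputation simply uses μ_m. This is a minimal NumPy sketch assuming all variables are treated as continuous and a single deterministic imputation is used. The published JMI implementation may differ, for example in how binary predictors are handled or whether draws rather than conditional means are used.

```python
import numpy as np


def mean_impute(x, mu):
    """Mean imputation (M-imp): replace each missing value (NaN) by its mean."""
    x = np.array(x, dtype=float)
    miss = np.isnan(x)
    x[miss] = np.asarray(mu, dtype=float)[miss]
    return x


def conditional_mean_impute(x, mu, sigma):
    """Replace missing entries of one patient's vector x by E[x_mis | x_obs]
    under a multivariate normal N(mu, sigma) estimated in other data."""
    x = np.array(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    mis = np.isnan(x)
    obs = ~mis
    if not mis.any():
        return x
    if not obs.any():
        # nothing observed: the conditional mean equals the marginal mean
        x[mis] = mu[mis]
        return x
    s_mo = sigma[np.ix_(mis, obs)]  # covariance between missing and observed
    s_oo = sigma[np.ix_(obs, obs)]  # covariance among observed variables
    x[mis] = mu[mis] + s_mo @ np.linalg.solve(s_oo, x[obs] - mu[obs])
    return x
```

Observed auxiliary variables that are correlated with a missing predictor shift its imputed value away from the population mean, which is how they can personalise the imputation for the individual patient.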

Journal: EHJ Digital Health

Year: 2020

Citation: 7

Handling missing predictor values when validating and applying a prediction model to new patients

Missing data present challenges for the development and real-world application of clinical prediction models. While these challenges have received considerable attention in the development setting, research on handling missing data in applied settings is sparse. The main distinguishing feature of handling missing data in these settings is that missing data methods have to be applied to a single new individual, precluding direct application of the mainstay methods used during model development. Accordingly, we propose that model validation should ideally use missing data methods that transfer to practice in single new patients. This article compares existing and new methods to account for missing data for a new individual in the context of prediction. These methods are based on (i) submodels using the observed data only, (ii) marginalization over the missing variables, or (iii) imputation based on fully conditional specification (also known as chained equations). They were compared in an internal validation setting to highlight the use of missing data methods that transfer to practice while validating a model. As a reference, they were compared with multiple imputation by chained equations in a set of test patients, because this approach has been used in previous validation studies. The methods were evaluated in a simulation study in which performance was measured by the optimism-corrected C-statistic and the mean squared prediction error. Furthermore, they were applied to data from a large Dutch cohort of patients with prophylactic implantable cardioverter defibrillators.
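To illustrate the idea of marginalization, consider a logistic prediction model with one observed predictor x1 and one missing predictor x2: the predicted risk is averaged over the conditional distribution of x2 given x1, rather than computed at a single imputed value. This sketch assumes x2 | x1 is normal with a mean and standard deviation estimated in the development data (passed in directly here) and evaluates the expectation with Gauss-Hermite quadrature via NumPy's `hermegauss`. The model, coefficients and function names are hypothetical, and the paper's implementation may differ.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def expit(z):
    return 1.0 / (1.0 + np.exp(-z))


def plug_in_prob(beta, x1, x2_hat):
    """Predicted risk with the missing x2 replaced by a single value x2_hat."""
    b0, b1, b2 = beta
    return float(expit(b0 + b1 * x1 + b2 * x2_hat))


def marginal_prob(beta, x1, cond_mean, cond_sd, n_nodes=40):
    """Predicted risk averaged over x2 | x1 ~ N(cond_mean, cond_sd**2).

    hermegauss integrates against exp(-z**2 / 2), whose total mass is
    sqrt(2*pi), so dividing by that constant gives an expectation over N(0, 1).
    """
    b0, b1, b2 = beta
    z, w = hermegauss(n_nodes)
    x2 = cond_mean + cond_sd * z
    return float(np.sum(w * expit(b0 + b1 * x1 + b2 * x2)) / np.sqrt(2.0 * np.pi))
```

Because the logistic link is non-linear, the marginal risk generally lies closer to 0.5 than the risk evaluated at the conditional mean of x2, so the two approaches can give different individual predictions even when the imputation model is the same.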

Journal: Stat Med

Year: 2020

Citation: 25

Multiple imputation for multilevel data with continuous and binary variables

We present and compare multiple imputation methods for multilevel continuous and binary data where variables are systematically and sporadically missing. The methods are compared from a theoretical point of view and through an extensive simulation study motivated by a real dataset comprising multiple studies. The comparisons show why these multiple imputation methods are the most appropriate for handling missing values in a multilevel setting and why their relative performances can vary according to the missing data pattern, the multilevel structure and the type of missing variables. This study shows that valid inferences can only be obtained if the dataset contains a large number of clusters. In addition, it highlights that heteroscedastic multiple imputation methods provide more accurate inferences than homoscedastic methods, which should be reserved for data with few individuals per cluster. Finally, guidelines are given for choosing the most suitable multiple imputation method according to the structure of the data.
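Whichever imputation model is chosen (homoscedastic or heteroscedastic, joint or fully conditional), the analysis results from the m completed datasets are usually combined with Rubin's rules. Below is a minimal NumPy sketch of that pooling step for a scalar parameter, using Rubin's original large-sample degrees of freedom. It does not implement the multilevel imputation itself, and refinements such as the Barnard-Rubin small-sample degrees of freedom are omitted.

```python
import numpy as np


def rubin_pool(estimates, variances):
    """Pool m point estimates and their squared standard errors with Rubin's rules.

    Returns the pooled estimate, the total variance and Rubin's (1987)
    degrees of freedom for a t reference distribution.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    if m < 2:
        raise ValueError("need at least two imputed datasets")
    q_bar = q.mean()               # pooled point estimate
    w = u.mean()                   # within-imputation variance
    b = q.var(ddof=1)              # between-imputation variance
    t = w + (1.0 + 1.0 / m) * b    # total variance
    if b == 0:
        df = np.inf                # no between-imputation variability
    else:
        r = (1.0 + 1.0 / m) * b / w  # relative increase in variance due to missingness
        df = (m - 1) * (1.0 + 1.0 / r) ** 2
    return q_bar, t, df
```

A confidence interval then follows as the pooled estimate plus or minus the 0.975 quantile of a t distribution with `df` degrees of freedom, multiplied by the square root of the total variance.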

Journal: Stat Sci

Year: 2018

Citation: 76

Books

Systematic Reviews in Health Research

Individual Participant Data Meta-Analysis

Handbook of Meta-Analysis

Prognosis Research in Healthcare