
Welcome to our research page featuring recent publications in biostatistics and epidemiology. These fields play a crucial role in understanding the causes, prevention, and treatment of disease, and our team contributes through innovative studies and cutting-edge statistical analyses. On this page you will find our publications describing the development of new statistical methods and their application to real-world data. Please feel free to contact us with any questions or comments.

Showing 7 publications

ISPE-endorsed guidance in using electronic health records for comparative effectiveness research in COVID-19: opportunities and trade-offs

As the scientific research community along with health care professionals and decision-makers around the world fight tirelessly against the COVID-19 pandemic, the need for comparative effectiveness research (CER) on preventive and therapeutic interventions for COVID-19 is immense. Randomized controlled trials markedly underrepresent the frail and complex patients seen in routine care, and they do not typically have data on long-term treatment effects. The increasing availability of electronic health records (EHRs) for clinical research offers the opportunity to generate timely real-world evidence reflective of routine care for optimal management of COVID-19. However, there are many potential threats to the validity of CER based on EHR data that are not originally generated for research purposes. To ensure unbiased and robust results, we need high-quality healthcare databases, rigorous study designs, and proper implementation of appropriate statistical methods. We aimed to describe opportunities and challenges in EHR-based CER for COVID-19-related questions and to introduce best practices in pharmacoepidemiology to minimize potential biases. We structured our discussion into the following topics: 1) Study population identification based on exposure status; 2) Ascertainment of outcomes; 3) Common biases and potential solutions; and 4) Data operational challenges specific to COVID-19 CER using EHR. We provide structured guidance for the proper conduct and appraisal of drug and vaccine effectiveness and safety research using EHR data for the pandemic. This manuscript is endorsed by the International Society for Pharmacoepidemiology (ISPE).

Journal: Clin Pharmacol Ther
Year: 2022
Citations: 8

Minimum sample size for external validation of a clinical prediction model with a binary outcome

In prediction model research, external validation is needed to examine an existing model's performance using data independent of that used for model development. Current external validation studies often suffer from small sample sizes and consequently imprecise predictive performance estimates. To address this, we propose how to determine the minimum sample size needed for a new external validation study of a prediction model for a binary outcome. Our calculations aim to precisely estimate calibration (Observed/Expected and calibration slope), discrimination (C-statistic), and clinical utility (net benefit). For each measure, we propose closed-form and iterative solutions for calculating the minimum sample size required. These require specifying: (i) target SEs (confidence interval widths) for each estimate of interest, (ii) the anticipated outcome event proportion in the validation population, (iii) the prediction model's anticipated (mis)calibration and variance of linear predictor values in the validation population, and (iv) potential risk thresholds for clinical decision-making. The calculations can also be used to inform whether the sample size of an existing (already collected) dataset is adequate for external validation. We illustrate our proposal for external validation of a prediction model for mechanical heart valve failure with an expected outcome event proportion of 0.018. Calculations suggest at least 9835 participants (177 events) are required to precisely estimate the calibration and discrimination measures, with this number driven by the calibration slope criterion, which we anticipate will often be the case. Also, 6443 participants (116 events) are required to precisely estimate net benefit at a risk threshold of 8%. Software code is provided.
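
The closed-form solution for the O/E (overall calibration) criterion can be sketched as follows. This is a minimal sketch, assuming the standard result that SE(ln(O/E)) ≈ sqrt((1 − φ)/(nφ)); the target SE value and the helper name `n_for_oe` are illustrative assumptions, not the paper's worked example.

```python
import math

def n_for_oe(phi, target_se_ln_oe):
    """Smallest n with SE(ln(O/E)) = sqrt((1 - phi) / (n * phi)) at or below target."""
    return math.ceil((1.0 - phi) / (phi * target_se_ln_oe**2))

phi = 0.018        # anticipated outcome event proportion (from the abstract)
target_se = 0.10   # illustrative target SE for ln(O/E), not the paper's choice
n = n_for_oe(phi, target_se)
print(n, math.ceil(n * phi))  # required participants and expected events
```

In the paper itself the binding criterion is usually the calibration slope, which requires an iterative solution; the O/E criterion above is the simplest of the four to compute by hand.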

Journal: Stat Med
Year: 2021
Citations: 110

Developing more generalizable prediction models from pooled studies and large clustered data sets

Prediction models often yield inaccurate predictions for new individuals. Large data sets from pooled studies or electronic healthcare records may alleviate this with an increased sample size and variability in sample characteristics. However, existing strategies for prediction model development generally do not account for heterogeneity in predictor-outcome associations between different settings and populations. This limits the generalizability of developed models (even from large, combined, clustered data sets) and necessitates local revisions. We aim to develop methodology for producing prediction models that require less tailoring to different settings and populations. We adopt internal-external cross-validation to assess and reduce heterogeneity in models' predictive performance during the development. We propose a predictor selection algorithm that optimizes the (weighted) average performance while minimizing its variability across the hold-out clusters (or studies). Predictors are added iteratively until the estimated generalizability is optimized. We illustrate this by developing a model for predicting the risk of atrial fibrillation and updating an existing one for diagnosing deep vein thrombosis, using individual participant data from 20 cohorts (N = 10 873) and 11 diagnostic studies (N = 10 014), respectively. Meta-analysis of calibration and discrimination performance in each hold-out cluster shows that trade-offs between average and heterogeneity of performance occurred. Our methodology enables the assessment of heterogeneity of prediction model performance during model development in multiple or clustered data sets, thereby informing researchers on predictor selection to improve the generalizability to different settings and populations, and reduce the need for model tailoring. Our methodology has been implemented in the R package metamisc.
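
The selection algorithm described above can be sketched with leave-one-cluster-out (internal-external) cross-validation. This is an illustrative sketch, not the authors' implementation (which is in the R package metamisc): the synthetic data, the C-statistic as the performance measure, the penalty weight `lam`, and the name `iecv_score` are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_clusters, n_per, p = 6, 200, 4
cluster = np.repeat(np.arange(n_clusters), n_per)
X = rng.normal(size=(n_clusters * n_per, p))
logit = 1.2 * X[:, 0] + 0.8 * X[:, 1]  # only the first two predictors are informative
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

def iecv_score(cols, lam=1.0):
    """Mean hold-out C-statistic minus lam times its SD across hold-out clusters."""
    aucs = []
    for c in np.unique(cluster):
        train, test = cluster != c, cluster == c
        model = LogisticRegression().fit(X[np.ix_(train, cols)], y[train])
        pred = model.predict_proba(X[np.ix_(test, cols)])[:, 1]
        aucs.append(roc_auc_score(y[test], pred))
    return np.mean(aucs) - lam * np.std(aucs)

# Forward selection: keep adding the predictor that most improves the
# penalised score; stop when no candidate improves it further.
selected, best = [], -np.inf
while len(selected) < p:
    score, j = max((iecv_score(selected + [j]), j) for j in range(p) if j not in selected)
    if score <= best:
        break
    selected, best = selected + [j], score
print(selected, round(best, 3))
```

Each cluster plays the hold-out role once, so the score rewards models that perform well on average while penalising those whose performance varies across settings.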

Journal: Stat Med
Year: 2021
Citations: 17

Individual participant data meta-analysis of intervention studies with time-to-event outcomes: A review of the methodology and an applied example

Many randomized trials evaluate an intervention effect on time-to-event outcomes. Individual participant data (IPD) from such trials can be obtained and combined in a so-called IPD meta-analysis (IPD-MA), to summarize the overall intervention effect. We performed a narrative literature review to provide an overview of methods for conducting an IPD-MA of randomized intervention studies with a time-to-event outcome. We focused on identifying good methodological practice for modeling frailty of trial participants across trials, modeling heterogeneity of intervention effects, choosing appropriate association measures, dealing with (trial differences in) censoring and follow-up times, and addressing time-varying intervention effects and effect modification (interactions).

We discuss how to achieve this using parametric and semi-parametric methods, and describe how to implement these in a one-stage or two-stage IPD-MA framework. We recommend exploring heterogeneity of the effect(s) through interaction and non-linear effects. Random effects should be applied to account for residual heterogeneity of the intervention effect. We provide further recommendations, many of which are specific to IPD-MA of time-to-event data from randomized trials examining an intervention effect.

We illustrate several key methods in a real IPD-MA, where IPD of 1225 participants from 5 randomized clinical trials were combined to compare the effects of Carbamazepine and Valproate on the incidence of epileptic seizures.
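
As one concrete instance of the two-stage framework mentioned above, per-trial log hazard ratios (stage 1) can be pooled with a DerSimonian-Laird random-effects model (stage 2). The per-trial estimates below are made-up placeholders standing in for stage-1 Cox model output, not results from the epilepsy example.

```python
import numpy as np

# Stage-1 output: per-trial log hazard ratios and their standard errors
# (illustrative placeholder values).
log_hr = np.array([-0.25, -0.10, -0.40, 0.05, -0.30])
se = np.array([0.12, 0.15, 0.20, 0.18, 0.10])
k = len(log_hr)

# Stage 2: DerSimonian-Laird estimate of between-trial variance tau^2 ...
w = 1 / se**2
mu_fixed = np.sum(w * log_hr) / np.sum(w)
Q = np.sum(w * (log_hr - mu_fixed) ** 2)
tau2 = max(0.0, (Q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

# ... then a random-effects pooled log hazard ratio with a 95% CI.
w_r = 1 / (se**2 + tau2)
mu = np.sum(w_r * log_hr) / np.sum(w_r)
se_mu = np.sqrt(1 / np.sum(w_r))
print(np.exp(mu), np.exp(mu - 1.96 * se_mu), np.exp(mu + 1.96 * se_mu))
```

A one-stage analysis would instead fit a single (stratified or random-effects) survival model to the combined IPD; the two-stage route shown here is often easier to implement and communicate.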

Journal: Res Synth Methods
Year: 2019
Citations: 42

Statistical approaches to identify subgroups in meta-analysis of individual participant data: a simulation study

Background: Individual participant data meta-analysis (IPD-MA) is considered the gold standard for investigating subgroup effects. Frequently used regression-based approaches to detect subgroups in IPD-MA are: meta-regression, per-subgroup meta-analysis (PS-MA), meta-analysis of interaction terms (MA-IT), naive one-stage IPD-MA (ignoring potential study-level confounding), and centred one-stage IPD-MA (accounting for potential study-level confounding). Clear guidance on these analyses is lacking, and clinical researchers may use approaches with suboptimal efficiency to investigate subgroup effects in an IPD setting. Therefore, our aim is to overview and compare the aforementioned methods, and provide recommendations on which should be preferred.

Methods: We conducted a simulation study where we generated IPD of randomised trials and varied the magnitude of subgroup effect (0, 25, 50%; relative reduction), between-study treatment effect heterogeneity (none, medium, large), ecological bias (none, quantitative, qualitative), sample size (50, 100, 200), and number of trials (5, 10) for binary, continuous and time-to-event outcomes. For each scenario, we assessed the power, false positive rate (FPR), and bias of the aforementioned five approaches.

Results: Naive and centred IPD-MA yielded the highest power, whilst preserving an acceptable FPR around the nominal 5% in all scenarios. Centred IPD-MA showed slightly less biased estimates than naive IPD-MA. Similar results were obtained for MA-IT, except when analysing binary outcomes (where it yielded less power and FPR <5%). PS-MA showed similar power as MA-IT in non-heterogeneous scenarios, but its power collapsed as heterogeneity increased, and decreased even more in the presence of ecological bias. PS-MA suffered from inflated FPRs in non-heterogeneous settings and showed biased estimates in all scenarios. Meta-regression showed poor power (<20%) in all scenarios and completely biased results in settings with qualitative ecological bias.

Conclusions: Our results indicate that subgroup detection in IPD-MA requires careful modelling. Naive and centred IPD-MA performed equally well overall, but because the centred approach yielded less biased estimates in the presence of ecological bias, we recommend the latter.
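
The centring idea recommended above can be sketched as follows: the subgroup covariate is centred within each trial so that the treatment-covariate interaction is estimated from within-trial information only, while the trial means enter separately to absorb across-trial (ecological) differences. The data are simulated and the variable names are illustrative, not the study's own code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
frames = []
for trial in range(10):
    x = rng.normal(loc=0.3 * trial, scale=1.0, size=200)  # trial-level shift in x
    treat = rng.integers(0, 2, size=200)
    y = 0.5 * treat + 0.4 * treat * x + rng.normal(size=200)  # true interaction = 0.4
    frames.append(pd.DataFrame({"trial": trial, "x": x, "treat": treat, "y": y}))
ipd = pd.concat(frames, ignore_index=True)

# Split x into a within-trial part (centred) and the trial mean.
ipd["x_bar"] = ipd.groupby("trial")["x"].transform("mean")
ipd["x_c"] = ipd["x"] - ipd["x_bar"]

# treat:x_c uses within-trial information only; treat:x_bar captures the
# across-trial (ecological) association and is kept separate from it.
fit = smf.ols("y ~ treat * x_c + treat * x_bar", data=ipd).fit()
print(round(fit.params["treat:x_c"], 3))
```

A naive one-stage analysis would fit `y ~ treat * x`, letting across-trial differences in x leak into the interaction estimate; the split above is what makes the approach robust to ecological bias.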

Journal: BMC Med Res Methodol
Year: 2019
Citations: 24

Assessment of heterogeneity in an individual participant data meta-analysis of prediction models: An overview and illustration

Clinical prediction models aim to provide estimates of absolute risk for a diagnostic or prognostic endpoint. Such models may be derived from data from various studies in the context of a meta-analysis. We describe and propose approaches for assessing heterogeneity in predictor effects and predictions arising from models based on data from different sources. These methods are illustrated in a case study with patients suffering from traumatic brain injury, where we aim to predict 6-month mortality based on individual patient data using meta-analytic techniques (15 studies, n = 11022 patients). The insights into various aspects of heterogeneity are important to develop better models and understand problems with the transportability of absolute risk predictions.
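
Two heterogeneity summaries commonly reported in such assessments, I-squared and an approximate 95% prediction interval for a predictor effect in a new setting, can be sketched as follows. The per-study estimates are illustrative placeholders, not values from the traumatic brain injury case study.

```python
import numpy as np
from scipy import stats

beta = np.array([0.60, 0.45, 0.90, 0.30, 0.70, 0.55])  # per-study effects (illustrative)
se = np.array([0.10, 0.12, 0.15, 0.11, 0.09, 0.14])
k = len(beta)

w = 1 / se**2                                   # fixed-effect weights
mu_f = np.sum(w * beta) / np.sum(w)
Q = np.sum(w * (beta - mu_f) ** 2)              # Cochran's Q
tau2 = max(0.0, (Q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
i2 = max(0.0, (Q - (k - 1)) / Q)                # share of variability due to heterogeneity

w_r = 1 / (se**2 + tau2)                        # random-effects weights
mu = np.sum(w_r * beta) / np.sum(w_r)
se_mu = np.sqrt(1 / np.sum(w_r))
half = stats.t.ppf(0.975, df=k - 2) * np.sqrt(tau2 + se_mu**2)
pi = (mu - half, mu + half)                     # approx. 95% prediction interval
print(round(i2, 2), tuple(round(v, 2) for v in pi))
```

The prediction interval, unlike the confidence interval for the pooled mean, conveys how much a predictor's effect may differ in a new population, which is exactly the transportability concern raised above.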

Journal: Stat Med
Year: 2019
Citations: 37

Practical Implications of Using Real-World Evidence in Comparative Effectiveness Research: Learnings from IMI-GetReal

In light of increasing attention towards the use of Real-World Evidence (RWE) in decision-making in recent years, this commentary aims to reflect on the experiences gained in accessing and using RWE for Comparative Effectiveness Research (CER) as part of the Innovative Medicines Initiative GetReal Consortium (IMI-GetReal), and to discuss their implications for RWE use in decision-making. For the purposes of this commentary, we define RWE as evidence generated based on health data collected outside the context of RCTs. We define Comparative Effectiveness Research (CER) as the conduct and/or synthesis of research comparing the benefits and harms of alternative interventions and strategies to prevent, diagnose, treat, and monitor health conditions in routine clinical practice (i.e. the real-world setting). The equivalent term for CER as used in the European context of Health Technology Assessment (HTA) and decision-making is Relative Effectiveness Assessment (REA).

Journal: J Comp Eff Res
Year: 2017
Citations: 13