Better predictions using big data sets

Thomas Debray

Clinical prediction models are an important tool in contemporary medical decision making and are abundant in the medical literature. Prediction models estimate the probability (risk) that a certain condition is present or will occur in the future by combining information from multiple variables (predictors) from an individual, e.g. predictors from patient history, physical examination or medical testing. Prediction models are used to determine referral of patients for further testing, to plan lifestyle or therapeutic decisions, or to risk-stratify participants in therapeutic clinical trials.
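To make the idea of "combining information from multiple predictors" concrete, here is a minimal sketch of a logistic prediction model in Python. The predictor names, intercept and coefficients are hypothetical and chosen purely for illustration; they do not come from any published model.

```python
import math

# Hypothetical intercept and coefficients, for illustration only.
INTERCEPT = -5.0
COEFFICIENTS = {
    "age_years": 0.04,
    "current_smoker": 0.7,
    "systolic_bp_mmhg": 0.01,
}


def predicted_risk(predictors):
    """Combine predictor values into a probability via a logistic model."""
    # Weighted sum of the predictors (the "linear predictor").
    linear_predictor = INTERCEPT + sum(
        COEFFICIENTS[name] * predictors[name] for name in COEFFICIENTS
    )
    # The logistic function maps the linear predictor to the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# A hypothetical individual.
patient = {"age_years": 60, "current_smoker": 1, "systolic_bp_mmhg": 140}
risk = predicted_risk(patient)
print(f"Predicted risk: {risk:.1%}")
```

In this sketch each predictor shifts the log-odds of the outcome by a fixed amount, which is the usual structure of a regression-based prediction model; other model types combine predictors differently.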

Probability estimates provided by prediction models should be sufficiently accurate; otherwise, incorrect management decisions may be made, leading to suboptimal outcomes for individuals and unnecessary health care costs. Unfortunately, many prediction models perform much worse in practice than anticipated during their development. A major reason for this unsatisfactory accuracy and limited clinical use is that models are typically developed from relatively small datasets and subsequently applied in populations or settings that differ too much from the original development population or setting, without proper validation and adaptation to the new situation.
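As an illustration of what "adaptation to the new situation" can mean, the sketch below shows one simple and commonly used option: updating only the model intercept so that the average predicted risk matches the observed event rate in the new population. This is an assumed example, not necessarily the approach used in this project, and the data are made up.

```python
import math


def expit(x):
    """Logistic function: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def intercept_update(linear_predictors, outcomes, lo=-10.0, hi=10.0, tol=1e-8):
    """Find the intercept shift that makes the mean predicted risk equal
    to the observed event rate in the new population (by bisection)."""
    observed_rate = sum(outcomes) / len(outcomes)

    def mean_risk(delta):
        total = sum(expit(lp + delta) for lp in linear_predictors)
        return total / len(linear_predictors)

    # mean_risk is increasing in delta, so bisection converges.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean_risk(mid) < observed_rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Made-up validation data: linear predictors from the original model
# and the observed binary outcomes in the new setting.
linear_predictors = [-2.0, -1.0, 0.0, 1.0]
outcomes = [0, 0, 0, 1]

delta = intercept_update(linear_predictors, outcomes)
updated_risks = [expit(lp + delta) for lp in linear_predictors]
print(f"Intercept shift: {delta:.3f}")
```

Here the original model over-predicts on average (mean predicted risk of about 0.40 against an observed rate of 0.25), so the estimated shift is negative. An intercept update leaves the predictor effects untouched, so it cannot correct for predictor effects that genuinely differ between settings.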

To improve the accuracy and generalizability of prediction models, their development and subsequent validation should be based on larger datasets. This strategy is becoming increasingly common as research data are shared more widely. Currently, however, there is a lack of statistical approaches to properly develop, validate and adapt prediction models when predictor effects vary across individuals due to differences in predictor/outcome burden or in measurement techniques across studies, populations, settings or time periods.



Westeneng HJ, Al-Chalabi A, Hardiman O, Debray TP, van den Berg LH. The life expectancy of Stephen Hawking, according to the ENCALS model. Lancet Neurology 2018:17;662-3.

Debray TPA, Damen JAA, Riley RD, Snell K, Reitsma JB, Hooft L, Collins GS, Moons KGM. A framework for meta-analysis of prediction model studies with binary and time-to-event outcomes. Stat Methods Med Res 2018:0.

Westeneng HJ, Debray TPA, Visser AE, van Eijk RPA, Rooney JPK, Calvo A, Martin S, McDermott CJ, Thompson AG, Pinto S, Kobeleva X, Rosenbohm A, Stubendorff B, Sommer H, Middelkoop BM, Dekker AM, van Vugt JJFA, van Rheenen W, Vajda A, Heverin M, Kazoka M, Hollinger H, Gromicho M, Körner S, Ringer TM, Rödiger A, Gunkel A, Shaw CE, Bredenoord AL, van Es MA, Corcia P, Couratier P, Weber M, Grosskreutz J, Ludolph AC, Petri S, de Carvalho M, Van Damme P, Talbot K, Turner MR, Shaw PJ, Al-Chalabi A, Chiò A, Hardiman O, Moons KGM, Veldink JH, van den Berg LH. Prediction of personalised prognosis in patients with amyotrophic lateral sclerosis: development and validation of a prediction model. Lancet Neurology 2018:17;423-33.

Kers J, Peters-Sengers H, Heemskerk MBA, Berger SP, Betjes MGH, van Zuilen AD, Hilbrands LB, de Fijter JW, Nurmohamed AS, Christiaans MH, Homan van der Heide JJ, Debray TPA, Bemelman FJ. Prediction models for delayed graft function: external validation on The Dutch Prospective Renal Transplantation Registry. Nephrology Dialysis Transplantation 2018:0.

van Hoorn R, Tummers M, Booth A, Gerhardus A, Rehfuess E, Hind D, Bossuyt PM, Welch V, Debray TP, Underwood M, Cuijpers P, Kraemer H, van der Wilt GJ, Kievit W. The development of CHAMP: a checklist for the appraisal of moderators and predictors. BMC Med Res Methodol 2017:173.

Debray TP, Moons KG, Riley RD. Detecting small-study effects and funnel plot asymmetry in meta-analysis of survival data: a comparison of new and existing tests. Res Synth Methods 2018:9;41-50.

Snell KI, Ensor J, Debray TP, Moons KG, Riley RD. Meta-analysis of prediction model performance across multiple studies: Which scale helps ensure between-study normality for the C-statistic and calibration measures? Stat Methods Med Res 2017:0.

Rietbergen C, Debray TPA, Klugkist I, Janssen KJM, Moons KG. Reporting of Bayesian analysis in epidemiologic research should become more transparent. J Clin Epidemiol 2017:86;51-58.

van Doorn S, Debray TPA, Kaasenbrood F, Hoes AW, Rutten FH, Moons KGM, Geersing GJ. Predictive performance of the CHA2DS2-VASc rule in atrial fibrillation: a systematic review and meta-analysis. J Thromb Haemost 2017:15;1-13.

Debray TPA, Damen JAAG, Snell KIE, Ensor J, Hooft L, Reitsma JB, Riley RD, Moons KGM. A guide to systematic review and meta-analysis of prediction model performance. BMJ 2017:356;i6460.

Audigier V, White IR, Jolani S, Debray TPA, Quartagno M, Carpenter J, van Buuren S, Resche-Rigon M. Multiple imputation for multilevel data with continuous and binary variables. Stat Sci 2018:33;160-83.

The Netherlands Organisation for Health Research and Development

Source of Funding

The Netherlands Organisation for Scientific Research supports a strong system of sciences in the Netherlands by encouraging quality and innovation in science. Our conviction is that scientific research contributes to our prosperity and well-being and that it provides for our growing need for knowledge: for facing societal challenges, for economic development and to better understand ourselves and the world.
