Better predictions using big data sets

Clinical prediction models are an important tool in contemporary medical decision making and are abundant in the medical literature. A prediction model estimates the probability (risk) that a certain condition is present or will occur in the future by combining information from multiple variables (predictors) measured in an individual, e.g. predictors from patient history, physical examination or medical testing. Prediction models are used to decide whether to refer patients for further testing, to plan lifestyle or therapeutic decisions, and to risk-stratify participants in therapeutic clinical trials.
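As a rough illustration of how such a model combines predictors into a risk, the sketch below uses a logistic regression equation, which is one common form for a clinical prediction model. The predictor names, the intercept and the coefficients are hypothetical and chosen only for illustration; they do not come from any published model or from this project.

```python
import math

# Hypothetical intercept and coefficients, for illustration only.
INTERCEPT = -5.0
COEFFICIENTS = {
    "age_years": 0.04,
    "smoker": 0.7,
    "systolic_bp_mmhg": 0.01,
}


def predicted_risk(patient):
    """Combine predictor values into a probability with a logistic model."""
    # Linear predictor: intercept plus the weighted sum of predictor values.
    linear_predictor = INTERCEPT + sum(
        COEFFICIENTS[name] * patient[name] for name in COEFFICIENTS
    )
    # The logistic function maps the linear predictor to a value in (0, 1).
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical individual.
patient = {"age_years": 60, "smoker": 1, "systolic_bp_mmhg": 140}
risk = predicted_risk(patient)
print(f"Estimated risk: {risk:.3f}")
```

In practice the coefficients would be estimated from data rather than chosen by hand.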

Probability estimates provided by prediction models should be sufficiently accurate; otherwise incorrect management decisions may be made, leading to suboptimal outcomes for individuals and unnecessary health care costs. Unfortunately, many prediction models perform much worse in practice than was anticipated during their development. A major reason for their unsatisfactory accuracy and limited use in clinical practice is that they are typically developed from relatively small datasets and subsequently used in populations or settings that differ too much from the original development population or setting, without proper validation and adaptation to the new situation.
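One simple form of validation and adaptation can be sketched as follows: compare the mean predicted risk with the observed event rate in the new setting, then shift the model intercept until the two agree. This is only a minimal example under stated assumptions. The function name `recalibrate_intercept` and the validation data are hypothetical, intercept updating is just one of several possible adaptation strategies, and nothing here is claimed to be the method used in this project.

```python
import math


def sigmoid(x):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def mean_predicted_risk(linear_predictors, shift=0.0):
    """Average predicted risk after adding `shift` to every linear predictor."""
    return sum(sigmoid(lp + shift) for lp in linear_predictors) / len(linear_predictors)


def recalibrate_intercept(linear_predictors, outcomes, tol=1e-10):
    """Find the intercept shift at which mean predicted risk equals the
    observed event rate. Assumes the event rate lies strictly between 0 and 1."""
    observed_rate = sum(outcomes) / len(outcomes)
    lo, hi = -20.0, 20.0
    # Bisection works because mean predicted risk increases with the shift.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean_predicted_risk(linear_predictors, mid) < observed_rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical validation data: linear predictors from the original model
# and observed binary outcomes in the new population.
linear_predictors = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, -1.5, 0.3]
outcomes = [0, 0, 0, 0, 1, 0, 0, 0]

observed_rate = sum(outcomes) / len(outcomes)
before = mean_predicted_risk(linear_predictors)
shift = recalibrate_intercept(linear_predictors, outcomes)
after = mean_predicted_risk(linear_predictors, shift)
print(f"observed {observed_rate:.3f}, before {before:.3f}, after {after:.3f}")
```

With these made-up numbers the original model overestimates risk on average, so the shift is negative. Updating only the intercept leaves the predictor effects unchanged, which is not sufficient when those effects themselves differ between settings.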

To improve the accuracy and generalizability of prediction models, their development and subsequent validation should be based on larger datasets. This strategy is becoming increasingly feasible as research data are shared more widely. Currently, however, there is a lack of statistical approaches to properly develop, validate and adapt prediction models when predictor effects vary across individuals due to differences in predictor/outcome burden or in measurement techniques across studies, populations, settings or time periods.


van Hoorn R, Tummers M, Booth A, Gerhardus A, Rehfuess E, Hind D, Bossuyt PM, Welch V, Debray TP, Underwood M, Cuijpers P, Kraemer H, van der Wilt GJ, Kievit W. The development of CHAMP: a checklist for the appraisal of moderators and predictors. BMC Med Res Methodol 2017.173.

Debray TP, Moons KG, Riley RD. Detecting small-study effects and funnel plot asymmetry in meta-analysis of survival data: a comparison of new and existing tests. Res Synth Methods 2017.

Snell KI, Ensor J, Debray TP, Moons KG, Riley RD. Meta-analysis of prediction model performance across multiple studies: Which scale helps ensure between-study normality for the C-statistic and calibration measures? Stat Methods Med Res 2017.

Rietbergen C, Debray TPA, Klugkist I, Janssen KJM, Moons KG. Reporting of Bayesian analysis in epidemiologic research should become more transparent. J Clin Epidemiol 2017.86: 51-58.

van Doorn S, Debray TPA, Kaasenbrood F, Hoes AW, Rutten FH, Moons KGM, Geersing GJ. Predictive performance of the CHA2DS2-VASc rule in atrial fibrillation: a systematic review and meta-analysis. J Thromb Haemost 2017.15:1-13.

Debray TPA, Damen JAAG, Snell KIE, Ensor J, Hooft L, Reitsma JB, Riley RD, Moons KGM. A guide to systematic review and meta-analysis of prediction model performance. BMJ 2017.356:i6460.

Audigier V, White IR, Jolani S, Debray TPA, Quartagno M, Carpenter J, van Buuren S, Resche-Rigon M. Multiple imputation for multilevel data with continuous and binary variables. Stat Sci 2018.

Kers J, Peters-Sengers H, Heemskerk MBA, Berger SP, Betjes MGH, van Zuilen AD, Hilbrands LB, de Fijter JW, Nurmohamed AS, Christiaans MH, Homan van der Heide JJ, Debray TPA, Bemelman FJ. Prediction models for delayed graft function: external validation on The Dutch Prospective Renal Transplantation Registry. Nephrology Dialysis Transplantation 2018.

Project Details

Funder: The Netherlands Organisation for Health Research and Development
Project Category: Innovational Research Incentives Scheme VENI
Project Reference: 91617050
Funded Period: Jan 2017 - present
Funded Value: EUR 250,000
Lead: dr. Thomas PA Debray
Cooperating centers