Better predictions using big data sets

Clinical prediction models are an important tool in contemporary medical decision making and are abundant in the medical literature. Prediction models estimate the probability (risk) that a certain condition is present or will occur in the future by combining information from multiple variables (predictors) for an individual, e.g. predictors from patient history, physical examination or medical testing. Prediction models are used to decide whether to refer patients for further testing, to plan lifestyle or therapeutic decisions, or to risk-stratify participants in therapeutic clinical trials.

Probability estimates provided by prediction models should be sufficiently accurate; otherwise, incorrect management decisions may be made, leading to suboptimal outcomes for individuals and unnecessary health care costs. Unfortunately, many prediction models perform much worse than anticipated during their development. A major reason for their unsatisfactory accuracy and limited use in clinical practice is that models are typically developed from relatively small datasets and subsequently used in populations or settings that differ too much from the original development population or setting, without proper validation and adaptation to the new situation.

To improve the accuracy and generalizability of prediction models, their development and subsequent validation should be based on larger datasets. This strategy is becoming increasingly common thanks to the sharing of research data. Currently, however, there is a lack of statistical approaches to properly develop, validate and adapt prediction models when predictor effects vary across individuals due to differences in predictor/outcome burden or in measurement techniques across studies, populations, settings or time periods.


Snell KI, Ensor J, Debray TP, Moons KG, Riley RD. Meta-analysis of prediction model performance across multiple studies: Which scale helps ensure between-study normality for the C-statistic and calibration measures? Stat Methods Med Res 2017.

Rietbergen C, Debray TPA, Klugkist I, Janssen KJM, Moons KG. Use and reporting of Bayesian methods for primary data analysis in epidemiological research: a systematic review. J Clin Epidemiol 2017.

van Doorn S, Debray TPA, Kaasenbrood F, Hoes AW, Rutten FH, Moons KGM, Geersing GJ. Predictive performance of the CHA2DS2-VASc rule in atrial fibrillation: a systematic review and meta-analysis. J Thromb Haemost 2017.

Debray TPA, Damen JAAG, Snell KIE, Ensor J, Hooft L, Reitsma JB, Riley RD, Moons KGM. A guide to systematic review and meta-analysis of prediction model performance. BMJ 2017;356:i6460.

Project Details

Funder: The Netherlands Organisation for Health Research and Development
Project Category: Innovational Research Incentives Scheme VENI
Project Reference: 91617050
Funded Period: Jan 2017 - present
Funded Value: EUR 250,000
Lead: dr. T.P.A. Debray