Bayesian adjustment for preferential testing in estimating the COVID-19 infection fatality rate

Campbell H, de Valpine P, Maxwell L, de Jong VM, Debray T, Jänisch T, Gustafson P

A key challenge in estimating the infection fatality rate (IFR) is determining the total number of cases. This total is unknown not only because not everyone is tested but also, more importantly, because tested individuals are not representative of the population at large. We refer to the phenomenon whereby infected individuals are more likely to be tested than non-infected individuals as "preferential testing". An open question is whether it is possible to reliably estimate the IFR without any specific knowledge of the degree to which the data are biased by preferential testing. In this paper we take a partial identifiability approach, formulating clearly where deliberate prior assumptions can be made and presenting a Bayesian model that pools information from different samples. Results of a simulation study suggest that when certain populations with representative testing (or for which the degree of preferential testing is known) are included in the analysis, identifiability and reliable estimates are attainable. When only limited knowledge is available about the magnitude of preferential testing, reliable estimation of the IFR may still be possible so long as there is sufficient "heterogeneity of bias" across samples. When the model is fit to European data obtained from seroprevalence studies and national official COVID-19 statistics, we estimate the overall COVID-19 IFR for Europe to be 0.47%, 95% C.I. = [0.34%, 0.63%].
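
To make the effect of preferential testing concrete, the following is a minimal deterministic sketch. It is not the Bayesian model of the paper. It assumes a hypothetical population in which infected and non-infected individuals are tested with different probabilities, and it computes the expected "naive" IFR obtained by extrapolating test positivity to the whole population. The function name, parameters, and all numeric values are illustrative assumptions.

```python
def naive_ifr(population, infection_rate, ifr,
              p_test_infected, p_test_uninfected):
    """Expected naive IFR when prevalence is extrapolated from test positivity.

    All inputs are hypothetical; this is an illustration of the bias,
    not the estimator used in the paper.
    """
    infected = population * infection_rate
    uninfected = population - infected
    deaths = infected * ifr

    # Expected numbers of tested individuals in each group.
    tested_pos = infected * p_test_infected
    tested_neg = uninfected * p_test_uninfected

    # Naive step: treat the tested sample as representative of the population.
    positivity = tested_pos / (tested_pos + tested_neg)
    estimated_infected = positivity * population

    return deaths / estimated_infected


# Representative testing: both groups are tested at the same rate,
# so the naive estimate recovers the true IFR (0.5% here).
representative = naive_ifr(1_000_000, 0.05, 0.005, 0.05, 0.05)

# Preferential testing: infected individuals are five times as likely
# to be tested, so infections are overestimated and the IFR underestimated.
preferential = naive_ifr(1_000_000, 0.05, 0.005, 0.10, 0.02)

print(f"representative testing: {representative:.4%}")
print(f"preferential testing:   {preferential:.4%}")
```

Under these assumed values, equal testing probabilities recover the true IFR, whereas preferential testing inflates the estimated number of infections and pulls the naive IFR well below the true value. This is the bias that the approach described above seeks to adjust for.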

This article is distributed under the terms of the Non-exclusive license to distribute, which permits arXiv.org a perpetual, non-exclusive license to distribute this article.
