Statistical Considerations in Environmental Microbial Forensics
 Authors: Graham McBride^{1}, Brent Gilpin^{2}
 Editors: Raúl J. Cano^{3}, Gary A. Toranzos^{4}

Affiliations: 1: National Institute of Water & Atmospheric Research (NIWA), Hamilton 3216, New Zealand; 2: Environmental Science Research (ESR), Christchurch 8540, New Zealand; 3: California Polytechnic State University, San Luis Obispo, CA; 4: University of Puerto Rico-Río Piedras, San Juan, Puerto Rico

Received 01 October 2015 Accepted 10 October 2015 Published 12 August 2016
 Correspondence: Graham McBride, [email protected]

Abstract:
In environmental microbial forensics, as in other pursuits, statistical calculations are sometimes inappropriately applied, either giving the appearance of support for a particular conclusion or failing to support an innately obvious one. This reflects issues in dealing with sample sizes, the methodologies involved, and the difficulty of communicating uncertainties. In this brief review, we attempt to illustrate ways to minimize such problems. In doing so, we consider one of the most common applications of environmental microbial forensics: the use of genotyping in food, water, and disease investigations. We explore three important questions. (i) Do hypothesis tests' P values serve as adequate metrics of evidence? (ii) How can we quantify the value of the evidence? (iii) Can we turn a value-of-evidence metric into attribution probabilities? Our general conclusions are as follows. (i) P values have the unfortunate property of regularly detecting trivial effects when sample sizes are large. (ii) Likelihood ratios, rather than any kind of probability, are the better strength-of-evidence metric, addressing the question “what do these data say?” (iii) Attribution probabilities, addressing the question “what should I believe?”, can be calculated using Bayesian methods, which rely in part on likelihood ratios but also invoke prior beliefs and can therefore be quite subjective. In legal settings a Bayesian analysis may be required, but the choice of prior assumptions, and the sensitivity of the results to them, should be made clear.
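Conclusion (iii) can be made concrete with a minimal sketch. It assumes the simplest case of two competing hypotheses, where Bayes' rule in odds form gives posterior odds = likelihood ratio × prior odds. The function name `posterior_probability` and the numerical values are hypothetical illustrations, not taken from the article.

```python
def posterior_probability(likelihood_ratio: float, prior_probability: float) -> float:
    """Posterior probability of hypothesis A versus B (two-hypothesis case).

    Uses the odds form of Bayes' rule:
        posterior odds = likelihood ratio * prior odds
    """
    if not 0.0 < prior_probability < 1.0:
        raise ValueError("prior_probability must lie strictly between 0 and 1")
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = likelihood_ratio * prior_odds
    # Convert odds back to a probability.
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical likelihood ratio: the data are 10 times more probable under A than B.
LIKELIHOOD_RATIO = 10.0

# The same evidence yields different attribution probabilities under different priors.
for prior in (0.1, 0.5, 0.9):
    post = posterior_probability(LIKELIHOOD_RATIO, prior)
    print(f"prior = {prior:.1f} -> posterior = {post:.3f}")
```

The likelihood ratio is the same in every row; only the prior changes. Reporting results across a range of priors in this way is one means of making the sensitivity to prior assumptions explicit.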

Citation: McBride G, Gilpin B. 2016. Statistical Considerations in Environmental Microbial Forensics. Microbiol Spectrum 4(4):EMF00052015. doi:10.1128/microbiolspec.EMF00052015.
Figures
FIGURE 1
Power curves for a one-sample, two-sided Student's t test at the 5% significance level, where the population variance (σ²) is unknown. Rotated numbers are the sample size (n).
FIGURE 2
The P value for a test positing that the true odds ratio is 1 is the sum of the areas in the left and right tails of the unit normal probability density function that are cut off by −Z and by Z (nil tests are “two-tailed”). For Z = 1.8, standard distribution tables give P = 2 × 0.0359 = 0.0718. (The total area under this density function is 1.) Note that larger values of Z give rise to smaller values of P.
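The table lookup in the Figure 2 caption can be reproduced with a short sketch using only the Python standard library. The function name `two_tailed_p` is a hypothetical helper, and the calculation assumes the test statistic follows the unit normal distribution, as the caption states.

```python
import math


def two_tailed_p(z: float) -> float:
    """Two-tailed P value for a unit-normal test statistic z."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)): the combined area
    # in the left and right tails cut off by -|z| and +|z|.
    return math.erfc(abs(z) / math.sqrt(2.0))


p = two_tailed_p(1.8)
print(f"Z = 1.8 -> P = {p:.4f}")
```

For Z = 1.8 this gives P of about 0.0719. The caption's 0.0718 differs only because the tabulated tail area (0.0359) is rounded before doubling.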
Tables
TABLE 1
Example 2 × 2 contingency table
TABLE 2
Results of nil hypothesis tests and likelihood ratio for varying sample size
TABLE 3
E. coli O169 illnesses among Korean schoolchildren consuming kimchi (K)
TABLE 4
Contraction of bacterial illnesses from various exposures
TABLE 5
Campylobacteriosis cases among participants in a long-distance race