
Category: Genomics and Bioinformatics; Environmental Microbiology
The Uncountables
Abstract:
In the 1980s and 1990s, it was widely recognized that the extent of microbial diversity in any environmental sample had never been experimentally determined, and some commentators believed bacterial diversity to be beyond practical calculation. The assumptions regarding the shape of the underlying community taxon-abundance distribution are undoubtedly open to criticism; however, this article focuses attention on determining what that distribution is by demonstrating that it is fundamental to determining the extent of prokaryotic diversity. The lognormal taxon-abundance distribution was used to assemble 30,000 communities, because it fitted the data better than the inverse Gaussian. The rationale for studying bacterial diversity given in this chapter is the prospect that the uncharted taxa are a reservoir of new drugs and metabolic processes. This chapter also describes the mathematical diversity estimators and taxon-abundance distributions and provides maps of the microbial world that will help guide future exploration and direct resources.
Distribution of taxon abundances in communities of 10^12 individuals and in small samples of 200 individuals for (a) a lognormally distributed community; N_T/N_max is the ratio of the total number of individuals to the number of individuals belonging to the most abundant taxon, which can be used to index richness (Curtis et al., 2002); (b) a logseries distributed community; θ is one of the parameters of the logseries that can be used as an index of species richness (Hubbell, 2001); (c) a community where 200 taxa are equally abundant; and (d) a bimodal distribution (redrawn from Sloan et al., 2007).
The sample sizes required to correctly characterize a sample with a diversity of 5,000 by undertaking a complete census, undertaking a 95% census, or using nonparametric methods. Note that if the diversity is uniform, nonparametric estimators are very efficient. However, if the diversity is lognormally distributed, a very large sample is required to obtain the correct answer. The simulations are described in more detail in Schloss and Handelsman (2006), and this figure appeared in Curtis et al. (2007).
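The contrast described in this caption can be explored with a small simulation. The sketch below is illustrative only: it uses the bias-corrected Chao1 estimator as one common nonparametric method (the caption does not say which estimators were used), and the function names, the random seed, and the lognormal spread of 3.0 are assumptions made for this example, not values from the source.

```python
import random
from collections import Counter


def chao1(sample):
    """Bias-corrected Chao1 richness estimate from a list of taxon labels."""
    counts = Counter(sample)              # reads per taxon
    freq = Counter(counts.values())       # number of taxa seen exactly k times
    s_obs = len(counts)                   # taxa actually observed
    f1, f2 = freq.get(1, 0), freq.get(2, 0)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def estimate(weights, n_reads, rng):
    """Draw n_reads individuals with the given relative abundances, then apply Chao1."""
    sample = rng.choices(range(len(weights)), weights=weights, k=n_reads)
    return chao1(sample)


if __name__ == "__main__":
    rng = random.Random(0)
    n_taxa = 5000  # true diversity, as in the caption
    communities = {
        "uniform": [1.0] * n_taxa,
        # Assumed spread (sigma = 3.0), chosen only to give a long tail of rare taxa.
        "lognormal": [rng.lognormvariate(0.0, 3.0) for _ in range(n_taxa)],
    }
    for name, weights in communities.items():
        for n_reads in (1000, 10000, 100000):
            print(name, n_reads, round(estimate(weights, n_reads, rng)))
```

With these assumed settings one would expect the uniform community to approach 5,000 at far smaller sample sizes than the lognormal one, which is the qualitative point the caption makes.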
A “quick and dirty” way to estimate diversity by assuming a distribution. (a) The total number of taxa in a community with a lognormal species abundance curve is simply the area under that curve (called the species curve). The individuals curve is the number of species at each abundance (the species curve) multiplied by their abundance (the x-axis). There is therefore a mathematical relationship between the area under the species curve, the number of individuals N_T (the area under the individuals curve), and the maximum and minimum abundances (N_max and N_min). (b) The relationship, over 30 orders of magnitude in population size, for various ratios of N_T/N_max, assuming that N_min is equal to 1 (Curtis et al., 2002). This figure appeared in Curtis et al. (2007).
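The relationship between the two areas can be checked numerically. The following is a minimal sketch that assumes Preston's octave form of the lognormal, S(R) = S0 · exp(−a²R²) with R = log2(N/N0); the caption does not give the equations, and the parameter values (s0, a) are hypothetical and chosen only for illustration. N0 is set so that the rarest taxon has abundance 1, matching the caption's assumption about the minimum abundance.

```python
import math


def species_curve(r, s0, a):
    """Number of taxa in octave r (assumed Preston form of the lognormal)."""
    return s0 * math.exp(-((a * r) ** 2))


def individuals_curve(r, s0, a, n0):
    """Taxa in octave r multiplied by their abundance n0 * 2**r."""
    return n0 * 2.0 ** r * species_curve(r, s0, a)


def trapezoid(f, lo, hi, steps=20000):
    """Composite trapezoid rule for the area under f between lo and hi."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h


# Hypothetical parameters, for illustration only.
s0, a = 100.0, 0.2
r_max = math.sqrt(math.log(s0)) / a   # octave where the species curve falls to one taxon
n0 = 2.0 ** r_max                     # modal abundance chosen so the minimum abundance is 1
n_min = n0 * 2.0 ** (-r_max)
n_max = n0 * 2.0 ** r_max

total_taxa = trapezoid(lambda r: species_curve(r, s0, a), -r_max, r_max)
total_individuals = trapezoid(lambda r: individuals_curve(r, s0, a, n0), -r_max, r_max)

print("taxa:", total_taxa)
print("individuals:", total_individuals)
print("individuals / max abundance:", total_individuals / n_max)
```

Under these assumptions, fixing the minimum abundance and the ratio of total individuals to maximum abundance pins down the shape parameters, and the area under the species curve then gives the total number of taxa.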
Expected sample abundances obtained using the lognormal (solid line) and inverse Gaussian distributions (dashed line). The predictions are posterior means as explained in the text. Actual data points are solid circles.
Estimates of observed diversity as a function of number of 16S rDNA reads for the data set of Rusch et al. (2007). The dotted line gives a curve generated by fitting a Michaelis-Menten equation to the rarefaction curve. The solid line is the median observed diversity from generating artificial communities with parameters obtained by sampling from the likelihood, Equation (3), assuming a lognormal taxon-abundance distribution. The gray lines give 95% confidence intervals. The inset graph gives the same data magnified near the origin together with the actual rarefaction curve (dashed line).
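A Michaelis-Menten fit of this kind models observed taxa as S(n) = S_max · n / (K + n), where n is the number of reads and S_max is the extrapolated asymptote. The sketch below is an assumption-laden illustration: the source does not say how the fit was performed, and a double-reciprocal (Lineweaver-Burk) linearization is used here only to keep the example free of dependencies; nonlinear least squares would usually be preferred for real data. The function name and the synthetic data are hypothetical.

```python
def fit_michaelis_menten(reads, taxa):
    """Fit S(n) = s_max * n / (k + n) by least squares on 1/S against 1/n."""
    xs = [1.0 / n for n in reads]
    ys = [1.0 / s for s in taxa]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    s_max = 1.0 / intercept   # since 1/S = 1/s_max + (k/s_max) * (1/n)
    k = slope * s_max
    return s_max, k


# Synthetic rarefaction points generated from known parameters (not real data).
true_s_max, true_k = 800.0, 2000.0
reads = [100, 500, 1000, 5000, 10000]
taxa = [true_s_max * n / (true_k + n) for n in reads]

s_max, k = fit_michaelis_menten(reads, taxa)
print("asymptotic taxa:", s_max, "half-saturation reads:", k)
```

The fitted asymptote is only as good as the assumption that the rarefaction curve saturates in this way, which is the issue the caption's comparison of the dotted and solid lines addresses.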
Estimates of the number of genomes that can be assembled as a function of fragments read for the data set of Rusch et al. (2007). The solid line is the median expected number of genomes for the artificial communities used in Fig. 3. The gray lines give 95% confidence intervals. The inset graph gives the same data magnified near the origin. Rusch et al. (2007) conducted approximately 6.4 × 10^6 reads and assembled only a single genome.
Diversity estimates from the data of Rusch et al. (2007)