Chapter 2.4.4: Metagenomics: Assigning Functional Status to Community Gene Content


Microbial metabolic processes, dynamics, and interactions shape the biogeochemistry of the planet. An estimated >10^30 prokaryotic cells, together with the ∼10^30 phages carried in their genomes, inhabit Earth and contain an estimated 350–550 petagrams (1 Pg = 10^15 g) of carbon, 85–130 Pg of nitrogen, and 9–14 Pg of phosphorus. These nutrient data have come from nucleic acid-based cultivation-independent surveys (CIS) of microbial communities sampled during the past two decades. Often, these communities have been surveyed using PCR-based sequencing approaches targeting organisms at the domain level. The term ‘metagenomics’ can refer to such broad amplicon surveys, but it is more commonly used for shotgun sequencing approaches that do not use PCR to select either for specific genes or for specific organisms. Even after a decade of concerted effort in sequence and metadata generation, and continuous improvement in sequencing technologies and computational frameworks, our understanding of microbial dynamics (taxonomic, functional, and evolutionary) is still limited in many respects. Specifically, the functional potential of the key (eco-genetically adapted) and/or dominant taxa, and their interdependencies with the ‘rare biosphere’ (i.e., less abundant but genetically divergent species) in complex microbial ecosystems, remains largely unknown. In this review, the existing concepts, methodologies, and approaches for metagenomic data analysis are outlined in order to highlight the potential of community genomics (metagenomics) to decipher the metabolic potential of microbial assemblages. The significance of closely coupled parameters such as ‘individual read versus assembly-based functional analysis’ and ‘cross validation (replicates) versus deep coverage’ is also explored.

Citation: Sangwan N, Lal R. 2016. Metagenomics: Assigning Functional Status to Community Gene Content, p 2.4.4-1-2.4.4-7. In Yates M, Nakatsu C, Miller R, Pillai S (ed), Manual of Environmental Microbiology, Fourth Edition. ASM Press, Washington, DC. doi: 10.1128/9781555818821.ch2.4.4


FIGURE 1. Flow chart for the downstream analysis of metagenome sequences, with a focus on functional analysis. Processes described here are sequence technology independent and can be used for assembled and individual (unassembled) reads. doi:10.1128/9781555818821.ch2.4.4.f1

FIGURE 2. Relative potential of similarity-based methods used in metagenome functional annotation. Method_A: BLASTP-based comparison of protein sequences (predicted from contigs generated by assembly of raw reads) against a protein database. Method_B: direct comparison (BLASTX; e value = 10) of individual metagenome reads against a protein database. Ten thousand randomly selected metagenome reads (average read length = 300 bases) from hexachlorocyclohexane-contaminated soil (accession no. SRX0964712) and ORFs predicted (minimum length = 90 amino acids) from the metagenomic contigs were compared (BLASTX; e value = 10^-5) against the STRING (39) database using methods B and A, respectively. The numbers of unique protein families recovered from the metagenome reads and from the ORFs were compared, and the Pearson correlation coefficient (PCC) was calculated. Because it compares every read individually, and so retains the variation present in each read, Method_B provides better annotation results, with more unique protein families per read. doi:10.1128/9781555818821.ch2.4.4.f2
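The comparison summarized in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that similarity-search hits have already been reduced to (query ID, protein family ID, e value) tuples, that hits are kept at or below an e value cutoff of 10^-5, and that the PCC is computed between the number of queries examined and the cumulative number of unique protein families. The function names and the toy hits are hypothetical and serve only as illustration.

```python
import math


def unique_families(hits, max_evalue=1e-5):
    """Map each query ID to the set of protein families it hits at or below max_evalue."""
    families = {}
    for query_id, family_id, evalue in hits:
        if evalue <= max_evalue:
            families.setdefault(query_id, set()).add(family_id)
    return families


def cumulative_family_counts(query_order, families):
    """Cumulative number of unique families seen after each query, in the given order."""
    seen = set()
    counts = []
    for query_id in query_order:
        seen |= families.get(query_id, set())
        counts.append(len(seen))
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    xs, ys = list(xs), list(ys)
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy hits, not data from the chapter.
toy_hits = [
    ("read1", "famA", 1e-10),
    ("read1", "famB", 1e-6),
    ("read2", "famA", 1e-3),  # discarded: e value above the cutoff
    ("read3", "famC", 1e-8),
]
order = ["read1", "read2", "read3"]
counts = cumulative_family_counts(order, unique_families(toy_hits))
print(counts, pearson(range(1, len(order) + 1), counts))
```

Running the same functions on read-level hits (Method_B) and on ORF-level hits (Method_A) would give two curves whose unique-family counts can be compared directly. How the original comparison was actually normalized is not stated in the caption.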



1. Whitman WB, Coleman DC, Wiebe WJ. 1998. Prokaryotes: the unseen majority. Proc Natl Acad Sci USA 95:6578–6583.
2. Dinsdale EA, Edwards RA, Hall D, Angly F, Breitbart M, Brulc JM, Furlan M, Desnues C, Haynes M, Li L, McDaniel L, Moran MA, Nelson KE, Nilsson C, Olson R, Paul J, Brito BR, Ruan Y, Swan BK, Stevens R, Valentine DL, Thurber RV, Wegley L, White BA, Rohwer F. 2008. Functional metagenomic profiling of nine biomes. Nature 452:629–632.
3. Pace NR. 2009. Mapping the tree of life: progress and prospects. Microbiol Mol Biol Rev 73:565–576.
4. Kunin V, Copeland A, Lapidus A, Mavromatis K, Hugenholtz P. 2008. A bioinformatician's guide to metagenomics. Microbiol Mol Biol Rev 72:557–578.
5. Tyson GW, Chapman J, Hugenholtz P, Allen EE, Ram RJ, Richardson PM, Solovyev VV, Rubin EM, Rokhsar DS, Banfield JF. 2004. Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428:37–43.
6. Sangwan N, Verma H, Kumar R, Negi V, Lax S, Khurana P, Khurana JP, Gilbert JA, Lal R. 2014. Reconstructing an ancestral genotype of two hexachlorocyclohexane-degrading Sphingobium species using metagenomic sequence data. ISME J 8:398–408.
7. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol 215:403–410.
8. Badger JH, Olsen GJ. 1999. CRITICA: coding region identification tool invoking comparative analysis. Mol Biol Evol 16:512–524.
9. Frishman D, Mironov A, Mewes HW, Gelfand M. 1998. Combining diverse evidence for gene recognition in completely sequenced bacterial genomes. Nucleic Acids Res 26:2941–2947.
10. Kaelbling LP, Littman ML, Cassandra AR. 1998. Planning and acting in partially observable stochastic domains. Artif Intell 101:99–134.
11. Borodovsky M, Lomsadze A, Ivanov N, Mills R. 2003. Eukaryotic gene prediction using GeneMark.hmm. Curr Protoc Bioinformatics 1:4.6:4.6.1–4.6.12.
12. Kelley DR, Liu B, Delcher AL, Pop M, Salzberg SL. 2012. Gene prediction with Glimmer for metagenomic sequences augmented by classification and clustering. Nucleic Acids Res 40:e9.
13. Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. 2010. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11:119.
14. Noguchi H, Park J, Takagi T. 2006. MetaGene: prokaryotic gene finding from environmental genome shotgun sequences. Nucleic Acids Res 34:5623–5630.
15. Kelley DR, Salzberg SL. 2010. Clustering metagenomic sequences with interpolated Markov models. BMC Bioinformatics 11:544.
16. Boisvert S, Raymond F, Godzaridis E, Laviolette F, Corbeil J. 2012. Ray Meta: scalable de novo metagenome assembly and profiling. Genome Biol 13:R122.
17. Albertsen M, Hugenholtz P, Skarshewski A, Nielsen KL, Tyson GW, Nielsen PH. 2013. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat Biotechnol 31:533–538.
18. Wrighton KC, Thomas BC, Sharon I, Miller CS, Castelle CJ, VerBerkmoes NC, Wilkins MJ, Hettich RL, Lipton MS, Williams KH, Long PE, Banfield JF. 2012. Fermentation, hydrogen, and sulfur metabolism in multiple uncultivated bacterial phyla. Science 337:1661–1665.
19. Sharon I, Morowitz MJ, Thomas BC, Costello EK, Relman DA, Banfield JF. 2013. Time series community genomics analysis reveals rapid shifts in bacterial species, strains, and phage during infant gut colonization. Genome Res 23:111–120.
20. Mohammed MH, Ghosh TS, Singh NK, Mande SS. 2011. SPHINX – an algorithm for taxonomic binning of metagenomic sequences. Bioinformatics 27:22–30.
21. Alneberg J, Bjarnason BS, de Bruijn I, Schirmer M, Quick J, Ijaz UZ, Lahti L, Loman NJ, Andersson AF, Quince C. 2014. Binning metagenomic contigs by coverage and composition. Nat Methods doi:10.1038/nmeth.3103. http://dx.doi.org/10.1038/nmeth.3103
22. Stamps BW, Corsetti FA, Spear JR, Stevenson BS. 2014. Draft genome of a novel Chlorobi member assembled by tetranucleotide binning of a hot spring metagenome. Genome Announc 2:e00897-14.
23. Wu YW, Tang YH, Tringe SG, Simmons BA, Singer SW. 2014. MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm. Microbiome 2:1–18.
24. Wang Y, Leung H, Yiu S, Chin F. 2014. MetaCluster-TA: taxonomic annotation for metagenomic data based on assembly-assisted binning. BMC Genomics 15:S12.
25. Wooley JC, Godzik A, Friedberg I. 2010. A primer on metagenomics. PLoS Comput Biol 6:e1000667.
26. Sangwan N, Lata P, Dwivedi V, Singh A, Niharika N, Kaur J, Anand S, Malhotra J, Jindal S, Nigam A, Lal D, Dua A, Saxena A, Garg N, Verma M, Kaur J, Mukherjee U, Gilbert JA, Dowd SE, Raman R, Khurana P, Khurana JP, Lal R. 2012. Comparative metagenomic analysis of soil microbial communities across three hexachlorocyclohexane contamination levels. PLoS One 7:e46219.
27. Mackelprang R, Waldrop MP, DeAngelis KM, David MM, Chavarria KL, Blazewicz SJ, Rubin EM, Jansson JK. 2011. Metagenomic analysis of a permafrost microbial community reveals a rapid response to thaw. Nature 480:368–371.
28. Huson DH, Xie C. 2013. A poor man's BLASTX-high-throughput metagenomic protein database search using PAUDA. Bioinformatics 30:38–39.
29. Pruitt KD, Tatusova T, Brown GR, Maglott DR. 2012. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res 40:D130–D135.
30. Letunic I, Doerks T, Bork P. 2012. SMART 7: recent updates to the protein domain annotation resource. Nucleic Acids Res 40:D302–D305.
31. Wilke A, Harrison T, Wilkening J, Field D, Glass EM, Kyrpides N, Mavrommatis K, Meyer F. 2012. The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools. BMC Bioinformatics 13:141.
32. Suzek BE, Huang H, McGarvey P, Mazumder R, Wu CH. 2007. UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics 23:1282–1288.
33. Sun S, Chen J, Li W, Altintas I, Lin A, Peltier S, Stocks K, Allen EE, Ellisman M, Grethe J, Wooley J. 2011. Community cyberinfrastructure for Advanced Microbial Ecology Research and Analysis: the CAMERA resource. Nucleic Acids Res 39:D546–D551.
34. Meyer F, Paarmann D, D'Souza M, Olson R, Glass EM, Kubal M, Paczian T, Rodriguez A, Stevens R, Wilke A, Wilkening J, Edwards RA. 2008. The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 9:386.
35. Markowitz VM, Chen IM, Chu K, Szeto E, Palaniappan K, Jacob B, Ratner A, Liolios K, Pagani I, Huntemann M, Mavromatis K, Ivanova NN, Kyrpides NC. 2012. IMG/M-HMP: a metagenome comparative analysis system for the Human Microbiome Project. PLoS One 7:e40151.
36. Tatusov RL, Galperin MY, Natale DA, Koonin EV. 2000. The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res 28:33–36.
37. Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer EL, Studholme DJ, Yeats C, Eddy SR. 2004. The Pfam protein families database. Nucleic Acids Res 32:D138–D141.
38. Haft DH, Loftus BJ, Richardson DL, Yang F, Eisen JA, Paulsen IT, White O. 2001. TIGRFAMs: a protein family resource for the functional identification of proteins. Nucleic Acids Res 29:41–43.
39. Szklarczyk D, Franceschini A, Kuhn M, Simonovic M, Roth A, Minguez P, Doerks T, Stark M, Muller J, Bork P, Jensen LJ, von Mering C. 2011. The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res 39:D561–D568.
40. Overbeek R, Begley T, Butler RM, Choudhuri JV, Chuang HY, Cohoon M, de Crécy-Lagard V, Diaz N, Disz T, Edwards R, Fonstein M, Frank ED, Gerdes S, Glass EM, Goesmann A, Hanson A, Iwata-Reuyl D, Jensen R, Jamshidi N, Krause L, Kubal M, Larsen N, Linke B, McHardy AC, Meyer F, Neuweger H, Olsen G, Olson R, Osterman A, Portnoy V, Pusch GD, Rodionov DA, Rückert C, Steiner J, Stevens R, Thiele I, Vassieva O, Ye Y, Zagnitko O, Vonstein V. 2005. The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res 33:5691–5702.
41. Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M. 1999. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res 27:29–34.
42. Mende DR, Waller AS, Sunagawa S, Järvelin AI, Chan MM, Arumugam M, Raes J, Bork P. 2012. Assessment of metagenomic assembly using simulated next generation sequencing data. PLoS One 7:e31386.
43. Prakash T, Taylor TD. 2012. Functional assignment of metagenomic data: challenges and applications. Brief Bioinform 13:711–727.
44. Huson DH, Mitra S, Weber N, Ruscheweyh H, Schuster SC. 2011. Integrative analysis of environmental sequences using MEGAN4. Genome Res 21:1552–1560.
45. Pinney JW, Shirley MW, McConkey GA, Westhead DR. 2005. metaSHARK: software for automated metabolic network prediction from DNA sequence and its application to the genomes of Plasmodium falciparum and Eimeria tenella. Nucleic Acids Res 33:1399–1409.
46. Abubucker S, Segata N, Goll J, Schubert AM, Izard J, Cantarel BL, Rodriguez-Mueller B, Zucker J, Thiagarajan M, Henrissat B, White O, Kelley ST, Methé B, Schloss PD, Gevers D, Mitreva M, Huttenhower C. 2012. Metabolic reconstruction for metagenomic data and its application to the human microbiome. PLoS Comput Biol 8:e1002358.
47. Goodall DW. 1966. A new similarity index based on probability. Biometrics 22:882–907.
48. Edgar RC. 2010. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26:2460–2461.
49. Hess M, Sczyrba A, Egan R, Kim TW, Chokhawala H, Schroth G, Luo S, Clark DS, Chen F, Zhang T, Mackie RI, Pennacchio LA, Tringe SG, Visel A, Woyke T, Wang Z, Rubin EM. 2011. Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. Science 331:463–467.
50. Williamson SJ, Rusch DB, Yooseph S, Halpern AL, Heidelberg KB, Glass JI, Andrews-Pfannkoch C, Fadrosh D, Miller CS, Sutton G, Frazier M, Venter JC. 2008. The Sorcerer II Global Ocean Sampling Expedition: metagenomic characterization of viruses within aquatic microbial samples. PLoS One 3:e1456.
51. Prosser JI. 2010. Replicate or lie. Environ Microbiol 12:1806–1810.
52. Gilbert JA, Field D, Swift P, Thomas S, Cummings D, Temperton B, Weynberg K, Huse S, Hughes M, Joint I, Somerfield PJ, Mühling M. 2010. The taxonomic and functional diversity of microbes at a temperate coastal site: a “multi-omic” study of seasonal and diel temporal variation. PLoS One 5:e15545.
53. Ni J, Yan Q, Yu Y. 2013. How much metagenomic sequencing is enough to achieve a given goal? Sci Rep 3:1968.
54. Glass EM, Dribinsky Y, Yilmaz P, Levin H, Van Pelt R, Wendel D, Wilke A, Eisen JA, Huse S, Shipanova A, Sogin M, Stajich J, Knight R, Meyer F, Schriml LM. 2014. MIxS-BE: a MIxS extension defining a minimum information standard for sequence data from the built environment. ISME J 8:1–3.


TABLE. Computational resources for community metabolism reconstruction

