
Category: Microbial Genetics and Molecular Biology
Comparing Microbial Genomes: How the Gene Set Determines the Lifestyle
Abstract:
Sequencing of complete microbial genomes, pioneered in 1995 by J. C. Venter and colleagues, continues at an ever-increasing pace. The availability of complete genome sequences has had a major impact on our view of microbial evolution. Comparative analysis of the complete genomes of several diverse microorganisms, based on properties such as codon usage, open reading frame (ORF) density, and coding-region length, reveals many common trends in genome organization. Comparison of the complete genomes of two closely related bacterial species, Mycoplasma genitalium and Mycoplasma pneumoniae, showed a significant degree of synteny (conservation of gene order) between them. Homologous genes and their products can be classified into orthologs, which are related by vertical descent (e.g., speciation), and paralogs, which are related by duplication. Pathogenic bacteria import a variety of metabolites from their hosts, which allows them to shed genes encoding enzymes of some metabolic pathways. Comparative analysis of the available microbial genomes reveals both conservation of protein families and diversity of gene repertoires and gene organization among organisms from diverse phylogenetic lineages. In many cases, bacterial genes appear to have replaced the original genes of the archaeal-eukaryotic lineage, which makes phylogenetic reconstruction extremely complicated.
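The ortholog/paralog distinction can be illustrated with a reciprocal best-hit (RBH) test, a widely used heuristic that is simpler than the procedure used to build the COGs. The sketch below assumes precomputed similarity scores (for example, BLAST bit scores) stored in hypothetical dictionaries keyed by (query, subject) gene pairs; the function names and toy genes are illustrative only.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    scores: dict mapping (query, subject) -> similarity score, e.g. a BLAST
    bit score (hypothetical input format). Ties keep the first pair seen.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs in which each gene is the other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy scores: a2 and a3 stand for a duplicated pair in genome A.
a_vs_b = {("a1", "b1"): 250, ("a1", "b2"): 40, ("a2", "b2"): 300, ("a3", "b2"): 180}
b_vs_a = {("b1", "a1"): 250, ("b2", "a1"): 40, ("b2", "a2"): 300, ("b2", "a3"): 180}

print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1'), ('a2', 'b2')]
```

RBH reports only one-to-one pairs, so the duplicate a3 is left unassigned; whether it is best treated as a paralog or as a co-ortholog of b2 has to be decided by further analysis.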
Lengths of proteins encoded in completely sequenced genomes. The lengths of predicted ORFs were averaged over 50-amino-acid intervals and plotted as fractions of the total number of ORFs predicted in each particular organism. The plots were normalized so that the total area under each curve is the same. Data for B. burgdorferi, Synechocystis sp., and S. cerevisiae were added to the plots of length distributions in proteobacteria (A), low-G+C gram-positive bacteria (B), and archaea (C), respectively, for illustrative purposes. Hinf, H. influenzae; Hpyl, H. pylori; Bbur, B. burgdorferi; Mgen, M. genitalium; Mpne, M. pneumoniae; Bsub, B. subtilis; Syne, Synechocystis sp.; Aful, A. fulgidus; Mjan, M. jannaschii; Mthe, M. thermoautotrophicum.
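A distribution like the one in this figure can be approximated by grouping ORF lengths into 50-residue intervals and dividing each interval count by the number of ORFs in the genome. This is a minimal sketch that assumes lengths in amino acids as input; because the fractions sum to 1 and all intervals share the same width, every genome's curve encloses the same area, which is one way to obtain the equal-area normalization described in the legend.

```python
from collections import Counter


def length_distribution(orf_lengths, bin_width=50):
    """Fraction of ORFs in each length interval of bin_width residues.

    orf_lengths: predicted protein lengths in amino acids (assumed input).
    Returns {interval_start: fraction}. The fractions sum to 1, so with equal
    bin widths each genome's curve encloses the same area (bin_width * 1).
    """
    if not orf_lengths:
        return {}
    counts = Counter((length // bin_width) * bin_width for length in orf_lengths)
    total = len(orf_lengths)
    return {start: counts[start] / total for start in sorted(counts)}


# Toy example: five hypothetical ORF lengths.
print(length_distribution([30, 120, 140, 160, 420]))
# {0: 0.2, 100: 0.4, 150: 0.2, 400: 0.2}
```

Empty intervals are omitted from the returned dictionary; a plotting step would fill them with zeros.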
Functional roles of conserved proteins in completely sequenced genomes. Each pie chart shows, for one genome, the number of proteins that are not included in the current set of COGs (the largest sector in every genome) and then, clockwise, the numbers of proteins involved in (i) information storage and processing, including transcription, translation, ribosome biogenesis, and DNA replication, recombination, and repair; (ii) cellular processes, such as membrane and cell wall biogenesis, protein folding, and secretion; and (iii) cellular metabolism, including carbohydrate, amino acid, lipid, and nucleotide metabolism and energy production and conversion. The last sector represents poorly characterized conserved proteins. The sector totals can exceed the number of proteins in a genome because different domains of the same protein may be assigned to different COGs.
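The tally behind each pie chart, including why the sector totals can exceed the protein count, can be sketched as follows. The input dictionaries (protein-to-COG assignments and COG-to-category labels) and the placeholder COG identifiers cogA to cogC are hypothetical.

```python
from collections import Counter


def functional_breakdown(protein_cogs, cog_category):
    """Count proteins per functional category, as in the pie charts.

    protein_cogs: dict protein id -> list of COG ids (empty if unassigned).
    cog_category: dict COG id -> functional category label.
    A multidomain protein assigned to several COGs is counted once per COG,
    so the category totals can exceed the number of proteins.
    """
    tally = Counter()
    for cogs in protein_cogs.values():
        if not cogs:
            tally["not in COGs"] += 1
        for cog in cogs:
            tally[cog_category[cog]] += 1
    return tally


# Hypothetical genome: p2 has two domains that fall into different COGs.
proteins = {"p1": ["cogA"], "p2": ["cogB", "cogC"], "p3": []}
categories = {
    "cogA": "information storage and processing",
    "cogB": "metabolism",
    "cogC": "cellular processes",
}
tally = functional_breakdown(proteins, categories)
print(sum(tally.values()), "assignments for", len(proteins), "proteins")  # 4 assignments for 3 proteins
```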
Conservation of the pyrimidine biosynthesis pathway in various microorganisms. The enzymes are listed under E. coli gene names. The COG numbers are from the COG database (www.ncbi.nlm.nih.gov/COG [67]). The designations of species in the phylogenetic patterns are as follows: E, E. coli; H, H. influenzae; U, H. pylori; X, Rickettsia prowazekii; B, B. subtilis; G, M. genitalium; P, M. pneumoniae; R, Mycobacterium tuberculosis; O, B. burgdorferi; L, T. pallidum; I, C. trachomatis; Q, A. aeolicus; C, Synechocystis sp.; M, M. jannaschii; T, M. thermoautotrophicum; A, A. fulgidus; K, P. horikoshii; Y, S. cerevisiae. Uppercase letters indicate organisms that have proteins in the corresponding COG; lowercase letters indicate organisms that are not represented in that COG. Nonorthologous enzymes catalyzing the same biochemical reaction are shown side by side, where known. Different subunits or conserved domains of the same enzyme are shown in the same frame, one under the other.
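The phylogenetic-pattern notation in this legend is mechanical enough to generate programmatically. The sketch assumes that a pattern is written as the 18 species codes in the order listed above, one letter per genome; that ordering, and the helper names, are assumptions made for illustration.

```python
# Species codes in the order listed in the legend (assumed pattern order).
SPECIES_ORDER = "EHUXBGPROLIQCMTAKY"


def phylogenetic_pattern(present, order=SPECIES_ORDER):
    """Build a pattern: uppercase if a species has a protein in the COG, else lowercase."""
    unknown = set(present) - set(order)
    if unknown:
        raise ValueError(f"unknown species codes: {sorted(unknown)}")
    return "".join(code if code in present else code.lower() for code in order)


def species_in(pattern):
    """Recover the set of species represented in a COG from its pattern."""
    return {code for code in pattern if code.isupper()}


# A COG with members only in E. coli, H. influenzae, B. subtilis, and S. cerevisiae.
pattern = phylogenetic_pattern({"E", "H", "B", "Y"})
print(pattern)  # EHuxBgproliqcmtakY
```

Comparing such patterns across the enzymes of a pathway makes gaps easy to spot: a lowercase letter in every COG for one step suggests either a missing step or a nonorthologous replacement.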
Conservation of the purine biosynthesis pathway in various microorganisms. For an explanation of the symbols, see the legend to Fig. 3.
Universally conserved gene strings (operons) in bacteria and archaea
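Conserved gene strings can be approximated, crudely, by finding pairs of orthologous groups that are adjacent in every genome examined. The sketch below assumes each genome is given as a list of ortholog-group identifiers in chromosomal order (the identifiers og1, og2, ... are hypothetical); it ignores strand, chromosome circularity, and intervening genes, all of which a real analysis would need to consider.

```python
def adjacent_pairs(gene_order):
    """Unordered pairs of ortholog groups that are neighbors in one genome.

    Pairs are stored as sorted tuples so that a string read in the opposite
    direction (e.g., on the other strand) still matches.
    """
    return {tuple(sorted(pair)) for pair in zip(gene_order, gene_order[1:])}


def conserved_pairs(genomes):
    """Neighbor pairs found in every genome: a crude proxy for conserved gene strings."""
    pair_sets = [adjacent_pairs(order) for order in genomes.values()]
    return set.intersection(*pair_sets) if pair_sets else set()


# Hypothetical gene orders (ortholog-group ids) for three genomes.
genomes = {
    "genome1": ["og1", "og2", "og3", "og4"],
    "genome2": ["og3", "og2", "og1", "og5"],
    "genome3": ["og6", "og1", "og2", "og3"],
}
print(sorted(conserved_pairs(genomes)))  # [('og1', 'og2'), ('og2', 'og3')]
```

Chaining the surviving pairs (og1-og2 and og2-og3) yields the string og1-og2-og3, which is conserved in all three toy genomes even though genome2 carries it in reverse order.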
Conservation of ATP synthase operons in microbial genomes