
Category: Microbial Genetics and Molecular Biology
This book examines the importance of gene organization in genome function. Organized hierarchically, it addresses four major areas: description, forces that shape the genome, the genome's influence on gene expression, and future directions. Chapters within each section address more focused topics and introductory and summary chapters guide the reader toward a broader perspective of viewing the genome as an integrated system.
Electronic Only, 394 pages, illustrations, index.
This chapter gives an overview of the strategies developed to sequence entire microbial genomes, and discusses the advantages and disadvantages of various approaches. For total-genome shotgun sequencing, the genomic DNA is fragmented into random pieces and subcloned directly into pUC, M13, or other vectors that accept insert sizes of 1 to 5 kbp. Typically, 6 to 10 genome equivalents are sequenced to cover the DNA molecule completely by using standard primers that prime at the end of the cloning vector. The primer-walking strategy has been tried primarily in the context of the yeast sequencing project. The method requires an ordered library of clones, either an overlapping set of large clones (e.g., a cosmid library) or an ordered set of discrete subclones (e.g., two 6-base cutter restriction digest libraries from a cosmid). Regardless of the sequencing strategy chosen in a particular project, there are four general phases of the sequencing process: the primary sequencing phase, the linking phase, the polishing phase, and the finished sequence. Only one genome project, the Escherichia coli effort at the University of Wisconsin, made substantial progress with radioactive sequencing before changing to automated-sequencing strategies. There are two different kinds of sequencing laboratories that produce genomic sequence: sequencing factories and smaller laboratories with an output of 2 to 5 Mbp of genomic sequence per year. With increasing levels of automation, the sequence production costs will be reduced, and in the future it may be possible to reach 10 cents per finished base pair.
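The figure of 6 to 10 genome equivalents can be made concrete with a rough calculation. The sketch below is illustrative only and is not taken from the chapter: it assumes reads land uniformly at random, so that the Poisson approximation (fraction of bases covered at least once is about 1 - e^(-c) at coverage c) applies, and it assumes a read length of 500 bases. The function names are hypothetical.

```python
import math


def expected_coverage_fraction(coverage: float) -> float:
    """Expected fraction of bases covered at least once.

    Assumes reads are placed uniformly at random (Poisson approximation),
    which real shotgun libraries only approximate.
    """
    return 1.0 - math.exp(-coverage)


def reads_needed(genome_bp: int, read_bp: int, coverage: float) -> int:
    """Number of reads required to reach a given average coverage."""
    return math.ceil(coverage * genome_bp / read_bp)


if __name__ == "__main__":
    genome_bp = 4_639_221  # E. coli genome size, used here only as an example
    read_bp = 500          # assumed read length, not stated in the source
    for c in (1, 3, 6, 8, 10):
        frac = expected_coverage_fraction(c)
        n = reads_needed(genome_bp, read_bp, c)
        print(f"coverage {c:>2}x: {frac:.5f} of bases covered, {n} reads")
```

Under these assumptions the uncovered fraction falls off exponentially with coverage, which is one way to see why several genome equivalents are sequenced before the linking and polishing phases begin.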
Every bacterial protein-coding gene resides inside an open reading frame (ORF), but far from every ORF observed in bacterial DNA sequence hosts a gene. Nevertheless, ORF detection is a reasonable start for gene hunting. The average length of protein-coding ORFs stayed close to 900 nt. Such an observation suggests that to identify a "long" gene-hosting ORF should be a rather simple matter. A Markov model theory provided a natural basis for mathematical treatment of DNA sequence. Ordinary Markov models have been used since the earliest studies of DNA sequences. Later, with a larger amount of sequence data available, three-periodic inhomogeneous Markov models were proven to be more informative and more useful for protein-coding-sequence modeling and recognition. The performance of the GeneMark.hmm program was tested with several control sets, including 10 complete bacterial genomes. The complete genomic sequence of Escherichia coli consists of 4,639,221 nt, with 4,288 genes annotated. The chapter talks about higher-order models and models of typical and atypical genes. Genes predicted as atypical are likely to be horizontally transferred genes, a category of special interest for evolutionary studies, as well as for studies of pathogenic bacteria whose pathogenicity islands or antibiotic-resistance genes could be relatively recent additions to the whole genome.
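ORF detection as a first step in gene hunting can be sketched in a few lines. The code below is a minimal illustration, not the GeneMark or GeneMark.hmm method: it assumes ATG as the only start codon and the three standard stop codons, scans each of the three reading frames on one strand, and reports ORFs above a length cutoff. The function names and the default cutoff are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"  # assumption: alternative starts such as GTG are ignored


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq: str, min_len: int = 90) -> list[tuple[int, int, int]]:
    """Find ORFs on the given strand.

    Returns (start, end, frame) tuples with 0-based start, exclusive end
    (the stop codon is included), keeping ORFs of at least min_len nt.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3, frame))
                start = None  # resume the search after the stop codon
    return orfs


if __name__ == "__main__":
    toy = "CC" + "ATG" + "AAA" * 3 + "TAA" + "GG"
    print(find_orfs(toy, min_len=9))
    print(find_orfs(reverse_complement(toy), min_len=9))
```

Scanning the reverse complement as well covers both strands. Deciding which of the detected ORFs actually host genes is the harder problem that the Markov models described above address.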
This chapter summarizes the recent findings of bacterial genomics and comments on the themes and trends which are emerging. A variety of techniques and methods are available to construct physical maps, and those most commonly employed involve pulsed field gel electrophoresis (PFGE) of macrorestriction fragments generated by digesting intact genomic DNA, immobilized in agarose plugs, with rare-cutting enzymes. Hybridization techniques are often used to construct a map and to deduce the positions of genetic markers. In recent years significant effort has been devoted to developing direct-mapping techniques for large DNA molecules that do not require gel electrophoresis. Among the more promising of these are two new methods known as DNA combing and optical mapping, both of which make use of fluorescence microscopy and image analysis to visualize single DNA molecules. Overall, bacterial genomes range in size from about 0.6 to 9.4 Mb. In a recent review, it was suggested that there may be a relationship between the genome sizes and the lifestyles of bacteria. The genome size of Bacillus cereus strains varies from 5.5 to 6.3 Mb, and great diversity is seen in the number and organization of the chromosomes. Higher levels of gene expression might result from altered topological constraints acting on the DNA or from a more favorable location with respect to the origin of replication, leading to an increase in gene dosage.
Following a review of archaeal genomics, the author wishes to scrutinize the convenient though perhaps misleading construct that is organismic phylogeny. In so doing, the author will address theories of the origin of eukaryotes, theories which look to Archaea for answers, thanks to the (largely) accepted rooting of the tree of life in the bacterial branch. Interest in archaeal genomes, as judged by volume in the literature, has focused mainly on the extremely halophilic archaea. The genetic instability appears to be prevalent not only among members of the family Halobacteriaceae but also among members of the order Sulfolobales, though the insertion sequences responsible are apparently unrelated. Physical analysis of archaeal nucleoids lags behind the efforts of genetic characterization. Rooting of the tree, using anciently duplicated paralogous sequences, further divided Archaea from Bacteria by positioning the root in the bacterial branch. The biological species will tend to restrict lateral transfers to specific groups. However, movement of genes from more distantly related organisms is not precluded, even between Bacteria and Eucarya. Specific genomes have subsets of these collections and typically possess a number of open reading frames not found anywhere else. Sequence evolution is responsible for the unmatched open reading frames, having erased the evidence of their homologies. Confounding things further are lateral genetic transfer and gene replacement, especially from extinct lineages. Comparative genomic analyses have begun, and they will undoubtedly transform the method of molecular evolutionary study.
This chapter begins by comparing the organization of DNA in bacteria and eukaryotes. The possibility that similar mechanisms are utilized by both eukaryotes and prokaryotes to separate or disentangle sister chromatids or daughter strands is outlined. It is suggested that transcriptional activity and resulting RNA-protein (hnRNP) particles may play a role in DNA strand separation in eukaryotic cells, just as coupled transcription-translation does in prokaryotes. Two characteristics of a bacterial cell, such as Escherichia coli, may contribute to the difficulty in understanding its mechanism of genome segregation: first, the occurrence of DNA synthesis throughout the whole cell cycle during rapid growth, and second, the lack of a unique centromere sequence, which is characteristic of the eukaryotic chromosome. In vitro studies have shown that phenomena like phase separation, monomolecular collapse, and intermolecular aggregation of the DNA into a condensed state can be induced by high concentrations of proteins (macromolecular crowded solution) and ions. Visualization of RNA transcription and DNA replication has shown that these two processes occur in hundreds of different domains scattered throughout the S-phase nucleus. The chapter discusses possible mechanisms used in bacteria for movement of DNA and for nucleoid segregation. It also focuses on separation of daughter strands in bacteria, and separation of sister chromatids in eukaryotes. The genomic organization may help in chromosome segregation, the expression of genes fulfilling a role in the fundamental process of daughter strand separation.
Sequencing of complete microbial genomes, pioneered in 1995 by J. C. Venter and colleagues, continues at an ever-increasing pace. The availability of complete genome sequences has had a major impact on the view of microbial evolution. Comparative analysis of the complete genomes of several diverse microorganisms on the basis of such properties as codon usage, open reading frame (ORF) density, and the lengths of coding regions shows many common trends in their organization. Comparison of the complete genomes of closely related bacterial species, Mycoplasma genitalium and Mycoplasma pneumoniae, showed a significant degree of synteny between these organisms. Homologous genes and their products can be classified into orthologs, related by vertical descent (e.g., speciation), and paralogs, related by duplication. Pathogenic bacteria import a variety of metabolites from their hosts, which allows them to shed genes encoding enzymes for some of the metabolic pathways. Comparative analysis of the available microbial genomes reveals both conservation of protein families and diversity of gene repertoires and gene organization among organisms that belong to diverse phylogenetic lineages. In many cases, bacterial genes seem to have substituted for the original genes of the archaeal-eukaryotic lineage, making phylogenetic reconstructions extremely complicated.
This chapter talks about bacterial genomes, primarily those of Escherichia coli and Salmonella typhimurium (proper name, Salmonella enterica serovar Typhimurium). For these bacteria, abundant information is available on evolutionary relationships, high-quality genetic maps exist (E. coli is completely sequenced), and there is extensive knowledge of mechanisms of recombination. Comparing the genomes of E. coli and S. typhimurium is therefore a natural starting point for discussing the forces which determine genome organization and stability in general. The degree to which ectopic exchanges between directly repeated sequences are RecA dependent varies with size and distance. Large chromosomal duplications are genetically unstable but are stabilized by recA mutations, implicating homologous recombination in their formation and loss. Genes expressed at high levels are generally located in the origin-proximal half of the chromosome, presumably because closeness to the origin of DNA replication gives a gene dosage effect. Tandem duplications, and their associated deletions and translocations, create novel sequence join points which potentially have selective value for the cell. Recombination between directly oriented repeat sequences can create a DNA fragment (linear or circular, depending on the mechanism of recombination) which can recombine with the chromosome. Inversions can occur between homologous short and long sequences. Most inversions isolated in E. coli and S. typhimurium are reported as having no significant effects on growth rate, but the few translocations made and tested in E. coli are associated with decreases in growth rate of up to a few percent.
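The gene dosage effect of origin proximity can be illustrated numerically. The sketch below is not from the chapter: it uses the standard Cooper-Helmstetter style relation for an exponentially growing culture, in which a gene at fractional position x between origin (0) and terminus (1) is present at 2^(C(1 - x)/tau) copies relative to a terminus-proximal gene, where C is the replication time and tau the doubling time. The default C of 40 minutes and the function name are assumptions chosen for illustration.

```python
def relative_gene_dosage(position: float,
                         c_period_min: float = 40.0,
                         doubling_time_min: float = 40.0) -> float:
    """Average copy number of a gene relative to a terminus-proximal gene.

    position: 0.0 at the replication origin, 1.0 at the terminus.
    c_period_min: time to replicate the chromosome (assumed value).
    doubling_time_min: culture doubling time.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie between 0 (origin) and 1 (terminus)")
    return 2.0 ** (c_period_min * (1.0 - position) / doubling_time_min)


if __name__ == "__main__":
    for tau in (20.0, 40.0, 80.0):
        ratio = relative_gene_dosage(0.0, doubling_time_min=tau)
        print(f"doubling time {tau:>4.0f} min: origin/terminus dosage = {ratio:.2f}")
```

Under these assumptions the origin-to-terminus dosage ratio grows as the doubling time shortens, so the advantage of an origin-proximal location is largest during rapid growth.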
Illegitimate recombination is a ubiquitous phenomenon and includes three types of events. In the first class, rearrangements occur by recombination between short homologous sequences. A second class is associated with site-specific elements. A last class groups all rearrangements in which the newly linked sequences share less than 3 bp of homology and have no homology with known specific sites. Oxidative lesions are known to induce rearrangements. Deletions by illegitimate recombination between short homologous sequences were reported in Escherichia coli fur mutants, in which a defect in iron metabolism regulation results in increased oxidative damage. In bacteria, transcription inhibits deletion between tandemly repeated sequences 10-fold. In contrast, transcription was shown to increase recombination between nonhomologous sequence. The stimulation of transposon excision by rolling-circle replication adds to the long list of indirect evidence that supports the occurrence of the replication slippage events in vivo. Most of the genetic studies of illegitimate recombination were performed in E. coli, either on the chromosome or with bacteriophages. Most of the recombination events between short homologous sequences occur independently from the action of RecA, since the length considered is far below the E. coli minimum efficient processing segment (MEPS). Topoisomerases are enzymes that modify the supercoiling of molecules through transient breakage and ligation of DNA strands. The first evidence that topoisomerases may promote rearrangements in bacteria came from the work of Ikeda and collaborators. Illegitimate recombination is a major issue in eukaryotes, because it is at the origin of numerous pathological disorders.
Transposable elements (TEs) have been defined as DNA sequences able to insert at many sites in the genome. At present there are about 500 TEs identified in many different bacterial species. Most insertion sequence (IS) elements and transposons were discovered after their transposition into genes of interest. In bacteria the relative juxtapositions of genes are not necessarily important because most will be involved in producing trans-acting factors, which are able to fulfill their function irrespective of their location or arrangement in the genome. Nonreplicative (or "cut-and-paste") transposons are excised from the donor site by double-strand breaks and inserted at the target site without the duplication of the transposon sequences (e.g., IS10 and IS50). Some bacteriophages are also considered to group with the replicative transposons because they use transposition to replicate during the lytic phase of their life cycles. It should be noted that the transposons in the portable-homology rearrangements are no longer flanked by the same direct repeats of target sequences as they were before the rearrangements. Usually composite transposons have their IS elements in the inverted-repeat configuration so that homologous recombination only causes the inversion of the markers within the transposon. Conjugative transposons harbor within the same sequence the cellular (transposition) and the intercellular (conjugation) mobilities. In the few experiments where selection has been maintained for many generations, IS elements have been found to have a major effect on the genetic structure of the population.
This chapter is an interdisciplinary attempt to bring together physical and biological perspectives on the organization of DNA within the bacterial cell. In living bacteria the nucleoid can be observed by phase-contrast microscopy when the cells are immersed in a medium with a high refractive index, such as aqueous gelatin or polyvinylpyrrolidone, adjusted to give isoosmotic conditions. The negative superhelical tension in the chromosome is maintained through the combined action of topoisomerases I and IV and DNA gyrase. The latter enzyme actively supercoils the DNA at the expense of free energy of ATP hydrolysis. Anchoring of DNA to a fixed structure like the plasma membrane could occur if envelope proteins are being transcriptionally and cotranslationally inserted into the membrane. A DNA model that has been of considerable use in polymer theory is that of the wormlike chain. Chemical details, including the base pair sequence, are smoothed out completely. Linear, double-stranded DNA in solution is viewed as a homogeneous elastic rod undulating in a heat bath at temperature T. Solvent molecules are continually buffeting the rod, which adopts wormlike configurations. The majority of cytoplasmic particles consist of small globular proteins, and it is these that we need to account for within an approximate picture based on statistical physics.
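The wormlike chain picture lends itself to a short worked example. The sketch below is illustrative and not taken from the chapter: it evaluates the standard wormlike-chain result for the mean-square end-to-end distance, <R^2> = 2PL[1 - (P/L)(1 - e^(-L/P))], where L is the contour length and P the persistence length. The values of about 50 nm for P and 0.34 nm rise per base pair are commonly quoted figures used here as assumptions, and the function name is hypothetical.

```python
import math

NM_PER_BP = 0.34  # assumed rise per base pair for B-form DNA


def wlc_mean_square_end_to_end(contour_nm: float,
                               persistence_nm: float = 50.0) -> float:
    """Mean-square end-to-end distance (nm^2) of an ideal wormlike chain.

    Ignores excluded volume, supercoiling, and confinement, all of which
    matter for DNA inside a cell.
    """
    ratio = contour_nm / persistence_nm
    return 2.0 * persistence_nm * contour_nm * (1.0 - (1.0 - math.exp(-ratio)) / ratio)


if __name__ == "__main__":
    for n_bp in (150, 5_000, 4_639_221):
        contour = n_bp * NM_PER_BP
        rms = math.sqrt(wlc_mean_square_end_to_end(contour))
        print(f"{n_bp:>9} bp: contour {contour:12.1f} nm, rms end-to-end {rms:10.1f} nm")
```

For chains much shorter than P the result approaches the rod limit L^2, and for chains much longer than P it approaches the random-coil limit 2PL. Even the coil size of a free, genome-length molecule is much larger than a bacterial cell, which is why crowding and supercoiling enter the picture.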
Gyrase was the first type II enzyme discovered, and it remains unique for its ability to introduce negative supercoils into relaxed, positively or negatively supercoiled DNA at the expense of ATP binding and hydrolysis. An essential enzyme in bacteria, gyrase is critical for nearly all complex transactions that involve DNA, including recombination, replication, transcription, and chromosome segregation. The homeostatic supercoil regulation model was inspired by two observations. First, many promoters sense supercoiling levels. Second, expression of gyrase increases when the chromosome becomes relaxed, whereas the expression of topo I requires high supercoiling levels. Chromosome replication has three critical stages (initiation, elongation, and segregation), each with different supercoiling problems. For initiation, the two strands of oriC must separate to allow assembly of the replication machinery (the replisome). Transcription encompasses supercoiling problems similar in two respects to those of replication. First, transcription initiation requires unpairing of the DNA duplex, and supercoiling can influence this step as it does in initiation of DNA replication. Second, transcription is similar to replication in that movement of RNA polymerase generates temporary positive supercoiling ahead of, and negative supercoiling behind, the DNA segment that is being transcribed. The torsional effects of transcription have been studied with supercoil-sensitive promoters by measuring the formation of Z-DNA and by monitoring the extrusion of cruciforms. Homologous and site-specific genetic recombination, adaptive mutation, "supercoil regulated" gene transcription, gene order on chromosomes, and plasmid-chromosome replication segregation are all phenomena that are likely to be influenced by DNA dynamics.
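The supercoiling levels discussed here are usually expressed as a superhelical density. As a minimal sketch, assuming the standard definitions (relaxed linking number Lk0 = N / 10.5 for N base pairs of B-form DNA, and sigma = (Lk - Lk0) / Lk0), the calculation looks like this. The function names and the 4,200-bp example plasmid are hypothetical.

```python
def relaxed_linking_number(n_bp: int, bp_per_turn: float = 10.5) -> float:
    """Linking number of relaxed closed-circular DNA (Lk0)."""
    return n_bp / bp_per_turn


def superhelical_density(lk: float, n_bp: int, bp_per_turn: float = 10.5) -> float:
    """Superhelical density sigma = (Lk - Lk0) / Lk0.

    Negative values mean the DNA is underwound (negatively supercoiled).
    """
    lk0 = relaxed_linking_number(n_bp, bp_per_turn)
    return (lk - lk0) / lk0


if __name__ == "__main__":
    n_bp = 4200                      # hypothetical plasmid size
    lk0 = relaxed_linking_number(n_bp)
    lk = lk0 - 24                    # 24 negative supercoils, chosen as an example
    print(f"Lk0 = {lk0:.0f}, Lk = {lk:.0f}, sigma = {superhelical_density(lk, n_bp):+.3f}")
```

Each gyrase reaction cycle changes the linking number by two, so in this picture sigma moves in small, discrete steps on a plasmid of this size.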
This chapter discusses the influence a gene's neighbors can have on that gene's expression and looks at the mechanisms by which this can occur. Both the packaging of DNA and its metabolic functioning are greatly facilitated by supercoiling. Supercoiling imparts torsional stress to DNA, which influences its interaction with RNA polymerase and other DNA-binding proteins as well as contributing to its compaction. The importance of supercoiling to cellular functioning is illustrated by the tight control prokaryotes maintain over this property of their genomes. The greatest difference between small plasmids and larger molecules may be that plasmids can rotate the entire molecule around its long axis during transcription, relieving local changes in supercoiling levels. One additional finding that further supports the twin-supercoil domain model is that the length of the transcript located upstream from the leu-500 promoter affects the level of supercoiling changes. The large difference in transcription rates between mutant and wild-type strains may here be obscuring the fact that doubling the output of a gene can have significant consequences for a cell, either good or bad, depending on the situation. 4,5',8-trimethylpsoralen (TMP) is a sensitive indicator of supercoiling, and it is claimed to be able to detect changes in supercoil levels of 15% and 12%. Changes of this magnitude are regularly experienced by the genome of Escherichia coli.
This chapter evaluates the present knowledge about the degree of stability of the genome of enteric bacteria. It discusses the forces which have contributed to maintaining stability, and considers the types of rearrangements which can and do occur even in those genomes generally considered to be stable. With the use of methods of physical analysis of DNA, and especially the introduction of pulsed-field gel electrophoresis (PFGE), used first in Escherichia coli by Smith and colleagues, the genomes of many strains were determined and conservation of gene order was shown to be the rule. Two modifications of PFGE methods, which allow the determination of genome structure in many strains, have been used in representative enteric bacteria. The chapter focuses on the forces which would be expected to act in a conservative way to maintain gene order. It is possible that transspecies recombination due to conjugation or transduction followed by homologous recombination, though very rare, is so important that colinearity is an important advantage. The author concludes that the overall conservation of gene order within the enteric bacteria, which was reported many years ago in comparisons of E. coli and Salmonella typhimurium, has been confirmed by the analysis of the physical maps of many strains of Salmonella and other enteric bacteria determined by PFGE and by the comparison of nucleotide sequences.
In this chapter, large-scale DNA rearrangements, including deletions, amplifications, and other DNA alterations such as interchromosomal interactions, are dealt with and special attention is given to the instability of Streptomyces species. Streptomyces species belong to the order Actinomycetales and are filamentous gram-positive bacteria living in the soil. They possess a complex life cycle that begins on solid medium by the germination of spores to form the vegetative mycelium. The phenotypic instability is closely associated with genomic rearrangements, such as large deletions and intense tandem DNA amplifications. The linear structure of the chromosomal DNA raises questions about the replication mechanisms, the unstable region corresponding to natural termini of chromosomal replication. An interesting characteristic of genetic instability is that it is inducible. Hypotheses about the possible origin of this instability are based on reports of studies where the level of instability has been altered by a variety of treatments. The spontaneous frequencies of instability can be increased by treatments as varied as exposure to UV light, culture in the presence of intercalating agents, cold storage, temperature shifts during culture, nutritional shifts, and the regeneration of protoplasts. Homologous recombination is involved in numerous cases of chromosome rearrangement in bacteria. Genes may be directly identified by the phenotype accompanying their deletion or amplification. The exchanges of the terminal regions could be due to the structure which is suspected to keep the DNA ends together in vivo.
This chapter describes methods for assessing the frequency of successful gene acquisition and the fraction of modern genomes that has been acquired by horizontal transfer of useful phenotypic information. Although both loss and acquisition strongly influence genome evolution, the authors suggest that the two processes are synergistic due to the limits on genome expansion. To assess the contribution of genomic flux to genome evolution and speciation, one must measure rates of gene loss and acquisition. All novel phenotypes were conferred by horizontally transferred genes. Therefore, genes for the diverse metabolic pathways have formed slowly at earlier times and not during the course of competitive invasion of novel ecological niches. Regardless of the relative rates of these two processes, it is clear that gene loss and acquisition have facilitated exploration of novel environments and allowed more rapid divergence of bacterial types in competitive situations. The authors propose that the prevalence of gene clusters stands as evidence that genomic flux has historically been a primary contributor to genome evolution. They have outlined a model for the evolution of bacterial genomes through the synergistic processes of gene acquisition and gene loss. The organization of genes into operons reflects the important role in bacterial evolution and speciation played by genomic flux—the development of bacterial genomes by gene loss and gene acquisition.
Recombination includes both the rearrangement of the genetic material in an individual genome and gene transfer, the incorporation of exogenous genetic material into an individual genome. While rearrangement and gene transfer share some molecular processes, their functions are different in most basic ways. Nevertheless, the rarity of phylogenetically distant gene transfer is occasionally compensated for by an increased likelihood of its retention. In bacteria there are three major categories of gene transfer mechanisms: conjugation, transduction, and transformation. All operate in Escherichia coli. The organization of the E. coli chromosome can be seen in three perspectives: the arrangement of genes and basic chromosomal functions in an individual genome, the genetic and structural variation among strains of the species, and the dynamics of DNA exchange. The replacement's ancestry and therefore phylogenetic relationships will be different from those of its unreplaced neighbor. So gene transfer would result in new local phylogenies. In the hypervariable regions, the diversifying selection dominates the local scene: only the most recently separated lines remain identical. This diversification must be due to the new rfb complexes' relatively high frequency of occurrence and retention. The possibility of major recent changes in the rates of intraspecific gene transfer in E. coli seems contradicted by the presence of clonal segments.
The delineation of groups of genes and proteins that trace back to common ancestors derives legitimacy and depth when functions are known and can be assessed for relatedness within any sequence related group. The goal in understanding protein evolution is the reconstruction of past events that have given rise to the inventory of extant proteins. Early ancestral organisms that existed before the separation of the three branches of the tree of life are believed to have possessed many of the functions of cell physiology and metabolism that are found in all living forms today. Molecular phylogeny tries to trace all the speciation events back to the last universal common ancestor. A few Escherichia coli proteins have been rearranged and have fused since their initial duplication and divergence, and the separate elements within complex proteins need to be identified and separated. In E. coli many of the paralogous modules were clearly distinguishable by inspection of the DARWIN output. To inquire into the descent of the proteins of a family from their ancestral sequence, the authors have studied further the history of individual families. With the many whole-genome-sequencing projects under way today and planned for the future, there will be an abundance of data to be analyzed further along these lines, with the ambitious aim of eventually reconstructing the paths of evolution of all proteins.
Escherichia coli has been an obvious initial study for proteomics, as it is one of the best-understood organisms and its genome was one of the first sequenced. This chapter reviews the technologies that make proteomics a genuinely new approach to protein science, and reflects on the emergence of global protein studies that allow proteins separated on gels to be identified, thus enabling analysis of protein fluxes and networks. Compared to genomics, proteomics provides information on the end products of gene expression and, in essence, is an examination of the “tools” an organism uses to survive and proliferate in an environment. Genomics provides the total information base or capacity of an organism to survive in terms of the potential gene products, open reading frames, and organization of the genes. Fundamental approaches to protein identification have not changed much over the past decade, e.g., N-terminal sequencing, molecular weight, protein pI, amino acid analysis and peptide mapping. Improved methods are based on greater resolution of peptide masses by mass spectrometry (MS). In living organisms it is unlikely that at any one time all genes in the genome will be expressed. The differential display of proteins by 2-D PAGE has long been used with E. coli to monitor response to stimuli. Further advances in protein solubilization and detection methods are required to resolve in gel all proteins from a proteome, in particular, those that are membrane associated or of high hydrophobicity.
This chapter outlines how complete genome sequences promote the development of broad genome-wide or functional-genomics approaches for investigating individual genes to entire gene sets. Classical approaches had focused on a small portion of the Saccharomyces cerevisiae, Bacillus subtilis, and Escherichia coli genes. The power of genetically tractable model systems is that they facilitate the analysis of gene function at the level of the intact organism. In various adaptations of this method, PCR-generated replacement DNAs are used for constructing strains harboring simple point mutations or large deletions, making reporter gene constructs for any gene, inserting epitope or fluorescent tags in any protein, and studying the regulatory information controlling gene expression. Regulatable promoter replacement cassettes in combination with the availability of complete genome sequences will be important resources for studying gene function in yeasts and other eukaryotic systems. New methods for assessing gene expression, protein-protein interactions, and protein-nucleic acid interactions mean that it is now possible to use genome-wide approaches to study gene function. These global approaches should significantly accelerate progress toward a detailed understanding of the model organisms B. subtilis, E. coli, and S. cerevisiae. Genome-wide methods will generate exponentially increasing amounts of biological data. Effective systems of data storage, management, and analysis must keep pace with data generation so that the data and the tools necessary to apply these resources to individual research programs are accessible to all researchers.