Chapter 2: Statistical Predictions of Coding Regions in Prokaryotic Genomes by Using Inhomogeneous Markov Models

Abstract:

Every bacterial protein-coding gene resides inside an open reading frame (ORF), but far from every ORF observed in a bacterial DNA sequence hosts a gene. Nevertheless, ORF detection is a reasonable start for gene hunting. The average length of protein-coding ORFs is close to 900 nt, which suggests that identifying a "long" gene-hosting ORF should be a rather simple matter. Markov model theory provides a natural basis for the mathematical treatment of DNA sequences. Ordinary Markov models have been used since the earliest studies of DNA sequences. Later, as more sequence data became available, three-periodic inhomogeneous Markov models proved to be more informative and more useful for modeling and recognizing protein-coding sequences. The performance of the GeneMark.hmm program was tested with several control sets, including 10 complete bacterial genomes. The complete genomic sequence of Escherichia coli consists of 4,639,221 nt, with 4,288 genes annotated. The chapter also discusses higher-order models and models of typical and atypical genes. Genes predicted as atypical are likely to be horizontally transferred genes, a category of special interest for evolutionary studies, as well as for studies of pathogenic bacteria whose pathogenicity islands or antibiotic-resistance genes could be relatively recent additions to the whole genome.
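As an illustration of the ORF detection step mentioned in the abstract, the following is a minimal Python sketch, not the chapter's own procedure. It assumes an ORF is a stretch bounded by two in-frame stop codons, uses the standard stop codons TAA, TAG, and TGA, and scans the forward strand only. The function name `find_orfs` and the `min_len` threshold are hypothetical choices made for this example.

```python
# Standard stop codons (assumed here; some organisms use a different code).
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_len=90):
    """Return (start, end, frame) tuples for stop-to-stop ORFs on the forward strand.

    start is the 0-based index of the first codon after the previous in-frame
    stop (or the frame offset), end is the index of the closing stop codon,
    so seq[start:end] is the ORF without its stop codon. Open stretches that
    reach the end of the sequence without a stop are not reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        orf_start = frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                # Keep the ORF only if it is long enough (length in nt).
                if i - orf_start >= min_len:
                    orfs.append((orf_start, i, frame))
                orf_start = i + 3
    return orfs

if __name__ == "__main__":
    demo = "TAA" + "ATG" + "GCT" * 10 + "TAA" + "CC"
    for start, end, frame in find_orfs(demo, min_len=30):
        print(f"frame {frame}: {start}-{end} ({end - start} nt)")
```

To cover both strands, the same scan could be applied to the reverse complement of the sequence. A length threshold alone will still admit random ORFs, which is why statistical models of coding sequence are needed on top of this step.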

Citation: Borodovsky M, Hayes W, Lukashin A. 1999. Statistical Predictions of Coding Regions in Prokaryotic Genomes by Using Inhomogeneous Markov Models, p 11-33. In Charlebois R (ed), Organization of the Prokaryotic Genome. ASM Press, Washington, DC. doi: 10.1128/9781555818180.ch2

Figures

FIGURE 1

Distributions of ORF lengths in the genome for protein-coding ORFs and for random ORFs found in intergenic regions.

FIGURE 2

Positional nucleotide frequency patterns observed in 4,076 sequences, with those that surrounded annotated gene starts aligned against the start codon positions. Frequency values for T, A, C, and G nucleotides in each position are shown superimposed. The three-periodic pattern is clearly seen within the protein-coding region, downstream from the start codon position at positions 1 to 3. A rather random pattern is seen upstream of the start codon position, with the RBS region, positions −20 to −5, shown as a sharp decrease of pyrimidines T and C and an abundance of purines G and A.
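The kind of positional frequency profile described in this caption could be computed roughly as follows. This is a sketch under assumptions: the input is a list of equal-length sequence windows already aligned so that each column lies at the same offset from the start codon, and the function name `positional_frequencies` is hypothetical.

```python
from collections import Counter

def positional_frequencies(windows):
    """Return one dict per alignment column with the frequencies of A, C, G, T.

    windows: equal-length DNA strings, aligned so that a given column is at
    the same offset from the start codon in every sequence (an assumption of
    this sketch). Characters other than A, C, G, T are ignored.
    """
    if not windows:
        return []
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    freqs = []
    for col in range(length):
        counts = Counter(w[col].upper() for w in windows)
        total = sum(counts[b] for b in "ACGT")
        freqs.append({b: (counts[b] / total if total else 0.0) for b in "ACGT"})
    return freqs

if __name__ == "__main__":
    aligned = ["ATGA", "ATGC", "GTGA", "ATGA"]
    for offset, column in enumerate(positional_frequencies(aligned)):
        print(offset, column)
```

Plotting these per-column frequencies for windows spanning the gene start would be expected to show the period-3 pattern downstream of the start codon and the purine-rich RBS region upstream of it.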

FIGURE 3

False-positive and false-negative rates produced by the GeneMark core algorithm for three sets of gene fragments derived from genes of class I (a), class II (b), and class III (c).

FIGURE 4

The architecture of the Hidden Markov model used in the GeneMark.hmm algorithm. The diagram shows the hidden states and possible transitions between them. Note that the architecture does not allow for a gene overlap. (Reprinted from [ ] with permission of the publisher.)

FIGURE 5

Distribution of gene overlaps over the overlap length as annotated in GenBank. (a) Same-strand overlap; (b) Opposite-strand overlap.

FIGURE 6

GeneMark.hmm performance as a function of the order of the Markov model used in the algorithm. The results of comparison between the predicted and annotated parses are shown for the sequence of the first 500,000 nt taken from the entire genomic sequence. This contig contains 468 annotated genes. (A) Exact prediction: the fraction of annotated genes with both the 5′ and 3′ ends predicted exactly. ◊, the predicted parse was generated by the Viterbi algorithm with the Markov models for typical genes only; ○, the Markov model for atypical genes was included in the GeneMark.hmm algorithm; Δ, the parse was corrected by postprocessing with the RBS model. (B) Missing genes: the fraction of annotated genes with neither the 5′ nor the 3′ end predicted exactly (the postprocessing procedure did not change the number of missing genes [Table 2]). The graph notations are the same as in panel A. (Reprinted from [ ] with permission of the publisher.)
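The two measures defined in this caption can be sketched in Python as below. This is one possible reading of the definitions, not the authors' evaluation code: genes are assumed to be represented as (left, right, strand) coordinate tuples, an end counts as predicted exactly only if a predicted gene on the same strand has the same coordinate, and the function name `compare_parses` is hypothetical.

```python
def compare_parses(annotated, predicted):
    """Return (exact_fraction, missing_fraction) relative to the annotated genes.

    Genes are (left, right, strand) tuples (a representation assumed for this
    sketch). A gene is 'exact' if both ends match a predicted gene on the same
    strand, and 'missing' if neither of its ends matches any predicted gene
    on the same strand.
    """
    if not annotated:
        raise ValueError("annotated gene list must not be empty")
    predicted = set(predicted)
    left_ends = {(left, strand) for left, _, strand in predicted}
    right_ends = {(right, strand) for _, right, strand in predicted}
    exact = sum(1 for gene in annotated if gene in predicted)
    missing = sum(
        1
        for left, right, strand in annotated
        if (left, strand) not in left_ends and (right, strand) not in right_ends
    )
    return exact / len(annotated), missing / len(annotated)

if __name__ == "__main__":
    annotated = [(1, 100, "+"), (200, 400, "+"), (500, 800, "-"), (900, 1000, "+")]
    predicted = [(1, 100, "+"), (230, 400, "+"), (500, 800, "+")]
    print(compare_parses(annotated, predicted))
```

Because matching both coordinates on a given strand is equivalent to matching both the 5′ and 3′ ends, the sketch does not need to convert left and right coordinates into strand-specific ends.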

FIGURE 7

Graphical output of the combined GeneMark and GeneMark.hmm programs ( ). The ORFs are indicated by the thin lines in the middle of each panel. The coding-potential graphs, obtained by plotting local a posteriori probabilities of protein coding as defined by GeneMark, are shown in each panel. The probability values generated by using the typical-gene model are shown by solid lines; those obtained by using atypical-gene models are shown by dashed lines. The genes predicted by GeneMark.hmm are shown by boldface solid lines at the bottoms of the panels. Hatched rectangles mark the regions bounded by two in-frame stop codons and possessing high enough coding potential.


Tables

TABLE 1

Sizes of sequence sets recommended for training a Markov model of a given order

TABLE 2

Percentage of words of a given length (for the genome) whose counts exceed the critical number of 400

TABLE 3

GeneMark performance for 10 complete genomes

TABLE 4

GeneMark.hmm performance
