
Chapter 2: Statistical Predictions of Coding Regions in Prokaryotic Genomes by Using Inhomogeneous Markov Models

Abstract:

Every bacterial protein-coding gene resides inside an open reading frame (ORF), but far from every ORF observed in bacterial DNA sequence hosts a gene. Nevertheless, ORF detection is a reasonable starting point for gene hunting. The average length of protein-coding ORFs is close to 900 nt, an observation suggesting that identifying a "long" gene-hosting ORF should be a rather simple matter. Markov model theory provides a natural basis for the mathematical treatment of DNA sequences. Ordinary Markov models have been used since the earliest studies of DNA sequences; later, as larger amounts of sequence data became available, three-periodic inhomogeneous Markov models proved to be more informative and more useful for modeling and recognizing protein-coding sequences. The performance of the GeneMark.hmm program was tested on several control sets, including 10 complete bacterial genomes. The complete genomic sequence of Escherichia coli K-12 consists of 4,639,221 nt, with 4,288 genes annotated. The chapter discusses higher-order models and models of typical and atypical genes. Genes predicted as atypical are likely to have been horizontally transferred, a category of special interest for evolutionary studies as well as for studies of pathogenic bacteria, whose pathogenicity islands or antibiotic-resistance genes could be relatively recent additions to the genome.
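As a minimal illustration of the ORF detection step mentioned in the abstract, the sketch below scans both strands of a DNA string in all three frames and reports stop-to-stop ORFs. It assumes upper-case A/C/G/T input and the standard stop codons TAA, TAG, and TGA; the function names and the coordinate convention are illustrative choices, and a practical gene finder would also consider start codons and a much larger minimum length.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def stop_to_stop_orfs(seq: str, min_len: int = 0):
    """Yield (strand, frame, start, end) for each ORF that ends in a stop codon.

    Here an ORF is the run of sense codons that follows an in-frame stop
    (or the start of the frame), plus the terminating stop codon.
    Coordinates are 0-based, half-open, and refer to the strand that was
    scanned ("-" means the reverse complement). Runs containing no sense
    codon are skipped, as are runs shorter than min_len nucleotides.
    """
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            orf_start = frame
            for i in range(frame, len(s) - 2, 3):
                if s[i:i + 3] in STOP_CODONS:
                    end = i + 3  # include the stop codon
                    length = end - orf_start
                    if length > 3 and length >= min_len:
                        yield strand, frame, orf_start, end
                    orf_start = end  # next ORF starts after this stop


# list(stop_to_stop_orfs("ATGAAATAGCCC")) gives [("+", 0, 0, 9)]
```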

Citation: Borodovsky M, Hayes W, Lukashin A. 1999. Statistical Predictions of Coding Regions in Prokaryotic Genomes by Using Inhomogeneous Markov Models, p 11-33. In Charlebois R (ed), Organization of the Prokaryotic Genome. ASM Press, Washington, DC. doi: 10.1128/9781555818180.ch2

Figures

FIGURE 1

Distributions of ORF lengths in the genome for protein-coding ORFs and for random ORFs found in intergenic regions.
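To see why long ORFs are informative, the length of a random ORF can be modeled under the simplifying assumption that nucleotides are independent with fixed frequencies; the number of sense codons before the first in-frame stop is then geometrically distributed. The sketch below is this textbook calculation, not the procedure used to produce Figure 1, and real intergenic DNA need not satisfy the independence assumption.

```python
import math

STOP_CODONS = ("TAA", "TAG", "TGA")


def stop_probability(freqs):
    """P(a random codon is a stop codon) when bases are i.i.d. with freqs."""
    return sum(math.prod(freqs[b] for b in codon) for codon in STOP_CODONS)


def prob_orf_at_least(n_codons, freqs):
    """P(a random ORF has at least n_codons sense codons before its stop).

    The number of sense codons before the first stop is geometric, so
    P(N >= n) = (1 - p) ** n, where p is the per-codon stop probability.
    """
    return (1.0 - stop_probability(freqs)) ** n_codons


def mean_orf_codons(freqs):
    """Expected number of sense codons before the first stop: (1 - p) / p."""
    p = stop_probability(freqs)
    return (1.0 - p) / p


uniform = dict.fromkeys("ACGT", 0.25)
print(stop_probability(uniform))        # 0.046875, i.e. 3/64
print(mean_orf_codons(uniform))         # about 20.3 codons
print(prob_orf_at_least(300, uniform))  # well below 1e-6 for ~900 nt
```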

FIGURE 2

Positional nucleotide frequency patterns observed in 4,076 sequences surrounding annotated gene starts, aligned on the start codon position. Frequency values for T, A, C, and G in each position are shown superimposed. The three-periodic pattern is clearly visible within the protein-coding region, downstream of the start codon, which occupies positions 1 to 3. Upstream of the start codon the pattern is rather random, except for the RBS region (positions −20 to −5), which appears as a sharp depletion of the pyrimidines T and C and an enrichment of the purines G and A.
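A positional frequency profile of the kind plotted in Figure 2 can be computed column by column once the sequences are aligned on the start codon. The sketch below assumes the caller has already extracted equal-length windows around each annotated start; the function name and input format are hypothetical, and characters other than A, C, G, and T are ignored.

```python
from collections import Counter


def positional_frequencies(aligned_seqs):
    """Return one {nucleotide: frequency} dict per alignment column.

    aligned_seqs: equal-length strings already aligned on the start codon
    (for example, a fixed number of nt upstream, the start codon, then
    a fixed number of nt downstream).
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    profile = []
    for col in range(length):
        counts = Counter(s[col] for s in aligned_seqs)
        total = sum(counts[n] for n in "TACG")  # ignore N and other symbols
        profile.append({n: (counts[n] / total if total else 0.0) for n in "TACG"})
    return profile


for i, column in enumerate(positional_frequencies(["ATG", "ATG", "GTG", "TTG"])):
    print(i, column)
```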

FIGURE 3

False-positive and false-negative rates produced by the GeneMark core algorithm for three sets of gene fragments derived from genes of class I (a), class II (b), and class III (c).
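A three-periodic inhomogeneous Markov model keeps separate transition probabilities for each of the three codon positions, which is what allows it to separate coding from non-coding fragments. The sketch below trains such a model and an ordinary (homogeneous) model from toy sequences and scores a fragment by a log-likelihood ratio. It is a simplified stand-in rather than GeneMark's own procedure: the pseudocount, the function names, and the training data are assumptions.

```python
import math
from collections import defaultdict


def train(seqs, k, period, pseudo=1.0):
    """Estimate an order-k Markov chain whose transition probabilities depend
    on (position mod period). period=3 gives a three-periodic (inhomogeneous)
    coding model; period=1 gives an ordinary (homogeneous) model.
    Training sequences are assumed to start at codon position 0."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(k, len(s)):
            counts[(i % period, s[i - k:i])][s[i]] += 1.0

    def log_prob(phase, context, nt):
        c = counts.get((phase, context), {})
        total = sum(c.values()) + 4 * pseudo  # pseudocounts avoid log(0)
        return math.log((c.get(nt, 0.0) + pseudo) / total)

    return log_prob


def log_likelihood(seq, log_prob, k, period, offset=0):
    """Log-likelihood of seq; offset is the codon position of seq[0]."""
    return sum(log_prob((i + offset) % period, seq[i - k:i], seq[i])
               for i in range(k, len(seq)))


coding = train(["GCT" * 30], k=1, period=3)
noncoding = train(["ACGT" * 20], k=1, period=1)
fragment = "GCTGCTGCTGCT"
score = (log_likelihood(fragment, coding, k=1, period=3)
         - log_likelihood(fragment, noncoding, k=1, period=1))
print(score > 0)  # True: the fragment fits the coding model better
```

Because the coding model is frame-specific, scoring the same fragment with a different `offset` gives a lower likelihood, which is the property exploited when deciding in which frame a fragment is coding.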

FIGURE 4

The architecture of the hidden Markov model used in the GeneMark.hmm algorithm. The diagram shows the hidden states and the possible transitions between them. Note that this architecture does not allow genes to overlap. (Reprinted from [ ] with permission of the publisher.)
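The predicted parse of a sequence into hidden states is generated by the Viterbi algorithm. The sketch below is a generic log-space Viterbi decoder applied to a made-up two-state model (an AT-rich state "N" and a GC-rich state "C") with single-nucleotide emissions. It is far simpler than the GeneMark.hmm architecture shown in Figure 4, and all state names and parameters are illustrative assumptions.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most probable hidden-state path for the observation string (log space)."""
    scores = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    backptrs = []
    for x in obs[1:]:
        col, ptr = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: scores[-1][r] + log_trans[r][s])
            col[s] = scores[-1][best_prev] + log_trans[best_prev][s] + log_emit[s][x]
            ptr[s] = best_prev
        scores.append(col)
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(states, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Toy two-state model with made-up parameters.
log = math.log
STATES = ("N", "C")
START = {"N": log(0.5), "C": log(0.5)}
TRANS = {"N": {"N": log(0.9), "C": log(0.1)},
         "C": {"N": log(0.1), "C": log(0.9)}}
EMIT = {"N": {"A": log(0.4), "T": log(0.4), "C": log(0.1), "G": log(0.1)},
        "C": {"A": log(0.1), "T": log(0.1), "C": log(0.4), "G": log(0.4)}}

print("".join(viterbi("ATATATGCGCGCGCATAT", STATES, START, TRANS, EMIT)))
# NNNNNNCCCCCCCCNNNN
```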

FIGURE 5

Distribution of gene overlaps, as annotated in GenBank, by overlap length. (a) Same-strand overlaps; (b) opposite-strand overlaps.
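Overlap lengths like those summarized in Figure 5 can be tallied from a list of gene coordinates. The sketch assumes 1-based inclusive (start, end, strand) tuples and, for simplicity, compares each gene only with the next one in coordinate order, so overlaps between non-adjacent genes are not counted; the function name is hypothetical.

```python
from collections import Counter


def overlap_histograms(genes):
    """genes: list of (start, end, strand), 1-based inclusive coordinates.

    Returns (same_strand, opposite_strand) Counters that map an overlap
    length in nt to the number of adjacent gene pairs with that overlap.
    """
    same, opposite = Counter(), Counter()
    ordered = sorted(genes)
    for (s1, e1, st1), (s2, e2, st2) in zip(ordered, ordered[1:]):
        overlap = min(e1, e2) - s2 + 1  # shared nucleotides, if positive
        if overlap > 0:
            target = same if st1 == st2 else opposite
            target[overlap] += 1
    return same, opposite


same, opposite = overlap_histograms(
    [(1, 100, "+"), (97, 300, "+"), (290, 500, "-"), (600, 700, "-")])
print(same, opposite)
```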

FIGURE 6

GeneMark.hmm performance as a function of the order of the Markov model used in the algorithm. Predicted and annotated parses are compared for the first 500,000 nt of the genomic sequence; this contig contains 468 annotated genes. (A) Exact prediction: the fraction of annotated genes with both the 5′ and 3′ ends predicted exactly. ◊, parse generated by the Viterbi algorithm with Markov models for typical genes only; ○, parse generated with the Markov model for atypical genes included in the GeneMark.hmm algorithm; Δ, parse corrected by postprocessing with the RBS model. (B) Missing genes: the fraction of annotated genes with neither the 5′ nor the 3′ end predicted exactly (the postprocessing procedure did not change the number of missing genes [Table 2]). Graph notations are the same as in panel A. (Reprinted from [ ] with permission of the publisher.)
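The two measures defined in this caption can be computed directly from annotated and predicted gene coordinates. The sketch assumes genes are (left, right, strand) tuples, treats the right-hand coordinate as the 5′ end of a "-" strand gene, and counts a prediction as exact only when a single predicted gene matches both ends; these conventions and the function names are assumptions.

```python
def gene_ends(gene):
    """Return ((5' coordinate, strand), (3' coordinate, strand)).

    gene = (left, right, strand) with left <= right; on the "-" strand
    the 5' end is the right-hand coordinate.
    """
    left, right, strand = gene
    if strand == "+":
        return (left, strand), (right, strand)
    return (right, strand), (left, strand)


def compare_parses(annotated, predicted):
    """Return (fraction predicted exactly, fraction missing) for annotated genes."""
    predicted_set = set(predicted)
    pred_5 = {gene_ends(g)[0] for g in predicted}
    pred_3 = {gene_ends(g)[1] for g in predicted}
    exact = missing = 0
    for g in annotated:
        five, three = gene_ends(g)
        if g in predicted_set:
            exact += 1  # both ends match one predicted gene
        elif five not in pred_5 and three not in pred_3:
            missing += 1  # neither end predicted
    n = len(annotated)
    return exact / n, missing / n


annotated = [(1, 300, "+"), (400, 900, "-"), (1000, 1500, "+")]
predicted = [(1, 300, "+"), (400, 960, "-"), (2000, 2300, "+")]
print(compare_parses(annotated, predicted))
```

In this toy example the "-" strand gene has only its 3′ end predicted, so it counts as neither exact nor missing.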

FIGURE 7

Graphical output of the combined GeneMark and GeneMark.hmm programs ( ). ORFs are indicated by thin lines in the middle of each panel. Each panel shows coding-potential graphs, obtained by plotting the local a posteriori probabilities of protein coding as defined by GeneMark. Probabilities computed with the typical-gene model are shown as solid lines; those computed with atypical-gene models are shown as dashed lines. Genes predicted by GeneMark.hmm are shown as boldface solid lines at the bottom of each panel. Hatched rectangles mark regions bounded by two in-frame stop codons that have sufficiently high coding potential.
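The a posteriori probabilities plotted in these graphs follow from Bayes' rule applied to competing hypotheses about a sequence window (for instance, coding in one of the possible frames versus non-coding). The sketch below shows only that normalization step, computed in log space for numerical stability. It assumes the per-hypothesis log-likelihoods and prior probabilities have already been obtained, and the hypothesis labels are hypothetical.

```python
import math


def posteriors(log_likelihoods, priors):
    """Bayes' rule in log space: P(H | window) for each hypothesis H.

    log_likelihoods: {hypothesis: log P(window | H)}
    priors:          {hypothesis: P(H)}, assumed to sum to 1
    """
    logs = {h: ll + math.log(priors[h]) for h, ll in log_likelihoods.items()}
    m = max(logs.values())
    # log of the evidence, via the log-sum-exp trick
    log_z = m + math.log(sum(math.exp(v - m) for v in logs.values()))
    return {h: math.exp(v - log_z) for h, v in logs.items()}


post = posteriors({"coding": math.log(3.0), "noncoding": 0.0},
                  {"coding": 0.5, "noncoding": 0.5})
print(post)  # coding is about 0.75, noncoding about 0.25
```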

References

1. Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.
2. Baldi, P., Y. Chauvin, T. Hunkapiller, and M. A. McClure. 1994. Hidden Markov models of biological primary sequence information. Proc. Natl. Acad. Sci. USA 91:1059–1063.
3. Berg, O. G., and P. H. von Hippel. 1988. Selection of DNA binding sites by regulatory proteins. Trends Biochem. Sci. 13:207–211.
4. Billingsley, P. 1961. Statistical methods in Markov chains. Ann. Math. Stat. 32:12–40.
5. Blattner, F. R., G. Plunkett III, C. A. Bloch, N. T. Perna, V. Burland, M. Riley, J. Collado-Vides, J. D. Glasner, C. K. Rode, G. F. Mayhew, J. Gregor, N. W. Davis, H. A. Kirkpatrick, M. A. Goeden, D. J. Rose, B. Mau, and Y. Shao. 1997. The complete genome sequence of Escherichia coli K-12. Science 277:1453–1462.
6. Borodovsky, M., and J. McIninch. 1993. GeneMark: parallel gene recognition for both DNA strands. Comput. Chem. 18:259–268.
6a. Borodovsky, M., and J. McIninch. 1996. [Online.] http://genemark.biology.gatech.edu/GeneMark.
7. Borodovsky, M., J. McIninch, E. Koonin, K. Rudd, C. Médigue, and A. Danchin. 1995. Detection of new genes in a bacterial genome using Markov models for three gene classes. Nucleic Acids Res. 23:3554–3562.
8. Borodovsky, M., K. Rudd, and E. Koonin. 1994. Intrinsic and extrinsic approaches for detecting genes in a bacterial genome. Nucleic Acids Res. 22:4756–4767.
9. Borodovsky, M., Y. A. Sprizhitsky, E. I. Golovanov, and A. A. Alexandrov. 1986. Statistical features in the Escherichia coli genome functional primary structure. II. Non-homogeneous Markov chains. Mol. Biol. 20:833–840.
10. Borodovsky, M., Y. A. Sprizhitsky, E. I. Golovanov, and A. A. Alexandrov. 1986. Statistical features in the Escherichia coli genome functional primary structure. III. Computer recognition of protein coding regions. Mol. Biol. 20:1144–1150.
10a. Borodovsky, M. Unpublished data.
11. Bult, C. J., O. White, G. J. Olsen, L. Zhou, R. D. Fleischmann, G. G. Sutton, J. A. Blake, L. M. FitzGerald, R. A. Clayton, J. D. Gocayne, A. R. Kerlavage, B. A. Dougherty, J. Tomb, M. D. Adams, C. I. Reich, R. Overbeek, E. F. Kirkness, K. G. Weinstock, J. M. Merrick, A. Glodek, J. D. Scott, N. S. Geoghagen, J. F. Weidman, J. L. Fuhrmann, D. T. Nguyen, T. Utterback, J. M. Kelley, J. D. Peterson, P. W. Sadow, M. C. Hanna, M. D. Cotton, M. A. Hurst, K. M. Roberts, B. B. Kaine, M. Borodovsky, H. P. Klenk, C. M. Fraser, H. O. Smith, C. R. Woese, and J. C. Venter. 1996. Complete genome sequence of the methanogenic archaeon Methanococcus jannaschii. Science 273:1058–1073.
12. Burge, C., and S. Karlin. 1997. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268:78–94.
13. Churchill, G. A. 1989. Stochastic models for heterogeneous DNA sequences. Bull. Math. Biol. 51:79–94.
14. Cover, T. M., and J. A. Thomas. 1991. Elements of Information Theory. John Wiley & Sons, New York, N.Y.
14a. Danchin, A. Personal communication.
15. Durbin, R., S. Eddy, A. Krogh, and G. Mitchison. 1998. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge, United Kingdom.
16. Erickson, J. W., and G. G. Altman. 1979. A search for patterns in the nucleotide sequence of the MS2 genome. J. Math. Biol. 7:219–230.
17. Fickett, J. W., and C. S. Tung. 1992. Assessment of protein coding measures. Nucleic Acids Res. 20:6441–6450.
18. Fleischmann, R. D., M. D. Adams, O. White, R. A. Clayton, E. F. Kirkness, A. R. Kerlavage, C. J. Bult, J.-F. Tomb, B. A. Dougherty, J. M. Merrick, K. McKenney, G. Sutton, W. FitzHugh, C. A. Fields, J. D. Gocayne, J. D. Scott, R. Shirley, L.-I. Liu, A. Glodek, J. M. Kelley, J. F. Weidman, C. A. Phillips, T. Spriggs, E. Hedblom, M. D. Cotton, T. R. Utterback, M. C. Hanna, D. T. Nguyen, D. M. Saudek, R. C. Brandon, L. D. Fine, J. L. Fritchman, J. L. Fuhrmann, N. S. M. Geoghagen, C. L. Gnehm, L. A. McDonald, K. V. Small, C. M. Fraser, H. O. Smith, and J. C. Venter. 1995. Whole-genome random sequencing and assembly of Haemophilus influenzae. Science 269:496–512.
19. Fraser, C. M., J. D. Gocayne, O. White, M. D. Adams, R. A. Clayton, R. D. Fleischmann, C. J. Bult, A. R. Kerlavage, G. Sutton, J. M. Kelley, J. L. Fritchman, J. F. Weidman, K. V. Small, M. Sandusky, J. L. Fuhrmann, D. T. Nguyen, T. R. Utterback, D. M. Saudek, C. A. Phillips, J. M. Merrick, J.-F. Tomb, B. A. Dougherty, K. F. Bott, P.-C. Hu, T. S. Lucier, S. N. Peterson, H. O. Smith, C. A. Hutchison III, and J. C. Venter. 1995. The minimal gene complement of Mycoplasma genitalium. Science 270:397–403.
20. Frishman, D., A. Mironov, H. W. Mewes, and M. Gelfand. 1998. Combining diverse evidence for gene recognition in completely sequenced bacterial genomes. Nucleic Acids Res. 26:2941–2947. (Erratum, 26:3870.)
21. Gatlin, L. 1972. Information Theory and the Living System. Columbia University Press, New York, N.Y.
22. Green, P., D. Lipman, L. Hillier, R. Waterston, D. States, and J. M. Claverie. 1993. Ancient conserved regions in new gene sequences and the protein databases. Science 259:1711–1716.
23. Hayes, W. S., and M. Borodovsky. 1998. Deriving ribosome binding site (RBS) statistical models from unannotated DNA sequences and the use of the RBS model for N-terminal prediction, p. 279–290. In Proceedings of the Pacific Symposium on Biocomputing 1998. World Scientific, Maui, Hawaii.
23a. Hayes, W. S., and M. Borodovsky. Unpublished data.
24. Henderson, J., S. Salzberg, and K. H. Fasman. 1997. Finding genes in DNA with a hidden Markov model. J. Comp. Biol. 4:127–141.
25. Himmelreich, R., H. Hilbert, H. Plagens, E. Pirkl, B.-C. Li, and R. Herrmann. 1996. Complete sequence of the genome of the bacterium Mycoplasma pneumoniae. Nucleic Acids Res. 24:4420–4449.
26. Hogg, R. V., and E. A. Tanis. 1997. Probability and Statistical Inference. Prentice Hall, Englewood Cliffs, N.J.
27. Jelinek, F., and R. L. Mercer. 1980. Interpolated estimation of Markov source parameters from sparse data, p. 252–260. In E. S. Gelsema and L. N. Kanal (ed.), Pattern Recognition in Practice. Elsevier/North Holland, New York, N.Y.
28. Kaneko, T., A. Tanaka, S. Sato, H. Kotani, T. Sazuka, N. Miyajima, M. Sugiura, and S. Tabata. 1995. Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. I. Sequence features in the 1 Mb region from map positions 64% to 92% of the genome. DNA Res. 2:153–166.
29. Kleffe, J., K. Hermann, and M. Borodovsky. 1995. Statistical analysis of GeneMark performance by cross-validation. Comput. Chem. 20:123–134.
30. Klenk, H. P., R. A. Clayton, J. Tomb, O. White, K. E. Nelson, K. A. Ketchum, R. J. Dodson, M. Gwinn, E. K. Hickey, J. D. Peterson, D. L. Richardson, A. R. Kerlavage, D. E. Graham, N. C. Kyrpides, R. D. Fleischmann, J. Quackenbush, N. H. Lee, G. G. Sutton, S. Gill, E. F. Kirkness, B. A. Dougherty, K. McKenney, M. D. Adams, B. Loftus, S. Peterson, C. I. Reich, L. K. McNeil, J. H. Badger, A. Glodek, L. Zhou, R. Overbeek, J. D. Gocayne, J. F. Weidman, L. McDonald, T. Utterback, M. D. Cotton, T. Spriggs, P. Artiach, B. P. Kaine, S. M. Sykes, P. W. Sadow, K. P. D'Andrea, C. Bowman, C. Fujii, S. A. Garland, T. M. Mason, G. J. Olsen, C. M. Fraser, H. O. Smith, C. R. Woese, and J. C. Venter. 1997. The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus. Nature 390:364–370.
31. Koonin, E. V., P. Bork, and C. Sander. 1994. Yeast chromosome III: new gene functions. EMBO J. 13:493–503.
32. Koonin, E. V., and M. Y. Galperin. 1997. Prokaryotic genomes: the emerging paradigm of genome-based microbiology. Curr. Opin. Genet. Dev. 7:757–763.
33. Koonin, E. V., A. R. Mushegian, M. Y. Galperin, and D. R. Walker. 1997. Comparison of archaeal and bacterial genomes: computer analysis of protein sequences predicts novel functions and suggests a chimeric origin for the archaea. Mol. Microbiol. 25:619–637.
34. Krogh, A., M. Brown, I. S. Mian, K. Sjolander, and D. Haussler. 1994. Hidden Markov models in computational biology. Applications to protein modeling. J. Mol. Biol. 235:1501–1531.
35. Krogh, A., I. S. Mian, and D. Haussler. 1994. A hidden Markov model that finds genes in E. coli DNA. Nucleic Acids Res. 22:4768–4778.
36. Kulp, D., D. Haussler, M. G. Reese, and F. H. Eeckman. 1996. In D. J. States, P. Agarwal, T. Gaasterland, L. Hunter, and R. F. Smith (ed.), Proceedings of the Fourth International Conference on Intelligent Systems for Molecular Biology (ISMB-96), p. 134–142. AAAI Press, Menlo Park, Calif.
37. Kunst, F., N. Ogasawara, I. Moszer, A. M. Albertini, G. Alloni, V. Azevedo, M. G. Bertero, P. Bessieres, A. Bolotin, S. Borchert, R. Borriss, L. Boursier, A. Brans, M. Braun, S. C. Brignell, S. Bron, S. Brouillet, C. V. Bruschi, B. Caldwell, V. Capuano, N. M. Carter, S. K. Choi, J. J. Codani, I. F. Connerton, N. J. Cummings, R. A. Daniel, F. Denizot, K. M. Devine, A. Dusterhoft, S. D. Ehrlich, P. T. Emmerson, K. D. Entian, J. Errington, C. Fabret, E. Ferrari, D. Foulger, C. Fritz, M. Fujita, Y. Fujita, S. Fuma, A. Galizzi, N. Galleron, S. Y. Ghim, P. Glaser, A. Goffeau, E. J. Golightly, G. Grandi, G. Guiseppi, B. J. Guy, K. Haga, J. Haiech, C. R. Harwood, A. Henaut, H. Hilbert, S. Holsappel, S. Hosono, M. F. Hullo, M. Itaya, L. Jones, B. Joris, D. Karamata, Y. Kasahara, M. Klaerr-Blanchard, C. Klein, Y. Kobayashi, P. Koetter, G. Koningstein, S. Krogh, M. Kumano, K. Kurita, A. Lapidus, S. Lardinois, J. Lauber, V. Lazarevic, S. M. Lee, A. Levine, H. Liu, S. Masuda, C. Mauel, C. Médigue, N. Medina, R. P. Mellado, M. Mizuno, D. Moestl, S. Nakai, M. Noback, D. Noone, M. O'Reilly, K. Ogawa, A. Ogiwara, B. Oudega, S. H. Park, V. Parro, T. M. Pohl, D. Portetelle, S. Porwollik, A. M. Prescott, E. Presecan, P. Pujic, B. Purnelle, G. Rapoport, M. Rey, S. Reynolds, M. Rieger, C. Rivolta, E. Rocha, B. Roche, M. Rose, Y. Sadaie, T. Sato, E. Scanlan, S. Schleich, R. Schroeter, F. Scoffone, J. Sekiguchi, A. Sekowska, S. J. Seror, P. Serror, B. S. Shin, B. Soldo, A. Sorokin, E. Tacconi, T. Takagi, H. Takahashi, K. Takemaru, M. Takeuchi, A. Tamakoshi, T. Tanaka, P. Terpstra, A. Tognoni, V. Tosato, S. Uchiyama, M. Vandenbol, F. Vannier, A. Vassarotti, A. Viari, R. Wambutt, E. Wedler, H. Wedler, T. Weitzenegger, P. Winters, A. Wipat, H. Yamamoto, K. Yamane, K. Yasumoto, K. Yata, K. Yoshida, H. F. Yoshikawa, E. Zumstein, H. Yoshikawa, and A. Danchin. 1997. The complete genome sequence of the gram-positive bacterium Bacillus subtilis. Nature 390:249–256.
38. Lawrence, J. G., and H. Ochman. 1997. Amelioration of bacterial genomes: rates of change and exchange. J. Mol. Evol. 44:383–397.
39. Link, A. J., K. Robison, and G. M. Church. 1997. Comparing the predicted and observed properties of proteins encoded in the genome of Escherichia coli K-12. Electrophoresis 18:1259–1313.
40. Lukashin, A. V., and M. Borodovsky. 1998. GeneMark.hmm: new solutions for gene finding. Nucleic Acids Res. 26:1107–1115.
40a. Lukashin, A. V., and M. Borodovsky. Unpublished data.
41. McIninch, J., W. Hayes, and M. Borodovsky. 1996. Application of GeneMark in multispecies environment, p. 176–188. In D. J. States, P. Agarwal, T. Gaasterland, L. Hunter, and R. F. Smith (ed.), Proceedings of the Fourth International Conference on Intelligent Systems for Molecular Biology (ISMB-96). AAAI Press, Menlo Park, Calif.
42. Médigue, C., T. Rouxel, P. Vigier, A. Henaut, and A. Danchin. 1991. Evidence for horizontal gene transfer in Escherichia coli speciation. J. Mol. Biol. 222:851–856.
43. Rabiner, L. R. 1989. Tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77:257–286.
44. Salzberg, S. L., A. L. Delcher, S. Kasif, and O. White. 1998. Microbial gene identification using interpolated Markov models. Nucleic Acids Res. 26:544–548.
44a. Shmatkov, A. M., A. A. Melikyan, F. L. Chernousko, and M. Borodovsky. Unpublished data.
45. Smith, D. R., L. A. Doucette-Stamm, C. Deloughery, H.-M. Lee, J. Dubois, T. Aldredge, R. Bashirzadeh, D. Blakely, R. Cook, K. Gilbert, D. Harrison, L. Hoang, P. Keagle, W. Lumm, B. Pothier, D. Qiu, R. Spadafora, R. Vicare, Y. Wang, J. Wierzbowski, R. Gibson, N. Jiwani, A. Caruso, D. Bush, H. Safer, D. Patwell, S. Prabhakar, S. McDougall, G. Shimer, A. Goyal, S. Pietrokovski, G. M. Church, C. J. Daniels, J.-I. Mao, P. Rice, J. Nolling, and J. N. Reeve. 1997. Complete genome sequence of Methanobacterium thermoautotrophicum ΔH: functional analysis and comparative genomics. J. Bacteriol. 179:7135–7155.
46. Tavare, S., and B. Song. 1989. Codon preference and primary sequence structure in protein-coding regions. Bull. Math. Biol. 51:95–115.
47. Tomb, J.-F., O. White, A. R. Kerlavage, R. A. Clayton, G. G. Sutton, R. D. Fleischmann, K. A. Ketchum, H. P. Klenk, S. Gill, B. A. Dougherty, K. Nelson, J. Quackenbush, L. Zhou, E. F. Kirkness, S. Peterson, B. Loftus, D. Richardson, R. Dodson, H. G. Khalak, A. Glodek, K. McKenney, L. M. FitzGerald, N. Lee, M. D. Adams, E. K. Hickey, D. E. Berg, J. D. Gocayne, T. R. Utterback, J. D. Peterson, J. M. Kelley, M. D. Cotton, J. M. Weidman, C. Fujii, C. Bowman, L. Watthey, E. Wallin, W. S. Hayes, M. Borodovsky, P. D. Karp, H. O. Smith, C. M. Fraser, and J. C. Venter. 1997. The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature 388:539–548.
48. Yada, T., and M. Hirosawa. 1996. In D. J. States, P. Agarwal, T. Gaasterland, L. Hunter, and R. F. Smith (ed.), Proceedings of the Fourth International Conference on Intelligent Systems for Molecular Biology (ISMB-96), p. 252–260. AAAI Press, Menlo Park, Calif.

Tables

TABLE 1

Sizes of sequence sets recommended for training a Markov model of a given order
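Training-set size matters because the number of parameters of a Markov model grows quickly with its order. The count below is a standard calculation, assuming a four-letter alphabet and, for the three-periodic model, one transition table per codon position. It only illustrates the growth and does not reproduce the training-set sizes recommended in Table 1.

```python
def free_parameters(order, period=3, alphabet_size=4):
    """Number of free transition probabilities in an order-k DNA Markov chain.

    Each of the alphabet_size ** order contexts has a distribution over the
    next nucleotide with alphabet_size - 1 free values; a periodic model
    keeps a separate set for each of the `period` codon positions.
    """
    return period * alphabet_size ** order * (alphabet_size - 1)


for k in range(6):
    # order, homogeneous model, three-periodic model
    print(k, free_parameters(k, period=1), free_parameters(k, period=3))
```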

TABLE 2

Percentage of words of a given length (for the genome) whose counts exceed the critical number of 400
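The word counts summarized in Table 2 can be tallied with a sliding window. The sketch below assumes overlapping words read from a single strand and treats "exceed" as strictly greater than the critical number; whether the table counted both strands or used another convention is not stated here, and the function name is hypothetical.

```python
from collections import Counter
from itertools import product


def percent_words_above(seq, k, critical=400):
    """Percentage of all 4**k words of length k whose count in seq exceeds critical."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    above = sum(1 for w in words if counts[w] > critical)
    return 100.0 * above / len(words)


# With a real genome string, e.g. percent_words_above(genome, 6) for 6-letter words.
print(percent_words_above("A" * 10, 1, critical=5))
```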

TABLE 3

GeneMark performance for 10 complete genomes

TABLE 4

GeneMark.hmm performance
