Chapter 16: Origin and Evolution of the Proteome

This chapter briefly reviews our current understanding of the origin and evolutionary dynamics of the picornavirus proteome. There is a large body of literature on protein evolution recorded during picornavirus outbreaks and during passaging of picornaviruses in cells and animals in the absence or presence of a selective factor, e.g., a drug. Before discussing picornavirus proteins, it is useful to recall that they were originally named without regard to evolutionary considerations, which are a common framework in contemporary studies. Two processes, mutation and homologous recombination, have been shown to be involved in generating changes in the most conserved proteins. Gene duplication and gene loss in the progeny of a single parent are special cases of nonhomologous recombination. In gene duplication, a genetic locus is copied repeatedly, whereas gene loss results from a genetic locus being skipped during copying; both are considered aberrations of template-mediated replication in picornaviruses. The origin of the N-terminal amphipathic helix of 2C is another case open to different evolutionary interpretations. Gene loss, along with repeated introduction of a protein variety, may be invoked to explain the phylogenetically discontinuous presence of that protein variety among picornaviruses.

Citation: Gorbalenya A, Lauber C. 2010. Origin and Evolution of the Proteome, p 253-270. In Ehrenfeld E, Domingo E, Roos R (ed), The Picornaviruses. ASM Press, Washington, DC. doi: 10.1128/9781555816698.ch16
Figure 1.

Phylogenetic tree of the family Picornaviridae. A phylogeny of 28 picornaviruses representing species diversity is shown. The maximum-likelihood tree is based on a multiple alignment of RdRps and was compiled using the PhyML program under the WAG amino acid substitution matrix and rate heterogeneity among sites (gamma distribution with four categories). A Bayesian reconstruction utilizing the BEAST software resulted in an identical topology. Numbers at branching points indicate bootstrap support values from 1,000 replicates. The scale of evolution in average number of amino acid substitutions per position is shown by the bar. The tree was rooted according to a separate phylogenetic analysis using nidovirus RdRps as an outgroup (data not shown). Picornavirus genera are indicated to the right of the phylogeny. For picornavirus species the presence of L and 2A proteins in polyproteins is depicted using rectangles of different shades. The widths of the rectangles are scaled proportionally to the size of L and 2A proteins. Homologous proteins are coded as described for Fig. 2, below. The viruses included are HAV, avian encephalomyelitis virus (AvEMV), HPeV, LjV, DuHV AP, SealPV, porcine teschovirus (PTeV), FMDV SAT 2, ERAV, Theiler’s-like virus of rats (TheiloV), encephalomyocarditis virus (EMCV), Seneca Valley virus (SVV), EERBV1, Aichi virus (AiV), bovine kobuvirus (BKoV), avian sapelovirus (DuPV), porcine sapelovirus (PEV-A), simian picornavirus 1 (SiPV), bovine enterovirus (BEV), simian enterovirus A (SiEV), HRV 30 (HRV-A), HRV-C, HRV-B, HEV-C, HEV-D, HEV 71 (HEV-A), HEV-B, and porcine enterovirus B (PEV-B).

Figure 2.

Polyprotein layout characteristics of genera or species of the family Picornaviridae. Genomic organizations for 17 picornaviruses of 13 genera are shown, describing all variants of the polyprotein domain architecture found for the Picornaviridae by 2009. The organizations were aligned at the 2B-2C border and are ranked in order of descending genome size. Mature proteins are depicted as differently shaded rectangles (with the exception of 2A3 and 2A4 [NPGP] in cardioviruses, which are released as a fused product from the polyprotein), and UTRs are shown as solid horizontal lines. The identity of proteins can be determined using the legend at the bottom. Borders of proteins were identified using protein annotations for the best-characterized viruses, which were then applied to a family-wide polyprotein alignment generated using Muscle and curated manually with the support of the Viralis software platform (Gorbalenya, unpublished). For the sake of this comparison, the region between a leader protein (where it is present) or the initiator codon (leaderless viruses) and 1B (VP2) was considered 1A (VP4) in all viruses, although it is not produced in some viruses. For a discussion of the complexities of VP4 evolution, see the text. For TMEV two reading frames are shown (from top to bottom: 0 and +1 with respect to the start of the most upstream open reading frame), as it encodes an additional protein (L*) in the +1 frame.

Figure 3.

Polyprotein conservation of the family Picornaviridae. A plot of the conservation along the polyprotein alignment of 13 picornaviruses representing genus diversity is shown. The normalized similarity measure was computed using the Bio3d package in R under the Blosum62 substitution matrix and a sliding window size of 10 amino acid positions. The mean similarity of the polyprotein is indicated by the dashed horizontal line. On top, the positions of single protein alignments are highlighted by black rectangles and names with the same nomenclature as used for Fig. 2. For L and 2A proteins the positions of alignments for the different protein families (see also Table 1) are shown by grey vertical lines. The grey inserts represent separate conservation plots for the different L and 2A proteins that are expressed by at least two virus species. The following conserved sequence motifs are indicated at peaks of the similarity measure: NPGP cleavage motif in 2A4; 2C helicase motifs A, B, and C; 3B conserved Tyr (Y) nucleotidylated during priming in RNA synthesis; 3C protease catalytic His (H) and Cys (C), noncatalytic Asp/Glu (D/E) residues, and a substrate-binding motif (SB); 3D polymerase motifs A, B, C, E, F, and G.
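The windowing step described in this caption can be illustrated with a minimal Python sketch. It assumes that a per-column similarity score has already been obtained for each alignment position (the Blosum62-based scoring itself is not reproduced here), and it uses a simple unweighted window mean, which may differ from how the original analysis handled windows and alignment edges. The function names `sliding_mean` and `overall_mean` are hypothetical.

```python
from statistics import mean


def sliding_mean(scores, window=10):
    """Average per-column scores over a sliding window.

    `scores` holds one similarity value per alignment column (assumed to be
    precomputed). Returns one value per full window, so the result has
    len(scores) - window + 1 entries.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(scores) < window:
        return []
    return [mean(scores[i:i + window]) for i in range(len(scores) - window + 1)]


def overall_mean(scores):
    """Mean similarity over all columns (the kind of value a dashed reference line would mark)."""
    return mean(scores)
```

Peaks in the smoothed profile would then correspond to stretches of consecutive well-conserved columns, such as the sequence motifs listed in the caption.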

Figure 4.

Protein conservation in the family Picornaviridae. For each of the different picornavirus proteins the mean normalized similarity (see Fig. 3 and text) is plotted against the length deviation, where the latter was computed as the standard deviation of the protein length divided by the mean length. For the main figure (in black), lengths of protein regions L, 1A, 1B, 1C, 1D, 2A, 2B, 2C, 3A, 3B, 3C, and 3D were used (allowing lengths of 0 in cases of absent proteins), whereas lengths in the inset plot (grey) are based on mature proteins (absent proteins were not counted). The same data set used for Fig. 3 was used here.
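The length deviation defined in this caption (standard deviation divided by mean length) can be sketched in Python as follows. The caption does not say whether a population or sample standard deviation was used; the population form is assumed here. The function names and the example lengths are hypothetical and are not taken from the chapter's data.

```python
from statistics import mean, pstdev


def length_deviation(lengths):
    """Standard deviation of protein lengths divided by their mean length.

    Uses the population standard deviation (an assumption; the caption does
    not specify which estimator was applied).
    """
    m = mean(lengths)
    if m == 0:
        raise ValueError("mean length is zero; deviation is undefined")
    return pstdev(lengths) / m


def present_only(lengths):
    """Drop zero lengths, mirroring the inset plot where absent proteins are not counted."""
    return [n for n in lengths if n > 0]


# Hypothetical lengths of one protein region across four viruses;
# 0 marks a virus in which the protein is absent.
example = [0, 150, 150, 150]
main_plot_value = length_deviation(example)            # zeros allowed
inset_value = length_deviation(present_only(example))  # absent proteins excluded
```

With zeros included, a protein that is missing from some viruses gets a larger deviation than when only the viruses that encode it are considered, which is the distinction between the main and inset plots.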

Figure 5.

Conservation and diversity of genetic plans of the order Picornavirales. Genomic organizations for seven viruses are shown (similar to Fig. 2) and represent polyprotein layouts of families of the Picornavirales. Different shapes and shades were used to highlight protein families found in all or several virus families. Borders of proteins were identified using the GenBank annotation where available. Otherwise, positions were estimated utilizing homology searches (HMMer) against profiles of the picornavirus proteins. The viruses included are Strawberry latent ringspot virus (Sadwavirus), Maize chlorotic dwarf virus (Sequivirus), Patchouli mild mosaic virus (Comovirus), Deformed wing virus (Iflavirus), Kashmir bee virus (Dicistrovirus), Heterosigma akashiwo RNA virus (Marnavirus), and encephalomyocarditis virus (Picornavirus). For Sadwavirus and Comovirus the two RNA segments are shown.

Table 1.

The picornavirus proteome: function, structure, and evolution


