
Category: Microbial Genetics and Molecular Biology
R2 and Related Site-Specific Non-Long Terminal Repeat Retrotransposons
Abstract:
R2 elements are unusual mobile elements because all copies insert into a specific site within the genome, the 28S rRNA genes. The target-primed reverse transcription (TPRT) reaction characterized for R2 elements now serves as the model for studying the mechanism of all non-long terminal repeat (LTR) retrotransposons, and has also been shown to be similar to critical steps in group II intron retrohoming. Among the first eukaryotic genes to be cloned molecularly were the rRNA genes of Drosophila melanogaster. R1 elements were similar in coding capacity to L1 and I elements in that they contained two open reading frames (ORFs). A genomic DNA blotting approach was used initially to score for the presence of R1 and R2 insertions in the 28S rRNA genes of 47 species from nine insect orders. More recently, PCR has been used as the favored assay for the presence of R2 and R1 elements within insect species. In this approach degenerate oligonucleotide primers, designed to anneal to highly conserved regions within the RT-encoding region of the elements, are used in combination with a second oligonucleotide that anneals to invariant 28S gene sequences approximately 700 bp downstream of the insertion sites. In summary, R1 and R2 elements have been identified in every major lineage of arthropods examined to date. All R2 elements insert in the 28S gene at a site exactly 74 bp upstream of all R1 insertions.
Organization of R1 and R2 elements in the rDNA loci of arthropods. The rDNA loci of most arthropods contain hundreds of tandemly repeated rDNA units. Each rDNA unit contains an 18S, a 5.8S, and a 28S rRNA gene (filled boxes) separated by internal transcribed spacers (open boxes). R1 and R2 elements insert into specific sites in a fraction of the 28S genes. Individual 28S genes can contain either or both insertions. Based on the 3′ junction of each element, the R1 insertion site is 74 bp downstream of the R2 insertion site. Shown at the bottom of the figure is an expanded view of an individual rDNA unit containing both R1 and R2 elements. The R1 element encodes two ORFs (shaded boxes) flanked by untranslated regions (white boxes). The second ORF encodes an APE and an RT domain. The R2 element encodes a single ORF with a centrally located RT domain and a C-terminal endonuclease domain (EN).
Phylogeny of arthropod R2 elements. (A) Phylogeny of R2 elements from species throughout arthropods. A 50% consensus tree based on the neighbor-joining method is shown with bootstrap values indicated. The sequences used for the phylogeny are the 450-amino-acid carboxyl-terminal regions of the R2 ORF as described by Burke et al. (10, 11). The tree was rooted using the R4 element of nematodes (12). Multiple families of R2 are found in some species, with each family given a letter classification. All R2 elements derived from species of Drosophila are located within the boxed portion of one branch. (B) Phylogeny of R2 elements in Drosophila. A 50% consensus tree based on the neighbor-joining method is shown with bootstrap values indicated. The sequences used for this phylogeny are the 510 nucleotides corresponding to the terminal regions of the R2 ORF as described by Lathe and Eickbush (57). Shown at the right are the nine species groups of Drosophila from which R2 sequences were derived. All species are from the Sophophora and Drosophila subgenera.
Rate of sequence substitution in the evolution of R1 and R2. The x axis represents estimates of the host species divergence and the y axis represents calculations of the amino acid divergence of the RT domain of the R1 or R2 element. In some cases the divergences are based on partial RT domains (∼50% of the domain) that have been calibrated to the complete RT divergence scale using at least two outgroup comparisons. All R2 elements are those shown in Fig. 2, whereas R1 elements are derived from published reports (11, 21, 56). Species divergence times are described in Malik et al. (64) and can be summarized as follows: simulans species complex, 1 million years (MYR); simulans complex versus melanogaster, 2.5 MYR; melanogaster versus yakuba, 6 MYR; melanogaster versus erecta, 12 MYR; melanogaster group versus obscura group, 25 MYR; Drosophila subgenus versus Sophophora subgenus, 45 MYR; different insect orders, 200 to 300 MYR; different arthropod classes, 400 to 600 MYR. Data points at the upper left represent the different element lineages in the same species.
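The calibration points listed in this caption can be turned into a rough rate estimate. The following is a minimal sketch, assuming the rate is simply pairwise RT-domain amino acid divergence divided by host divergence time (no correction for multiple substitutions, and no halving for the two lineages). The dictionary keys and the function name `rate_range` are illustrative, and the divergence value in the usage line is a made-up input, not a measurement from the figure.

```python
from typing import Dict, Tuple

# Host divergence times in millions of years (MYR), as listed in the caption.
# Single estimates are stored as (t, t); ranges as (low, high).
DIVERGENCE_MYR: Dict[str, Tuple[float, float]] = {
    "simulans species complex": (1.0, 1.0),
    "simulans complex vs melanogaster": (2.5, 2.5),
    "melanogaster vs yakuba": (6.0, 6.0),
    "melanogaster vs erecta": (12.0, 12.0),
    "melanogaster group vs obscura group": (25.0, 25.0),
    "Drosophila vs Sophophora subgenus": (45.0, 45.0),
    "different insect orders": (200.0, 300.0),
    "different arthropod classes": (400.0, 600.0),
}


def rate_range(percent_divergence: float, comparison: str) -> Tuple[float, float]:
    """Return (min, max) pairwise divergence per MYR of host separation.

    The oldest divergence estimate gives the slowest rate and the
    youngest estimate gives the fastest rate.
    """
    if percent_divergence < 0:
        raise ValueError("divergence must be non-negative")
    youngest, oldest = DIVERGENCE_MYR[comparison]
    return (percent_divergence / oldest, percent_divergence / youngest)


if __name__ == "__main__":
    # Hypothetical input: 12% RT-domain divergence between two elements.
    print(rate_range(12.0, "melanogaster vs yakuba"))
```

Returning a range rather than a single number keeps the uncertainty in the older calibration points (insect orders, arthropod classes) visible in the result.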
All R2 elements have the same structure. The 5′ and 3′ UTRs at either end (open boxes) differ in length. R2 elements of half of the species end in a poly(A) tail (abbreviated An), whereas the others have no repeat sequences at their 3′ ends. The single ORF in each R2 element has been shaded with the RT and EN indicated. The solid bar near the N-terminal end of each element represents a C2H2 zinc-finger motif. The R2 elements of the horseshoe crab and the jewel wasp (B family) have three such C2H2 motifs. Shown with diagonal shading is a region that has sequence similarity to a c-myb DNA-binding motif. Only the regions upstream of the zinc-finger domains and between the c-myb and RT domains show length variation. The characterized jewel wasp R2 element (family A) (10) appears to have undergone a recent recombination with an R1 element and is not included here.
TPRT model for R2 retrotransposition. (A) The R2 protein recognizes the RNA secondary structure of the R2 3′ UTR. The active protein complex is probably a dimer, with the R2 RNA a necessary cofactor for dimer formation. The RNA template used in the TPRT reaction is probably a 28S rRNA/R2 cotranscript. All initial steps of the TPRT reaction can be studied in vitro with purified R2 protein, R2 RNA, and DNA target (61). The R2 protein cleaves the first (primer) strand of the target site and uses the released 3′-hydroxyl of the terminal nucleotide to prime reverse transcription starting at the 3′ R2/28S junction on the RNA template. Cleavage of the second (nonprimer) strand occurs after reverse transcription. Thick lines, DNA target sequences; thin wavy lines, RNA sequences corresponding to the R2 element; thick wavy lines, flanking rRNA sequences on the RNA template; thin straight line, newly synthesized cDNA strand. (B) Location of the first- and second-strand DNA cleavages at the target site. The 3′ ends of integrated R2 elements are precise and consistent with the initial nick used for TPRT. The attachment of the cDNA to the upstream target site cannot be studied in vitro. The process of 5′ attachment generates considerable sequence variation at the junction of the R2 element with the 28S gene. A possible mechanism for this 5′ attachment is described in Fig. 6.
Template-switch model for the attachment of the R2 element to the upstream target sequence. Extensive sequence variation is observed at the 5′ end of R2 elements, suggesting considerable flexibility in the attachment process. To explain this sequence variation, the reverse transcriptase is proposed to extend along the RNA template to different positions (positions A, B, and C) before switching (jumping) to the DNA sequences of the target site. Thick lines, DNA target sequences; thin wavy lines, RNA sequences corresponding to the R2 element; thick wavy lines, flanking rRNA sequences on the RNA template; thin straight line, newly synthesized cDNA strand. Shown at the bottom are the potential 5′ junctions created by template jumps at positions A, B, and C. The means by which the RNA template is removed and the second strand of DNA synthesized are not known. In most events the R2 reverse transcriptase jumps from the R2 RNA template to the cleaved target DNA near the R2/28S gene junction (position A). Jumps at this location generate full-length R2 insertions that may have precise junctions, or small deletions and additional (nontemplated) bases, depending on whether the jump is to the terminal nucleotide of the upstream sequence and how readily the target DNA is engaged by the reverse transcriptase. If reverse transcription proceeds beyond the 5′ junction (position B) and then jumps to the free end of the target DNA (arrow 1), a tandem duplication of the target DNA is generated. If the jump at position B is to a homologous position in the upstream sequences (arrow 2), the template jump is similar to that in the retrotransposition of retroviruses and will result in precise 5′ junctions. Finally, if the reverse transcriptase jumps before reaching the end of the R2 sequence, 5′-truncated elements are generated. Sequence variation similar to that seen with full-length R2 elements is found with 5′-truncated elements.
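The outcomes described in this caption amount to a small decision table, which can be sketched as follows. This is only an illustrative encoding of the model, not code from the chapter: the names `Jump`, `Landing`, and `predicted_junction` are hypothetical, and assigning the truncation case to position C is inferred by elimination, since the caption names only positions A and B explicitly.

```python
from enum import Enum
from typing import Optional


class Jump(Enum):
    """Where on the RNA template the reverse transcriptase leaves it."""
    A = "near the R2/28S junction"
    B = "beyond the 5' junction, in upstream 28S sequence"
    C = "before reaching the 5' end of R2"  # inferred; not named in the caption


class Landing(Enum):
    """Where on the target DNA the jump lands (relevant to position B)."""
    FREE_END = "free end of the cleaved target DNA"        # arrow 1
    HOMOLOGOUS = "homologous position in upstream target"  # arrow 2


def predicted_junction(jump: Jump, landing: Optional[Landing] = None) -> str:
    """Return the 5' junction type the model predicts for a template jump."""
    if jump is Jump.A:
        return ("full-length insertion; junction precise, or with small "
                "deletions and nontemplated bases")
    if jump is Jump.B:
        if landing is Landing.FREE_END:
            return "full-length insertion with tandem duplication of target DNA"
        if landing is Landing.HOMOLOGOUS:
            return "full-length insertion with precise 5' junction"
        raise ValueError("position B needs a landing site (arrow 1 or arrow 2)")
    if jump is Jump.C:
        return "5'-truncated insertion"
    raise ValueError(f"unknown jump position: {jump!r}")


if __name__ == "__main__":
    for j, l in [(Jump.A, None), (Jump.B, Landing.FREE_END),
                 (Jump.B, Landing.HOMOLOGOUS), (Jump.C, None)]:
        print(j.name, "->", predicted_junction(j, l))
```

The landing site is only consulted for position B, because that is the only case in which the caption distinguishes two outcomes by where the jump lands.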
Schematic diagram of the level of sequence similarity across the R2 ORF. Plotted is a sliding 15-amino-acid window of the combined amino acid similarity of the completely sequenced R2 elements to that from D. melanogaster. Identical amino acids are scored as +1, chemically similar amino acids are scored as -0.5, and indels are given a penalty of -0.5. Shown above the graph is a schematic diagram of the R2 ORF with the boundaries of the three R2 domains indicated. The central peaks of sequence similarity (numbered 0 to 9) are found in all non-LTR elements and are believed to define the boundaries of the RT domain (10, 64). The seven segments extensively used for the phylogenetic analysis of all retroelements (25, 90) are shown with darker shading. Shown below the graph are regions of the N- and C-terminal domains of the R2 ORF that provide clues to the function of these domains. Black boxes, conserved in all R2 elements; gray boxes, similar residues in at least seven R2 elements. The N-terminal domain contains a C2H2 motif and a c-myb motif. These motifs suggest that the N terminus of the R2 protein is involved in DNA recognition. The C-terminal domain contains a CCHC motif, and a segment similar to the active site of certain restriction enzymes, abbreviated as PD..D. Mutagenesis of these residues demonstrates that they represent part of the active site of the R2 endonuclease (93). Two other highly conserved amino acid motifs (RHD and K..Y) are also found in all R2 elements, but the sequences are not shown. Because the N-terminal C2H2 and c-myb motifs cannot bind the entire region required for cleavage, the C-terminal domain is also likely involved in DNA binding.
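The sliding-window score described in this caption could be computed along the following lines. This is a minimal sketch for one aligned element against the D. melanogaster reference, not the authors' procedure: the function names are hypothetical, the chemical-similarity groups are an assumed grouping (the caption does not specify one), and the three scores are exposed as parameters whose defaults are the values as printed in the caption. Combining several elements, as in the figure, would mean summing or averaging these per-element profiles.

```python
from typing import List

# Assumed chemical-similarity groups; the caption does not say which
# grouping was used, so this one is purely illustrative.
SIMILAR_GROUPS = ["ILVM", "FWY", "KRH", "DE", "NQ", "ST", "AG"]


def column_score(ref: str, other: str,
                 identical: float, similar: float, indel: float) -> float:
    """Score a single aligned column ('-' marks a gap)."""
    if ref == "-" or other == "-":
        return indel
    if ref == other:
        return identical
    if any(ref in group and other in group for group in SIMILAR_GROUPS):
        return similar
    return 0.0  # dissimilar residues contribute nothing


def sliding_similarity(ref_aln: str, other_aln: str, window: int = 15,
                       identical: float = 1.0, similar: float = -0.5,
                       indel: float = -0.5) -> List[float]:
    """Sum column scores over every window of `window` aligned positions."""
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned sequences must have equal length")
    if window < 1:
        raise ValueError("window must be at least 1")
    scores = [column_score(a, b, identical, similar, indel)
              for a, b in zip(ref_aln.upper(), other_aln.upper())]
    # One value per window start; empty if the alignment is shorter than the window.
    return [sum(scores[i:i + window])
            for i in range(len(scores) - window + 1)]


if __name__ == "__main__":
    # Toy alignment with a window of 2 for readability.
    print(sliding_similarity("ACDE", "AC-E", window=2))
```

Keeping the scores as keyword arguments makes it easy to rerun the profile under a different weighting without touching the windowing logic.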
Diagram of other non-LTR retrotransposons with domain structures similar to that of R2. All elements show target-site specificity to either the 28S gene, spliced leader exons, or TAA repeats within the host’s genome. All elements encode a similar C-terminal domain with CCHC and PD..D motifs, and two other highly conserved amino acid motifs (RHD and K..Y) found in all R2 elements (66, 93). The N-terminal domain of these other elements is variable. CRE/SLACS and NeSL elements contain two C2H2 motifs (black bars), but have longer N-terminal extensions of unknown function. NeSL elements contain a putative cysteine protease domain (PRO). The presumed taxonomic ranges of the elements are shown at the right.