Chapter 48: Evolution of DNA Transposons in Eukaryotes

The availability of genomic sequences from all kingdoms of life, together with sophisticated search algorithms, has revolutionized our ability to detect distantly related transposons purely on the basis of sequence similarity, improving our understanding of their macroevolutionary trends. This chapter focuses on DNA transposons in eukaryotic genomes, particularly the four available genomes from plants and animals. Phylogenetic analysis of such a megafamily is fraught with difficulties because only the shared D,D35E domain can be employed, and even then the alignment of many of the intervening amino acids is uncertain without structural information, which is available only for several integrases and the Tn transposase. An interesting feature of this superfamily is that the transposase genes of some members have occasionally been recruited to perform host functions. The genomic sequences from and have also revealed many sequences encoding related proteins. Bacterial transposons experience a very different host environment from those in eukaryotes, and even among eukaryotes there is huge variation in these dynamics owing to the vagaries of the transposon-host interaction. The chapter focuses on what recent eukaryotic genomic sequences tell us about the history of DNA transposons. Many other DNA transposons exist at relatively low copy numbers, for example, in filamentous fungi, or have multiple lineages in particular hosts, for example, the -like elements of plants; these situations, however, require extensive analysis of large sequence sets within species and consideration of the evolution of the transposon family across species.

Citation: Robertson H. 2002. Evolution of DNA Transposons in Eukaryotes, p 1093-1110. In Craig N, Craigie R, Gellert M, Lambowitz A (ed), Mobile DNA II. ASM Press, Washington, DC. doi: 10.1128/9781555817954.ch48

Figure 1.

Relationships within the / superfamily rooted with the IS grouping of bacterial and fungal transposons. Subfamilies of elements are indicated. Bacterial and plant sequence names are in plain text, fungal names are underlined, nematode names are in italic, insect names are in boldface, and vertebrate names are in bold italic. Numbers above the major branches indicate the percentage of 1,000 bootstrap replications in which that branch was present. Transposase sequences were aligned using CLUSTAL X, and the tree was constructed using neighbor joining in PAUP* version 4.0b4(PPC) with distances corrected for multiple replacements by TREE-PUZZLE v5.0 using the BLOSUM62 matrix and maximum likelihood with uniform rates.
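The bootstrap percentages on the branches can be illustrated with a minimal Python sketch. It is not the PAUP* procedure used for the figure; it only shows how support might be tallied once replicate trees are in hand. The function names, the representation of each tree as a collection of taxon sets (one per internal branch), and the four-taxon example are all assumptions made for illustration.

```python
def normalize(split, taxa, anchor):
    """Represent a branch by the side of its split that excludes `anchor`.

    An internal branch of an unrooted tree divides the taxa into two sets;
    always keeping the side without the anchor taxon makes splits comparable.
    """
    side = frozenset(split)
    return taxa - side if anchor in side else side


def bootstrap_support(reference_splits, replicate_trees, taxa):
    """Percentage of replicate trees that contain each reference branch."""
    taxa = frozenset(taxa)
    anchor = min(taxa)  # arbitrary but fixed reference taxon
    replicates = [
        {normalize(s, taxa, anchor) for s in tree} for tree in replicate_trees
    ]
    support = {}
    for split in reference_splits:
        key = normalize(split, taxa, anchor)
        hits = sum(key in rep for rep in replicates)
        support[key] = 100.0 * hits / len(replicates)
    return support


# Hypothetical example: four taxa, one branch of interest, four replicates.
taxa = {"A", "B", "C", "D"}
reference = [{"A", "B"}]
replicate_trees = [[{"A", "B"}], [{"C", "D"}], [{"A", "C"}], [{"A", "B"}]]
print(bootstrap_support(reference, replicate_trees, taxa))
```

In this toy case the branch separating A and B from C and D appears in three of the four replicates, so its support is 75%. With 1,000 replicates, as in the figure, the same tally gives the percentages printed above the branches.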

Figure 2.

Host genes derived from the mammalian transposons. The tree includes the available transposase translations of the to consensus sequences (from the humrep.ref file of RepBase 6.1 at RepBase Update; http://charon.girinst.org/server/RepBase/) and eight genes in mammalian genomes apparently derived from the / transposons. The mammalian CENP-B proteins were used to root the tree, based on their apparent antiquity and the presence of homologs in all multicellular eukaryotes. The C-terminal regions, which can be aligned among most of these proteins but not with the CENP-B protein, were excluded; otherwise, see the Fig. 1 legend for phylogenetic methods.

Figure 3.

Relationships within the hAT superfamily. Clear groupings of transposons are indicated. The tree was rooted at the midpoint in the absence of a convincing outgroup; otherwise, see the Fig. 1 legend for phylogenetic methods and typeface treatment. The published translation for was extended to full length. is not present because no consensus sequence capable of encoding a transposase is available; the other consensus sequences generally still have many ambiguous positions (RepBase 6.1).

Figure 4.

Relationships of 54 copies in the human genome. The various copies are indicated by the accession number of the sequenced clone containing them. The tree is based on full-length DNA sequences rooted by the hydra and beetle consensus sequences. Phylogenetic methods are as given in the Fig. 1 legend, except that the HKY model was used to correct DNA distances.

Figure 5.

Relationships of 24 copies in the human genome. In the absence of a close relative, the tree was rooted at the midpoint. See the Fig. 4 legend for details.

Figure 6.

Relationships of the 66 copies in the nematode genome. Copies are named for the cosmid or yeast artificial chromosome clone in which they occur. Those in boldface are identical to one of the two consensus sequences, which differ by a nonsynonymous transition; the others that show no sequence differences from the consensus sequences have small indels. Maximum parsimony was employed using PAUP* version 4.0b4a to build the tree because so few changes have occurred that correction for multiple changes is unnecessary; the tree is rooted at the midpoint.

Figure 7.

Relationships of the 46 copies in the nematode genome. See the legends to Figs. 1 and 6 for details.

Figure 8.

Relationships of elements in the genome. The pSX copies are from Merriman et al., while the remainder are indicated by the GenBank accession (preceded with AE00) in which they are found (scaffolds are from the Celera data), with their chromosomal locations, if known, shown after a period. Phylogenetic methods are as given in the Fig. 6 legend; values are as given in the Fig. 1 legend.

