Journal of Heredity Advance Access originally published online on July 1, 2005
Journal of Heredity 2005 96(5):566-571; doi:10.1093/jhered/esi070
Brief Communication
Mutational Bias in Penguin Microsatellite DNA
From the Allan Wilson Centre for Molecular Ecology and Evolution, Institute of Molecular BioSciences, Massey University, Private Bag 102904, North Shore Mail Centre, Auckland, New Zealand
Address correspondence to D. M. Lambert at the address above, or e-mail: d.m.lambert{at}massey.ac.nz.
Abstract
Analysis of nucleotide sequence variation at a microsatellite DNA locus revealed extensive size homoplasy of alleles in Adélie penguins (Pygoscelis adeliae). Variation in the flanking regions at this locus allowed us to discriminate between mechanisms proposed for length changes in microsatellite DNA alleles. We further examined the structure of alleles at the same locus across 11 additional penguin species (Spheniscidae) by mapping allele sequences onto an independent penguin phylogeny. Our analysis suggested that the repeat motifs have evolved independently on several occasions. We observed sequence instability in the region bordering the repeat tract, with a predominantly transversional bias. We propose that this bias results from inaccurate DNA replication owing to the sequence context of this repeat tract. Because regions flanking repeat sequences can exhibit such a mutational bias, we caution against using these regions for phylogeny reconstruction.
Introduction
Microsatellite DNA loci consist of tandemly repeated nucleotide units that are 1 to 6 bp in length. These loci commonly show high levels of length polymorphism and have consequently become the genetic marker of choice for addressing a variety of biological questions, ranging from those specific to individuals, such as identity and sex (e.g., Hagelberg et al. 1991), to parentage and relatedness (e.g., Primmer et al. 1995), population genetic structure (e.g., Roeder et al. 2001), and phylogenetic relationships (e.g., Bowcock et al. 1994). The use of microsatellite DNA data in evolutionary and population genetic studies depends on accurate models of microsatellite evolution. In turn, the development of such models requires a thorough understanding of the mutational processes occurring at microsatellite loci.
The simplest and most commonly used model of microsatellite evolution is the stepwise mutation model (SMM) (Ohta and Kimura 1973). This model assumes that length changes occur through the addition or deletion of a single repeat unit. However, sequence analysis of microsatellite DNA alleles has demonstrated that differences in repeat number are not the only form of variation among alleles at a locus. Alleles can also differ by interruptions, such as point mutations within the repeat, and by nucleotide substitutions and insertions/deletions (indels) in the flanking sequences. This additional variation can result in the products of distinct evolutionary lineages being categorized as alleles of the same size, because alleles are commonly identified by length alone via electrophoretic mobility. This phenomenon is a form of size homoplasy and can be detected by sequencing alleles. Size homoplasy has been found to be a common feature of microsatellite loci and has been observed among individuals from the same population (e.g., Viard et al. 1998; Makova et al. 2000), from different populations of the same species (e.g., Orti et al. 1997; Taylor et al. 1999), and between species (e.g., Garza and Freimer 1996; Angers and Bernatchez 1997). In the past, size homoplasy was widely believed to cause problems in the determination of population structure. However, Estoup et al. (2002) concluded that size homoplasy is not a significant problem for many population genetic analyses, provided that an appropriate model of evolution, such as the SMM (Ohta and Kimura 1973), and a sufficient number of loci are used. Size homoplasy was predicted to be problematic only in situations involving large population sizes, high mutation rates, and strong allele size constraints (Estoup et al. 2002).
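Why the SMM alone generates size homoplasy can be shown by enumerating mutation histories. The Python sketch below assumes a dinucleotide repeat unit (2 bp) and a hypothetical 177-bp ancestral allele; both are illustrative choices, and the function name smm_histories is hypothetical. Each history is a sequence of +1/-1 repeat-unit steps, and the function counts how many distinct histories end at each allele size.

```python
from collections import Counter
from itertools import product

REPEAT_UNIT_BP = 2  # assumed dinucleotide repeat unit, e.g. (CA)n


def smm_histories(ancestral_bp: int, n_mutations: int) -> Counter:
    """Count the distinct SMM mutation histories ending at each allele size.

    Under the stepwise mutation model every mutation adds (+1) or removes (-1)
    exactly one repeat unit, so a history is a sequence of +1/-1 steps.
    """
    outcomes = Counter()
    for steps in product((-1, 1), repeat=n_mutations):
        final_bp = ancestral_bp + REPEAT_UNIT_BP * sum(steps)
        outcomes[final_bp] += 1
    return outcomes


if __name__ == "__main__":
    # Two mutations from a hypothetical 177-bp ancestor.
    for size, n_paths in sorted(smm_histories(177, 2).items()):
        print(f"{size} bp: {n_paths} distinct histories")
```

With two mutations, the 177-bp size is reached by two different histories (gain then loss, or loss then gain). Under a pure SMM such copies would also be identical in sequence, which is why linked variation, such as flanking substitutions or repeat interruptions, is needed to detect homoplasy by sequencing.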
One method of examining the mutational processes occurring at microsatellite DNA loci is to sequence alleles and map them onto a phylogeny constructed from DNA sequence of either the microsatellite flanking region (e.g., Makova et al. 2000) or an independent locus (e.g., Karhu et al. 2000).
In this study, we explored mutational processes and homoplasy at a microsatellite DNA locus within Adélie penguins (Pygoscelis adeliae) and among 11 other species of penguin by sequencing alleles at this locus. The RM3 locus was originally isolated from an Adélie penguin genomic library (Roeder et al. 2001). It has previously been demonstrated to be polymorphic for length in Pygoscelis penguins (Adélie, P. adeliae; Chinstrap, P. antarctica; and Gentoo, P. papua) and monomorphic for length in the remaining 14 penguin species (Roeder et al. 2002).
Materials and Methods
RM3 microsatellite alleles were sequenced from penguin samples that had been extracted and genotyped manually on polyacrylamide gels in two previous studies (Roeder et al. 2001, 2002). Homozygotes were sequenced whenever possible. We sequenced alleles from 27 Adélie penguin samples representing four electromorphs (alleles of identical size but not necessarily identical sequence): 177 bp, 179 bp, 181 bp, and 183 bp in length, excluding primers. We also sequenced four alleles from chinstrap penguins representing one electromorph 177 bp in length, excluding primers. The remaining 10 penguin species examined in this study were monomorphic for an electromorph 169 bp in length, excluding primers.
Additional alleles were sequenced for the following numbers of individuals and species: two little blue (Eudyptula minor), two rockhopper (Eudyptes chrysocome), five yellow-eyed (Megadyptes antipodes), three chinstrap (Pygoscelis antarctica), and one each of emperor (Aptenodytes forsteri), African (Spheniscus demersus), Humboldt (S. humboldti), Galápagos (S. mendiculus), Fiordland (E. pachyrhynchus), erect crested (E. sclateri), and Snares penguins (E. robustus). Twenty-four of the Adélie penguins and all of the samples of other penguin species had previously been genotyped as homozygotes for allele length at the RM3 locus (Roeder et al. 2001). These samples were sequenced directly from polymerase chain reaction (PCR) products. Three of the Adélie penguin samples were heterozygous for allele length, so each of their alleles was cloned before sequencing. We purified PCR products of both homozygotes and heterozygotes with the High Pure PCR Product Purification Kit (Roche). Purified PCR products from heterozygotes were cloned using the pGEM-T Easy Vector System (Promega) according to the manufacturer's instructions. To avoid cloning or sequencing artefacts, at least six clones per allele were sequenced and used to build a consensus sequence for each cloned allele. We sequenced all PCR products in both directions using the BigDye Terminator Cycle Sequencing Kit version 1.1 on an ABI Prism 377.
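Building a consensus from several clones of one allele can be sketched as a per-position majority rule over aligned reads. The text does not state the exact rule used, so this Python sketch is an assumption: the function name majority_consensus, the strict-majority threshold, and the use of "N" for unresolved positions are all hypothetical choices.

```python
from collections import Counter


def majority_consensus(clone_reads: list[str], min_fraction: float = 0.5) -> str:
    """Build a consensus from aligned clone reads of equal length.

    A base is accepted at a position only if it occurs in more than
    `min_fraction` of reads; otherwise the position is written as 'N'
    so that a possible cloning or sequencing artefact is flagged rather
    than silently resolved.
    """
    if not clone_reads:
        raise ValueError("need at least one read")
    length = len(clone_reads[0])
    if any(len(read) != length for read in clone_reads):
        raise ValueError("reads must be aligned to the same length")
    consensus = []
    for column in zip(*clone_reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) > min_fraction else "N")
    return "".join(consensus)


if __name__ == "__main__":
    # Six hypothetical clones; one carries a single-base PCR error (G).
    clones = ["CACACC", "CACACC", "CACACC", "CACACC", "CACGCC", "CACACC"]
    print(majority_consensus(clones))
```

A single erroneous clone is outvoted by the other five, which is the rationale for sequencing several clones per allele.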
Nucleotide sequences of microsatellite DNA alleles were aligned manually in Sequencher 3.1.1 (Gene Codes) by allowing indels to account for the variation in repeat number and minimizing the number of base mutations in the repeat. Flanking regions contained no indels, allowing straightforward alignment.
We mapped the penguin microsatellite DNA allele sequences onto a maximum-parsimony phylogeny constructed previously for penguins from 985 bp of the mitochondrial small- and large-subunit ribosomal RNA genes (Ritchie 2001). This phylogeny is currently recognised as the best available penguin phylogeny (Banks and Paterson 2004).
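One standard way to score a character mapped onto a fixed tree is Fitch parsimony, which gives the minimum number of state changes the character requires on that tree. The text does not say which scoring method was used, so the following Python sketch is a generic illustration. The tree is a rooted binary tree written as nested tuples, and the tip names and states in the example are hypothetical, not the Ritchie (2001) topology or the RM3 data. A character needing more changes than its number of states minus one must have arisen more than once on that tree (homoplasy).

```python
from typing import Union

Tree = Union[str, tuple]  # a tip label, or a (left, right) pair of subtrees


def fitch(tree: Tree, states: dict[str, str]) -> tuple[set[str], int]:
    """Return (candidate state set at this node, minimum number of changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # Disjoint child sets: one change is needed on one of the two branches.
    return left_set | right_set, left_cost + right_cost + 1


if __name__ == "__main__":
    tips = {"a": "2C", "b": "4C", "c": "2C", "d": "4C"}
    # Same tip states on two hypothetical topologies.
    print(fitch((("a", "b"), ("c", "d")), tips)[1])  # scattered states
    print(fitch((("a", "c"), ("b", "d")), tips)[1])  # clustered states
```

With identical tip states, the scattered arrangement requires two changes (homoplasy for a two-state character), whereas the clustered arrangement requires only one. This is why the conclusion depends on the tree being independent of the character being mapped.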
Results
Eight Adélie penguin samples and one yellow-eyed penguin sample that were homozygous for length exhibited putative point heterozygosity; that is, two different nucleotide bases were present at equal frequency at one or more nucleotide positions in either the repeat or the flanking region. In most cases, the observed heterozygosity occurred at nucleotide positions that varied among the different sequence alleles (Tables 1 and 2). These individuals probably carried two alleles of the same length but different sequence and were excluded from further analyses. Additionally, we omitted the first 18 bp of the 5' end of the alleles of all species because clean sequence could not be obtained for some alleles with the reverse RM3 primer. This left 136 bp of flanking region: 18 bp preceding the microsatellite repeat region, which we have arbitrarily termed the 5' end, and 118 bp following the repeat, which we have called the 3' end.
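The criterion for putative point heterozygosity (two bases at roughly equal frequency at one position) could be screened for automatically. This Python sketch assumes per-position signal intensities for each base are available, for example exported from sequencing traces; the data layout, the function name flag_double_peaks, and the 0.5 secondary-to-primary peak ratio are hypothetical choices, not values from this study.

```python
def flag_double_peaks(signals: list[dict[str, float]], min_ratio: float = 0.5) -> list[int]:
    """Return 0-based positions where the second-highest base signal is at
    least `min_ratio` of the highest, i.e. candidate heterozygous sites."""
    flagged = []
    for pos, peaks in enumerate(signals):
        ranked = sorted(peaks.values(), reverse=True)
        if len(ranked) >= 2 and ranked[0] > 0 and ranked[1] / ranked[0] >= min_ratio:
            flagged.append(pos)
    return flagged


if __name__ == "__main__":
    # Hypothetical intensities for three positions; position 1 shows A and C
    # at similar height, the pattern excluded from analysis in this study.
    trace = [
        {"A": 900, "C": 20, "G": 15, "T": 10},
        {"A": 480, "C": 510, "G": 12, "T": 9},
        {"A": 5, "C": 60, "G": 3, "T": 850},
    ]
    print(flag_double_peaks(trace))
```

Flagged individuals would still need inspection by eye, since dye-specific peak heights vary and a fixed ratio is only a heuristic.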
RM3 Sequences in Adélie Penguins
We identified 14 different sequence alleles within the four Adélie penguin electromorphs (Table 1). These sequence alleles differed by mutations in the repeat region and/or in the flanking sequence. The alleles were grouped into two genealogical groups based on two sites at which mutations had occurred (Figure 1). Group One was defined by a (TA)2 at the 5' end of the microsatellite repeat and a T at character 56, and contained five different 177-bp allele sequences and one 181-bp allele. Group Two was defined by a single TA at the 5' end of the microsatellite and a C (or, in one case, an A) at character 56, and contained four different 177-bp allele sequences, one 179-bp allele, and one 181-bp allele. Within each group, alleles of the same length differed by point mutations in the 3' flanking region. Allele 177j appeared to be intermediate between the two groups because it combined a diagnostic site from each group: a single TA at the 5' end of the microsatellite and a T at character 56 in the 3' flank.
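The grouping rule for Adélie penguin alleles combines two diagnostic features: the number of TA units at the 5' end of the repeat and the base at flanking character 56. A minimal Python sketch of that rule follows, assuming each allele has already been reduced to these two features; the function name genealogical_group and the input encoding are hypothetical. The feature values in the example are those given in the text for alleles 181a, 181b, and 177j.

```python
def genealogical_group(ta_units_5prime: int, base_at_56: str) -> str:
    """Assign an RM3 allele to a group from its two diagnostic sites.

    Group One: (TA)2 at the 5' end of the repeat and T at character 56.
    Group Two: a single TA and C (or, in one allele, A) at character 56.
    Any other combination is returned as 'intermediate' (e.g. allele 177j:
    a single TA with T at character 56).
    """
    base = base_at_56.upper()
    if ta_units_5prime == 2 and base == "T":
        return "Group One"
    if ta_units_5prime == 1 and base in {"C", "A"}:
        return "Group Two"
    return "intermediate"


if __name__ == "__main__":
    # Diagnostic features as described for three alleles in the text.
    alleles = {"181a": (2, "T"), "181b": (1, "C"), "177j": (1, "T")}
    for name, features in alleles.items():
        print(name, genealogical_group(*features))
```

The two 181-bp alleles fall into different groups, which is the size homoplasy argued for in the Discussion.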
The RM3 Locus in Other Penguin Species
We sequenced alleles at the RM3 locus in 20 individuals representing 11 other penguin species (Table 2). Apart from chinstrap penguins, which were monomorphic for a 177-bp electromorph (and Gentoo penguins, which were not sequenced in this study), all of these species were monomorphic for a 169-bp electromorph at this locus. Although some allele sequences were shared between species (for example, yellow-eyed 1 and rockhopper 2; Table 2), there were also a number of sequence differences among species in both the flanking sequence and the repeat region. These alleles could be separated into three groups based on the number of cytosine residues in the repeat region (Figure 2). As in Adélie penguin RM3 alleles, the number of cytosine nucleotides at the 3' end of the microsatellite repeat region was two (Group A), four (Group B), or six (Group C). The exception was the allele sequenced in the little blue penguin, which had five cytosine nucleotides preceded by a thymine nucleotide. Mapping the forms of the RM3 repeat region onto an independent penguin phylogeny (Figure 2) showed that they are not clustered but scattered across the tree.
Discussion
Our findings illustrate that sequencing penguin microsatellite DNA alleles in a number of closely related species can give insight into the complex mutational processes occurring at these loci, including the occurrence of homoplasy, mutational mechanisms, and flanking sequence instability.
Homoplasy and Mutational Mechanisms of Penguin Microsatellite Alleles
The DNA repeat organization of RM3 alleles detected in this study showed that genetic diversity at this locus is not purely a result of length changes in the microsatellite repeat region. Size homoplasy is inferred for Adélie penguin RM3 alleles because changes in repeat number appear to have occurred in alleles with different evolutionary histories (Table 1). For example, the two 181-bp alleles at the RM3 locus belonged to different genealogical groups: the 181a allele has a (TA)2 at its 5' end and a T at position 56 in the flanking region, whereas the 181b allele has a single TA at its 5' end and a C at position 56. Scoring alleles at the RM3 locus by size alone would therefore underestimate the level of genetic diversity at this locus.
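The effect of size-only scoring on diversity estimates can be illustrated by computing expected heterozygosity (1 minus the sum of squared allele frequencies) twice: once with alleles identified by length and once by sequence. The Python sketch below uses a hypothetical sample of gene copies chosen only to show the direction of the bias; the labels and counts are not the frequencies in Table 1.

```python
from collections import Counter


def expected_heterozygosity(alleles: list[str]) -> float:
    """Gene diversity without small-sample correction: 1 - sum(p_i ** 2)."""
    n = len(alleles)
    counts = Counter(alleles)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


if __name__ == "__main__":
    # Hypothetical gene copies, each labelled "<size>-<sequence allele>".
    copies = ["181-a", "181-a", "181-b", "181-b", "177-c", "177-d"]
    by_size = [label.split("-")[0] for label in copies]

    print(len(set(by_size)), len(set(copies)))  # alleles counted: 2 vs 4
    print(round(expected_heterozygosity(by_size), 3),
          round(expected_heterozygosity(copies), 3))
```

Here size scoring collapses four sequence alleles into two electromorphs and lowers the diversity estimate from about 0.72 to about 0.44. The direction of this bias holds whenever distinct sequence alleles share a size, although its magnitude depends on the real frequencies.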
The separation of the Adélie penguin sequence alleles into two groups by markers at the 5' and 3' flanking regions suggests that the majority of length change mutations at the RM3 locus resulted from DNA slippage during replication (Levinson and Gutman 1987) rather than recombination (Harding et al. 1992). However, one allele at the RM3 locus has an intermediate sequence pattern (177j). It could have originated in three possible ways: via recombination, by a back mutation in either the TA repeat or at character 56, or as an ancestral intermediate mutational stage between the two groups.
Instability and Biased Mutation at the End of the RM3 Microsatellite DNA Sequence
A notable observation from Table 1 is the variability in the number of cytosine nucleotides at the 3' end of the Adélie penguin RM3 microsatellite repeat. These cytosines are always present in multiples of two, suggesting that they result from a transversion between an A and a C in the microsatellite repeat, rather than from slippage during replication of the cytosine nucleotides: an A→C change in the terminal CA unit adds both the new C and the C that precedes it to the cytosine run. Whether the direction of mutation is A→C or C→A is uncertain because the ancestral state is unknown. Assuming that the 3' flanking region mutation at character 56 occurred only once, the change in the number of cytosine nucleotides has occurred independently in both genealogical groups of alleles. This variation in the number of cytosine nucleotides is also present in the majority of other penguin species, resulting in three forms of allele (A, B, and C). These other penguin species are monomorphic for an electromorph of 169 bp, further supporting the hypothesis that the change in the number of these cytosine nucleotides was generated by substitution rather than slippage, because slippage would be expected to change allele length.
A mutation at nucleotide site 134, separating all Adélie and chinstrap penguin alleles from those of the other penguin species, suggests that the variation in the number of cytosine nucleotides is homoplastic between the Pygoscelis group members (i.e., Adélie and chinstrap penguins) and the remaining penguin species in our study. If the mutation at character 134 occurred only once, it must have arisen in a 2C, 4C, or 6C allele, with the remaining two states subsequently regenerated in parallel.
The scattered distribution of the 2C, 4C, and 6C alleles over the phylogeny of the non-Pygoscelis penguins (Figure 2) is also likely to be a result of the independent evolution of the same repeat motif on several occasions. However, an absence of phylogenetically informative flanking region variation in RM3 means that ancestral polymorphism cannot be ruled out. In addition, only a small number of individuals from each species were sampled, so the possibility exists that all forms of the 169-bp allele are present in each species but that not all were detected.
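The limited sampling noted above can be quantified under simple assumptions: diploid individuals, random sampling, and Hardy-Weinberg proportions, so that n individuals carry 2n independent gene copies. The probability of sampling at least one copy of an allele at frequency p is then 1 - (1 - p)^(2n). This Python sketch uses a hypothetical allele frequency of 0.1; the function name detection_probability is illustrative.

```python
def detection_probability(p: float, n_individuals: int) -> float:
    """P(at least one copy of an allele at frequency p among 2n gene copies).

    Assumes diploidy, random sampling, and Hardy-Weinberg proportions.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    return 1.0 - (1.0 - p) ** (2 * n_individuals)


if __name__ == "__main__":
    # Hypothetical allele frequency of 10%.
    for n in (1, 2, 5):
        print(n, round(detection_probability(0.1, n), 3))
```

With one individual sampled, as for several species here, an allele at 10% frequency would be detected only about 19% of the time (about 34% with two individuals and 65% with five). The absence of a form in a species is therefore weak evidence that the form is absent from that species.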
The data suggest that substitutions occur more often at the 3' end of the RM3 CA repeat than in the middle of the repeat region. Instability at the border of (CA)n microsatellite alleles has been observed in previous studies (Grimaldi and Crouau-Roy 1997; Brohede and Ellegren 1999). However, neither of these studies reported a mutational bias toward any particular nucleotide base, as appears to be present at the RM3 locus. In addition, none of the models proposed by Brohede and Ellegren (1999) to explain this instability accounts for why the same mutation, an A→C transversion, has occurred multiple times at the RM3 locus in each Adélie penguin allele group, and possibly also in the different penguin species.
There are two mutational mechanisms related to the "microsequence context" (Zavolan and Kepler 2001) that could explain the observed mutation directionality at the RM3 locus. In both models, the transversional bias at the RM3 locus is explained by the base substitution being templated by the adjacent guanine nucleotide. First, if the mutations are A→C transversions, an explanation may be provided by crystallographic studies of DNA duplexes. Timsit (1999) suggested that shifted base pairing can occur during replication of (CA)n tracts when nucleotide bases pair with their direct 5' neighbors on the opposite strand, rather than with their Watson-Crick complements. Incorrect nucleotide incorporation templated by the next nucleotide, followed by realignment, could explain the high rate of replication errors at these loci. Second, the observed mutations are also consistent with dislocation mutagenesis, reviewed in Bebenek and Kunkel (2000). Under this model, the template transiently misaligns (slips) during replication, a nucleotide is correctly incorporated opposite the misaligned template base, and the DNA strands then realign, leaving a terminal mismatch.
Finally, a widely used approach in contemporary evolutionary biology is to map characters onto a phylogeny to better understand their evolutionary histories. A central assumption of this approach is that the phylogeny used is accurate and independent of the character(s) being studied. For microsatellite DNA loci this approach has been widely used, in many instances with phylogenies derived from flanking sequences (Makova et al. 2000; Orti et al. 1997; Rossetto et al. 2002). Our study suggests the need for caution when using sequences that immediately flank microsatellite DNA loci as phylogenetic markers, because of underlying mutational biases that may occur in these regions. In such cases, the phylogeny used may not be independent of the loci being investigated.
Acknowledgments
We thank the Marsden Fund of New Zealand (96-MAU-ALS-0030) and Centres of Research Excellence Fund. L.D.S. was supported by a Massey University Masterate Scholarship and a New Zealand Federation of Graduate Women Scholarship. We are grateful to P. Ritchie and A. Roeder for providing technical assistance; L. Perrie, C. S. Baker, and an anonymous reviewer for comments on the manuscript; and C. D. Millar for general advice and comment. We thank the following people for providing samples: G. Elliot, K. Walker, and P. Moore (Department of Conservation, New Zealand); B. Culik (Institut für Meereskunde an der Universität Kiel, Germany); A. Baker (Royal Ontario Museum); C. Hull (University of Tasmania); C. Bradshaw (Otago University); J. González, G. Kooyman (Scripps Institution of Oceanography); J. Darby (Otago Museum), and I. McLean (Otago Museum). We thank Vivian Ward for artwork. Permission to clone Adélie penguin DNA sequence was granted by ERMA permit 97/19.
Footnotes
Corresponding Editor: C. Scott Baker
References
Angers B and Bernatchez L, 1997. Complex evolution of a salmonid microsatellite locus and its consequence in inferring allelic divergence from size information. Mol Biol Evol 14:230–238.
Banks JC and Paterson AM, 2004. A penguin-chewing louse (Insecta: Phthiraptera) phylogeny derived from morphology. Invert Syst 18:89–100.
Bebenek K and Kunkel TA, 2000. Streisinger revisited: DNA synthesis errors mediated by substrate misalignment. Cold Spring Harbor Symp Quant Biol 65:81–91.
Bowcock AM, Ruiz-Linares A, Tomfohrde J, Minch E, Kidd JR, and Cavalli-Sforza LL, 1994. High resolution of human evolutionary trees with polymorphic microsatellites. Nature 368:455–457.
Brohede J and Ellegren H, 1999. Microsatellite evolution: polarity of substitutions within repeats and neutrality of flanking sequences. Proc R Soc Lond B 266:825–833.
Estoup A, Jarne P, and Cornuet JM, 2002. Homoplasy and mutation model at microsatellite loci and their consequences for population genetic analysis. Mol Ecol 11:1591–1604.
Garza J and Freimer NB, 1996. Homoplasy for size at microsatellite loci in humans and chimpanzees. Genome Res 6:211–217.
Grimaldi MC and Crouau-Roy B, 1997. Microsatellite allelic homoplasy due to variable flanking sequences. J Mol Evol 44:336–340.
Hagelberg E, Bell LS, Allen T, Boyle A, Jones SJ, and Clegg JB, 1991. Analysis of ancient bone DNA: techniques and application. Phil Trans R Soc Lond B 333:339–407.
Harding RM, Boyce AJ, and Clegg JB, 1992. The evolution of tandemly repetitive DNA: recombination rules. Genetics 132:847–859.
Karhu A, Dietrich J-H, and Savolainen O, 2000. Rapid expansion of microsatellite sequences in pines. Mol Biol Evol 17:259–265.
Levinson G and Gutman G, 1987. Slipped strand mispairing: a major mechanism for DNA sequence evolution. Mol Biol Evol 4:203–221.
Makova KD, Nekrutenko A, and Baker RJ, 2000. Evolution of microsatellite alleles in four species of mice (Genus Apodemus). J Mol Evol 51:166–172.
Ohta T and Kimura M, 1973. A model of mutation appropriate to estimate the number of electrophoretically detectable alleles in a finite population. Genet Res 22:201–204.
Orti G, Pearse DE, and Avise JC, 1997. Phylogenetic assessment of length variation at a microsatellite locus. Proc Natl Acad Sci USA 94:10745–10749.
Primmer CR, Moller AP, and Ellegren H, 1995. Resolving genetic relationships with microsatellite markers: a parentage testing system for the swallow Hirundo rustica. Mol Ecol 4:493–498.
Ritchie PA, 2001. The evolution of the mitochondrial DNA control region in the Adélie penguins of Antarctica. Massey University, Palmerston North, New Zealand.
Roeder AD, Marshall RK, Mitchelson AJ, Visagathilagar T, McPartlan HC, Murray ND, Kerry KR, Robinson NA, and Lambert DM, 2001. Gene flow on the ice: genetic differentiation among Adélie penguin colonies around Antarctica. Mol Ecol 10:1645–1656.
Roeder AD, Ritchie PA, and Lambert DM, 2002. New DNA markers for penguins. Conserv Genetics 3:341–344.
Rossetto M, McNally J, and Henry RJ, 2002. Evaluating the potential of SSR flanking regions for examining taxonomic relationships in the Vitaceae. Theor Appl Genet 104:61–66.
Taylor JS, Sanny JSP, and Breden F, 1999. Microsatellite allele size homoplasy in the guppy (Poecilia reticulata). J Mol Evol 48:245–247.
Timsit Y, 1999. DNA structure and polymerase fidelity. J Mol Biol 293:835–853.
Viard F, Franck P, Dubois MP, Estoup A, and Jarne P, 1998. Variation of size homoplasy across electromorphs, loci and populations in three invertebrates. J Mol Evol 47:42–51.
Zavolan M and Kepler TB, 2001. Statistical inferences of sequence-dependent mutation rates. Curr Opin Genet Devel 11:612–615.

