Journal of Heredity Advance Access originally published online on December 14, 2004
Journal of Heredity 2005 96(1):40-51; doi:10.1093/jhered/esi005

© 2005 The American Genetic Association

Distribution Properties of Polymononucleotide Repeats in Molluscan Genomes

M. Pérez, F. Cruz, and P. Presa

From the Department of Biochemistry, Genetics and Immunology, University of Vigo, 36810 Spain

Address correspondence to Pablo Presa at the address above, or e-mail: pressa{at}uvigo.es.


    Abstract
A total of 635 DNA sequences from 35 species of mollusks were used as taxonomic support to investigate several distribution features of polymononucleotides in genomic regions of different functionality. We show that all polymononucleotide types in mollusks fit to expectations in exons but not in nonexonic regions, in agreement with a leading role of negative selection on expansions/contractions of transcription-linked poly-(A/T) repeats. The fit of all repeat length types to an exponential decay precludes the existence of a threshold size for replication slippage, a popular but unsatisfactorily explained concept in mutation models for single repeats. The genomic density of poly-(A/T) repeats is not correlated with the DNA content of species, suggesting that the differential density of repeats between species could be better explained by the species-specific performance of its repair mechanisms. This research allows a better understanding of the distribution patterns of single repeats in eukaryotes.


The phylum Mollusca is one of the most evolutionarily prolific groups, and it is expected to become an important model in evolutionary radiation (Schilthuizen 2002) from a molecular perspective. Despite the fact that many basic genomic properties can be addressed in this phylum, molecular genetics in mollusks is far behind the advancements in other eukaryotes. For instance, the 12-fold range in the genome size of mollusks offers an excellent opportunity to test for the correlation between the DNA content of species and several features of their genomes, such as the distribution properties of simple nucleotide repeats. Simple repeats or microsatellites are ubiquitous DNA elements of eukaryotic genomes that consist of short combinations of nucleotide sequences repeated in tandem and usually flanked by nonrepetitious sequences (Litt and Luty 1989). Even though microsatellites have been studied in many taxa, their origin, evolutionary mechanisms, functional properties, and genomic organization are not fully understood. It is assumed that most simple repeats have evolved from frameshift mutations through slipped-strand mispairing during DNA replication or repair (e.g., Kornberg et al. 1964). Interhelical junctions during chromosome alignment, base substitutions, and retrotransposition events can also play an unmeasured role in the generation of microsatellites (Wilder and Hollocher 2001).

It has been shown that the overall frequency of microsatellites varies widely across genomes (Lagercrantz et al. 1993), and recent evidence points to their nonrandom genomic distribution (e.g., Bachtrog et al. 1999). Indeed, a differential abundance of repeats in exonic, intronic, and intergenic regions has been observed in different eukaryotes, suggesting that a common genomic strand slippage mechanism is insufficient to explain microsatellite distributions (Ellegren 2000; Tóth et al. 2000; Young et al. 2000). For instance, microsatellite occurrence in exons seems to be limited by nonperturbation of the reading frame and intolerance to expanding amino acid repeat stretches in the encoded proteins (Katti et al. 2001). Among microsatellite motifs, the simple sequences poly-(A/T) and poly-(G/C) have been shown by means of hybridization to be interspersed repetitive elements of eukaryotic DNA (Epplen et al. 1993). It has been suggested that the overrepresentation of short A/T arrays in Mycoplasma genitalium and in yeast, with very few long arrays, is due to their location in coding regions, which makes them unlikely to arise or expand by slippage disrupting open reading frames (ORFs) (Metzgar et al. 2002). In addition, polymononucleotide tracts in eukaryotes form longer runs in introns and nontranslated regions than would be expected on a random basis (Field and Wills 1998). If, as observed in vitro, the elongation rate decreases considerably when the length of the repeat unit increases (Levison and Gutman 1987), then poly-(A/T) runs should exhibit the extreme mutational spectrum among microsatellite motifs, as has been suggested by the A/T content dependence of slippage rates (Schlötterer and Tautz 1992).

In this study we have investigated the occurrence, distribution, density, and length of polymononucleotide repeats in exons, intron-untranslated regions (UTR), and intergenic DNA. For this research we used a database subset of 635 DNA sequences from 35 molluscan genomes, using a genomic DNA library of Mytilus galloprovincialis as the reference species. In order to test if the selective constraints associated with coding DNA could be detected through analysis of the genomic distribution of polymononucleotides, we statistically compared their occurrence in expression-related DNA windows (exons and introns-UTR) versus selectively relaxed intergenic regions. The species-specific mutational bias for repeat genesis and maintenance (tolerance) was indirectly addressed by comparing the distribution properties of polymononucleotides at the three genomic windows considered. The existence of a cutoff threshold size for replication slippage was studied by testing the abundance and distribution of length types within polymononucleotide motifs. Finally, the hypothesized correlation between genome size and repeat density (e.g., Hancock 1995, 1996) was studied by multiple correlation tests between several polymononucleotide features and genome size.


    Materials and Methods
DNA Data Mining and Control Experiments
In order to assess the frequency of poly-(A/T) and poly-(G/C) tracts in molluscan genomes, we made a systematic survey of published DNA sequences (Appendix 1) using GenBank release 127.0 (http://www3.ncbi.nlm.nih.gov/) (Figure 1). The genomic sequences sampled were 225 for class Bivalvia (subclasses Pteriomorphia and Heteroconchia), 125 for class Cephalopoda (subclass Coleoidea), and 285 for class Gastropoda (subclasses Archaeogastropoda, Caenogastropoda, and Pulmonata). The taxonomic groups were defined according to the European Register of Marine Species (available at http://www.erms.biol.soton.ac.uk/lists/full/Mollusca.shtml). After applying several filtering steps using the BLAST software package (Altschul et al. 1997) and coding sequence (CDS) information, a nonredundant dataset of 456 kb of Mollusca DNA was broken down into three genomic windows—e, exons; i, introns plus transcribed but not translated DNA (5'-UTR and 3'-UTR); and nc, intergenic DNA. To discriminate between genome windows with accuracy, we used the information provided in the CDS feature of the GenBank entries. Exons were assigned by comparison of messenger RNA (mRNA) and genomic sequences. Flanking regions 5' and 3' were defined as the adjacent parts of the DNA outward from the 5' and 3' ends of a gene entry. Anonymous sequences derived from entries containing no CDS line or ORFs were classified as noncoding DNA, as in those entries featuring microsatellite library screening (Metzgar et al. 2000). These genome windows were screened for polymononucleotide repeats (n ≥ 5) using the online program Repeat Finder 4.0 (http://www.genet.sickkids.on.ca/~ali/repeatfinder.html), considering as single motifs the repeats with reverse complements of each other (An/Tn and Gn/Cn). 
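The screening step described above can be illustrated with a minimal sketch. It is not the Repeat Finder program used in the study; it is a regular-expression scan written under the assumption that a tract is any uninterrupted run of at least five identical bases, with reverse-complement runs pooled into one motif as stated in the text. The names `find_tracts` and `count_length_types` and the example sequence are hypothetical.

```python
import re
from collections import Counter

# Minimum run length screened in the text (n >= 5).
MIN_LEN = 5

# One alternative per base, e.g. "A{5,}|C{5,}|G{5,}|T{5,}".
RUN_PATTERN = re.compile("|".join(f"{base}{{{MIN_LEN},}}" for base in "ACGT"))

# Reverse-complement runs are pooled into a single motif (An/Tn and Gn/Cn).
MOTIF = {"A": "A/T", "T": "A/T", "G": "G/C", "C": "G/C"}


def find_tracts(sequence):
    """Return (start, length, motif) for every run of >= MIN_LEN identical bases."""
    tracts = []
    for match in RUN_PATTERN.finditer(sequence.upper()):
        run = match.group(0)
        tracts.append((match.start(), len(run), MOTIF[run[0]]))
    return tracts


def count_length_types(tracts):
    """Count tracts per (motif, length) class, e.g. ("A/T", 5)."""
    return Counter((motif, length) for _, length, motif in tracts)


# Hypothetical test sequence: T6, A5, C7 and G5 qualify; the A4 run does not.
example = "GA" + "T" * 6 + "CG" + "A" * 5 + "C" * 7 + "GT" + "A" * 4 + "G" * 5 + "T"
tracts = find_tracts(example)
counts = count_length_types(tracts)
```

Applying the same scan separately to the exon, intron-UTR, and intergenic portions of each entry would give the per-window counts on which the later comparisons rest.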
In order to assess the reliability of the polymononucleotide frequencies estimated from database entries (mostly copy DNA [cDNA] and microsatellite clones), we sequenced 27 randomly chosen recombinant clones from the genomic library of M. galloprovincialis. The genome library was constructed as described in Presa et al. (2002). Polymononucleotide tracts were systematically screened in those clones as described for database sequences (Figure 1).



Figure 1. DNA data mining and control experiments.

 
Mathematical Calculations
A quick guide to the calculations performed is shown in Table 1. The haploid DNA length of each species (i.e., nine Bivalvia, three Cephalopoda, and nine Gastropoda) was calculated from their haploid DNA content. The unavailable genome sizes of Loligo spp., Haliotis spp., and Littorina spp. were estimated from the average genome size of their genus. Using the C-value and the observed average spacing between consecutive tracts, we inferred the density and number of poly-(A/T) and poly-(G/C) loci in whole genomes by compensating for the overrepresentation of expressed sequences in databases. The density of polymononucleotide loci in large genomes depends mainly on the amount of noncoding DNA (Bachtrog et al. 1999), although the number of coding sequences gains representativeness in small genomes. Given that the coding DNA fraction is inversely correlated with the genome size (e.g., 54% in Caenorhabditis elegans [C = 0.1 Gb] and 10–15% in mammals [C = 1 Gb]) (Cavalier-Smith 1978), the average genome size of mollusks (2.02 Gb in this study) could contain a plausible 30% of genes. Since the proportion of exons in coding DNA is inversely correlated to genome complexity (e.g., 13% in humans and 24% in Drosophila) (Gall 1981), we believe that exons span about 30% of the coding DNA of mollusks. Therefore estimates of the total number of repeats in the three windows were obtained by extrapolating the figures from the screened sequences to the whole genome.
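The partition of a genome into the three windows can be sketched as follows. This is one reading of the paragraph above, not the formulas of Table 1: it assumes that genes occupy 30% of the genome, that exons make up 30% of that genic fraction, that the remaining genic DNA is intron-UTR, and that everything else is intergenic. The function names are hypothetical.

```python
# Fractions taken from the text; how they combine is an assumption of this sketch.
GENE_FRACTION = 0.30           # share of the genome occupied by genes
EXON_FRACTION_OF_GENES = 0.30  # share of genic DNA that is exonic


def window_fractions(gene_fraction, exon_fraction_of_genes):
    """Split the genome into exon, intron-UTR and intergenic fractions."""
    exon = gene_fraction * exon_fraction_of_genes
    intron_utr = gene_fraction - exon
    intergenic = 1.0 - gene_fraction
    return {"exon": exon, "intron_utr": intron_utr, "intergenic": intergenic}


def window_lengths_kb(genome_size_gb, fractions):
    """Convert a haploid genome size in Gb into per-window lengths in kb."""
    genome_kb = genome_size_gb * 1e6  # 1 Gb = 10^6 kb
    return {window: share * genome_kb for window, share in fractions.items()}


fractions = window_fractions(GENE_FRACTION, EXON_FRACTION_OF_GENES)
lengths_kb = window_lengths_kb(2.02, fractions)  # 2.02 Gb: average genome size in this study
```

Under these assumptions the windows cover about 9%, 21%, and 70% of the genome, and dividing each window length by the observed tract spacing in that window gives an estimate of the number of loci it contains.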


Table 1. Mathematical calculations employed to compute observed and expected densities of polymononucleotide tracts

 

Table 3. Spacing and average length of poly-(A/T) tracts in DNA windows of molluscan genomes

 

Table 4. Spacing and average length of poly-(G/C) tracts in DNA windows of molluscan genomes

 
Statistical Methods
A summary of the statistical methods employed is provided in Table 2. Repeat spacing per window was calculated only if a given repeat type occurred more than once in a window segment. Therefore the spacing of long An/Cn tracts exceeding the screened window size was calculated using the cumulative length screened per taxonomic class. Since repeats were nonnormally distributed within the length type due to the variable per-window genome sizes between species (e.g., 36.46 kb for Anadara trapecia and 3.66 kb for Cepaea nemoralis), their abundance was compared using nonparametric tests. No comparisons of raw data were feasible within repeat type between windows (e.g., A5 in exons versus A5 in introns) due to the different size of windows. The average length and spacing, being less sensitive than the raw abundance to differences in window size, were directly compared between windows using parametric methods. Statistical tests were performed using SPSS 10.1 software.
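The spacing rule in this paragraph can be sketched in a few lines. The exact formula is given in Table 1 and is not reproduced here, so the sketch assumes that spacing is simply the screened length divided by the number of tracts, that it is undefined when a repeat type occurs fewer than twice, and that pooling over a taxonomic class means summing tracts and screened lengths across its species. The function names and the tract counts are hypothetical; the two window sizes are the ones quoted in the text.

```python
def spacing_kb(n_tracts, screened_kb):
    """Average spacing in kb, or None when the type occurs fewer than twice."""
    if n_tracts < 2:
        return None
    return screened_kb / n_tracts


def class_spacing_kb(per_species):
    """Pool (n_tracts, screened_kb) pairs over the species of one class."""
    total_tracts = sum(n for n, _ in per_species)
    total_kb = sum(kb for _, kb in per_species)
    return spacing_kb(total_tracts, total_kb)


# Hypothetical counts: one long tract found in each of two screened windows.
per_species = [(1, 36.46), (1, 3.66)]
single = [spacing_kb(n, kb) for n, kb in per_species]  # undefined per species
pooled = class_spacing_kb(per_species)                 # defined once pooled
```

Pooling in this way trades taxonomic resolution for a defined estimate, which is why it is reserved for the rare long tracts.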


Table 2. Analytical methods applied in the statistical treatment of data

 

    Results and Discussion
Genomic Abundance of Polymononucleotide Repeats in Mollusks
A total of 635 sequences from 35 Mollusca species were screened to compare the densities of polymononucleotide motifs in exons, intron-UTR, and intergenic DNA. Polymononucleotide tracts were classified into length types from 5 bp (Appendix 2). Repeats larger than (A/T)9 (15 types up to 50 repeats) or (G/C)9 (4 types up to 21 repeats) were observed only once, if any, throughout the screening (96% of A/T tracts and 100% of G/C tracts were shorter than 10 repeats). The distribution of polymononucleotides was hierarchically analyzed per taxa (species, class, and phylum), per window, and per repeat type, and some of their properties, such as the average length and spacing, were interpreted in a genomic context.

The greater abundance of poly-(A/T) tracts (1773, or 85.94%) relative to poly-(G/C) tracts (290, or 14.06%) seems to be a consistent feature across the molluscan genomes screened, as has also been observed in plants (90.24% An and 9.76% Gn) (Lagercrantz et al. 1993), arthropods (88.83% An and 11.17% Gn) (Tóth et al. 2000), and vertebrates (82.01% An and 17.99% Gn) (Tautz et al. 1986). This high density of An tracts is thought to be due to a higher slippage susceptibility of the A/T motif with respect to the G/C motif (Schlötterer and Tautz 1992), coupled with a larger A/T content of genomes (Bachtrog et al. 1999). The genomic abundance of polymononucleotide loci in mollusks, calculated using their weighted densities per window (3.18% of the genome: 2.79% for A/T tracts and 0.39% for G/C tracts) (Figure 2) was higher than for most microsatellite motifs reported in eukaryotes; for example, Tetraodon nigroviridis (3.21% for overall motifs) (Crollius et al. 2000), Drosophila (0.3% for dinucleotide motifs) (Bachtrog et al. 1999), and sea urchin (0.7% for A/T tracts and 0.2% for G/C tracts) (Tautz and Renz 1984). An interesting working hypothesis arising from this study is the putative causal correlation between a high density of polymononucleotides, considered as protomicrosatellites, and the scarcity of larger microsatellite motifs, as observed in several genomic libraries of M. galloprovincialis (e.g., Presa et al. 2002).



Figure 2. Percentage of the DNA content occupied by (A/T)≥5 tracts (grey bars) and (G/C)≥5 tracts (white bars) for several mollusks. The names of the species are abbreviated with the initial of the genus in capitals, followed by the first letters of the species' name.

 
Threshold Length of Arrays for Replication Slippage
A great abundance of short arrays has been observed in coding and noncoding DNA of various eukaryotes (Katti et al. 2001; Provan et al. 1999). Those results have been explained as an effective "cutoff" effect consisting of a threshold array size of five to seven repeat units (absorption state), below which replication slippage might not work (Rose and Falush 1998). Therefore if a minimum threshold size for slippage exists, it would preclude the fit of microsatellite abundance to an exponential decay with length. This question has been addressed here by analyzing the abundance and spacing between repeat types within mononucleotides.

We have observed a significant threshold-like size of arrays at (A/T)6-7 and (G/C)6-7 across windows and taxa, below which the abundance (Appendix 3) and spacing (Appendix 4) of length types differed significantly from larger arrays. However, this apparent "cutoff" length would be biologically meaningless since it is the expected result under an exponential decay of abundance with increasing tract length (Figure 3). This result is in agreement with those reported for other eukaryotes (e.g., Wierdl et al. 1997), especially for dinucleotide repeats ≤ 5 of Saccharomyces cerevisiae (Pupko and Graur 1999). Therefore "shorter microsatellites have the potential to expand, with a correspondingly lower probability than longer microsatellites" (Pupko and Graur 1999).
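The argument that an apparent cutoff follows from exponential decay alone can be illustrated with a log-linear fit. The sketch below is not the regression reported in the study: it is an ordinary least-squares fit of ln(count) on tract length, and the counts are hypothetical values chosen to halve with each added base.

```python
import math


def fit_exponential_decay(lengths, counts):
    """Fit ln(count) = ln(a) + b * length; return (a, b, r_squared)."""
    ys = [math.log(c) for c in counts]
    n = len(lengths)
    mean_x = sum(lengths) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in lengths)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return math.exp(intercept), slope, r_squared


# Hypothetical abundances for tract lengths 5 to 9, halving at each step.
lengths = [5, 6, 7, 8, 9]
counts = [1024, 512, 256, 128, 64]
a, b, r_squared = fit_exponential_decay(lengths, counts)

# Share of all tracts that are 7 bases or shorter under this pure decay.
short_share = sum(counts[:3]) / sum(counts)
```

In this toy series about 90% of the tracts are seven bases or shorter even though no threshold was built into the data, which is the sense in which a "cutoff" near (A/T)6-7 can arise from the decay itself.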



Figure 3. Expected (diamond) and observed distributions of (A/T)-repeats (up triangle) and (G/C)-repeats (down triangle) in (A) exons, (B) intron-UTRs, and (C) noncoding DNA (intergenic DNA) of molluscan genomes.

 
Differential Density of Repeats in Exons, Introns-UTR, and Intergenic DNA
It has been said that an exponential distribution of length types that follows a Poisson process would be indicative of a uniform distribution of microsatellites in genomes (Kruglyak et al. 2000). However, the statistics used by those authors to test the fit of abundance to an exponential distribution are flexible tools that might be insensitive to both the differential density of tracts between genome windows and the scaled departures from expected exponential distributions by all length types.

In this study, several observations point to a nonrandom distribution of repeats across genomes. First, if repeats were randomly distributed, their genomic abundance would be similar, either calculating them with the average genome density or using the weighted density per genomic window. The genomic abundance of polymononucleotide loci calculated with the average spacing (one A/T tract every 0.258 kb, one G/C tract every 1.564 kb, and 2.02 ± 0.96 Gb as the average genome size of the species studied) was 7.77 × 10^6 (A/T)≥5 loci and 1.28 × 10^6 (G/C)≥5 loci. These figures differed from those calculated by weighting the densities per window (one A/T tract every 0.212 kb and one G/C tract every 1.473 kb) (Tables 3 and 4, respectively), which were 9.52 × 10^6 (A/T)≥5 loci (6.66 × 10^6 loci in intergenic DNA, 2.43 × 10^6 loci in introns, and 0.42 × 10^6 loci in exons), and 1.37 × 10^6 (G/C)≥5 loci (1.10 × 10^6 loci in intergenic DNA, 0.19 × 10^6 loci in introns-UTR, and 0.08 × 10^6 loci in exons).
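The two ways of counting poly-(A/T) loci can be compared with a short calculation. The sketch uses the average genome size and the A/T spacings quoted in this section; the window fractions (9% exons, 21% introns-UTR, 70% intergenic DNA) are an assumption derived from the 30% figures given in the Methods, and the function names are hypothetical. It reproduces the reported values only approximately, presumably because the published spacings are rounded.

```python
GENOME_SIZE_GB = 2.02
AT_AVERAGE_SPACING_KB = 0.258

# Assumed window shares of the genome (30% genes, 30% of genic DNA exonic).
WINDOW_FRACTION = {"exon": 0.09, "intron_utr": 0.21, "intergenic": 0.70}

# Per-window poly-(A/T) spacings reported in the text (kb per tract).
AT_SPACING_KB = {"exon": 0.412, "intron_utr": 0.171, "intergenic": 0.214}


def loci_from_average(genome_gb, spacing_kb):
    """Loci expected if one average spacing applied to the whole genome."""
    return genome_gb * 1e6 / spacing_kb


def loci_per_window(genome_gb, fractions, spacing_kb):
    """Loci per window: window length in kb divided by that window's spacing."""
    genome_kb = genome_gb * 1e6
    return {w: fractions[w] * genome_kb / spacing_kb[w] for w in fractions}


def weighted_spacing_kb(fractions, spacing_kb):
    """Genome-wide spacing implied by the window-weighted densities."""
    density_per_kb = sum(fractions[w] / spacing_kb[w] for w in fractions)
    return 1.0 / density_per_kb


average_loci = loci_from_average(GENOME_SIZE_GB, AT_AVERAGE_SPACING_KB)
window_loci = loci_per_window(GENOME_SIZE_GB, WINDOW_FRACTION, AT_SPACING_KB)
weighted_loci = sum(window_loci.values())
weighted_spacing = weighted_spacing_kb(WINDOW_FRACTION, AT_SPACING_KB)
```

With these inputs the weighted count comes out near 9.5 × 10^6 loci and the weighted spacing near 0.212 kb, against roughly 7.8 × 10^6 loci from the single average spacing, so the gap between the two estimates is reproduced in direction and approximate size.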

Second, the weighted abundance of repeats being larger than the average would imply the overrepresentation of repeats in one or more genomic windows. This hypothesis has been addressed by testing the fit of repeat length types to their expected distribution per window. Positive exponential regressions were observed between repeat length type and the abundance of both poly-(A/T) tracts (R^2 = 0.817, F = 58.06, P < .001) and poly-(G/C) tracts (R^2 = 0.633, F = 18.94, P < .01) in all windows, as expected from their fit to exponential distributions (Kolmogorov-Smirnov test; Z = 0.639, P = .809, and Z = 1.162, P = .134, respectively). Consequently a negative correlation was obtained between the number of tracts and the poly-(A/T) length (ρ = -0.938, N = 15, P < .001), as well as for poly-(G/C) tracts (ρ = -0.801, N = 13, P < .001). All (A/T)≥5 types adjusted to their expected densities in exons (G9 = 11.42, P = .304) (Figure 3A) and were overrepresented in introns-UTR (G13 = 70.01, P < .0001) (Figure 3B) and intergenic DNA (G21 = 71.90, P < .0001) (Figure 3C). Poly-(G/C)≥5 tracts were underrepresented in exons (G2 = 353.62, P < .0001) (Figure 3A) and introns-UTR (G9 = 26.23, P < .0001) (Figure 3B). Poly-(G/C)<8 were underrepresented in intergenic DNA (G3 = 24.19, P < .0001), while poly-(G/C)>>8 were overrepresented (G8 = 508.81, P < .0001) (Figure 3C).
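The goodness-of-fit comparisons above rely on the G statistic, which can be computed as in the following sketch. It implements the standard log-likelihood ratio formula G = 2 Σ O ln(O/E) and nothing else: the P value (from a chi-square distribution with the appropriate degrees of freedom) is left out, and the observed and expected counts are hypothetical, not data from the study.

```python
import math


def g_statistic(observed, expected):
    """Log-likelihood ratio statistic G = 2 * sum(O * ln(O / E))."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        if o > 0:  # a class with zero observations contributes nothing
            total += o * math.log(o / e)
    return 2.0 * total


# Hypothetical counts for three length types; both series sum to 100.
observed = [50, 30, 20]
expected = [40, 35, 25]
g_value = g_statistic(observed, expected)
g_perfect_fit = g_statistic(expected, expected)
```

A value of G close to zero means the observed length types track their expectation, as reported for poly-(A/T) tracts in exons, whereas large values correspond to the over- or underrepresentation found in the other windows.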

In summary, A/T types fit expectations in exons, but were largely overrepresented in introns-UTR and intergenic DNA. Polymononucleotide G/C types were soundly underrepresented in exons and introns-UTR, but all tracts of G/C>8 were overrepresented in intergenic DNA. These results indicate a nonuniform genomic distribution of repeats, evidenced by the overrepresentation of poly-(A/T) tracts (17.75% excess) and poly-(G/C) tracts (4.68% excess) in the genomes analyzed. It should be noted that this overrepresentation does not preclude the fit of all repeat length types to an exponential decay in genomes and windows, as the abundance of all repeat types decreases as the length of the tandem increases.

Third, the differential distribution of repeats between windows can be straightforwardly tested using the weighted spacing per window. The spacing of poly-(A/T) tracts was significantly shorter in introns-UTR (one every 0.171 kb) and intergenic DNA (one every 0.214 kb) than in exons (one every 0.412 kb) (t test; P < .05) (Table 3). The spacing of poly-(G/C) tracts was shorter in intergenic DNA (one every 0.990 kb) than in exons (one every 1.740 kb) (t test; P < .05) or in introns-UTR (one every 3.104 kb) (t test; P < .01) (Table 4). This consistent differential density of polymononucleotide tracts between windows is in agreement with their nonrandom distribution suspected after the calculation of the overall abundance of loci using weighting methods. Nonrandom distribution of microsatellites has also been reported for other taxa (Bachtrog et al. 1999; Field and Wills 1998; Katti et al. 2001; Metzgar et al. 2000; Morgante et al. 2002) and is emerging as a consistent feature across eukaryotic genomes.

The general belief in the neutrality of microsatellites comes from several properties, such as their ubiquity (e.g., Estoup et al. 1993), their great abundance and polymorphism (Weber et al. 1990), and their variable persistence times and array sizes. Nevertheless, it seems reasonable that the distribution of coding-linked microsatellites differs from that of their counterparts in noncoding DNA (Nekrutenko and Li 2000). Our results support a differential pattern of microsatellite distribution in the coding and noncoding regions of mollusks as maintained by differential selection (Tóth et al. 2000). For instance, the fit of all (A/T)n-length types to their expected abundance in exons suggests the involvement of selective constraints in the dynamics of their expansion and contraction (Hancock 1995). Also, it has recently been shown in eukaryotes that the combination of a proofreading defect with a mismatch repair deficiency results in an extreme microsatellite instability across the whole genome (Degtyareva et al. 2002). Therefore, to explain the equilibrium distributions of transcription-linked polymononucleotide repeats under a common genomic repair system, the action of selective constraints hampering the free expansion/contraction of repeats should be considered (Nachman 2001).

No significant differences for any distribution property of polymononucleotides were observed between the experimental control window of M. galloprovincialis and the noncoding DNA windows of molluscan species retrieved from databases. The overrepresentation of most length types in intergenic DNA can be indicative of replication slippage coupled with a negligible impact of selection (Field and Wills 1998). However, the weaker constraints on allele length variation that apply to intergenic DNA cannot explain the underrepresentation of (G/C)<8 tracts, which could be due to a nucleotide compositional bias and/or to preferred slippage properties of the growing chain in replication.

Differential Length and Spacing of Repeats
The differential distribution of polymononucleotides between genomic windows can also be tested from a comparative genomics perspective. The significantly lower density (larger spacing) of both poly-(A/T) tracts in exons (t test; P = .006) and poly-(G/C) tracts in introns-UTR (t test; P < .01) detected above at the window level was confirmed in most of the species and taxonomic classes analyzed (Appendixes 5 and 6). However, if the forces that shape the distribution of repeats at each window were nuanced per species, they should result in differential average length and spacing of repeats between species and/or between classes of mollusks, as has also been reported for other taxa (Harr and Schlötterer 2000).

Overall, molluscan species showed a different average length of (A/T) tracts between all pairs of windows. Those differences were due to a consistent shorter length of repeats in exons than in both introns-UTR and intergenic DNA (t test; P < .001) (Appendix 7), as has also been observed for other motifs in transcribed regions of plants (Morgante et al. 2002). However, while nonsignificant length differences were observed for both polymononucleotides between species within classes and within windows (Appendix 7), the average length of poly-(A/T) tracts differed in most pairwise comparisons between molluscan classes within windows (t test; P < .001) (Appendixes 5 and 7). In contrast to the average length, the average spacing of poly-(A/T) tracts differed within classes between species (t test; P < .001), but no differential spacing of repeats was observed between pairs of classes at any window (Appendix 7). This greater length divergence between classes suggests the influence of some genetically based, within-class mechanism, such as mismatch repair efficiency (Ellegren 2002b; Harr et al. 2002). However, the conspicuous differences in repeat spacing observed within classes suggest that repeat spacing would not be under the same genetic control as the repeat length, but is probably related to other species-specific evolutionary phenomena such as the DNA content of species (Neff and Gross 2001; and see later in this section).

The mutational properties of simple repetitive DNA suggest that slipped-strand mispairing plays a major role in the evolution of many repeats (Miklos 1985). The impact of slippage on a genome seems to be dependent on mismatch repair systems (Ellegren 2002b; Harr et al. 2002; Oda et al. 1997) and proofreading defects (e.g., Degtyareva et al. 2002). Therefore the evolutionary conservation of a mismatch repair system between closely related species could explain their similar repeat lengths in all genome windows. It has been suggested from simulation data that the evolutionary persistence of long arrays remains short due to both point mutations (Kruglyak et al. 2000; Santibáñez-Koref et al. 2001) and sister chromatid exchange (Walsh 1987). However, the unequal crossover mechanism is limited primarily by the length of repetitive sequences available for unequal pairing (Jinks-Robertson et al. 1993), thus it is likely that no deep interspecific differences in microsatellite density are due to this mechanism (Bachtrog et al. 1999). It should also be noted that retroposition events mediated by integrative genomic sites (homology-driven integration) can account for poly-A density in many eukaryotes (Nadir et al. 1996; Wilder and Hollocher 2001) and could explain the differential density of poly-A tracts across genomes related to species-specific mobilization of retroposons.

One remarkable observation is the differential behavior of poly-(G/C) tracts as compared to poly-(A/T) tracts, the former showing much smaller length and spacing divergence between taxa and between windows than the latter. This result suggests that slippage in molluscan genomes mainly depends on the A-T content of the sequences involved in replication (Schlötterer and Tautz 1992).

Correlation Between DNA Content and Polymononucleotide Density
The differences in density and distribution of microsatellites among taxa depend not only on a balance of mutational mechanisms (Ellegren 2002a; Zhu et al. 2000), but also on the DNA content of the species (Hancock 1996). If the amount of coding DNA is conserved between close taxa, a putative differential density of repeats between taxa would depend on the amount of noncoding DNA (Neff and Gross 2001; Primmer et al. 1997). To test this hypothesis we examined the relationships between genome size and both density (spacing) and length of polymononucleotide tracts in molluscan genomes of known DNA content. The molluscan species studied had a 12-fold range in genome size (from 0.40 Gb to 4.98 Gb), which is attributable to amplification of noncoding DNA (Cavalier-Smith 1978). A caveat for the estimation of repeat density must be kept in mind since (1) correlations have been performed for only nine species where both DNA content and polymononucleotide density per window were known, and (2) each density figure had a large associated error of variance which we have tried to minimize by multiple regressions and careful screening of sequences.

The average length of A/T tracts was positively correlated with the DNA content of species (ρ = 0.706, N = 15, P = .034) and also with the number of (A/T)n loci per genome (ρ = 0.883, N = 15, P < .002), yet this correlation was not observed within windows. Neither the DNA content of species nor the average length were correlated with the spacing of both polymononucleotides across genomes. However, the average length of poly-(A/T) tracts was positively correlated with their spacing in exons (ρ = -0.547, N = 19, P = .003) and introns-UTR (ρ = -0.405, N = 18, P = .050), but not in intergenic DNA. Multiple regression analyses for the above parameters in G/C tracts did not show any significant outcome.
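The rank correlations reported here can be computed along the lines of the sketch below, which implements Spearman's coefficient as the Pearson correlation of average ranks (so ties are handled). It does not reproduce the SPSS analysis: no P value is computed, and apart from the two extreme genome sizes quoted in the text (0.40 Gb and 4.98 Gb), the genome sizes and tract lengths are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(xs, ys):
    """Spearman rank correlation coefficient."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: genome size (Gb) and average A/T tract length (bp).
genome_size_gb = [0.40, 1.2, 2.0, 3.1, 4.98]
mean_tract_length = [5.6, 5.9, 5.8, 6.2, 6.4]
rho = spearman_rho(genome_size_gb, mean_tract_length)
```

Because the coefficient depends only on ranks, it is insensitive to the scale of the genome size estimates, which suits data in which several C-values are themselves genus averages.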

The present data indicate that the absolute number of repeats increases with the DNA content (Primmer et al. 1997). This correlation must be interpreted carefully because the DNA content was used to calculate the number of loci, which could lead to a part-whole correlation (Sokal and Rohlf 1995). The positive correlation between the average (A/T)n length and the DNA content of species suggests greater expanding properties of larger genomes, as has been suggested from hybridization experiments on divergent taxa (Hamada et al. 1982). Even though our data on mononucleotide repeats are in general agreement with previous studies on several microsatellite motifs (Metzgar et al. 2000), they are counterintuitive to those reporting an increasing density of microsatellites with genome size in animals (Hancock 1996). Indeed, the absence of correlation between the DNA content of species and repeat spacing argues against an expandability-dependent density of repeats. Furthermore, the significant differences in repeat spacing between species suggest that repeat density could be more influenced by species-specific mechanisms such as short and long interspersed elements (SINEs and LINEs, respectively), which contain poly-(A/T) tails and are thought to be an important source of A-rich microsatellites (Arcot et al. 1995). Despite the positive correlation observed between length and the spacing at exons and introns-UTR, the hypothesis of a differential density of repeats between species, which depends on the amount of noncoding DNA (Neff and Gross 2001), is not seen in the intergenic regions of the mollusks studied. Therefore the abundance of poly-A repeats would tend to increase proportionally with the size of the window, but not its density.

A second hypothesis related to DNA content is whether larger genomes have more and longer polymononucleotide repeats than small genomes. To address this question we compared the two mononucleotide motifs for various parameters. The positive correlations between poly-(A/T) and poly-(G/C) repeats for the number of loci (ρ = 0.750, N = 9, P = .020), the genome length occupied by repeats (ρ = 0.683, N = 9, P = .042), and the average allele length (ρ = 0.406, N = 19, P = .012) (Tables 3 and 4) suggest that the general trend of both motifs is to increase with the DNA content. This could mean that larger genomes would have more and longer polymononucleotide tracts than small genomes. However, the absence of correlation between mononucleotide motifs for both the repeat spacing and the percentage of genome occupation (Figure 2) indicates that their densities would not increase likewise as the DNA content increases. This result agrees with the idea that under random microsatellite distribution, the density of both repeats should be independent of each other (Bachtrog et al. 1999).

In this data-mining analysis of molluscan sequences we observed a differential pattern of microsatellite distribution in genomic windows of different functionality, which could be explained by differential selection. The fit of repeat abundance to an exponential decay with length also suggests the absence of a critical cutoff length for microsatellite expansion and contraction. From a taxonomic perspective, the larger length divergence between classes suggests the intraclass conservation of genetically based mechanisms such as mismatch repair efficiency (Ellegren 2002b; Harr et al. 2002). However, the conspicuous differences in repeat spacing within classes indicate that repeat spacing would not be under the same genetic control as repeat length. Empirical studies focusing on these nonrandom patterns of polymononucleotide distributions, as well as on the putative causal correlation between a high density of polymononucleotides as protomicrosatellites and the scarcity of larger microsatellite motifs, are necessary to better understand these processes.
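As a rough illustration of what a fit to an exponential decay involves, the sketch below regresses the logarithm of repeat counts on repeat length by ordinary least squares. It is only a minimal example under stated assumptions: the counts, the starting length of 4 bp, and the decay rate are hypothetical, and the log-linear regression is one common way to fit such a curve, not necessarily the procedure used in this study.

```python
import math

def fit_exponential_decay(lengths, counts):
    """Fit count = n0 * exp(-k * length) by least squares on ln(count).

    Returns (n0, k, r2), where r2 is the coefficient of determination
    of the straight line fitted in log space.
    """
    logs = [math.log(c) for c in counts]
    n = len(lengths)
    mx = sum(lengths) / n
    my = sum(logs) / n
    sxx = sum((x - mx) ** 2 for x in lengths)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lengths, logs))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(lengths, logs))
    ss_tot = sum((y - my) ** 2 for y in logs)
    return math.exp(intercept), -slope, 1 - ss_res / ss_tot

# Hypothetical counts of tracts by length (bp), generated from a known
# decay rate of 0.6 per bp and rounded to whole numbers.
lengths = list(range(4, 13))
counts = [round(5000 * math.exp(-0.6 * length)) for length in lengths]

n0, k, r2 = fit_exponential_decay(lengths, counts)
print(f"n0 = {n0:.0f}, k = {k:.3f} per bp, R^2 = {r2:.4f}")
```

Under a pure exponential decay the log counts fall on a single straight line across all lengths. A threshold length for slippage would instead be expected to show up as a change of slope at some length, so a uniformly good linear fit in log space is the pattern consistent with the absence of a cutoff.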


    Acknowledgments
 
This research was supported by grant BIO2001-3659 (to P.P.) from the Ministerio Español de Ciencia y Tecnología, using funds from PGE and FEDER, and by PhD grants from MCYT and Xunta de Galicia (to M.P. and F.C., respectively). The authors are indebted to three anonymous reviewers and the editor for their remarks on a previous version of this manuscript. The appendices containing raw data and statistical analyses are downloadable from the authors' website at http://webs.uvigo.es/c03/webc03/XENETICA/XB4/apartados/publications/publications_archivos/Appendices_1.pdf.


    Footnotes
 
Corresponding Editor: Rob DeSalle

Received August 21, 2003
Accepted July 21, 2004


    References
 

    Ahmed M and Sparks AK, 1967. A preliminary study of chromosomes of two species of oysters (Ostrea lurida and Crassostrea gigas). J Fish Res Board Can 24:2155–2159.

    Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, and Lipman DJ, 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402.

    Arcot SS, Wang Z, Weber JL, Deininger PL, and Batzer MA, 1995. Alu repeats: a source for the genesis of primate microsatellites. Genomics 29:136–144.

    Bachtrog D, Weiss S, Zangerl B, Brem G, and Schlötterer C, 1999. Distribution of dinucleotide microsatellites in the Drosophila melanogaster genome. Mol Biol Evol 16:602–610.

    Britten RJ and Davidson EH, 1971. Repetitive and non-repetitive DNA sequences and a speculation on the origin of evolutionary novelty. Q Rev Biol 46:111–138.

    Cavalier-Smith T, 1978. Nuclear volume control by nucleoskeletal DNA, selection for cell volume and cell growth rate, and the solution of the DNA C-value paradox. J Cell Sci 34:247–278.

    Crollius HR, Jaillon O, Dasilva C, Ozouf-Costaz C, Fizames C, Fischer C, Bouneau L, Billault A, Quertier F, Saurin W, Bernot A, and Weissenbach J, 2000. Characterization and repeat analysis of the compact genome of the freshwater pufferfish Tetraodon nigroviridis. Genome Res 10:939–949.

    Degtyareva NP, Greenwell P, Hofmann ER, Hengartner MO, Zhang L, Culotti JG, and Petes TD, 2002. Caenorhabditis elegans DNA mismatch repair gene msh-2 is required for microsatellite stability and maintenance of genome integrity. Proc Natl Acad Sci USA 99:2158–2163.

    Ellegren H, 2000. Microsatellite mutations in the germline: implications for evolutionary inference. Trends Genet 16:551–558.

    Ellegren H, 2002a. Microsatellite evolution: a battle between replication slippage and point mutation. Trends Genet 18:70.

    Ellegren H, 2002b. Mismatch repair and mutational bias in microsatellite DNA. Trends Genet 18:552.

    Epplen C, Melmer G, Siedlaczck I, Schwaiger F-W, Mäueler W, and Epplen JT, 1993. On the essence of "meaningless" simple repetitive DNA in eukaryote genomes. In: DNA fingerprinting: state of the science (Pena SDP, Chakraborty R, Epplen JT, and Jeffreys AJ, eds). Basel, Switzerland: Birkhäuser Verlag; 29–45.

    Estoup A, Presa P, Krieg F, Vaiman D, and Guyomard R, 1993. (CT)n and (AC)n microsatellites; a new class of genetic markers for Salmo trutta L. (brown trout). Heredity 71:488–496.

    Field D and Wills C, 1998. Abundant microsatellite polymorphism in Saccharomyces cerevisiae, and the different distributions of microsatellites in eight prokaryotes and S. cerevisiae, result from strong mutation pressures and a variety of selective forces. Proc Natl Acad Sci USA 95:1647–1652.

    Gall JG, 1981. Chromosome structure and the C-value paradox. J Cell Biol 91:3–14.

    Gregory TR, 2001. Animal genome size database (visited May 29, 2003) http://www.genomesize.com.

    Hamada H, Petrino MG, and Kakunaga T, 1982. A novel repeated element with Z-DNA-forming potential is widely found in evolutionarily diverse eukaryotic genomes. Proc Natl Acad Sci USA 79:6465–6469.

    Hancock JM, 1995. The contribution of slippage-like processes to genome evolution. J Mol Evol 41:1038–1047.

    Hancock JM, 1996. Simple sequences in a "minimal" genome. Nat Genet 14:14–15.

    Harr B and Schlötterer C, 2000. Long microsatellite alleles in Drosophila melanogaster have a downward mutation bias and short persistence times, which cause their genome-wide underrepresentation. Genetics 155:1213–1220.

    Harr B, Todorova J, and Schlötterer C, 2002. Mismatch repair-driven mutational bias in D. melanogaster. Mol Cell 10:199–205.

    Jinks-Robertson S, Michelitch M, and Ramcharan S, 1993. Substrate length requirements for efficient mitotic recombination in Saccharomyces cerevisiae. Mol Cell Biol 13:3937–3950.

    Katti MV, Ranjekar PK, and Gupta VS, 2001. Differential distribution of simple sequence repeats in eukaryotic genome sequences. Mol Biol Evol 18:1161–1167.

    Kornberg A, Bertsch LL, Jackson JF, and Khorana HG, 1964. Enzymatic synthesis of deoxyribonucleic acid. XVI. Oligonucleotides as templates and the mechanisms of their replication. Proc Natl Acad Sci USA 51:315–323.

    Kruglyak S, Durrett RT, Schug MD, and Aquadro CF, 2000. Distribution and abundance of microsatellites in the yeast genome can be explained by a balance between slippage events and point mutations. Mol Biol Evol 17:1210–1219.

    Lagercrantz U, Ellegren H, and Andersson L, 1993. The abundance of various polymorphic microsatellite motifs differs between plants and vertebrates. Nucleic Acids Res 21:1111–1115.

    Levinson G and Gutman GA, 1987. Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol Biol Evol 4:203–221.

    Litt M and Luty JA, 1989. A hypervariable microsatellite revealed by in vitro amplification of a dinucleotide repeat within the cardiac muscle actin gene. Am J Hum Genet 44:397–401.

    Metzgar D, Bytof J, and Wills C, 2000. Selection against frameshift mutations limits microsatellite expansion in coding DNA. Genome Res 10:72–80.

    Metzgar D, Liu L, Hansen C, Dybvig K, and Wills C, 2002. Domain-level differences in microsatellite distribution and content result from different relative rates of insertion and deletion mutations. Genome Res 12:408–413.

    Miklos GLG, 1985. Localized highly repetitive DNA sequences in vertebrate and invertebrate genomes. In: Molecular evolutionary genetics (Macintyre RJ, ed). New York: Plenum; 241–321.

    Morgante M, Hanafey M, and Powell W, 2002. Microsatellites are preferentially associated with nonrepetitive DNA in plant genomes. Nat Genet 30:194–200.

    Nachman MW, 2001. Single nucleotide polymorphisms and recombination rate in humans. Trends Genet 17:481–485.

    Nadir E, Margalit H, Gallily T, and Ben-Sasson SA, 1996. Microsatellite spreading in the human genome: evolutionary mechanisms and structural implications. Proc Natl Acad Sci USA 93:6470–6475.

    Neff BD and Gross MR, 2001. Microsatellite evolution in vertebrates: inference from AC dinucleotide repeats. Evolution 55:1717–1733.

    Nekrutenko A and Li W-H, 2000. Assessment of compositional heterogeneity within and between eukaryotic genomes. Genome Res 10:1986–1995.

    Oda S, Oki E, Maehara Y, and Sugimachi K, 1997. Precise assessment of microsatellite instability using high resolution fluorescent microsatellite analysis. Nucleic Acids Res 25:3415–3420.

    Presa P, Montse P, and Diz AP, 2002. Polymorphic microsatellite markers for blue mussels (Mytilus spp.). Conserv Genet 3:441–443.

    Primmer CR, Raudsepp T, Chowdhary BP, Møller AP, and Ellegren H, 1997. Low frequency of microsatellites in the avian genome. Genome Res 7:471–482.

    Provan J, Soranzo N, Wilson NJ, Goldstein DB, and Powell W, 1999. A low mutation rate for chloroplast microsatellites. Genetics 153:943–947.

    Pupko T and Graur D, 1999. Evolution of microsatellites in the yeast Saccharomyces cerevisiae: role of length and number of repeat units. J Mol Evol 48:313–316.

    Rose O and Falush D, 1998. A threshold size for microsatellite expansion. Mol Biol Evol 15:613–615.

    Santibáñez-Koref MF, Gangeswaran R, and Hancock JM, 2001. A relationship between lengths of microsatellites and nearby substitution rates in mammalian genomes. Mol Biol Evol 18:2119–2123.

    Schilthuizen M, 2002. Mollusca: an evolutionary cornucopia. Trends Ecol Evol 17:8–9.

    Schlötterer C and Tautz D, 1992. Slippage synthesis of simple sequence DNA. Nucleic Acids Res 20:211–215.

    Sokal RR and Rohlf FJ, 1995. Biometry. San Francisco: W. H. Freeman.

    Tautz D and Renz M, 1984. Simple sequences are ubiquitous repetitive components of eukaryotic genomes. Nucleic Acids Res 12:4127–4138.

    Tautz D, Trick M, and Dover GA, 1986. Cryptic simplicity in DNA is a major source of genetic variation. Nature 322:652–656.

    Tóth G, Gáspári Z, and Jurka J, 2000. Microsatellites in different eukaryotic genomes: survey and analysis. Genome Res 10:967–981.

    Walsh JB, 1987. Persistence of tandem arrays: implications for satellite and simple-sequence DNAs. Genetics 115:553–567.

    Weber JL, Kitek AE, May PE, Polymeropoulos MH, and Ledbetter S, 1990. Dinucleotide repeat polymorphism at the DXS453, DXS454 and DXS458 loci. Nucleic Acids Res 18:4037.

    Wierdl M, Dominska M, and Petes TD, 1997. Microsatellite instability in yeast: dependence on the length of the microsatellite. Genetics 146:769–779.

    Wilder J and Hollocher H, 2001. Mobile elements and the genesis of microsatellites in dipterans. Mol Biol Evol 18:384–392.

    Young ET, Sloan JS, and Riper KV, 2000. Trinucleotide repeats are clustered in regulatory genes in Saccharomyces cerevisiae. Genetics 154:1053–1068.

    Zhu Y, Strassmann JE, and Queller DC, 2000. Insertions, substitutions, and the origin of microsatellites. Genet Res 76:227–236.

