Journal of Heredity Advance Access originally published online on December 15, 2004
Journal of Heredity 2005 96(2):161-166; doi:10.1093/jhered/esi023
Brief Communication
Power of Microsatellite Markers for Fingerprinting and Parentage Analysis in Eucalyptus grandis Breeding Populations
From the Plant Genetics Laboratory EMBRAPA Recursos Genéticos e Biotecnologia, Parque Estação Biológica, Brasília 70770-970 DF, Brazil (Kirst, Cordeiro, and Grattapaglia); Research and Technology Center Aracruz Celulose S.A. C.P. 331011 Aracruz 29197-000 ES, Brazil (Rezende); and Genomic Sciences Laboratory, Universidade Catolica de Brasília SGAN 916 modulo B, Brasília 70790-160 DF (Grattapaglia)
Address correspondence to Dario Grattapaglia, Laboratório de Genética de Plantas EMBRAPA Recursos Genéticos e Biotecnologia, Parque Estação Biológica, Brasília 70770-970 DF, Brazil, or e-mail: dario{at}cenargen.embrapa.br.
| Abstract |
|---|
We report the genetic analysis of 192 unrelated individuals of an elite breeding population of Eucalyptus grandis (Hill ex Maiden) with a selected set of six highly polymorphic microsatellite markers developed for species of the genus Eucalyptus. A full characterization of this set of six loci was carried out generating allele frequency distributions that were used to estimate parameters of genetic information content of these loci, including expected heterozygosity, polymorphism information content (PIC), power of exclusion, and probability of identity. The number of detected alleles per locus ranged from 6 to 33, with an average of 19.8 ± 9.2. The average expected heterozygosity was 0.86 ± 0.11 and the average PIC was 0.83 ± 0.16. Using only three loci, it was possible to discriminate all 192 individuals. The overall probability of identity considering all six EMBRA microsatellite markers combined was lower than 1 in 2 billion. An analysis of the sample size necessary to estimate expected heterozygosity with minimum variance indicated that at least 64 individuals have to be genotyped to characterize this parameter with adequate accuracy for most microsatellites in Eucalyptus. The high degree of multiallelism and the clear and simple codominant Mendelian inheritance of the set of microsatellites used provide an extremely powerful system for the unique identification of Eucalyptus individuals for fingerprinting purposes and parentage testing.
Deoxyribonucleic acid (DNA) polymorphisms provide a powerful tool for quantifying the existing levels of genetic variation in breeding and production populations of forest trees. Molecular markers can be used to estimate the extent of genetic divergence between individuals selected to compose such populations, resolve several issues of individual identity, including varietal protection, and investigate alleged parentage in open-pollinated breeding systems. Several technologies are available today to resolve such questions. In Eucalyptus, dominant markers such as random amplified polymorphic DNA (RAPD) or amplified fragment length polymorphism (AFLP) have been used for clonal fingerprinting of eucalypts (Keil and Griffin 1994), analyses of genetic variation in germplasm banks (Nesbitt et al. 1995) and seed orchards (Marcucci et al. 2003), and estimation of outcrossing rates in breeding populations (Gaiotto et al. 1997). Dominant markers are very limited in their ability to precisely determine parentage and frequently present problems when conclusively establishing absolute identity between two individual trees due to artifact polymorphisms. Dominant markers can readily be used to establish that two individuals are not the same, but the statement that two individuals are identical is usually only approximate and no formal statistics can be attached to this assertion.
A simple polymerase chain reaction (PCR) assay of polymorphisms at simple sequence repeats (SSRs), also known as microsatellites, is certainly the most efficient way of resolving issues of identity. SSRs are typically codominant and multiallelic, with expected heterozygosity frequently greater than 0.7, allowing precise discrimination even of closely related individuals. Due to the specificity of the PCR assay and its high information content, it also allows the determination of identity between individuals based on formal estimates derived from allele frequencies. As microsatellite marker information can be easily shared between laboratories based on primer sequences, interlaboratory comparisons of data are straightforward, improving cooperative efforts in the standardization of fingerprinting data.
Microsatellite markers have been developed for some of the main plantation forest tree species, such as Pinus sylvestris (Kostia et al. 1995), Pinus radiata (Smith and Devey 1994), Quercus (Dow et al. 1995), and poplar (Rahman and Rajora 2002; van der Schoot et al. 2000). Large microsatellite collections are available for Pinus taeda (245 SSRs) (Williams and Auckland 2002) and more than 500 SSRs are being developed based on the first draft of the Populus trichocarpa genome sequence (Tuskan J, personal communication). We have recently reported the development, genetic characterization, and linkage mapping of 70 SSR loci in Eucalyptus grandis and Eucalyptus urophylla and showed their excellent potential for mapping and individual identification (Brondani et al. 1998, 2002). Rapid and reliable typing of perennial crops has been achieved with microsatellites, including grapevines (Thomas et al. 1994), apples (Guilford et al. 1997), Citrus (Kijas et al. 1995), pine (Devey et al. 2002), and poplar (Rahman and Rajora 2002; Rajora and Rahman 2003). Recently interest in this technology has grown even more with the possibility of using multilocus SSR profiles as part of the evidence for the development and registration of new cultivars in annual and perennial crop variety protection (Diwan and Cregan 1997; Dore et al. 2001).
In this study we extend the characterization of microsatellites in Eucalyptus and demonstrate their application for DNA fingerprinting by analyzing the multilocus profiles of 192 individuals of an E. grandis breeding population at six selected loci. Our objectives were (1) to estimate allele frequencies for these six loci in E. grandis; (2) to characterize these SSR loci for several parameters of genetic information content, including number of alleles, expected heterozygosity, polymorphism information content (PIC), power of exclusion, and probability of identity; (3) to evaluate the minimum number of individuals necessary to estimate expected heterozygosity for microsatellite loci in E. grandis; and (4) to verify the power of discrimination of this set of loci for individual fingerprinting in this breeding population.
| Materials and Methods |
|---|
Plant Material and DNA Extraction
Plant material used for DNA extraction was obtained from 192 individual trees selected in provenance/progeny trials established at Aracruz Celulose S.A. (Espírito Santo, Brazil). These trials were established with seeds collected from selected individuals growing in natural populations in Australia and one individual selected from a commercial plantation in Zimbabwe. These 192 individuals constitute one of the long-term breeding populations of E. grandis and are used in several breeding procedures that require correct individual identification, including controlled crosses and seed orchard establishment. Total genomic DNA was extracted from adult leaf tissue following a modified protocol described earlier (Brondani et al. 1998).
Microsatellite Genotyping
Six microsatellite loci developed earlier (Brondani et al. 1998) were selected for this study: EMBRA4, EMBRA5, EMBRA10, EMBRA11, EMBRA15, and EMBRA16. These loci were selected based on the following criteria: (1) ease of interpretation of the amplified alleles in regular silver-stained polyacrylamide gels, to allow widespread use by labs that do not necessarily have access to automated sequencers; (2) minimum stutter of PCR products; (3) high polymorphism based on a preliminary characterization; (4) absence of null alleles as observed in parallel paternity testing studies (Ribeiro V, unpublished data); and (5) map position, so as to satisfy the premise of independent segregation and allow the use of the product rule in individual identification. Markers EMBRA11 and EMBRA16 both belong to linkage group 1; however, they are effectively independent, as their map distance is greater than 50 cM (Brondani et al. 1998). Microsatellite marker amplification and detection were performed as described earlier (Brondani et al. 1998). The amplified products were separated on 4% denaturing polyacrylamide gels stained with silver nitrate (Bassam et al. 1991) and sized by comparison to a 10 bp DNA ladder standard (GIBCO, Rockville, MD) on a computer screen. The three most frequent alleles for loci EMBRA4, EMBRA5, EMBRA10, EMBRA15, and EMBRA16 had their size confirmed by sequencing in an automated DNA sequencer (PerkinElmer/ABI Prism 377), and the information was used to assess the accuracy of allele scoring. Allele sizes were estimated using the software Seqaid II (Rhoads and Roufa 1990), taking into consideration the expected allelic series in base pairs for the locus. Multiplex loading in the same gel was carried out for up to three microsatellite loci simultaneously.
Data Analysis
The gels were scanned and scoring was carried out manually on a computer screen. Based on the estimates of allele frequencies from genotypes at the EMBRA microsatellites in this sample of 192 individuals, the following parameters of genetic information content were estimated: (1) expected heterozygosity; (2) PIC (Botstein et al. 1980); (3) probability of genetic identity (I) (Paetkau et al. 1995), which corresponds to the probability of two random individuals displaying the same genotype; and (4) paternity exclusion probability (Q) (Weir 1996), which corresponds to the power with which a locus excludes an erroneously assigned individual tree from being the parent of an offspring. The combined probability of paternity exclusion, $Q_C = 1 - \prod_i (1 - Q_i)$, and the combined probability of genetic identity, $I_C = \prod_i I_i$, were also estimated for the combined battery of loci.
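The per-locus and combined parameters described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: it assumes the textbook forms of expected heterozygosity ($1 - \sum p_i^2$), PIC (Botstein et al. 1980), and probability of identity (Paetkau et al. 1995), and it takes per-locus exclusion probabilities as given inputs rather than deriving them. All function names are hypothetical.

```python
from itertools import combinations
from math import prod


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2) for the allele frequencies of one locus."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content in the Botstein et al. (1980) form."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * pi**2 * pj**2 for pi, pj in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


def probability_of_identity(freqs):
    """Chance that two random individuals share a genotype at one locus."""
    same_homozygote = sum(p**4 for p in freqs)
    same_heterozygote = sum((2 * pi * pj) ** 2 for pi, pj in combinations(freqs, 2))
    return same_homozygote + same_heterozygote


def combined_exclusion(q_values):
    """Q_C = 1 - prod(1 - Q_i), assuming independent loci."""
    return 1.0 - prod(1.0 - q for q in q_values)


def combined_identity(i_values):
    """I_C = prod(I_i), assuming independent loci."""
    return prod(i_values)
```

Both combined formulas rely on the product rule, which is why independent segregation of the loci matters for the marker selection described earlier.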
Minimum Sample Size for Estimation of Expected Heterozygosity
In order to evaluate the minimum number of individuals required to adequately estimate expected heterozygosity for the microsatellite markers in E. grandis, the variance of the expected heterozygosity for each locus, $\operatorname{var}(\hat{H}_{el})$, was calculated considering samples of size 16, 32, 64, and 128 using the gene diversity variance formula (Weir 1996):
[Equation 1]
Based on the variance, estimates of the expected gene diversity, $E(\hat{H}_{el})$, the standard error, $\hat{\sigma}_{el}(\hat{H}_{el})$, and the bias, $\mathrm{Bias}_l(\hat{H}_{el})$, were obtained:
[Equation 2]
[Equation 3]
[Equation 4]
[Equation 5]
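As a rough companion to the analytical approach, the effect of sample size on the estimate of expected heterozygosity can also be explored by simulation. The sketch below is a Monte Carlo stand-in, not the variance formula of Weir (1996) used in the study: it draws 2n alleles per replicate from a given set of allele frequencies, computes the uncorrected estimate $1 - \sum \hat{p}_i^2$, and summarizes mean, variance, and bias for each sample size. The function names, the number of replicates, and the seed are assumptions made for illustration.

```python
import random
from statistics import mean, pvariance


def gene_diversity(alleles):
    """Uncorrected gene diversity, 1 - sum(p_hat^2), from sampled alleles."""
    n = len(alleles)
    counts = {}
    for allele in alleles:
        counts[allele] = counts.get(allele, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def resample_heterozygosity(freqs, sizes=(16, 32, 64, 128), reps=500, seed=1):
    """Simulate He estimates for several sample sizes (individuals, not alleles)."""
    rng = random.Random(seed)
    true_he = 1.0 - sum(p * p for p in freqs)
    labels = list(range(len(freqs)))
    summary = {}
    for n in sizes:
        # Each diploid individual contributes two alleles, hence k = 2 * n.
        estimates = [
            gene_diversity(rng.choices(labels, weights=freqs, k=2 * n))
            for _ in range(reps)
        ]
        avg = mean(estimates)
        summary[n] = {
            "mean": avg,
            "variance": pvariance(estimates),
            "bias": avg - true_he,
        }
    return summary


if __name__ == "__main__":
    # Hypothetical locus with ten equally frequent alleles.
    for n, stats in resample_heterozygosity([0.1] * 10).items():
        print(n, stats)
```

Under these assumptions the bias is negative at every sample size and shrinks as n grows, because small samples miss rare alleles and overstate homozygosity.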
Genetic Distances Among Individuals
Multilocus genotypes for the 192 individuals were used to estimate genetic distance. Considering the heterozygous nature of the genotype data the "shared allele distance" estimator between individuals was used (Bowcock et al. 1994; Jin and Chakraborty 1993):
[Equation 6]
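A minimal sketch of a shared allele distance between two diploid multilocus genotypes is given below. It assumes the commonly used definition, one minus the proportion of alleles shared across loci, with shared alleles at a locus counted as a multiset intersection; the exact notation of the original equation is not reproduced, and the function names are hypothetical.

```python
from collections import Counter


def shared_alleles(genotype_a, genotype_b):
    """Number of alleles (0, 1, or 2) shared at one diploid locus."""
    return sum((Counter(genotype_a) & Counter(genotype_b)).values())


def shared_allele_distance(individual_a, individual_b):
    """1 - (shared alleles summed over loci) / (2 * number of loci)."""
    if len(individual_a) != len(individual_b):
        raise ValueError("both individuals must be typed at the same loci")
    shared = sum(
        shared_alleles(a, b) for a, b in zip(individual_a, individual_b)
    )
    return 1.0 - shared / (2 * len(individual_a))


if __name__ == "__main__":
    # Hypothetical allele sizes (bp) at two loci for two trees.
    tree_1 = [(100, 102), (200, 200)]
    tree_2 = [(100, 104), (200, 200)]
    print(shared_allele_distance(tree_1, tree_2))
```

With six diploid loci there are 12 allele slots, so under this definition the distance can only take values that are multiples of 1/12; a distance of 0.333, for instance, would correspond to 8 shared alleles out of 12.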
| Results and Discussion |
|---|
Microsatellite Markers
The six EMBRA loci amplified a total of 119 alleles, yielding a minimum of six (EMBRA11) and a maximum of 33 alleles (EMBRA16), with an average of 19.8 ± 9.2 alleles per locus (Table 1). Other similar studies involving the analysis and characterization of microsatellites in Eucalyptus species have also detected loci with highly variable numbers of alleles and, consequently, information content (Brondani et al. 1998; Byrne et al. 1996; Marcucci et al. 2003). Some of this variability is explained by the fact that SSRs may be preferentially located in low-copy transcribed regions of plant genomes (Morgante et al. 2002). Microsatellite EMBRA5 showed the highest expected heterozygosity (He = 0.936) and PIC (= 0.933) values (Table 1), reflecting the large number of alleles observed and similar allele frequency distribution in the population when compared to the other microsatellites (Figure 1). The lowest He and PIC were found in microsatellite EMBRA11 (0.645 and 0.524, respectively). Very similar results were obtained in a study that involved the analysis of four microsatellite loci in a population of 20 unrelated individuals of Eucalyptus nitens, where the expected heterozygosities ranged from 0.72 to 0.91 (Byrne et al. 1996), showing that Eucalyptus microsatellites are usually very informative.
Probability of Identity and Power of Exclusion
The probability of identity (I) expresses the likelihood of finding two individuals with the same genotype at a given locus in the population. This probability was very low in most cases, ranging from 0.17 (EMBRA11) to less than 0.01 (EMBRA5). Considering that every locus segregates independently, and assuming therefore that the loci are in linkage equilibrium, the chance of finding identical genotypes in the population when the six EMBRA microsatellites are included in the analysis corresponds to the product of the probabilities of identity of the individual loci. The combined estimate was 2 × 10⁻⁹, meaning that the chance of finding two individuals with the same genotype in the population is almost null. This power of discrimination was confirmed when we analyzed the whole group of individuals with the microsatellite that showed the highest He and PIC values (EMBRA5) and detected that 43% of them had a unique genotype. When a second microsatellite was included in the analysis (EMBRA16), this proportion increased to 95%. Finally, with a third microsatellite (EMBRA15), all individuals were discriminated. Similar results were obtained with a breeding population of Eucalyptus dunnii, in which four microsatellite markers could discriminate nearly all 46 individual trees selected to compose a seed orchard (Marcucci et al. 2003). The power of exclusion (Q) indicates the probability of excluding a nonparent from paternity or maternity. As expected, the power of exclusion was high for all the microsatellites analyzed. It ranged from 0.434 (EMBRA11) to 0.941 (EMBRA4), with an average of 0.774. The combined power of exclusion, which is the exclusion probability considering all six EMBRA loci, was greater than 99.99%, indicating that these loci are appropriate to determine parentage in eucalypts beyond any reasonable doubt.
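The stepwise discrimination check described above (one locus, then two, then three) can be sketched as a count of individuals whose multilocus profile occurs only once. This is an illustrative sketch with a hypothetical data layout (one dictionary per tree, mapping a locus name to a pair of allele sizes) and made-up genotypes, not the study data.

```python
from collections import Counter


def fraction_unique(genotypes, loci):
    """Share of individuals whose profile over `loci` is unique in the sample."""
    # Sort each allele pair so that (100, 102) and (102, 100) compare equal.
    profiles = [
        tuple(tuple(sorted(individual[locus])) for locus in loci)
        for individual in genotypes
    ]
    counts = Counter(profiles)
    return sum(1 for p in profiles if counts[p] == 1) / len(profiles)


if __name__ == "__main__":
    # Hypothetical trees typed at two hypothetical loci.
    trees = [
        {"locus_a": (100, 102), "locus_b": (200, 204)},
        {"locus_a": (102, 100), "locus_b": (200, 206)},
        {"locus_a": (104, 104), "locus_b": (200, 204)},
    ]
    for panel in (["locus_a"], ["locus_a", "locus_b"]):
        print(panel, fraction_unique(trees, panel))
```

Adding loci can only split groups of identical profiles further, so the fraction of unique individuals never decreases as the panel grows, in line with the reported progression from 43% to 95% to 100%.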
Accuracy of Estimates of Heterozygosity
EMBRA microsatellites with large numbers of alleles in low frequencies showed only a small reduction in the coefficient of variation of the mean-squared error between samples of 64 and 128 individuals, indicating that a sample of this size should be adequate for accurate estimation of He. Most of the microsatellites analyzed in this study fell into this category, with the exception of EMBRA11. To obtain the same accuracy that was obtained with the other loci, more than 128 individuals would have to be analyzed with this microsatellite (Figure 2). These results refer only to the microsatellites that were analyzed in this study and to the population of individuals that were typed. However, considering that microsatellites have a similar pattern of allelic distribution in different species (Valdes et al. 1993), we can infer that several studies involving their characterization may be incurring significant errors, reporting incorrect He values when these were obtained with small samples. A sample size of 64 was considered satisfactory when we analyzed the bias of the He of all microsatellites (Figure 2). It is important to note that the bias of He was always negative. This behavior was related to the increasing number of rare alleles that were observed when the sample size was extended, which contributed to the detection of higher He values, and it was particularly interesting because it showed that He was always underestimated in relation to the population parameter.
Genetic Distances Among Individuals
Based on the information obtained at the six microsatellite loci, shared allele distance coefficients were estimated for all the pairwise comparisons of the 192 trees (data not shown). The average distance between individuals was high (0.857) and ranged from 0.333 to 1. Of the 18,336 pairwise distances, 3 were between 0 and 0.4, 24 between 0.4 and 0.5, and 458 between 0.5 and 0.6; that is, 97% of the pairwise distances between individuals were estimated to be greater than 0.6.
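The counts reported above can be checked with a short calculation. The sketch below only reuses the figures given in the text (192 trees; 3, 24, and 458 pairwise distances below 0.6); the variable names are illustrative.

```python
from math import comb

# Number of distinct pairs among 192 individuals.
n_pairs = comb(192, 2)

# Pairwise distances reported below 0.6 in the three lowest classes.
below_0_6 = 3 + 24 + 458

# Remaining share of pairwise distances, i.e. those greater than 0.6.
share_above_0_6 = 1 - below_0_6 / n_pairs

print(n_pairs, below_0_6, round(share_above_0_6 * 100, 1))
```

The pair count matches the 18,336 comparisons stated in the text, and the share of distances above 0.6 rounds to the reported 97%.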
In conclusion, this study demonstrates the power of discrimination of microsatellite markers in E. grandis and their high potential for future use in issues concerning plant variety protection, as well as their application in the assessment of the existing genetic variation, genetic distance, and determination of parentage among individuals of a breeding population. Genetic analysis with microsatellites should be of interest to eucalypt tree breeders as complementary descriptors for varietal protection, particularly of vegetatively propagated elite clones. Interest in the use of microsatellites for this purpose has become even greater after acceptance by the USDA Plant Variety Protection Office of SSR allelic profiles as evidence of the uniqueness of new cultivars (Diwan and Cregan 1997). Microsatellites have been shown to be almost twice as informative as dominant markers (RAPD and AFLP) and much more informative than RFLPs in soybean (Powell et al. 1996), and approximately six times more informative than RAPD and nine times more informative than allozyme in poplar (Rajora and Rahman 2003), being the ideal marker for discriminating individuals and for parentage determinations. Information on the genetic distance among individuals in a breeding population is an important tool for forest tree breeders performing guided crosses, which are more likely to produce more widely segregating progenies and consequently increase the probability of obtaining higher-value transgressive individuals to be propagated as elite clones.
The microsatellites used in this study could be confidently scored using a simple and accessible silver-stained detection technology, allowing widespread use by several laboratories. We are now optimizing these same loci and others in larger multiplex systems based on fluorescent detection. Although less accessible in general, fluorescent detection improves scoring of dinucleotide repeats, as demonstrated for Eucalyptus sieberi (Glaubitz et al. 2001). We expect that these systems will add a significant power of resolution for distinctness, uniqueness, and stability (DUS) tests in the varietal protection of eucalypt clones, especially when closely related individuals are under scrutiny. A database of multilocus genetic profiles of elite clones could be immediately used to implement genetic identity tests by electronic comparison of multilocus profiles between questioned and reference samples in quality control procedures in the context of breeding programs and varietal protection.
| Acknowledgments |
|---|
We would like to thank Fernanda Gaiotto, Rosana P. V. Brondani, and Veridiana J. Ribeiro for their valuable advice. This work was supported by the Brazilian Ministry of Science and Technology through a FVA-FINEP competitive grant in biotechnology (to D.G.), an M.Sc. fellowship (to M.K.), and a research fellowship (to D.G.) from the Brazilian National Research Council CNPq.
| Footnotes |
|---|
Corresponding Editor: Irwin Goldman
Received October 1, 2003
Accepted August 20, 2004
| References |
|---|
Bassam BJ, Caetano-Anollés G, and Gresshoff PM, 1991. Fast and sensitive silver staining of DNA in polyacrylamide gels. Anal Biochem 196:80-83.
Botstein D, White RL, Skolnick M, and Davis RW, 1980. Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:182-190.
Bowcock AM, Ruiz-Linares A, Tomfohrde J, Minch E, Kidd JR, and Cavalli-Sforza LL, 1994. High resolution of human evolutionary trees with polymorphic microsatellites. Nature 368:455-457.
Brondani RPV, Brondani C, and Grattapaglia D, 2002. Towards a genus-wide reference linkage map for Eucalyptus based exclusively on highly informative microsatellite markers. Mol Gen Genomics 267:338-347.
Brondani RPV, Brondani C, Tarchini R, and Grattapaglia D, 1998. Development, characterization and mapping of microsatellite markers in Eucalyptus grandis and E. urophylla. Theor Appl Genet 97:816-827.
Byrne M, Marques-Garcia MI, Uren T, Smith DS, and Moran GF, 1996. Conservation and genetic diversity of microsatellite loci in the genus Eucalyptus. Aust J Bot 44:331-341.
Devey ME, Bell JC, Uren TL, and Moran GF, 2002. A set of microsatellite markers for fingerprinting and breeding applications in Pinus radiata. Genome 45:984-989.
Diwan N and Cregan PB, 1997. Automated sizing of fluorescent-labeled simple sequence repeat (SSR) markers to assay genetic variation in soybean. Theor Appl Genet 95:723-733.
Dore C, Dosba F, and Baril C, 2001. ISHS International Symposium on Molecular Markers for Characterizing Genotypes and Identifying Cultivars in Horticulture. Acta Hort 546.
Dow BD, Ashley MV, and Howe HF, 1995. Characterization of highly variable (GA/CT)n microsatellites in the bur oak, Quercus macrocarpa. Theor Appl Genet 91:137-141.
Gaiotto FA, Bramucci M, and Grattapaglia D, 1997. Estimation of outcrossing rate in a breeding population of Eucalyptus urophylla with dominant RAPD and AFLP markers. Theor Appl Genet 95:842-849.
Glaubitz JC, Emebiri LC, and Moran GF, 2001. Dinucleotide microsatellites from Eucalyptus sieberi: inheritance, diversity, and improved scoring of single-base differences. Genome 44:1041-1045.
Guilford P, Prakash S, Zhu JM, Rikkerink E, Gardiner S, Basse TT, and Forster R, 1997. Microsatellite in Malus x domestica (apple): abundance, polymorphism and cultivar identification. Theor Appl Genet 94:241-248.
Jin L and Chakraborty R, 1993. Estimation of genetic distance and coefficient of gene diversity from single-probe multilocus DNA fingerprinting data. Mol Biol Evol 11:120-127.
Keil M and Griffin AR, 1994. Use of random amplified polymorphic DNA (RAPD) markers in the discrimination and verification of genotypes in Eucalyptus. Theor Appl Genet 89:442-450.
Kijas JMH, Fowler JCS, and Thomas MR, 1995. An evaluation of sequence tagged microsatellite site markers for genetic analysis within Citrus and related species. Genome 38:349-355.
Kostia S, Varvio SL, Vakkari P, and Pulkkinen P, 1995. Microsatellite sequences in a conifer, Pinus sylvestris. Genome 38:1244-1248.
Marcucci Poltri SN, Zelener N, Rodriguez Traverso J, Gelid P, and Hopp HE, 2003. Selection of a seed orchard of Eucalyptus dunnii based on genetic diversity criteria calculated using molecular markers. Tree Physiol 23:625-632.
Morgante M, Hanafey M, and Powell W, 2002. Microsatellites are preferentially associated with nonrepetitive DNA in plant genomes. Nat Genet 30:194-200.
Nesbitt KA, Potts BM, Vaillancourt RE, West AK, and Reid JB, 1995. Partitioning and distribution of RAPD variation in a forest tree species, Eucalyptus globulus (Myrtaceae). Heredity 74:628-637.
Paetkau D, Calvert W, Stirling I, and Strobeck C, 1995. Microsatellite analysis of population structure in Canadian polar bears. Mol Ecol 4:347-354.
Powell W, Morgante M, Andre C, Hanafey M, Vogel J, Tingey S, and Rafalski A, 1996. The comparison of RFLP, RAPD, AFLP and SSR (microsatellite) markers for germplasm analysis. Mol Breed 2:225-238.
Rajora OP and Rahman MH, 2003. Microsatellite DNA and RAPD fingerprinting, identification and genetic relationships of hybrid poplar (Populus x canadensis) cultivars. Theor Appl Genet 106:470-477.
Rahman MH and Rajora OP, 2002. Microsatellite DNA fingerprinting, differentiation, and genetic relationships of clones, cultivars, and varieties of six poplar species from three sections of the genus Populus. Genome 45:1083-1094.
Rhoads DD and Roufa DJ, 1990. Seqaid II 3.80. Manhattan, KS: Molecular Genetics Laboratory, Kansas State University.
Smith DN and Devey ME, 1994. Occurrence and inheritance of microsatellites in Pinus radiata. Genome 37:977-983.
Thomas MR, Cain P, and Scott NS, 1994. DNA typing of grapevines: a universal methodology and database for describing cultivars and evaluating genetic relatedness. Plant Mol Biol 25:939-949.
Valdes AM, Slatkin M, and Freimer N, 1993. Allele frequencies at microsatellite loci: the stepwise mutation model revisited. Genetics 133:737-749.
van der Schoot J, Pospiskova M, Vosman B, and Smulders MJM, 2000. Development and characterization of microsatellite markers in black poplar (Populus nigra L.). Theor Appl Genet 101:317-322.
Weir BS, 1996. Genetic data analysis. Sunderland, MA: Sinauer Associates.
Williams CG and Auckland L, 2002. Conifer microsatellite handbook. College Station: Texas A&M University.