Journal of Heredity Advance Access originally published online on December 23, 2004
Journal of Heredity 2005 96(2):85-88; doi:10.1093/jhered/esi017

© The American Genetic Association. 2004. All rights reserved. For Permissions, please email: journals.permissions@oupjournals.org

On the Estimation of Genome-wide Heterozygosity Using Molecular Markers

Y. D. DeWoody and J. A. DeWoody

From the Department of Forestry and Natural Resources, Purdue University, West Lafayette, IN 47907–1159

Address correspondence to Andrew DeWoody at the address above, or e-mail: dewoody{at}purdue.edu.


    Abstract
Coltman and Slate (2003) recently performed a meta-analysis on studies that investigated the association between genetic variation at microsatellite loci and phenotypic trait variation. One factor not explicitly addressed in their meta-analysis is the actual estimation of genome-wide heterozygosity via molecular markers. Many authors still associate marker-estimated heterozygosity with genome-wide heterozygosity, despite allozyme-based evidence that such correlations are usually very weak or nonexistent. Here, we show that genome-wide heterozygosity is poorly estimated not only by allozymes but also by microsatellite loci and by single-nucleotide polymorphisms (SNPs). Thus, associations between fitness (or other phenotypes) and heterozygosity should be established firmly on causative factors and not on simple correlations.


Correlations between evolutionary fitness and zygosity at marker loci have been documented in a wide variety of organisms (Coltman and Slate 2003). In general, the idea of a heterozygote advantage (i.e., overdominance) has received considerable support (Mitton 1997). Specifically, heterozygosity-fitness correlations fall into one of three primary categories (David 1998; Hansson and Westerberg 2002): (1) the "direct effect" hypothesis claims that a heterozygote advantage is specifically due to the assayed loci (e.g., enzyme polymorphisms); (2) the "local effect" hypothesis claims that marker loci are closely linked to fitness loci; and (3) the "general effect" hypothesis claims that a heterozygote advantage is conveyed not by the scored loci (or tightly linked loci) but by genome-wide effects. Here, the primary focus is on heterozygosity-fitness correlations (HFCs) due to the general effect—that is, genomic zygosity.

The mean level of individual heterozygosity across all loci in the genome is a parameter, H, that can be estimated with a suite of molecular markers. For example, heterozygosity can be measured at each of 20 allozymes, and the mean heterozygosity across these 20 loci can be represented as h. Molecular markers can be used to search for genomic heterozygosity-fitness correlations (HFCs) if h provides a robust estimate of H.
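To make the statistic h concrete, here is a minimal sketch of how an individual's marker heterozygosity could be computed from multilocus genotypes. The genotype representation (a pair of allele labels per locus), the function name, and the example individual are all hypothetical; missing genotypes are assumed to have been excluded beforehand.

```python
from typing import Sequence, Tuple

# Hypothetical representation: one (allele, allele) pair per scored locus.
Genotype = Tuple[str, str]


def individual_heterozygosity(genotypes: Sequence[Genotype]) -> float:
    """Return h, the proportion of scored loci at which the individual is heterozygous."""
    if not genotypes:
        raise ValueError("at least one scored locus is required")
    heterozygous = sum(1 for a, b in genotypes if a != b)
    return heterozygous / len(genotypes)


# Hypothetical individual scored at 20 loci, heterozygous at 7 of them.
example = [("A", "B")] * 7 + [("A", "A")] * 13
h = individual_heterozygosity(example)
print(f"h = {h:.2f}")  # h = 0.35
```

The genome-wide parameter H would be the same proportion taken over all n loci, which is exactly what cannot be observed in practice.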

Genomic (i.e., genome-wide) HFCs are often reported in the literature, and this is somewhat surprising—not necessarily because heterozygosity and fitness are unrelated, but because of known problems in estimating H using only a few molecular markers. Twenty-five years ago, Mitton and Pierce (1980) used computer simulations to show that correlations between H and individual heterozygosity as estimated by molecular markers (h) are generally quite low. Shortly thereafter, Chakraborty (1981) provided an analytical formula to calculate the expected correlation between the parameter H and statistic h; he too found that genome-wide heterozygosity is poorly estimated using a suite of < 20 independent loci.
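The sketch below is a crude Monte Carlo in the general spirit of a simulation study; it is not Mitton and Pierce's simulation and not Chakraborty's model. It assumes (hypothetically) that every locus is heterozygous with the same probability, independently of all other loci, and estimates the correlation between h (from the first r loci) and H (from all n loci) across simulated individuals. Under that simplified model the expected correlation is the square root of r/n.

```python
import math

import numpy as np


def simulate_corr_h_H(n_loci: int, r_markers: int, het_prob: float,
                      n_individuals: int = 2000, seed: int = 1) -> float:
    """Monte Carlo estimate of corr(h, H) across individuals.

    Simplifying assumption (not Chakraborty's model): each locus is
    heterozygous with the same probability het_prob, independently of
    every other locus and every other individual.
    """
    rng = np.random.default_rng(seed)
    # True = heterozygous, False = homozygous; one row per individual.
    het = rng.random((n_individuals, n_loci)) < het_prob
    H = het.mean(axis=1)                 # genome-wide heterozygosity
    h = het[:, :r_markers].mean(axis=1)  # heterozygosity at the r markers
    return float(np.corrcoef(h, H)[0, 1])


n, r = 1000, 100  # hypothetical genome of 1,000 loci surveyed with 100 markers
sim = simulate_corr_h_H(n, r, het_prob=0.3)
print(f"simulated rho = {sim:.3f}; sqrt(r/n) = {math.sqrt(r / n):.3f}")
```

Even with one marker for every ten loci, the simulated correlation should land near 0.32 rather than anywhere close to 1.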

Although the original work of Mitton and Pierce (1980) and Chakraborty (1981) was based on conventional protein markers such as allozymes, their results are applicable to all kinds of molecular markers. Assuming mutation-drift equilibrium, Chakraborty (1981) showed that the expected correlation (designated ρ) between H and h can be calculated as:

(1)
where n is the number of loci in the genome, r is the number of markers assayed, and h is the average heterozygosity of the markers assayed. How does the correlation ρ respond to changes in h and/or r? Both variables differ among marker systems, so below we compare microsatellites and single-nucleotide polymorphisms with allozymes.

Mean heterozygosity in vertebrates is an order of magnitude higher at microsatellite loci than at allozyme loci (DeWoody and Avise 2000). Unfortunately, correlations between H and h actually decline slightly as heterozygosity increases (Figure 1). For example, a sample of 20 homozygous markers (i.e., mean h = 0.0) from a genome consisting of 50,000 genes produces an expected correlation between H and h of 0.0245, whereas 20 heterozygous markers (i.e., mean h = 1.0) produce an expected correlation of 0.0200 (Chakraborty 1981). This means that, on average, 20 allozyme markers will give marginally better estimates of H than will 20 microsatellite markers. Thus, genome-wide HFCs may be slightly stronger (albeit still generally tenuous) in allozyme studies than in microsatellite studies.
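As a sanity check on the h = 1 example, assume (for this check only; it is not Equation 1) that all loci are independent and exchangeable, each with the same per-locus variance V in zygosity. Because the markers are a subset of the genome, Cov(h, H) = V/n, Var(h) = V/r, and Var(H) = V/n, so the correlation reduces to the square root of effort, which reproduces the quoted value of 0.0200:

$$
\rho = \frac{\operatorname{Cov}(h,H)}{\sqrt{\operatorname{Var}(h)\,\operatorname{Var}(H)}}
     = \frac{V/n}{\sqrt{(V/r)(V/n)}} = \sqrt{\frac{r}{n}},
\qquad
\sqrt{\frac{20}{50\,000}} = \sqrt{4\times10^{-4}} = 0.0200 .
$$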



Figure 1. The expected correlation coefficient (ρ) as a function of effort (r/n) when the estimated heterozygosity h = 0.0, 0.5, and 1.0. ρ is the correlation between the parameter H (true heterozygosity averaged across all loci in the genome) and the statistic h (estimated heterozygosity averaged across all loci surveyed). ρ is determined primarily by the ratio of r (the number of markers assayed) to n (the number of loci in the genome), whereas h has relatively little influence on ρ. The exact relationship among ρ, r, n, and h is given by Equation 1, but here we use the approximation (r – 1)/(n – 1) ≈ r/n to express ρ strictly as a function of the ratio r/n and h. Note that the correlation coefficient is 1.0 when the r/n ratio is 1.0.

 
In theory, the correlation (ρ) between h and H depends on n, r, and h; but in practice, ρ is determined primarily by effort (r/n). It can be shown analytically that ρ is a decreasing function of heterozygosity; ρ is largest when h = 0 and smallest when h = 1 (Figure 1). That is, the minimal correlation between h and H occurs when h = 1:

(2)
When n is reasonably large (as for most vertebrate genomes), the ratio r/n approximates (r – 1)/(n – 1). Thus, the maximum expected correlation when h = 0 can be expressed strictly as a function of effort (r/n):

(3)
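How good is the substitution of r/n for (r – 1)/(n – 1)? A short calculation shows that the absolute difference is (n – r)/[n(n – 1)], which never exceeds 1/n, so it is negligible for genome-sized n. The survey sizes below are taken from the examples in the text; the function names are hypothetical.

```python
def effort_exact(r: int, n: int) -> float:
    """The ratio (r - 1)/(n - 1), which the text says enters Equation 1."""
    return (r - 1) / (n - 1)


def effort_approx(r: int, n: int) -> float:
    """Effort r/n, used in place of (r - 1)/(n - 1)."""
    return r / n


# (markers r, genome size n) pairs from the examples in the text.
surveys = [(20, 50_000), (30, 30_000), (3_000, 30_000)]
for r, n in surveys:
    diff = effort_approx(r, n) - effort_exact(r, n)
    # Algebraically, diff = (n - r) / (n * (n - 1)), which is at most 1/n.
    print(f"r={r:>5} n={n:>6}  r/n={effort_approx(r, n):.6f}  "
          f"(r-1)/(n-1)={effort_exact(r, n):.6f}  diff={diff:.2e}")
```

The relative error can be a few percent when r is small (about 5% for r = 20), but the absolute error stays below 1/n.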
When r/n is small, the correlation can be closely approximated by the first-order term alone (i.e., effort); thus, regardless of heterozygosity, ρ remains tightly constrained by effort:

Heterozygosity (h) and effort (r/n) differ not only between allozymes and microsatellites, but also for single-nucleotide polymorphisms (SNPs). SNPs are potentially attractive for correlating heterozygosity with fitness because their average heterozygosity is quite low, while achievable effort is high relative to allozymes and microsatellites. SNPs are usually biallelic, and allele frequencies are typically skewed so that one allele is rare (the "minor" allele) and the other common (the "major" allele) (Glaubitz et al. 2003; Marth et al. 2001). Minor allele frequencies often range from 0.01 to 0.20; thus, expected heterozygosity under Hardy-Weinberg equilibrium ranges from about 2% to 32%. This is substantially lower than expected heterozygosity at a typical microsatellite locus but roughly equivalent to that at allozyme loci. However, for the purpose of heterozygosity-fitness association studies, the main advantage of SNPs is the number of loci that can be genotyped.
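The 2% to 32% range follows from the Hardy-Weinberg expected heterozygosity, 1 minus the sum of squared allele frequencies, which for a biallelic SNP with minor allele frequency p is 2p(1 – p). The sketch below reproduces that arithmetic; the 10-allele microsatellite used for contrast is a hypothetical locus, not data from the text.

```python
from typing import Sequence


def expected_heterozygosity(allele_freqs: Sequence[float]) -> float:
    """Hardy-Weinberg expected heterozygosity: 1 - sum(p_i ** 2)."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


def biallelic_snp_heterozygosity(maf: float) -> float:
    """For a biallelic SNP this equals 2 * maf * (1 - maf)."""
    return expected_heterozygosity([maf, 1.0 - maf])


for maf in (0.01, 0.05, 0.10, 0.20):
    print(f"MAF {maf:.2f}: expected heterozygosity {biallelic_snp_heterozygosity(maf):.4f}")

# Hypothetical microsatellite with 10 equally common alleles, for contrast.
print(f"10-allele locus: {expected_heterozygosity([0.1] * 10):.2f}")
```

A minor allele frequency of 0.01 gives 0.0198 and one of 0.20 gives 0.3200, whereas the hypothetical 10-allele locus reaches 0.90.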

SNPs can now be assayed at hundreds or even thousands of loci in model organisms (Kwok 2001). In principle, this could make SNPs very attractive, because the correlation between H and h is dictated primarily by the proportion of loci sampled. Unfortunately for empiricists, Figure 2 shows that the correlation between h and H is weak when the r/n ratio falls below 0.1 and becomes even more tenuous when the r/n ratio drops below 0.01. This means that for a genome with ~30,000 genes (as in humans), a herculean survey of 3,000 markers will produce only a modest correlation of less than 0.40. More realistic surveys of a few dozen markers in organisms with similar genome sizes result in correlation coefficients < 0.05 (Figure 2).
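For a rough feel for these numbers, the sketch below computes effort and the square root of effort for a genome of ~30,000 genes. Using sqrt(r/n) as the correlation is an assumption: it is exact only under a simplified model of independent, exchangeable loci and is not Equation 1. The survey sizes and the function name are hypothetical.

```python
import math


def approx_rho(r: int, n: int) -> float:
    """sqrt(r/n): exact only under a simplified independent, exchangeable-loci model."""
    return math.sqrt(r / n)


N_GENES = 30_000  # approximate human gene count used in the text
for r in (30, 36, 3_000):  # hypothetical survey sizes
    print(f"r = {r:>5}: r/n = {r / N_GENES:.4f}, "
          f"sqrt(r/n) = {approx_rho(r, N_GENES):.3f}")
```

Under this simplification, 3,000 markers give about 0.32 and a few dozen give about 0.03, both below the bounds quoted above.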



Figure 2. The relationship between marker-estimated heterozygosity (h) and the correlation coefficient ρ for biologically realistic ratios of r/n. For example, a sample of 30 independent loci from a genome consisting of 30,000 genes gives an r/n ratio of 0.001.

 
Given the weak correlations between h and true genome-wide H, how then can we explain HFCs generated from molecular markers? Widespread reporting of genomic HFCs is due in part to publication bias (see Coltman and Slate 2003). In truth, most researchers have reported "hfCs," correlations between marker-estimated heterozygosity (h) and some correlate of fitness (f) (Figure 3). In terms of the three hypotheses considered by Hansson and Westerberg (2002), the "direct effect" and "local effect" hypotheses rely on correlations between h and either the statistic f or the parameter F (overall fitness), whereas the "general effect" hypothesis relies on correlations between H and f or F. In terms of Figure 3, the null hypothesis for a direct or local effect would be that ρ2 = 0 or ρ5 = 0, whereas the null for the general effect hypothesis would be that ρ4 = 0 or ρ6 = 0.



Figure 3. The relationships among genome-wide heterozygosity (H), marker-estimated heterozygosity (h), a fitness correlate (f), and overall fitness (F). True correlation coefficients are denoted by the parameter ρ, whereas empirically derived "sample" correlation coefficients are denoted by the statistic r. Note that ρ1 is defined by Equation 1. See Appendix for details.

 
Empirical correlations may be reasonable under the "direct effect" hypothesis for protein-coding or SNP loci, but they are much less tenable for microsatellites or other neutral markers. The "local effect" hypothesis, however, is plausible for any marker system—but its advocates face the burden of ruling out the unpalatable possibility of spurious correlation. (Avoiding spurious correlations is not impossible, but this requires a rigorous experimental design incorporating many markers of various types and, often, an accurate pedigree; see Hansson et al. (2004) for an exceptional example of a "local effect".) The "general effect" hypothesis is supported only when "hfCs" accurately represent "HFCs".

Given the poor correlation between h and H, the sample correlation (r2) between h and f must be very strong to detect a significant population-level correlation (ρ4) between H and F. If we assume, for simplicity, that the statistic f accurately represents the parameter F, while still accounting for the poor correlation between h and H, then the sample correlation between H and f can be estimated as r6 ≈ r2ρ1 (see Appendix). The situation deteriorates rapidly as we extrapolate further to correlations between genome-wide heterozygosity (H) and overall fitness (F). We are left to conclude that most published "HFCs" are in truth "hfCs" and that the discrepancy between the two is largely due to the difference between marker-based heterozygosity and genome-wide heterozygosity.
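To illustrate why r2 must be so strong, the sketch below applies the attenuation r6 ≈ r2ρ1 and then asks how many individuals a standard t test of a Pearson correlation (t = r√[(N – 2)/(1 – r²)]) would need to reach significance. The values r2 = 0.3 and ρ1 = 0.03 are hypothetical, and the normal cutoff 1.96 is used as a large-sample stand-in for the t critical value.

```python
import math


def attenuated_correlation(r2: float, rho1: float) -> float:
    """Approximate sample correlation between H and f: r6 ~ r2 * rho1."""
    return r2 * rho1


def t_statistic(r: float, n_individuals: int) -> float:
    """Standard t statistic for a Pearson correlation with N - 2 degrees of freedom."""
    return r * math.sqrt((n_individuals - 2) / (1.0 - r * r))


def individuals_needed(r: float, z: float = 1.96) -> int:
    """Smallest N with t >= z (z approximates the two-sided 5% t cutoff)."""
    return math.ceil(2 + z * z * (1.0 - r * r) / (r * r))


r2, rho1 = 0.3, 0.03  # hypothetical values, not taken from any study
r6 = attenuated_correlation(r2, rho1)
print(f"r6 ~ {r6:.4f}")
print(f"N needed to detect r2: {individuals_needed(r2)}")
print(f"N needed to detect r6: {individuals_needed(r6)}")
```

Under these assumptions, an hfC of 0.3 is detectable with about 41 individuals, but the implied genome-wide correlation would require roughly 47,000.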

In summary, our ability to detect significant heterozygosity-fitness correlations is constrained by our ability to estimate genome-wide heterozygosity. This is unlikely to change until we can develop high-density genetic maps that reflect recombination rates and subsequently select markers based on their genomic distribution (e.g., haplotype blocks; see Wall and Pritchard [2003]). For those who work on nonmodel organisms, the prospects for estimating individual genomic heterozygosity with a few randomly distributed molecular markers remain bleak.


    Appendix
As in Figure 3, to test for a significant correlation between genome-wide heterozygosity and a correlate of fitness (H0: ρ6 = 0), one needs to compute the sample correlation (r6) between H and f:

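$$
r_6 = \frac{S_{Hf}}{\sqrt{S_{HH}\,S_{ff}}} \qquad (4)
$$

where $S_{xy} = \sum_{i}(x_i - \bar{x})(y_i - \bar{y})$ is the sum of cross products over the sampled individuals. Because H is not observed, $S_{HH}$ and $S_{Hf}$ cannot be computed directly. One can estimate them by utilizing the expected correlation between H and h (ρ1, given by equation [1]) and the asymptotic result that the least-squares regression line approaches the identity line (i.e., h = H). Therefore, the least-squares regression slope, $b = S_{Hh}/S_{hh}$, converges to unity as the number of markers assayed approaches the number of loci in the genome. It now follows that

$$
\rho_1 \approx \frac{S_{Hh}}{\sqrt{S_{hh}\,S_{HH}}} = b\,\sqrt{\frac{S_{hh}}{S_{HH}}} \approx \sqrt{\frac{S_{hh}}{S_{HH}}}.
$$

This implies that the following estimates are both reasonable and asymptotically true:

$$
S_{HH} \approx \frac{S_{hh}}{\rho_1^{2}}, \qquad S_{Hf} \approx S_{hf}.
$$

Substitution of these estimates into equation (4) gives

$$
r_6 \approx \frac{S_{hf}}{\sqrt{S_{hh}\,S_{ff}/\rho_1^{2}}} = \rho_1\,\frac{S_{hf}}{\sqrt{S_{hh}\,S_{ff}}} = \rho_1\,r_2 .
$$

Thus, r2 must be weighted by ρ1 in order to reflect genome-wide heterozygosity.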
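The substitution can be checked numerically. The sketch below computes the sums of cross products from observed h and f, plugs the estimates S_HH ≈ S_hh/ρ1² and S_Hf ≈ S_hf into equation (4), and confirms that the result equals ρ1·r2. The data values, ρ1 = 0.05, and all function names are hypothetical.

```python
import math
from typing import Sequence


def centered_cross_product(x: Sequence[float], y: Sequence[float]) -> float:
    """S_xy = sum over individuals of (x_i - mean(x)) * (y_i - mean(y))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))


def sample_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson sample correlation S_xy / sqrt(S_xx * S_yy)."""
    return centered_cross_product(x, y) / math.sqrt(
        centered_cross_product(x, x) * centered_cross_product(y, y))


def estimate_r6(h: Sequence[float], f: Sequence[float], rho1: float) -> float:
    """Equation (4) with S_HH ~ S_hh / rho1**2 and S_Hf ~ S_hf substituted."""
    if not 0.0 < rho1 <= 1.0:
        raise ValueError("rho1 must lie in (0, 1]")
    s_hh = centered_cross_product(h, h)
    s_ff = centered_cross_product(f, f)
    s_HH = s_hh / rho1 ** 2              # estimate of S_HH
    s_Hf = centered_cross_product(h, f)  # estimate of S_Hf
    return s_Hf / math.sqrt(s_HH * s_ff)


# Hypothetical marker heterozygosities and fitness-correlate values.
h = [0.20, 0.35, 0.50, 0.40, 0.25]
f = [1.1, 1.4, 2.0, 1.5, 1.2]
r2 = sample_correlation(h, f)
r6 = estimate_r6(h, f, rho1=0.05)
print(f"r2 = {r2:.3f}, r6 = {r6:.4f}, rho1 * r2 = {0.05 * r2:.4f}")
```

However strong the observed r2, the estimated genome-wide correlation is scaled down by the factor ρ1.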


    Acknowledgments
 
We thank J. Avise, D. Bos, J. Busch, J. Glaubitz, J. Rudnick, D. Triant, S. Turner, and R. Williams for their input. This research was supported in part by Purdue University and by a U.S. Department of Agriculture National Research Initiative grant (#2003-03616). This is publication #ARP17260 from the School of Agriculture at Purdue University.


    Footnotes
 
Corresponding Editor: Brian Bowen

Received June 1, 2004
Accepted August 15, 2004


    References

    Chakraborty R, 1981. The distribution of the number of heterozygous loci in an individual in natural populations. Genetics 98:461–466.

    Coltman DW and Slate J, 2003. Microsatellite measures of inbreeding: a meta-analysis. Evolution 57:971–983.

    David P, 1998. Heterozygosity-fitness correlations: new perspectives on old problems. Heredity 80:531–537.

    DeWoody JA and Avise JC, 2000. Microsatellite variation in marine, freshwater, and anadromous fishes compared with other animals. J Fish Biol 56:461–473.

    Glaubitz JC, Rhodes OE, and DeWoody JA, 2003. Prospects for inferring pairwise relationships with single nucleotide polymorphisms. Mol Ecol 12:1039–1047.

    Hansson B and Westerberg L, 2002. On the correlation between heterozygosity and fitness in natural populations. Mol Ecol 11:2467–2474.

    Hansson B, Westerdahl H, Hasselquist D, Akesson M, and Bensch S, 2004. Does linkage disequilibrium generate heterozygosity-fitness correlations in great reed warblers? Evolution 58:870–879.

    Kwok P-Y, 2001. Methods for genotyping single nucleotide polymorphisms. Annu Rev Genomics Hum Genet 2:235–258.

    Marth G, Yeh R, Minton M, Donaldson R, Li Q, Duan SG, Davenport R, Miller RD, and Kwok P-Y, 2001. Single-nucleotide polymorphisms in the public domain: how useful are they? Nat Genet 27:371–372.

    Mitton JB and Pierce BA, 1980. The distribution of individual heterozygosity in natural populations. Genetics 95:1043–1054.

    Mitton JB, 1997. Selection in natural populations. Oxford: Oxford University Press.

    Wall JD and Pritchard JK, 2003. Assessing the performance of the haplotype block model of linkage disequilibrium. Am J Hum Genet 73:502–515.

