Journal of Heredity 2003:94(2)
© 2003 The American Genetic Association 94:175-179


Brief Communication

Asymptotic Variances of QTL Estimators With Selective DNA Pooling

C. Carleos, J. A. Baro, J. Cañon, and N. Corral

From the Departamento de Estadística, Facultad de Ciencias, C/Calvo Sotelo 33007 Oviedo, Asturias, Spain (Carleos); Departamento de Ciencias Agroforestales, Universidad de Valladolid, Palencia, Spain (Baro); Departamento de Produccion Animal, Universidad Complutense, Madrid, Spain (Cañon); and Departamento de Estadistica, Universidad de Oviedo, Oviedo, Spain (Corral).

Address correspondence to Carlos Carleos at the address above, or e-mail: carleos{at}pinon.ccu.uniovi.es.


    Abstract
 
Investigation on QTL-marker linkage usually requires a great number of observed recombinations, inferred from combined analysis of phenotypes and genotypes. To avoid costly individual genotyping, inferences on QTL position and effects can instead make use of marker allele frequencies. DNA pooling of selected samples makes allele frequency estimation feasible for studies involving large sample sizes. Linkage studies in outbred populations have traditionally exploited half-sib family designs; within the animal production context, half-sibships provide large families that are highly suitable for DNA pooling. Estimators for QTL position and effect have been proposed that make use of information from flanking markers. We present formulas derived by the delta method for the asymptotic variance of these estimators.

The half-sib design is particularly well suited to animal genetics, for both laboratory and breeding species (Georges et al. 1995; Weller et al. 1990). DNA pooling techniques applied to selected samples allow direct estimation of marker allele frequencies within the best and worst performing animals of a class of half-sib progeny; this allows great savings in genotyping and data collection compared with individual genotype determination. The method does have drawbacks, such as the loss of information about joint marker inheritance: allelic frequencies are known but genotypic frequencies are not. In addition, imprecise values stemming from technical error are a further source of inaccuracy (Lipkin et al. 1998).

Darvasi and Soller (1994) showed how DNA pooling can be combined with selection to find association between a marker and a QTL using a backcross, an F2, or a half-sib family. Dekkers (2000) extended this method to consider two-marker (interval) mapping and to allow the estimation of the QTL position.


    Methods
 
We consider a QTL Q (with alleles Q and q) flanked by two markers (M and N), each with two alleles (M, m and N, n, respectively). A half-sib family design is considered, in which the common sire has haplotypes MQN/mqn and shares no marker alleles with its mates (backcross-like). The family size is n. The recombination fraction between the markers is θ, and that between M and Q is θM. Progeny receiving the Q allele (respectively, the q allele) from the sire have a phenotypic distribution following an N(µQ, σ) (respectively, an N(µq, σ)). The parameter of interest is the substitution effect α. Within the common framework of selective DNA pooling (Darvasi and Soller 1994), α is experimentally determined by mixing DNA samples from either the best (upper tail) or worst (lower tail) performing animals for a given trait. Further, marker frequencies are determined within each tail. Information on the overall distribution is assumed sufficient to treat the grand mean µ and σ as known. Algebraic approximations to the variances of the estimators of θM and α will be given.

Let pUM denote the frequency of progeny in the upper tail that received the allele M, that is, Pr[M|U]. The other subscripts have analogous meanings. The observed proportion corresponding to pUM is denoted p̂UM. The estimator of θM based on a single tail (Dekkers 2000) is


where


Let µU and µL be the means of phenotypes in the upper and the lower tails, respectively. Regarding α, if


then an estimator for the other allele, q, is


where i_s is the selection intensity associated with a tail of size s (Falconer 1989).
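As a minimal sketch of the selection intensity mentioned above, assuming the standard definition for a standard normal trait (the mean, in standard deviation units, of the selected upper fraction s), the value can be computed as the normal density at the truncation point divided by s. The function name `selection_intensity` is illustrative and does not come from the article.

```python
from statistics import NormalDist


def selection_intensity(s: float) -> float:
    """Mean of the top fraction s of a standard normal, in SD units.

    Assumes truncation selection on a normally distributed trait.
    """
    if not 0.0 < s < 1.0:
        raise ValueError("s must lie strictly between 0 and 1")
    std = NormalDist()               # standard normal: mean 0, SD 1
    z = std.inv_cdf(1.0 - s)         # truncation point leaving a fraction s above
    return std.pdf(z) / s            # density at the threshold over tail size


# Tail size used in the Results section: 10% of progeny per tail.
i_10 = selection_intensity(0.10)
print(round(i_10, 3))
```

By symmetry, the same value with the opposite sign applies to a lower tail of the same size.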

Distribution of the Position Estimator
As seen in the Appendix, the δ-method (Bishop et al. 1975) states that the estimator of θM is approximately normally distributed with mean θM and variance-covariance matrix


where


and nM is the vector of observed allele frequencies, whose variance Var(nM) is given in equation (9).
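The δ-method step can be illustrated generically. The sketch below is not the position estimator of the article (whose formula is not reproduced here); it applies the first-order approximation Var[g(X)] ≈ ∇g' Σ ∇g to a hypothetical toy function, the ratio of two independent proportions, with a numerical gradient. All names and numbers are assumptions chosen for illustration.

```python
from typing import Callable, List, Sequence


def numerical_gradient(g: Callable[[Sequence[float]], float],
                       x: Sequence[float], h: float = 1e-6) -> List[float]:
    """Central-difference gradient of g at x."""
    grad = []
    for j in range(len(x)):
        up, down = list(x), list(x)
        up[j] += h
        down[j] -= h
        grad.append((g(up) - g(down)) / (2.0 * h))
    return grad


def delta_method_variance(g: Callable[[Sequence[float]], float],
                          mean: Sequence[float],
                          cov: Sequence[Sequence[float]]) -> float:
    """First-order approximation grad' * cov * grad, evaluated at the mean."""
    grad = numerical_gradient(g, mean)
    k = len(mean)
    return sum(grad[i] * cov[i][j] * grad[j]
               for i in range(k) for j in range(k))


# Hypothetical toy example: ratio of two independent proportions,
# each observed on 500 individuals.
def ratio(v: Sequence[float]) -> float:
    return v[0] / v[1]


mean = [0.6, 0.4]
cov = [[0.6 * 0.4 / 500, 0.0],
       [0.0, 0.4 * 0.6 / 500]]
approx_var = delta_method_variance(ratio, mean, cov)
print(approx_var)
```

A numerical gradient keeps the sketch short; with closed-form partial derivatives, as in the article, the same quadratic form applies.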

Distribution of the Effect Estimator
The distribution of phenotypes did not affect the position estimation, except through the probability of QTL alleles in the tails. For the estimation of α, we assume normal distribution of phenotypes in both groups: offspring inheriting the Q allele, and offspring inheriting the q allele.

Some additional notation is described:

  • µU and σ²U are the mean and the variance of the phenotypes above the u threshold; it is easily seen that for an overall normal distribution N(µ, σ),


    and (Johnson and Kotz 1970)


    where u' = (u − µ)/σ is the standardized threshold. In our case, the phenotypic distribution is a mixture of normals, so these expressions should be altered accordingly.

  • µUQ is the mean of offspring above the u threshold inheriting the Q allele.

Analogous definitions apply for L and q subscripts.
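For reference, the standard moments of a normal distribution truncated from below can be written as follows. This is a textbook result stated under the assumption of a single normal distribution N(µ, σ) with σ the standard deviation, not a reconstruction of the article's own typesetting; the symbol λ is introduced here only for brevity, and φ and Φ denote the standard normal density and distribution function.

```latex
u' = \frac{u - \mu}{\sigma}, \qquad
\lambda = \frac{\phi(u')}{1 - \Phi(u')},
\qquad
\mu_U = \mathrm{E}[Y \mid Y > u] = \mu + \sigma\,\lambda,
\qquad
\sigma_U^2 = \mathrm{Var}[Y \mid Y > u] = \sigma^2\left(1 + u'\lambda - \lambda^2\right).
```

For the mixture of two normals considered in the article, these expressions would apply to each component separately and would then be combined with the conditional allele probabilities in the tail.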

Let the unobserved sample consist of (Xi, Yi) values, with i = 1, ..., n, where Xi represents the parental QTL allele of individual i:


and Yi is the continuous phenotype with conditional Gaussian distribution:


Let us define:

  • the indicator


    signaling whether the individual i is in the upper tail; analogously, ;

  • the tail size


    the number of individuals in the upper tail; analogously, nL;

  • the estimator of the tail mean


    analogously for µL;

  • the unobserved proportion estimators


    actually estimated by UQ (2); let Uq = 1 - UQ, and analogously for the lower tail.

The covariance matrix is computed taking into account:

  • Var(µU): it is seen (10):


    and similarly for the variance (11):


    where the notation E[1/U] implies substituting zero for the inverse of nU in the highly unlikely case of an empty upper tail (nU = 0); adequate bounds are computed as shown in the Appendix (p stands for pUQ):

  • lower bound (Equation 12)



  • upper bound (Equation 13)


    and according to Lynch and Walsh (1998, p. 818)


    Note that this result is not adequate to directly approximate E(1/U).

Despite unavailability of E(1/U), the above bounds show that the approximation achieved with is adequate for (5) with n large enough.
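The adequacy of a simple approximation to the expected inverse of a tail count can be checked numerically. The sketch below assumes that, under a fixed threshold, the tail count is binomial with n trials and tail probability s, and it sets the inverse to zero for an empty tail, as in the convention above. It does not reproduce the bounds of equations (12) and (13); the function name is illustrative.

```python
from math import exp, lgamma, log


def expected_inverse_count(n: int, s: float) -> float:
    """E[1/X] for X ~ Binomial(n, s), counting 1/X as 0 when X == 0."""
    total = 0.0
    for k in range(1, n + 1):
        # Log of the binomial probability mass, to avoid overflow for large n.
        log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                   + k * log(s) + (n - k) * log(1.0 - s))
        total += exp(log_pmf) / k
    return total


n, s = 5000, 0.10                    # reference design from the Results section
exact = expected_inverse_count(n, s)
first_order = 1.0 / (n * s)          # naive approximation 1 / E[X]
print(exact, first_order, exact / first_order - 1.0)
```

With a tail of about 500 animals, the relative gap between the exact value and 1/(ns) is a fraction of a percent, which is consistent with the statement that the approximation is adequate for large n.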

The following relations are obtained by the same reasoning:


Eventually:


The matrix of partial derivatives of the estimator (3) of α with respect to (µL, µU, pLQ, pUQ) is:


therefore, after application of the δ-method,



    Results
 
Simulations were performed to check the adequacy of the proposed approximations. The experimental design and genetic model, both described in the Methods section, led to the results shown in Table 1 for 10,000 iterations. The "Simulation" column displays observed standard deviations of the sampling distribution of the estimator across the 10,000 iterations. The "Predicted" column was obtained from formulas 4 and 6, with parameters replaced with population values, derived from the values chosen for simulation.


Table 1. Variances of position and effect estimators.

 
The study explored several combinations of sample sizes, interval widths, and substitution effects. Unless specified in Table 1, reference values for those parameters are taken: a family of 5,000 half-sibs, an interval 50 cM wide, and a substitution effect α of 0.5 environmental standard deviations. Upper and lower tails comprise 10% of progeny each.
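One replicate of such a simulation might be sketched as follows. This is an illustration under several assumptions not stated in the article: the Haldane map function (no interference), a QTL at the midpoint of the 50 cM interval, unit environmental standard deviation, component means at plus and minus α/2, and tails defined by a fixed proportion of ranked progeny rather than by a fixed threshold. It returns only the marker allele frequencies in each tail, which are the quantities a DNA pool would estimate; the estimators themselves are not implemented.

```python
import random
from math import exp


def haldane(distance_morgans: float) -> float:
    """Recombination fraction for a map distance, assuming no interference."""
    return 0.5 * (1.0 - exp(-2.0 * distance_morgans))


# Hypothetical QTL position: midpoint of a 50 cM marker interval.
THETA_M = haldane(0.25)   # recombination fraction between M and Q
THETA_N = haldane(0.25)   # recombination fraction between Q and N


def simulate_tail_frequencies(n=5000, theta_m=THETA_M, theta_n=THETA_N,
                              alpha=0.5, tail=0.10, seed=1):
    """Simulate one half-sib family from an MQN/mqn sire and pool the tails."""
    rng = random.Random(seed)
    progeny = []
    for _ in range(n):
        has_q_upper = rng.random() < 0.5                      # sire transmits Q
        # A marker allele follows the QTL allele unless a recombination occurs.
        has_m_upper = has_q_upper != (rng.random() < theta_m)  # sire transmits M
        has_n_upper = has_q_upper != (rng.random() < theta_n)  # sire transmits N
        mean = alpha / 2.0 if has_q_upper else -alpha / 2.0
        phenotype = rng.gauss(mean, 1.0)
        progeny.append((phenotype, has_m_upper, has_n_upper))

    progeny.sort(key=lambda record: record[0])
    k = int(round(tail * n))
    lower, upper = progeny[:k], progeny[-k:]

    def freq(group, idx):
        return sum(1 for record in group if record[idx]) / len(group)

    return {"pUM": freq(upper, 1), "pUN": freq(upper, 2),
            "pLM": freq(lower, 1), "pLN": freq(lower, 2)}


print(simulate_tail_frequencies())
```

Repeating the call with different seeds and recording the resulting estimates would give an empirical sampling distribution of the kind summarized in the "Simulation" column.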


    Discussion
 
It can be concluded from our results that estimation of the θM variance is accurate for marker brackets wider than 20 cM. Shorter intervals lead to distribution of the position estimator not holding within the parameter space (corresponding to the intermarker gap); thus, tail values agglomerate at boundaries. The presented formulas can be used to compute the probability of erroneously locating a QTL exactly at a marker position.
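As a rough sketch of that last use, assuming the normal approximation holds and that estimates falling outside the interval [0, θ] are placed at the nearest marker, the probability of locating the QTL at a marker is the normal mass beyond each boundary. The numerical values below are hypothetical and are not taken from Table 1.

```python
from statistics import NormalDist


def boundary_probabilities(theta_m: float, sd: float, theta: float):
    """Normal-approximation mass below 0 (marker M) and above theta (marker N)."""
    approx = NormalDist(mu=theta_m, sigma=sd)
    at_marker_m = approx.cdf(0.0)            # estimate at or below marker M
    at_marker_n = 1.0 - approx.cdf(theta)    # estimate at or beyond marker N
    return at_marker_m, at_marker_n


# Hypothetical narrow interval: theta = 0.10, QTL at theta_M = 0.05,
# asymptotic standard deviation 0.03.
p_m, p_n = boundary_probabilities(theta_m=0.05, sd=0.03, theta=0.10)
print(round(p_m, 4), round(p_n, 4))
```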

It was also noted that the proposed approximations degrade when the effect α exceeds one standard deviation. A likely explanation is the departure from normality of the overall phenotypic distribution when the Q, q mixture components are too separated.

All of the limitations mentioned so far are related to the single fact that variances of estimators are computed making use of asymptotic theory, which notably relies on regularity conditions.

This study did not address issues such as influence of phase for small QTL effects, and narrow marker intervals, on normality of asymptotic distribution of the effect and position estimators. These deserve further study.

Large samples, above 5,000 half-sibs, are required for the proposed formulas to achieve fair results. Study of small sample distributions must account for lack of normality, and inferences should no longer rely on asymptotic theory. Exploration of the behavior of the estimators under small-sample scenarios requires additional research. A novel approach based on resampling methods is being developed by the authors (Carleos et al. 2002).


    Appendix
 
Distribution of the Position Estimator
Assuming fixed selection thresholds, the unobserved absolute genotypic frequencies follow a multinomial distribution:


where nLMN is the absolute frequency of genotype MN in the lower tail, and analogously for the other subscripts. Here, C indicates a central class, comprising the individuals not selected.
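The covariance structure implied by a multinomial model, and its propagation through a linear map to allele counts, can be sketched as follows. The class probabilities and the matrix `A` below are hypothetical illustrations for a single tail, not the article's equations: the classes are the four sire-derived haplotypes MN, Mn, mN, mn in that tail plus one class for all remaining progeny.

```python
def multinomial_covariance(n, probs):
    """Covariance matrix of multinomial counts: n * (diag(p) - p p')."""
    k = len(probs)
    return [[n * (probs[i] * (1.0 - probs[i]) if i == j
                  else -probs[i] * probs[j])
             for j in range(k)] for i in range(k)]


def transform_covariance(a, cov):
    """Covariance of the linear transform A x, that is, A * cov * A'."""
    k = len(cov)
    return [[sum(row_r[i] * cov[i][j] * row_c[j]
                 for i in range(k) for j in range(k))
             for row_c in a] for row_r in a]


# Hypothetical class probabilities: MN, Mn, mN, mn in one tail, then the rest.
probs = [0.04, 0.02, 0.01, 0.03, 0.90]
n = 5000

# Hypothetical map from haplotype counts to allele counts in that tail:
# count of M = n_MN + n_Mn, count of N = n_MN + n_mN.
A = [[1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0]]

allele_cov = transform_covariance(A, multinomial_covariance(n, probs))
print(allele_cov)
```

The diagonal entries reduce to binomial variances of the allele counts, and the off-diagonal entry carries the dependence between the two markers that pooling leaves unobserved at the genotype level.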

The "observed" (i.e., estimated by means of DNA pooling) allelic frequencies are:


where the counts n carry subscripts indicating tail and allele, and A, the matrix relating observed and unobserved absolute frequencies, is


so


The estimator (1) of θM, averaged over the two tails, is rewritten as


The multinomial frequencies (7) can be approximated by a normal distribution. The observed frequencies (8) are a linear transformation of those, so they are asymptotically normal. The estimator of θM is a nonlinear function of the observed frequencies, defined on an open subset and differentiable at their expected values. Should these conditions hold, the δ-method states that the estimator of θM is approximately normally distributed with mean θM and variance-covariance matrix


where


and Var(nM) is (9). G can be estimated by:


Expectation of µU


Variance of µU


Appropriate bounding values for the expectation constituting the last factor, E[1/nU] in (11), can be determined as follows (p denotes pUQ):

lower bound


upper bound



    Acknowledgments
 
This work was supported by the European Regional Development Fund project 1FD97-0042.


    Footnotes
 
Corresponding Editor: R. C. Woodruff

Received June 10, 2002
Accepted December 31, 2002


    References
 

    Bishop YMM, Fienberg S, Holland P, 1975. Discrete multivariate analysis. Cambridge: MIT Press.

    Carleos C, Corral N, Baro JA, Cañon J, 2002. Enhanced precision in QTL mapping with selective DNA pooling. 7th World Congress on Genetics Applied in Livestock Production, Montpellier, France.

    Darvasi A, Soller M, 1994. Selective DNA pooling for determination of linkage between a molecular marker and a quantitative trait locus. Genetics. 138:1365-1373.

    Dekkers JCM, 2000. Quantitative trait locus mapping based on selective DNA pooling. J Anim Breed Genet. 117:1-16.

    Falconer DS, 1989. Introduction to Quantitative Genetics. New York: Longman.

    Georges M, Nielsen D, Mackinnon M, 1995. Mapping quantitative trait loci controlling milk production in dairy cattle by exploiting progeny testing. Genetics. 139:907-920.

    Johnson NL, Kotz S, 1970. Continuous univariate distributions—1. New York: John Wiley and Sons.

    Lipkin E, Mosig MO, Darvasi A, Ezra E, Shalom A, Friedmann A, Soller M, 1998. Quantitative trait locus mapping in dairy cattle by means of selective milk DNA pooling using dinucleotide microsatellite markers: analysis of milk protein percentage. Genetics. 149:1557-1567.

    Lynch M, Walsh B, 1998. Genetics and analysis of quantitative traits. Sunderland, MA: Sinauer Associates.

    Weller JI, Kashi Y, Soller M, 1990. Power of daughter and granddaughter designs for determining linkage between marker loci and quantitative trait loci in dairy cattle. J Dairy Sci. 73:2525-2537.

