Journal of Heredity 2003:94(2)
© 2003 The American Genetic Association 94:175-179
Brief Communication
Asymptotic Variances of QTL Estimators With Selective DNA Pooling
From the Departamento de Estadística, Facultad de Ciencias, C/Calvo Sotelo 33007 Oviedo, Asturias, Spain (Carleos); Departamento de Ciencias Agroforestales, Universidad de Valladolid, Palencia, Spain (Baro); Departamento de Produccion Animal, Universidad Complutense, Madrid, Spain (Cañon); and Departamento de Estadistica, Universidad de Oviedo, Oviedo, Spain (Corral).
Address correspondence to Carlos Carleos at the address above, or e-mail: carleos{at}pinon.ccu.uniovi.es.
| Abstract |
|---|
Investigation of QTL-marker linkage usually requires a great number of observed recombinations, inferred from combined analysis of phenotypes and genotypes. To avoid costly individual genotyping, inferences on QTL position and effects can instead make use of marker allele frequencies. DNA pooling of selected samples makes allele frequency estimation feasible for studies involving large sample sizes. Linkage studies in outbred populations have traditionally exploited half-sib family designs; within the animal production context, half-sibships provide large families that are highly suitable for DNA pooling. Estimators for QTL position and effect have been proposed that make use of information from flanking markers. We present formulas derived by the delta method for the asymptotic variance of these estimators.
The half-sib design is particularly well suited to animal genetics, for both laboratory and breeding species (Georges et al. 1995; Weller et al. 1990). DNA pooling techniques applied to selected samples allow direct estimation of marker allele frequencies within the best and worst performing animals of a class of half-sib progeny; this allows great savings in terms of genotyping and data collection compared to individual genotype determination. This method does have its drawbacks, such as loss of information about joint marker inheritance: allelic frequencies are known but genotypic frequencies are not. In addition, imprecise values stemming from technical error constitute a further source of inaccuracy (Lipkin et al. 1998).
Darvasi and Soller (1994) showed how DNA pooling can be combined with selection to find association between a marker and a QTL using a backcross, an F2, or a half-sib family. Dekkers (2000) extended this method to consider two-marker (interval) mapping and to allow the estimation of the QTL position.
| Methods |
|---|
We consider a QTL Q (with alleles Q and q) flanked by two markers (M and N), each with two alleles (M, m and N, n, respectively). A half-sib family design is considered, where the common sire has haplotypes MQN/mqn and shares no marker alleles with its mates (backcross-like). The family size is n. The recombination fraction between the markers is θ. The recombination rate between M and Q is θM. Progeny receiving the Q allele (respectively, the q allele) from the sire have a phenotypic distribution following an N(µQ, σ²) (respectively, an N[µq, σ²]). The parameter of interest is the QTL substitution effect α = µQ − µq. Within the common framework of selective DNA pooling (Darvasi and Soller 1994), α is experimentally determined by mixing DNA samples from either the top (upper tail) or worst (lower tail) performing animals for a given trait. Further, marker frequencies are determined within each tail. Information on the overall distribution is assumed sufficient to suppose the grand mean µ and σ² known. Algebraic approximations to the variances of the estimators of θM and α will be given.
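The design described above can be mimicked with a small simulation. The following is only a minimal sketch under stated assumptions: no crossover interference, equal transmission of Q and q, µQ = µ + α/2 and µq = µ − α/2, and tails defined by a fixed fraction of ranked progeny rather than by a fixed threshold. The function names, the parameter `theta_n` (recombination between Q and N), and the dictionary keys are hypothetical and introduced only for illustration.

```python
import random

def simulate_family(n, theta_m, theta_n, alpha, sigma=1.0, mu=0.0, seed=0):
    """Return n records (m, q, n_allele, phenotype) for half-sib progeny.

    Each flag is 1 when the sire's M, Q or N allele was transmitted.
    Assumption: recombination on either side of Q is independent.
    """
    rng = random.Random(seed)
    progeny = []
    for _ in range(n):
        q = 1 if rng.random() < 0.5 else 0
        # A marker allele matches the QTL allele unless a recombination occurs.
        m = q if rng.random() >= theta_m else 1 - q
        n_allele = q if rng.random() >= theta_n else 1 - q
        mean = mu + (alpha / 2.0 if q else -alpha / 2.0)
        progeny.append((m, q, n_allele, rng.gauss(mean, sigma)))
    return progeny

def tail_frequencies(progeny, fraction=0.10):
    """Frequencies of the sire's M and N alleles in each phenotypic tail."""
    ranked = sorted(progeny, key=lambda rec: rec[3])
    k = max(1, int(round(fraction * len(ranked))))
    lower, upper = ranked[:k], ranked[-k:]

    def freq(tail, idx):
        return sum(rec[idx] for rec in tail) / len(tail)

    return {"p_UM": freq(upper, 0), "p_UN": freq(upper, 2),
            "p_LM": freq(lower, 0), "p_LN": freq(lower, 2)}

if __name__ == "__main__":
    family = simulate_family(5000, theta_m=0.1, theta_n=0.2, alpha=0.5, seed=1)
    print(tail_frequencies(family))
```

In a real pooling experiment only the four tail frequencies would be observed, and with technical error on top; the individual records exist here only because the data are simulated.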
Let pUM denote the frequency of progeny in the upper tail that received the allele M, that is, Pr[M|U]. The rest of the subscripts have analogous meanings. The observed proportion corresponding to pUM is p̂UM. The estimator of θM based on a single tail (Dekkers 2000) is
|
|
|
|
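To illustrate how flanking-marker frequencies in one tail carry information on position, the sketch below derives a single-tail estimate under the assumption of no crossover interference, so that (1 − 2θ) = (1 − 2θM)(1 − 2θN), where θN (recombination between Q and N) is a symbol introduced here. Under that assumption 2pUM − 1 = (2pUQ − 1)(1 − 2θM), and likewise for N, which gives θM = ½[1 − √((1 − 2θ)(2pUM − 1)/(2pUN − 1))]. This form is an assumption made for illustration and is not necessarily identical to estimator (1); the function names are hypothetical.

```python
import math

def expected_marker_freq(p_q, theta_q):
    """Pr[marker allele | tail], given Pr[Q | tail] = p_q and marker-QTL recombination theta_q."""
    return p_q * (1.0 - theta_q) + (1.0 - p_q) * theta_q

def position_from_tail(p_m, p_n, theta):
    """Single-tail estimate of theta_M from the frequencies of M and N in one tail.

    Assumes no interference: (1 - 2*theta) = (1 - 2*theta_M) * (1 - 2*theta_N).
    Raises ValueError when the frequencies give no real-valued estimate.
    """
    d_m = 2.0 * p_m - 1.0
    d_n = 2.0 * p_n - 1.0
    if d_n == 0.0 or d_m / d_n < 0.0:
        raise ValueError("frequencies do not yield a real-valued estimate")
    return 0.5 * (1.0 - math.sqrt((d_m / d_n) * (1.0 - 2.0 * theta)))

if __name__ == "__main__":
    theta_m, theta_n = 0.1, 0.2
    theta = theta_m + theta_n - 2.0 * theta_m * theta_n  # no-interference composition
    p_um = expected_marker_freq(0.7, theta_m)
    p_un = expected_marker_freq(0.7, theta_n)
    print(position_from_tail(p_um, p_un, theta))
```

With sampled rather than exact frequencies the estimate can fall outside the interval [0, θ], which is worth keeping in mind for narrow marker brackets.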
Let µU and µL be the means of phenotypes in the upper and the lower tails, respectively. Regarding α, if
|
|
|
|
Distribution of the Position Estimator
As seen in the Appendix, the δ-method (Bishop et al. 1975) states that the estimator of θM is approximated by a normal distribution with mean θM and variance-covariance matrix
|
|
|
|
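For reference, the general statement of the δ-method can be written as follows. This is the textbook form rather than a reconstruction of the specific expression derived in the Appendix; the symbols T_n, τ, g, and Σ are generic placeholders introduced here for illustration.

```latex
% Generic delta method: T_n is a vector of estimators (e.g., pooled allele
% frequencies), \tau its limit, g a function differentiable at \tau.
\sqrt{n}\,\bigl(T_n - \tau\bigr) \xrightarrow{d} N(0, \Sigma)
\quad\Longrightarrow\quad
\sqrt{n}\,\bigl(g(T_n) - g(\tau)\bigr) \xrightarrow{d}
N\!\bigl(0,\; \nabla g(\tau)^{\top}\, \Sigma\, \nabla g(\tau)\bigr),
\qquad\text{so that}\qquad
\operatorname{Var}\bigl[g(T_n)\bigr] \approx \frac{1}{n}\,
\nabla g(\tau)^{\top}\, \Sigma\, \nabla g(\tau).
```

The approximation requires g to be differentiable at τ with a nonzero gradient, which is why the position estimator is required to be defined on an open set around the expected frequencies.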
Distribution of the Effect Estimator
The distribution of phenotypes did not affect the position estimation, except through the probability of QTL alleles in the tails. For the estimation of α, we assume normal distribution of phenotypes in both groups: offspring inheriting the Q allele, and offspring inheriting the q allele.
Some additional notation is described:
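- µU and σ²U are the mean and the variance of the phenotypes above the u threshold; it is easily seen that for an overall normal distribution N(µ, σ²),
$$\mu_U = \mu + \sigma\,\frac{\varphi(u')}{1-\Phi(u')}$$
and (Johnson and Kotz 1970)
$$\sigma_U^2 = \sigma^2\left[1 + u'\,\frac{\varphi(u')}{1-\Phi(u')} - \left(\frac{\varphi(u')}{1-\Phi(u')}\right)^{2}\right]$$
where u' = (u − µ)/σ, and φ and Φ denote the standard normal density and distribution function. In our case, the phenotypic distribution is a mixture of normals, so these expressions should be altered accordingly.
- µUQ is the mean of offspring above the u threshold inheriting the Q allele.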
Analogous definitions apply for L and q subscripts.
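The tail moments in the list above can be evaluated numerically. The sketch below uses the standard truncated-normal results for a single normal distribution and then combines the two components of an equal-weight mixture, assuming that Q and q are transmitted with probability ½ each. The function names are hypothetical, and the mixture handling is one straightforward reading of "altered accordingly", not necessarily the authors' expression.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for the density and distribution function

def upper_tail_moments(u, mu=0.0, sigma=1.0):
    """Mean and variance of X given X > u, for X ~ N(mu, sigma^2)."""
    z = (u - mu) / sigma
    lam = _STD.pdf(z) / (1.0 - _STD.cdf(z))  # density over tail area
    mean = mu + sigma * lam
    var = sigma ** 2 * (1.0 + z * lam - lam ** 2)
    return mean, var

def mixture_upper_tail(u, mu_Q, mu_q, sigma=1.0):
    """Pr[Q | U] and the upper-tail mean for an equal mixture of two normals."""
    tail_Q = 1.0 - NormalDist(mu_Q, sigma).cdf(u)
    tail_q = 1.0 - NormalDist(mu_q, sigma).cdf(u)
    p_UQ = tail_Q / (tail_Q + tail_q)  # equal prior weights cancel out
    mu_UQ, _ = upper_tail_moments(u, mu_Q, sigma)
    mu_Uq, _ = upper_tail_moments(u, mu_q, sigma)
    return p_UQ, p_UQ * mu_UQ + (1.0 - p_UQ) * mu_Uq

if __name__ == "__main__":
    print(upper_tail_moments(0.0))
    print(mixture_upper_tail(1.3, 0.25, -0.25))
```

The lower tail follows by symmetry, for instance by applying the same functions to the negated phenotype scale.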
Let the unobserved sample consist of (Xi, Yi) values, with i = 1, …, n, where Xi represents the parental QTL allele of individual i:
|
|
|
|
Let us define:
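- the indicator signaling whether the individual i is in the upper tail; analogously, the indicator for the lower tail;
- the tail size nU, the number of individuals in the upper tail; analogously, nL;
- the estimator of the tail mean µU, the average phenotype of the individuals in the upper tail; analogously for µL;
- the unobserved proportion estimator of pUQ, the fraction of upper-tail individuals that inherited the Q allele, which is actually estimated through (2); the corresponding proportion for the q allele is its complement; analogously for the lower tail.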
The covariance matrix is computed taking into account:
- Var(µU): it is seen (10):
and similarly for the variance (11):
where the notation E[1/nU] implies substituting zero for the inverse of nU in the highly unlikely case of an empty upper tail (nU = 0); adequate bounds are computed as shown in the Appendix (p stands for pUQ):
- lower bound (Equation 12)

- upper bound (Equation 13)
and according to Lynch and Walsh (1998, p. 818)
Note that this result is not adequate to directly approximate E(1/nU).
Despite unavailability of E(1/nU), the above bounds show that the approximation achieved with
is adequate for (5) with n large enough.
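The behavior of the expected inverse of a random tail count can be checked numerically. The sketch below is an illustration under an assumption: the count N is taken to be Binomial(n, p), with 1/N replaced by zero when N = 0, and the exact expectation is compared with the naive value 1/(np). The function name is hypothetical, and the mapping of N and p onto the quantities bounded in the Appendix is not taken from the source.

```python
import math

def expected_inverse_count(n, p):
    """E[1/N] for N ~ Binomial(n, p), with 1/N taken as 0 when N == 0.

    Requires 0 < p < 1. The pmf is evaluated in log space to avoid overflow.
    """
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for k in range(1, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                   - math.lgamma(n - k + 1) + k * log_p + (n - k) * log_q)
        total += math.exp(log_pmf) / k
    return total

if __name__ == "__main__":
    for n in (50, 500, 5000):
        exact = expected_inverse_count(n, 0.1)
        print(n, exact, 1.0 / (n * 0.1), exact * n * 0.1)
```

The last printed column is the ratio of the exact expectation to 1/(np); watching it as n grows gives a direct feel for how large the family must be before the simple approximation is acceptable.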
The following relations are obtained by the same reasoning:
|
|
|
|
with respect to (µL, µU, pLQ, pUQ) is:
|
|
δ-method,
|
|
| Results |
|---|
Simulations were performed to check the adequacy of the proposed approximations. The experimental design and genetic model, both described in the Methods section, led to the results shown in Table 1 for 10,000 iterations. The "Simulation" column displays observed standard deviations of the sampling distribution of the estimator across the 10,000 iterations. The "Predicted" column was obtained from formulas 4 and 6, with parameters replaced with population values, derived from the values chosen for simulation.
|
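A check of this kind can be sketched as a small Monte Carlo loop that returns the empirical standard deviation of a position estimate across replicates. Everything specific in the code is an assumption made for illustration: Haldane's mapping function to convert map distance into recombination fractions, tails taken as a fixed fraction of ranked progeny, use of the upper tail only, and a single-tail estimator of the form ½[1 − √((1 − 2θ)(2pUM − 1)/(2pUN − 1))]. The function names are hypothetical and this is not the authors' simulation code.

```python
import math
import random
import statistics

def haldane_theta(d_morgans):
    """Recombination fraction for a map distance in Morgans (no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))

def simulate_position_sd(n, d_total, d_m, alpha, fraction=0.10, iterations=200, seed=0):
    """Mean and SD of upper-tail position estimates, plus the number of usable replicates."""
    rng = random.Random(seed)
    theta = haldane_theta(d_total)
    theta_m = haldane_theta(d_m)
    theta_n = haldane_theta(d_total - d_m)
    k = max(1, round(fraction * n))
    estimates = []
    for _ in range(iterations):
        progeny = []
        for _ in range(n):
            q = rng.random() < 0.5
            m = q if rng.random() >= theta_m else not q
            nn = q if rng.random() >= theta_n else not q
            # Phenotype in environmental SD units, centered on the grand mean 0.
            y = rng.gauss(alpha / 2.0 if q else -alpha / 2.0, 1.0)
            progeny.append((y, m, nn))
        progeny.sort(key=lambda rec: rec[0])
        upper = progeny[-k:]
        d_mk = 2.0 * sum(rec[1] for rec in upper) / k - 1.0
        d_nk = 2.0 * sum(rec[2] for rec in upper) / k - 1.0
        # Replicates without a real-valued estimate are skipped (a simplification).
        if d_nk != 0.0 and d_mk / d_nk >= 0.0:
            estimates.append(0.5 * (1.0 - math.sqrt(d_mk / d_nk * (1.0 - 2.0 * theta))))
    return statistics.mean(estimates), statistics.stdev(estimates), len(estimates)

if __name__ == "__main__":
    print(simulate_position_sd(2000, d_total=0.5, d_m=0.2, alpha=0.5, iterations=100))
```

Pure Python is slow at the scale quoted in the text (5,000 half-sibs and 10,000 iterations), so the defaults here are deliberately smaller; a vectorized implementation would be the natural choice for a full replication.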
The study explored several combinations of sample sizes, interval widths, and substitution effects. Unless specified in Table 1, reference values for those parameters are taken: a family of 5,000 half-sibs, an interval 50 cM wide, and a substitution effect α of 0.5 environmental standard deviations. Upper and lower tails comprise 10% of progeny each.
| Discussion |
|---|
It can be concluded from our results that estimation of the variance of the θM estimator is accurate for marker brackets wider than 20 cM. With shorter intervals, the distribution of the position estimator no longer stays within the parameter space (corresponding to the intermarker gap); thus, tail values agglomerate at the boundaries. The presented formulas can be used to compute the probability of erroneously locating a QTL exactly at a marker position.
It was also noted that the proposed approximations degrade when the effect α exceeds one standard deviation. A likely explanation is the departure from normality of the overall phenotypic distribution when the Q and q mixture components are too far apart.
All of the limitations mentioned so far are related to the single fact that variances of estimators are computed making use of asymptotic theory, which notably relies on regularity conditions.
This study did not address issues such as the influence of phase, of small QTL effects, and of narrow marker intervals on the normality of the asymptotic distribution of the effect and position estimators. These deserve further study.
Large samples, above 5,000 half-sibs, are required for the proposed formulas to achieve fair results. Study of small sample distributions must account for lack of normality, and inferences should no longer rely on asymptotic theory. Exploration of the behavior of the estimators under small-sample scenarios requires additional research. A novel approach based on resampling methods is being developed by the authors (Carleos et al. 2002).
| Appendix |
|---|
Distribution of the Position Estimator
Assuming fixed selection thresholds, the unobserved absolute genotypic frequencies follow a multinomial distribution:
|
|
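For reference, the first two moments of a multinomial vector, which are the inputs the δ-method needs, are standard. The notation below (counts N_1, …, N_k with class probabilities π_1, …, π_k) is generic and introduced here for illustration; it is not the authors' labeling of the genotype classes.

```latex
% Moments of (N_1, \dots, N_k) \sim \mathrm{Multinomial}(n;\ \pi_1, \dots, \pi_k)
\operatorname{E}[N_i] = n\,\pi_i, \qquad
\operatorname{Var}(N_i) = n\,\pi_i\,(1 - \pi_i), \qquad
\operatorname{Cov}(N_i, N_j) = -\,n\,\pi_i\,\pi_j \quad (i \neq j),
% so the vector of relative frequencies N/n has covariance matrix
\operatorname{Cov}\!\left(\frac{N}{n}\right)
  = \frac{1}{n}\Bigl(\operatorname{diag}(\pi) - \pi\,\pi^{\top}\Bigr).
```

Allelic frequencies are sums of genotype-class frequencies, so their variances and covariances follow from this matrix by linear combination.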
The "observed" (i.e., estimated by means of DNA pooling) allelic frequencies are:
|
|
|
|
|
|
The estimator (1) of θM, averaged over the two tails, is rewritten as
|
|
The estimator of θM is a nonlinear function of the observed frequencies, defined on an open subset and differentiable at their expected values. Should these conditions hold, the δ-method states that the estimator is approximated by a normal distribution with mean θM and variance-covariance matrix
|
|
|
|
|
|
|
|
|
|
lower bound
|
|
|
|
| Acknowledgments |
|---|
This work was supported by the European Regional Development Fund project 1FD97-0042.
| Footnotes |
|---|
Corresponding Editor: R. C. Woodruff
Received June 10, 2002
Accepted December 31, 2002
| References |
|---|
Bishop YMM, Fienberg S, Holland P, 1975. Discrete multivariate analysis. Cambridge: MIT Press.
Carleos C, Corral N, Baro JA, Cañon J, 2002. Enhanced precision in QTL mapping with selective DNA pooling. 7th World Congress on Genetics Applied in Livestock Production, Montpellier, France.
Darvasi A, Soller M, 1994. Selective DNA pooling for determination of linkage between a molecular marker and a quantitative trait locus. Genetics. 138:1365-1373.
Dekkers JCM, 2000. Quantitative trait locus mapping based on selective DNA pooling. J Anim Breed Genet. 117:1-16.
Falconer DS, 1989. Introduction to Quantitative Genetics. New York: Longman.
Georges M, Nielsen D, Mackinnon M, 1995. Mapping quantitative trait loci controlling milk production in dairy cattle by exploiting progeny testing. Genetics. 139:907-920.
Johnson NL, Kotz S, 1970. Continuous univariate distributions, vol. 1. New York: John Wiley and Sons.
Lipkin E, Mosig MO, Darvasi A, Ezra E, Shalom A, Friedmann A, Soller M, 1998. Quantitative trait locus mapping in dairy cattle by means of selective milk DNA pooling using dinucleotide microsatellite markers: analysis of milk protein percentage. Genetics. 149:1557-1567.
Lynch M, Walsh B, 1998. Genetics and analysis of quantitative traits. Sunderland, MA: Sinauer Associates.
Weller JI, Kashi Y, Soller M, 1990. Power of daughter and granddaughter designs for determining linkage between marker loci and quantitative trait loci in dairy cattle. J Dairy Sci. 73:2525-2537.