Journal of Heredity Advance Access published online on June 9, 2008
Journal of Heredity, doi:10.1093/jhered/esn049
Experimental Designs and Statistical Methods for Mapping Quantitative Trait Loci Underlying Triploid Endosperm Traits without Maternal Genetic Variation
From the College of Crop Science, Fujian Agriculture & Forestry University, Fuzhou, Fujian 350002, People's Republic of China (Wen and Wu); the College of Computer and Information Science, Fujian Agriculture & Forestry University, Fuzhou, Fujian 350002, People's Republic of China (Wen); and the College of Agriculture & Biotechnology, Zhejiang University, Hangzhou, Zhejiang 310029, People's Republic of China (Wu)
Address correspondence to W. Wu at the address above, or e-mail: wrwu2005{at}yahoo.com.cn.
Many endosperm traits are related to grain quality in cereal crops. Endosperm traits are mainly controlled by the endosperm genome but may be affected by the maternal genome. Studies have shown that maternal genotypic variation could greatly influence the estimation of the direct effects of quantitative trait loci (QTLs) underlying endosperm traits. In this paper, we propose methods of interval mapping of endosperm QTLs using seeds of F2 or BC1 (an equal mixture of F1 x P1 and F1 x P2 with F1 as the female parent) derived from a cross between 2 pure lines (P1 x P2). The most significant advantage of our experimental designs is that the maternal effects do not contribute to the genetic variation of endosperm traits and therefore the direct effects of endosperm QTLs can be estimated without the influence of maternal effects. In addition, the experimental designs can greatly reduce environmental variation because a few F1 plants grown in a small block of field will produce sufficient F2 or BC1 seeds for endosperm QTL analysis. Simulation studies show that the methods can efficiently detect endosperm QTLs and unbiasedly estimate their positions and effects. The BC1 design is better than the F2 design.
Grains of cereal crops are a staple food and a major source of nutrition for humans. Improving grain quality is a major goal of cereal breeding. Grain quality depends on many endosperm traits (e.g., amylose content, gelatinization temperature, and protein content in rice), which are usually quantitatively inherited. To improve grain quality, understanding the genetic basis of these traits by mapping the underlying quantitative trait loci (QTLs) is increasingly important in cereal breeding. Studies have shown that endosperm traits are mainly controlled by the triploid endosperm genome but may also be affected by the diploid genome of maternal plants (Zhu and Weir 1994; Shi et al. 1996, 1999, 2000). Hence, a QTL for endosperm traits (termed an endosperm QTL) may exhibit effects from the endosperm genome (direct effects) and/or from the maternal genome (maternal effects).
Because endosperms are triploid, there are 4 possible genotypes for a QTL with 2 different alleles (say, Q and q) in the endosperm genome, namely, QQQ, QQq, Qqq, and qqq. Among them, there are 2 heterozygous genotypes. Hence, apart from the direct additive effect (allele substitution effect), there are 2 types of direct dominance effect (deviation from additive expectation), named the first dominance effect (for QQq) and the second dominance effect (for Qqq), respectively (Mo 1987). Endosperm traits therefore follow a different genetic pattern from diploid traits, and analyzing endosperm QTLs requires statistical methods based on triploid models.
Several methods for endosperm QTL mapping based on triploid models have been developed in recent years. Most of the methods proposed are based on experimental designs in which seeds are phenotyped individually (Wu, Lou, et al. 2002; Wu, Ma, et al. 2002; Xu et al. 2003; Hu and Xu 2005; Wen and Wu 2007); fewer are based on experimental designs in which seeds are phenotyped in bulk (Xu et al. 2003; Wen and Wu 2006). This is probably because bulk phenotyping cannot provide sufficient information for a complete partition of the direct effects and maternal effects of an endosperm QTL (Wen and Wu 2006). In other words, to separate the direct effects from the maternal effects of an endosperm QTL, seeds have to be phenotyped individually. Fortunately, convenient technologies for measuring endosperm traits of single seeds without damaging them, such as single-kernel near-infrared measurement, have been developed and extensively applied in practical studies (Bramble et al. 2002, 2006; Iwami et al. 2005; Armstrong 2006).
The earlier methods adopting the strategy of single-seed phenotyping do not take maternal effects into account (Wu, Lou, et al. 2002; Wu, Ma, et al. 2002; Xu et al. 2003). However, simulation studies have shown that the estimates of endosperm QTL effects obtained by a model ignoring maternal effects may be biased when maternal effects exist (Hu and Xu 2005). To solve this problem, Hu and Xu (2005) extended the method of Xu et al. (2003) by adding maternal effects to the model. However, their simulation studies showed that whereas the direct and maternal additive effects of an endosperm QTL could be well estimated when the QTL's heritability is high and the sample size is large, the precision of the estimates of both direct and maternal dominance effects is generally very low. This might be largely because the direct effects and the maternal effects of an endosperm QTL are highly correlated. To improve the estimation of endosperm QTL effects (especially dominance effects), we recently proposed a method based on a 2-stage hierarchical experimental design (Wu, Ma, et al. 2002; Wen and Wu 2007). Our simulation studies showed that with additional marker information from offspring embryos, estimates of all types of QTL effects are significantly improved; however, the estimates of both direct and maternal dominance effects are still not quite satisfactory. This suggests that direct dominance effects of endosperm QTLs cannot be well estimated when maternal effects exist.
In this paper, we propose methods for endosperm QTL analysis based on 2 experimental designs using F2 seeds (named F2 design) and BC1 seeds (BC1 design), respectively. The most significant advantage of our methods is that maternal effects do not contribute to the endosperm trait variation in F2 and BC1 seeds, and so the direct effects of endosperm QTLs could be estimated without the influence of maternal effects.
Materials and Methods
Experimental Designs
Suppose seeds of F2 or BC1 (an equal mixture of the progeny of F1 x P1 [denoted as BC1.1] and F1 x P2 [denoted as BC1.2] with F1 as the female parent) are derived from a cross between 2 pure lines (P1 x P2). The embryo (or the plant developed from the embryo) of each seed is assayed for marker genotypes, and the corresponding endosperm is phenotyped for endosperm traits. For simplicity, we assume that all markers are codominant.
Genetic Models
For F2 or BC1 seeds, there is no genetic segregation among the maternal (F1) plants. Therefore, the QTL effects displayed in F2 or BC1 endosperms would only come from the endosperm genome. Consider a QTL with 2 different alleles (say, Q and q). There are 4 possible genotypes with equal frequencies in F2 or BC1 endosperms, namely, QQQ (denoted by g1), QQq (g2), Qqq (g3), and qqq (g4). Let a, d1, and d2 be the direct additive, first dominance, and second dominance effects of the QTL, respectively. Thus, the single-QTL model would be
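$$y_i = \mu + a x_i + d_1 z_{1i} + d_2 z_{2i} + \varepsilon_i \tag{1}$$

where $y_i$ is the phenotypic value of the ith endosperm; $\mu$ is the mean term; $x_i$, $z_{1i}$, and $z_{2i}$ are dummy variables that define additive and dominance effects by taking values depending on the QTL genotype of the ith endosperm (Table 1); and $\varepsilon_i$ is the residual error following a normal distribution $N(0, \sigma^2)$.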
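As a worked illustration of model (1), suppose the dummy variables are coded as $x = 3/2, 1/2, -1/2, -3/2$ for QQQ, QQq, Qqq, and qqq, with $z_1 = 1$ only for QQq and $z_2 = 1$ only for Qqq. This coding is an assumption made for the example, not a transcription of Table 1; it is chosen because it makes $a$ an allele substitution effect and reproduces the one-sixth variance share quoted in the Discussion. Under it, the 4 genotypic means would be:

```latex
\begin{aligned}
\mu_1 &= \mu + \tfrac{3}{2}a        && (QQQ,\ g_1)\\
\mu_2 &= \mu + \tfrac{1}{2}a + d_1  && (QQq,\ g_2)\\
\mu_3 &= \mu - \tfrac{1}{2}a + d_2  && (Qqq,\ g_3)\\
\mu_4 &= \mu - \tfrac{3}{2}a        && (qqq,\ g_4)
\end{aligned}
```

For example, with a = 1.0, d1 = 0.8, and d2 = 0.5 (the values used in part I of the simulations), the 4 means would be µ + 1.5, µ + 1.3, µ + 0.0, and µ - 1.5.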
According to model (1), $y_i$ is assumed to follow a normal distribution $N(\mu_j, \sigma^2)$ when the genotype of the putative QTL is $g_j$ (j = 1, 2, 3, 4). However, because the genotype of the putative QTL is unknown, we have to assume that $y_i$ follows a mixture distribution consisting of 4 component distributions:

$$f(y_i) = \sum_{j=1}^{4} p_{ij} f_j(y_i) \tag{2}$$

where $f_1, \ldots, f_4$ are the density functions of the normal distributions $N(\mu_1, \sigma^2), \ldots, N(\mu_4, \sigma^2)$, respectively, and $p_{ij}$ is the proportion of the jth component distribution given the information of flanking marker genotypes (Table 2). The mixture distribution model provides the basis of QTL interval mapping (Lander and Botstein 1989).
Interval Mapping
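The principle of interval mapping is to use 2 adjacent markers to test for the existence of a putative QTL in the interval by performing a likelihood ratio test at every position in the interval (Lander and Botstein 1989). The mixture distribution (2) contains a set of 6 unknown parameters, namely, the mean term $\mu$, the 3 QTL effects ($a$, $d_1$, and $d_2$), the residual variance $\sigma^2$, and the position of the putative QTL. For a given position in the interval, the maximum likelihood estimates of the other parameters can be obtained with the EM algorithm (Dempster et al. 1977).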
The EM algorithm is an iterative procedure beginning from a set of initial values of the parameters to be estimated. Each iteration cycle contains an E-step and an M-step. The E-step is to calculate $\mathbf{P}$, which is an n x 4 matrix consisting of elements {$P_{ij}$}, where $P_{ij}$ is the posterior probability of $g_j$ in the ith F2 or BC1 seed for a given position inside the interval being tested, defined as

$$P_{ij} = \frac{p_{ij} f_j(y_i)}{\sum_{k=1}^{4} p_{ik} f_k(y_i)}$$

where $f_j(y_i)$ is the density of $N(\mu_j, \sigma^2)$ evaluated at $y_i$.
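The M-step is to solve the following equations under the given posterior probabilities $P_{ij}$:

$$\hat{\mu}_j = \frac{\sum_{i=1}^{n} P_{ij}\, y_i}{\sum_{i=1}^{n} P_{ij}} \quad (j = 1, 2, 3, 4), \qquad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{4} P_{ij}\, (y_i - \hat{\mu}_j)^2$$

The estimates of $\mu$, $a$, $d_1$, and $d_2$ then follow from the 4 genotypic means through model (1). The E-step and the M-step are repeated until convergence.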
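The EM cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, starting values, and convergence tolerance are hypothetical, the prior probabilities `prior[i][j]` are taken as given for one test position, and the effects are recovered from the genotypic means using an assumed additive coding of 3/2, 1/2, -1/2, -3/2 for QQQ, QQq, Qqq, and qqq.

```python
import math


def normal_pdf(y, mean, var):
    """Density of N(mean, var) at y."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_fit(y, prior, max_iter=500, tol=1e-10):
    """Fit the 4-component mixture at one fixed test position.

    y     : list of n phenotypic values
    prior : n rows of 4 probabilities p_ij (order: QQQ, QQq, Qqq, qqq)
    """
    n = len(y)
    y_bar = sum(y) / n
    var = sum((v - y_bar) ** 2 for v in y) / n
    sd = math.sqrt(var)
    # Hypothetical starting values: means spread slightly around the sample mean.
    means = [y_bar + c * sd for c in (0.3, 0.1, -0.1, -0.3)]
    loglik_old = -math.inf
    for _ in range(max_iter):
        # E-step: posterior probability P_ij of genotype j for seed i.
        post, loglik = [], 0.0
        for yi, p_i in zip(y, prior):
            w = [p * normal_pdf(yi, m, var) for p, m in zip(p_i, means)]
            total = sum(w)
            loglik += math.log(total)
            post.append([wj / total for wj in w])
        # Stop once the log-likelihood of the current parameters stops increasing.
        if loglik - loglik_old < tol:
            break
        loglik_old = loglik
        # M-step: posterior-weighted genotypic means and pooled residual variance.
        for j in range(4):
            weight = sum(row[j] for row in post)
            if weight > 0.0:
                means[j] = sum(row[j] * yi for row, yi in zip(post, y)) / weight
        var = sum(row[j] * (yi - means[j]) ** 2
                  for row, yi in zip(post, y) for j in range(4)) / n
    # Convert the 4 means to (mu, a, d1, d2) under the assumed coding.
    mu = (means[0] + means[3]) / 2.0
    a = (means[0] - means[3]) / 3.0
    d1 = means[1] - mu - a / 2.0
    d2 = means[2] - mu + a / 2.0
    return {"mu": mu, "a": a, "d1": d1, "d2": d2, "var": var, "loglik": loglik}
```

In a genome scan, `em_fit` would be called once per test position with the `prior` matrix recomputed from the flanking markers at that position; the returned `loglik` is the quantity that enters the likelihood ratio test.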
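The following log-likelihood ratio statistic can be used to test the existence of the putative QTL:

$$LR = -2 \ln \frac{L_0}{L_1}$$

where $L_0$ and $L_1$ are the maximum likelihood values under the null hypothesis $H_0$: $a = d_1 = d_2 = 0$ (corresponding to the reduced model with no QTL) and the alternative hypothesis $H_1$: at least one of $a$, $d_1$, and $d_2$ is not 0 (corresponding to the full model, i.e., model [1]), respectively. The statistic can be converted to a log of odds (LOD) score as LOD = LR/(2 ln 10). The significance threshold of the LOD score can be estimated via simulation or permutation tests (Churchill and Doerge 1994).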
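The test statistic and an empirical threshold can be computed as in the sketch below. The function names are hypothetical, and the threshold rule (take the value exceeded by at most a fraction `alpha` of the null maxima) is one simple convention assumed here for illustration; the null maxima themselves would come from simulated or permuted data sets.

```python
import math


def lr_and_lod(loglik_null, loglik_full):
    """Likelihood ratio statistic and the equivalent LOD score.

    loglik_null : maximized natural-log likelihood under H0 (no QTL)
    loglik_full : maximized natural-log likelihood under the full model
    """
    lr = -2.0 * (loglik_null - loglik_full)
    lod = lr / (2.0 * math.log(10.0))  # LOD = log10(L1 / L0)
    return lr, lod


def empirical_threshold(null_max_lods, alpha=0.05):
    """Threshold exceeded by at most a fraction alpha of the null maxima.

    null_max_lods : genome-wide (or chromosome-wide) maximum LOD scores,
                    one per data set generated under the null hypothesis
    """
    ranked = sorted(null_max_lods, reverse=True)
    allowed = int(alpha * len(ranked))  # number of null maxima allowed above the threshold
    return ranked[min(allowed, len(ranked) - 1)]
```

For instance, a full-model log-likelihood that exceeds the null log-likelihood by 5 natural-log units gives LR = 10 and a LOD score of about 2.17.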
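After a putative QTL is mapped, we can further test the first and the second dominance effects of the QTL either jointly or individually based on the following hypotheses:

$$H_0: d_1 = d_2 = 0; \qquad H_0: d_1 = 0; \qquad H_0: d_2 = 0$$

Under these hypotheses, the full model (model [1]) will be reduced to

$$y_i = \mu + a x_i + \varepsilon_i; \qquad y_i = \mu + a x_i + d_2 z_{2i} + \varepsilon_i; \qquad y_i = \mu + a x_i + d_1 z_{1i} + \varepsilon_i$$

respectively.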
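One common way to judge such nested-model tests, assumed here for illustration rather than taken from the text, is to compare the likelihood ratio statistic with a chi-square distribution whose degrees of freedom equal the number of effects set to 0 (2 for the joint test, 1 for each individual test). The helper names below are hypothetical, and only the 2 cases needed here are implemented.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (df = 1 or 2 only)."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or df = 2 is supported in this sketch")


def dominance_test(loglik_reduced, loglik_full, df, alpha=0.05):
    """Likelihood ratio test of a reduced model against the full model.

    Returns the statistic, its approximate p-value, and the decision.
    """
    lr = max(-2.0 * (loglik_reduced - loglik_full), 0.0)
    p_value = chi2_sf(lr, df)
    return lr, p_value, p_value < alpha
```

With this convention, the joint test of d1 and d2 would reject at the 0.05 level when LR exceeds about 5.99, and each individual test when LR exceeds about 3.84.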
Simulation Studies
To examine the feasibility and efficiency of our method, we performed simulation studies in 2 parts. In part I, we assumed a chromosome 100 cM in length with 11 evenly spaced markers and a QTL located at 55 cM with a = 1.0, d1 = 0.8, and d2 = 0.5. Three sample sizes of the F2 or BC1 population (200, 500, and 1000) were considered. Each BC1 population consisted of equal numbers of seeds from BC1.1 and BC1.2. Six broad-sense heritabilities (1%, 5%, 10%, 20%, 30%, and 60%) of the QTL were considered. The heritability of a QTL is defined as
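$$h^2 = \frac{V_G}{V_G + V_E} \tag{3}$$

where $V_G$ is the genetic variance of the QTL and $V_E$ is the residual variance. With the 4 QTL genotypes at equal frequencies,

$$V_G = \frac{5}{4}a^2 + \frac{1}{4}a(d_1 - d_2) + \frac{3}{16}(d_1^2 + d_2^2) - \frac{1}{8}d_1 d_2 \tag{4}$$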
Based on (3) and (4), VE can be calculated for each given h2, and thus the phenotypic values of individual endosperms can be sampled. For each combination of heritability, sample size, and population type, the simulation was replicated 100 times, and a LOD threshold at the overall significance level of 0.05 was estimated by simulation (5000 replicates) under the null (no QTL) hypothesis.
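The step from a target heritability to sampled phenotypes can be sketched as follows. This is an illustration under assumptions: the helper names are hypothetical, the genetic variance is obtained by direct enumeration over the 4 equally frequent genotypes rather than from a closed formula, and the additive coding 3/2, 1/2, -1/2, -3/2 for QQQ, QQq, Qqq, and qqq is assumed.

```python
import math
import random


def genotypic_values(mu, a, d1, d2):
    """Expected values of QQQ, QQq, Qqq, qqq under the assumed coding."""
    return (mu + 1.5 * a, mu + 0.5 * a + d1, mu - 0.5 * a + d2, mu - 1.5 * a)


def genetic_variance(a, d1, d2):
    """Variance among the 4 genotypic values, each with frequency 1/4."""
    g = genotypic_values(0.0, a, d1, d2)
    mean_g = sum(g) / 4.0
    return sum((v - mean_g) ** 2 for v in g) / 4.0


def residual_variance(a, d1, d2, h2):
    """Solve h2 = VG / (VG + VE) for VE."""
    return genetic_variance(a, d1, d2) * (1.0 - h2) / h2


def simulate_phenotypes(genotypes, mu, a, d1, d2, h2, rng):
    """Sample one phenotype per endosperm; genotypes are indices 0..3."""
    g = genotypic_values(mu, a, d1, d2)
    sd = math.sqrt(residual_variance(a, d1, d2, h2))
    return [g[j] + rng.gauss(0.0, sd) for j in genotypes]


# Usage: 500 endosperms, QTL heritability 10%, part I effect sizes.
rng = random.Random(2008)
genotypes = [rng.randrange(4) for _ in range(500)]
phenotypes = simulate_phenotypes(genotypes, 10.0, 1.0, 0.8, 0.5, 0.10, rng)
```

With a = 1.0, d1 = 0.8, and d2 = 0.5, the enumerated genetic variance is 1.441875, so a heritability of 10% corresponds to a residual variance of about 12.98. In a full simulation the QTL genotypes would be generated jointly with the marker genotypes along the chromosome; drawing them independently here only keeps the example short.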
Estimation bias from the true value of each parameter (QTL position, additive effect, first dominance effect, and second dominance effect) was examined by t-test, for which the t statistic was $t = (\bar{x} - \theta)/s_{\bar{x}}$, where $\theta$, $\bar{x}$, and $s_{\bar{x}}$ were the true value, sample mean, and sample standard error of the parameter, respectively. Because there were a large number (144 in total) of cases (null hypotheses) to be tested (Table 3), multiple test correction was required. We used the Bonferroni method to control the family-wise error rate (Bland and Altman 1995).
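The bias test with Bonferroni correction can be sketched as below. The function names are hypothetical, the standard error is taken as the standard deviation of the replicate estimates divided by the square root of their number, and the critical value uses a normal approximation to the t distribution; both choices are assumptions made for this illustration.

```python
import math
from statistics import NormalDist, mean, stdev


def bias_t_statistic(estimates, true_value):
    """t statistic for the difference between the mean estimate and the truth."""
    standard_error = stdev(estimates) / math.sqrt(len(estimates))
    return (mean(estimates) - true_value) / standard_error


def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance level that keeps the family-wise rate at family_alpha."""
    return family_alpha / n_tests


def is_biased(estimates, true_value, family_alpha=0.05, n_tests=144):
    """Two-sided test of bias at the Bonferroni-adjusted level."""
    t = bias_t_statistic(estimates, true_value)
    alpha = bonferroni_alpha(family_alpha, n_tests)
    # Normal approximation to the t critical value (assumption; close for ~100 replicates).
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return abs(t) > critical
```

With 144 tests and a family-wise level of 0.05, each individual test is carried out at a level of about 0.00035.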
The second part of the simulations examined the feasibility and efficiency of testing the dominance effects of a mapped QTL. Four cases of QTL effects were considered. Case I: a = 1, d1 = 0, d2 = 0 (serving as negative control); Case II: a = 1, d1 = 1, d2 = 0; Case III: a = 1, d1 = 0, d2 = 1; and Case IV: a = 1, d1 = 1, d2 = 1. The heritability of the QTL was set to 10%, and the sample size was set to 500 in both the F2 design and the BC1 design. A significance level of 0.05 was used for each test.
Results
The simulation results of part I are summarized in Table 3. Under the overall significance level of 0.05, the estimates of parameters were not significantly different from their true values in most of the cases; only 3 (2.1%) cases showed significant differences between the estimates and the true values. Therefore, the parameter estimation should be unbiased in general. The statistical power of QTL detection and the precision of parameter estimation depended on the QTL heritability and sample size. Both the power and the precision were improved as the heritability and sample size increased.
In the F2 design, a QTL with 10% heritability could be efficiently detected and its position and additive effect could be precisely estimated using a small sample (200 seeds). For a QTL with 5% heritability, a medium sample (500 seeds) appeared to be adequate. However, the estimates of dominance effects (d1 and d2) were much less precise. Even in the case of 30% heritability and large sample (1000 F2 seeds), the standard deviations (SDs) of the 2 dominance effects were still slightly larger than that of the additive effect obtained in the case of 10% heritability and small sample (200 seeds). Nonetheless, because the parameter estimation is unbiased in general, reliable estimates of the dominance effects should be obtainable as long as a sufficiently large sample is used.
In comparison with the F2 design, the BC1 design yielded clearly better results (Table 3). A statistical power of 100% for QTL detection was achieved in all cases, including low heritability (1%) and small sample size (200 seeds). For all parameters, the SDs of estimates obtained by the BC1 design were always smaller than those obtained by the F2 design. Hence, the BC1 design achieves higher statistical power and more precise parameter estimates. The advantage is especially pronounced in the estimation of dominance effects and in cases of lower heritability and smaller sample size.
The simulation results of part II are summarized in Table 4. The significance rates in Case I (the negative control) were all close to the controlled type I error rate (5%), as expected. The F2 design showed lower statistical power and appeared to be suitable only for detecting d1 and d2 jointly; it was very inefficient for detecting the 2 dominance effects individually. The BC1 design was much more powerful: the 2 dominance effects could be detected with relatively high statistical power both jointly and individually.
Discussion
The simulation results indicate that compared with the additive effect, dominance effects have much lower estimation precision in general (Table 3). The reason can be found from formula (4). For simplicity, let us consider the case of d1 = d2 = d. In this case, formula (4) reduces to

$$V_G = \frac{5}{4}a^2 + \frac{1}{4}d^2$$

The above formula indicates that even in the case of a = d, the dominance effects only account for one-sixth of the genetic variance of a QTL. Only in the case of $d = \sqrt{5}\,a$ (this might be very rare) would the dominance effects and the additive effect make equal contributions to the genetic variation. Hence, dominance effects cannot be estimated as precisely as the additive effect in general. This also explains why the statistical powers of detecting dominance effects were not very high in the simulations (Table 4).
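The one-sixth share can be checked directly from the 4 genotypic values. The short derivation below assumes equal genotype frequencies of 1/4 (as stated in the Genetic Models section) and an additive coding of 3/2, 1/2, -1/2, -3/2 for QQQ, QQq, Qqq, and qqq; the coding and the symbols $g$ (genotypic deviation from µ) and $V_D$ (dominance part of the variance) are introduced here for illustration only.

```latex
\begin{aligned}
E[g]   &= \tfrac{1}{4}\left[\tfrac{3}{2}a + \left(\tfrac{1}{2}a + d\right)
          + \left(-\tfrac{1}{2}a + d\right) - \tfrac{3}{2}a\right] = \tfrac{1}{2}d,\\
E[g^2] &= \tfrac{1}{4}\left[\tfrac{9}{4}a^2 + \left(\tfrac{1}{2}a + d\right)^2
          + \left(-\tfrac{1}{2}a + d\right)^2 + \tfrac{9}{4}a^2\right]
        = \tfrac{5}{4}a^2 + \tfrac{1}{2}d^2,\\
V_G    &= E[g^2] - (E[g])^2 = \tfrac{5}{4}a^2 + \tfrac{1}{4}d^2,\\
\frac{V_D}{V_G} &= \frac{d^2/4}{5a^2/4 + d^2/4} = \frac{d^2}{5a^2 + d^2}
        \;\;\Rightarrow\;\; \frac{V_D}{V_G} = \frac{1}{6} \text{ when } a = d,\\
\frac{V_D}{V_G} &= \frac{1}{2} \;\Leftrightarrow\; d^2 = 5a^2
        \;\Leftrightarrow\; |d| = \sqrt{5}\,|a| \approx 2.24\,|a|.
\end{aligned}
```

In other words, a dominance effect would need to be more than twice the size of the additive effect before it contributed as much genetic variance.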
The simulation results suggest that the BC1 design is better than the F2 design for endosperm QTL analysis. The major reason might be that the QTL genotype in the BC1 design is more clearly determined than that in the F2 design. It can be seen from Table 2 that in the F2 design, every marker genotype corresponds to 4 possible QTL genotypes, whereas in the BC1 design, except for the marker genotype M1m1M2m2 that also corresponds to 4 possible QTL genotypes, all other marker genotypes only correspond to 2 possible QTL genotypes each (note: the 2 marker genotypes, M1M1m2m2 and m1m1M2M2, do not exist in the BC1 design). In addition, it is seen that in the F2 design, the 2 heterozygous QTL genotypes, QQq (g2) and Qqq (g3), cannot be distinguished from each other according to their conditional probabilities under all marker genotypes, whereas in the BC1 design, the 2 heterozygous QTL genotypes can be clearly distinguished in light of their conditional probabilities under most of the marker genotypes. This explains why the dominance effects can be more efficiently detected and more precisely estimated under the BC1 design.
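The contrast between the 2 designs can be reproduced by enumerating gametes. The sketch below is an illustration, not the derivation behind Table 2: it assumes no crossover interference, uses hypothetical recombination fractions of 0.05 on each side of the QTL, and represents the BC1 mixture by a male gamete that comes from P1 or P2 with probability 1/2 each. The function names are likewise hypothetical.

```python
from collections import defaultdict
from itertools import product


def f1_gametes(r1, r2):
    """Haplotype distribution of gametes from an F1 (M1 Q M2 / m1 q m2).

    A haplotype is (marker1, qtl, marker2), with 1 for the P1 allele and 0 for
    the P2 allele. r1 and r2 are the recombination fractions between marker 1
    and the QTL and between the QTL and marker 2 (no interference assumed).
    """
    dist = {}
    for m1, q, m2 in product((0, 1), repeat=3):
        p = 0.5
        p *= r1 if m1 != q else 1.0 - r1
        p *= r2 if q != m2 else 1.0 - r2
        dist[(m1, q, m2)] = p
    return dist


def conditional_table(female, male):
    """P(endosperm QTL genotype | embryo marker genotype).

    Keys are (number of M1 alleles, number of M2 alleles) in the embryo;
    values are probabilities in the order QQQ, QQq, Qqq, qqq.
    """
    joint = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for (f, pf), (m, pm) in product(female.items(), male.items()):
        marker = (f[0] + m[0], f[2] + m[2])  # embryo: 1 female + 1 male copy
        n_q = 2 * f[1] + m[1]                # endosperm: 2 female + 1 male copy
        joint[marker][3 - n_q] += pf * pm
    return {mk: [p / sum(row) for p in row] for mk, row in joint.items()}


female = f1_gametes(0.05, 0.05)
f2_table = conditional_table(female, f1_gametes(0.05, 0.05))
bc1_table = conditional_table(female, {(1, 1, 1): 0.5, (0, 0, 0): 0.5})
```

Under these assumptions, `f2_table` has 9 marker genotypes, each compatible with all 4 QTL genotypes and each giving QQq and Qqq identical probabilities, whereas `bc1_table` has 7 marker genotypes, of which only the double heterozygote `(1, 1)` is compatible with all 4 QTL genotypes.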
Although the BC1 design is statistically more desirable for endosperm QTL analysis, it may be difficult to carry out in autogamous cereal crops because artificial hybridization can damage the female flower and thereby affect seed development. Therefore, in practical studies, the BC1 design may be less suitable for autogamous cereal crops (e.g., rice and wheat) than for allogamous cereal crops (e.g., maize).
We have seen that both the F2 design and the BC1 design have the advantage of avoiding maternal influence in the estimation of the direct effects of endosperm QTLs. There is another important advantage in these 2 experimental designs compared with all other designs used before (Wu, Lou, et al. 2002; Wu, Ma, et al. 2002; Xu et al. 2003; Hu and Xu 2005; Wen and Wu 2007). Because a few F1 plants grown in a small block of field will produce sufficient F2 or BC1 seeds for endosperm QTL analysis, environmental variation due to uneven soil fertility, different flowering times (when maternal plants are genetically segregated), and other spatial or temporal factors can be efficiently controlled. This will increase QTL heritability and therefore increase the statistical power of QTL detection and the precision of QTL mapping and effect estimation.
Funding
National Basic Research Program (973 Program) of China (grant no.: 2006CB101708).
Footnotes
Corresponding Editor: William Tracy
Received June 25, 2007
Accepted May 12, 2008
References
Armstrong PR. Rapid single-kernel NIR measurement of grain and oil-seed attributes. Appl Eng Agric (2006) 22(5):767–772.
Bland JM, Altman DG. Statistics notes: multiple significance tests: the Bonferroni method. BMJ (1995) 310:170.
Bramble T, Dowell FE, Herrman TJ. Single-kernel near-infrared protein prediction and the role of kernel weight in hard red winter wheat. Appl Eng Agric (2006) 22(6):945–949.
Bramble T, Herrman TJ, Loughin T, Dowell F. Single kernel protein variance structure in commercial wheat fields in western Kansas. Crop Sci (2002) 42:1488–1492.
Churchill GA, Doerge RW. Empirical threshold values for quantitative trait mapping. Genetics (1994) 138:963–971.
Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (1977) 39:1–38.
Hu Z, Xu C. A new statistical method for mapping QTLs underlying endosperm traits. Chin Sci Bull (2005) 14:1470–1476.
Iwami A, Osborne BG, Huynh HN, Anderssen RS, Wesley IJ, Kajiwara Y, Takashita H, Omori T. The measurement of structural characteristics of barley for Shochu using single-kernel characterization system 4100 crush-response profiles. J Inst Brew (2005) 111(2):181–189.
Lander ES, Botstein D. Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics (1989) 121:185–199.
Mo HD. Genetic expression for endosperm traits. In: Weir BS, Goodman MM, Eisen EJ, Namkoong G, editors. Proceedings of the 2nd International Conference on Quantitative Genetics (1987) Sunderland (MA): Sinauer Associates. 478–487.
Shi CH, Chen G, Zhu J, Zang RC, Chen SY. Analysis of embryo, endosperm, cytoplasmic and maternal effects for amylose content trait in Indica rice. Acta Agronomica Sinica (in Chinese) (2000) 26(6):833–838.
Shi CH, Yu YG, Xue JM, Yang XE, Zhu J. Analysis of genetic effects for nutrient quality traits in indica rice. Theor Appl Genet (1996) 92:1099–1102.
Shi CH, Zhu J, Wu JG, Yang XE, Yu YG. Analysis of embryo, endosperm, cytoplasmic and maternal effects for heterosis of protein and lysine content in indica hybrid rice. Plant Breed (1999) 118(6):574–576.
Wen YX, Wu WR. Methods for mapping QTLs underlying endosperm traits based on random hybridization design. Chin Sci Bull (2006) 51(16):1976–1981.
Wen YX, Wu WR. Interval mapping of quantitative trait loci underlying triploid endosperm traits using F3 seeds. J Genet Genomics (2007) 34(5):429–436.
Wu R, Lou XY, Ma CX, Wang X, Larkins BA, Casella G. An improved genetic model generates high resolution mapping of QTL for protein quality in maize endosperm. Proc Natl Acad Sci USA (2002) 99:11281–11286.
Wu R, Ma CX, Gallo-Meagher M, Littell RC, Casella G. Statistical methods for dissecting triploid endosperm traits using molecular markers: an autogamous model. Genetics (2002) 162:875–892.
Xu C, He X, Xu S. Mapping quantitative trait loci underlying triploid endosperm traits. Heredity (2003) 90:228–235.
Zhu J, Weir BS. Analysis of cytoplasmic and maternal effects: II. genetic models for triploid endosperms. Theor Appl Genet (1994) 89:160–166.