Journal of Heredity Advance Access originally published online on March 15, 2008
Journal of Heredity 2008 99(4):421-425; doi:10.1093/jhered/esn017
Brief Communications
Defining the Assumptions Underlying Modeling of Epistatic QTL Using Variance Component Methods
From the Linnaeus Centre for Bioinformatics, SE-75124 Uppsala, Sweden (Rönnegård and Carlborg); Roslin Institute, Midlothian, UK (Pong-Wong); Örjan Carlborg is now at the Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, SE-75007 Uppsala, Sweden
Address correspondence to Lars Rönnegård at the address above, or e-mail: lars.ronnegard{at}lcb.uu.se.
Variance component models are commonly used to detect quantitative trait loci (QTL) in general pedigrees. The variance–covariance structure of the random QTL effect is given by the identity by descent (IBD) between genotypes. Epistatic effects have previously been modeled, both for unlinked and linked loci, as a random effect with a variance–covariance structure given by the Hadamard product between the IBD matrices of the direct QTL effects. In the original papers, the model was given but not derived. Here, we identify the underlying assumptions of this previously proposed model. It assumes either that the QTL are unlinked or that a fully informative marker (i.e., all marker alleles are unique in the base generation) is located between the loci. We discuss the need to develop a general algorithm to estimate the variance–covariance structure of the random epistatic effect for linked loci.
Understanding the genetic architecture of complex traits controlled by many genes and environmental factors is currently one of the grand challenges in genetics. Mapping of quantitative trait loci (QTL) can be used to identify individual loci involved in the genetic regulation of a multifactorial trait by identifying cosegregation between the phenotype and genetic variation in markers at a limited number of positions in the genome. These methods are based on statistical analysis and are relatively straightforward as long as it can be assumed that the studied complex trait is controlled by one major gene (Lander and Botstein 1989). However, it has been shown that it is common for complex traits to be controlled by multiple loci with individually measurable effects and that these loci tend to interact. Such epistatic effects have been detected in many experimental populations (Carlborg and Haley 2004). Furthermore, developments in theoretical genetics suggest that epistasis is likely to evolve between linked loci (Liberman and Feldman 2006). Consequently, development of statistical models designed to detect interacting QTL in linkage is important.
In QTL studies of pedigrees in outbred populations, variance component (VC) models are commonly used. Their use in experimental line crosses may also be warranted, especially if the base generation alleles are segregating within lines (Pérez-Enciso and Varona 2000; Rönnegård and Carlborg 2006; Rönnegård and Carlborg 2007). The single QTL variance model for general pedigrees was first given by Fernando and Grossman (1989) and Goldgar (1990), and the assumptions of the model are given in detail in, for example, Rönnegård and Carlborg (2007). The QTL effect is assumed random, that is, the founders of the mapping population have QTL alleles with effects drawn from a distribution of allelic effects in the entire population. The covariance structure of the random QTL genotype effects in the studied pedigree is given by an identity-by-descent (IBD) matrix.
An extension of the single QTL VC model including epistatic QTL effects was applied by Stern et al. (1996) in a study of human diabetes. In this model, the IBD matrix for pairwise epistatic effects is calculated as the direct Hadamard product between the IBD matrices of the 2 direct effects. The model was later described in more detail by Mitchell et al. (1997) and Blangero and Almasy (1997) and has also been implemented in the SOLAR computer analysis package (Almasy and Blangero 1998). The motivation of the model was, however, not given in these papers.
The aim of this report is to identify and present the assumptions of the VC model of Stern et al. (1996) that were not shown in the original publications. Our report consists of 3 parts. We present the VC model for pairwise additive-by-additive QTL effects and show that the model by Stern et al. (1996) assumes unlinked QTL. Thereafter, the deviations in the epistatic IBD matrix are given for linked QTL in full- and half-sib families. The last part consists of a brief discussion of practical implications.
Theory
The definition of an IBD matrix follows from the definition of the random effect included in the VC model. By clearly defining the random allelic QTL effects, it is straightforward to see how the VC model is related to the flow of alleles through a pedigree (Rönnegård and Carlborg 2007). The VC model is presented below without polygenic effects, with no fixed effects except for an overall mean, and with independent residual terms. This gives a simple notation while keeping the major features of the model intact. Following Stern et al. (1996), no dominance effects are considered, and consequently, only additive-by-additive epistatic effects are included in the model and only random effects having a multivariate normal distribution are considered.
The VC model in QTL mapping is given in terms of an IBD matrix where the relationship between the phenotypic values (y) of n related individuals and the random effects of a putative QTL is as follows:

$$\mathbf{y} = \mu + \mathbf{v} + \mathbf{e}, \qquad \mathbf{v} \sim N(\mathbf{0}, \Pi\sigma_v^2), \qquad \mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2) \tag{1}$$

Here, µ is the overall mean, v is the vector of QTL genotype effects (length n), $\sigma_v^2$ is the variance of QTL genotype effects in the population that the base generation alleles were drawn from, e is the vector of residuals, $\sigma_e^2$ is the residual variance, and I is the identity matrix. Instructions and algorithms for calculating the IBD matrix $\Pi$ are found in, for example, Fernando and Grossman (1989), Goldgar (1990), Wang et al. (1995), Almasy and Blangero (1998), and Pong-Wong et al. (2001).
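As a minimal sketch of the covariance structure this model implies, the Python snippet below assembles the phenotypic covariance matrix as the IBD matrix scaled by the QTL variance plus a diagonal residual term (independent residuals, no polygenic effect). The function name and all numerical values are illustrative assumptions, not taken from the paper.

```python
def phenotypic_covariance(ibd, var_qtl, var_resid):
    """Return ibd * var_qtl + I * var_resid as a list of lists.

    ibd       -- n x n IBD matrix for the putative QTL
    var_qtl   -- variance of the QTL genotype effects
    var_resid -- residual variance (independent residuals assumed)
    """
    n = len(ibd)
    if any(len(row) != n for row in ibd):
        raise ValueError("IBD matrix must be square")
    return [
        [ibd[i][j] * var_qtl + (var_resid if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical example: two non-inbred relatives with an IBD value of 0.5.
ibd_example = [[1.0, 0.5], [0.5, 1.0]]
cov_example = phenotypic_covariance(ibd_example, var_qtl=2.0, var_resid=1.0)
```

In a full analysis the two variances would be estimated (for example by REML) rather than fixed, and the IBD matrix would come from one of the algorithms cited above.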
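Stern et al. (1996) and Mitchell et al. (1997) extended model 1 to include epistatic effects between 2 loci A and B such that

$$\mathbf{y} = \mu + \mathbf{v}_A + \mathbf{v}_B + \mathbf{v}_{AB} + \mathbf{e} \tag{2}$$

with an IBD matrix $\Pi_{AB}$ for the epistatic effects in the corresponding VC model,

$$\mathrm{Var}(\mathbf{y}) = \Pi_A\sigma_A^2 + \Pi_B\sigma_B^2 + \Pi_{AB}\sigma_{AB}^2 + \mathbf{I}\sigma_e^2 \tag{3}$$

calculated as the Hadamard (element-wise) product of the IBD matrices for loci A ($\Pi_A$) and B ($\Pi_B$):

$$\Pi_{AB} = \Pi_A \circ \Pi_B \tag{4}$$

Here, $\sigma_A^2$ and $\sigma_B^2$ are the genotype VCs for QTL loci A and B, respectively. The VC for the additive-by-additive epistatic effects is given by $\sigma_{AB}^2$.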
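The Hadamard product used for the epistatic IBD matrix is simply an element-by-element multiplication of the two single-locus IBD matrices. The sketch below illustrates this in plain Python; the helper name `hadamard` and the two small IBD matrices are hypothetical and chosen only for illustration.

```python
def hadamard(p, q):
    """Element-wise (Hadamard) product of two equally shaped matrices."""
    if len(p) != len(q) or any(len(rp) != len(rq) for rp, rq in zip(p, q)):
        raise ValueError("matrices must have the same shape")
    return [[x * y for x, y in zip(rp, rq)] for rp, rq in zip(p, q)]


# Hypothetical genotype IBD matrices for two non-inbred individuals.
pi_a = [[1.0, 0.5], [0.5, 1.0]]    # locus A
pi_b = [[1.0, 0.25], [0.25, 1.0]]  # locus B

# Epistatic IBD matrix under the Hadamard-product model.
pi_ab = hadamard(pi_a, pi_b)
```

Note that this is not a matrix product: each element of the result depends only on the corresponding elements of the two input matrices.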
The model presented above was given in terms of genotypic variances, but to clarify the assumptions of the model, we will use the alternative allelic representation. We use the following notation to distinguish allelic effects from allele types: small letters a and b with subscripts denote allele effects, whereas capital letters A and B with both subscripts and superscripts denote allele types, for example, $A_i^m$ denotes the maternally inherited allele in individual i at locus A. Furthermore, $a_{mi}$ and $a_{pi}$ are the maternally and paternally inherited allele effects, respectively, for a QTL at locus A, and $(a_\cdot)_i$ denotes the genotype effect for individual i with $(a_\cdot)_i = a_{mi} + a_{pi}$ because additivity is assumed. There are 4 effects covering the interaction between the maternally and paternally inherited alleles for each locus given by $(a_m b_m)_i$, $(a_p b_p)_i$, $(a_m b_p)_i$, and $(a_p b_m)_i$, where, for instance, $(a_m b_m)_i$ is the interaction effect between the maternally inherited allele at locus A and the maternally inherited allele at locus B. Let $(a_\cdot b_\cdot)_i$ be the sum of these 4 allelic interaction effects. In Equation 3, an element in row i and column j in the matrix $\Pi_A$ is defined as the expectation of $\mathrm{Corr}((a_\cdot)_i, (a_\cdot)_j)$ given the marker information M, and elements in $\Pi_{AB}$ are defined as the expectation of $\mathrm{Corr}((a_\cdot b_\cdot)_i, (a_\cdot b_\cdot)_j)$ given M.
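The allelic representation of model 2 for individual i is as follows:

$$\begin{aligned}
y_i &= \mu + a_{mi} + a_{pi} + b_{mi} + b_{pi} + (a_m b_m)_i + (a_p b_p)_i + (a_m b_p)_i + (a_p b_m)_i + e_i, \\
\mathrm{Cov}\big((a_\cdot b_\cdot)_i, (a_\cdot b_\cdot)_j\big) &= \mathrm{Cov}\big((a_m b_m)_i + (a_p b_p)_i + (a_m b_p)_i + (a_p b_m)_i,\ (a_m b_m)_j + (a_p b_p)_j + (a_m b_p)_j + (a_p b_m)_j\big)
\end{aligned} \tag{5}$$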
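We now have all the necessary theory to derive the underlying assumptions of the Hadamard product in Equation 4. The covariance in Equation 5 consists of 16 covariances between the different allele combinations, where the first of these is $\mathrm{Cov}((a_m b_m)_i, (a_m b_m)_j)$. This covariance is nonzero only if alleles in both loci A and B are IBD. Thus, the expected covariance, given the genetic marker information M, is as follows:

$$E\big[\mathrm{Cov}((a_m b_m)_i, (a_m b_m)_j) \mid M\big] = P(A_i^m \equiv A_j^m \cap B_i^m \equiv B_j^m \mid M)\,\mathrm{Var}(a_m b_m) \tag{6}$$

where $\equiv$ denotes IBD. The probability in Equation 6 is one of the 16 gametic IBDs that give the elements of $\Pi_{AB}$. The definition of joint and conditional probabilities gives

$$P(A_i^m \equiv A_j^m \cap B_i^m \equiv B_j^m \mid M) = P(A_i^m \equiv A_j^m \mid M)\,P(B_i^m \equiv B_j^m \mid A_i^m \equiv A_j^m, M),$$

whereas the Hadamard product in Equation 4 corresponds to the factorization

$$P(A_i^m \equiv A_j^m \cap B_i^m \equiv B_j^m \mid M) = P(A_i^m \equiv A_j^m \mid M)\,P(B_i^m \equiv B_j^m \mid M) \tag{7}$$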
The relationship in Equation 7 is true if the QTL allele state in locus A does not give any additional information about the QTL allele state in locus B, given the marker information. Thus, the relationship $\Pi_{AB} = \Pi_A \circ \Pi_B$ in Equation 4 is true if one of the following statements holds:
- QTL A and B are unlinked (i.e., located on different chromosomes), or
- QTL A and B are linked (same chromosome), and there is at least one fully informative marker between the QTL positions for all animals included in the pedigree.
The second condition follows from the fact that a QTL allele state in locus A does not give any additional information about the QTL allele state in locus B if there is a fully informative marker between the 2 loci.
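To make the link between the allelic factorization and the Hadamard product concrete, the following sketch sums the 16 gametic IBD probabilities under the assumption that each one factorizes into a locus-A term times a locus-B term, and compares the result with the product of the two genotype IBD elements. The dictionary layout, function names, and half-sib values are assumptions made for illustration; individuals are assumed non-inbred, so a genotype IBD element is half the sum of the 4 allelic IBD probabilities.

```python
from itertools import product

ORIGINS = ("m", "p")  # maternally / paternally inherited allele


def genotype_ibd(allelic):
    """Genotype IBD element: half the sum of the 4 allelic IBD probabilities.

    allelic maps (origin in i, origin in j) -> P(alleles are IBD | M);
    missing pairs are treated as probability 0.
    """
    return sum(allelic.get(pair, 0.0) for pair in product(ORIGINS, repeat=2)) / 2


def epistatic_ibd_factorized(allelic_a, allelic_b):
    """Quarter of the sum of the 16 gametic IBD probabilities, each taken
    as P(A alleles IBD | M) * P(B alleles IBD | M) (independence assumed)."""
    total = 0.0
    for s, t, u, v in product(ORIGINS, repeat=4):
        total += allelic_a.get((s, t), 0.0) * allelic_b.get((u, v), 0.0)
    return total / 4


# Hypothetical maternal half sibs: only the maternal alleles can be IBD.
half_sib = {("m", "m"): 0.5}
pi_a_ij = genotype_ibd(half_sib)
pi_b_ij = genotype_ibd(half_sib)
pi_ab_ij = epistatic_ibd_factorized(half_sib, half_sib)
```

Under the factorization the epistatic element equals the product of the single-locus elements, which is exactly what the Hadamard product computes. When the loci are linked and no fully informative marker separates them, the factorization and hence this equality no longer hold in general.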
Hereafter, the correctly calculated IBD matrix will be referred to as the "correct IBD matrix" and will be denoted $\Pi_{AB}$, whereas the IBD matrix obtained from the Hadamard product (Equation 4) assuming unlinked loci will be referred to as the "Hadamard approximation" and will be denoted $\tilde{\Pi}_{AB}$.
Numerical Examples
Below we give 2 cases showing that $\Pi_{AB}$ and $\tilde{\Pi}_{AB}$ may differ substantially for linked QTL. In our Supplementary Appendix, we also give a general algorithm to calculate $\Pi_{AB}$ when QTL A and B are on the same side of a fully informative marker.
Half-Sib Pedigree with Epistatic QTL Located at Marker Positions
For the half-sib pedigree in Table 1, the epistatic IBD is calculated for the interaction of 2 QTL located at marker positions. The markers are not fully informative, and the Hadamard product, therefore, gives an incorrect IBD matrix when the QTL are linked and there is no additional marker information.
The genotype IBD matrices for the main effects in loci A and B are given by the following:
![]() |
The only terms in $\Pi_{AB}$ that are affected by linkage between loci A and B are the ones between the half sibs. Thus, $\Pi_{AB}$ is equal to the following:
![]() |
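In Table 1, it is not known which of the maternal alleles have been transmitted to the half sibs. To calculate the epistatic IBD, we only need to derive the probability that the maternally inherited alleles of the 2 half sibs are IBD at both loci, $P(A_i^m \equiv A_j^m \cap B_i^m \equiv B_j^m \mid M)$, because all other allelic combinations in Equation 5 will not be IBD between the 2 half sibs. This probability is equal to $P(A_i^m \equiv A_j^m \mid M)\,P(B_i^m \equiv B_j^m \mid A_i^m \equiv A_j^m, M)$, and because the maternal marker is noninformative, we only need to derive $P(A_i^m \equiv A_j^m) = 1/2$ and $P(B_i^m \equiv B_j^m \mid A_i^m \equiv A_j^m) = r^2 + (1 - r)^2$, where r is the recombination frequency between loci A and B. The latter equality is derived from the fact that we have $B_i^m \equiv B_j^m$ given $A_i^m \equiv A_j^m$ if there is a recombination in both meioses (with probability $r^2$) or no recombination in either meiosis of the mother (with probability $(1 - r)^2$). The elements in $\Pi_{AB}$ are a quarter of the sum of allelic IBD probabilities; hence, the element x in $\Pi_{AB}$ above is equal to $x = \tfrac{1}{4} \cdot \tfrac{1}{2}\,[r^2 + (1 - r)^2] = [r^2 + (1 - r)^2]/8$.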
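A short numerical sketch can show how the half-sib element depends on the recombination frequency. The code assumes maternal half sibs, a noninformative maternal marker, a probability of 1/2 that the 2 maternal alleles at locus A are IBD, and a Hadamard value equal to the square of the half-sib genotype IBD of 1/4; the function names are hypothetical.

```python
def halfsib_epistatic_ibd(r):
    """Epistatic IBD element between two maternal half sibs for linked QTL.

    r is the recombination frequency between loci A and B (0 <= r <= 0.5).
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination frequency must lie in [0, 0.5]")
    p_a = 0.5                           # maternal alleles at A are IBD
    p_b_given_a = r**2 + (1.0 - r)**2   # both or neither meiosis recombined
    return 0.25 * p_a * p_b_given_a     # quarter of the allelic IBD sum


def halfsib_hadamard_ibd():
    """Hadamard approximation: product of the two single-locus elements."""
    return 0.25 * 0.25


# Compare the two values over a few recombination frequencies.
comparison = {r: (halfsib_epistatic_ibd(r), halfsib_hadamard_ibd())
              for r in (0.0, 0.1, 0.3, 0.5)}
```

Under these assumptions the two values coincide only for unlinked loci (r = 0.5), and for completely linked loci (r = 0) the element is twice the Hadamard value.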
Full-Sib Pedigree with Epistatic QTL Not Located at Marker Positions
We give a full-sib pedigree in Table 2 where a single marker is simulated on the same chromosome as QTL A and B. It is located closest to QTL A with a recombination frequency of $r_{AM} = 0.3$ between QTL A and the marker. QTL B is located further away from the marker with a recombination frequency between loci A and B of $r_{AB} = 0.1$. For this case, the correct IBD matrix is (calculations shown in the Supplementary Appendix) as follows:
![]() |
![]() |
Discussion
We have shown that the epistatic VC model based on the Hadamard product assumes unlinked QTL or that there is a highly informative marker between 2 linked QTLs, where each marker allele in the base generation is unique. A single-nucleotide polymorphism (SNP) marker, for instance, is not fully informative, but if there are SNP haplotypes with many SNP markers between linked QTL, then the Hadamard approximation is expected to be a good one. More specifically, if there is, between 2 linked loci, a unique sequence of SNP markers for each haplotype in the base generation, then the epistatic IBD matrix for these 2 loci can be calculated directly from the Hadamard product in Equation 4.
Epistasis may be included in a VC model for 2 different reasons: as a nuisance parameter to improve the estimates of the main effects (e.g., Stern et al. 1996) or as the main parameter of interest to detect epistasis. In the former case, we do not expect that the error in the Hadamard approximation for linked loci will have any substantial effect on the analyses. However, if the aim is to detect epistasis for linked QTL, correctly calculated IBD matrices will be essential.
The power to detect epistatic QTL for linked loci has not been assessed; such an assessment would require a general multimarker algorithm. We therefore propose that an algorithm for estimating the epistatic IBD matrix for linked loci be developed in the near future, especially because epistasis is likely to evolve between linked loci (Liberman and Feldman 2006).
Supplementary Appendix
The Supplementary Appendix can be found at http://www.jhered.oxfordjournals.org/.
Funding
Swedish Research Council FORMAS (2004-1115). R.P.W. was supported by a European Union mobility grant (HPRI-CT2001-00153) to visit the Linnaeus Centre for Bioinformatics.
Footnotes
Corresponding Editor: Jerry Dodgson
Received November 6, 2007
Accepted January 25, 2008
References
Almasy L, Blangero J. Multipoint quantitative-trait linkage analysis in general pedigrees. Am J Hum Genet. (1998) 62:1198–1211.
Blangero J, Almasy L. Multipoint oligogenic linkage analysis of quantitative traits. Genet Epidemiol. (1997) 14:959–964.
Carlborg Ö, Haley CS. Epistasis: too often neglected in complex trait studies? Nat Rev Genet. (2004) 5:618–625.
Fernando RL, Grossman M. Marker-assisted selection using best linear unbiased prediction. Genet Sel Evol. (1989) 21:467–477.
Goldgar DE. Multipoint analysis of human quantitative genetic variation. Am J Hum Genet. (1990) 47:957–967.
Lander ES, Botstein D. Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics (1989) 121:185–199.
Liberman U, Feldman MW. Evolutionary theory for modifiers of epistasis using a general symmetric model. Proc Natl Acad Sci USA. (2006) 103:19402–19406.
Mitchell BD, Ghosh S, Schneider JL, Birznieks G, Blangero J. Power of variance component linkage analysis to detect epistasis. Genet Epidemiol. (1997) 14:1017–1022.
Pérez-Enciso M, Varona L. Quantitative trait loci mapping in F2 crosses between outbred lines. Genetics (2000) 155:391–405.
Pong-Wong R, George AW, Woolliams JA, Haley CS. A simple and rapid method for calculating identity-by-descent matrices using multiple markers. Genet Sel Evol. (2001) 33:453–471.
Rönnegård L, Carlborg Ö. A new efficient method for QTL mapping in divergent intercrosses incorporating within line variation. Proceedings of the 8th World Conference on Genetics Applied to Livestock Production; 2006 August; Belo Horizonte, Brazil. Belo Horizonte (Brazil): Instituto Prociencia.
Rönnegård L, Carlborg Ö. Separation of base allele and sampling term effects gives new insights in variance component QTL analysis. BMC Genetics (2007) 8:1.
Stern MP, Duggirala R, Mitchell BD, Reinhart LJ, Sivakumar S, Shipman PA, Uresandi OC, Benavides E, Blangero J, O'Connell P. Evidence for linkage of regions on chromosome 6 and 11 to plasma glucose concentrations in Mexican Americans. Genome Res. (1996) 6:724–734.
Wang T, Fernando RL, Van der Beek S, Grossman M, Van Arendonk JAM. Covariance between relatives for a marked quantitative trait locus. Genet Sel Evol. (1995) 27:251–274.






