Journal of Heredity Advance Access published online on February 14, 2008
Journal of Heredity, doi:10.1093/jhered/esm103
Computer Note
MBP (Version 1.0): A Software Package to Optimize Maize Breeding Procedures Based on Doubled Haploid Lines
From the University of Hohenheim, Institute of Plant Breeding, Seed Science, and Population Genetics, D-70593 Stuttgart, Germany
Address correspondence to G. A. Gordillo at the address above, or e-mail: gordillo{at}uni-hohenheim.de.
We developed MBP (version 1.0), a software package for optimizing maize (Zea mays L.) breeding procedures based on doubled haploid lines. This software accounts for both recurrent selection and the development of hybrid parent lines. Based on quantitative genetic model calculations, MBP (version 1.0) maximizes the expected genetic gain per year as a function of various genetic parameters and operational variables under the restriction of a given annual breeding budget. Exact formulae for the prediction of the effective population size are implemented, which allows breeding procedures to be optimized under a limited relative annual loss of genetic variance.
The use of doubled haploid (DH) lines is increasingly replacing the traditional development of inbred lines in commercial hybrid maize breeding (Röber et al. 2005). Yet, the success of employing DH lines depends on the choice of an efficient breeding procedure and the optimum allocation of technical and monetary resources to the individual breeding steps. Quantitative genetic model calculations are a useful tool for optimizing breeding procedures. Herein, the number of testers, test units, locations, and replicates are varied systematically in order to find an allocation that maximizes the expected gain from selection under restricted monetary and technical resources. When optimizing recurrent selection (RS) procedures, the breeder may want to set an upper limit for the reduction of the genetic variance due to random genetic drift and selection. For this, the effective population size (Ne) should not fall below a minimum level (Crow and Kimura 1970). Without selection and with a normal distribution of the number of offspring contributed by each parent to the next generation, Ne only depends on the number of parents used to start a new RS cycle and their coancestry and inbreeding coefficient (Nomura 2005). Selection additionally reduces Ne due to an increased variance of the number of offspring per parent (Robertson 1961). Wray and Thompson (1990) first derived formulae for predicting Ne under consideration of the cumulative effect of selection using an approach based on the rate of inbreeding (ΔF). More recently, Santiago and Caballero (1995) proposed an analogous prediction referring to the variance of gene frequency (drift approach). Ensuring a minimum Ne restricts the achievable selection intensity. This is especially critical when crosses are made every year to start a new selection program such that the whole breeding population is divided into multiple, time-staggered subpopulations, as is commonly practiced in hybrid maize breeding. An option to increase the overall selection intensity, while still preserving a minimum Ne, is to include lines from "neighboring" staggered subpopulations in the fraction of lines to be intercrossed for starting a new RS cycle. Herein, this approach is defined as "subpopulation interlinking." Predicting selection gain in this situation is complicated by the fact that lines selected for interlinking trace back to differently performing subpopulations.
Our software package allows the user to optimize 7 alternative maize breeding schemes based on DH lines. It is applicable to RS, to hybrid parent line development (LD), as well as to an integrated RS/LD approach. MBP (version 1.0) maximizes the expected genetic gain per year by means of quantitative genetic model calculations under the restriction of a given annual budget. It is applicable to 1-, 2-, and 3-stage testcross selection and 2 different strategies of subpopulation interlinking. The approach of Santiago and Caballero (1995) is used to predict Ne. To limit the reduction of genetic variance, upper limits can be chosen for the term 1/(2NeY), where Y is the cycle length in years. This term quantifies the relative annual loss of genetic variance and is equivalent to the annual inbreeding rate. MBP (version 1.0) builds on previously developed computer programs for optimizing hybrid breeding plans in rye (Tomerius 2001). It needs comparatively little computing time and is therefore a valuable tool for evaluating alternative maize breeding procedures.
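The restriction on the annual loss of genetic variance can be illustrated with a minimal sketch. It assumes the usual relation that the inbreeding rate per cycle is 1/(2Ne), so that the annual rate for a cycle of Y years is 1/(2NeY). The function names are illustrative and are not part of MBP.

```python
def annual_variance_loss(ne: float, cycle_years: float) -> float:
    """Relative annual loss of genetic variance, assumed to be 1 / (2 * Ne * Y)."""
    if ne <= 0 or cycle_years <= 0:
        raise ValueError("Ne and cycle length must be positive")
    return 1.0 / (2.0 * ne * cycle_years)


def minimum_ne(max_annual_loss: float, cycle_years: float) -> float:
    """Smallest Ne that keeps the annual loss at or below a chosen upper limit."""
    if max_annual_loss <= 0 or cycle_years <= 0:
        raise ValueError("upper limit and cycle length must be positive")
    return 1.0 / (2.0 * max_annual_loss * cycle_years)


if __name__ == "__main__":
    # Hypothetical values: Ne = 25 with a 4-year cycle.
    print(annual_variance_loss(ne=25, cycle_years=4))
    # Ne required to keep the annual loss at or below 1% with a 4-year cycle.
    print(minimum_ne(max_annual_loss=0.01, cycle_years=4))
```

Under this relation, a longer cycle tolerates a smaller Ne for the same annual limit, which is why the cycle length appears in the restriction.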
| Breeding Schemes |
|---|
Seven alternative DH-based breeding schemes are accounted for in MBP (version 1.0). Scheme 1 is considered the standard procedure. It comprises the following steps:
- Creating new variation by intercrossing selected lines for starting a new breeding cycle. Each of n selected parent lines may be intercrossed at random with 1 up to n – 1 of the remaining lines. The software allows the user to determine the optimum number of cross combinations per parent.
- In vivo haploid induction in generation F1 (S0).
- Chromosome doubling of haploid seedlings and selfing of the resulting DH plants in generation D0 to produce the first DH line generation (D1).
- Per se evaluation of D1 lines in observation plots and, in parallel, selfing of D1 plants to produce D2 lines. The user may specify the selected fraction of D1 lines and the number of observation plots for evaluating each line per se.
- Production of testcross seed of D2 candidate lines with one or more testers from an opposite gene pool.
- Evaluation of the testcross performance of D2 lines in multienvironment yield trials. The number of finally selected lines and the number of evaluation stages in the RS and in the LD procedures may be specified separately by the user. For instance, a one-stage RS scheme may be optimized in combination with a 1-, 2- or 3-stage LD procedure. The described breeding cycle requires 6, 8, or 10 seasons, respectively for 1-, 2-, or 3-stage testcross selection. The subsequent breeding phase for building up and evaluating experimental hybrids is not accounted for.
The alternative breeding schemes differ in the cycle length, the genetic material for starting a new RS cycle (single, double, or top crosses), the generation (S0, S1, or S2) in which in vivo haploids are induced as well as the type of test units.
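Two quantities from the steps of Scheme 1 can be sketched in a few lines: the number of cross combinations that results when each of n parents takes part in the same number of crosses, and the cycle length in seasons stated above for 1-, 2-, and 3-stage testcross selection. The sketch assumes that every parent takes part in exactly c crosses, which is a simplification of random intercrossing; the names are illustrative only.

```python
# Seasons per breeding cycle for Scheme 1, as stated in the text.
SEASONS_PER_CYCLE = {1: 6, 2: 8, 3: 10}


def number_of_crosses(n_parents: int, crosses_per_parent: int) -> int:
    """Number of distinct crosses if each parent takes part in exactly c crosses."""
    if n_parents < 2:
        raise ValueError("at least two parents are needed")
    if not 1 <= crosses_per_parent <= n_parents - 1:
        raise ValueError("crosses per parent must lie between 1 and n - 1")
    total_slots = n_parents * crosses_per_parent
    if total_slots % 2:
        # Every cross occupies two parent slots, so n * c must be even.
        raise ValueError("n * c must be even")
    return total_slots // 2


if __name__ == "__main__":
    # c = n - 1 corresponds to all possible pairs of the selected parents.
    print(number_of_crosses(10, 9))
    print(SEASONS_PER_CYCLE[2])
```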
| Mating Schemes |
|---|
Herein, the term "breeding population" designates a gene pool (heterotic group) of a hybrid maize breeding program. In MBP (version 1.0), the user may assume that the breeding population remains undivided, with each breeding step occurring only once during the time required to complete one RS cycle. Alternatively, it may be assumed that crosses are made every year to start a new breeding program such that the whole breeding population is divided into multiple, time-staggered subpopulations of the same gene pool. This way, the number of staggered subpopulations corresponds to the cycle length in years. Moreover, subpopulations may be interlinked by intercrossing lines selected from different staggered subpopulations. Two alternative strategies of subpopulation interlinking are accounted for: Recombination units may be composed of lines derived directly after completing an RS cycle in a given subpopulation plus lines selected 1 year earlier from a preceding subpopulation (interlinking Strategy 1) or lines selected from a subsequent subpopulation (interlinking Strategy 2).
| Genetic Model |
|---|
A test unit corresponds to the testcross progeny of a line with a given set of testers. The phenotypic variance between test unit means is defined as
|
|

refers to the genotypic variance between test units, 
the variance of genotype x year interaction, 
the variance of the genotype x location interaction, 
the variance of the genotype x location x year interaction, 
the error variance, L denotes the number of test locations, R the number of replicates, and T the number of testers. A 1-year testing is assumed at all selection stages. The parameter
is defined according to Griffing (1956) as |
|
and
are the general combining ability (GCA) and specific combining ability variances, respectively. Parameters
,
, and
are defined accordingly. Assuming that epistatic and maternal effects are absent,
and
are linear functions of the additive (
) and dominance variance (
) (Kempthorne 1957, p. 426): |
|
and
denote the probability that 2 random individuals of a given testcross progeny have received alleles identical by descent from the candidate line, respectively, the tester.
It is assumed that NDH:C DH lines from each of NC crosses enter the testcross evaluation stage. Accordingly, the genotypic variance between DH line testcrosses
is subdivided into components due to crosses
and DH lines within crosses
. The total genotypic variance between test units is calculated as
|
|
The genotypic variance between testcrosses of S2 lines and between DH lines within S2 lines (Scheme 7) is calculated analogously.
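The way the number of locations and replicates shrinks the phenotypic variance between test unit means can be sketched as follows. The sketch assumes a common one-year form in which the genotype × location and genotype × location × year variances are divided by L and the error variance by L × R; this form is an assumption and not necessarily the exact expression used in MBP. How the number of testers T enters is not modeled here, so all components are taken as already expressed per test unit.

```python
def phenotypic_variance(var_g: float, var_gy: float, var_gl: float,
                        var_gly: float, var_e: float,
                        n_locations: int, n_replicates: int) -> float:
    """Variance between test unit means for one year of testing (assumed form)."""
    if n_locations < 1 or n_replicates < 1:
        raise ValueError("locations and replicates must be at least 1")
    return (var_g
            + var_gy                                  # not reduced within one year
            + (var_gl + var_gly) / n_locations        # averaged over locations
            + var_e / (n_locations * n_replicates))   # averaged over all plots


if __name__ == "__main__":
    # Hypothetical variance components; more locations lower the variance.
    for n_loc in (1, 2, 4, 8):
        print(n_loc, phenotypic_variance(1.0, 0.5, 2.0, 1.0, 4.0, n_loc, 2))
```

Because a 1-year testing is assumed at all stages, the genotype × year component is not reduced by adding locations or replicates in this sketch.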
| Gain Criterion |
|---|
The gain criterion is the GCA of the selected lines for a base index (Brim et al. 1959; Williams 1962) composed of the testcross performance for grain yield and grain dry matter content:
|
|
No genotypic correlation is assumed between traits considered in line selection and those targeted in testcross selection. Therefore, no correlated response in testcross performance is accounted for. However, the cost of evaluating the lines per se is subtracted from the total budget. Hence, the budget remaining for the evaluation of the testcrosses depends on how much effort is spent on line per se selection.
| Prediction of Selection Gain |
|---|
The gain in GCA from one stage of index selection for grain yield without interlinking of subpopulations is computed by
|
|
and
are the correlation coefficient and covariance, respectively, between the index value and GCA for grain yield,
is the standard deviation of GCA for grain yield,
σI is the standard deviation of the index,
is the covariance between the GCA effects for grain yield and grain dry matter content, and
is the phenotypic covariance between grain yield and grain dry matter content. The expected gain in GCA for grain dry matter content is estimated analogously. The gains in grain yield and grain dry matter content sum up to the total genetic gain:
![]() |
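The single-stage index gain described above can be sketched with standard selection index relations: the gain in GCA for grain yield equals the selection intensity times the covariance between index and GCA, divided by the standard deviation of the index. The index weights `b_gy` and `b_dm`, the parameter names, and the assumption of an infinite, normally distributed population are illustrative choices, not taken from the MBP source.

```python
from math import sqrt
from statistics import NormalDist


def selection_intensity(alpha: float) -> float:
    """i = z / alpha for truncation selection in an infinite normal population."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("selected fraction must lie strictly between 0 and 1")
    normal = NormalDist()
    truncation_point = normal.inv_cdf(1.0 - alpha)
    return normal.pdf(truncation_point) / alpha


def index_gain_grain_yield(alpha: float, b_gy: float, b_dm: float,
                           var_gca_gy: float, cov_gca: float,
                           var_p_gy: float, var_p_dm: float,
                           cov_p: float) -> float:
    """Expected gain in GCA for grain yield from one stage of index selection."""
    # Variance of the base index I = b_gy * x_gy + b_dm * x_dm.
    var_index = (b_gy ** 2 * var_p_gy + b_dm ** 2 * var_p_dm
                 + 2.0 * b_gy * b_dm * cov_p)
    # Covariance between the index value and GCA for grain yield.
    cov_index_gca = b_gy * var_gca_gy + b_dm * cov_gca
    return selection_intensity(alpha) * cov_index_gca / sqrt(var_index)


if __name__ == "__main__":
    # Hypothetical parameter values for illustration only.
    print(index_gain_grain_yield(alpha=0.1, b_gy=1.0, b_dm=0.5,
                                 var_gca_gy=1.0, cov_gca=-0.2,
                                 var_p_gy=4.0, var_p_dm=1.0, cov_p=-0.3))
```

With `b_dm = 0` the expression reduces to single-trait selection on grain yield, which is a convenient check of the implementation.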
To predict the gain from multistage selection, exact formulae derived by Cochran (1951) for 2 stages and extended by Utz (1969) to 3 stages are applied. The general formula for the selection gain from m selection stages is
|
|
is the final selected fraction,
is the correlation between the index value and GCA at selection stage s, zs is the ordinate of the univariate normal distribution at the truncation point ks of selection stage s, and Ims is the incomplete area of the standardized (m – 1)-variate normal integral. Uni-, bi-, and trivariate normal integrals or alternatively their lower truncation limits are determined by numerical methods (Tomerius 2001).
The phenotypic covariances among the first, second, and third selection stage are required to calculate the m-variate normal integrals. These covariances are defined as
|
|
is the covariance between the phenotypic values at stages s and s' (s < s'),
is the variance of the GCA x locations interaction,
is the number of locations common to selection stages s and s', and Ls and Ls' are the number of locations at stages s and s', respectively (Utz 1969). For calculating the selection intensity, an infinite population size is assumed, although, in reality, the population size is finite. However, comparisons have shown that the 2 assumptions (finite vs. infinite population size) result in negligible differences regarding the optimum allocation and the expected selection gain (Utz 1969).
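As a rough cross-check of multistage predictions, the gain from two-stage truncation selection can also be approximated by simulation. The sketch below is a Monte Carlo approximation and not the exact formulae of Cochran (1951) or Utz (1969). It assumes independent errors at the two stages (no locations in common, so the stages covary only through GCA) and selection at stage 2 on the stage 2 phenotype alone; all names and values are hypothetical.

```python
import random
from math import sqrt


def simulate_two_stage_gain(n_candidates: int, alpha1: float, alpha2: float,
                            var_gca: float, var_err1: float, var_err2: float,
                            seed: int = 1) -> float:
    """Mean GCA of the finally selected lines minus the mean GCA of all candidates."""
    rng = random.Random(seed)
    gca = [rng.gauss(0.0, sqrt(var_gca)) for _ in range(n_candidates)]

    # Stage 1: phenotype = GCA + non-genetic deviation of the test unit mean.
    stage1 = [(g + rng.gauss(0.0, sqrt(var_err1)), g) for g in gca]
    stage1.sort(reverse=True)
    kept = stage1[:max(1, round(alpha1 * n_candidates))]

    # Stage 2: survivors are re-tested with fresh, independent deviations.
    stage2 = [(g + rng.gauss(0.0, sqrt(var_err2)), g) for _, g in kept]
    stage2.sort(reverse=True)
    final = stage2[:max(1, round(alpha2 * len(kept)))]

    mean_all = sum(gca) / len(gca)
    mean_selected = sum(g for _, g in final) / len(final)
    return mean_selected - mean_all


if __name__ == "__main__":
    # 20% selected at stage 1, 25% of those at stage 2 (5% overall).
    print(simulate_two_stage_gain(20000, 0.2, 0.25, 1.0, 1.0, 0.5))
```

A simulation of this kind uses a finite number of candidates, so it can also give a feeling for how far finite-population results deviate from the infinite-population assumption.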
When subpopulation interlinking is considered, the differences in the performance levels of staggered subpopulations are accounted for when predicting the annual response to selection (
). For instance, with interlinking Strategy 1,
|
|
and
refer to the predicted response to one selection cycle in 2 successive subpopulations j and j – 1, respectively,
and
denote the effective generation intervals for subpopulations j and j – 1, and
and
the proportion of recombination units derived from the respective subpopulations. The latter expression is equivalent to the one proposed by Rendel and Robertson (1950) for predicting the response to selection in populations with overlapping generations. The annual genetic gain (
) is estimated analogously for interlinking Strategy 2.
| Optimization Criterion |
|---|
The user may choose among the following optimization criteria (OC): 1) genetic gain from RS alone, 2) genetic gain from LD alone, or 3) any combination of the 2 foregoing OC. The latter option takes into account that RS is usually integrated into the procedure for developing new hybrid parent lines in commercial breeding. Thus, we define the OC as
|
|
and
designate the respective gains in GCA, and YRS and YLD denote the pertinent cycle lengths.
| Prediction of the Effective Population Size |
|---|
The prediction of Ne is based on the approach of Santiago and Caballero (1995), which accounts for the cumulative effect of selection on genetic drift. Assuming that a constant number of parent lines is randomly recombined each cycle and that the distribution of the number of lines per cross combination is random, Ne can be calculated for a given trait T from the following equation:
|
|
is a parameter accounting for the cumulative effect of selection for trait T, and
refers to the respective variance due to differences in the selective advantage of individual crosses.
QT is approximated by
, where
is the proportion of genetic variance remaining after selection for trait T. Herein, an infinitesimal model of gene effects is assumed such that selection induces a gametic-phase disequilibrium, which leads to a reduction of genetic variance. An asymptotic value for the genetic variance is reached when the reduction of variance due to selection is balanced by the increase of variance from random mating following selection (Wricke and Weber 1986; Gomez-Raya and Burnside 1990). For this situation,
2 (Bulmer 1980; Santiago and Caballero 1995), where
is
is the asymptotic correlation between the index value and GCA for trait T.
is estimated according to Milkman (1978) as
|
|
is the asymptotic intraclass correlation of full sibs (lines within a cross) for trait T. An extended formula for predicting
according to Santiago and Caballero (1995) is used: |
|
FS is the coefficient of coancestry (Malecot 1948) of full sibs (e.g., FS = 0.5 for single crosses with inbred parents and FS = 0.25 for double crosses with inbred grandparents). The asymptotic covariance between the index value and the GCA for a given trait T is defined as
![]() |
![]() |
The latter expressions for
and
correspond to the formulae for the asymptotic genetic variance and asymptotic heritability, respectively, derived by Gomez-Raya and Burnside (1990) for a single-trait situation.
The correlation between index value and GCA (
) for m-stage selection is calculated as
![]() |
When subpopulations are interlinked, the effective population size is first estimated separately for each interlinked subpopulation and thereafter summed up to obtain the total Ne. The relative annual loss of genetic variance for a given trait T is calculated as
|
|
|
|
|
|
The relative importance of dry matter content is calculated analogously. Optimizations may be carried out under the restriction of an upper limit for
, which can be specified arbitrarily by the program user.
| Optimization Procedure |
|---|
Optimum values for the number of cross combinations per parent line, testers, test units, locations, and replicates at each selection stage are calculated by the software, as well as the number of lines to be selected for recombination according to the upper limit for
. The optimization procedure follows an n-dimensional grid search approach suggested by Tomerius (2001). Only allocations making full use of the budget are considered. For 3-stage testcross evaluation procedures, the selected fraction of candidate lines at each testcross stage is restricted to a maximum value of
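The idea of a grid search under a fixed budget can be sketched for a deliberately simplified case: one-stage testcross selection with a single tester, a budget expressed in plots, and a cost of one plot per line, location, and replicate. For every combination of locations and replicates, the number of test units is set to the largest value the budget allows, and the allocation with the highest expected gain is kept. The cost model, the variance expression, and all names are assumptions for illustration; MBP searches more dimensions and uses its own cost and genetic parameters.

```python
from itertools import product
from math import sqrt
from statistics import NormalDist


def selection_intensity(alpha: float) -> float:
    """i = z / alpha for truncation selection in an infinite normal population."""
    normal = NormalDist()
    return normal.pdf(normal.inv_cdf(1.0 - alpha)) / alpha


def expected_gain(n_lines: int, n_loc: int, n_rep: int, n_selected: int,
                  var_g: float, var_gl: float, var_e: float) -> float:
    """Gain = i * var_g / sigma_P for selection on test unit means (assumed model)."""
    var_p = var_g + var_gl / n_loc + var_e / (n_loc * n_rep)
    return selection_intensity(n_selected / n_lines) * var_g / sqrt(var_p)


def grid_search(budget_plots: int, n_selected: int, var_g: float,
                var_gl: float, var_e: float,
                max_loc: int = 10, max_rep: int = 4):
    """Return the allocation with the highest expected gain, or None."""
    best = None
    for n_loc, n_rep in product(range(1, max_loc + 1), range(1, max_rep + 1)):
        # Spend as much of the budget as whole test units allow.
        n_lines = budget_plots // (n_loc * n_rep)
        if n_lines <= n_selected:
            continue  # no selection possible with this allocation
        gain = expected_gain(n_lines, n_loc, n_rep, n_selected,
                             var_g, var_gl, var_e)
        if best is None or gain > best["gain"]:
            best = {"n_lines": n_lines, "n_loc": n_loc,
                    "n_rep": n_rep, "gain": gain}
    return best


if __name__ == "__main__":
    # Hypothetical budget of 2000 plots, 10 lines selected for recombination.
    print(grid_search(2000, 10, var_g=1.0, var_gl=2.0, var_e=4.0))
```

The trade-off the search resolves is visible in the two factors of the gain: more test units raise the selection intensity, while more locations and replicates lower the phenotypic standard deviation.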
| Quantitative Genetic Parameters and Operational Variables |
|---|
The input file contains specifications regarding the breeding scheme, the index weights, the budget, as well as the dimensioning parameters, which may be modified arbitrarily by the user. MBP (version 1.0) uses standard values of variance component estimates obtained from testcross data of several large samples of Central European inbred lines and DH lines made available by collaborating breeding companies (Gordillo and Geiger 2004). The underlying labor cost data and haploid induction parameters (e.g., haploid induction rate, average number of D1 lines per induced plant) were assessed likewise.
| Output File |
|---|
The output file comprises the relevant information specified by the user in the input file and the optimum allocation data. In addition, the output file reports the expected gain in the optimization criterion and in GCA for grain yield and grain dry matter content from RS and LD. Finally, the predicted relative annual loss of genetic variance,
, at the optimum allocation is stated in case of RS procedures.
| Features |
|---|
MBP (version 1.0) was programmed in Borland C++ BuilderX and runs under Microsoft Windows 95/98/2000/NT. The software package, including the compiled routine of MBP (version 1.0), the alternative DH breeding schemes, and a user's manual with various examples, is available on request. For noncommercial or academic use, it will be distributed free of charge. Commercial users may obtain access to the software for a nominal charge. Please contact the corresponding author.
| Funding |
|---|
German Bundesministerium für Wirtschaft und Arbeit (AiF No. 13991); the Gemeinschaft zur Förderung der privaten deutschen Pflanzenzüchtung e.V.
| Acknowledgments |
|---|
The authors are grateful to the breeding companies Südwestdeutsche Saatzucht GmbH & Co. KG (SWS), KWS SAAT AG, and Monsanto Agrar Deutschland GmbH for providing experimental and labor cost data and to F. K. Röber, W. Schmidt, and E. Holzhausen for helpful discussions.
| Footnotes |
|---|
Corresponding Editor: Perry Gustafson
| References |
|---|
Barwick SA, Henzell AL. Development successes and issues for the future in deriving and applying selection indexes for beef breeding. Aust J Exp Agric (2005) 45:923–933.
Brim CA, Johnson HW, Cockerham CC. Multiple selection criteria in soybeans. Agron J (1959) 51:42–46.
Bulmer MG. The mathematical theory of quantitative genetics. (1980) Oxford: Clarendon Press.
Cochran WG. Improvement by means of selection. In: Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability—Neyman J, ed. (1951) Berkeley (CA): University of California Press. 449–470.
Crow JF, Kimura M. An introduction to population genetics theory (1970) New York: Harper and Row.
Gomez-Raya L, Burnside EB. The effect of repeated cycles of selection on genetic variance, heritability, and response. Theor Appl Genet. (1990) 79:568–574.
Gordillo GA, Geiger HH. Estimating quantitative-genetic parameters of European maize populations to optimize hybrid breeding methods by model calculations (poster abstract). In: Proceedings of the XVIIth EUCARPIA General Congress; 2004 Sep 8–11; Tulln, Austria (Vollmann J, Grausgruber H, Ruckenbauer P, eds). (2004) Vienna (Austria): BOKU, University of Natural Resources and Applied Life Sciences. p. 484.
Griffing B. A generalized treatment of the use of diallel crosses in quantitative inheritance. Heredity (1956) 10:31–50.
Kempthorne O. An introduction to genetic statistics (1957) New York: John Wiley and Sons.
Malecot G. Les mathematiques de l'heredite (1948) Paris: Masson et Cie.
Milkman R. Selection differentials and selection coefficients. Genetics (1978) 88:391–403.
Nomura T. Developments in prediction theories of the effective size of populations under selection. Anim Sci J. (2005) 76:87–96.
Rendel JM, Robertson A. Estimation of genetic gain in milk yield by selection in a closed herd of dairy cattle. J Genet. (1950) 50:1–8.
Röber FK, Gordillo GA, Geiger HH. In vivo haploid induction in maize—performance of new inducers and significance of doubled haploid lines in hybrid breeding. Maydica (2005) 50:275–284.
Robertson A. Inbreeding in artificial selection programmes. Genet Res. (1961) 2:189–194.
Santiago E, Caballero A. Effective size of populations under selection. Genetics (1995) 139:1013–1030.
Tomerius AM. Optimizing the development of seed-parent lines in hybrid rye breeding [PhD dissertation]. (2001) [Stuttgart (Germany)]: University of Hohenheim.
Utz HF. Mehrstufenselektion in der Pflanzenzüchtung [PhD dissertation]. (1969) [Stuttgart (Germany)]: University of Hohenheim.
Williams JS. The evaluation of a selection index. Biometrics (1962) 18:375–393.
Wray NR, Thompson R. Prediction of rates of inbreeding in selected populations. Genet Res. (1990) 55:41–54.
Wricke G, Weber WE. Quantitative genetics and selection in plant breeding (1986) Berlin (Germany): Walter de Gruyter.