Journal of Heredity Advance Access originally published online on February 4, 2008
Journal of Heredity 2008 99(3):323-334; doi:10.1093/jhered/esm125
Computer Note
Parentage Analysis with Few Contributing Breeders: Validation and Improvement
From the Département de biologie, Université Laval, Ste-Foy, Québec, Canada G1K 7P4 (Duchesne); the Laboratoire Ecosystèmes Lagunaires, UMR 5119, cc093, Université Montpellier 2, place Bataillon, 34095 Montpellier, Cedex 05, France (Meldgaard and Berrebi); and the Institut des Sciences de l'Evolution, UMR 5554, cc065, Université Montpellier 2, place Bataillon, 34095 Montpellier, Cedex 05, France (Berrebi)
Address correspondence to P. Duchesne at the address above, or e-mail: pierre.duchesne{at}bio.ulaval.ca.
Validation of parental allocation using PAPA software (Duchesne P, Godbout MH, Bernatchez L. 2002. PAPA (package for the analysis of parental allocation): a computer program for simulated and real parental allocation. Mol Ecol Notes. 2:191–193.) was investigated under the assumption that only a small proportion of potential breeders contributed to the offspring sample. Inbreeding levels proved to have a large impact on allocation error rate. Consequently, simulations from artificial, unrelated parents may strongly underestimate allocation error, and so, whenever possible, simulations based on the actual parental genotypes should be run. An unexpected and interesting finding was that ambiguity (the highest likelihood is shared by several parental pairs) rates below 10% stood very close to exact allocation error rates (true proportions of wrong allocations). Hence, the ambiguity rate statistic may be viewed as a ready-made indicator of the resolution power of a specific parental allocation run and, if not exceeding 10%, used as an estimate of allocation error rate. It was found that the PAPA simulator, even with few contributing breeders, can be trusted to output reasonably accurate estimates of allocation error as long as those estimates do not exceed 15%. Indeed, most discrepancies between exact and estimated error then stood below 3%. Reproductive success variance had little impact on error estimate discrepancies within the same range. Finally, a (focal set) method was described to correct the estimated family sizes computed directly from parental allocations. Essentially, this method makes use of the detailed structure of the allocation probabilities associated with each parental pair with at least 1 allocated offspring. The allocation probabilities are expressed in matrix form, and the subsequent calculations are run based on standard matrix algebra. 
On average, this method provided better estimates of family sizes for each investigated combination of parameter values. As the size of offspring samples increased, the corrections improved until a plateau was finally reached. Typically, samples comprising 250, 500, and 1000 offspring would bring corrections in the order of 10–20%, 20–30%, and 30–40%, respectively.
Parentage analyses have become increasingly necessary in studies of evolutionary ecology and conservation genetics (Avise 1994; Hughes 1998; Jones and Ardren 2003), and the development of highly polymorphic microsatellite markers has improved the statistical power of such analyses (Luikart and England 1999). The objective of a parental allocation process based on genetic information is to find parental genotypes corresponding to the true parents of each of a set of offspring genotypes (Duchesne and Bernatchez 2007). This, in turn, makes it possible to investigate the "genetic mating system" of the species, to evaluate the actual reproductive success of breeders (Pemberton et al. 1992; Coltman et al. 1999; Bekkevold et al. 2002), and to construct pedigrees. Parentage analysis allows one to confirm monogamy in some species (Ribble 1991; Brotherton et al. 1997) and to demonstrate extrapair copulation in others (Girman et al. 1997; Goossens et al. 1998). Such analyses are especially useful when direct mating observations are difficult (Clapham and Palsboll 1997), such as in studies of different spawning strategies in fish (Garcia-Vazquez et al. 2001).
PAPA is a parental pair allocation and simulator program specifically designed to perform parental analysis when all contributing parents belong to the set of scored potential parents (closed systems). Its allocation method is based on the likelihood of each candidate parental pair producing the multilocus genotype found in the offspring being tested. The choice of the PAPA parental analysis software in this study was motivated by a combination of several distinctive statistical and structural features. First, PAPA outputs ambiguity and "correctness rate" statistics (see definitions in the Glossary). Second, the PAPA simulator mimics the entire allocation procedure although the allocation procedure itself does not involve the simulator. This uncoupling of the simulator and allocation procedures favors reliable performance assessments of parental analysis. Also significantly enhancing the reliability of the PAPA simulator is the possible use of the real parental genotypes (the ones collected by the user) as opposed to program-generated, therefore unrelated, parental genotypes. Due to the very circumstances under which genetic parental allocations are run, the proportion of correct allocations over all allocations (the correctness rate) cannot generally be assessed out of direct, empirical observations. Therefore, one has to resort to simulations in order to estimate correctness rates within specific parental analyses. Besides generating "Poisson" distributions of reproductive success, most current simulators generate offspring from artificial, totally noninbred, parental sets.
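The likelihood-based allocation and the notion of ambiguity described above can be illustrated with a minimal Python sketch. It is not PAPA's implementation: it assumes error-free scoring and plain Mendelian inheritance, and the function names (`locus_likelihood`, `pair_likelihood`, `allocate`) and the toy genotypes are hypothetical, chosen only for illustration.

```python
from itertools import product

def locus_likelihood(mother, father, offspring):
    """Probability that a mother x father cross yields the offspring genotype at one locus.

    Genotypes are pairs of allele labels; each of the 4 gamete combinations has probability 1/4.
    """
    target = tuple(sorted(offspring))
    hits = sum(1 for m, f in product(mother, father) if tuple(sorted((m, f))) == target)
    return hits / 4.0

def pair_likelihood(mother, father, offspring):
    """Multilocus likelihood: product of per-locus probabilities (loci assumed independent)."""
    like = 1.0
    for m_loc, f_loc, o_loc in zip(mother, father, offspring):
        like *= locus_likelihood(m_loc, f_loc, o_loc)
    return like

def allocate(offspring, mothers, fathers):
    """Return (status, pairs): the pair(s) sharing the highest nonzero likelihood."""
    scores = {(mi, fi): pair_likelihood(m, f, offspring)
              for mi, m in mothers.items() for fi, f in fathers.items()}
    best = max(scores.values())
    if best == 0.0:
        return "unallocated", []
    # Per-locus values are multiples of 1/4, so products are exact and ties compare safely.
    top = sorted(pair for pair, s in scores.items() if s == best)
    return ("allocated" if len(top) == 1 else "ambiguous"), top

# Hypothetical two-locus genotypes for 2 females and 2 males
mothers = {"F1": [(1, 2), (5, 5)], "F2": [(1, 3), (5, 6)]}
fathers = {"M1": [(3, 4), (5, 6)], "M2": [(2, 4), (6, 6)]}

result_tied = allocate([(1, 3), (5, 6)], mothers, fathers)    # two pairs share the top likelihood
result_unique = allocate([(2, 4), (5, 5)], mothers, fathers)  # a single pair has the top likelihood
print(result_tied)
print(result_unique)
```

In this toy case the first offspring is compatible with two pairs at equal likelihood, which is what the text calls an ambiguous allocation, whereas the second offspring is allocated to a single pair.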
Goals of This Study
Within several reproduction systems, both in natural and aquaculture settings, it seems that only a small proportion of potential breeders have any offspring at all. Studies involving a wide range of species such as Pacific oyster (Li and Hedgecock 1998), salmon (Garant et al. 2001), cheetah (Kelly 2001), rainbow trout (Palti et al. 2006), abalone (Hara and Sekino 2007), Bullock's oriole (Richardson and Burke 2001), giant Galapagos tortoise (Milinkovitch et al. 2004), black rat snake (Blouin-Demers et al. 2005), brown rat (Heiberg et al. 2006), Nile tilapia (Fessehaye et al. 2006), red flour beetle (Bernasconi et al. 2006), and cod (Wesmajervi et al. 2006) have found high variance levels of reproductive success that are totally incompatible with the Poisson model of the distribution of reproductive success among potential breeders. The Poisson model is based on the assumptions that breeders mate at random, have the same reproductive capability, and produce equally fit offspring.
The problem of potential effects of a strong departure from the Poisson model over both the reliability and the validation of parental analysis came up in the context of an experimental study bearing on the hybridization of marble trout (MT) with brown trout (BT) (see details in Meldgaard et al. 2007). This experimentation from which the genotype data were obtained is based on 35 potential parents, 126 offspring, and 9 microsatellite loci. It was carried out in order to improve knowledge on MT and stocked BT hybridization in the Soca River (Slovenia), which is threatening the native local form (Berrebi et al. 2000).
Parental analysis can and often does provide the cornerstone for research bearing on the ecology of small and sometimes endangered populations. Enhanced knowledge of the specifics of endangered populations is certainly an asset from a conservation perspective. However, breeders within small natural populations are often inbred, sometimes to a very high degree. This is one reason to try to better understand the effects of high inbreeding levels over parental allocation performance. Moreover, populations that are kept in captivity for various purposes such as food production, strain selection, and supplementation programs are sometimes monitored for trait inheritance and fecundity through parental analysis. Those populations also tend to be highly inbred with possible consequences for the quality of both the allocations and the estimation of how good those allocations are. As of now, those consequences still remain to be investigated.
The consequences of the discrepancy between inbred parental structure and assumed parental unrelatedness are evaluated here when only a small proportion of potential breeders contribute offspring (hereafter referred to as the SPC condition). Another aspect investigated under SPC is the impact of the variance of reproductive success among the few contributing breeders on parental analysis.
More generally, this paper investigates various statistical properties of the PAPA parental allocation system under the SPC condition. A second goal is to describe and assess a method (the "focal set method") to improve estimates of family sizes over the ones obtained directly from allocation runs.
This study has been conducted under the assumptions that scoring is error free and that all contributing parents belong to the set of scored potential parents (closed parental system).
Methods
All artificial parental as well as offspring genotypes were generated based on real BT and MT specimens sampled in the Driselpoh stream (Slovenia). Nine loci have been scored, and a summary of these microsatellite data is provided in Table 1. Note that the relationships that are investigated in this paper can be generalized to any genetic data set as long as they are interpreted in a qualitative, structural way. For the sake of clarity, the methods pertaining to the 2 distinct goals will be described in 2 sections.
Allocations: Error Rates and Ambiguity
Simulation Method
First, several sets of parental genotypes were constructed. Offspring were generated from each set and allocated according to the PAPA algorithm. The exact correctness rates were computed. Estimated correctness rates were obtained from the PAPA parental simulator that uses the parental genotypes. Thus, exact and estimated correctness rates could be compared under various parameter conditions. We now proceed to a detailed description of genotype generation (parents and offspring) and of the allocation procedure.
Genotype Generation
Four sets of 35 parents (19 males and 16 females) were generated. Three of these sets were obtained as offspring from grandparental sets of various sizes S = 4, 8, and 16, each comprising equal numbers of males and females generated based on the allelic frequencies found in the trout population (see Case Study below). These parental sets will be denoted as grp22, grp44, and grp88, respectively. A fourth parental set, the unrel set, was generated directly from these same allelic distributions. In addition, the grandparental sets were given a Russian doll structure so that, for instance, the largest grandparental set (16) included the second largest (8) and the latter the next largest (4). This construction ensured that the larger the grandparental set, the less inbred their offspring, that is, the corresponding parental set. The Mxy values, that is, the number of shared alleles per locus averaged over all loci and all parental pairs (Blouin et al. 1996), for parental sets grp22, grp44, and grp88, were 0.465, 0.380, and 0.353, respectively. The Mxy statistic is especially pertinent in the context of parental analysis because allele sharing is directly linked to higher levels of ambiguity and allocation error. The size of the grandparental set will hereafter be considered as the "inbreeding parameter."
Allocation Procedure
The allocation procedure involved 3 components: an allocation matrix, a family size array, and the allocation of random offspring.
Allocation matrix.
For each of the 4 parental sets, 10 000 offspring were generated using the PAPA parental simulator based on the first 5, 7, and 9 available loci. This resulted in 4 x 3 = 12 sets of offspring that were then analyzed in terms of the 19 x 16 = 304 possible parental pairs. Given a set of offspring originating from a parental pair Pi, the number of offspring not allocated due to ambiguity and the number of offspring (family size) allocated to each parental pair, including Pi, was then recorded. Subsequently, those numbers were divided by their sum to become allocation proportions associated with ambiguity or with Pi. Each of those allocation proportions was interpreted as an estimate of the probability that the Pi offspring are nonallocated (ambiguous) or allocated to P1, P2, ..., Pi, ..., PL, respectively. The estimates were denoted as Pr(Pi → Pj), with Pr(Pi → amb) for the ambiguous case, and put together to form the allocation matrix <MA>:

$$
M_A = \begin{pmatrix}
\Pr(P_1 \to P_1) & \cdots & \Pr(P_1 \to P_L) & \Pr(P_1 \to \mathrm{amb}) \\
\vdots & \ddots & \vdots & \vdots \\
\Pr(P_L \to P_1) & \cdots & \Pr(P_L \to P_L) & \Pr(P_L \to \mathrm{amb})
\end{pmatrix}
$$
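The normalization step described above (allocation counts divided by their sum) can be sketched in Python as follows. The layout is an assumption made for illustration: one row per true pair Pi, one column per receiving pair, and a final column for ambiguous offspring. The function name `allocation_matrix` and the toy counts are hypothetical.

```python
import numpy as np

def allocation_matrix(counts):
    """Turn simulated allocation counts into estimated allocation probabilities.

    counts[i, j] : number of simulated offspring of pair Pi allocated to pair Pj
    counts[i, -1]: number of simulated offspring of pair Pi left unallocated (ambiguous)
    Each row is divided by its own total so that it sums to 1.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    return counts / totals

# Toy counts for 3 pairs with 10 000 simulated offspring each (illustrative numbers only)
counts = [
    [9500,  300,     0, 200],
    [ 100, 9800,    50,  50],
    [   0,    0, 10000,   0],
]
ma = allocation_matrix(counts)
print(ma)
```

Each diagonal entry then estimates the proportion of a pair's offspring that are correctly allocated, and the last column estimates the pair-specific ambiguity rate.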
Family size arrays <N> = <N1, N2, ..., Ni, ..., NL>.
The artificial offspring sample sizes within each allocation run involved 125, 250, 500, 1000, 2500, or 5000 offspring. These offspring were distributed among 18 (6% of all 304 possible pairs) randomly chosen parental pairs according to one of 3 types of family size arrays: Poisson, "aqua," and "trout," in ascending order of reproductive success variance. The Poisson 500 offspring sample was obtained from a Poisson random number generator. The aqua 500 offspring sample was distributed following the family size distribution found in a recent aquaculture investigation that involved allocating 573 rainbow trout offspring to 20 potential parental pairs (Palti et al. 2006). The trout 500 offspring samples were distributed very much as in the Case Study presented below. Offspring samples of other sizes were distributed in the same fashion but in proportion to the 500 samples of the same type. Overall, 6 (sample sizes) x 3 (types: Poisson, aqua, and trout) = 18 family size arrays <N> were used in association with each allocation matrix. Note that each artificial sample was collected based on a new, distinct set of 18 randomly chosen parental pairs. Family size arrays are partially represented in Figure 1.
Allocation of random offspring based on <MA>.
Given a specific allocation matrix <MA>, associated with a parental set and a number of loci, each run of allocations was performed according to the following procedure:
- Eighteen parental pairs (6%) were chosen at random from among the 304 possible pairs.
- A family size was ascribed at random to each of the 18 parental pairs according to some family size array <N>.
- Each offspring "bred" by 1 of the 18 chosen pairs, say Pi, was randomly allocated to 1 of the 304 possible pairs with probability Pr(Pi → Pj) or left ambiguous with probability Pr(Pi → amb) as found in the current allocation probability matrix <MA>.
All calculations involved in this section were performed by running a specifically designed Maple© v9.5 code (MapleSoft 1981–2004). The sampling and allocation of a single offspring within this procedure are schematically depicted in Figure 2.
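The three-step procedure above can be sketched in Python rather than Maple. This is a rough illustration under stated assumptions, not the authors' code: the allocation matrix has one row per true pair and a final "ambiguous" column, the error rate is computed as the share of wrong allocations among allocated (nonambiguous) offspring, and the ambiguity rate as the share of ambiguous offspring among all offspring. The function name `run_allocations` and the toy matrix are hypothetical.

```python
import numpy as np

def run_allocations(ma, family_sizes, rng):
    """Simulate one allocation run from an allocation matrix.

    ma           : (L, L + 1) array; row i gives the allocation probabilities of
                   offspring of pair Pi, the last column being the ambiguous outcome
    family_sizes : family sizes to ascribe at random to the contributing pairs
    rng          : numpy random Generator
    """
    n_pairs = ma.shape[0]
    # Step 1: choose the contributing pairs at random, without replacement
    chosen = rng.choice(n_pairs, size=len(family_sizes), replace=False)
    # Step 2: ascribe the family sizes to the chosen pairs in random order
    sizes = rng.permutation(family_sizes)
    allocated = np.zeros(n_pairs, dtype=int)
    correct = 0
    ambiguous = 0
    # Step 3: allocate each family's offspring according to its row of the matrix
    for pair, n_offspring in zip(chosen, sizes):
        draws = rng.multinomial(n_offspring, ma[pair])
        allocated += draws[:-1]
        ambiguous += draws[-1]
        correct += draws[pair]
    n_allocated = allocated.sum()
    error_rate = 1.0 - correct / n_allocated if n_allocated else float("nan")
    ambiguity_rate = ambiguous / sizes.sum()
    return allocated, error_rate, ambiguity_rate

# Toy matrix for 6 pairs: 90% correct, 5% misallocated to the next pair, 5% ambiguous
toy = np.zeros((6, 7))
for i in range(6):
    toy[i, i] = 0.90
    toy[i, (i + 1) % 6] = 0.05
    toy[i, 6] = 0.05

alloc, err, amb = run_allocations(toy, [50, 30, 20], np.random.default_rng(0))
print(alloc, err, amb)
```

In the study itself the matrix covers 304 pairs and 18 contributing pairs are drawn per run; the toy sizes here only keep the example short.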
PAPA Estimates
The PAPA estimates of ambiguity and allocation error associated with each of the 4 parental sets (grp22, grp44, grp88, and unrel) and each number of loci (5, 7, and 9) were obtained directly from the corresponding allocation matrices. Note that these estimates involved all possible 304 pairs as would be the case if one ran the PAPA simulator.
Output Variables
Data were collected for exact allocation error rate, ambiguity rate, and discrepancy between exact allocation error rate and estimated error rate from the PAPA simulator. These variables were investigated in relation to genetic contents, as measured by number of loci (5, 7, and 9), inbreeding level (grp22, grp44, grp88, and unrel), and reproductive success variance (Poisson, aqua, and trout).
Focal Set Method: Family Size Estimates
Description
Small proportions of contributors will generally translate into allocations being ascribed to a small proportion of possible parental pairs. This may be turned into an asset by investigating the detailed allocation structure of the set of parental pairs that obtained at least 1 allocation, hereafter referred to as the "focal set." Practically, this means estimating, through simulations, the proportion of Pi offspring that are allocated (correctly) to Pi or (erroneously) to any of the other pairs belonging to the focal set of parents. In other words, the first step in applying the focal set method consists of building an allocation matrix analogous to the previously described <MA>, but restricted to the focal set of parental pairs. We hoped that this method would improve the estimates of the family sizes of parental pairs. Note that the purpose of the focal set method is not to correct the allocations themselves. A complete description of the focal set method is provided in the Appendix.
Example
The following example serves to illustrate the main idea behind the focal set method. Suppose one knew that each of a set of 100 specimens belongs to either population X or Y. After some classification procedure, 70 specimens are allocated to X and 30 to Y. Now the problem is that the procedure is not flawless. In fact, a simulation study has shown that 80% of (true) X specimens are allocated to X and 20% are (mis)allocated to Y. For (true) Y specimens, 70% are allocated to Y and 30% are (mis)allocated to X. Writing NX and NY for the true numbers of X and Y specimens, one can therefore write the following linear equation system:

$$
\begin{aligned}
0.8\,N_X + 0.3\,N_Y &= 70 \\
0.2\,N_X + 0.7\,N_Y &= 30
\end{aligned}
$$
The above system can be solved easily by running some standard computer solver as found in Maple© (MapleSoft 1981–2004) or Excel© packages. In the above case, the solution is NX = 80 and NY = 20.
Under most circumstances, the corrected numbers will be closer to the true numbers than the original allocation numbers. However, the method provides no indication as to which allocations are correct or not, and so the allocations per se are not corrected. It is worth noting that the improvement in allocation number estimates is obtained without any additional information bearing on the specimens such as improved criteria or enhanced measurement precision. The extra information bears strictly on the performance of the allocation procedure itself.
The focal set method is a direct application of this same correction technique where parental pairs (the Pi) stand for populations and family sizes for allocation numbers. Because the number of parental pairs can be very high and therefore the size of the equation system very large, the allocation probabilities have to be expressed in matrix form in order to be solved efficiently by computer solvers.
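The two-population example can be checked with a few lines of Python using NumPy's linear solver, here standing in for the Maple or Excel solvers mentioned above. The sketch reads the 70% figure as the proportion of true Y specimens correctly allocated to Y, which is the reading consistent with the stated solution; the variable names are illustrative.

```python
import numpy as np

# Column j holds the allocation probabilities of specimens truly from population j:
#   column 0 (true X): 80% allocated to X, 20% to Y
#   column 1 (true Y): 30% allocated to X, 70% to Y
A = np.array([[0.8, 0.3],
              [0.2, 0.7]])

# Observed allocation numbers: 70 specimens allocated to X, 30 to Y
observed = np.array([70.0, 30.0])

# Solve A @ true_sizes = observed for the corrected numbers (NX, NY)
true_sizes = np.linalg.solve(A, observed)
print(true_sizes)  # approximately [80. 20.]
```

The same call applies unchanged to a larger system in which each column describes how the offspring of one focal parental pair are spread over the focal set.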
Output Statistic
In order to assess the efficiency of the focal method relative to family size estimation in comparison with estimates obtained directly from allocations, we used the ratio statistic (dni – dNi)/dni x 100%, where
- dni: Euclidean distance between the family size array from allocations and that of the exact family size array corresponding to the 18 parental pairs and
- dNi: Euclidean distance between the family size array estimate after the "focal set correction" and that of the exact family size array.
The comparisons between the 2 family size array estimates, from the focal set method and from the allocations, were performed for the same parameter combinations as given in Allocations: Error Rates and Ambiguity involving inbreeding level, variance of reproductive success, and number of loci.
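As a small numerical illustration of the ratio statistic (dni – dNi)/dni x 100% defined above, the following Python sketch computes the two Euclidean distances and the resulting percentage. The function name `improvement_percent` and the family sizes are hypothetical.

```python
import numpy as np

def improvement_percent(exact, from_allocations, corrected):
    """Relative reduction (in %) of the Euclidean distance to the exact family size array.

    d_n: distance between the family sizes obtained directly from allocations and the exact sizes
    d_N: distance between the focal-set-corrected family sizes and the exact sizes
    Undefined when d_n is 0, that is, when the raw allocations are already exact.
    """
    exact = np.asarray(exact, dtype=float)
    d_n = np.linalg.norm(np.asarray(from_allocations, dtype=float) - exact)
    d_N = np.linalg.norm(np.asarray(corrected, dtype=float) - exact)
    return (d_n - d_N) / d_n * 100.0

# Illustrative numbers only: exact sizes, raw allocation sizes, corrected sizes
ratio = improvement_percent([80, 20], [70, 30], [78, 22])
print(ratio)
```

A positive value means the corrected array is closer to the exact family sizes than the raw allocation array; a negative value would mean the correction made the estimate worse.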
Case Study
The Driselpoh stream is a tributary of the Soca and Idrijca River basins. The studied part is closed to immigration from downstream. The experimental stretch was emptied by electrofishing each year from 1996 to 1999 to remove unwanted trout from the system. In September 1999, 215 MT and 215 BT reared in a fish farm were released as 1+ (individuals of at least 1 year of age), hereafter named the parental cohort. As the existing pure MT populations in the Soca and Idrijca river basin have a low level of genetic variation at microsatellite level (Fumagalli et al. 2002), it was necessary to increase the overall variability to facilitate genetic analyses. This was done by creating a parental stock using males from 1 population and females from another (Lipovscek and Trebuscica; for details, see Meldgaard et al. 2007). The system was sampled by electrofishing in September each following year.
Using PAPA, we were initially able to allocate 411 of the 447 offspring individuals unambiguously (91.9%). Fifteen individuals were allocated to more than 1 couple, and 21 could not be allocated. The average correctness level among the 411 offspring as estimated from the built-in simulations was 0.949. The offspring were allocated to 20 families out of 294 possible pairs. Allocated number of offspring (family size) ranged from 2 to 79 per pair.
In order to use the focal set method, an allocation matrix for the 20 contributing parental pairs was constructed. Allocation of 10 000 generated multilocus genotypes to the 20 families showed large variations in correct allocation proportions between pairs (Table 2). Seven pairs received 10 000 self-allocations, indicating that all offspring from the focal couple would be allocated to the correct pair. Seven pairs received between 9500 and 10 000 self-allocations and the remaining 4 pairs between 6146 and 9500 self-allocations.
Two of the new estimates of family sizes were nonpositive integers (Table 3). These zero or slightly negative sizes occur when pairs are "filled" mainly by offspring that are erroneously allocated to those pairs. When the number of erroneous allocations to a parental pair exceeds the number of correct allocations, the solutions may involve some negatives. In this case, a second round of allocations may be performed without the "empty pairs" followed by a second application of the focal set with the remaining pairs. Here Bf3 x Mm5 and Bf3 x Mm9 were removed from the focal set of parents, and self-allocation was considered only among the remaining 18 pairs. The results are reported in Table 3 where it may be seen that although most family sizes remained unchanged when compared with the first allocation round, some were substantially modified. These modifications are certainly sufficient to influence the estimates of the variance of reproductive success between pairs and also between individuals within the same sex group.
Results
As in the Methods, the results will be reported first for matters of allocation ambiguity and correctness rates and then for the focal set method.
Allocations: Error Rates and Ambiguity
Variables Affecting Allocation Error Rates
Parental set inbreeding level had a strong effect on allocation error rates (Figure 3). This was especially true when the genetic information contents were low relative to other genetic information contents found in the present study. For instance, with 5 loci, the error rate ranged from 5% to 45% over the unrel, grp88, grp44, and grp22 inbreeding levels. When 9 loci were used, the error rate ran only from 0% to 7% over the same range of inbreeding levels. The variance of reproductive success as defined by the Poisson, aqua, and trout family size arrays had a negligible effect on allocation error rate (data not shown).
Accuracy of Allocation Error Rates Estimated from PAPA Simulations
The average discrepancy between the true allocation error and the estimated error from the PAPA simulator decreased as the estimated error decreased (Figure 4). In other words, high correctness estimates were, on average, better estimates than lower correctness estimates. Under all simulation conditions, the average discrepancy between true and PAPA-estimated error was lower than 4%, 3%, and 2% when the estimated correctness rates were higher than 85%, 95%, and 99%, respectively.
Reproductive variance had a significant, albeit relatively small, impact on average values of the allocation error discrepancy. As expected, the simulator did best with the Poisson success distribution and better with the aqua than with the trout distribution. Given nearly identical estimated correctness levels, the impact of reproductive success variance on discrepancy decreased with lower parental inbreeding (Figure 5).
Relationships between Allocation Error and Ambiguity Rates
Allocation error rates dropped more quickly than ambiguity with decreasing inbreeding levels (Figure 6). This was especially true when going from high (grp44) to moderate (grp88) levels of inbreeding. Below the 10% ambiguity rate, the allocation error and ambiguity rates stood close together. In fact, the difference between the averages of these 2 allocation variables never exceeded 3% under all conditions of reproductive variance, number of loci, or parental inbreeding, provided the ambiguity rate stood below the 10% level.
Although ambiguity provided a quick estimate of allocation error that was often doing slightly better or worse than estimates from the PAPA simulator, it was significantly less accurate than simulations within the intermediate inbreeding level range (Table 4). Because the latter condition cannot be easily characterized in specific allocation contexts, it seems safer to rely on the PAPA simulator over the entire range of parental allocation parameters.
Focal Set Method: Family Size Estimates
Under all simulation conditions, there was a noticeable average improvement on the family size array as a result of applying the focal set method (Figure 7). The level of improvement increased with sample size. However, based on simulations involving 10 000 and 20 000 offspring samples, there seems to be an upper bound to the improvement over family size estimates (data not shown). This limit depended on the parameters of the parental system in an apparently unpredictable way but generally stood within the 30–60% interval. Other than size of offspring sample, there were no other clear and systematic effects from variance of reproductive success, inbreeding level of the parental set, or number of loci.
Summary of Main Results
- The higher the inbreeding level of parental sets, the larger the allocation error rate.
- Ambiguity rates below 10% provide a quick and easy estimate of allocation error rate.
- The focal set method improves family size arrays under all parameter combinations.
- Reproductive success variance has a negligible effect on allocation error rates.
- The higher the correctness estimates from PAPA, the more accurate they are.
Discussion
The study of the impact of parental inbreeding level on allocation error rate (see Figure 3) shows that the more inbred the parental set, the larger the allocation error rate. This effect may be large and becomes even more noticeable as the genetic contents are decreased. One immediate consequence is that simulators based on the generation of artificial, therefore unrelated, parents from allelic frequency distributions are likely to output underestimates of allocation error. These overly optimistic estimates may be provided overtly or covertly in the form of some threshold statistical value associated with a predefined allocation confidence level. Therefore, one should try, whenever feasible, to validate parental allocation based on the actual parental genotypes as in parental PAPA simulations as opposed to genotypes generated from sampled allelic frequencies as, for example, in preparental PAPA simulations. In the absence of collected parental genotypes, correctness estimates from artificial parental genotypes are still useful in providing an upper bound for allocation correctness.
It was also found that the higher the correctness level provided by PAPA, the more accurate. As a rule of thumb, estimated correctness rates ≥85% or, equivalently, allocation error rates ≤15% correspond to exact correctness rates reaching at least 80% under most parental allocation conditions. PAPA correctness scores of at least 95% can be trusted to correspond to 90–100% exact correctness with near certainty. Very high correctness estimates (≥99%) may be taken practically for granted. Note that these results refer to the parental PAPA simulations wherein the genotypes of the collected parents are used to generate artificial offspring.
Increasing the reproductive variance also increased the allocation error discrepancy between the PAPA estimates and exact values. This was to be expected because parental allocation simulators assume that all potential parents have the same a priori reproductive capability that leads to a Poisson type of reproductive success distribution. Although success variance increased error discrepancy under all parameter combinations, this effect was found to be larger with more inbred parental sets. This makes sense because the presence of both highly fecund and closely related parents will generally raise the allocation error probability. However, when estimated correctness levels exceeded 85%, reproductive success variance had little impact on estimated error discrepancies (Figure 4). Therefore, success variance can in fact be ignored when estimating allocation error, as it is in parental simulators, provided the estimated allocation correctness scores are high.
A somewhat surprising finding was that ambiguity and allocation error rates are similar as long as ambiguity does not exceed 10%. This interesting statistical property provides a back-of-the-envelope way of estimating allocation error rates in the lower range of ambiguity rates. It is also worth noting that contrary to simulation procedures, ambiguity calculations are assumption free.
Ambiguity rates in the middle range do not decrease as quickly as allocation error rates as a response to extra genetic information. In other words, extra loci are more efficient at eliminating false parents than at differentiating similar candidate parents. A possible explanation is that ambiguous cases sometimes involve inbred parents so that extra loci will also reflect this connection. Consequently, allocation error estimates from the simulator are generally more reliable except when ambiguity is quite low (≤10%).
On average, the focal set correction method provided more accurate family size estimates than did the original allocations under all investigated parameter combinations. The corrections would sometimes reach as high as 70%. Also they depended on the offspring sample size, larger sizes producing more important improvements. However, there always seemed to be an upper, asymptotic limit to the average level of possible correction under each parental context. Clearly, the accuracy of the focal set allocation matrix is enhanced through enlargement of the offspring sample but only up to a point.
A large proportion of the problems raised within parental analysis studies may be tackled by analyzing family sizes associated with parental pairs without specifying the parent–offspring connections. In fact, all questions focusing on the formation of mating pairs fall into this category. A classic example is sexual selection and the variables that are driving it within various species such as size and major histocompatibility complex allele combinations. Reproductive success variance and number of partners for males and females can obviously be calculated through family sizes without explicitly identifying the offspring. In aquaculture settings, parental allocation improves the efficiency of selective breeding programs in several ways (Duchesne and Bernatchez 2007), for instance, by assessing fertility success (Selvamani et al. 2001) for which family size estimates are sufficient. The relationships between various breeding environment factors, such as diet, drug intake, etc., and fitness are readily assessed by compiling and comparing family size estimates. Clearly, all the above applications of parental analysis and several others could benefit from the improved family size estimates brought about by the focal set method.
Unfortunately, the focal set correction has not yet been implemented in PAPA or, to our knowledge, in any of the currently used parental allocation programs. However, given pertinent research goals, performing this correction "manually" can be worth the extra time and effort.
Limitations of This Study
This study does not take allele scoring errors into account although such errors are commonplace (Bonin et al. 2004). However, we believe that most of our findings can carry over to parental systems where scoring error does not exceed usual rates that stand in the vicinity of 1–3% on an allelic basis. The potentially large impact of inbreeding level on allocation error rate would certainly remain. The results bearing on discrepancies between exact and PAPA-estimated allocation error would still hold with allocations run under the most stringent error model wherein no error is accepted. Under such a model, there would certainly be a loss of allocations due to scoring errors leading to some offspring not being allocated but the remaining offspring would behave essentially as in the present study.
However, we can only speculate about the statistical properties of allocation runs allowing for some degree of scoring error. Broadening the allocation error model certainly increases both the allocation error and the ambiguity rates so that the connection between these 2 random variables when ambiguity stands in the low range may be essentially the same. As for the relationship between PAPA allocation error estimates and associated discrepancies, it would probably differ to some extent from the one observed in the absence of scoring error. However, we believe that PAPA correctness estimates in the high range (≥85%) would generally be associated with low discrepancies (exact vs. estimated allocation error) provided scoring error stays within the usual bracket.
This entire study was run under the assumption that only a small proportion of potential parents actually contributed to the offspring sample. Although this prevents the specific results from being generalized sensu stricto to parental allocation with larger proportions of contributors, they certainly are conservative evaluations of the capability of the PAPA simulator to estimate allocation error. Also, there is little doubt that the relationships between ambiguity and allocation error rate would essentially carry over to non-SPC parental systems. In particular, low ambiguity (<10%) should signal similar allocation error rates. Clearly, the use of the focal set method to improve family size estimates is not restricted to low proportions of contributors. However, the size of allocation matrices grows as the square of the number of contributors, with a corresponding increase in the time and effort needed to build them manually.
| Glossary |
|---|
- Allocation error rate: The proportion of incorrect allocations among all allocations (excluding cases of ambiguity). As with correctness rates, it may be exact or estimated. The correctness rate and the allocation error rate are complementary statistics; their sum always equals 1.
- Ambiguity: An ambiguous result associated with a specific offspring is one where the highest likelihood is shared by several parental pairs. "Ambiguous" is one of the possible outputs from the PAPA allocation and simulation procedures.
- Ambiguity rate: The proportion of ambiguity cases among offspring processed for parental allocation.
- Correct allocation: A correct allocation is one where the pair allocated to the offspring is in fact the true parental pair. When there is no allocated pair, as in the case of ambiguity, the result of the allocation procedure is neither correct nor incorrect. The term "correct" should not be confused with the word "success" as found in some other parental allocation programs, meaning simply that the offspring has been allocated, correctly or not.
- Correctness rate: The proportion of correct allocations among all allocations (excluding cases of ambiguity).
- Discrepancy: The absolute (unsigned) value of the difference between exact allocation error rate and some estimate of the allocation error rate.
- Exact correctness rate: The true proportion of correct allocations as opposed to estimated correctness rates obtained from running a simulator.
- Family size: The number of offspring (among the offspring sample) associated with a specific parental pair. It may be exact or estimated either directly from the allocations or as a result of applying the focal set method.
- Family size array: The array (vector) of family sizes of all candidate parental pairs.
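The rate definitions in this glossary can be illustrated with a minimal Python sketch. It is not part of PAPA: the function names `allocation_rates` and `discrepancy`, and the convention of representing an ambiguous result by `None`, are assumptions made for illustration only. Exact rates require the true parental pair of each offspring to be known, as in a simulation.

```python
def allocation_rates(results):
    """Compute ambiguity, correctness and allocation error rates.

    `results` is a list of (allocated_pair, true_pair) tuples, one per
    offspring. An ambiguous result is represented by allocated_pair = None
    (an assumption of this sketch, not a PAPA convention).
    """
    total = len(results)
    if total == 0:
        raise ValueError("no offspring were processed")
    # Ambiguous offspring are excluded from correctness and error rates.
    allocated = [(a, t) for a, t in results if a is not None]
    if not allocated:
        raise ValueError("every result is ambiguous; rates are undefined")
    ambiguity_rate = (total - len(allocated)) / total
    correct = sum(1 for a, t in allocated if a == t)
    correctness_rate = correct / len(allocated)
    error_rate = 1 - correctness_rate  # complementary statistics
    return ambiguity_rate, correctness_rate, error_rate


def discrepancy(exact_error, estimated_error):
    """Absolute difference between exact and estimated allocation error."""
    return abs(exact_error - estimated_error)


# Five offspring: three correct allocations, one wrong, one ambiguous.
example = [("A", "A"), ("A", "B"), (None, "A"), ("B", "B"), ("C", "C")]
ambiguity, correctness, error = allocation_rates(example)
```

In this toy example, 1 of 5 offspring is ambiguous and 3 of the 4 allocated offspring are correct, so the correctness and allocation error rates are computed over 4 allocations only.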
| Funding |
|---|
French Biodiversity Institute (contract no. 2001/583/P00202); French Bureau of Genetic Resources (contract SRP-04 B/2003—CV03000029); Sansouire Foundation; the Tolmin Angling Association.
| Appendix: The Focal Set Method |
|---|
Building the Focal Set and the Allocation Family Size Array <n>
First, the real offspring are allocated to the putative parents. The family size allocated to each potential parental pair is then computed. All parental pairs with at least 1 allocated offspring are then collected within a focal set and the corresponding allocation family sizes are recorded within an array, say <n1, n2, ..., ni, ..., nL>.
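As a rough illustration of this first step, the following Python sketch collects the focal set and the allocation family size array from a list of allocation results. The function name `focal_set_and_sizes`, the representation of a parental pair as a tuple of labels, and the use of `None` for offspring that received no allocation are all assumptions of the sketch, not features of PAPA output.

```python
from collections import Counter


def focal_set_and_sizes(allocations):
    """Return the focal set and the allocation family size array <n>.

    `allocations` holds one entry per real offspring: the parental pair it
    was allocated to, or None when no pair was allocated (hypothetical
    encoding chosen for this sketch).
    """
    counts = Counter(pair for pair in allocations if pair is not None)
    # Only pairs with at least 1 allocated offspring appear in `counts`.
    focal_set = sorted(counts)
    n = [counts[pair] for pair in focal_set]
    return focal_set, n


# Hypothetical allocations for five offspring.
allocations = [("F1", "M1"), ("F1", "M1"), ("F2", "M1"), None, ("F1", "M1")]
focal_set, n = focal_set_and_sizes(allocations)
```

Sorting the focal set is only a convenience that fixes the order of the pairs, so that the same order can be reused when building the allocation matrix.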
Building the Allocation Matrix
For each of the L pairs of the focal set, say Pi, a large number (1000) of genotypes are generated as offspring from Pi. These offspring are then allocated to the complete, original set of putative parents. The proportion of the Pi artificial offspring allocated to each of the L focal set pairs is then recorded. Note that allocations to nonfocal pairs are ignored. Those proportions are then divided by their sum so as to add up to unity. Each proportion is now viewed as an estimate of the probability that Pi offspring are allocated to P1, P2, ..., Pi, ..., PL, respectively. These estimates are denoted Pr(Pi → Pj) and put together to form the focal allocation matrix <MF>, written here with column i holding the probabilities for the offspring of Pi:

$$
M_F =
\begin{pmatrix}
\Pr(P_1 \to P_1) & \Pr(P_2 \to P_1) & \cdots & \Pr(P_L \to P_1) \\
\Pr(P_1 \to P_2) & \Pr(P_2 \to P_2) & \cdots & \Pr(P_L \to P_2) \\
\vdots & \vdots & \ddots & \vdots \\
\Pr(P_1 \to P_L) & \Pr(P_2 \to P_L) & \cdots & \Pr(P_L \to P_L)
\end{pmatrix}
$$
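The normalization step can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function name `focal_allocation_matrix` and the input format (a dictionary mapping each focal pair to the list of pairs its simulated offspring were allocated to) are hypothetical, and the matrix is laid out so that entry `[i][j]` estimates Pr(Pj → Pi), that is, each column describes the offspring of one focal pair.

```python
from collections import Counter


def focal_allocation_matrix(focal_set, simulated):
    """Estimate the focal allocation matrix from simulated allocations.

    `simulated[source]` lists, for the artificial offspring of focal pair
    `source`, the pair each one was allocated to when tested against the
    complete set of putative parents. Entry [i][j] of the result estimates
    Pr(Pj -> Pi) (layout assumed by this sketch).
    """
    index = {pair: i for i, pair in enumerate(focal_set)}
    size = len(focal_set)
    matrix = [[0.0] * size for _ in range(size)]
    for j, source in enumerate(focal_set):
        # Allocations to nonfocal pairs (and non-allocations) are ignored.
        counts = Counter(p for p in simulated[source] if p in index)
        total = sum(counts.values())
        if total == 0:
            raise ValueError(f"no focal allocation for offspring of {source!r}")
        for target, count in counts.items():
            # Dividing by the focal total makes each column sum to 1.
            matrix[index[target]][j] = count / total
    return matrix


# Tiny hypothetical example (the text uses 1000 offspring per pair).
simulated = {
    "P1": ["P1"] * 8 + ["P2"] + ["PX"],  # "PX" is a nonfocal pair
    "P2": ["P2"] * 3 + ["P1"],
}
M_F = focal_allocation_matrix(["P1", "P2"], simulated)
```

Here the single allocation to the nonfocal pair is dropped before normalization, so the P1 column is based on 9 allocations rather than 10.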
Computing a Corrected Family Size Array <N>
Some of the Pi real offspring may have been wrongly allocated to other pairs of the focal set. The family size of Pi estimated directly from the allocations (ni) is in fact the sum of correct allocations of the Ni true offspring from the Pi pair to itself and of offspring generated by other pairs but wrongly allocated to Pi. Therefore, one may write

$$
n_i = \Pr(P_i \to P_i)\,N_i + \sum_{j \neq i} \Pr(P_j \to P_i)\,N_j = \sum_{j=1}^{L} \Pr(P_j \to P_i)\,N_j
$$
All specific equations, one for each of n1, n2, ..., ni, ..., nL, together form an equation system that may be expressed in matrix form as follows:
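$$
\begin{pmatrix} n_1 \\ n_2 \\ \vdots \\ n_L \end{pmatrix}
=
\begin{pmatrix}
\Pr(P_1 \to P_1) & \Pr(P_2 \to P_1) & \cdots & \Pr(P_L \to P_1) \\
\Pr(P_1 \to P_2) & \Pr(P_2 \to P_2) & \cdots & \Pr(P_L \to P_2) \\
\vdots & \vdots & \ddots & \vdots \\
\Pr(P_1 \to P_L) & \Pr(P_2 \to P_L) & \cdots & \Pr(P_L \to P_L)
\end{pmatrix}
\begin{pmatrix} N_1 \\ N_2 \\ \vdots \\ N_L \end{pmatrix}
$$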
This can also be written in the more compact format:
$$
\mathbf{n}^T = M_F\,\mathbf{N}^T
$$
Solving the above system for the <N> array, one obtains estimates of the true numbers of offspring for each pair of the focal set.
Comments on the Focal Set Method
- The focal allocation matrix MF is built from simulating allocations to the whole set (not just the focal set) of potential parents because this better represents the allocation conditions and, therefore, should produce more accurate estimates of the allocation proportions for each parental pair of the focal set.
- The focal allocation matrix MF involves allocating artificial offspring only from parental pairs that belong to the focal set. In theory, if MF were extended to all potential parental pairs to become a full allocation matrix MA, the family size estimations (N) could be more accurate. However, under the very conditions that we are considering here, where most pairs have zero offspring, complete parental sets will usually be very large compared with the corresponding focal sets, and the ensuing calculations much more intensive. The only potential drawback associated with the focal set reduction is the possibility that some potential pairs with no allocated offspring have actually contributed offspring, all of which were wrongly allocated to one or several members of the focal set. Although this is certainly a theoretical possibility, it is an unlikely event. In fact, this would mean that a pair, say Pk, has bred some offspring, that none of them was allocated to Pk but rather to one or several similar pairs, and that these pairs, in turn, contributed none of their own offspring to the Pk allocations. Note that the simulations used to validate the focal set method do not a priori preclude this possibility so that some spillover of offspring generated outside the focal set will automatically be reflected in the results.
- Solving for the corrected family size array <N> involves standard linear algebra procedures. For instance, this can be easily done by first inverting the focal allocation matrix MF and then multiplying the allocation family size array <nT> by the inverted matrix because $\mathbf{N}^T = M_F^{-1}\,\mathbf{n}^T$.
- Note that matrix inversion and multiplication are predefined functions within several widely used software packages such as Excel.
- The solution for the family size array <N> will usually involve fractional (noninteger) numbers. Fractional numbers should be rounded to the closest integer. Negative family size values obtained from the focal set method might occur when pairs are filled mainly by offspring that are erroneously allocated to those "empty" pairs. In this case, the empty pairs should be removed and a second round of allocations followed by a second application of the focal method should be run.
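The correction step described in these comments can be sketched in Python without any external library. The sketch solves the linear system directly by Gauss-Jordan elimination rather than forming the inverse matrix explicitly, which gives the same result for a nonsingular matrix. The function names `solve_linear_system` and `corrected_family_sizes` are hypothetical, the matrix is assumed to be laid out with entry `[i][j]` estimating Pr(Pj → Pi), and the numbers are invented for illustration.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    size = len(b)
    # Augmented matrix [a | b], copied so the inputs are left untouched.
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(size):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, size + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][size] / aug[i][i] for i in range(size)]


def corrected_family_sizes(m_f, n_alloc):
    """Return the corrected family size array <N> solving n^T = M_F N^T."""
    return solve_linear_system(m_f, n_alloc)


# Hypothetical focal allocation matrix; each column sums to 1.
M_F = [[0.9, 0.2],
       [0.1, 0.8]]
n_alloc = [29, 11]  # family sizes read directly from the allocations
N = corrected_family_sizes(M_F, n_alloc)
N_rounded = [round(x) for x in N]
has_negative = any(x < 0 for x in N)  # would signal "empty" pairs
```

With these invented values, true family sizes of 30 and 10 would produce the allocated sizes 29 and 11, and the correction recovers them. A negative entry in `N` would correspond to the situation described above, where the affected pairs should be removed and the allocations rerun.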
| Acknowledgments |
|---|
We thank Rasmus Nielsen and Dorte Bekkevold for comments on earlier versions of the manuscript. We would also like to thank René Guyomard and Lars-Erik Holm for their generous donation of primer aliquots for testing. This is an ISEM-2007-151 publication.
| Footnotes |
|---|
Corresponding Editor: William Modi
Received July 5, 2007
Accepted November 20, 2007
| References |
|---|
Avise JC. Molecular markers, natural history and evolution (1994) New York: Chapman & Hall.
Bekkevold D, Hansen MM, Loeschcke V. Male reproductive competition in spawning aggregations of cod (Gadus morhua, L.). Mol Ecol (2002) 11:91–102.
Bernasconi G, Brostaux Y, Meyer EP, Arnaud L. Do spermathecal morphology and inter-mating interval influence paternity in the polyandrous beetle Tribolium castaneum? Behaviour (2006) 143:643–658.
Berrebi P, Povz M, Jesensek D, Cattaneo-Berrebi G, Crivelli AJ. The genetic diversity of native, stocked and hybrid populations of marble trout in the Soca river, Slovenia. Heredity (2000) 85:277–287.
Blouin MS, Parsons M, Lacaille V, Lotz S. Use of microsatellite loci to classify individuals by relatedness. Mol Ecol (1996) 5:393–401.
Blouin-Demers G, Gibbs HL, Weatherhead PJ. Genetic evidence for sexual selection in black ratsnakes, Elaphe obsoleta. Anim Behav (2005) 69:225–234.
Bonin A, Bellemain E, Bronken EP, Pompanon F, Brochmann C, Taberlet P. How to track and assess genotypic errors in population genetics studies. Mol Ecol (2004) 13(11):3261–3273.
Brotherton PNM, Pemberton JM, Komers PE, Malarky G. Genetic and behavioural evidence of monogamy in a mammal, Kirk's dik-dik (Madoqua kirkii). Proc R Soc Lond B Biol Sci (1997) 264:675–681.
Clapham PJ, Palsboll PJ. Molecular analysis of paternity shows promiscuous mating in female humpback whales (Megaptera novaeangliae, Borowski). Proc R Soc Lond B Biol Sci (1997) 264:95–98.
Coltman DW, Bancroft DR, Robertson A, Smith JA, Clutton-Brock TH, Pemberton JM. Male reproductive success in a promiscuous mammal: behavioural estimates compared with genetic paternity. Mol Ecol (1999) 8:1199–1209.
Duchesne P, Bernatchez L. Individual-based genotype methods in aquaculture. In: Liu ZJ, ed. Aquaculture genome technologies (2007) Ames (IA): Blackwell Publishing Professional. 87–108.
Duchesne P, Godbout MH, Bernatchez L. PAPA (package for the analysis of parental allocation): a computer program for simulated and real parental allocation. Mol Ecol Notes (2002) 2:191–193.
Fessehaye Y, El-Bialy Z, Rezk MA, Crooijmans R, Bovenhuis H, Komen H. Mating systems and male reproductive success in Nile tilapia (Oreochromis niloticus) in breeding hapas: a microsatellite analysis. Aquaculture (2006) 256(1–4):148–158.
Fumagalli L, Snoj A, Jesensek D, Balloux F, Jug T, Duron O, Brossier F, Crivelli AJ, Berrebi P. Extreme genetic differentiation among the remnant populations of marble trout (Salmo marmoratus) in Slovenia. Mol Ecol (2002) 11:2711–2716.
Garant D, Dodson JJ, Bernatchez L. A genetic evaluation of mating system and determinants of individual reproductive success in Atlantic salmon (Salmo salar L.). J Hered (2001) 92:137–145.
Garcia-Vazquez E, Moran P, Martinez JL, Perez J, de Gaudemar B, Beall E. Alternative mating strategies in Atlantic salmon and brown trout. J Hered (2001) 92:146–149.
Girman DJ, Mills MGL, Geffen E, Wayne RK. A molecular genetic analysis of social structure, dispersal, and interpack relationships of the African wild dog (Lycaon pictus). Behav Ecol Sociobiol (1997) 40:187–198.
Goossens B, Graziani L, Waits LP, Farand E, Magnolon S, Coulon J, Bel MC, Taberlet P, Allaine D. Extra-pair paternity in the monogamous Alpine marmot revealed by nuclear DNA microsatellite analysis. Behav Ecol Sociobiol (1998) 43:281–288.
Hara M, Sekino M. Parentage testing for hatchery-produced abalone Haliotis discus hannai based on microsatellite markers: preliminary evaluation of early growth of selected strains in mixed family farming. Fish Sci (2007) 73(4):831–836.
Heiberg AC, Leirs H, Siegismund HR. Reproductive success of bromadiolone-resistant rats in absence of anticoagulant pressure. Pest Manage Sci (2006) 62(9):862–871.
Hughes C. Integrating molecular techniques with field methods in studies of social behavior: a revolution results. Ecology (1998) 79:383–399.
Jones AG, Ardren WR. Methods of parentage analysis in natural populations. Mol Ecol (2003) 12:2511–2523.
Kelly MJ. Lineage loss in Serengeti cheetahs: consequences of high reproductive variance and heritability of fitness on effective population size. Conserv Biol (2001) 15:137–147.
Li G, Hedgecock D. Genetic heterogeneity, detected by PCR-SSCP, among samples of larval Pacific oysters (Crassostrea gigas) supports the hypothesis of large variance in reproductive success. Can J Fish Aquat Sci (1998) 55:1025–1033.
Luikart G, England PR. Statistical analysis of microsatellite DNA data. Trends Ecol Evol (1999) 14:253–256.
MapleSoft. (1981–2004) Maple 9.50 Copyright©. Waterloo (ON): MapleSoft, a division of Waterloo Maple Inc.
Meldgaard T, Crivelli AJ, Jesensek D, Poizat G, Rubin JF, Berrebi P. Hybridization mechanisms between the endangered marble trout (Salmo marmoratus) and the brown trout (Salmo trutta) as revealed by in-stream experiments. Biol Conserv (2007) 136(4):602–611.
Milinkovitch MC, Monteyne D, Gibbs JP, Fritts TH, Tapia W, Snell HL, Tiedemann R, Caccone A, Powell JR. Genetic analysis of a successful repatriation programme: giant Galapagos tortoises. Proc R Soc Lond B Biol Sci (2004) 271(1537):341–345.
Palti Y, Silverstein JT, Wieman H, Phillips JG, Barrows FT, Parsons JE. Evaluation of family growth response to fishmeal and gluten-based diets in rainbow trout (Oncorhynchus mykiss). Aquaculture (2006) 255(1–4):548–556.
Pemberton JM, Albon SD, Guinness FE, Clutton-Brock TH, Dover GA. Behavioral estimates of male mating success tested by DNA fingerprinting in a polygynous mammal. Behav Ecol (1992) 3:66–75.
Ribble DO. The monogamous mating system of Peromyscus californicus as revealed by DNA fingerprinting. Behav Ecol Sociobiol (1991) 29:161–166.
Rice WR. Analyzing tables of statistical tests. Evolution (1989) 43:223–225.
Richardson DS, Burke T. Extrapair paternity and variance in reproductive success related to breeding density in Bullock's orioles. Anim Behav (2001) 62:519–525.
Selvamani MJP, Sandie A, Degnan M. Microsatellite genotyping of individual abalone larvae: parentage assignment in aquaculture. Mar Biotech (2001) 3:478–485.
Wesmajervi MS, Westgaard JI, Delghandi M. Evaluation of a novel pentaplex microsatellite marker system for paternity studies in Atlantic cod (Gadus morhua L.). Aquac Res (2006) 37(12):1195–1201.