Journal of Heredity Advance Access published online on November 21, 2007
Journal of Heredity, doi:10.1093/jhered/esm095
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Two Patterns of Variation among MHC Class I Loci in Tuatara (Sphenodon punctatus)
From the Allan Wilson Centre for Molecular Ecology and Evolution, School of Biological Sciences, Victoria University of Wellington, PO Box 600, Wellington, 6140 New Zealand (Miller, Andrews-Cookson, and Daugherty)
Address correspondence to H. C. Miller at the address above, or e-mail: hilary.miller{at}vuw.ac.nz.
The genes of the major histocompatibility complex (MHC) are a central component of the immune system in vertebrates and have become important markers of functional, fitness-related genetic variation. We have investigated the evolutionary processes that generate diversity at MHC class I genes in a large population of an archaic reptile species, the tuatara (Sphenodon punctatus), found on Stephens Island, Cook Strait, New Zealand. We identified at least 2 highly polymorphic (UA type) loci and one locus (UZ) exhibiting low polymorphism. The UZ locus is characterized by low nucleotide diversity and weak balancing selection and may be either a nonclassical class I gene or a pseudogene. In contrast, the UA-type alleles have high nucleotide diversity and show evidence of balancing selection at putative peptide-binding sites. Twenty-one different UA-type genotypes were identified among 26 individuals, suggesting that the Stephens Island population has high levels of MHC class I variation. UA-type allelic diversity is generated by a mixture of point mutation and gene conversion. As has been found in birds and fish, gene conversion obscures the genealogical relationships among alleles and prevents the assignment of alleles to loci. Our results suggest that the molecular mechanisms that underpin MHC evolution in nonmammals make locus-specific amplification impossible in some species.
As a gene family at the interface between an organism and its environment, major histocompatibility complex (MHC) genes have become the marker of choice for molecular ecologists wishing to measure functional, fitness-related genetic variation (reviewed in Piertney and Oliver 2006). Classical class I and class II MHC genes are central to the vertebrate immune response as they encode cell-surface receptors that present short peptides, usually of bacterial or viral origin, to T cells. They are generally characterized by extremely high levels of sequence polymorphism in the peptide-binding region (PBR) of the gene. This diversity appears to be largely maintained by balancing selection driven by past and present exposure to disease, where specific alleles confer resistance to specific diseases (e.g., Langefors et al. 2001), and/or heterozygotes have higher fitness due to their ability to present a wider array of peptides (e.g., Penn et al. 2002). In some species, reproductive mechanisms such as disassortative mating also appear to contribute to maintaining MHC diversity (Penn 2002).
Analysis of MHC genes in natural populations can provide important information on how MHC variation is generated and maintained, how population demography influences MHC variation, and how, in turn, MHC variation influences population viability. However, measuring variation at MHC genes from nonmodel organisms can be fraught with difficulty due to differences in gene number, organization, and evolution among species (Hess and Edwards 2002; Stet et al. 2003). Generation of diversity within the MHC is characterized by gene duplication and decay, point mutation, and gene conversion (where a fragment of one locus is copied onto another), but the relative contributions of each of these mechanisms appear to vary among species. In mammals, most loci arose from ancient duplications and are maintained independently of each other (Gu and Nei 1999). Interlocus gene conversion appears to be rare among mammalian class II genes (Yeager and Hughes 1999) but has been documented at class I genes in mouse (Yun et al. 1997) and primates (Cardenas et al. 2005). In contrast to mammals, birds do not retain ancient allelic lineages; rather, loci tend to be homogenized within species as a result of either recent gene duplications or interlocus gene conversions (Hess and Edwards 2002). Where interlocus gene conversion occurs over short sequence fragments, it can play a large role in generating diversity but can also obscure the true genealogical relationships among alleles, making it difficult to assign alleles to their individual loci (Miller and Lambert 2004; Westerdahl et al. 2004).
Reptiles are an important group for understanding the differences in MHC evolution among vertebrate groups as they are sister taxa to both mammals and birds. However, the relative importance of point mutation, gene conversion, and balancing selection in generating diversity in the reptile MHC has not previously been investigated. In this study, we investigate mechanisms of evolution in class I MHC genes in an archaic reptile, the tuatara (Sphenodon punctatus). Tuatara are the sole extant representative of the order Sphenodontia, the sister taxa of squamates (snakes and lizards). Sphenodontids were globally widespread until the late Cretaceous (65–80 million years ago) but now only survive on offshore islands of New Zealand. Two species of tuatara are currently recognized: S. punctatus, which comprises 2 genetic groups, one in western Cook Strait and the other off the northeast of the North Island and Sphenodon guntheri, found only on North Brother Island in eastern Cook Strait (Daugherty et al. 1990; Hay et al. 2003).
Both class I and class II genes have been characterized from tuatara (Miller et al. 2005, 2006). The tuatara MHC appears to consist of at least 4 (and possibly as many as 8) closely related class II genes and at least 2–3 class I genes. Several class I alleles have previously been identified in tuatara, at least some of which are from expressed, polymorphic loci, but we were unable to assign alleles to individual loci (Miller et al. 2006). In this study, we characterize intron sequences to identify putative individual MHC class I loci and use sequence type–specific primers to analyze inheritance of alleles and patterns of variation in the largest tuatara population, on Stephens Island in western Cook Strait. We aim to develop a screening method to identify functionally relevant MHC variation across tuatara populations, with the long-term goal of investigating how population size, history, pathogen loads, and mate choice influence MHC variation.
| Materials and Methods |
|---|
|
|
|---|
Samples and DNA Extraction
Blood samples were collected from tuatara on Stephens Island in November 1998, November 2003, November 2004, and May 2005. Family group samples were from animals housed at Victoria University that originated from Stephens Island. Genomic DNA was extracted from 5 to 10 µl whole blood using either a Qiagen DNEasy kit or standard phenol/chloroform methods (Sambrook et al. 1989).
Isolation of MHC Introns
We aimed to sequence introns 1 and 2 from MHC class I loci previously identified (Miller et al. 2006) and to develop primers that flank exon 2 for MHC genotyping. We used a mixture of polymerase chain reaction (PCR) amplification using primers within exons and genome walking off both genomic DNA and bacterial artificial chromosome (BAC) clones containing MHC class I sequences.
The tuatara BAC library (Wang et al. 2006) was screened using a mixture of tuatara MHC class I and II cDNA probes (Miller et al. 2005, 2006). Eighty-four positive clones were identified. To identify which clones contained class I genes, Southern hybridizations were performed as described in Miller et al. (2005) using the MHC1cDNA probe. Membranes were twice washed at 65 °C in 2x standard saline citrate (SSC), 0.1% sodium dodecyl sulfate (SDS) for 30 min each, followed by 1x SSC, 0.1% SDS for 30 min, and then exposed to Kodak Biomax MR film for 2 days. The presence of class I MHC genes on each BAC was confirmed by PCR amplification with primers MHC1ex2F1 and MHC1ex2R1 (Miller et al. 2006), using PCR conditions as described in Miller et al. (2006). PCR products were sequenced using the BigDye Terminator Cycle sequencing kit (version 3.1) and analyzed on an ABI3730 Genetic Analyzer.
Intron sequences were amplified from genomic DNA using primers MHC1ex1F2/MHC1ex2R1 (intron 1) and MHC1ex2F1/MHC1ex3R2 (intron 2, see Figure 1) in 25 µl reaction volumes using the Expand HiFi PCR system with 2.5 mM MgCl2, 200 µM each deoxynucleoside triphosphate (dNTP), 0.4 µM each primer, and 1 µl genomic DNA for 40 cycles of 95 °C, 30 s; 55 °C, 20 s; and 72 °C, 1 min 30 s. Products were excised from agarose gels and purified using the HighPure PCR purification kit (Roche, Auckland, New Zealand) and then cloned into pGEM-T Easy vector (Promega, Madison, Wisconsin). Up to 6 clones per PCR product were sequenced as described above. Where intron sequences could not be obtained using the above conditions, we repeated the PCRs with a polymerase specially formulated for long PCR (BIO-X-ACT Long DNA polymerase; Bioline, London, UK) and tried adding combinations of 1 M Betaine, 5% dimethyl sulfoxide (DMSO), and/or 0.4 mg/ml bovine serum albumin (BSA).
|
In order to identify additional intron sequences, we used the genome walking method of Siebert et al. (1995) off both BAC and genomic DNA. Briefly, 0.5–1 µg of BAC DNA or 2.5 µg of genomic DNA was restriction digested overnight at 37 °C with PvuII, EcoRV, StuI, SmaI, or SnaBI. Restriction digests were purified and ligated to adaptors to create adaptor libraries as described in Siebert et al. (1995) and Cottage et al. (2001). Primary PCR amplifications were performed using primers AP1 (Cottage et al. 2001) and either MHC1ex2R1 (intron 1) or MHC1ex2F1 (intron 2) in 25 µl reactions consisting of 2 µl of adaptor library, 1x BIO-X-ACT Long DNA polymerase buffer, 2 mM MgCl2, 1 mM each dNTP, 0.4 µM each primer, 1 M betaine, 5% DMSO, and 2 units BIO-X-ACT Long DNA polymerase. PCR amplification was performed for 38 cycles consisting of 95 °C, 30 s; 57 °C, 20 s; and 72 °C, 3 min. Nested PCRs were performed on 1:20 to 1:50 dilutions of the primary PCR product, using the primer AP2 (Cottage et al. 2001) paired with MHC1ex2Fnest, MHC1ex2UZF (intron 2), or MHC1ex2Rnest (intron 1). PCR conditions were the same as for the primary PCR, but the annealing temperature was increased to 60 °C, extension time was reduced to 1 min 30 s, and 35 cycles were performed. The resulting PCR products were either purified directly for sequencing using ExoSAP-IT (GE Healthcare, Rydalmere, NSW, Australia) or excised from agarose gels and purified using the HighPure PCR purification kit (Roche) and then sequenced as described above. Where a clean sequence could not be obtained from sequencing PCR products directly, products were cloned into pGEM-T Easy vector, and 2–4 clones per product were sequenced.
Analysis of MHC Variation
We identified 3 sequence types from the alignment of intron 2 sequences (see Results) and used a different set of primers to amplify each sequence type (Figures 1 and 2). Type 1 sequences were amplified with MHC1ex2F1 and MHC1ex2UAR, type 2 sequences with MHC1ex2F1 and MHC1ex2UBR, and type 3 sequences with MHC1ex2UZF and MHC1ex2UZR. PCRs were performed in 25 µl reaction volumes using the Expand HiFi PCR system with 1.5 mM MgCl2, 200 µM each dNTP, 0.4 µM each primer, and 1 µl genomic DNA for 30 cycles of 94 °C, 30 s; 53 °C, 20 s; and 72 °C, 30 s. Type 2 and type 3 amplification products were purified directly with ExoSAP-IT and sequenced as above. For type 3 amplification products, heterozygotes were identified from double peaks on the chromatogram, and a subset of heterozygous individuals were cloned in pGEM-T to confirm the sequence of individual alleles.
|
We could not obtain clean sequence from direct sequencing of type 1 amplification products as these primers appear to amplify alleles from multiple loci. A modified type of single-stranded conformation polymorphism (SSCP) (Schwieger and Tebbe 1998) was therefore used to separate these alleles prior to sequencing. This method overcomes many of the problems of conventional SSCP, such as incomplete denaturation of DNA strands, multiple conformations of DNA strands, and heteroduplex formation (Sunnucks et al. 2000), by removing one strand of the PCR product prior to SSCP analysis. We refer to this method as one-stranded conformation polymorphism (OSCP) to distinguish it from regular SSCP. PCR reactions destined for OSCP analysis used a forward primer that was phosphorylated at the 5' end (MHC1ex2F1-P). PCR products were purified using a HighPure PCR product purification kit and then digested with 5 units lambda exonuclease (New England Biolabs, Ipswich, MA) at 37 °C for 2 h to remove one strand of the PCR product.
OSCP analysis was performed using a 9.5% polyacrylamide gel (37.5:1 acrylamide:bisacrylamide) with 5% glycerol. Electrophoresis was performed at 4 °C in 0.5x tris-borate-EDTA at 3 W for 14.5 h using a Protean II xi Vertical Electrophoresis cell (BioRad, Auckland, New Zealand). Samples (3–5 µl of single-stranded PCR product) were mixed with an equal volume of loading dye (96% formamide, 20 mM ethylenediaminetetraacetic acid), denatured for 5 min at 100 °C, and then cooled on ice for 5 min before loading. Gels were silver stained using the method described in Bassam et al. (1991). Bands were excised from the gel and eluted in tris-EDTA for 4 h at 37 °C and then reamplified using MHC1ex2F1 and MHC1ex2UAR as described above. Products were purified by digestion with ExoSAP-IT and sequenced as described above. In order to assess the repeatability and accuracy of the OSCP method, we repeated the entire OSCP procedure in 12 individuals and cloned PCR products from 11 individuals into the pGEM-T Easy vector (Promega). We analyzed 15–20 clones per individual as described in Miller et al. (2006) and found that in general, the same alleles were identified from cloning as from OSCP.
However, in 3 individuals, OSCP analysis identified alleles that were not found when PCR products were cloned, indicating that OSCP is more sensitive. All sequences included in the analysis were verified by amplification in independent PCRs.
Data Analysis
Sequences were edited using Sequencher 4.5 (GeneCodes Corporation, Ann Arbar, Michigan). All sequences are available in the GenBank database under accession numbers EF546395
[GenBank]
–EF546413
[GenBank]
. Coding-region sequences were translated into amino acids, aligned using ClustalW as implemented in BioEdit 7.0.5.3
[EC]
(Hall 1999), and then back-translated to nucleotide sequences. Intron sequences were aligned where possible using ClustalW. Measures of nucleotide and amino acid diversity were calculated in MEGA3 (Kumar et al. 2004).
The maximum likelihood tree was constructed using PHYML (Guindon and Gascuel 2003) using the general time-reversible +
(with gamma distribution of rates with 4 rate categories) model. Base frequencies, relative rate parameters, and the shape parameter of the
distribution (alpha) were estimated from the data as follows: A = 0.2016, C = 0.3009, G = 0.3659, T = 0.1316; substitution rate matrix A
C = 0.932, A
G = 1.325, A
T = 1.679, C
G = 0.819, C
T = 2.719, and G
T = 1.0; gamma shape parameter (
) = 0.646. Tree topologies were evaluated using 500 bootstrap replicates. Conflicting phylogenetic signals in the dataset were assessed using neighbor-net analysis (Bryant and Moulton 2004) implemented in the program SplitsTree4 (Huson and Bryant 2006). The network was constructed from uncorrected pairwise distances. The prevalence of gene conversion was assessed using GENECONV v. 1.81 (Sawyer 1999). Putative gene conversion events were considered significant when the simulated global P value, based on 10 000 permutation runs, was less than 0.05. Gscale values of 0 and 2 were used, allowing for varying levels of mismatches (i.e., subsequent mutation) within the converted region. A gscale value of 0 means no mismatches are allowed, whereas a value of 2 allows for some mismatches within the converted region.
To test for positive selection over evolutionary timescales, we measured the ratio of nonsynonymous (dN) to synonymous (dS) substitutions. We first measured dN and dS values for putative PBR sites and sites outside the PBR (non-PBR), by the simple counting-based method of Nei and Gojobori (1986), using uncorrected P distances. Putative peptide-binding sites were identified as sites that point into the peptide-binding cleft in the human MHC molecule (Bjorkman et al. 1987). We then conducted a more detailed analysis using random-site models (Nielsen and Yang 1998; Yang et al. 2000, 2005) implemented in the program CODEML (Yang 1997). These models allow for varying substitution rates across sites and avoid the need for a priori knowledge about which sites form the PBR. For each substitution rate category, the dN/dS ratio (
) and the proportion of codon sites that fall into that category are determined from the data. Where
> 1, positive selection is inferred. The models used here were M0 (one
ratio for all sites), M1a (nearly neutral:
either 0 or less than 1), M2a (positive selection: similar to M1a but allows a third category of
> 1), M3 (discrete: 3 site categories with
for each class estimated directly from the data with no constraints), M7 (beta:
ratio varies among sites according to a beta distribution), and M8 (beta and omega: similar to M7 but allows an extra category of
> 1). Models M2a, M3, and M8 allow for positive selection (
> 1) and were compared with their corresponding null models (M1a, M0, and M7, respectively) using a likelihood ratio test. Positively selected sites were inferred using the Bayes empirical Bayes method of Yang et al. (2005), implemented in models M2a and M8.
| Results |
|---|
|
|
|---|
MHC Class I Introns
Southern blot analysis using a MHC class I–specific probe identified 37 BAC clones that hybridized to class I sequences. PCR amplification from these clones using primers designed from MHC class I cDNA sequences identified 7 clones containing MHC class I sequences. Two different sequences were found: one was identical to an allele (U*11) identified in Miller et al. (2006) and the other was a new sequence not identified in the previous paper. Two BAC clones were chosen for intron PCRs and genome walking: 531J19, containing U*11 sequence, and 509B19, containing the new sequence.
Initially, we attempted to isolate intron sequences from BAC and genomic DNA using PCR amplification across the whole intron. We could not amplify any intron 1 sequences using this method, and only one intron 2 product was successfully amplified. This sequence was amplified from genomic DNA and spanned 1652 bp, including part of exons 2 and 3 and an intron of 1219 bp. The exon 2 sequence was homologous to allele U*10, identified in Miller et al. (2006). We isolated additional partial intron sequences using genome walking. Partial intron 1 sequences were obtained for the U*11 allele from BAC clone 531J19 and for the U*10 allele from genomic DNA. Only the first 40 bp of intron 1 sequence immediately flanking exon 2 could be obtained for the U*11 allele due to the presence of homopolymeric tracts of cytosine and guanine residues. There was little similarity between U*10 and U*11 intron 1 sequences, and they could not be aligned (see Supplementary Material for sequences).
Partial intron 2 sequences, ranging in length from 22 to 453 bp, were obtained from BAC clones 531J19 for the U*11 allele and clone 509B19 for the new allele and from genomic DNA for alleles U*01, U*03, U*04, and U*05 identified in Miller et al. (2006) (Figure 2, see Supplementary Material for full sequences). These partial sequences were compared with the full intron 2 sequence obtained from allele U*10. Sequences could be divided into 3 types. We have named sequences from alleles U*01, U*03, U*04, U*05, and U*11 as type 1 as they are highly conserved in the region that immediately flanks exon 2. The remaining 2 sequences (the sequence from BAC clone 509B19 and the U*10 sequence) were divergent from each other and all other sequences and have been named type 2 and type 3, respectively. Homopolymeric tracts of cytosine and guanine residues were also present in intron 2 sequences, particularly in type 1 sequences.
Patterns of MHC Class I Variation
To test whether type 1, type 2, and type 3 sequences represent 3 separate MHC loci, we amplified alleles from each sequence type using type-specific primers (Figure 2) in a family group comprising mother, father, and 8 offspring. Type 1 primers amplified up to 3 alleles per individual, but type 2 primers only produced a product of a single allele in the father and 3 of the offspring. This allele was identical to the sequence from BAC clone 509B19 and was named U*18. Analysis of segregation of these alleles in a family group (Figure 3A and B) suggests that together these sequences represent 2 loci but that the type 2 sequence is an allele of one of these loci and that this locus also contains type 1 alleles. However, it appears that either there are missing alleles or one allele (U*02) is present at both loci. We refer to alleles from these 2 polymorphic loci as UA-type alleles. UA-type alleles are numbered in order of their original isolation and are prefixed with U*. We did not use a second letter as a specific locus designation in naming UA-type alleles due to difficulties assigning alleles to loci as described below. Type 3 primers amplified 1 or 2 sequences in every individual, and these sequences segregate in a Mendelian fashion in a family group (Figure 3C), indicating they represent a separate locus. We refer to this locus as UZ to distinguish it from other loci.
|
To measure levels of variation at these MHC class I loci, exon 2 sequences were amplified from Stephens Island tuatara using the above primer sets. Twenty-six individuals were genotyped for UA-type alleles using both type 1 and type 2 primer sets, and 18 of these individuals were genotyped at UZ.
The UZ alleles were 196 bp long after primer sequences were removed. Six different UZ alleles were identified in 18 individuals. Two of these alleles were identical to alleles U*09 and U*10 identified in Miller et al. (2006), and these were renamed UZ*01 and UZ*02. Although all sequences contain the conserved residues expected in functional class I sequences, there was low nucleotide diversity among alleles (0.023 ± 0.007, Table 1). Amino acid diversity was also low, with 1–4 amino acid differences out of 64 sites, and only one amino acid change was found in the putative peptide-binding sites.
|
Fifteen different UA-type sequences were identified in 26 individuals, with 2–5 alleles per individual. Type 1 primers amplified up to 4 alleles per individual, whereas type 2 primers only produced a product in 9 out of 26 individuals. There were 21 different genotypes among the 26 individuals. Alleles U*01–U*08 were previously identified in Miller et al. (2006); U*14–U*19 are new to this study. UA-type sequences were 224 or 218 bp long after primer sequences were removed. When aligned to MHC class I exon 2 sequences from other vertebrates, 8 of the sequences were shown to contain a 6-bp in-frame insertion at the beginning of exon 2 (Figure 4). This insertion has been found in full-length expressed sequences (Miller et al. 2006), so these sequences appear to represent functional genes. All sequences also contain all the conserved residues expected of functional class I sequences. Most UA-type alleles were highly divergent from each other, with an average of 23.3 amino acid differences out of 72 sites (excluding the indel) between alleles (range 1–38). The mean pairwise nucleotide diversity (
) for all alleles was 0.193 ± 0.015 (Table 1).
|
Maximum likelihood analysis of sequences showed that alleles containing the 6-bp insertion form a monophyletic group (Figure 5). However, these alleles do not appear to represent a single locus as some individuals have 3 of these long alleles, whereas others have 3 short alleles. The 2 alleles amplified with type 2 primers (U*18 and U*19) are also highly divergent. On the basis of the pedigree and phylogenetic analyses, it appears that UA-type alleles cannot be assigned to loci on the basis of their sequence alone. The phylogenetic network (Figure 6) shows numerous conflicting signals in the dataset, which suggests that gene conversion between sequences has occurred. The phi test found statistically significant evidence for recombination (P = 0.00023). A total of 12 significant gene conversion events between UA-type alleles, ranging from 23 to 93 bp in length, were detected using GENECONV v. 1.81 (Table 2). Seven of these events were detected using a gscale value of 0 (no mismatches), whereas an additional 5 events were detected when a small number of mismatches were allowed within the converted region.
|
|
|
To test for positive selection, we first measured the ratio of nonsynonymous to synonymous nucleotide changes for residues predicted to form the PBR, compared with those outside the PBR, for both UA-type and UZ alleles (Table 1). For UA-type alleles, dN was higher than dS for PBR residues (ratio dN/dS = 1.321), but this was not statistically significant (Z test of selection, Z = 1.385, P = 0.085). Outside the PBR, the ratio of dN/dS was 0.803; however, dS was not significantly less than dN (Z = 1.384, P = 0.089). For UZ alleles, dN was not significantly different from dS for PBR or non-PBR residues.
We then used maximum likelihood random-site models to test for positive selection on both UA-type and UZ alleles without partitioning the sequences into PBR or non-PBR residues. For UA-type sequences, models that incorporate positive selection (M2a, M3, and M8) fit the data significantly better than the corresponding simpler models (M1a, M0, and M7) (Table 3). In models M2a, M3, and M8, 3–9% of sites fall into categories where
> 1 (Table 4). Three positively selected sites with posterior probabilities greater than 95% were identified in Bayes empirical Bayes analysis under model M8. These sites are predicted to contact the peptide based on the structure of the human leukocyte antigen (HLA-A) molecule (Bjorkman et al. 1987). For UZ-type sequences, M3 fits the data significantly better than M0, with P < 0.001, indicating there is variable selective pressure among sites. However, the more stringent models of positive selection (M2a and M8) were only significant at P < 0.05 (Table 3). This indicates that the effect of positive selection on UZ alleles is less than that observed for UA-type alleles. The last amino acid of exon 2 in UZ alleles was identified by Bayes empirical Bayes as being positively selected but is not a putative peptide-binding site.
|
|
| Discussion |
|---|
|
|
|---|
In this study, we have characterized partial intron sequences from tuatara MHC class I loci and identified at least 2 highly polymorphic loci plus one locus with low polymorphism, using sequence type–specific primers. We investigated the possibility of distinguishing between MHC class I loci on the basis of intron sequences. Intron sequences may be a better indicator of loci than coding sequences as they are generally less variable and not under selection (Elsner et al. 2002; Reusch and Langefors 2005). We had little success in amplifying across whole introns in tuatara MHC genes despite trying both high-fidelity and long-PCR enzyme formulations and using PCR additives such as betaine, DMSO, and BSA, which can sometimes be successful in amplifying complex sequences. However, we were able to amplify partial sequences using genome walking. These partial sequences revealed a number of homopolymeric tracts of guanine and cytosine residues that may have inhibited PCR amplification across the whole intron. It is also possible that tuatara MHC introns are several kilobase pairs in length and that this contributed to our difficulty amplifying across them. The tuatara genome is about 40% larger than the human genome (Olmo 1981), and it has the highest GC content observed so far for vertebrates (Wang et al. 2006). Thus, both the MHC region and individual MHC genes may be very large in tuatara.
Identifying Polymorphic Loci
Although we identified 3 different sequence types on the basis of exon 2 and partial intron 2 sequences, amplifying alleles from individual MHC loci was problematic. Type 3 sequences appeared to represent a single locus (UZ) with low polymorphism that could be amplified with locus-specific primers, but type 1 and type 2 sequences together appear to represent 2 polymorphic loci (referred to as UA type).
Alleles from the UZ locus were originally identified using nonspecific primers in a previous study (Miller et al. 2006), but they did not amplify in every individual, making them appear as alleles of a polymorphic gene. Using more specific primers, we were able to amplify UZ alleles from every individual, indicating that this locus is present in every individual. However, UZ alleles do not appear to be expressed in peripheral blood mononuclear cells as we have not found any UZ-like alleles from either reverse transcriptase–PCR or cDNA library screening (Miller et al. 2006). This locus appears to be either a nonclassical class I (class Ib) gene or a pseudogene. The class Ib genes are usually less polymorphic than classical class I genes, have tissue-specific expression patterns, and often have functions other than peptide presentation (Geraghty 1993; Gouin et al. 2006). Nucleotide diversity at the UZ locus was an order of magnitude lower than that recorded for UA-type alleles, sites predicted to be involved in peptide binding were nonpolymorphic, and evidence for balancing selection at this locus was weak, in accordance with what would be expected of a nonclassical class Ib gene. However, it is difficult to distinguish between nonclassical and nonfunctional MHC genes as low polymorphism has also been reported at MHC pseudogenes (Hess et al. 2000; Aguilar et al. 2006). The UZ allele sequences do not show any evidence of this being a nonfunctional locus as no stop codons or frameshift mutations are present. However, isolation of the entire length of this gene and further analysis of its expression in other tissues will be required to determine the status of this locus.
Alleles from the 2 putative polymorphic loci could not be easily separated. Although alleles in the family group could be assigned to their putative loci (Figure 3A and B), we could not assign any other alleles to their loci on the basis of sequence similarity alone, probably because of the high levels of recombination between sequences, as suggested by the split decomposition network and GENECONV analysis. Alleles amplified with type 2 primers (U*18 and U*19) were divergent from sequences amplified with type 1 primers, but appear to be segregating as alleles of one of the loci, and were amplified in only 9 out of 26 individuals. Pedigree analysis also suggested that allele U*02 was present at both loci, a situation that could result from gene conversion between loci (e.g., Van Oosterhout et al. 2006). It is also important to note that alleles that share the same exon 2 sequence could differ in their exon 3 sequence and thus still retain a functionally different PBR. Generally 2–4 UA-type sequences per individual were amplified, consistent with 2 loci, but a single individual had 5 UA-type sequences. It is possible that the number of class I genes varies among individuals, as has been found in some mammalian and fish species (Malaga-Trillo et al. 1998; Roos and Walter 2005). However, we cannot rule out the possibility that we have not amplified all alleles present in all individuals. Therefore, our primers provide an estimate of overall class I variation in tuatara rather than a specific measure at a single locus.
Generation of MHC Diversity by Point Mutation, Gene Conversion, and Balancing Selection
Levels of nucleotide diversity among UA-type sequences were similar to that recorded for other classical MHC class I loci from natural populations (e.g., Westerdahl et al. 2004; Consuegra et al. 2005). Diversity appears to be generated by a mixture of point mutation and recombination. Analysis with GENECONV identified 12 statistically significant exchanges of exon 2 sequence fragments (23–93 bp in length) between alleles, suggesting that new alleles have been created by shuffling of sequence motifs. It has been suggested that gene conversion is the principal mode of generating diversity in the MHC (Martinsohn et al. 1999), but we found that point mutation is also important in generating new alleles in the tuatara MHC as several pairs of alleles differ by only 3–4 point mutations (e.g., U*01 and U*03, U*08 and U*17, and U*18 and U*19). At least some gene conversion events appear to represent interlocus conversions based on assignment of alleles to loci in the family group (e.g., U*01/U*08 and U*04/U*08 events). We cannot determine whether the remaining events are interlocus or intralocus, but it is clear that gene conversion obscures the true relationships among UA-type alleles and prevents individual loci from being identified. Interlocus gene conversion is particularly well documented in birds (Hess and Edwards 2002) and fish (Reusch and Langefors 2005); so our finding of interlocus conversion in reptiles is not surprising and suggests this is a general feature of the nonmammalian MHC.
Balancing selection may also decrease the phylogenetic signal from individual loci and, when acting on diverse alleles generated by gene conversion and point mutation, can create a highly polymorphic MHC (Satta 1997). Evidence for balancing selection acting over evolutionary timescales (characterized by a dN to dS ratio [
] greater than one) appears ubiquitous among peptide-binding residues of classical MHC genes and is often used as a hallmark of functional MHC loci (Bernatchez and Landry 2003). Although we found only weak evidence for selection on UZ sequences, we found strong evidence for balancing selection on UA-type alleles as maximum likelihood random-site models that incorporate positive selection fit the data significantly better than neutral or nearly neutral models. The 3 sites identified as being under positive selection correspond to predicted peptide-binding sites, indicating that balancing selection is linked to peptide presentation, and hence may be driven by previous pathogen exposure. Evidence for balancing selection was not found at other predicted peptide-binding sites, and we did not find a significant excess of nonsynonymous changes in predicted PBR sites overall when these sites were analyzed separately, unlike in other MHC studies. This is likely to be due to the fact that our dataset was small and contained highly divergent sequences, which can lower the power of these analyses due to saturation of substitutions (Anisimova et al. 2001; Van Oosterhout et al. 2006). In addition, the PBR sites are predicted from the structure of the human MHC molecule and may not exactly match PBR sites in the tuatara MHC molecule.
The Stephens Island tuatara population is the largest tuatara population (more than 30 000 individuals; Newman 1982), is genetically variable, and is the subject of intensive studies of pathogen loads and mating behavior (Godfrey S, Moore J, unpublished data). Studies of natural populations such as this are important for understanding how environmental and demographic factors interact to drive MHC variation. However, as Piertney and Oliver (2006) rightly point out, such studies generally require data from single MHC loci in order to rigorously test for selection in contemporary populations. Piertney and Oliver (2006) suggest that MHC diversity must be rigorously characterized to determine the number of loci, which loci are expressed, and assign alleles to loci. Our study shows that identifying nonclassical or pseudogenes is particularly important as these can confuse the assessment of levels of MHC diversity. However, our results suggest that assigning alleles to loci may be an intractable problem in many nonmammalian species due to the molecular mechanisms such as gene conversion that underpin MHC evolution.
| Funding |
|---|
|
|
|---|
The Allan Wilson Centre for Molecular Ecology and Evolution; Foundation for Research, Science, and Technology New Zealand.
| Supplementary Material |
|---|
|
|
|---|
Supplementary material can be found at http://www.jhered.oxfordjournals.org/.
| Acknowledgments |
|---|
We thank Nicola Nelson, Jennifer Moore, and Sue Keall for assistance with sample collection, and Scott Edwards for providing the tuatara BAC library. This research was approved by the Victoria University of Wellington Animal Ethics Committee (2003R14 and 2003R16), the New Zealand Department of Conservation (WE/51/FAU and LIZ0410), and the Environmental Risk Management Authority (GMD03106). We acknowledge the support of the Ngati Koata no Rangitoto ki te Tonga Trust.
| Footnotes |
|---|
Corresponding Editor: Stephen J. O'Brien
Received April 19, 2007
Accepted July 31, 2007
| References |
|---|
|
|
|---|
-
Aguilar A, Edwards SV, Smith TB, Wayne RK. Patterns of variation in MHC class II B loci of the Little Greenbul (Andropadus virens) with comments of MHC evolution in birds. J Hered (2006) 97:133–142.
Anisimova M, Bielawski JP, Yang Z. Accuracy and power of the likelihood ratio test in detecting adaptive molecular evolution. Mol Biol Evol. (2001) 18:1585–1592.
Bassam BJ, Caetanoanolles G, Gresshoff PM. Fast and sensitive silver staining of DNA in polyacrylamide gels. Anal Biochem (1991) 196:80–83.[CrossRef][Web of Science][Medline]
Bernatchez L, Landry C. MHC studies in nonmodel vertebrates: what have we learned about natural selection in 15 years? J Evol Biol. (2003) 16:363–377.[CrossRef][Web of Science][Medline]
Bjorkman PJ, Saper MA, Samraoui B, Bennett WS, Strominger JL, Wiley DC. The foreign antigen binding site and T cell recognition regions of class I histocompatibility antigens. Nature (1987) 329:512–518.[CrossRef][Medline]
Bryant D, Moulton V. Neighbour-net: an agglomerative method for the construction of phylogenetic networks. Mol Biol Evol. (2004) 21:255–265.
Cardenas PP, Suarez CF, Martinez P, Patarroyo ME, Patarroyo MA. MHC class I genes in the owl monkey: mosiac organisation, convergence and loci diversity. Immunogenetics (2005) 56:818–832.[CrossRef][Web of Science][Medline]
Consuegra S, Megens HJ, Schaschl H, Leon K, Stet RJM, Jordan WC. Rapid evolution of the MH class I locus results in different allelic compositions in recently diverged populations of Atlantic salmon. Mol Biol Evol. (2005) 22:1095–1106.
Cottage A, Yang A, Maunders H, de Lacy RC, Ramsay NA. Identification of DNA sequences flanking T-DNA insertions by PCR-walking. Plant Mol Biol Rep (2001) 19:321–327.[CrossRef][Web of Science]
Daugherty CH, Cree A, Hay JM, Thompson MB. Neglected taxonomy and continuing extinctions of tuatara (Sphenodon). Nature (1990) 347:177–179.[CrossRef][Web of Science]
Elsner H-A, Rozas J, Blasczyk R. The nature of introns 4–7 largely reflects the lineage specificity of HLA-A alleles. Immunogenetics (2002) 54:447–462.[CrossRef][Web of Science][Medline]
Geraghty DE. Structure of the HLA class-I region and expression of its resident genes. Curr Opin Immunol (1993) 5:3–7.[CrossRef][Web of Science][Medline]
Gouin N, Wright AM, Miska KB, Parra ZE, Samollow PB, Baker ML, Miller RD. Modo-UG, a marsupial nonclassical MHC class I locus. Immunogenetics (2006) 58:396–406.[CrossRef][Web of Science][Medline]
Gu X, Nei M. Locus specificity of polymorphic alleles and evolution by a birth-and-death process in mammalian MHC genes. Mol Biol Evol. (1999) 16:147–156.[Abstract]
Guindon S, Gascuel O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. (2003) 52:696–704.[CrossRef][Web of Science][Medline]
Hall TA. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser (1999) 41:95–98.
Hay JM, Daugherty CH, Cree A, Maxson LR. Low genetic divergence obscures phylogeny among populations of Sphenodon, a remnant of an ancient reptile lineage. Mol Phylogenet Evol. (2003) 29:1–19.[CrossRef][Web of Science][Medline]
Hess CM, Edwards SV. The evolution of the major histocompatibility complex in birds. BioScience (2002) 52:423–431.[CrossRef][Web of Science]
Hess CM, Gasper J, Hoekstra HE, Hill CE, Edwards SV. MHC Class II pseudogene and genomic signature of a 32 kb cosmid in the house finch (Carpodacus mexicanus). Genome Res. (2000) 10:613–623.
Huson DH, Bryant D. Application of phylogenetic networks in evolutionary studies. Mol Biol Evol. (2006) 23:254–267.
Kumar S, Tamura K, Nei M. MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment. Brief Bioinform (2004) 5:150–163.
Langefors A, Lohm J, Grahn M, Anderson O, von Schantz T. Association between major histocompatibility complex class IIB alleles and resistance to Aeromonas salmonicida in Atlantic salmon. Proc R Soc Lond B Biol Sci. (2001) 268:479–485.[Medline]
Malaga-Trillo E, Zaleska-Rutczynska Z, McAndrew B, Vincek V, Figueroa F, Sultmann H, Klein J. Linkage relationships and haplotype polymorphism among cichlid Mhc class II B loci. Genetics (1998) 149:1527–1537.
Martinsohn JT, Sousa AB, Guethlein LA, Howard JC. The gene conversion hypothesis of MHC evolution: a review. Immunogenetics (1999) 50:168–200.[CrossRef][Web of Science][Medline]
Miller HC, Belov K, Daugherty CH. Characterisation of MHC class II genes from an ancient reptile lineage, Sphenodon (tuatara). Immunogenetics (2005) 57:883–891.[CrossRef][Web of Science][Medline]
Miller HC, Belov K, Daugherty CH. MHC class I genes in the tuatara (Sphenodon spp.): evolution of the MHC in an ancient reptilian order. Mol Biol Evol. (2006) 23:949–956.
Miller HC, Lambert DM. Gene duplication and gene conversion in class II MHC genes of New Zealand robins (Petroicidae). Immunogenetics (2004) 56:178–191.[Web of Science][Medline]
Nei M, Gojobori T. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol. (1986) 3:418–426.[Abstract]
Newman DG. Tuatara, Sphenodon punctatus, and burrows, Stephens Island. In: New Zealand herpetology—Newman DG, ed. (1982) Wellington (New Zealand): New Zealand Wildlife Service. 213–223.
Nielsen R, Yang ZH. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics (1998) 148:929–936.
Olmo E. Evolution of genome size and DNA base composition in reptiles. Genetica (1981) 57:39–50.[CrossRef][Web of Science]
Penn DJ. The scent of genetic compatibility: sexual selection and the major histocompatibility complex. Ethology (2002) 108:1–21.[CrossRef][Web of Science]
Penn DJ, Damjanovich K, Potts WK. MHC heterozygosity confers a selective advantage against multiple-strain infections. Proc Natl Acad Sci USA. (2002) 99:11260–11264.
Piertney SB, Oliver MK. The evolutionary ecology of the major histocompatibility complex. Heredity (2006) 96:7–21.[Web of Science][Medline]
Reusch TBH, Langefors A. Inter—and intralocus recombination drive MHC class IIB gene diversification in a teleost, the three-spined stickleback Gasterosteus aculeatus. J Mol Evol. (2005) 61:531–541.[CrossRef][Web of Science][Medline]
Roos C, Walter L. Considerable haplotypic diversity in the RT1-CE class I gene region of the rat major histocompatibility complex. Immunogenetics (2005) 56:773–777.[CrossRef][Web of Science][Medline]
Sambrook J, Fritsch EF, Maniatis T. Molecular cloning: a laboratory manual. (1989) 2nd ed. Cold Spring Harbor (NY): Cold Spring Harbor Laboratory Press.
Satta Y. Effects of intra-locus recombination on HLA polymorphism. Hereditas (1997) 127:105–112.[CrossRef][Web of Science][Medline]
Sawyer SA. GENECONV: a computer package for the statistical detection of gene conversion [Internet]. St Louis (MO): Distributed by the author, Department of Mathematics, Washington University. (1999) Available from http://www.math.wustl.edu/-sawyer. Accessed 27 Feb 2007.
Schwieger F, Tebbe CC. A new approach to utilize PCR-single-strand-conformation polymorphism for 16S rRNA gene-based microbial community analysis. Appl Environ Microbiol. (1998) 64:4870–4876.
Siebert PD, Chenchik A, Kellogg DW, Lukyanov KA, Lukyanov SA. An improved PCR method for walking in uncloned genomic DNA. Nucleic Acids Res. (1995) 23:1087–1088.
Stet RJM, Kruiswijk CP, Dixon B. Major histocompatibility lineages and immune gene function in fish: the road not taken. Crit Rev Immunol (2003) 23:441–471.[Web of Science][Medline]
Sunnucks P, Wilson ACC, Beheregaray LB, Zenger K, French J, Taylor AC. SSCP is not so difficult: the application and utility of single-stranded conformation polymorphism in evolutionary biology and molecular ecology. Mol Ecol (2000) 9:1699–1710.[CrossRef][Medline]
Van Oosterhout C, Joyce DA, Cummings SM. Evolution of MHC class IIB in the genome of wild and ornamental guppies, Poecilia reticulata. Heredity (2006) 97:111–118.[CrossRef][Web of Science][Medline]
Wang Z, Miyake T, Edwards SV, Amemiya CT. Tuatara (Sphenodon) genomics: BAC library construction, sequence survey and application to the DMRT gene family. J Hered (2006) 97:541–548.
Westerdahl H, Wittzell H, von Schantz T, Bensch S. MHC class I typing in a songbird with numerous loci and high polymorphism using motif-specific PCR and DGGE. Heredity (2004) 92:534–542.[CrossRef][Web of Science][Medline]
Yang Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci (1997) 15:555–556.
Yang Z, Nielsen R, Goldman N, Pedersen AMK. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics (2000) 155:431–449.
Yang Z, Wong WSW, Nielsen R. Bayes empirical bayes inference of amino acid sites under positive selection. Mol Biol Evol. (2005) 22:1107–1118.
Yeager M, Hughes AL. Evolution of the mammalian MHC: natural selection, recombination, and convergent evolution. Immunol Rev. (1999) 167:45–58.[CrossRef][Web of Science][Medline]
Yun TJ, Melvold RW, Pease LR. A complex major histocompatibility complex D locus variant generated by an unusual recombination mechanism in mice. Proc Natl Acad Sci USA. (1997) 94:1384–1389.
This article has been cited by other articles:
![]() |
H. C Miller, J. A Moore, N. J Nelson, and C. H Daugherty Influence of major histocompatibility complex genotype on mating success in a free-ranging reptile population Proc R Soc B, May 7, 2009; 276(1662): 1695 - 1704. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||






