2.1. Taxon sampling
The nucleotide sequence of the complete mt genome was
determined anew from a single individual of three species of
squamate lizards: the Iberian worm lizardBlanus cinereus(Amphisbaenidae; collected by MGP and Iñigo Martínez-Solano in Santa Maria de Alameda-Madrid, Spain; May 2001; voucher MNCN/ADN 21711)
the slow wormAnguis fragilis(Anguidae; collected by RZ in VilarmielLugo, Spain; April 2003; voucher MNCN/ADN 7215) and the Moorish geckoTarentola mauritanica(Geckkonidae; collected by MGP in Las Gaviras-Granada,Spain; March 2003; voucher MNCN/ADN 7216). In addition, 24 complete mt genome sequences of species representing the major lineages of squamates were retrieved from public sequence databases, and included in the phylogenetic analyses (Genbank accession numbers are listed in Supplementary material). The Tuatara (Sphenodonon punctatus; Rhynchocephalia), as well as several
representatives of more distantly related amniotes (Archosauria, Testudines, and Mammalia), and of amphibians were used as
outgroup taxa (Genbank accession numbers of employed outgroups are provided in Supplementary material). Phylogenetic performance of mt sequence data was compared with that of previously published nuclear sequence data (Rag1, c-mos;Townsend et al., 2004).
2.2. DNA purification, amplification, and sequencing
Total genomic DNA was purified from preserved (ethanol 80–96%) small amounts of tissue using standard proteinase K/ SDS digestion, phenol–chloroform extraction, and ethanol precipitation (Sambrook et al., 1989). Standard PCR reactions (Saiki et al., 1988) were conducted in a total volume of 25μl containing 67 mM Tris–HCl, pH 8.3, 1.5 mM MgCl2, 0.4 mM of each dNTP, 2.5 μM of each primer, template mtDNA (10–100 ng), and Taq DNA polymerase (1 unit, Biotools). The following PCR cycling conditions were used: an initial denaturing step at 94 °C for 5 min; 35 cycles of denaturing at 94 °C for 60 s, annealing at 42– 52 °C (Supplementary material) for 60 s, and extending at 72 °C for 90 s; and afinal extending step of 72 °C for 7 min. A suite of 34 primers was used to amplify by PCR contiguous and overlapping fragments that covered the entire mt genome (Supplementary material). PCR products were purified by ethanol precipitation, and sequenced in an automated DNA sequencer (ABI PRISM 3700) using the BigDye
dideoxy Terminator cycle-sequencing kit (Applied Biosystems) following manufacturer’s instructions. Short amplicons were sequenced directly using the corresponding PCR primers. Long amplicons were cloned into pGEM-T vectors (Promega), and recombinant plasmids were sequenced using both M13 (forward and reverse) universal primers, and walking primers (available from the authors on request).
The obtained sequences averaged 700 base pairs (bp) in length, and each sequence overlapped the next contig by about 150 bp. In no case were differences in sequence observed between the overlapping regions.
The complete mt genome nucleotide sequences reported in this
paper have been deposited in the GenBank (EU443255-57).
2.3. Molecular and phylogenetic analyses
The deduced amino acid sequence of each of the 13 mt proteincoding genes were aligned separately using CLUSTAL X version 1.83 (Thompson et al., 1997) with default parameters, and revised by eye in order to maximize homology of position. Ambiguous alignments and gaps were excluded from the analyses using GBLOCKS version 0.91b (Castresana, 2000) with default parameters. The resulting 13 mt amino acid alignments were concatenated into a single data set (henceforth referred to as the allmt data set; 3161 positions).
Sequence alignments are available from the authors upon request.
Phylogenetic relationships among squamates were inferred using maximum likelihood (ML;Felsenstein, 1981) and Bayesian inference (BI,Huelsenbeck et al., 2001). Best-fit models of sequence evolution were selected both for the ML analysis and the different BI partitions using PROTTEST version 1.2.6 (Abascal et al., 2005) under the Akaike Information Criterion (AIC, Akaike, 1973). ML analyses were performed with PHYML version 2.4.3 (Guindon and Gascuel, 2003) using a BioNJ as input tree. BI analyses were conducted using MrBayes version 3.1.2 (Huelsenbeck and Ronquist, 2001; Ronquist and
Huelsenbeck, 2003) based on four different partitions (ATP: for the ATP synthase F0 subunits, COX: for the cytochromecoxidase subunits, Cytb: for the cytochromeb, NADH: for the NADH dehydrogenase
subunits). The rationale behind using these partitions was that subunits of the same mitochondrial enzyme complex are likely
subjected to similar evolutionary forces, and therefore they can be grouped together for model parameter estimation. Four Markov
chains Monte Carlo (MCMC) were run for one million generations, sampling every 100 generations, and discarding generations before MCMC reached stationarity (100,000) as“burn-in”. Two independent BI runs were performed to control for an adequate mixing of the MCMC. Robustness of the resulting ML and BI trees was evaluated by non-parametric bootstrapping with 1000 pseudoreplicates, and
Bayesian posterior probabilities, respectively.
In order to further evaluate the effect of taxon sampling on the topology and robustness of squamate phylogeny, four additional data sets were analyzed. The combined mt protein-coding data set was evaluated at the amino acid level using either one single or two (one basal and one derived) representative species per major lineage (henceforth referred to as 1mt and 2mt data sets, respectively). A nuclear data set combining nucleotide sequences of the c-mos
(311 bp) and RAG-1 (2499 bp) genes (Townsend et al., 2004)was analyzed using either one single or two (one basal and one derived) representative species per major lineage (henceforth referred to as the 1nuc and 2nuc data sets, respectively). The recovered phylogenetic 14 E.M. Albert et al. / Gene 441 (2009) 12-21
trees based on these four data sets were compared with those based on a full taxon sampling coverage, and either the combined mt protein-coding (this paper) or nuclear (Townsend et al., 2004) gene data sets.
In order to compare our results based on the combined mt proteincoding gene amino acid data set and those ofKumazawa (2007), the effect of using a more distantly related outgroup was explored using a combined mt protein coding data set, which included the amphibian X. laevis(henceforth referred to as the mtXenopusdataset. Furthermore, the effect of incorporating lineages with different evolutionary rates into phylogenetic analyses was evaluated by excluding from the combined mt protein-coding gene amino acid data set either all representatives of the two major lineages (snakes and acrodont lizards) that exhibited long branches, or only excluding representatives of one of them at a time. These datasets were henceforth referred as to the noSA, noS, and noA datasets, respectively.
Finally, seven alternative tree topologies (taken from the literature) were evaluated by the non-parametric tests Approximately Unbiased (AU;Shimodaira, 2002), Shimodaira–Hasegawa (SH;Shimodaira and Hasegawa, 1999), and Kishino–Hasegawa (KH;Kishino and Hasegawa, 1989). All tests were carried out on the allmt dataset using Consel version 0.1i (Shimodaira and Hasegawa, 2001) with sitewise log-likelihoods calculated by PAML version 3.14 (Yang, 1997). A total of one million multiscale bootstrap replicates were used in order to reduce sampling error.
2.4. Dating of divergence times
The main cladogenetic events of the squamate phylogeny were
dated using relaxed molecular clock approaches (Welch and Bromham, 2005). A Bayesian estimation of divergence times was performed using Multidivtime (Thorne et al., 1998; Kishino et al., 2001; Thorne and Kishino, 2002). We used the best topology that was inferred from the combined mt protein-coding gene BI analyses (Fig. 2) as the starting phylogeny. Branch lengths of the inferred topology, and divergence times were estimated using PAML and the programs
Estbranches and Multidivtime (available athttp://statgen.ncsu.edu/ thorne/ multidivtime.html). We performed the analysis under a mtREV (Adachi and Hasegawa, 1996)+Γmodel, constructed following the instructions by Yoshinori Kumazawa at the multidivtime website.
The MCMC was run for ten million generations, with sampling intervals of 100 generations, and a burn-in period corresponding to thefirst million generations. The prior assumption for the mean of the time of the ingroup root node (root to tip mean; rttm) was set to 3.123 time units with a standard deviation of 0.20, where 1 time unit in this analysis represents 100 million years (Myr). This value was obtained based on the recommendation byBenton and Donoghue (2007)of the estimated split of synapsids and sauropsids 312.3 million years ago (Mya). We calibrated our time estimates using 12 internal time constraints on nine internal nodes based on fossil evidence: (1) split between Archosauromorpha and Lepidosauromorpha between 299.8 and 259.7 Mya (Benton and Donoghue, 2007), (2) split between Crurotarsi and Ornithodira between 250.4 and 235 Mya (Benton and Donoghue, 2007), (3) split between Palaeognathae and Neognathae between 86.5 and 66 Mya (Benton and Donoghue, 2007), (4) Minimum age for the Pleurodira–Cryptodira split at 210 Mya (fossil record of Proterochersis,(Gaffney and Meylan, 1988), (5) Minimum age for the Rhynchocephalia–Squamata split at 227 Mya (Sues and Olsen, 1990), (6) Minimum age for the origin of Anguimorpha at 160 Mya (Evans, 2003), (7) Minimum age for theCordylus–Eumecessplit at 65.5 Mya (Krause et al., 2003), (8) Minimum age for the split of Rhineuridae at 60.5 Mya (Sullivan, 1985), and (9) Minimum age for the origin of Colubridae at 33.7 Mya (Rage et al., 1992).
Divergence times were also estimated using a penalized likelihood (PL) approach (Sanderson, 2002)withtheprogramr8s version 1.70 (Sanderson, 2003). We used the best ML topology with branch lengths optimized with PHYML, and the same calibration constraints employed for the Bayesian dating analysis. The truncated Newton (TN) algorithm and the additive penalty function were used for the PL analyses. In order tofind the optimal smoothing parameter (λ) for PL, cross-validation was performed over a range of values ofλranging from 10
in 15 steps. Confidence intervals
were estimated by calculating an age distribution based on chronograms generated from 100 bootstrapped datasets. These datasets were generated with the SEQBOOT module of PHYLIP 3.6 (Felsenstein, 1989), and optimized under a mtREV+Γ+I model of sequence evolution in PHYML.