Ecdysozoan Mitogenomics: Evidence for a Common Origin
of the Legged Invertebrates, the Panarthropoda
Omar Rota-Stabelli*,1,2, Ehsan Kayal3, Dianne Gleeson4, Jennifer Daub5, Jeffrey L. Boore6,
Maximilian J. Telford1, Davide Pisani2, Mark Blaxter5, and Dennis V. Lavrov*,3
1Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
2Department of Biology, National University of Ireland, Maynooth, Maynooth, Co. Kildare, Ireland
3Department of Ecology, Evolution and Organismal Biology, Iowa State University
4EcoGene, Landcare Research New Zealand Ltd., St Johns, Auckland, New Zealand
5Institute of Evolutionary Biology, The University of Edinburgh, Ashworth Laboratories, Edinburgh, United Kingdom
6Genome Project Solutions, Hercules, California
*Corresponding author: E-mail: omar.rota-stabelli@nuim.ie, omar42@gmail.com; dlavrov@iastate.edu.
Accepted: 26 May 2010
Abstract
Ecdysozoa is the recently recognized clade of molting animals that comprises the vast majority of extant animal species and
the most important invertebrate model organisms—the fruit fly and the nematode worm. Evolutionary relationships within
the ecdysozoans remain, however, unresolved, impairing the correct interpretation of comparative genomic studies. In
particular, the affinities of the three Panarthropoda phyla (Arthropoda, Onychophora, and Tardigrada) and the position of
Myriapoda within Arthropoda (Mandibulata vs. Myriochelata hypothesis) are among the most contentious issues in animal
phylogenetics.
To elucidate these relationships, we have determined and analyzed complete or nearly complete mitochondrial genome
sequences of two Tardigrada, Hypsibius dujardini and Thulinia sp. (the first genomes to date for this phylum); one Priapulida,
Halicryptus spinulosus; and two Onychophora, Peripatoides sp. and Epiperipatus biolleyi; and a partial mitochondrial genome
sequence of the Onychophora Euperipatoides kanagrensis. Tardigrada mitochondrial genomes resemble those of the
arthropods in term of the gene order and strand asymmetry, whereas Onychophora genomes are characterized by numerous
gene order rearrangements and strand asymmetry variations. In addition, Onychophora genomes are extremely enriched in
A and T nucleotides, whereas Priapulida and Tardigrada are more balanced.
Phylogenetic analyses based on concatenated amino acid coding sequences support a monophyletic origin of the Ecdysozoa
and the position of Priapulida as the sister group of a monophyletic Panarthropoda (Tardigrada plus Onychophora plus
Arthropoda). The position of Tardigrada is more problematic, most likely because of long branch attraction (LBA). However,
experiments designed to reduce LBA suggest that the most likely placement of Tardigrada is as a sister group of
Onychophora. The same analyses also recover monophyly of traditionally recognized arthropod lineages such as Arachnida
and of the highly debated clade Mandibulata.
Key words: Arthropoda, mitogenomics, long branch attraction, Tardigrada, phylogeny, strand asymmetry.
Introduction
In spite of an ongoing debate concerning their utility in phy-
logenetics (Curole and Kocher 1999, Delsuc et al. 2003,
Cameron et al. 2004), mitogenomic studies have proven
to be informative and insightful for phylogenetic studies
(e.g., Boore et al. 1998, Lavrov and Lang 2005). This can
be explained by conceptual advantages such as the con-
served gene set, (almost) unambiguous orthology of genes,
and presence of rare genomic changes, including gene re-
arrangement and changes in the genetic code, as well as
some historical and methodological advantages, such as
the availability of primers for amplifying specific genes from
many lineages and the relative ease of generating new data
(Fendt et al. 2009, Boore et al. 2005). On the other hand,
ª The Author(s) 2010. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/
2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Genome Biol. Evol. 2:425–440. doi:10.1093/gbe/evq030 Advance Access publication May 30, 2010 425
GBE
phylogenies based on mitochondrial sequences are well
known to be affected by a variety of reconstruction artifacts,
which may be responsible for dilution of the true phyloge-
netic signal and generation of homoplasies.
One of the main problems of mitogenomics is thought to
be the lineage-specific compositional heterogeneity, which
also influences the amino acid content of the encoded pro-
teins (Foster et al. 1997; Gibson et al. 2005). For example,
some ecdysozoan lineages, such as some Arthropoda and
Nematoda, have mitochondrial genomes enriched for
A þ T nucleotides, and in the absence of strong purifying
selection, the corresponding proteins are enriched for amino
acids encoded by A þ T rich codons (Foster et al. 1997;
Saccone et al. 2002). Another type of compositional hetero-
geneity, measured by GC and AT skews between the two
strands of DNA (Perna and Kocher 1995), reflects a direc-
tional mutational bias driven by the asymmetric nature of
replication of the mitochondrial genome and results in op-
posite compositional biases in genes with opposite tran-
scriptional polarities (Saccone et al. 1999; Lavrov et al.
2000, Hassanin et al. 2005; Hassanin 2006, Jones et al.
2007). Gene inversions that cause a gene to change its ori-
entation relative to the replication origin will result in a rapid
compositional change as the sequence evolves from its an-
cestral nucleotide skews toward the ones driven by its new
position (Helfenbein et al. 2001). It has been shown that
heterogeneities in both A þ T proportion and AT and GC
skew can cause erroneous results in phylogenetic inference
(Gibson et al. 2005, Jones et al. 2007, Masta, et al. 2009).
Compositional heterogeneity is only one of the factors
affecting mitochondrial phylogenies. Accelerated substitu-
tion rates may also play a role in masking and eroding phy-
logenetic signal through unrecognized homoplasy and lead
to increased susceptibility to systematic biases, such as long
branch attraction (LBA; Felsenstein 1978, Brinkmann et al.
2005). Because of the variety of possibly confounding biases
that could affect mitochondrial genomes concurrently,
strong outgroup effects should be expected and have been
observed (Cameron et al. 2004, Rota-Stabelli and Telford
2008), with different outgroups suggesting alternative
equally well-supported rooting positions for ingroup taxa.
One approach to deal with these problems is to improve
the standard general time reversible (GTR) models of mito-
chondrial sequence evolution both at the nucleotide
(Hassanin et al. 2005) and at the amino acid level (Abascal
et al. 2007, Rota-Stabelli et al. 2009). More sophisticated
evolutionary models such as the heterogenous CAT model,
which accounts for among-site heterogeneity (Lartillot and
Philippe 2004), and the derived CAT-BP model, which also
accounts for lineage-specific compositional heterogeneities
(Blanquart and Lartillot 2008), can lessen the effects of com-
positional biases. Another approach to reduce bias is to
increase taxonomic sampling. Sampling more taxa, particu-
larly close to weakly supported nodes, can break long
branches, allowing for a better elucidation of homoplastic
similarities resulting from multiple substitutions and thus
reducing the likelihood of the LBA artifact. Finally,
site-stripping approaches (Brinkmann and Philippe 1999;
Pisani 2004; Sperling et al. 2009) can be used to eliminate
rapidly evolving sites and limit LBA artifacts because the
most rapidly evolving sites are expected to be the most
heterogenous in composition.
Panarthropoda
Our knowledge of metazoan evolution has changed dra-
matically since the seminal work of Aguinaldo et al.
(1997), which first formally proposed the Ecdysozoa, a group
of molting organisms that includes Arthropoda (e.g., in-
sects), Tardigrada (water bears), Onychophora (velvet
worms), Nematoda (round worms), Nematomorpha (horse-
hair worms), Priapulida (penis worms), Kinorhyncha (mud
dragons), and Loricifera. Although a monophyletic origin
of the Ecdysozoa is now widely accepted (reviewed in
Telford et al. 2008), the relationships among the eight extant
ecdysozoan phyla, in particular the position of Tardigrada,
are still vigorously debated. Although there is a strong sup-
port from morphological and developmental gene expres-
sion data for the monophyly of Panarthropoda, a group
characterized by segmental, paired, locomotory appen-
dages, comprising Arthropoda, Onychophora, and Tardigra-
da (Nielsen 2001, Telford et al. 2008), these data are
ambiguous about the placement of the Tardigrada and Ony-
chophora within Panarthropoda (Peterson and Eernisse
2001, Nielsen 2001, Mayer and Whitington 2009b). Fur-
thermore, monophyly of Panarthropoda has found little mo-
lecular support. Although arthropod affinity of the
Onychophora is strongly supported by expressed sequence
tag (EST)–derived phylogenomic data sets (Dunn et al. 2008,
Roeding et al. 2009; Hejnol et al. 2009), mitochondrial data
from the complete mitochondrial genomes of one Ony-
chophoran placed it as the sister group of Arthropoda plus
Priapulida (Podsiadlowski et al. 2008). The position of the
Tardigrada is equally unclear. Ribosomal RNA sequences
support a group of Tardigrada plus Onychophora as a sister
lineage to the Arthropoda (Mallatt and Giribet 2006),
whereas EST data challenged the Panarthropoda hypothe-
sis, grouping Tardigrada with Nematoda (Lartillot and
Philippe 2008, Roeding et al. 2009, Hejnol et al. 2009).
In these analyses, Tardigrada and Nematoda are character-
ized by long branches, suggesting that this clade could rep-
resent a phylogenetic artifact. This possibility is reinforced by
the analyses of Dunn et al. (2008), also based on EST data,
which recover a monophyletic Panarthropoda, suggesting
that the placement of Tardigrada may be model dependent.
Arthropoda
The monophyly of Crustacea plus Hexapoda (Tetraconata or
Pancrustacea) within Arthropoda is now well accepted
Rota-Stabelli et al. GBE
426 Genome Biol. Evol. 2:425–440. doi:10.1093/gbe/evq030 Advance Access publication May 30, 2010
(Friedrich and Tautz 1995; Boore et al. 1998, Dohle 2001,
Pisani 2009). However, the phylogenetic relationships
among Pancrustacea, Myriapoda (millipedes, centipedes,
symphylans), and Chelicerata (arachnids, ticks, and their al-
lies) are hotly debated. Many morphological and paleonto-
logical studies group Myriapoda with Hexapoda and
Crustacea in a clade called Mandibulata (Scholz et al.
1998, Harzsch et al. 2005, but see Mayer and Whitington
2009a). By contrast, different types of molecular data (mi-
tochondrial, ribosomal, nuclear protein–coding genes, and
EST data sets) have supported a sister group relationship be-
tween Myriapoda and Chelicerata, which were placed to-
gether in Myriochelata or Paradoxopoda (Friedrich and
Tautz 1995; Pisani et al. 2004, Podsiadlowski et al. 2008,
Mallatt and Giribet 2006, Dunn et al. 2008, Hejnol et al.
2009, Roeding et al. 2009). Conversely, two recent analyses
based on mixed markers and 62 nuclear genes found, con-
vincing support for Mandibulata (Bourlat et al. 2008, Regier
et al. 2010). Support for either Mandibulata or Myriochelata
may depend on the outgroup used (Rota-Stabelli and Telford
2008), exclusion of sites (Pisani 2004), and/or method of
phylogenetic inference (Regier et al. 2008), suggesting that
signal is weak and that some phylogenetic conclusions may
be prone to systematic errors.
Synopsis
To further clarify relationships within Ecdysozoa, shed light
on the evolution of their mitochondrial genomes, and fill the
gap in taxonomic representation that currently exists, we
have sequenced and analyzed the complete mitochondrial
genomes of five species: two tardigrades, Hypsibius dujar-
dini and Thulinia sp. (the first tardigrade mitochondrial
genomes to be sequenced); one priapulid, Halicryptus
spinulosus; and two onychophorans, Peripatoides sp. and
Epiperipatus biolleyi. We also determined a partial mito-
chondrial genome sequence from a third onychophoran
Euperipatoides kanagrensis. Here, we briefly describe com-
positional and genomic characteristics of the mitochondrial
genomes of Ecdysozoa, particularly focusing on these
newly studied species. We show that Tardigrada species
have an ‘‘arthropod-like’’ mitochondrial genome and that
the two priapulids share the same inverted fragment with
an unexpected difference in GC skew. Finally, Onychophora
mitochondrial genomes are highly divergent and contain
several gene rearrangements.
We carried out phylogenetic analyses to understand the
relationships within Panarthropoda. Results strongly support
a sister relationship between the Onychophora and the Ar-
thropoda, whereas the position of Tardigrada is more prob-
lematic. However, analyses performed using the CAT model
(which is more robust to LBA) support the grouping of Ony-
chophora and Tardigrada within a monophyletic Panarthro-
poda. This hypothesis is reinforced by sequential removal
of rapidly evolving lineages and by the exploration of phy-
logenetic signal using partitions of sites with different evo-
lutionary rates. We also revisit the relationships within
Arthropoda, providing insight into the relationships among
the major arthropod subphyla (Chelicerata, Myriapoda, and
Crustacea and Hexapoda), as well as the relationships within
Chelicerata.
Materials and Methods
Genome Sequencing and Annotation
The complete mitochondrial genomes of the onychophoran
E. biolleyi and Peripatoides sp., the tardigrades Thulinia sp.
and H. dujardini, and the priapulid Halicryptus spinulosus
were amplified and sequenced as described in Lavrov et al.
(2000). Partial sequences encompassing five coding genes
were identified in E. kanagrensis using EST data. Open read-
ing frames in the newly sequenced genomes were annotated
based on comparisons with protein sequences from closely
related species. In addition, the mitochondrial genome from
Metaperipatus inae (Podsiadlowski et al., unpublished data;
GenBank accession EF624055) was re-annotated based on
the two other onychophoran mitochondrial genomes. tRNA
genes were inferred using the tRNAscan-SE and ARWEN
programs (Lowe and Eddy 1997; Laslett and Canba¨ck
2008) and checked manually. tRNA genes not found by
the computer programs were identified based on expected
anticodon sequences, conserved positions, potential second-
ary structures, and similarities with sequences from closely
related species. If several potential tRNA gene sequences
were found, we preferred the one with a more conserved
gene order position. The E. biolleyi mitochondrial genome
has been previously independently analyzed by Podsiadlowski
et al. (2008). Comparison with the genome sequenced by us
revealed a 97.9% identity at nucleotide level and 98.8%
identity at amino acid level. However, our analysis resulted
in a very different annotation of this genome, especially in
respect to tRNA genes.
Compositional Analyses
For each species of our data set, we concatenated the 13 mt
protein-coding genes to calculate 1) the overall nucleotide G
þ C% using all three codon positions and 2) the frequency
of amino acids encoded by GC-rich codons (G þ A þ R þ
P%). In addition, we calculated the amino acid composition
(again as G þ A þ R þ P%) expected from the nucleotide
composition if the nucleotide frequencies at all three codon
positions are the same (F1� 4 codon model). We have gen-
erated 10,000 random nucleotides using the same nucleo-
tide composition of the real data and translated them into
amino acids using the appropriate genetic code. We plotted
the corresponding values in figure 1 (white squares and dot-
ted line).
We also calculated the GC skew (Perna and Kocher 1995)
and the skew index (Rota-Stabelli and Telford 2008) on the
Ecdysozoan Mitogenomics GBE
Genome Biol. Evol. 2:425–440. doi:10.1093/gbe/evq030 Advance Access publication May 30, 2010 427
concatenated alignment, and for each gene independently,
using all three codon positions. To test if the strand asym-
metry of genes was at equilibrium, we also calculated GC
skew for the 1st þ 2nd and 3rd codon position separately.
GC skew values were plotted for species of interest using
the inferred arthropod ancestral gene order (AAGO, which
is the same as Limulus polyphemus and the ancestral ecdy-
sozoan; Lavrov et al. 2000) as a reference to order genes on
the abscissa of the plots in figures 2 and 3. The skew index
was calculated as an absolute value, and not referenced to
a focal species, as in Rota-Stabelli and Telford (2008). We
summarize some of these statistics in supplementary table
1 (Supplementary Material online).
Mitochondrial Gene and Protein Alignments
We downloaded nucleotide sequences of the 13 mitochon-
drial protein-coding genes for 240 metazoan species
from the NCBI (http://www.ncbi.nlm.nih.gov) and the OGRe
database (http://drake.physics.mcmaster.ca/ogre/compare
.shtml). To this initial data set, we added complete sequen-
ces for the six species determined for this study, thus gen-
erating a data set of 245 taxa (246 minus the onychophoran
E. biolleyi previously published by Podsiadlowski et al. 2008).
We translated nucleotide sequences into amino acids, using
appropriate genetic codes, and aligned the 13 protein
sets individually with ClustalW (http://www.ebi.ac.uk/Tools
/clustalw2/index.html). We back aligned nucleotide sequen-
ces to the amino acid alignment using TranslatorX (down-
loadable from http://translatorx.co.uk/) and assembled
a concatenated alignment of the nucleotide sequences of
the 13 genes.
To avoid artifacts due to inadequate outgroup and in-
group selection, we followed the decision making strategy
of Rota-Stabelli and Telford (2008), generating a table (data
not shown, but available upon request) containing statistics
for each of the 245 ingroup and outgroup species sampled.
We thus sampled a set of Ecdysozoa and outgroups aiming
to minimize root-to-tip distances and compositional hetero-
geneity across the data set. The key estimates used to iden-
tify optimal outgroups included, for each possible outgroup
1) the maximum likelihood (ML) distance to the Ecdysozoa,
2) the G þ C content, 3) the content of G þ C-rich codon–
encoded amino acids, and 4) the two indicators of GC
strand asymmetry: GC skew and the skew index, an indica-
tor of gene overall strand asymmetry. For each possible out-
group, an ML distance metric was calculated as the average
ML distance to Priapulus caudatus, L. polyphemus, and
Tribolium castaneum, and the skew index was calculated
with reference to the average across all Arthropoda with
similar skew profile.
From the 245 taxa alignment, we selected a balanced
sample of 66 species (listed in supplementary table 1, Sup-
plementary Material online), of which ten were outgroup
taxa. The nucleotide and the amino acid alignments were
FIG. 1.—Compositional properties of ecdysozoan mitochondrial coding sequences. The G þ C content of 1st and 2nd codon positions in the
concatenated alignment is plotted against the percentage of amino acids encoded by G- and C-rich codons (glycine, alanine, arginine, and proline
[G þ A þ R þ P]). Values are averaged for some major groups, with SDs indicated. All Ecdysozoa are A þ T rich compared with outgroup sequences.
Onychophora are extremely A þ T rich, whereas Priapulida and Tardigrada have more balanced nucleotide compositions. Amino acid frequencies