Are strepsipterans related to flies? Exploring long branch attraction

David R. Maddison

The question

Strepsiptera, sometimes known as twisted-wing parasites, is an enigmatic group of insects. They are parasites of other insects. The males have only one pair of wings, odd raspberry-like eyes, and lobed antennae. Females are wingless, and in most species never leave the host. These insects have traditionally been considered related to beetles, although that placement is supported by little evidence. Recently, molecular data have suggested that they may instead be related to true, two-winged flies (Diptera).

For example, parsimony analysis of a small data matrix of 18S ribosomal DNA (Carmean and Crespi, 1995) yields a phylogeny in which Strepsiptera is placed as the sister group of Diptera (in yellow) rather than Coleoptera (in orange):


This is the strict consensus tree of 27 most-parsimonious trees.

However, examining the relative branch lengths for any one of the trees, it becomes evident that the branches for Diptera and Strepsiptera are unusually long:


This raised the possibility that the association between Strepsiptera and Diptera in these trees was an artifact, caused by long-branch attraction (Felsenstein, 1978; Hendy and Penny, 1989).

Huelsenbeck (1997) set out to determine whether the association of Strepsiptera with Diptera seen in this analysis could be accounted for by long-branch attraction. To do this, he conducted a simulation study. We won't reproduce his whole study here, just one part of his Figure 2.

We will ask only one simple question: If strepsipterans are indeed related to beetles, would our observation that parsimony analysis yields a phylogeny with strepsipterans related to Diptera be unexpected? If we can show that it is unexpected, we can reject the notion that strepsipterans are related to beetles. If, however, inference of strepsipterans as related to Diptera is expected even if they are actually related to beetles, then we cannot so readily reject the traditional view.

A simulation study

To conduct a simulation study, we first need a model tree. We can use a tree inferred using maximum likelihood, which places strepsipterans with beetles:

Model Tree

The branch lengths of this tree were inferred using maximum likelihood. We also need a full model of DNA sequence evolution (which can also be inferred using maximum likelihood).

With this model in hand, we can simulate the evolution of 18S rDNA up the branches of this phylogeny, to yield a simulated matrix. We can then infer the phylogeny for this simulated matrix using parsimony, and see where Strepsiptera falls. Repeating this many times will give us an idea of the expected placement of Strepsiptera under parsimony inference, presuming the model tree shown above.
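The logic of this procedure can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the analysis performed in this study: it uses a four-taxon tree ((A,B),(C,D)) rather than the 13-taxon model tree, a Jukes-Cantor model rather than the HKY85 + gamma model, and hypothetical branch lengths (A and C long, everything else short). All function names are made up for the example. For four taxa, parsimony simply picks the grouping supported by the most informative sites.

```python
import math
import random

STATES = "ACGT"

def evolve(state, branch_length, rng):
    """Evolve one site along one branch under the Jukes-Cantor model."""
    # Probability that the end state differs from the start state.
    p_change = 0.75 * (1.0 - math.exp(-4.0 * branch_length / 3.0))
    if rng.random() < p_change:
        return rng.choice([s for s in STATES if s != state])
    return state

def simulate_site(long_len, short_len, rng):
    """Simulate one site on the model tree ((A,B),(C,D)); A and C are long."""
    left = rng.choice(STATES)               # node ancestral to A and B
    right = evolve(left, short_len, rng)    # node ancestral to C and D
    a = evolve(left, long_len, rng)
    b = evolve(left, short_len, rng)
    c = evolve(right, long_len, rng)
    d = evolve(right, short_len, rng)
    return a, b, c, d

def parsimony_winner(sites):
    """Return the four-taxon grouping favored by parsimony, or None on a tie."""
    counts = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for a, b, c, d in sites:
        # Only sites with two states, each in two taxa, are informative.
        if a == b and c == d and a != c:
            counts["AB|CD"] += 1
        elif a == c and b == d and a != b:
            counts["AC|BD"] += 1
        elif a == d and b == c and a != b:
            counts["AD|BC"] += 1
    best = max(counts.values())
    winners = [t for t, n in counts.items() if n == best]
    return winners[0] if len(winners) == 1 else None

def run(replicates=100, n_sites=770, long_len=1.5, short_len=0.05, seed=1):
    """Simulate many matrices on the model tree and tally the inferred trees."""
    rng = random.Random(seed)
    tally = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0, None: 0}
    for _ in range(replicates):
        sites = [simulate_site(long_len, short_len, rng) for _ in range(n_sites)]
        tally[parsimony_winner(sites)] += 1
    return tally

if __name__ == "__main__":
    print(run())
```

With branch lengths this unequal, most replicates should favor AC|BD, the grouping of the two long branches, even though the data were simulated on AB|CD. That is the same kind of comparison the full study makes for Strepsiptera and Diptera.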

Conducting the simulations

To conduct the simulations, in Mesquite open the example file "study002.nex", in the folder Mesquite_Folder/docs/mesquite/studies/study002/. This file contains the 13-taxon data matrix from Carmean and Crespi (1995) as modified by Huelsenbeck (1997), as well as the model tree shown above. The model tree will appear in a window when you open the file:


In addition, a model of character evolution has been entered into this file, with parameters obtained from maximum likelihood inference using PAUP*4 (Swofford, 2003). These were inferred on the model tree using the observed 18S rDNA data.

You can see the model by choosing Characters>Edit Character Model>Huelsenbeck Model:


This model uses the empirical state frequencies as present in the original 18S rDNA matrix, a model of character rate variation called "gammaModel", and a rate matrix model called "HKY85". You can see the nature of gammaModel by touching on the Edit button near its listing:


and the nature of the HKY85 model by touching on its Edit button:
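For readers unfamiliar with HKY85, the structure of its rate matrix can be sketched in Python. This is a minimal illustration, not Mesquite's implementation: the function name is hypothetical, and the base frequencies and transition/transversion parameter (kappa) used below are placeholders, not the values stored in the example file. The rate from base i to base j is proportional to the frequency of j, multiplied by kappa when the change is a transition (A<->G or C<->T). The matrix is left unscaled.

```python
def hky85_rate_matrix(freqs, kappa):
    """Build an unscaled HKY85 rate matrix as a dict of dicts.

    freqs: equilibrium base frequencies keyed by "A", "C", "G", "T".
    kappa: rate multiplier for transitions relative to transversions.
    """
    bases = "ACGT"
    purines = {"A", "G"}
    q = {}
    for i in bases:
        row = {}
        for j in bases:
            if i == j:
                continue
            # A transition stays within purines or within pyrimidines.
            is_transition = (i in purines) == (j in purines)
            row[j] = (kappa if is_transition else 1.0) * freqs[j]
        # Each diagonal entry makes its row sum to zero.
        row[i] = -sum(row.values())
        q[i] = row
    return q

if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = hky85_rate_matrix({"A": 0.25, "C": 0.2, "G": 0.3, "T": 0.25}, 2.0)
    for base, row in example.items():
        print(base, [round(row[b], 3) for b in "ACGT"])
```

The gamma model then lets the overall rate vary among sites by multiplying this matrix by a site-specific rate drawn from a gamma distribution.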


With the model fully established, we can now conduct the simulation study. To do this, we want to ask Mesquite to create many simulated matrices, each evolved according to the model. We also want to create a script ("batch file") that will tell a tree-inference program (we will use PAUP*, but another program such as NONA could be used) to find the most parsimonious trees for each of the matrices, and summarize the results. Mesquite's Batch Architect package contains the tools to automate this process.

Choose Analysis>Batch Architect>Export Matrices & Batch Files... In the first dialog box that appears, choose Simulated Matrices on Current Tree, then Evolve DNA Characters, then Huelsenbeck Model. You will be presented with the Export Matrices & Batch Files dialog box, in which you can enter the base name for the matrix files to be created and the number of matrices (100 is a good start).


The batch file template to be used is the one called "Basic PAUP tree search". This template builds a PAUP* command file that will tell PAUP* to execute and analyze each matrix in turn, and then, at the end, harvest the results and calculate a majority-rule consensus tree.

After pressing OK, you will be asked for a location to save the 100 matrices. It is recommended that you have an empty folder available into which they can be saved, to avoid cluttering up another folder with many files. You will be asked one last question: the number of characters to be evolved in each matrix. As we want the simulation to be as similar as possible to reality, we will choose to evolve the same number of characters as are present in the observed matrix, which is 770.

Mesquite will now simulate the matrices, and produce a batch file called "paupCommands.nex". This file consists of commands for PAUP*. The start of the file looks something like this:


After some initial setup, PAUP* is to execute the first simulated matrix file ("StrepSim0.nex"), do a heuristic search (which, by default, should be for most-parsimonious trees), and then save the results to a tree file. It then saves the strict consensus tree of the most-parsimonious trees to a different tree file. It repeats this process for the next matrix, StrepSim1.nex, and the next one, StrepSim2.nex, and so on, until it gets to StrepSim99.nex, as requested near the end of the file:


After processing StrepSim99.nex, it reads in the tree file containing the accumulated consensus trees, and calculates both a strict consensus tree and a majority-rule consensus tree of the results from the 100 replicates.

To ask PAUP* to do this analysis, open PAUP* and ask it to execute the file paupCommands.nex.

Interpreting the results

After PAUP* has finished executing paupCommands.nex, near the bottom of PAUP*'s main window will be the majority-rule consensus tree of the strict consensus trees from each of the 100 replicates. It will look something like this:

The actual numbers may differ, but they should be similar to those shown.
The "73" on the Strepsiptera + Aedes + Drosophila branch indicates that in 73 of the simulated matrices, the most parsimonious trees had Strepsiptera with Diptera, even though the model tree on which the data were simulated had Strepsiptera with beetles. This indicates that the inferred placement of strepsipterans with dipterans with the observed data might be the result of an inference artifact, as inferring strepsipterans with Diptera is expected even if they truly are with beetles.
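The number on a branch of the majority-rule consensus tree is simply a count of how many replicate trees contain that group. A minimal Python sketch of that count follows. It assumes the replicate consensus trees are available as plain Newick strings with no branch lengths or internal labels, all rooted the same way. The function names and the taxon names other than Strepsiptera, Aedes, and Drosophila are hypothetical, and the four toy trees are invented for illustration, not results of this study.

```python
def clades(newick):
    """Return the set of clades (frozensets of tip names) in a Newick string."""
    found = set()
    stack = []      # one list of tip names per currently open parenthesis
    name = ""
    for ch in newick:
        if ch in "(),;":
            if name.strip() and stack:
                stack[-1].append(name.strip())
            name = ""
            if ch == "(":
                stack.append([])
            elif ch == ")":
                tips = stack.pop()
                found.add(frozenset(tips))
                if stack:
                    # Tips of a closed clade also belong to its parent clade.
                    stack[-1].extend(tips)
        else:
            name += ch
    return found

def clade_frequency(trees, target):
    """Fraction of trees that contain the target group as a clade."""
    target = frozenset(target)
    return sum(target in clades(t) for t in trees) / len(trees)

if __name__ == "__main__":
    # Toy replicate trees, invented for illustration.
    toy_trees = [
        "(Out,((Strepsiptera,(Aedes,Drosophila)),(BeetleA,BeetleB)));",
        "(Out,((Strepsiptera,(BeetleA,BeetleB)),(Aedes,Drosophila)));",
        "(Out,(Strepsiptera,Aedes,Drosophila,BeetleA,BeetleB));",
        "(Out,((Aedes,(Strepsiptera,Drosophila)),(BeetleA,BeetleB)));",
    ]
    group = {"Strepsiptera", "Aedes", "Drosophila"}
    print(clade_frequency(toy_trees, group))
```

Note that a replicate whose strict consensus is unresolved in this region (the third toy tree) does not count toward the group, which is why using strict consensus trees of the replicates is a conservative tally.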


Simulations were done to see what trees we would expect from a phylogeny inference under a particular model tree. The steps in the study are:

  • The branch lengths of the model tree are inferred using maximum likelihood in PAUP*.
  • Values of parameters of a model of sequence evolution (gamma shape parameter, transition/transversion rate) are inferred using maximum likelihood in PAUP* on the model tree using the observed DNA sequences.
  • The data matrix is opened in Mesquite, along with the model tree. Submodels and a model of character evolution are created within Mesquite to match those inferred.
  • Mesquite's Batch Architect is used to automate the process of simulating the evolution of multiple (100 or more) data matrices under this model, using Mesquite's Genesis package. Batch Architect also builds a command file for PAUP* and an instruction file for Mesquite so that it can interpret the results of the PAUP* analyses.
  • The command file is executed in PAUP*, telling PAUP* to search for the shortest trees for each of the simulated matrices, accumulating the strict consensus tree for each matrix in a tree file. PAUP* is also instructed to calculate a majority-rule consensus tree of these accumulated consensus trees.
  • The majority-rule consensus tree is examined in PAUP* to see what trees are expected to be inferred under these conditions.


Carmean, D., and Crespi, B. 1995. Do long branches attract flies? Nature, 373, 666.
Felsenstein, J. 1978. Cases in which parsimony and compatibility methods will be positively misleading. Systematic Zoology, 27, 401-410.
Hendy, M.D., and Penny, D. 1989. A framework for the quantitative study of evolutionary trees. Systematic Zoology, 38, 297-309.
Huelsenbeck, J.P. 1997. Is the Felsenstein zone a fly trap? Systematic Biology, 46, 69-74.
Swofford, D. L. 2003. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4 beta 10. Sinauer Associates, Sunderland, Massachusetts.

Citation for this page

Maddison, D.R. 2004. Are strepsipterans related to flies? Exploring long branch attraction. Study 2 in Mesquite: a modular system for evolutionary analysis, version 2.54,