Species tree inference

I. Phylogenetic inference of chickadees: gene trees versus species trees and the influence of sampling design.

To estimate accurate species trees, one must account for the various biological factors and evolutionary processes that often cause gene tree incongruence. I am interested in investigating how such processes influence the outcome of species tree inference. Furthermore, I aim to use empirical studies to develop strategic sampling designs that may improve phylogenetic accuracy for certain taxonomic groups.

Abstract: Determining the adequate number of loci and choosing markers with desirable attributes to use for species tree inference are practical dilemmas in systematic biology. Simulation studies have shown that estimates steadily improve as more loci are sampled, but empirical evaluations of the number and attributes of loci needed for estimating species trees are still rare. We explore the long-standing issue of how many loci are needed to infer accurate phylogenetic relationships, and whether loci with particular attributes (i.e., parsimony informativeness, variability, gene tree resolution) outperform others. We estimate relationships among the seven species of chickadees (Aves: Paridae) using DNA sequence data from 40 nuclear loci and from mtDNA. These chickadees are a recently diverged group, well studied ecologically but lacking a nuclear phylogeny. We compare four species tree inference methods that utilize the multispecies coalescent: two Bayesian approaches (BEST and *BEAST) and two maximum-likelihood (ML) approaches (STEM and STELLS). We find that although the reference species tree may be attainable with a minimum number of “desirable” loci, there is a trade-off between the accuracy of demographic parameter estimates and the number of loci. These results, in combination with those of previous studies, suggest that the reference tree topology may be found in analyses of only a few loci with high information content, but that accurate population genetic parameter estimates may require substantially more loci.
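Why loci number matters for a recently diverged group can be illustrated with the textbook three-taxon case of the multispecies coalescent. This is a minimal sketch, assuming a species tree ((A,B),C) whose internal branch is t coalescent units long, independent loci, gene trees known without error, and no gene flow: a gene tree then matches the species tree with probability 1 - (2/3)exp(-t), and each of the two mismatching topologies has probability (1/3)exp(-t). The function names and the plurality-vote rule are hypothetical illustrations only; they are not how BEST, *BEAST, STEM, or STELLS estimate species trees.

```python
import math
import random
from collections import Counter

SPECIES_TREE = "((A,B),C)"

def gene_tree_probs(t):
    """Rooted gene-tree topology probabilities for species tree ((A,B),C) under the
    three-taxon multispecies coalescent; t is the internal branch in coalescent units."""
    discordant = math.exp(-t) / 3.0  # probability of each of the two mismatching topologies
    return {SPECIES_TREE: 1.0 - 2.0 * discordant,
            "((A,C),B)": discordant,
            "((B,C),A)": discordant}

def plurality_recovers_species_tree(t, n_loci, n_reps=2000, seed=1):
    """Fraction of simulated data sets in which the most frequent gene-tree topology
    across n_loci independent loci is the species tree (ties count as failures)."""
    rng = random.Random(seed)
    probs = gene_tree_probs(t)
    topologies = list(probs)
    weights = [probs[k] for k in topologies]
    successes = 0
    for _ in range(n_reps):
        # Draw one gene-tree topology per locus, then rank topologies by frequency.
        ranked = Counter(rng.choices(topologies, weights=weights, k=n_loci)).most_common()
        top, top_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        if top == SPECIES_TREE and not tied:
            successes += 1
    return successes / n_reps

if __name__ == "__main__":
    for t in (0.1, 0.5, 1.0):
        p_match = gene_tree_probs(t)[SPECIES_TREE]
        recovery = [plurality_recovers_species_tree(t, n) for n in (1, 5, 40)]
        print(f"t={t}: P(gene tree = species tree)={p_match:.3f}, "
              f"recovery with 1/5/40 loci={recovery}")
```

Shorter internal branches (smaller t), as expected for recently diverged lineages, lower the per-locus match probability, so more loci are needed before the species tree topology dominates the sample of gene trees.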

Collaborators: Matt Carling and Irby Lovette.
Citation: Harris RB, Carling MD, Lovette IJ. 2013. Phylogenetic inference of chickadees: gene trees versus species trees and the influence of sampling design. Evolution, 68:501-513.


II. The relative trade-offs of using RAD versus sequence capture for species tree inference.

Recent advances in sequencing technologies have resulted in the development of numerous methods that allow sampling of genome-wide variation from large numbers of individuals. The two most popular options are (1) collecting genome-wide single nucleotide polymorphisms (SNPs) by sequencing restriction site associated DNA (RADseq) and (2) targeted sequence capture of predefined regions of interest. Despite the widespread use of both methods in non-model systems, no comparative simulation studies have examined the relative statistical power of RADseq data versus sequence capture data for phylogenetic and population genetic inference.
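Both strategies reduce the genome to a subset of loci, but they choose those loci differently. A minimal sketch of the RADseq side, assuming a single 8-bp palindromic recognition site (CCTGCAGG, the SbfI motif, used only for illustration) and ignoring the exact cut position, adapters, and sequencing error: RAD loci fall wherever the motif happens to occur in the genome, whereas capture loci are chosen in advance from a reference. The function names are hypothetical.

```python
import random

def find_sites(seq, site):
    """Start positions of every (possibly overlapping) occurrence of site in seq."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions

def rad_loci(seq, site="CCTGCAGG", read_len=20):
    """Simplified single-digest RAD sketch: for each recognition site, return its position
    and the read_len bases flanking it on each side. CCTGCAGG is its own reverse
    complement, so scanning one strand finds every site."""
    loci = []
    for pos in find_sites(seq, site):
        left = seq[max(0, pos - read_len):pos]
        right = seq[pos + len(site):pos + len(site) + read_len]
        loci.append((pos, left, right))
    return loci

def expected_sites(genome_len, site_len):
    """Expected site count in random sequence with equal base frequencies: L / 4**k."""
    return genome_len / 4 ** site_len

if __name__ == "__main__":
    rng = random.Random(0)
    genome = "".join(rng.choice("ACGT") for _ in range(500_000))
    print(len(rad_loci(genome)), "sites found;", expected_sites(len(genome), 8), "expected")
```

Because the number of RAD loci scales roughly with genome length divided by 4^k, the choice of enzyme largely determines how many loci are sampled, while sequence capture fixes the loci through probe design.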

Collaborators: Ziheng Yang and Adam Leaché.
Funding: University of Washington’s Sargent Award, 2014.
Progress: Currently in the mix.