I. The influence of gene flow on species tree estimation: A simulation study
The existing literature contains numerous studies on how incomplete lineage sorting (ILS) impacts species tree estimation, but relatively few on the ramifications of gene flow. We aim to quantify the impacts of both ILS and gene flow on Bayesian species tree estimation by simulating and analyzing multi-locus datasets that characterize a number of realistic situations: paraphyletic gene flow, divergence with gene flow, and allelic introgression.
Abstract: Gene flow among populations or species and incomplete lineage sorting (ILS) are two evolutionary processes responsible for generating gene tree discordance and therefore hindering species tree estimation. Numerous studies have evaluated the impacts of ILS on species tree inference, yet the ramifications of gene flow on species trees remain less studied. Here, we simulate and analyse multilocus sequence data generated with ILS and gene flow to quantify their impacts on species tree inference. We characterize species tree estimation errors under various models of gene flow, such as the isolation-migration model, the n-island model, and gene flow between non-sister species or involving ancestral species, and species boundaries crossed by a single gene copy (allelic introgression) or by a single migrant individual. These patterns of gene flow are explored on species trees of different sizes (4 vs. 10 species), at different time scales (shallow vs. deep), and with different migration rates. Species trees are estimated with the multispecies coalescent model using Bayesian methods (BEST and *BEAST) and with a summary statistic approach (MPEST) that facilitates phylogenomic-scale analysis. Even in cases where the topology of the species tree is estimated with high accuracy, we find that gene flow can result in overestimates of population sizes (species tree dilation) and underestimates of species divergence times (species tree compression). Signatures of migration events remain present in the distribution of coalescent times for gene trees, and with sufficient data it is possible to identify those loci that have crossed species boundaries. These results highlight the need for careful sampling design in phylogeographic and species delimitation studies as gene flow, introgression, or incorrect sample assignments can bias the estimation of the species tree topology and of parameter estimates such as population sizes and divergence times.
Collaborators: Adam Leaché, Bruce Rannala, and Ziheng Yang.
Citation: Leaché AD, Harris RB, Rannala B, Yang Z. 2013. The influence of gene flow on species tree estimation: a simulation study. Systematic Biology, 63(1):17-30. doi:10.1093/sysbio/syt049
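As a point of reference for the ILS side of this project, the baseline amount of gene tree discordance expected from ILS alone can be written down for the simplest case of three species. The sketch below is not the simulation pipeline used in the study; it only evaluates the standard multispecies coalescent result that, with one sequence per species, a gene tree disagrees with the species tree with probability (2/3)·exp(-T), where T is the internal branch length in coalescent units. The function name and the example values are hypothetical, and a diploid population of constant size is assumed. Gene flow would add discordance on top of this baseline.

```python
import math


def discordance_probability(internal_branch_generations, effective_population_size):
    """Probability that a three-species gene tree differs from the species tree.

    ILS only, no gene flow. Assumes a diploid population of constant size N
    along the internal branch, so the branch length in coalescent units is
    t / (2N) for a branch lasting t generations.
    """
    coalescent_units = internal_branch_generations / (2.0 * effective_population_size)
    return (2.0 / 3.0) * math.exp(-coalescent_units)


# Hypothetical example values: shallow to deep internal branches, N = 10,000.
for generations in (1_000, 10_000, 100_000):
    p = discordance_probability(generations, effective_population_size=10_000)
    print(f"{generations:>7} generations: P(discordant) = {p:.3f}")
```

Shorter internal branches or larger populations push the probability toward its maximum of 2/3, which is the regime where ILS is expected to interfere most with species tree estimation.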
II. Comparative species divergence across eight triplets of spiny lizards (Sceloporus) using genomic sequence data
Recent studies show that gene flow can have mixed effects on species divergence. There are multiple empirical examples of divergence with gene flow, but relatively few studies have explored this in a comparative manner. We contrasted gene flow across eight triplets of North American lizards by applying a maximum likelihood implementation of the isolation-migration model to genome-wide SNP data.
Abstract: Species divergence is typically thought to occur in the absence of gene flow, but many empirical studies are discovering that gene flow may be more pervasive during species formation. Although many examples of divergence with gene flow have been identified, few clades have been investigated in a comparative manner, and fewer have been studied using genome-wide sequence data. We contrast species divergence genetic histories across eight triplets of North American Sceloporus lizards using a maximum likelihood implementation of the isolation-migration (IM) model. Gene flow at the time of species divergence is modeled indirectly as variation in species divergence time across the genome or explicitly using a migration rate parameter. Likelihood ratio tests (LRTs) are used to test the null model of no gene flow at speciation against these two alternative gene flow models. We also use the Akaike information criterion to rank the models. Hundreds of loci are needed for the LRTs to have statistical power, and we use genome sequencing of reduced representation libraries to obtain DNA sequence alignments at many loci (between 340 and 3,478; mean = 1,678) for each triplet. We find that current species distributions are a poor predictor of whether a species pair diverged with gene flow. Interrogating the genome using the triplet method expedites the comparative study of species divergence history and the estimation of genetic parameters associated with speciation.
Collaborators: Adam Leaché, Max Maliska, and Charles Linkem.
Citation: Leaché AD, Harris RB, Maliska M, Linkem C. 2013. Comparative species divergence across eight triplets of spiny lizards (Sceloporus) using genomic sequence data. Genome Biology and Evolution, 5(12):2410-2419. doi:10.1093/gbe/evt186
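The model comparison described in the abstract above (likelihood ratio tests of a no-gene-flow null against a gene flow alternative, plus ranking by the Akaike information criterion) can be illustrated with a minimal sketch. This is not the software used in the study. The log-likelihoods, parameter counts, and model names below are hypothetical placeholders, and the p-value uses a chi-square distribution with 1 degree of freedom as a simplifying assumption; the appropriate null distribution depends on the model being tested.

```python
import math


def likelihood_ratio_statistic(log_likelihood_null, log_likelihood_alt):
    """LRT statistic: twice the gain in log-likelihood of the alternative model."""
    return 2.0 * (log_likelihood_alt - log_likelihood_null)


def chi2_upper_tail_1df(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values indicate a preferred model."""
    return 2.0 * n_params - 2.0 * log_likelihood


# Hypothetical fitted models for one triplet (values are made up for illustration).
models = {
    "no_gene_flow": {"log_likelihood": -12345.6, "n_params": 4},
    "migration": {"log_likelihood": -12340.1, "n_params": 5},
}

lrt = likelihood_ratio_statistic(
    models["no_gene_flow"]["log_likelihood"],
    models["migration"]["log_likelihood"],
)
p_value = chi2_upper_tail_1df(lrt)

# Rank models from best (lowest AIC) to worst.
ranked = sorted(models, key=lambda name: aic(**models[name]))

print(f"LRT statistic = {lrt:.2f}, p = {p_value:.4g}")
print("AIC ranking:", ranked)
```

The LRT only compares nested models, whereas AIC can rank all candidate models at once, which is one reason to report both.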