Compatibility-Based Clustering of Phylogenetic Data Sets

Time

-

Locations

LS 111


Speaker

David Fernandez-Baca
Iowa State University
http://www.cs.iastate.edu/~fernande/



Description

The treatment of conflict within data sets is among the most debated questions in phylogenetics, and it is an especially relevant concern with the accumulation of large data sets that combine data from many sources. Indeed, recent genomic-scale phylogenetic analyses have called attention to the extent of the variation in the phylogenetic signal among loci. This variation results from several causes, including differing relative branch lengths, recombination, and horizontal gene transfer. Regardless of its source, identifying phylogenetic conflict is an important step in building reliable evolutionary trees.

We describe a method to partition phylogenetic data sets of discrete characters based on the pairwise compatibility of characters. Unlike previous approaches, our method requires no knowledge of the phylogeny, model of evolution, or characteristics of the data. The method is based on a similarity scoring scheme that measures how close pairs of characters are to compatibility. The goal is to partition the characters into clusters so that characters within a cluster are more compatible with each other than they are with characters in other clusters. While partitioning according to these criteria is computationally intractable, we show that spectral methods quickly provide high-quality solutions. We demonstrate that our partitioning method effectively identifies conflicting phylogenetic signals in simulated and empirical data sets.
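To make the idea of compatibility-based spectral partitioning concrete, here is a minimal sketch, not the speakers' implementation. It assumes binary characters, uses the standard four-gamete test for pairwise compatibility, and replaces the graded similarity score described above with a simple 0/1 score (1 if a pair is compatible, 0 otherwise). The function names `compatible`, `similarity_matrix`, and `spectral_bipartition` and the toy data are hypothetical, and the split uses one common spectral heuristic: the sign of the Fiedler vector of the unnormalized graph Laplacian.

```python
import numpy as np


def compatible(a, b):
    """Four-gamete test for two binary characters (assumed encoding: 0/1).

    The pair is compatible iff at most three of the four state
    combinations (0,0), (0,1), (1,0), (1,1) occur across the taxa.
    """
    return len(set(zip(a, b))) < 4


def similarity_matrix(chars):
    """Hypothetical 0/1 similarity: 1.0 for compatible pairs, else 0.0."""
    m = len(chars)
    S = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            if compatible(chars[i], chars[j]):
                S[i, j] = S[j, i] = 1.0
    return S


def spectral_bipartition(S):
    """Split characters in two by the sign of the Fiedler vector.

    The Fiedler vector is the eigenvector of the unnormalized Laplacian
    L = D - S belonging to the second-smallest eigenvalue.
    """
    laplacian = np.diag(S.sum(axis=1)) - S
    _, vecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    return vecs[:, 1] >= 0


# Toy data: six characters scored on six taxa (one list per character).
# The first three are mutually compatible, as are the last three, but
# most pairs drawn from different groups fail the four-gamete test.
chars = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
]
labels = spectral_bipartition(similarity_matrix(chars))
print(labels)  # first three characters share one label, last three the other
```

Which group receives `True` depends on the arbitrary sign of the eigenvector, so only the grouping itself is meaningful. A graded score and a multi-way partition, as the abstract describes, would need a richer similarity function and more than one eigenvector.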

This is joint work with Duhong Chen, of the Department of Computer Science, Iowa State University, and J. Gordon Burleigh, of the Section of Evolution and Ecology, University of California, Davis.
