Much of my research is aimed at developing algorithms and software for assembling data from the large sequence databases for the purpose of building comprehensive phylogenetic trees. GenBank, for example, presently archives data on over 165,000 species, a sizable fraction of all described biodiversity. My lab is currently funded through two NSF AToL (Assembling the Tree of Life) grants to develop tools and techniques for acquiring sequence data and assembling it in a pre-processing pipeline upstream of phylogenetic inference proper. We are collaborating with computer scientists and other phylogeneticists to develop algorithms and test them, primarily on plant phylogenetic and genomic data sets. These data sets range from taxonomically broad collections spanning sizable parts of the tree of life to genome-scale EST and BAC-end sequence data sets (the latter in collaboration with the OMAP rice genomics project) on smaller groups of taxa. Analysis of data at these extremes requires novel phylogenetic inference methods such as supertree construction, another active area of research in our group. Having completed some initial work on supertree methods, we and our math and computer science collaborators are now looking at problems associated with defining optimal inputs for supertree construction and developing methods for estimating their confidence limits. Finally, we have recently started working in the area of biodiversity informatics, developing methods for examining patterns of phylogenetic diversity in local floristic assemblages. This dovetails with the phylogenomic work in unexpected ways through the common currency of taxonomic names associated with the sequence data needed to build reliable phylogenetic histories.
Visit Mike Sanderson's website.