Nov. 12, 2012, 9:43 p.m. by Rosalind Team
Topics: Phylogeny
Incomplete Characters
The modern revolution in genome sequencing has produced a huge amount of genetic data for a wide variety of species. One ultimate goal of possessing all this information is to be able to construct complete phylogenies via direct genome analysis.
For example, say that we have a gene shared by a number of taxa. We could create a character based on whether species are known to possess the gene or not, and then use a huge character table to construct our desired phylogeny. However, the present bottleneck with such a method is that it assumes that we already possess complete genome information for all possible species. The race is on to sequence as many species genomes as possible; for instance, the Genome 10K Project aims to sequence 10,000 species genomes over the next decade. Yet for the time being, possessing a complete genomic picture of all Earth's species remains a dream.
As a result of these practical limitations, we need to be able to work with partial characters, which divide taxa into three separate groups: those possessing the character, those not possessing the character, and those for which we do not yet have conclusive information.
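A partial split of a set S of taxa models a partial character and is written A | B, where A and B are the two disjoint subsets of taxa separated by the character. Unlike an ordinary split, A and B need not together cover all of S; the taxa belonging to neither are those for which we lack conclusive information about the character.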
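We can assemble a collection of partial characters into a generalized partial character table C, in which each row is a character, each column is a taxon, and the symbol x marks an entry for which we do not have conclusive evidence about that taxon with respect to that character.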
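A quartet is a partial split in which both sides contain exactly two taxa. A quartet can be inferred from a partial split if each of the quartet's two pairs is contained in a different side of that split.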
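As a rough illustration of these definitions, the sketch below turns one row of a partial character table into a partial split and checks whether a given quartet can be inferred from it. It takes a quartet to be inferred from a partial split when each of its two pairs lies within a different side of the split. The function names `row_to_partial_split` and `is_inferred` are hypothetical, and the reading of `1`, `0`, and `x` as presence, absence, and missing information is an assumption based on the sample dataset.

```python
def row_to_partial_split(taxa, row):
    """Return (A, B): taxa marked '1' and taxa marked '0'; 'x' entries are left out."""
    if len(taxa) != len(row):
        raise ValueError("row length must match the number of taxa")
    ones = frozenset(t for t, c in zip(taxa, row) if c == "1")
    zeros = frozenset(t for t, c in zip(taxa, row) if c == "0")
    return ones, zeros


def is_inferred(quartet, split):
    """True if the two pairs of the quartet lie in different sides of the split."""
    (p, q), (a, b) = quartet, split
    p, q = set(p), set(q)
    if len(p) != 2 or len(q) != 2:
        return False  # not a quartet: each side needs exactly two taxa
    return (p <= a and q <= b) or (p <= b and q <= a)


taxa = "cat dog elephant ostrich mouse rabbit robot".split()
split = row_to_partial_split(taxa, "111x00x")

print(is_inferred(({"cat", "dog"}, {"mouse", "rabbit"}), split))    # True
print(is_inferred(({"cat", "robot"}, {"mouse", "rabbit"}), split))  # False
```

The second check fails because robot is marked `x` in this row, so it belongs to neither side of the split.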
Given: A partial character table C.
Return: The collection of all quartets that can be inferred from the splits corresponding to the underlying characters of C.
Sample Dataset
cat dog elephant ostrich mouse rabbit robot
01xxx00
x11xx00
111x00x
Sample Output
{elephant, dog} {rabbit, robot}
{cat, dog} {mouse, rabbit}
{mouse, rabbit} {cat, elephant}
{dog, elephant} {mouse, rabbit}
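One possible way to produce such a collection is sketched below: for every row, pair up each two-element subset of the taxa marked `1` with each two-element subset of the taxa marked `0`, and store the results in a set so that a quartet implied by several rows is reported once. This is a minimal sketch, not a reference solution. The names `quartets_from_table` and `format_quartet` are hypothetical, and it assumes that neither the order of the quartets nor the order within a quartet matters.

```python
from itertools import combinations


def quartets_from_table(taxa, rows):
    """Collect every distinct quartet inferable from the rows of a partial character table."""
    found = set()
    for row in rows:
        ones = sorted(t for t, c in zip(taxa, row) if c == "1")
        zeros = sorted(t for t, c in zip(taxa, row) if c == "0")
        for pair_a in combinations(ones, 2):
            for pair_b in combinations(zeros, 2):
                # A quartet is unordered, so store it as a set of two pairs.
                found.add(frozenset([frozenset(pair_a), frozenset(pair_b)]))
    return found


def format_quartet(quartet):
    """Render a quartet as '{a, b} {c, d}' with a deterministic ordering."""
    left, right = sorted(sorted(pair) for pair in quartet)
    return "{%s} {%s}" % (", ".join(left), ", ".join(right))


taxa = "cat dog elephant ostrich mouse rabbit robot".split()
rows = ["01xxx00", "x11xx00", "111x00x"]
for line in sorted(format_quartet(q) for q in quartets_from_table(taxa, rows)):
    print(line)
```

On the sample table, the first row contributes nothing because only one taxon is marked `1`, the second row yields one quartet, and the third yields three. That gives the same four quartets as the sample output, listed in a different order.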