ROSALIND | Phylogeny Comparison with Split Distance

Quantifying Binary Tree Comparison

We may often obtain two different phylogenies on the same collection of taxa from different sets of data. As a result, we would like to have a way of quantifying how much the two phylogenies differ. In the simplest case, we would like to compare the characters of two phylogenies.

Recall from “Counting Unrooted Binary Trees” that two unrooted binary trees are equivalent when they have the same set of splits; recall also (by extension of “Counting Phylogenetic Ancestors”) that any unrooted binary tree on $n$ taxa must have $n - 3$ nontrivial splits.

Problem

Define the split distance between two unrooted binary trees as the number of nontrivial splits contained in one tree but not the other.

Formally, if $s(T_1, T_2)$ denotes the number of nontrivial splits shared by unrooted binary trees $T_1$ and $T_2$, Then their split distance is $d_{\textrm{split}}(T_1, T_2) = 2(n-3) - 2s(T_1, T_2)$.

Given: A collection of at most 3,000 species taxa and two unrooted binary trees $T_1$ and $T_2$ on these taxa in Newick format.

Return: The split distance $d_{\textrm{split}}(T_1, T_2)$.

Sample Dataset

dog rat elephant mouse cat rabbit
(rat,(dog,cat),(rabbit,(elephant,mouse)));
(rat,(cat,dog),(elephant,(mouse,rabbit)));

Phylogeny Comparison with Split Distance solved by 234

Quantifying Binary Tree Comparison

Problem

Sample Dataset

Sample Output

Phylogeny Comparison with Split Distance solved by 234

Quantifying Binary Tree Comparison

Problem

Sample Dataset

Sample Output

Report a typo