ROSALIND | Glossary

A character is some feature, either physical or genetic, that divides a collection of taxa into two groups. The ultimate goal is to apply characters to the construction of a phylogeny, where taxa are represented as the leaves of a tree.

There are two common ways of encoding a given character $C$ dividing a collection of $n$ taxa. $C$ can be written in split notation as $S \mid S^{\textrm{c}}$, where $S$ is a subset of our taxa and $S^{\textrm{c}}$ is the set complement of $S$. Removing an edge from a tree divides its leaves into two disjoint sets $S$ and $S^{\textrm{c}}$, so that we can establish a correspondence between characters and edges of the phylogeny: specifically, we may assign each character to the edge that its split notation implies.
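The correspondence between edges and splits can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tree is unrooted and stored as an adjacency dictionary, and the taxa names, the internal node labels `x` and `y`, and the helper names `leaves_on_side` and `split_from_edge` are all hypothetical, not part of the glossary entry.

```python
def leaves_on_side(adj, start, blocked):
    """Collect the leaves reachable from `start` without crossing to `blocked`."""
    seen = {start, blocked}
    stack = [start]
    found = set()
    while stack:
        node = stack.pop()
        if len(adj[node]) == 1:  # a node with a single neighbor is a leaf
            found.add(node)
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return found


def split_from_edge(adj, u, v):
    """Return (S, S^c): the leaves on each side once edge (u, v) is removed."""
    return leaves_on_side(adj, u, v), leaves_on_side(adj, v, u)


# Hypothetical unrooted tree: "x" and "y" are internal nodes, the rest are taxa.
adj = {
    "cat": ["x"], "dog": ["x"],
    "x": ["cat", "dog", "y"],
    "y": ["x", "mouse", "rat"],
    "mouse": ["y"], "rat": ["y"],
}

s, s_c = split_from_edge(adj, "x", "y")
print(sorted(s), "|", sorted(s_c))
```

Removing the internal edge between `x` and `y` separates `cat` and `dog` from `mouse` and `rat`, which is the split that a character assigned to that edge would encode.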

The second notation for $C$ assumes that we have ordered our $n$ taxa, after which $C$ may be written in array notation as an array $A$ of length $n$ in which $A[i]$ is 1 if the $i$th taxon belongs to $S$ and 0 if it belongs to $S^{\textrm{c}}$. Given a collection of arrays from a number of different characters, we may combine the arrays into a matrix called a character table. The creation of a phylogeny from a character table is an important algorithmic problem.
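As a minimal sketch of array notation, the following Python converts splits into 0/1 arrays and stacks them into a character table. The taxa, the example splits, and the function names `to_array` and `character_table` are assumptions made for illustration. The table here stores one row per character, which is one possible layout.

```python
def to_array(taxa, s):
    """Encode the split S | S^c as a 0/1 array over the ordered taxa."""
    return [1 if taxon in s else 0 for taxon in taxa]


def character_table(taxa, splits):
    """Stack one array per character into a matrix (one row per character)."""
    return [to_array(taxa, s) for s in splits]


# Hypothetical ordered taxa and two characters, each given by its subset S.
taxa = ["cat", "dog", "mouse", "rat"]
splits = [{"cat", "dog"}, {"rat"}]

table = character_table(taxa, splits)
for row in table:
    print(row)
```

Because $S \mid S^{\textrm{c}}$ and $S^{\textrm{c}} \mid S$ describe the same division of the taxa, flipping every bit of a row yields another encoding of the same character.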
