Jan. 18, 2013, 4:24 p.m. by Rosalind Team
Topics: Alignment, Combinatorics
Beware of Alignment Inference
In “Edit Distance Alignment”, we introduced the concept of an alignment of two genetic strings having differing lengths with respect to edit distance. This provided us with a way of visualizing a specific collection of symbol substitutions, insertions, and deletions that could have taken place on the evolutionary path between the two strings.
However, simply finding one optimal alignment and declaring that it represents a true evolutionary history is a dangerous idea because the actual evolutionary picture may be suboptimal. For that matter, the collection of all optimal alignments may be huge, and the characteristics of these alignments could differ widely.
In order to begin analyzing the collection of optimal alignments for a pair of strings, the first question we will ask is simple: just how many optimal alignments exist?
Recall from “Edit Distance Alignment” that if
As a result, we obtain
Given: Two protein strings
Return: The total number of optimal alignments of
>Rosalind_78 PLEASANTLY >Rosalind_33 MEANLY
4
Why Are We Counting Modulo 134,217,727?
As a simple example, say that we attempt to count some statistic modulo 10. If computing the statistic requires us to multiply a collection of integers, and at any point we multiply by a multiple of 10, then the statistic will automatically become a multiple of 10 and thus congruent to 0 modulo 10. A similar issue can arise when we count a huge number modulo any composite number; however, if we count modulo a large prime number
$p$ (i.e., one without any divisors other than itself), then problems can only ever arise if when counting our statistic, we multiply by a multiple of$p$ .