July 2, 2012, midnight by Rosalind Team
Topics: Alignment, Dynamic Programming
Aligning Similar Substrings
Whereas global alignment (see “Global Alignment with Scoring Matrix”) can be helpful for comparing genetic strings of similar length that resemble each other, often we will be presented with strings that are mostly dissimilar except for some unknown region of the strings, which may represent a shared gene. To find such genes, we need to modify global alignment to instead search for shared motifs in the form of locally similar regions (recall “Finding a Shared Motif” and “Finding a Shared Spliced Motif”).
Using global alignment often fails to find shared motifs hidden in larger strings because (especially if the similar region is found on different ends of the string) aligning the strings causes gap penalties to rack up.
If we are only interested in comparing the regions of similarity, then we would like to have some way of disregarding the parts of the strings that don't resemble each other. The way to do this is to produce alignment scores for all possible pairs of substrings.
A local alignment of two strings
Given: Two protein strings
Return: A maximum alignment score along with substrings
>Rosalind_80 MEANLYPRTEINSTRING >Rosalind_21 PLEASANTLYEINSTEIN
23 LYPRTEINSTRIN LYEINSTEIN