May 11, 2013, 10:48 a.m. by mariel.21
Topics: String Algorithms
The Case of Mutated Repeats
In “Finding a Motif with Modifications”, we considered a problem in which we were given a motif and a long string (perhaps representing a genome), and we aimed to find the "closest" substring of the long string to the motif. In that problem, "closest" was defined as a minimum with respect to edit distance.
Yet there may be multiple substring candidates from the genome that achieve the minimum distance to the motif; this situation might occur in practice when the motif forms a repeat that occurs multiple times with variations deriving from mutations.
In this problem, we would like to find all substrings of a genome that are within a certain fixed distance of the desired motif.
Given: A positive integer
Return: All substrings
2 ACGTAG ACGGATCGGCATCGT
1 4 1 5 1 6