Feb. 1, 2013, 3:37 p.m. by Rosalind Team
Topics: Alignment, Dynamic Programming
Overlapping Reads with Errors
As also mentioned in “Error Correction in Reads”, the sequencing machines that identify reads can make errors. However, the problem that we considered in “Genome Assembly as Shortest Superstring” assumed that all reads are error-free.
Thus, rather than trying to overlap reads exactly, we will instead do so approximately. The key to do this is to move toward methods that incorporate alignments. Yet neither global nor local alignment is appropriate for this task. Global alignment will attempt to align the entire reads, when we know that only the overlapping parts of the reads are relevant. For that matter, we may identify an optimal local alignment that does not correspond to an overlap.
As a result, we need a specific type of local alignment that aligns only the overlapping parts of two strings.
An overlap alignment between two strings
The term "overlap alignment" has also been used to describe what Rosalind defines as a semiglobal alignment. See “Semiglobal Alignment” for details.
Given: Two DNA strings
Return: The score of an optimal overlap alignment of
>Rosalind_54 CTAAGGGATTCCGGTAATTAGACAG >Rosalind_45 ATAGACCATATGTCAGTGACTGTGTAA
1 ATTAGAC-AG AT-AGACCAT
Citation
This problem follows Jones & Pevzner, An Introduction to Bioinformatics Algorithms, problem 6.22.