July 29, 2015, 1:11 a.m. by Rosalind Team
Given an arbitrary collection of k-mers Patterns (where some k-mers may appear multiple times), we define CompositionGraph(Patterns) as a graph with |Patterns| isolated edges. Every edge is labeled by a k-mer from Patterns, and the starting and ending nodes of an edge are labeled by the prefix and suffix of the k-mer labeling that edge. We then define the de Bruijn graph of Patterns, denoted DeBruijn(Patterns), by gluing identically labeled nodes in CompositionGraph(Patterns), which yields the following algorithm.
Construct the de Bruijn graph from a collection of k-mers.
Given: A collection of k-mers Patterns.
Return: The de Bruijn graph DeBruijn(Patterns), in the form of an adjacency list.
GAGG CAGG GGGG GGGA CAGG AGGG GGAG
AGG -> GGG CAG -> AGG,AGG GAG -> AGG GGA -> GAG GGG -> GGA,GGG