Read Filtration by Quality solved by 1049

June 25, 2013, 8:05 p.m. by Rosalind Team

Topics: Bioinformatics Tools, NGS

A Good Sort

Filtering out poor-quality reads from a sequencing run can have multiple benefits. Not only is it likely to improve results, but it will generally reduce the time and memory overhead required for assembly or analysis.

Problem

Poor-quality reads can be filtered out using the FASTQ Quality Filter tool from the FASTX toolkit. A command-line version of FASTX can be downloaded for Linux or MacOS from its website. An online interface for the FASTQ Quality Filter is also available here within the Galaxy web platform.

Given: A quality threshold value $q$, percentage of bases $p$, and set of FASTQ entries.

Return: Number of reads in filtered FASTQ entries

Sample Dataset

20 90
@Rosalind_0049_1
GCAGAGACCAGTAGATGTGTTTGCGGACGGTCGGGCTCCATGTGACACAG
+
FD@@;C<AI?4BA:=>C<G=:AE=><A??>764A8B797@A:58:527+,
@Rosalind_0049_2
AATGGGGGGGGGAGACAAAATACGGCTAAGGCAGGGGTCCTTGATGTCAT
+
1<<65:793967<4:92568-34:.>1;2752)24')*15;1,.3*3+*!
@Rosalind_0049_3
ACCCCATACGGCGAGCGTCAGCATCTGATATCCTCTTTCAATCCTAGCTA
+
B:EI>JDB5=>DA?E6B@@CA?C;=;@@C:6D:3=@49;@87;::;;?8+

Sample Output

2

Please login to solve this problem.