July 31, 2012, midnight by Sonya Alexandrova
Topics: Computational Mass Spectrometry
Searching the Protein Database
Many proteins have already been identified for a wide variety of organisms. Accordingly, there are a large number of protein databases available, and so the first step after creating a mass spectrum for an unidentified protein is to search through these databases for a known protein with a highly similar spectrum. In this manner, many similar proteins found in different species have been identified, which aids researchers in determining protein function.
In “Comparing Spectra with the Spectral Convolution”, we introduced the spectral convolution and used it to measure the similarity of simplified spectra. In this problem, we would like to extend this idea to find the most similar protein in a database to a spectrum taken from an unknown protein. Our plan is to use the spectral convolution to find the largest possible number of masses that each database protein shares with our candidate protein after shifting, and then select the database protein having the largest such number of shared masses.
The complete spectrum of a weighted string
Given: A positive integer
Return: The maximum multiplicity of
4 GSDMQS VWICN IASWMQS PVSMGAD 445.17838 115.02694 186.07931 314.13789 317.1198 215.09061
3 IASWMQS