Novel peptide identification from tandem mass spectra using ESTs and sequence database compression

Nathan J Edwards

doi:10.1038/msb4100142

Mol Syst Biol. 2007;3:102. doi: 10.1038/msb4100142. Epub 2007 Apr 17.

Author

Nathan J Edwards¹

Affiliation

¹ Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA. nedwards@umiacs.umd.edu

Abstract

Peptide identification by tandem mass spectrometry is the dominant proteomics workflow for protein characterization in complex samples. Traditional search engines, which match peptide sequences with tandem mass spectra to identify the samples' proteins, use protein sequence databases to suggest peptide candidates for consideration. Although the acquisition of tandem mass spectra is not biased toward well-understood protein isoforms, this computational strategy is failing to identify peptides from alternative splicing and coding SNP protein isoforms despite the acquisition of good-quality tandem mass spectra. We propose, instead, that expressed sequence tags (ESTs) be searched. Ordinarily, such a strategy would be computationally infeasible due to the size of EST sequence databases; however, we show that a sophisticated sequence database compression strategy, applied to human ESTs, reduces the sequence database size approximately 35-fold. Once compressed, our EST sequence database is comparable in size to other commonly used protein sequence databases, making routine EST searching feasible. We demonstrate that our EST sequence database enables the discovery of novel peptides in a variety of public data sets.
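The intuition behind compressing an EST database for peptide search can be illustrated with a small sketch. ESTs overlap heavily, so the same short peptide candidates recur many times, and a search engine only needs each distinct candidate once. The code below is not the compression method used in the paper; it is a minimal illustration, assuming already-translated toy reads and a hypothetical window length `k`, that counts how many length-`k` windows are redundant. All names (`TOY_READS`, `kmers`, `distinct_kmers`, `redundancy`) are made up for this example.

```python
from typing import Iterable, Iterator, List, Set


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every length-k substring (window) of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def distinct_kmers(seqs: Iterable[str], k: int) -> Set[str]:
    """Collect the set of distinct length-k windows across all sequences."""
    seen: Set[str] = set()
    for s in seqs:
        seen.update(kmers(s, k))
    return seen


def redundancy(seqs: List[str], k: int) -> float:
    """Ratio of total windows to distinct windows (1.0 means no redundancy)."""
    total = sum(max(len(s) - k + 1, 0) for s in seqs)
    distinct = len(distinct_kmers(seqs, k))
    return total / distinct if distinct else 0.0


# Hypothetical amino-acid reads: three overlapping copies of one region
# plus a single-residue variant (A -> V), standing in for a coding SNP.
TOY_READS = [
    "MKTAYIAKQR",
    "KTAYIAKQRQ",
    "TAYIAKQRQI",
    "MKTAYIVKQR",
]

if __name__ == "__main__":
    k = 5  # assumed window length, chosen small for the toy data
    print("distinct windows:", len(distinct_kmers(TOY_READS, k)))
    print("redundancy ratio:", redundancy(TOY_READS, k))
```

In this toy, the overlapping reads contribute mostly repeated windows, while the variant read contributes a few new ones that a protein-database search would miss. A real compression scheme must also keep enough context to map each retained peptide back to its source sequences, which this sketch does not attempt.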

Publication types

Congress
Research Support, N.I.H., Extramural

MeSH terms

Algorithms
Databases, Genetic*
Expressed Sequence Tags*
Humans
Peptides / chemistry*
Polymorphism, Single Nucleotide
Proteomics
RNA, Messenger / genetics
Tandem Mass Spectrometry / methods*

Substances

Peptides
RNA, Messenger
