Importance of Understanding Manual Analysis of Mass Spectrometry Data for Correct Use of Mass Spectrometric Database Searching Programs

In the late 1980s, the development of soft ionization methods, notably electrospray ionization (ESI) and matrix-assisted laser desorption/ionization (MALDI), made it possible to use the mass spectrometer to analyze large biomolecules such as peptides and proteins. Laboratories worldwide have since produced, and continue to produce, vast numbers of spectra that need to be analyzed. In response, several technology companies have developed algorithms for mass spectrometric database search programs that analyze the data fed to them automatically (Parc 2011).

Mass spectrometry is an analytical technique that measures the mass-to-charge ratio (m/z) of charged particles. By determining the masses of these particles, the elemental composition of the sample or molecule under examination can be determined. A mass spectrometer first produces ions from the substance being investigated, then separates them according to their mass-to-charge ratio, and finally records the relative abundance of each ionic species present (USP MS n.d., p. 95).
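To make the m/z relationship concrete: a molecule of neutral mass M that gains z protons appears at m/z = (M + z × m_p)/z, where m_p ≈ 1.007276 Da is the mass of a proton, so the same peptide can appear at several positions on the m/z axis depending on its charge. The Python sketch below assumes positive-mode ions formed by protonation, as is typical for peptides; the function names are illustrative only.

```python
# Mass of a proton in daltons (Da); assumes ions are formed by protonation.
PROTON_MASS = 1.007276


def mz_from_neutral_mass(neutral_mass: float, charge: int) -> float:
    """m/z of the ion formed when a neutral molecule gains `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def neutral_mass_from_mz(mz: float, charge: int) -> float:
    """Invert the relation above to recover the neutral mass from an observed m/z."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mz * charge - charge * PROTON_MASS


if __name__ == "__main__":
    # A hypothetical peptide with a neutral monoisotopic mass of 1000.0 Da
    # appears at a different m/z for each charge state.
    for z in (1, 2, 3):
        print(f"z={z}: m/z = {mz_from_neutral_mass(1000.0, z):.4f}")
```

Reading a spectrum manually therefore includes working out an ion's charge (for example from the spacing of its isotope peaks, which is about 1/z) before its mass can be used in a database search.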

The mass spectrometer has three essential components, each with its own role. The ion source produces gas-phase ions from the sample being studied. The analyzer separates these ions according to their m/z. The detector registers the resolved ionic species and records their relative abundance. A computer controls the system; it also acquires and processes the data and compares the resulting spectra with the relevant reference libraries (USP MS n.d., p. 95).
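Library comparison of this kind is often done by scoring how similar two peak patterns are. The Python sketch below uses one common choice, the cosine similarity of binned spectra; the bin width, function names and library entries are assumptions for illustration, not the method used by any particular instrument's software.

```python
import math
from collections import defaultdict

Spectrum = list[tuple[float, float]]  # (m/z, intensity) pairs


def binned(spectrum: Spectrum, bin_width: float = 1.0) -> dict[int, float]:
    """Sum intensities into fixed-width m/z bins so nearly equal m/z values line up."""
    bins = defaultdict(float)
    for mz, intensity in spectrum:
        bins[int(mz // bin_width)] += intensity
    return bins


def cosine_similarity(a: Spectrum, b: Spectrum, bin_width: float = 1.0) -> float:
    """Cosine of the angle between two binned spectra (1.0 means identical peak pattern)."""
    va, vb = binned(a, bin_width), binned(b, bin_width)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_library_match(query: Spectrum, library: dict[str, Spectrum]) -> tuple[str, float]:
    """Return the (name, score) of the library spectrum most similar to the query."""
    return max(
        ((name, cosine_similarity(query, ref)) for name, ref in library.items()),
        key=lambda item: item[1],
    )


if __name__ == "__main__":
    # Hypothetical library and query spectrum.
    library = {
        "compound_A": [(100.0, 50.0), (150.0, 100.0)],
        "compound_B": [(120.0, 80.0), (200.0, 20.0)],
    }
    print(best_library_match([(100.2, 45.0), (150.1, 110.0)], library))
```

Fixed-width binning is the simplest way to align peaks, but two peaks on either side of a bin boundary will not be matched, which is one reason a human check of borderline library hits remains useful.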

Mass spectrometry has become the standard technology for identifying proteins. It is nonetheless important that analysts are able to interpret the data manually, because this makes database searches more accurate and ensures that search results are judged by consistent standards. As instruments are developed further, the sensitivity, throughput and accuracy of analysis improve as well. The emphasis is therefore no longer on how fast an instrument can produce results, but on the quality of the analysis and interpretation and on the generation of confident protein assignments.

The ability to generate vast numbers of spectra every day, and the demand for their analysis, has made purely manual analysis almost obsolete: manual methods cannot keep up with the volume of data. At the same time, there is an equally urgent need to transfer the expertise of human analysts into MS interpretation algorithms so that it can be used in database searching. These algorithms must be as complex and sophisticated as the data they process; in effect, the capabilities of human experts need to be translated into computer algorithms (Chamrad et al 1995, pp. 1014-1022). Peptide mass fingerprinting (PMF) software, in which the measured masses of a protein's digestion products are compared with those predicted from database sequences, should be improved with reference to the practice of human experts, so that it provides confident protein identifications at faster rates. Achieving this requires automated calibration, together with peak rejection and meta-searches that draw on several different PMF search engines.
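The core comparison in PMF can be sketched as follows. This Python example makes several simplifying assumptions: a basic trypsin rule (cleave after K or R, but not before P), no missed cleavages or modifications, singly protonated [M+H]+ ions, and a score that simply counts matched peaks. Real search engines use statistically based scores; the function names and protein sequence here are hypothetical.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide for the free termini
PROTON = 1.007276   # added once for a singly protonated ion


def tryptic_peptides(protein: str) -> list[str]:
    """Simplified trypsin rule: split after K or R unless followed by P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def mh_mass(peptide: str) -> float:
    """Monoisotopic [M+H]+ mass of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def pmf_score(protein: str, observed_mh: list[float], tol_ppm: float = 20.0) -> int:
    """Count observed peaks matching any theoretical tryptic peptide within tol_ppm."""
    theoretical = [mh_mass(p) for p in tryptic_peptides(protein)]
    return sum(
        1 for obs in observed_mh
        if any(abs(obs - t) / t * 1e6 <= tol_ppm for t in theoretical)
    )


if __name__ == "__main__":
    protein = "GASPVKTCLRNDQEKMHFRYWGK"  # hypothetical sequence
    print(tryptic_peptides(protein))
    # Two peaks that match predicted peptides plus one unexplained peak.
    observed = [mh_mass("TCLR"), mh_mass("MHFR"), 999.5]
    print(pmf_score(protein, observed))
```

Every simplification above (ignored missed cleavages, modifications or contaminant peaks) is a place where an automated score can mislead, and where an analyst who understands the underlying arithmetic can spot the problem.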

Automatic calibration works by matching peaks in a spectrum to known reference masses and correcting the mass scale accordingly, so its success depends on that spectral information. Peak rejection filters the data so that only relevant information is used: signals that are not related to the analyzed protein are excluded by means of dataset-dependent lists generated for this purpose. In a meta-search, several relevant search engines are queried and their results are combined into a single meta-score (Chamrad et al 1995, pp. 1014-1022). This score can then be compared with expected values, so that it serves as a statistical measure of confidence in the identification.
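The following Python sketch illustrates the idea behind automatic calibration and peak rejection. It is a generic stand-in rather than the implementation described by Chamrad et al.: calibration is modelled as a least-squares straight-line correction fitted to peaks matched to known reference masses, and peak rejection as an exclusion list of m/z values that recur in a large fraction of the spectra in a dataset, on the assumption that such recurring signals come from contaminants rather than from the protein. All names, thresholds and masses are hypothetical, and the meta-score is omitted because its combining rule is not specified here.

```python
from collections import Counter


def fit_linear_calibration(observed: list[float], reference: list[float]) -> tuple[float, float]:
    """Least-squares fit of reference = slope * observed + intercept over matched peaks."""
    n = len(observed)
    if n < 2 or n != len(reference):
        raise ValueError("need at least two matched (observed, reference) pairs")
    mean_o = sum(observed) / n
    mean_r = sum(reference) / n
    sxx = sum((o - mean_o) ** 2 for o in observed)
    if sxx == 0:
        raise ValueError("observed reference peaks must not all be identical")
    sxy = sum((o - mean_o) * (r - mean_r) for o, r in zip(observed, reference))
    slope = sxy / sxx
    return slope, mean_r - slope * mean_o


def recalibrate(peaks: list[float], slope: float, intercept: float) -> list[float]:
    """Apply the fitted correction to every peak in a spectrum."""
    return [slope * mz + intercept for mz in peaks]


def exclusion_list(dataset: list[list[float]], min_fraction: float = 0.5,
                   decimals: int = 1) -> set[float]:
    """m/z values (rounded to `decimals`) present in at least `min_fraction` of the spectra."""
    counts = Counter()
    for peaks in dataset:
        # Use a set so each value is counted at most once per spectrum.
        counts.update({round(mz, decimals) for mz in peaks})
    threshold = min_fraction * len(dataset)
    return {mz for mz, n in counts.items() if n >= threshold}


def reject_peaks(peaks: list[float], excluded: set[float], decimals: int = 1) -> list[float]:
    """Drop peaks whose rounded m/z is on the exclusion list."""
    return [mz for mz in peaks if round(mz, decimals) not in excluded]


if __name__ == "__main__":
    # Hypothetical reference masses and the slightly shifted values measured for them.
    slope, intercept = fit_linear_calibration([1000.10, 2000.20], [1000.00, 2000.00])
    print(recalibrate([1500.15], slope, intercept))
    # Hypothetical dataset in which a peak near m/z 842.5 recurs in every spectrum.
    dataset = [[842.51, 1000.0], [842.51, 1200.3], [842.49, 1500.7]]
    excluded = exclusion_list(dataset)
    print(excluded, reject_peaks(dataset[0], excluded))
```

Rounding m/z values to fixed decimals is a crude way of grouping peaks; values that straddle a rounding boundary will be counted separately, so the thresholds in such a step need to be chosen with an understanding of the instrument's accuracy.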

Various algorithms are used within mass spectrometric database searching programs to improve their performance. Some make use of de novo peptide sequencing by tandem mass spectrometry. In this approach, a de novo sequencing algorithm produces a short list of candidate sequences, which then serve as queries in subsequent homology-based database searches (Taylor & Johnson 1997, pp. 1067-1075). The algorithm applies graph theory in a manner similar to previously described sequencing programs. A common problem with this approach is that several candidate sequences can be consistent with a single spectrum. Reducing this ambiguity requires input data that are accurate and as unambiguous as possible, and manual analysis is recommended to confirm such results.
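The graph idea can be illustrated with a deliberately idealized "spectrum" in which every peak is a prefix residue mass (an ideal fragment ladder shifted to start at zero), with no noise or missing peaks. Nodes are peak masses, an edge joins two nodes whose mass difference matches an amino acid residue, and every path from the first to the last node is a candidate sequence. This Python sketch is not Taylor and Johnson's algorithm; it only shows why one spectrum can support several sequences: two glycines (2 × 57.02146 Da) and one asparagine (114.04293 Da) differ by only about 0.00001 Da, and leucine and isoleucine have identical masses. The tolerance and example ladder are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def spectrum_graph(peaks: list[float], tol: float = 0.01):
    """Build a graph whose edges join peaks differing by one residue mass (within tol Da)."""
    nodes = sorted(peaks)
    edges = {i: [] for i in range(len(nodes))}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            gap = nodes[j] - nodes[i]
            for aa, mass in RESIDUE_MASS.items():
                if abs(gap - mass) <= tol:
                    edges[i].append((j, aa))
    return nodes, edges


def candidate_sequences(nodes, edges) -> list[str]:
    """Enumerate every path from the lowest to the highest node as a residue string."""
    results = []

    def walk(i: int, prefix: str) -> None:
        if i == len(nodes) - 1:
            results.append(prefix)
            return
        for j, aa in edges[i]:
            walk(j, prefix + aa)

    walk(0, "")
    return sorted(results)


if __name__ == "__main__":
    # Idealized prefix-mass ladder for the hypothetical peptide GGK.
    ladder = [0.0, 57.02146, 114.04292, 242.13788]
    nodes, edges = spectrum_graph(ladder)
    print(candidate_sequences(nodes, edges))
```

Even this four-peak example yields two sequences, GGK and NK. The number of paths can grow exponentially with spectrum size, so practical tools score and rank paths instead of listing them all, and a short candidate list still needs judgement to interpret.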

Manual analysis is also important because certain proteins present particular challenges, especially when they occur in complex mixtures, for instance wheat gluten proteins such as the gliadins. These proteins are difficult to distinguish from one another because they share similar, highly repetitive sequences and motifs. Improvements in analytical tools and instruments should yield more distinctive data which, when entered into search databases, lead to unambiguous identifications (Vensel et al 2011).
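Why repetitive sequences cause trouble can be seen by mapping each identified peptide back to the proteins that contain it. A peptide found in only one protein is evidence for that protein specifically; a peptide shared by several proteins cannot tell them apart. The sequences in this Python sketch are made up for illustration (loosely imitating the proline- and glutamine-rich character of gliadins) and are not real gluten protein sequences; exact substring matching is a simplification.

```python
def peptide_to_proteins(peptides: list[str], proteins: dict[str, str]) -> dict[str, list[str]]:
    """For each peptide, list (sorted) the names of proteins whose sequence contains it."""
    return {
        pep: sorted(name for name, seq in proteins.items() if pep in seq)
        for pep in peptides
    }


def uniquely_supported(mapping: dict[str, list[str]]) -> set[str]:
    """Proteins backed by at least one peptide that maps to them alone."""
    return {names[0] for names in mapping.values() if len(names) == 1}


if __name__ == "__main__":
    # Hypothetical repetitive sequences.
    proteins = {
        "protein_1": "PQQPFPQQPQQPFPQQAGK",
        "protein_2": "PQQPFPQQPYPQSGK",
    }
    shared_only = peptide_to_proteins(["PQQPFPQQ"], proteins)
    print(shared_only, uniquely_supported(shared_only))   # shared peptide: no unique evidence
    with_unique = peptide_to_proteins(["PQQPFPQQ", "PFPQQAGK", "PYPQSGK"], proteins)
    print(uniquely_supported(with_unique))
```

When only shared peptides are detected, a search program can at best report a group of indistinguishable proteins; deciding how to report such a group is a judgement that depends on understanding the underlying data.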

High-throughput proteomic data often include a substantial number of spectra that contain non-peptide ions or are of very poor quality, which makes high-quality peptide identification harder to achieve. Some peptides cannot be identified at all, while others are identified as false positives. The most effective way to improve the identification process is to remove the poor-quality spectra that confuse it. A dynamic noise level algorithm can screen peptide MS/MS data, separating usable tandem mass spectra from the poor-quality spectra produced by poorly performed shotgun proteomic experiments (Xu & Freitas 2010). The algorithm determines the noise level independently for each spectrum within a set of tandem mass spectrometric data. Removing the unusable spectra improves the quality of database searches and reduces their computational cost. In tests of this algorithm, about 89% of the spectra that did not match peptides were removed, at the cost of losing about 3.6% of the spectra that yielded positive matches.
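The per-spectrum idea, and the trade-off expressed by the 89% and 3.6% figures, can be sketched in Python. This is a simplified stand-in, not Xu and Freitas's algorithm: here each spectrum's noise level is assumed to be its median peak intensity, and a spectrum is kept only if enough peaks rise a chosen factor above that level. The thresholds and function names are hypothetical.

```python
from statistics import median


def noise_level(intensities: list[float]) -> float:
    """Per-spectrum noise estimate; here simply the median peak intensity (a stand-in)."""
    return median(intensities)


def passes_screen(intensities: list[float], snr: float = 3.0, min_signal_peaks: int = 5) -> bool:
    """Keep a spectrum only if enough peaks rise at least `snr` times above its own noise."""
    if not intensities:
        return False
    threshold = snr * noise_level(intensities)
    signal_peaks = sum(1 for x in intensities if x >= threshold)
    return signal_peaks >= min_signal_peaks


def screening_rates(spectra: list[list[float]], identified: list[bool],
                    **kwargs) -> tuple[float, float]:
    """Return (share of unidentified spectra removed, share of identified spectra lost)."""
    removed_bad = lost_good = n_bad = n_good = 0
    for intensities, ok in zip(spectra, identified):
        kept = passes_screen(intensities, **kwargs)
        if ok:
            n_good += 1
            lost_good += int(not kept)
        else:
            n_bad += 1
            removed_bad += int(not kept)
    return (removed_bad / n_bad if n_bad else 0.0,
            lost_good / n_good if n_good else 0.0)


if __name__ == "__main__":
    good = [10.0] * 10 + [100.0] * 6   # hypothetical spectrum with clear fragment peaks
    poor = [10.0] * 15 + [50.0]        # hypothetical spectrum that is mostly noise
    print(passes_screen(good), passes_screen(poor))
    print(screening_rates([good, poor, good], [True, False, True]))
```

Raising `snr` or `min_signal_peaks` removes more poor spectra but also discards more useful ones, which is exactly the trade-off reflected in the reported removal and loss rates.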

Other database search algorithms have been developed to identify intact cross-links in proteins and peptides from tandem mass spectrometric data. Such an algorithm can identify cross-links confidently and accurately (Xu et al 2010, pp. 3384-3393), and it offers an additional approach to identifying them. Tandem mass spectrometry is increasingly used to analyze complex protein samples in high throughput (Nesvizhskii & Aebersold 2004, p. 173). With more laboratories producing spectra for analysis, maintaining consistency, transparency and objectivity in the analysis of experimental data has become a major problem.
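For the simplest kind of cross-link, a disulfide bond between two cysteine residues, the mass of the intact linked species is the sum of the two peptide masses minus two hydrogen atoms (about 2 × 1.007825 Da). The Python sketch below uses that relation for a brute-force search over pairs of cysteine-containing peptides; it is not the algorithm of Xu et al., and the peptides and tolerance are hypothetical. Because the number of pairs grows with the square of the number of candidate peptides, exhaustive pairing quickly becomes expensive, which is part of why dedicated search algorithms are needed.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
HYDROGEN = 1.007825  # hydrogen atom; two are lost when a disulfide bond forms


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of a linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def disulfide_mass(a: str, b: str) -> float:
    """Neutral mass of two peptides joined by one disulfide bond."""
    return peptide_mass(a) + peptide_mass(b) - 2 * HYDROGEN


def find_disulfide_pairs(peptides: list[str], precursor_mass: float,
                         tol_ppm: float = 10.0) -> list[tuple[str, str]]:
    """Brute-force search for cysteine-containing peptide pairs matching a precursor mass."""
    with_cys = [p for p in peptides if "C" in p]
    hits = []
    for a, b in combinations_with_replacement(with_cys, 2):
        if abs(disulfide_mass(a, b) - precursor_mass) / precursor_mass * 1e6 <= tol_ppm:
            hits.append((a, b))
    return hits


if __name__ == "__main__":
    candidates = ["ACK", "GCR", "LLK"]  # hypothetical digest peptides
    precursor = disulfide_mass("ACK", "GCR")
    print(find_disulfide_pairs(candidates, precursor))
```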

Recent and ongoing advances in MS have made more data available for analysis, because instrumental improvements allow biochemists to carry out analyses ever faster and more accurately. Previously, the development of technology and software for the initial analysis and interpretation of data was the domain of experts in mass spectrometry (Baldwin 2004, p. 1). However, the demand for analysis can no longer be met by human resources alone, and other forms of expertise are needed; this is where computer systems come in. As more researchers become conversant with mass spectrometry, database search programs need to be built on the understanding of human specialists, while making improvements that would otherwise be impossible for humans.

List of References

Baldwin, MA 2004, Protein identification by mass spectrometry: issues to be considered, Molecular and Cellular Proteomics, vol. 3, no. 1, p. 1, viewed 29 May 2011 <http://www.mcponline.org/content/3/1/1.full.pdf>

Chamrad, DC, Koerting, G, Gobom, J, Thiele, H, Klose, J, Meyer, HE & Blueggel, M 1995, Interpretation of mass spectrometry data for high-throughput proteomics, Analytical and Bioanalytical Chemistry, vol. 376, no. 7, pp. 1014-1022, viewed 29 May 2011 <http://www.springerlink.com/content/gm5ptqu001uulc90/>

Nesvizhskii, AI & Aebersold, R 2004, Analysis, statistical validation and dissemination of large-scale proteomics datasets generated by tandem MS, Drug Discovery Today, vol. 9, no. 4, p. 173, viewed 29 May 2011 <http://www.proteomecenter.org/PDFs/Nesvi_DDT.pdf>

Parc 2011, Mass spectrometry data analysis, Parc, viewed 29 May 2011 <http://www.parc.com/work/focus-area/mass-spectra-analysis/>

Sparkman, OD 2000, Mass spectrometry desk reference, Global View Pub, Pittsburgh.

Taylor, JA & Johnson, RS 1997, Sequence database searches via de novo peptide sequencing by tandem mass spectrometry, Rapid Communications in Mass Spectrometry, vol. 11, no. 9, pp. 1067-1075, viewed 29 May 2011 <http://www.ncbi.nlm.nih.gov/pubmed/9204580>

USP MS n.d., Mass spectrometry, p. 95, viewed 29 May 2011 <http://www.forumsci.co.il/HPLC/usp-ms-2spaltet.pdf>

Vensel, WH, Dupont, FM, Sloane, S & Altenbach, SB 2011, Effect of cleavage enzyme, search algorithm and decoy database on mass spectrometric identification of wheat gluten proteins, Phytochemistry, viewed 29 May 2011 <http://www.ncbi.nlm.nih.gov/pubmed/21292286>

Xu, H & Freitas, MA 2010, A dynamic noise level algorithm for spectral screening of peptide MS/MS spectra, BMC Bioinformatics, vol. 11, article 436, viewed 29 May 2011 <http://www.ncbi.nlm.nih.gov/pubmed/2073186>

Xu, H, Hsu, PH, Zhang, L, Tsai, MD & Freitas, MA 2010, Database search algorithm for identification of intact cross-links in proteins and peptides using tandem mass spectrometry, Journal of Proteome Research, vol. 9, no. 2, pp. 3384-3393, viewed 29 May 2011 <http://www.ncbi.nlm.nih.gov/pubmed/20469931>