Development of new computational methodologies applied to mass spectrometry and to the large-scale analysis of data generated in proteomics projects using second-generation techniques

  1. Navarro, Pedro J.
Supervised by:
  1. Jesús Vázquez (Supervisor)

Defended at: Universidad Autónoma de Madrid

Defense date: 12 March 2010

Examination committee:
  1. José María Carazo García (Chair)
  2. Concepcion Gil Garcia (Secretary)
  3. Francisco Zafra (Member)
  4. Benito Cañas Montalvo (Member)
  5. Paulino Gómez Puertas (Member)
  6. Joaquín Abián (Member)
  7. Fernando J. Corrales (Member)

Type: Thesis

Abstract

High-throughput identification of peptides in databases from tandem mass spectrometry data is a key technique in modern proteomics. In this work, we introduce a novel indicator, the probability ratio, which makes optimal use of the statistical information provided by the first and second best scores obtained by the database search engine SEQUEST. The probability ratio is a non-parametric and robust indicator that makes it unnecessary to classify spectra according to parameters such as charge state, and that yields a peptide identification performance, measured by false discovery rates, better than that obtained by other empirical statistical approaches. The indicator can also be modified to take into account the isoelectric point information obtained after IEF peptide fractionation. The probability ratio also compares favorably with statistical probability indicators obtained by constructing single-spectrum SEQUEST score distributions. The robustness, conceptual simplicity and ease of automation of the probability ratio algorithm make it a very attractive alternative for determining peptide identification confidences and error rates in high-throughput experiments.

On the other hand, statistical models for the analysis of protein expression changes by stable isotope labeling are still poorly developed, and large-scale test experiments to validate the null hypothesis are lacking. In this work we analyze several large-scale, null-hypothesis quantitative proteomics experiments performed using different isotope labeling approaches and mass spectrometers. Current statistical models based on normality and variance homogeneity were found unsuitable to describe the null hypothesis in all the situations tested, producing false expression changes. A random-effects model was then developed that includes four different sources of variance, at the spectrum-fitting, scan, peptide and protein levels.
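As a rough illustration of how variance components at successive levels can be combined, the following Python sketch averages log2-ratios with inverse-variance weights and adds an extra variance term at each level. It is not the model developed in the thesis: the function name, the two-step layout (scan to peptide, peptide to protein), the variance constants and all numeric values are hypothetical, and the spectrum-fitting level is represented only by the per-scan variances supplied as input.

```python
from typing import Sequence, Tuple


def weighted_mean(
    values: Sequence[float],
    variances: Sequence[float],
    level_variance: float = 0.0,
) -> Tuple[float, float]:
    """Inverse-variance weighted mean of `values`.

    Each element's own variance is inflated by `level_variance`, a shared
    variance component assumed for the level being averaged over.
    Returns (mean, variance of the mean). All variances must be positive.
    """
    if not values or len(values) != len(variances):
        raise ValueError("values and variances must be non-empty and equal in length")
    weights = [1.0 / (v + level_variance) for v in variances]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total
    return mean, 1.0 / total


# Hypothetical scan-level log2-ratios and variances for two peptides of one protein.
scans_pep_a = ([0.10, 0.30], [0.04, 0.04])
scans_pep_b = ([-0.20, 0.00, 0.20], [0.09, 0.09, 0.09])

SCAN_VAR = 0.01     # assumed scan-level variance component
PEPTIDE_VAR = 0.02  # assumed peptide-level variance component

# Scans -> peptides, then peptides -> protein.
pep_a = weighted_mean(*scans_pep_a, level_variance=SCAN_VAR)
pep_b = weighted_mean(*scans_pep_b, level_variance=SCAN_VAR)
protein = weighted_mean(
    [pep_a[0], pep_b[0]], [pep_a[1], pep_b[1]], level_variance=PEPTIDE_VAR
)

print(f"peptide A: {pep_a[0]:.3f}, peptide B: {pep_b[0]:.3f}, protein: {protein[0]:.3f}")
```

In a layout like this, a precisely measured scan or peptide pulls the average more strongly than a noisy one, while the level-specific variance term limits how much any single element can dominate.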
With the new model, the number of outliers at the scan and peptide levels and the number of false expression changes were negligible in all the cases analyzed. All three quantitation levels passed normality tests under the new model, making it the first integrated, null-hypothesis-tested statistical model capable of interpreting any kind of quantitative data obtained by stable isotope labeling. All these algorithms and statistical models have been integrated into a software platform called QuiXoT.
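The abstract evaluates peptide identification performance on the basis of false discovery rates. A minimal Python sketch of one common way to estimate an FDR at a given score threshold is shown below, assuming a target-decoy database search, which the abstract does not specify. The scores are generic and do not represent the probability ratio itself; the function name and the example data are hypothetical.

```python
from typing import Iterable, Tuple


def estimate_fdr(psms: Iterable[Tuple[float, bool]], threshold: float) -> float:
    """Estimate the FDR among matches scoring at or above `threshold`.

    `psms` holds (score, is_decoy) pairs, where higher scores are better.
    The estimate is decoy hits divided by target hits, a simple
    target-decoy approximation (assumed here, not taken from the thesis).
    """
    targets = 0
    decoys = 0
    for score, is_decoy in psms:
        if score >= threshold:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
    if targets == 0:
        return 0.0  # no accepted target matches, so nothing to be wrong about
    return decoys / targets


# Hypothetical peptide-spectrum matches: (score, is_decoy).
example_psms = [(5.0, False), (4.0, False), (3.0, True), (2.0, False), (1.0, True)]

for cutoff in (4.0, 2.0):
    print(cutoff, estimate_fdr(example_psms, cutoff))
```

Raising the threshold trades sensitivity for a lower estimated error rate, which is the basis on which different scoring indicators can be compared at a fixed FDR.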