Statistical methods for the integration analysis of –omics data (genomics, epigenomics and transcriptomics): an application to bladder cancer

  1. Silvia Pineda Sanjuan
Supervised by:
  1. Núria Malats Riera Director
  2. Kristel van Steen Director

Defence university: Universidad Autónoma de Madrid

Defence date: 27 October 2015

  1. Fernando Rodríguez Artalejo Chair
  2. Alfonso Valencia Herrera Secretary
  3. Monika Stoll Committee member
  4. Mario Fernández Fraga Committee member
  5. Douglas Easton Committee member

Type: Thesis


An increasing amount of –omics data is being generated, and over the last decades these data have mostly been analyzed one –omics type at a time. Such single –omics analyses have revealed significant findings that help to explain the biology of complex diseases such as cancer, but combining more than two –omics data types may reveal important biological insights that would not be found otherwise. For this reason, in the last five years the idea of integrating data has emerged in the context of systems biology. However, the integration of –omics data requires appropriate statistical techniques to address the main challenges that high-throughput data impose. In this thesis, we propose different statistical approaches to integrate –omics data (genomics, epigenomics and transcriptomics from tumor tissue, and genomics from blood samples) in individuals with bladder cancer. In the first approach, a framework based on a multi-staged strategy is proposed. Pairwise combinations of the three –omics measured in tumor were analyzed (transcriptomics-epigenomics, eQTL and methQTL), ending with the combination of all three in triple relationships. These analyses showed the whole spectrum of associations among the three –omics, as well as biologically sound "trans" associations that identify possible new molecular targets. In the second approach, a multi-dimensional analysis is applied in which the three –omics are considered together in the same model. Penalized regression methods (LASSO and ENET) were applied, since they can combine the data in a single large input matrix while dealing with many of the challenges of –omics integration. In addition, a permutation-based MaxT method was proposed to assess goodness of fit while correcting for multiple testing, the two main drawbacks of penalized regression methods. We obtained, and externally validated in an independent data set, a list of genes associated with genotypes and DNA methylation in "cis" relationship.
Finally, this approach is applied to integrate the three –omics in tumor with the genomics in blood samples in an integrative eQTL analysis. It was compared with the two-stage regression (2SR) approach previously used for integrative eQTL analysis. Our approach highlighted relevant eQTLs, including those found by the 2SR approach, and generated a list of genes and eQTLs that may be considered in future analyses. Overall, we have shown that integrative –omics analyses are needed to find missing, hidden or unreliable information, and that applying appropriate statistical approaches helps to integrate all the available information and reveals interesting biological relationships.
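
As an illustration of the second approach, the following minimal sketch shows how a penalized regression fit could be combined with a permutation-based MaxT correction. It is not the implementation used in the thesis. The function names (enet_fit, fit_stat, maxt_pvalues), the choice of R-squared as the goodness-of-fit statistic, the penalty values and the simulated data are all assumptions made for this example. Each gene's expression is regressed on a combined matrix of genotype and methylation predictors with an elastic net (alpha = 1 gives the LASSO), and the samples of the expression matrix are permuted to build the null distribution of the maximum statistic across genes.

```python
import numpy as np

def enet_fit(X, y, lam=0.1, alpha=0.5, n_iter=50):
    """Elastic net by cyclic coordinate descent (alpha=1 is the LASSO).
    Assumes the columns of X are standardized and y is centered."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.copy()
    col_ss = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of predictor j with the partial residual
            rho = X[:, j] @ resid / n + col_ss[j] * beta[j]
            # Soft-threshold (L1 part), then shrink (L2 part)
            new = np.sign(rho) * max(abs(rho) - lam * alpha, 0.0)
            new /= col_ss[j] + lam * (1.0 - alpha)
            if new != beta[j]:
                resid -= X[:, j] * (new - beta[j])
                beta[j] = new
    return beta

def fit_stat(X, Y, lam, alpha):
    """R-squared of the penalized fit for each column (gene) of Y."""
    stats = np.empty(Y.shape[1])
    for g in range(Y.shape[1]):
        y = Y[:, g]
        beta = enet_fit(X, y, lam, alpha)
        rss = ((y - X @ beta) ** 2).sum()
        stats[g] = 1.0 - rss / (y ** 2).sum()
    return stats

def maxt_pvalues(X, Y, lam=0.1, alpha=0.5, n_perm=50, seed=0):
    """Single-step MaxT adjusted p-values from sample permutations."""
    rng = np.random.default_rng(seed)
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    observed = fit_stat(X, Y, lam, alpha)
    max_null = np.empty(n_perm)
    for b in range(n_perm):
        # Permuting whole rows of Y breaks the link with X but keeps
        # the correlation among genes intact
        perm = rng.permutation(X.shape[0])
        max_null[b] = fit_stat(X, Y[perm], lam, alpha).max()
    exceed = (max_null[None, :] >= observed[:, None]).sum(axis=1)
    return observed, (1 + exceed) / (1 + n_perm)

# Simulated data (hypothetical): 60 samples, 10 predictors standing in
# for SNPs and CpG sites, 5 genes; only gene 0 depends on the predictors
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 10))
Y = rng.normal(size=(60, 5))
Y[:, 0] += 2.0 * X[:, 0] - 1.5 * X[:, 1]

observed, p_adj = maxt_pvalues(X, Y)
print(np.round(observed, 3))
print(np.round(p_adj, 3))
```

Comparing every gene against the permutation distribution of the maximum statistic is what controls the family-wise error rate across genes. In a real analysis the penalty parameters would normally be tuned, for example by cross-validation, rather than fixed as they are here.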