Diagnóstico y pronóstico en bases de datos clínicas con técnicas no supervisadas. Diagnosis and prognosis in clinical databases through unsupervised statistical techniques

  1. SÁNCHEZ RICO, MARINA LUCÍA
Supervised by:
  1. Nicolas Hoertel Supervisor
  2. Jesús María Alvarado Izquierdo Supervisor

University of defense: Universidad Complutense de Madrid

Date of defense: 30 March 2022

Examination committee:
  1. Marta Evelia Aparicio García Chair
  2. Miguel Ángel Castellanos López Secretary
  3. Hugo Peyre Member
  4. José Manuel Reales Avilés Member
  5. Francisco José Abad García Member

Type: Thesis

Abstract

In clinical settings, epidemiological research can have, and frequently does have, a direct impact on patients. Observational studies based on hospital data can be extremely valuable tools, especially when time is a key factor. They make it possible to study a broad range of patients and to test very complex associations, whether the focus is on pathologies and their prevalence, characteristics, and associated risk factors or conditions, or on associations between treatments or interventions and clinical outcomes. In recent years there has been substantial growth in high-quality observational studies in epidemiology, which is hypothesised to be due to two main factors. First, sound study designs that account for several potential sources of error stemming from the lack of randomization in observational studies. Second, owing to the proliferation and improvement of electronic health records (EHRs), researchers have been able to apply techniques from other fields of study to epidemiological settings. In this thesis we aimed to contribute to the study and implementation of machine learning techniques that make it possible to take advantage of EHRs and clinical databases in observational epidemiological studies. To that end, we incorporated unsupervised machine learning techniques for pattern identification to explore comorbidity patterns in hospitalized patients. In study 1, we compared the performance of three dimensionality reduction techniques, i.e., Principal Component Analysis (PCA), t-distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP), when applied in combination with cluster analysis to find hidden diagnostic patterns, and found that UMAP performed best...
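
The general idea behind study 1, reducing dimensionality and then clustering the embedding, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the thesis's actual pipeline: the synthetic binary "diagnosis" matrix, the helper names `make_synthetic_diagnoses` and `embed_and_cluster`, the choice of k-means as the clustering step, and the silhouette score as the comparison metric. It uses scikit-learn's `PCA` and `TSNE`; UMAP is left out to keep the example free of extra dependencies, but `umap.UMAP(n_components=2)` from the separate umap-learn package exposes the same `fit_transform` interface and could be added to the `reducers` dictionary.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score


def make_synthetic_diagnoses(n_per_group=60, n_codes=30, seed=0):
    """Hypothetical data: 3 patient groups, each enriched for 10 of 30 binary diagnosis codes."""
    rng = np.random.default_rng(seed)
    blocks = []
    for group in range(3):
        probs = np.full(n_codes, 0.05)               # background prevalence
        probs[group * 10:(group + 1) * 10] = 0.6     # codes enriched in this group
        blocks.append(rng.random((n_per_group, n_codes)) < probs)
    X = np.vstack(blocks).astype(float)
    y = np.repeat(np.arange(3), n_per_group)
    return X, y


def embed_and_cluster(X, reducer, n_clusters=3, seed=0):
    """Project X to a low-dimensional embedding, then run k-means on the embedding."""
    embedding = reducer.fit_transform(X)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(embedding)
    score = silhouette_score(embedding, labels)      # cohesion/separation in embedded space
    return embedding, labels, score


X, y = make_synthetic_diagnoses()

# Assumed settings; the abstract does not state the hyperparameters used in the thesis.
reducers = {
    "PCA": PCA(n_components=2, random_state=0),
    "t-SNE": TSNE(n_components=2, perplexity=30, init="pca", random_state=0),
}

results = {name: embed_and_cluster(X, reducer) for name, reducer in reducers.items()}
for name, (_, _, score) in results.items():
    print(f"{name}: silhouette = {score:.3f}")
```

Silhouette scores computed in different embedded spaces are only a rough basis for comparison, and the abstract does not say which evaluation criteria the thesis used, so the printed numbers should be read as a demonstration of the workflow and nothing more.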