Analysis and optimization of the UMLS resource in biomedical information retrieval using semantic similarity metrics

  1. Alonso Martínez, Israel
Supervised by:
  1. David Contreras Bárcena, Supervisor

Defending university: Universidad Pontificia Comillas

Defense date: 20 January 2016

Committee:
  1. Francisco Javier Montero de Juan, President
  2. Mario Castro Ponce, Secretary
  3. Lourdes Araujo, Rapporteur
  4. José Ángel Olivas Varela, Rapporteur
  5. Rafael Palacios Hielscher, Rapporteur

Type: Thesis

Abstract

Retrieving medical documents through natural language processing is important and complex enough to deserve special attention as a research area. For this reason, many published studies address semantic similarity metrics in a theoretical setting (pairs of independent, closed concepts), supported by resources contained in the UMLS Metathesaurus. However, none of these works studies the metrics in a real biomedical information retrieval setting. This thesis therefore proposes a new study that evaluates the performance of the Intrinsic IC-Path and Path metrics in a real medical documentation environment (the TREC 2011 Medical Records Track), using UMLS as the supporting resource. This novel experimental evaluation requires a specific information retrieval method, based on a parameterization of the UMLS Metathesaurus, that aggregates the similarities between both elements (a similarity matrix) into a single outcome (Relevant/Not Relevant); that outcome is then compared against the relevance judgments of the TREC experts to evaluate the performance of each metric. Building this system led, in the first part of the work, to a comprehensive study and parameterization of the UMLS resource in order to obtain optimal coverage for the different semantic similarity metrics. This in turn motivated a new information retrieval system that makes optimal use of the UMLS infrastructure when applying semantic similarity metrics to a real biomedical documentation context (based on the TREC repository). This system makes it possible to assess the real scope of the main metrics (Path and Intrinsic IC-Path) in a single, reliable environment. Finally, an automatic summarization system for medical records is proposed as a route toward two new approaches.
The first arises from the need to validate the usefulness of the concept-based representation of medical documents presented in this thesis in other contexts or applications. The second serves as a prelude to possible future improvements of the information retrieval system defined and evaluated in this thesis.
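To make the idea of collapsing a similarity matrix into a single Relevant/Not Relevant outcome more concrete, the following is a minimal Python sketch under stated assumptions, not the method implemented in the thesis. It uses a common formulation of the Path measure (the inverse of the number of nodes on the shortest is-a path between two concepts, i.e. 1 / (edges + 1)) over a tiny, hypothetical concept hierarchy standing in for UMLS. The concept names, the mean-of-best-match aggregation and the 0.3 threshold are all illustrative assumptions; the Intrinsic IC-Path metric is not shown.

```python
from collections import deque

# Hypothetical toy is-a hierarchy (child -> parents); a stand-in for UMLS relations.
IS_A = {
    "myocardial_infarction": ["heart_disease"],
    "angina": ["heart_disease"],
    "heart_disease": ["cardiovascular_disease"],
    "hypertension": ["cardiovascular_disease"],
    "cardiovascular_disease": ["disease"],
    "pneumonia": ["lung_disease"],
    "lung_disease": ["disease"],
}


def build_adjacency(graph):
    """Treat each is-a link as an undirected edge for shortest-path search."""
    adj = {}
    for child, parents in graph.items():
        for parent in parents:
            adj.setdefault(child, set()).add(parent)
            adj.setdefault(parent, set()).add(child)
    return adj


ADJ = build_adjacency(IS_A)


def shortest_path_edges(a, b, adj=ADJ):
    """Breadth-first search; returns the edge count, or None if not connected."""
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def path_similarity(a, b):
    """Path measure: 1 / (number of nodes on the shortest path) = 1 / (edges + 1)."""
    edges = shortest_path_edges(a, b)
    return 0.0 if edges is None else 1.0 / (edges + 1)


def similarity_matrix(query_concepts, doc_concepts):
    """Rows are query concepts, columns are document concepts."""
    return [[path_similarity(q, d) for d in doc_concepts] for q in query_concepts]


def is_relevant(query_concepts, doc_concepts, threshold=0.3):
    """Hypothetical aggregation: average each row's best match, then apply a threshold."""
    if not query_concepts or not doc_concepts:
        return False, 0.0
    matrix = similarity_matrix(query_concepts, doc_concepts)
    score = sum(max(row) for row in matrix) / len(matrix)
    return score >= threshold, score
```

For example, with this toy hierarchy `is_relevant(["myocardial_infarction"], ["angina", "pneumonia"])` gives a score of 1/3 (myocardial infarction and angina share a parent, two edges apart) and is judged relevant, whereas `is_relevant(["pneumonia"], ["hypertension"])` scores 0.2 and is not. Other aggregations (plain averaging of the whole matrix, or a learned cut-off) would be equally plausible choices under these assumptions.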