Análisis y optimización del recurso UMLS en la recuperación de información biomédica mediante métricas de similitud semántica

  1. Alonso Martínez, Israel
Supervised by:
  1. David Contreras Bárcena Director

Defence university: Universidad Pontificia Comillas

Defence date: 20 January 2016

Committee:
  1. Francisco Javier Montero de Juan Chair
  2. Mario Castro Ponce Secretary
  3. Lourdes Araujo Committee member
  4. José Ángel Olivas Varela Committee member
  5. Rafael Palacios Hielscher Committee member

Type: Thesis

Abstract

Retrieving information from medical documents through natural language processing is important and complex enough to merit dedicated research. For this reason, many published studies address semantic similarity metrics in a theoretical setting (pairs of independent, closed concepts), supported by resources contained in the UMLS Metathesaurus. However, none of these works focuses on a real biomedical information retrieval setting. This thesis therefore proposes a new study that evaluates the performance of the Intrinsic IC-Path and Path metrics in a real medical documentation environment (TREC Medical Records Track 2011), using UMLS as the supporting resource. This experimental evaluation requires a specific information retrieval method, based on a parameterization of the UMLS Metathesaurus, that aggregates the similarities of both elements (a similarity matrix) into a single outcome (Relevant/Not Relevant). That outcome is then compared against the relevance judgments of the TREC experts to evaluate the performance of each metric. Implementing this system led, in the first part of the work, to a comprehensive study and parameterization of the UMLS resource in order to obtain optimal coverage results across the different semantic similarity metrics. Accordingly, a new information retrieval system is proposed that integrates the optimal use of the UMLS infrastructure when applying semantic similarity metrics to a real context of biomedical documentation (based on the TREC repository). This system makes it possible to assess the real scope of the main metrics (Path and Intrinsic IC-Path) in a single, reliable environment.
Finally, an automatic summarization system for medical records is proposed, opening the way to two new approaches. The first arises from the need to validate, in other contexts or applications, the usefulness of the concept-based representation of a medical document presented in this thesis. The second serves as a prelude to possible future improvements to the information retrieval system defined and evaluated in this thesis.