Using Semantic Graphs and Word Sense Disambiguation Techniques to Improve Text Summarization

  1. Plaza Morales, Laura
  2. Díaz Esteban, Alberto
Journal: Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2011

Issue: 47

Pages: 97-105

Type: Article

Abstract

This paper presents a method for automatic text summarization based on semantic graphs. The system uses concepts and relations from WordNet to build a graph that represents the document, together with a connectivity-based clustering algorithm that discovers the different topics the document covers. Sentences are selected for the summary according to whether they contain the document's most representative concepts. The experiments show that the proposed approach obtains significantly better results than other systems evaluated under the same experimental conditions. The system can also be easily adapted to documents from other domains simply by changing the knowledge base and the method used to identify concepts in the text. Finally, the paper studies the effect of lexical ambiguity on summary generation.
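The three stages named in the abstract (a concept graph built from is-a relations, connectivity-based clustering into topics, and sentence selection driven by representative concepts) can be illustrated with a minimal Python sketch. It is not the authors' implementation and makes several assumptions: a small hand-written is-a table (`HYPERNYM`) stands in for WordNet, sentences arrive already mapped to disambiguated concepts (the word sense disambiguation step is assumed to have run beforehand), clusters are simply the connected components that remain after cutting overly general vertices such as `entity`, a concept's salience is its weighted degree, and a sentence's score is the summed salience of the cluster hubs it covers. All names (`hypernym_chain`, `build_graph`, `clusters`, `summarize`) and the scoring rule are hypothetical.

```python
from collections import Counter
from itertools import takewhile

# Hypothetical toy is-a table standing in for WordNet hypernym links.
HYPERNYM = {
    "dog": "canine", "canine": "carnivore", "cat": "feline", "feline": "carnivore",
    "carnivore": "animal", "animal": "entity",
    "stock": "security", "bond": "security", "security": "asset",
    "asset": "possession", "possession": "entity",
}

# Sentences already mapped to disambiguated concepts (assumed WSD output).
SENTENCES = [
    ["dog", "cat"],     # 0: "The dog chased the cat."
    ["stock", "bond"],  # 1: "Stocks and bonds fell."
    ["cat"],            # 2: "A cat slept."
    ["dog"],            # 3: "The dog barked."
]


def hypernym_chain(concept):
    """Concept followed by its ancestors, most specific first."""
    chain = [concept]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def build_graph(sentence_concepts, too_general=frozenset({"entity"})):
    """Undirected is-a graph; edge weight = number of sentences that use the edge."""
    weights = Counter()
    adj = {}
    for concepts in sentence_concepts:
        sentence_edges = set()
        for c in concepts:
            # Cut each chain before overly general vertices so topics do not merge at the root.
            chain = list(takewhile(lambda v: v not in too_general, hypernym_chain(c)))
            for v in chain:
                adj.setdefault(v, {})
            sentence_edges.update(frozenset(pair) for pair in zip(chain, chain[1:]))
        weights.update(sentence_edges)  # each edge counted at most once per sentence
    for edge, w in weights.items():
        a, b = sorted(edge)
        adj[a][b] = w
        adj[b][a] = w
    return adj


def clusters(adj):
    """Connected components of the graph, used here as topic clusters."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v])
        seen |= comp
        comps.append(comp)
    return comps


def summarize(sentence_concepts, n_sentences=2, hubs_per_cluster=1):
    """Indices of the selected sentences, returned in document order."""
    adj = build_graph(sentence_concepts)
    salience = {v: sum(nbrs.values()) for v, nbrs in adj.items()}  # weighted degree
    hubs = {}
    for comp in clusters(adj):
        ranked = sorted(comp, key=lambda v: (-salience[v], v))
        for v in ranked[:hubs_per_cluster]:
            hubs[v] = salience[v]
    scores = []
    for concepts in sentence_concepts:
        covered = {v for c in concepts for v in hypernym_chain(c)}
        scores.append(sum(w for v, w in hubs.items() if v in covered))
    best = sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:n_sentences]
    return sorted(best)


print(summarize(SENTENCES))  # [0, 2]
```

With this toy input, cutting `entity` splits the graph into an animal topic and a finance topic whose hubs are `carnivore` and `security`; the two-sentence summary keeps sentences 0 and 2 because the animal topic is the more heavily connected one. Replacing `HYPERNYM` and the concept-mapping step is the kind of change the abstract describes for moving the system to a new domain.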
