Using Semantic Graphs and Word Sense Disambiguation Techniques to Improve Text Summarization

  1. Plaza Morales, Laura
  2. Díaz Esteban, Alberto
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2011

Issue: 47

Pages: 97-105

Type: Article

Other publications in: Procesamiento del lenguaje natural

Abstract

This paper presents a semantic graph-based method for extractive summarization. The summarizer uses WordNet concepts and relations to build a semantic graph that represents the document, and a degree-based clustering algorithm discovers the different themes or topics within the text. Sentences are selected for the summary according to whether they contain the most representative concepts of each topic. The method has proven to be an efficient approach to identifying salient concepts and topics in free text. In a test on the DUC data for single-document summarization, our system achieves significantly better results than previous approaches based on terms and mere syntactic information. Moreover, the system can be easily ported to other domains, as this only requires modifying the knowledge base and the method for concept annotation. In addition, we address the problem of word ambiguity in semantic approaches to automatic summarization.
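
The pipeline outlined in the abstract (concept graph, degree-based clustering, sentence selection by topic concepts) can be illustrated with a minimal sketch. This is not the authors' implementation: the concept labels, the relation list, and the helper names `build_graph`, `pick_hubs`, `cluster_by_hub`, and `summarize` are all hypothetical, the sentences are assumed to be already annotated with concepts (the paper uses WordNet for that step), and the clustering here is a simplified stand-in that attaches each concept to a directly linked high-degree hub.

```python
from collections import defaultdict

# Hypothetical toy input: each sentence id maps to hand-written concept labels.
# In the paper these would be WordNet concepts produced by an annotation step.
sentences = {
    0: ["storm", "wind", "coast"],
    1: ["storm", "rain", "flood"],
    2: ["government", "aid", "flood"],
    3: ["government", "budget", "tax"],
}

# Hypothetical semantic relations between concepts (undirected edges).
relations = [
    ("storm", "wind"), ("storm", "rain"), ("storm", "flood"), ("storm", "coast"),
    ("government", "aid"), ("government", "budget"), ("government", "tax"),
    ("rain", "flood"),
]


def build_graph(edges):
    """Build an undirected adjacency map from (concept, concept) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def pick_hubs(graph, n_hubs):
    """Return the n_hubs concepts with the highest degree (ties: alphabetical)."""
    ranked = sorted(graph, key=lambda c: (-len(graph[c]), c))
    return ranked[:n_hubs]


def cluster_by_hub(graph, hubs):
    """Attach each non-hub concept to the first hub it is directly linked to.

    Concepts with no direct link to any hub are left unclustered.
    """
    clusters = {hub: {hub} for hub in hubs}
    for concept in graph:
        if concept in clusters:
            continue
        linked = [hub for hub in hubs if hub in graph[concept]]
        if linked:
            clusters[linked[0]].add(concept)
    return clusters


def summarize(sentences, relations, n_hubs=2):
    """Pick, for each topic cluster, the sentence containing most of its concepts."""
    graph = build_graph(relations)
    hubs = pick_hubs(graph, n_hubs)
    clusters = cluster_by_hub(graph, hubs)
    chosen = set()
    for hub in hubs:
        members = clusters[hub]
        # Score = number of the sentence's concepts in this cluster;
        # ties go to the earlier sentence.
        best = max(
            sentences,
            key=lambda sid: (sum(c in members for c in sentences[sid]), -sid),
        )
        chosen.add(best)
    return sorted(chosen)


if __name__ == "__main__":
    print(summarize(sentences, relations))
```

With this toy data the two highest-degree concepts act as topic hubs, and one sentence is selected per topic. A fuller version would also need a word sense disambiguation step before annotation, which this sketch leaves out.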

Bibliographic references

  • Agirre, E. and A. Soroa. 2009. Personalizing PageRank for Word Sense Disambiguation. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, pages 33–41.
  • Banerjee, S. and T. Pedersen. 2002. An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet. In Proceedings of the 3rd International Conference on Intelligent Text Processing and Computational Linguistics, pages 136–145.
  • Barabási, A.L. and R. Albert. 1999. Emergence of Scaling in Random Networks. Science, 286:509–512.
  • Bawakid, A. and M. Oussalah. 2008. A Semantic Summarization System: University of Birmingham at TAC 2008. In Proceedings of the First Text Analysis Conference.
  • Bossard, A., M. Généreux, and T. Poibeau. 2008. Description of the LIPN Systems at TAC 2008: Summarizing Information and Opinions. In Proceedings of the 1st Text Analysis Conference.
  • Brandow, R., K. Mitze, and L. F. Rau. 1995. Automatic Condensation of Electronic Publications by Sentence Selection. Information Processing and Management, 31(5):675–685.
  • Celikyilmaz, A., M. Thint, and Z. Huang. 2009. A Graph-based Semi-Supervised Learning for Question-Answering. In Proceedings of the 47th Annual Meeting of the ACL, pages 719–727.
  • Edmundson, H. P. 1969. New Methods in Automatic Extracting. Journal of the Association for Computing Machinery, 16(2):264–285.
  • Erkan, G. and D. R. Radev. 2004. LexRank: Graph-based Lexical Centrality as Salience in Text Summarization. Journal of Artificial Intelligence Research, 22:457–479.
  • Lesk, M. 1986. Automatic Sense Disambiguation Using Machine Readable Dictionaries: How to Tell a Pine Cone from an Ice Cream Cone. In Proceedings of Special Interest Group on Design of Communication, pages 24–26.
  • Lin, C-Y. 2004. ROUGE: A Package for Automatic Evaluation of Summaries. In Proceedings of the Association for Computational Linguistics, Workshop: Text Summarization Branches Out, pages 74–81.
  • Litvak, M. and M. Last. 2008. Graph-based Keyword Extraction for Single-document Summarization. In Proceedings of the International Conference on Computational Linguistics, Workshop on Multi-source Multilingual Information Extraction and Summarization.
  • Lloret, E., O. Ferrández, R. Muñoz, and M. Palomar. 2008. A Text Summarization Approach under the Influence of Textual Entailment. In Proceedings of the 5th International Workshop on Natural Language Processing and Cognitive Science in Conjunction with the 10th International Conference on Enterprise Information Systems, pages 22–31.
  • Mihalcea, R. and P. Tarau. 2004. TextRank: Bringing Order into Texts. In Proceedings of the Conference on Empirical Methods on Natural Language Processing, pages 404–411.
  • Patwardhan, S., S. Banerjee, and T. Pedersen. 2005. SenseRelate::TargetWord: A Generalized Framework for Word Sense Disambiguation. In Proceedings of the Association for Computational Linguistics, pages 73–76.
  • Plaza, L., A. Díaz, and P. Gervás. 2010. Automatic Summarization of News Using WordNet Concept Graphs. IADIS International Journal on Computer Science and Information Systems, V:45–57.
  • Reeve, L. H., H. Han, and A. D. Brooks. 2007. The Use of Domain-specific Concepts in Biomedical Text Summarization. Information Processing and Management, 43:1765–1776.
  • Sparck-Jones, K. 1972. A Statistical Interpretation of Term Specificity and its Application in Retrieval. Journal of Documentation, 28(1):11–20.
  • Sparck-Jones, K. 1999. Automatic Summarising: Factors and Directions. The MIT Press.
  • Steinberger, J., M. Poesio, M. A. Kabadjov, and K. Jezek. 2007. Two Uses of Anaphora Resolution in Summarization. Information Processing and Management, 43(6):1663–1680.
  • Yoo, I., X. Hu, and I-Y. Song. 2007. A Coherent Graph-based Semantic Clustering and Summarization Approach for Biomedical Literature and a New Summarization Evaluation Method. BMC Bioinformatics, 8(9).
  • Zhao, L., L. Wu, and X. Huang. 2009. Using Query Expansion in Graph-based Approach for Query-focused Multi-document Summarization. Information Processing and Management, 45:35–41.