A flexible multitask summarizer for documents from different media, domain and language

  1. Fuentes Fort, Maria
Supervised by:
  1. Horacio Rodríguez Hontoria Thesis supervisor

Defending university: Universitat Politècnica de Catalunya (UPC)

Date of defense: 31 March 2008

Committee:
  1. David Farwell Chair
  2. Mihai Surdeanu Secretary
  3. Irene Castellón Masalles Member
  4. Manuel de Buenaga Rodríguez Member
  5. R. Radev Dragomir Member

Type: Doctoral thesis

Teseo: 145724

Abstract

Automatic summarization is probably crucial given the growing volume of documents being generated, particularly now that retrieving, managing and processing information have become decisive tasks. However, one should not expect perfect systems able to substitute for human summaries. The automatic summarization process depends strongly not only on the characteristics of the documents but also on the different needs of users. Thus, several aspects have to be taken into account when designing an information system for summarization, because different techniques can be applied depending on the characteristics of the input documents and the desired results. To support this process, the final goal of the thesis is to provide a flexible multitask summarizer architecture. This goal is decomposed into three main research purposes. First, to study the process of porting systems to different summarization tasks, processing documents in different languages, domains or media, with the aim of designing a generic architecture that permits the easy addition of new tasks by reusing existing tools. Second, to develop prototypes for some tasks involving aspects related to the language, the media and the domain of the document or documents to be summarized, as well as aspects related to the summary content: generic summaries, novelty summaries, or summaries that answer a specific user need. Third, to create an evaluation framework to analyze the performance of several approaches in the domains of written news and scientific oral presentations, focusing mainly on intrinsic evaluation.
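The abstract describes the generic architecture only at the level of its goals. The following is a minimal sketch of one way such an architecture *could* support adding new tasks by reusing existing tools. It assumes a hypothetical design in which each summarization task is a named pipeline of reusable processing steps. The names `SummarizerRegistry`, `register_step`, `register_task`, `drop_short` and `lead_two` are illustrative and do not come from the thesis, and the two steps are toy placeholders rather than real summarization techniques.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical component type: a reusable step mapping a list of sentences
# to a (possibly filtered or shortened) list of sentences.
Step = Callable[[List[str]], List[str]]


@dataclass
class SummarizerRegistry:
    """Hypothetical registry: each task is a named pipeline of reusable steps."""
    steps: Dict[str, Step] = field(default_factory=dict)
    tasks: Dict[str, List[str]] = field(default_factory=dict)

    def register_step(self, name: str, step: Step) -> None:
        self.steps[name] = step

    def register_task(self, task: str, step_names: List[str]) -> None:
        # A new task is defined only by composing steps that already exist.
        missing = [s for s in step_names if s not in self.steps]
        if missing:
            raise KeyError(f"unknown steps: {missing}")
        self.tasks[task] = list(step_names)

    def summarize(self, task: str, sentences: List[str]) -> List[str]:
        # Apply the task's steps in order, feeding each output to the next step.
        for name in self.tasks[task]:
            sentences = self.steps[name](sentences)
        return sentences


def drop_short(sentences: List[str]) -> List[str]:
    # Toy filter: discard fragments of three words or fewer
    # (e.g. disfluencies in an oral-presentation transcript).
    return [s for s in sentences if len(s.split()) > 3]


def lead_two(sentences: List[str]) -> List[str]:
    # Toy extractive selector: keep the first two sentences.
    return sentences[:2]


registry = SummarizerRegistry()
registry.register_step("drop_short", drop_short)
registry.register_step("lead_two", lead_two)
# Two tasks for different media built from the same components.
registry.register_task("news_generic", ["lead_two"])
registry.register_task("oral_generic", ["drop_short", "lead_two"])

doc = [
    "uh yes",
    "The council approved the new budget on Monday.",
    "Spending on transport rises by ten percent.",
    "Critics called it rushed.",
]
print(registry.summarize("oral_generic", doc))
```

In this sketch, porting to a new medium (here, transcribed speech) adds one preprocessing step and reuses the existing selector unchanged. Registering a task that names an unregistered step fails early with a `KeyError`.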