Primera aproximación para la extracción automática de Entidades Nombradas en corpus de documentos medievales castellanos

  1. Mª Eugenia Iglesias Moreno 1
  2. Pilar Azcárate Aguilar-Amat 1
  3. Sonia Sánchez Cuadrado 1

Affiliation 1: Universidad Carlos III de Madrid, Madrid, Spain (ROR: https://ror.org/03ths8210)

Book:
Humanidades Digitales: desafíos, logros y perspectivas de futuro
  1. López Poza, Sagrario (ed.)
  2. Pena Sueiro, Nieves (ed.)

Publisher: SIELAE; Universidade da Coruña

Year of publication: 2014

Pages: 229-238

Type: Book chapter

Abstract

Keywords: Corpus linguistics, corpus annotation, medieval documents, named entity recognition and classification

This paper presents the results of evaluating the automatic recognition and annotation of proper names in a corpus of medieval Castilian documents. The evaluation was carried out by adapting Feeling, an existing natural language processing tool. The paper describes the two iterations of this evaluation. The first iteration used the tool's version for standard and old Spanish. The second used an adaptation created to address the problems found in the first iteration. These problems were mainly caused by the inherent characteristics and spelling variants of proper names and place names in old Spanish. The corpus consisted of 14th-century documents from the Libro Becerro de las Behetrías de Castilla (LBB). The proposed adaptation for old Spanish achieves a success rate of 98.23%, which indicates that it can be used in the future evaluation of the entire corpus.
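The abstract reports a success rate but does not define how it is computed. The Python sketch below assumes that "success" means the percentage of gold-standard names that the tool annotated with both the correct span and the correct type. Under that assumption the figure is a recall score. The `Entity` tuple, the `PER`/`LOC` labels and the character offsets are hypothetical and were invented for illustration. They are not the authors' data and not Feeling's output format.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    """A named-entity annotation: character span plus entity type (hypothetical schema)."""
    start: int
    end: int
    label: str  # assumed labels: "PER" for persons, "LOC" for places


def success_rate(gold: set[Entity], predicted: set[Entity]) -> float:
    """Percentage of gold-standard entities reproduced exactly by the tool.

    Assumption: an annotation counts as correct only if both span and label
    match. This is recall, one possible reading of "level of success".
    """
    if not gold:
        raise ValueError("gold annotation set is empty")
    correct = len(gold & predicted)  # exact matches on (start, end, label)
    return 100.0 * correct / len(gold)


# Toy annotations, invented for illustration (not taken from the LBB corpus).
gold = {
    Entity(0, 13, "PER"),
    Entity(20, 27, "LOC"),
    Entity(35, 48, "PER"),
    Entity(60, 66, "LOC"),
}
predicted = {
    Entity(0, 13, "PER"),
    Entity(20, 27, "LOC"),
    Entity(35, 48, "LOC"),  # right span, wrong type: counted as an error
    Entity(60, 66, "LOC"),
}

print(f"{success_rate(gold, predicted):.2f}%")
```

With these toy annotations, three of the four gold entities match exactly, so the script prints 75.00%. A looser matching policy, such as crediting partial span overlaps, would raise the figure. Results like the reported 98.23% are therefore only comparable under the same matching definition.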