Building corpora for the development of a dependency parser for Spanish using Maltparser

  1. Herrera, Jesús
  2. Gervás Gómez-Navarro, Pablo
  3. Moriano, Pedro J.
  4. Muñoz, Alfonso
  5. Romero, Luis
Journal: Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2007

Issue: 39

Pages: 181-186

Type: Article

Abstract

This paper details the process followed to create training and test corpora for a dependency parser generator (MaltParser). The starting point is the Cast3LB corpus, which contains constituency analyses of Spanish texts. These constituency analyses are automatically transformed into dependency analyses. The paper also describes how a set of syntactic function labels for the training corpus was obtained empirically and semi-automatically. The result is a dependency parser for Spanish that achieves 91% precision when determining dependencies.
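
To illustrate what transforming a constituency analysis into a dependency analysis can involve, here is a minimal sketch of a generic head-rule conversion. It is not the procedure used in the paper, which the abstract does not specify. The tree encoding, the `HEAD_RULES` table, the labels and the example sentence are all hypothetical and do not reflect the Cast3LB annotation scheme.

```python
# Hypothetical head table (an assumption, not taken from Cast3LB):
# for each phrase label, the child labels that may act as head, in order of preference.
HEAD_RULES = {
    "S": ["VP", "NP"],
    "VP": ["V"],
    "NP": ["N"],
}

# Hypothetical toy tree. A leaf is (pos_tag, word); an internal node is (label, [children]).
EXAMPLE_TREE = (
    "S",
    [
        ("NP", [("D", "El"), ("N", "gato")]),
        ("VP", [("V", "come"), ("NP", [("N", "pescado")])]),
    ],
)


def to_dependencies(tree):
    """Convert a constituency tree into (words, heads).

    heads[i] is the 1-based position of the head of word i + 1,
    or 0 if that word is the root of the sentence.
    """
    words = []
    heads = {}

    def visit(node):
        label, content = node
        if isinstance(content, str):
            # Leaf: record the word and return its 1-based position.
            words.append(content)
            return len(words)

        # Internal node: find the lexical head of every child first.
        child_heads = [(child[0], visit(child)) for child in content]

        # Choose the head child using the preference list for this label.
        head_index = None
        for preferred in HEAD_RULES.get(label, []):
            for child_label, index in child_heads:
                if child_label == preferred:
                    head_index = index
                    break
            if head_index is not None:
                break
        if head_index is None:
            # Fallback when no rule matches: take the leftmost child.
            head_index = child_heads[0][1]

        # Every other child's lexical head depends on the chosen head.
        for _, index in child_heads:
            if index != head_index:
                heads[index] = head_index
        return head_index

    root = visit(tree)
    heads[root] = 0
    return words, [heads[i] for i in range(1, len(words) + 1)]


if __name__ == "__main__":
    for position, (word, head) in enumerate(zip(*to_dependencies(EXAMPLE_TREE)), start=1):
        print(position, word, head)
```

In this sketch each phrase passes the lexical head of its preferred child up to its parent, and the heads of the remaining children are attached to it. Real conversions typically need much richer head tables and special handling for coordination, punctuation and empty elements.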