Improving Parsing Accuracy for Spanish Using Maltparser

  1. Ballesteros, Miguel
  2. Herrera, Jesús
  3. Francisco, Virginia
  4. Gervás Gómez-Navarro, Pablo
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2010

Issue: 44

Pages: 83-90

Type: Article


Abstract

In recent years, dependency parsing has been carried out by machine-learning-based systems that show high accuracy, although usually below 90% Labelled Attachment Score (LAS). Maltparser is one such system. Machine learning makes it possible to obtain a parser for any language that has an adequate training corpus. Since such systems generally cannot be modified, the following question arises: can we beat this 90% LAS by using better training corpora? In this paper we present some preliminary work on this question. We studied strategies that take into account the size of the training corpus and the length of its sentences in order to obtain better parsing accuracy.
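
For readers unfamiliar with the metric, the following is a minimal sketch of how LAS is usually computed: the percentage of tokens whose predicted head and dependency label both match the gold annotation. It also shows one simple way to restrict a training corpus by sentence length. The function names, the `(head, label)` token representation, and the sample data are illustrative assumptions; this is not the authors' code or Maltparser's API.

```python
from typing import List, Tuple

# A token is represented as (head_index, dependency_label).
# This representation is an assumption made for illustration only.
Token = Tuple[int, str]
Sentence = List[Token]


def labelled_attachment_score(gold: Sentence, predicted: Sentence) -> float:
    """Return the percentage of tokens with both the correct head and label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return 100.0 * correct / len(gold)


def keep_short_sentences(corpus: List[Sentence], max_tokens: int) -> List[Sentence]:
    """Keep only sentences with at most max_tokens tokens (a hypothetical filter)."""
    return [sentence for sentence in corpus if len(sentence) <= max_tokens]


# Hypothetical four-token sentence: the third token has the right head
# but the wrong label, so it does not count as correct.
gold = [(2, "det"), (0, "root"), (2, "obj"), (2, "punct")]
predicted = [(2, "det"), (0, "root"), (2, "subj"), (2, "punct")]
print(labelled_attachment_score(gold, predicted))  # 75.0
```

A length filter such as `keep_short_sentences(corpus, 20)` is only one possible way to build a training subset; the abstract does not say which thresholds or selection procedures were actually used.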

Bibliographic References

  • Buchholz, S. and E. Marsi. 2006. CoNLL-X shared task on Multilingual Dependency Parsing. In Proceedings of the 10th Conference on Computational Natural Language Learning (CoNLL-X), pages 149-164.
  • Eisner, Jason. 1996. Three New Probabilistic Models for Dependency Parsing: An Exploration. In Proceedings of the 16th International Conference on Computational Linguistics (COLING-96), pages 340-345, Copenhagen.
  • Herrera, J. and P. Gervás. 2008. Towards a Dependency Parser for Greek Using a Small Training Data Set. Journal of the Spanish Society for NLP (SEPLN), 41:29-36.
  • Herrera, J., P. Gervás, P.J. Moriano, A. Moreno, and L. Romero. 2007a. Building Corpora for the Development of a Dependency Parser for Spanish Using Maltparser. Journal of the Spanish Society for NLP (SEPLN), 39:181-186.
  • Herrera, J., P. Gervás, P.J. Moriano, A. Moreno, and L. Romero. 2007b. JBeaver: un Analizador de Dependencias para el Español Basado en Aprendizaje. In Proceedings of the 12th Conference of the Spanish Society for Artificial Intelligence (CAEPIA 07), Salamanca, Spain, pages 211-220. Asociación Española para la Inteligencia Artificial.
  • Johansson, R. and P. Nugues. 2006. Investigating Multilingual Dependency Parsing. In Proceedings of the Conference on Computational Natural Language Learning (CoNLL-X).
  • McDonald, R., K. Lerman, and F. Pereira. 2006. Multilingual Dependency Analysis with a Two-Stage Discriminative Parser. In Proceedings of the 10th Conference on Computational Natural Language Learning (CoNLL-X), pages 216-220.
  • McDonald, R. and J. Nivre. 2007. Characterizing the Errors of Data-Driven Dependency Parsing Models. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 122-131. Association for Computational Linguistics.
  • Nivre, J., J. Hall, and J. Nilsson. 2004. Memory-based Dependency Parsing. In Proceedings of CoNLL-2004, pages 49-56. Boston, MA, USA.
  • Nivre, J., J. Hall, J. Nilsson, G. Eryigit, and S. Marinov. 2006. Labeled Pseudo-Projective Dependency Parsing with Support Vector Machines. In Proceedings of the 10th Conference on Computational Natural Language Learning (CoNLL-X), pages 221-225.
  • Palomar, M., M. Civit, A. Díaz, L. Moreno, E. Bisbal, M. Aranzabe, A. Ageno, M.A. Martí, and B. Navarro. 2004. 3LB: Construcción de una base de datos de árboles sintáctico-semánticos para el catalán, euskera y español. In Proceedings of the XX Conference of the Spanish Society for NLP (SEPLN), pages 81-88. Sociedad Española para el Procesamiento del Lenguaje Natural.
  • Taulé, M., M.A. Martí, and M. Recasens. 2008. AnCora: Multilevel Annotated Corpora for Catalan and Spanish. In Proceedings of 6th International Conference on Language Resources and Evaluation.
  • Wu, Y., Y. Lee, and J. Yang. 2006. The Exploration of Deterministic and Efficient Dependency Parsing. In Proceedings of the Conference on Computational Natural Language Learning (CoNLL-X).
  • Yamada, H. and Y. Matsumoto. 2003. Statistical Dependency Analysis with Support Vector Machines. In Proceedings of International Workshop of Parsing Technologies (IWPT'03), pages 195-206.