Named entity recognition and classification in Spanish legal texts

  1. Samy, Doaa
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2021

Issue: 67

Pages: 103-114

Type: Article

Abstract

Named Entity Recognition and Classification (NER/NERC) is a major task in Natural Language Processing (NLP) and Information Extraction (IE). In the legal domain, NERC is indispensable for developing intelligent legal systems. This study aims to take a first step towards a baseline for Spanish NERC in the legal domain. The main objective is to provide a linguistic resource by annotating five basic categories of Named Entities in Spanish legislative texts. These five categories are Person, Organization, Location, Dates (absolute expressions) and, finally, References to laws, decrees, regulations, etc. To achieve this goal, we adopt a hybrid approach that combines three techniques: hand-crafted patterns written as regular expressions, look-up lists, and the training of three NERC models using the spaCy architecture. The best model achieved an overall F-score of 0.93, with some entity types, such as Legal entities and Dates, reaching 0.98 and 0.97 respectively. The worst model achieved an overall F-score of 0.85, which is still satisfactory given the state of the art.
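
The rule-based part of the hybrid approach (regular-expression patterns plus look-up lists) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not give the actual patterns, lists, or label names, so `DATE_RE`, `LEGAL_REF_RE`, `ORGANIZATIONS`, the labels, and the sample sentence are hypothetical, and the trained spaCy models are not reproduced here.

```python
import re

# Hypothetical pattern for absolute dates such as "14 de marzo de 2020".
MONTHS = (
    "enero|febrero|marzo|abril|mayo|junio|julio|agosto|"
    "septiembre|setiembre|octubre|noviembre|diciembre"
)
DATE_RE = re.compile(rf"\b\d{{1,2}} de (?:{MONTHS}) de \d{{4}}\b", re.IGNORECASE)

# Hypothetical pattern for references such as "Ley 39/2015".
# Longer alternatives come first so "Ley Orgánica" is not cut down to "Ley".
LEGAL_REF_RE = re.compile(
    r"\b(?:Ley Orgánica|Real Decreto-ley|Real Decreto|Ley|Orden|Reglamento)"
    r"\s+\d+/\d{4}\b"
)

# Hypothetical look-up list of organization names.
ORGANIZATIONS = ["Ministerio de Justicia", "Tribunal Supremo"]


def find_entities(text):
    """Return (surface form, label) pairs in order of appearance."""
    spans = []
    for match in DATE_RE.finditer(text):
        spans.append((match.start(), match.end(), "DATE"))
    for match in LEGAL_REF_RE.finditer(text):
        spans.append((match.start(), match.end(), "LEGAL_REF"))
    for name in ORGANIZATIONS:
        for match in re.finditer(re.escape(name), text):
            spans.append((match.start(), match.end(), "ORG"))
    spans.sort()
    return [(text[start:end], label) for start, end, label in spans]


# Invented sample sentence, not taken from the annotated corpus.
sample = (
    "El Real Decreto 463/2020, de 14 de marzo de 2020, "
    "fue remitido al Ministerio de Justicia."
)
entities = find_entities(sample)
```

In a pipeline of the kind the abstract describes, matches like these could serve as pre-annotations for manual review or as training data for a statistical model; that use is a plausible reading, not a detail stated in the abstract.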