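Caracterización del sector de Tecnologías del Lenguaje mediante modelado de tópicos y análisis de grafos: Visión general de la participación española (English: Characterization of the Language Technologies sector through topic modeling and graph analysis: An overview of Spanish participation)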

  1. Pérez-Fernández, David
  2. Arenas-García, Jerónimo
  3. Samy, Doaa
Journal: Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2019

Issue: 63

Pages: 129-136

Type: Article


Abstract

This paper aims to map the Human Language Technologies (HLT) sector by applying topic modeling and graph analysis to the scientific literature in the ACL Anthology, with special emphasis on Spanish participation. The analysis combines structured and unstructured data to offer an overview of the HLT landscape in Spain, identifying its main underlying themes and their evolution in recent years compared with the international HLT community. The results are presented through an interactive visualization that allows exploration of the HLT landscape over the period 1983-2018.
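The abstract does not spell out the pipeline. As a rough illustration only, the sketch below assumes each paper already has a document-topic distribution produced by some topic model (for example LDA), plus structured metadata: a year and a hypothetical `is_spanish` flag derived from author affiliations. It compares yearly topic prevalence for the Spanish subset against the whole corpus and builds a weighted similarity graph between papers. The function names, the cosine-similarity threshold of 0.8 and the toy data are assumptions for illustration, not the authors' method.

```python
from collections import defaultdict
from math import sqrt


def topic_prevalence(papers, spanish_only=False):
    """Mean document-topic proportions per year, optionally for Spanish papers only."""
    totals = {}
    counts = defaultdict(int)
    for _pid, year, is_spanish, theta in papers:
        if spanish_only and not is_spanish:
            continue
        acc = totals.setdefault(year, [0.0] * len(theta))
        for k, p in enumerate(theta):
            acc[k] += p
        counts[year] += 1
    # Divide the accumulated proportions by the number of papers in each year.
    return {y: [v / counts[y] for v in acc] for y, acc in sorted(totals.items())}


def cosine(u, v):
    """Cosine similarity between two topic vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_graph(papers, threshold=0.8):
    """Weighted edge list linking papers whose topic vectors are similar.

    The threshold is a hypothetical choice; any value in [0, 1] can be used.
    """
    edges = []
    for i in range(len(papers)):
        for j in range(i + 1, len(papers)):
            w = cosine(papers[i][3], papers[j][3])
            if w >= threshold:
                edges.append((papers[i][0], papers[j][0], round(w, 3)))
    return edges


# Hypothetical toy corpus with 3 topics: (paper_id, year, is_spanish, topic_proportions).
# Values are invented for illustration and do not come from the ACL Anthology.
papers = [
    ("p1", 2017, True, [0.7, 0.2, 0.1]),
    ("p2", 2017, False, [0.6, 0.3, 0.1]),
    ("p3", 2018, True, [0.1, 0.1, 0.8]),
    ("p4", 2018, False, [0.2, 0.7, 0.1]),
]

print(topic_prevalence(papers, spanish_only=True))
print(topic_prevalence(papers))
print(similarity_graph(papers))
```

Comparing the two prevalence tables year by year is one simple way to contrast a national subset with the international community; the resulting edge list could then be loaded into a graph tool for community detection, a step not shown here.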

Funding information

This work has been carried out in the framework of the Spanish State Plan for Natural Language Technologies. The work of J. Arenas-García has also been partly funded by MINECO projects TEC2014-52289-R and TEC2017-83838-R.

