Métodos de Procesado del Lenguaje Natural aplicados al estudio de las coberturas mediáticas [Natural Language Processing methods applied to the study of media coverage]

Castillo-Campos, Mar; Becerra-Alonso, David; Varona-Aramburu, David

doi:10.35951/V4I2.171

Castillo-Campos, Mar ¹
Becerra-Alonso, David ¹
Varona-Aramburu, David ²

1 Universidad Loyola Andalucía, Sevilla, Spain. ROR: https://ror.org/0075gfd51

2 Universidad Complutense de Madrid, Madrid, Spain. ROR: https://ror.org/02p0gd045

Journal: Comunicación & métodos

ISSN: 2659-9538

Year of publication: 2022

Issue title: La relevancia del método

Volume: 4

Issue: 2

Pages: 85-99

Type: Article

DOI: 10.35951/V4I2.171

Abstract

Natural Language Processing (NLP) comprises a range of quantitative techniques for analysing texts; although these techniques are well proven, they remain uncommon in journalism research. This study proposes a methodology designed to analyse media coverage of the 2021 elections to the Assembly of Madrid. It proceeds in three phases: counting terms, studying the relationships between concepts using neural networks, and clustering and projecting terms. The results were compared with previous studies of media coverage carried out with other methodologies. The research shows that the mechanization and automation of the proposed techniques make such comparison efficient and provide a starting point for qualitative or mixed-methods research that explores texts in depth. The method is also flexible enough to allow experimentation with different groups of words drawn from the media or from any other documentary source.
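
As a rough illustration of the first phase (counting terms), a minimal Python sketch might look like the following. It is not the authors' implementation: the tokenizer, the function names `tokenize`, `term_counts` and `tf_idf`, the sample headlines, and the TF-IDF weighting step are all assumptions made here for illustration, using only the standard library.

```python
from collections import Counter
import math
import re


def tokenize(text):
    """Lowercase the text and return its word tokens (runs of letters only)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def term_counts(docs):
    """Return one Counter of raw term frequencies per document."""
    return [Counter(tokenize(doc)) for doc in docs]


def tf_idf(counts):
    """Weight each term by relative frequency times log inverse document frequency.

    This weighting is an assumption for illustration; the abstract only
    mentions counting terms.
    """
    n_docs = len(counts)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for c in counts for term in c)
    weights = []
    for c in counts:
        total = sum(c.values())
        weights.append(
            {t: (f / total) * math.log(n_docs / doc_freq[t]) for t, f in c.items()}
        )
    return weights


# Hypothetical headlines, invented for this example.
docs = [
    "La candidata gana el debate",
    "El debate electoral marca la campaña",
    "La campaña electoral termina",
]
counts = term_counts(docs)
weights = tf_idf(counts)
print(counts[0].most_common(3))
```

A term that occurs in every document, such as an article, receives a weight of zero under this scheme, which is one reason such weighting is often preferred over raw counts when comparing outlets.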

References

Berven, A., Christensen, O., Moldeklev, S., Opdahl, A., & Villanger, K. (2020). A knowledge-graph platform for newsrooms. Computers in Industry, 123. https://doi.org/10.1016/j.compind.2020.103321
Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python. O'Reilly Media.
Campos, R., Mangaravite, V., Pasquali, A., Jorge, A., Nunes, C., & Jatowt, A. (2020). YAKE! Keyword extraction from single documents using multiple local features. Information Sciences, 509, 257-289. https://doi.org/10.1016/j.ins.2019.09.013
Casero-Ripollés, A., Feenstra, R., & Tormey, S. (2016). Old and New Media Logics in an Electoral Campaign: The Case of Podemos and the Two-Way Street Mediatization of Politics. The International Journal of Press/Politics, 21(3), 378-397. https://doi.org/10.1177/1940161216645340
Cervantes, J., Garcia-Lamont, F., Rodríguez-Mazahua, L., & Lopez, A. (2020). A comprehensive survey on support vector machine classification: applications, challenges and trends. Neurocomputing, 408, 189-215. https://doi.org/10.1016/j.neucom.2019.10.1
Christian, H., Agus, M. P., & Suhartono, D. (2016). Single document automatic text summarization using term frequency-inverse document frequency (TF-IDF). ComTech: Computer, Mathematics and Engineering Applications, 7(4), 285-294. https://doi.org/10.21512/comtech.v7i4.3746
Doan, S., Yang, E. W., Tilak, S. S., Li, P. W., Zisook, D. S., & Torii, M. (2019). Extracting health-related causality from twitter messages using natural language processing. BMC medical informatics and decision making, 19(3), 71-77. https://doi.org/10.1186/s12911-019-0785-0
Edell, A. (2018). I trained fake news detection AI with >95% accuracy, and almost went crazy. In Towards Data Science. https://towardsdatascience.com/i-trained-fake-news-detection-ai-with-95-accuracy-and-almost-went-crazy-d10589aa57c
Emadi, M., & Rahgozar, M. (2020). Twitter sentiment analysis using fuzzy integral classifier fusion. Journal of Information Science, 46(2), 226-242. https://doi.org/10.1177/0165551519828
Fenoll, V., & Rodríguez-Ballesteros, P. (2017). Análisis automatizado de encuadres mediáticos. Cobertura en prensa del debate 7D 2015: el debate decisivo. Profesional de la Información, 26(4), 630-640. https://doi.org/10.3145/epi.2017.jul.07
Gao, Z., Feng, A., Song, X., & Wu, X. (2019). Target-dependent sentiment classification with BERT. IEEE Access, 7, 154290-154299. https://doi.org/10.1109/ACCESS.2019.2946594
García-Marín, J., Calatrava García, A., & Luengo, Ó. G. (2018). Debates electorales y conflicto. Un análisis con máquinas de soporte virtual (SVM) de la cobertura mediática de los debates en España desde 2008. Profesional de la Información, 27(3). https://doi.org/10.3145/epi.2018.may.15
Goularas, D., & Kamis, S. (2019, August). Evaluation of deep learning techniques in sentiment analysis from twitter data. In 2019 International Conference on Deep Learning and Machine Learning in Emerging Applications, pp. 12-17. IEEE. https://doi.org/10.1109/Deep-ML.2019.00011
Iyengar, S., & Simon, A. F. (2000). New perspectives and evidence on political communication and campaign effects. Annual Review of Psychology, 51(1), 149-169. https://doi.org/10.1146/annurev.psych.51.1.149
Jang, B., Kim, I., & Kim, J. (2019). Word2vec convolutional neural networks for classification of news articles and tweets. PloS one, 14(8). https://doi.org/10.1371/journal.pone.0220976
Jung, N., & Lee, G. (2019). Automated classification of building information modeling (BIM) case studies by BIM use based on natural language processing (NLP) and unsupervised learning. Advanced Engineering Informatics, 41, 100917. https://doi.org/10.1016/j.aei.2019.04.007
Kim, D., Seo, D., Cho, S., & Kang, P. (2019). Multi-co-training for document classification using various document representations: TF–IDF, LDA, and Doc2Vec. Information Sciences, 477, 15-29. https://doi.org/10.1016/j.ins.2018.10.006
Kowsari, K., Jafari Meimandi, K., Heidarysafa, M., Mendu, S., Barnes, L., & Brown, D. (2019). Text classification algorithms: A survey. Information, 10(4), 150. https://doi.org/10.3390/info10040150
Kuncoro, B. A., & Iswanto, B. H. (2015, November). TF-IDF method in ranking keywords of Instagram users' image captions. In 2015 International Conference on Information Technology Systems and Innovation (ICITSI) (pp. 1-5). IEEE. https://ieeexplore.ieee.org/document/7437705
Labio-Bernal, A. (2018). Anti-communism and the mainstream online press in Spain: Criticism of Podemos as a strategy of a two-party system in crisis. The Propaganda Model Today: Filtering Perceptions and Awareness. University of Westminster Press. https://doi.org/10.16997/book27
Le, Q., & Mikolov, T. (2014). Distributed representations of sentences and documents. In International Conference on Machine Learning (pp. 1188-1196). PMLR. https://doi.org/10.48550/arXiv.1405.4053
Li, L., Johnson, J., Aarhus, W., & Shah, D. (2022). Key factors in MOOC pedagogy based on NLP sentiment analysis of learner reviews: What makes a hit. Computers & Education, 176, 104354. https://doi.org/10.1016/j.compedu.2021.104354
Mancera-Rueda, A., & Villar-Hernández, P. (2020). Análisis de las estrategias de encuadre discursivo en la cobertura electoral sobre Vox en los titulares de la prensa española. Doxa Comunicación. Revista Interdisciplinar de Estudios de Comunicación y Ciencias Sociales, 315-340. https://doi.org/10.31921/doxacom.n31a16
Marshall, M. N. (1996). Sampling for qualitative research. Family practice, 13(6), 522-526. https://doi.org/10.1093/fampra/13.6.522
McInnes, L., Healy, J., & Melville, J. (2018). Umap: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426.
McNair, B. (2017). An introduction to political communication. Routledge.
Miguel-Sáez-de-Urabain, A., Fernández-de-Arroyabe-Olaortua, A., & Lazkano-Arrillaga, I. (2017). La espectacularización de la información política. El caso de El País en las elecciones estadounidenses de 2016. Revista Latina de Comunicación Social, 72, 1131-1147. https://doi.org/10.4185/RLCS-2017-1211
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
Müller, M., Salathé, M., & Kummervold, P. E. (2020). Covid-twitter-bert: A natural language processing model to analyse covid-19 content on twitter. arXiv preprint arXiv:2005.07503.
Paniagua-Rojano, F., Seoane-Pérez, F., & Magallón-Rosa, R. (2020). Anatomía del bulo electoral: la desinformación política durante la campaña del 28-A en España. Revista CIDOB d'Afers Internacionals, 124, 123-146. https://doi.org/10.24241/rcai.2020.124.1.123
Qaiser, S., & Ali, R. (2018). Text mining: use of TF-IDF to examine the relevance of words to documents. International Journal of Computer Applications, 181(1), 25-29. https://doi.org/10.5120/ijca2018917395
Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing and Management, 24(5), 513-523. https://doi.org/10.1016/0306-4573(88)90021-0
Sánchez Gutiérrez, B. (2016). La representación mediática de los partidos políticos emergentes: el caso de Podemos y Ciudadanos en Atresmedia (Master's thesis). Universidad de Sevilla.
Sánchez-Gutiérrez, B., & Nogales-Bocio, A. I. (2018). La cobertura mediática de Podemos en la prensa nativa digital neoliberal española: una aproximación al caso de OkDiario, El Español y El Independiente. In A. I. Nogales Bocio, C. Marta-Lazo, & M. A. Solans García (Eds.), Estándares e indicadores para la calidad informativa en los medios digitales (pp. 125-146).
Shapiro, A. H., Sudhof, M., & Wilson, D. (2020). Measuring news sentiment. Journal of Econometrics, 228(2), 221-243. https://doi.org/10.1016/j.jeconom.2020.07.053
Singh, K., Sen, I., & Kumaraguru, P. (2018, July). A Twitter corpus for Hindi-English code mixed POS tagging. In Proceedings of the Sixth International Workshop on Natural Language Processing for Social Media (pp. 12-17). https://doi.org/10.18653/v1/W18-3503
Sun, S., Luo, C., & Chen, J. (2017). A review of natural language processing techniques for opinion mining systems. Information fusion, 36, 10-25. https://doi.org/10.1016/j.inffus.2016.10.004
Thavareesan, S., & Mahesan, S. (2020, July). Sentiment lexicon expansion using Word2vec and fastText for sentiment prediction in Tamil texts. In 2020 Moratuwa Engineering Research Conference (pp. 272-276). IEEE. https://doi.org/10.1109/MERCon50084.2020.9185369
Tian, X., & Tong, W. (2010). An improvement to TF: Term distribution based term weight algorithm. In 2010 Second International Conference on Networks Security, Wireless Communications and Trusted Computing, 1 (pp. 252-255). IEEE. https://doi.org/10.1109/NSWCTC.2010.66
Vermeulen, M., Smith, K., Eremin, K., Rayner, G., & Walton, M. (2021). Application of Uniform Manifold Approximation and Projection (UMAP) in spectral imaging of artworks. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 252, 119547. https://doi.org/10.1016/j.saa.2021.119547
Wongso, R., Luwinda, F. A., Trisnajaya, B. C., & Rusli, O. (2017). News article text classification in Indonesian language. Procedia Computer Science, 116, 137-143. https://doi.org/10.1016/j.procs.2017.10.039
Xia, T., & Chai, Y. (2011). An Improvement to TF-IDF: Term Distribution based Term Weight Algorithm. Journal of Software, 6(3), 413-420. http://www.jsoftware.us/vol6/jsw0603-9.pdf
Zhou, P., Shi, W., Zhao, J., Huang, K-H., Chen, M., & Chang, K-W. (2019). Analyzing and Mitigating Gender Bias in Languages with Grammatical Gender and Bilingual Word Embeddings. ACL

Data source: Dialnet
