Métodos de Procesado del Lenguaje Natural aplicados al estudio de las coberturas mediáticas [Natural Language Processing methods applied to the study of media coverage]

  1. Mar Castillo-Campos 1
  2. David Becerra-Alonso 1
  3. David Varona-Aramburu 2
  1. 1 Universidad Loyola Andalucía, Seville, Spain. ROR https://ror.org/0075gfd51

  2. 2 Universidad Complutense de Madrid, Madrid, Spain. ROR 02p0gd045

Journal:
Comunicación y Métodos (Communication & Methods)

ISSN: 2659-9538

Year of publication: 2022

Volume: 4

Pages: 89-95

Type: Article

Abstract

Natural Language Processing comprises a range of quantitative techniques for the analysis of texts whose starting points differ from those usually used in journalism. With an eminently exploratory character and based on grounded theory, the combination of techniques used here (TF, TF*IDF, word2vec and projection of terms with UMAP) allows us to detect the links between terms in different documentary sources, as well as their frequency of use and exposure to certain concepts, ideas and characters. This methodology is intended to help envision new lines of study and can be combined with other, more in-depth discourse analysis techniques. The flexibility of the method also allows experimentation with different word groups for any other documentary source.
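
As a rough illustration of the first two techniques named in the abstract, the sketch below computes TF and TF*IDF weights for a toy corpus in plain Python. It is not the authors' pipeline: the corpus, the function names and the IDF variant (natural logarithm of the number of documents divided by the document frequency, with no smoothing) are assumptions made for the example. The word2vec and UMAP steps are left out because they depend on external libraries and on a much larger corpus.

```python
import math
from collections import Counter


def term_frequencies(tokens):
    """Relative frequency (TF) of each term within a single document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: n / total for term, n in counts.items()}


def tf_idf(corpus):
    """TF*IDF weights per document.

    Assumption: idf(t) = ln(N / df(t)), where N is the number of documents
    and df(t) is the number of documents containing t. Other variants exist.
    """
    n_docs = len(corpus)
    doc_freq = Counter()
    for tokens in corpus:
        doc_freq.update(set(tokens))  # count each term once per document

    weights = []
    for tokens in corpus:
        tf = term_frequencies(tokens)
        weights.append(
            {t: f * math.log(n_docs / doc_freq[t]) for t, f in tf.items()}
        )
    return weights


# Hypothetical toy corpus: three already-tokenised "news items".
corpus = [
    "the party leader spoke about the debate".split(),
    "the debate focused on the economy".split(),
    "the economy dominated the campaign".split(),
]

weights = tf_idf(corpus)

# Print the terms of the first document, highest weight first.
for term, weight in sorted(weights[0].items(), key=lambda kv: -kv[1]):
    print(f"{term:10s} {weight:.3f}")
```

In this toy corpus a term that appears in every document, such as "the", receives a weight of zero, while terms confined to a single document rank highest. This is the property that makes TF*IDF useful for contrasting how different sources use a term.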
