Ribonucleic acid (RNA) viruses and coronaviruses in Google Dataset Search: scope and epidemiological correlation

  1. Manuel Blázquez-Ochando
  2. Juan-José Prieto-Gutiérrez
Journal:
El profesional de la información

ISSN: 1386-6710 1699-2407

Year of publication: 2020

Issue title: Framing (Encuadre)

Volume: 29

Issue: 6

Type: Article

DOI: 10.3145/EPI.2020.NOV.28 (open access via the publisher)

Abstract

This article analyzes the datasets indexed by the Google Dataset Search engine that deal with families of RNA viruses, using terminology taken from the thesaurus of the National Cancer Institute (NCI), which is produced by the United States Department of Health and Human Services. The aim is to assess the scope and reusability of the available data by determining the number of datasets, whether they are freely accessible, the proportion offered in reusable download formats, the main providers, the publication timeline, and whether their scientific provenance can be verified. A second aim is to identify possible links between dataset publication and the main pandemics of the last 10 years. Among the results, only 52% of the datasets correspond to scientific research, and an even smaller share, 15%, are reusable. Dataset publication also shows an upward trend that is closely tied to the impact of the main epidemics. This is clearly confirmed for the Ebola, Zika, SARS-CoV, H1N1, and H1N5 viruses and, in particular, for the SARS-CoV-2 coronavirus. Finally, the search engine has not yet implemented adequate methods for filtering and supervising datasets. These results illustrate some of the difficulties that open science still faces in the field of datasets.
