Proyecto de web semántica de autoridades en PARESextracción y análisis inicial

  1. Manuel Blázquez-Ochando 1
  2. María-Antonia Ovalle-Perandones 1
  1. 1 Universidad Complutense de Madrid, España
Journal:
Revista Panamericana de Comunicación

ISSN: 2683-2208

Year of publication: 2024

Volume: 6

Issue: 1

Type: Article

DOI: 10.21555/RPC.V6I1.3121 DIALNET GOOGLE SCHOLAR lock_openOpen access editor

More publications in: Revista Panamericana de Comunicación

Abstract

The research focuses on describing the types of authorities within the Spanish Archives Portal, providing their quantification and relational ratio to outline the initial graph of this sector in PARES. To achieve this, web scraping methods have been employed, allowing for the compi-lation of all authority records for processing and analysis. The collected data demonstrates the greater relevance of personal authorities and families, followed by institutions and concepts. This approach reflects the importance of individuals and family relationships in the historical and archival context. Additionally, associative relationships between individuals and institu-tions are highlighted, suggesting the complexity of social and organizational interactions in the past. Furthermore, a strong interconnection between places and individuals is observed, as well as between places and other entities such as institutions and norms. This underscores the importance of geolocation and geographic context in understanding the historical and cultu-ral heritage represented in PARES. Moreover, an equitable proportion of family relationships is identified, indicating a rich representation of social and family life. Conversely, there is a low proportion of associative relationships with information sources, suggesting the need to expand the documentation and references used in descriptive records.

Bibliographic References

  • Agrawal, N., & Johari, S. (2019). A survey on content-based crawling for deep and surface web. Fifth International Conference on Image Information Processing (ICIIP) (pp. 491-496). IEEE. https://doi.org/10.1109/ICIIP47207.2019.8985906
  • APEF (n.d.) Who we are. Archives Portal Europe. https://www.archivesportaleurope.net/about-us/who-we-are/
  • Bae, S. W., Lee, H. D. & Cho, D. (2018). Design and implementation of a web crawler system for collection of structured and unstructured data. Journal of Korea Multimedia Society, 21(2), 199-209. https://doi.org/10.9717/kmms.2018.21.2.199
  • Chang, Z. (2022). A survey of modern crawler methods. Proceedings of the 6th International Conference on Control Engineering and Artificial Intelligence (pp. 21-28). https://doi.org/10.1145/3522749.3523076
  • CRUE (2017). Guía Linked Open Data para archivos universitarios. Grupo de Trabajo Linked Open Data y Archivos Universitarios, CRUE. http://cau.crue.org/wp-content/uploads/GT_9_Gu%C3%ADa_Linked_Open_Data_para_Archivos_Universitarios_2017.pdf
  • Dombrowski, A., & Dombrowski, Q. (2010). A formal approach to XML semantics: Implications for archive standards. Proceedings of the International Symposium on XML for the Long Haul: Issues in the Long-Term Preservation of XML. https://doi.org/10.4242/BalisageVol6.Dombrowski01
  • Gracy, K. F. (2015). Archival description and linked data: a preliminary study of opportunities and implementation challenges. Archival Science, 15, 239-294. https://doi.org/10.1007/s10502-014-9216-2
  • Guernaccini, F., Mazzini, S., & Bruno, G. (2019). LOD publication in the archival domain: methods and practices. ODOCH@ CaiSE, (pp. 15-26). https://ceur-ws.org/Vol-2375/paper2.pdf
  • Gunawan, R., Rahmatulloh, A., Darmawan, I., & Firdaus, F. (2019). Comparison of web scraping techniques: regular expression, HTML DOM and Xpath. 2018 International Conference on Industrial Enterprise and System Engineering (ICoIESE 2018). Atlantis Press (pp. 283-287). https://doi.org/10.2991/icoiese-18.2019.50
  • Hogan, A., Blomqvist, E., Cochez, M., D’Amato, C., Melo, G. D., Gutierrez, C., Kirrane, S., Labra Gayo, J. E., Navigli, R., Neumaier, S., Ngonga Ngomo, A. C., Polleres, A., Rashid, S. M., Rula, A., Schmelzeisen, L., Sequeda, J. F., Staab, S., & Zimmermann, A. (2021). Knowledge graphs. ACM Computing Surveys, 54(4). https://doi.org/10.1145/3447772
  • Jacobs, C. T., Avdis, A., Mouradian, S. L., & Piggott, M. D. (2015). Integrating research data management into geographical information systems. Roceedings of the 5th International Workshop on Semantic Digital Archives (SDA 2015) (pp. 7–17). http://ceur-ws.org/Vol-1529/paper2.pdf
  • Koch, I., Freitas, N., Ribeiro, C., Lopes, C. T., & Da Silva, J. R. (2019). Knowledge graph implementation of archival descriptions through CIDOC-CRM. International conference on theory and practice of digital libraries (pp. 99-106). Cham: Springer International Publishing.
  • Llanes-Padrón, D., & Pastor-Sánchez, J.A. (2017). Records in contexts: the road of archives to semantic interoperability. Program, 2017, 51(4), 387-405. https://doi.org/10.1108/PROG-03-2017-0021
  • López Cuadrado, A. M., & Requejo Zalama, J. (2021). Estrategias y modelos de gestión de datos archivísticos. Tábula, 24, 97–111. https://publicaciones.acal.es/tabula/article/view/874
  • López Cuadrado, A. M. (2016). PARES 2.0: tecnología para mejorar el acceso de los ciudadanos a los documentos y a la información en los Archivos Estatales. En González Cachafeiro, J. (coord.). Actas de las jornadas 9ª Jornadas archivando: usuarios, retos y oportunidades. León, 10 y 11 de noviembre (pp. 36-59). ISBN 978-84-617-7452-4
  • Marciano, R., Lemieux, V., Hedges, M., Esteva, M., Underwood, W., Kurtz, M., & Conrad, M. (2018). Archival records and training in the age of Big Data. In: J. Percell, L. C. Sarin, P. T. Jaeger, & J. C. Bertot (Eds.) Re-envisioning the MLS: Perspectives on the Future of Library and Information Science Education (Advances in Librarianship, vol. 44B, pp. 179-199). Emerald Publishing Limited, Leeds. https://doi.org/10.1108/S0065-28302018000044B010
  • Maynard, D., & Greenwood, M. A. (2012). Large scale semantic annotation, indexing, and search at the national archives. Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012 (pp. 3487–3494). http://www.lrec-conf.org/proceedings/lrec2012/pdf/122_Paper.pdf
  • Miller, E. (2001). Semantic Web Layer Cake. https://www.w3.org/2001/09/06-ecdl/slide17-0.html
  • Niu, J. (2016). Linked data for archives. Archivaria, 82(1), 83-110. https://archivaria.ca/index.php/archivaria/article/view/13582
  • O’Reilly, T. (30 de septiembre de 2005). What is Web 2.0: Design patterns and business models for the next generation of software. O’Reilly. https://www.oreilly.com/pub/a/web2/archive/what-is-web-20.html
  • Portal de Archivos Españoles (n.d.). Estadísticas de PARES. https://pares.culturaydeporte.gob.es/estadisticas.html
  • Radilova, M., Kamencay, P., Hudec, R., Benco, M., & Radil, R. (2022). Tool for parsing important data from web pages. applied sciences, 12(23), 12031. https://doi.org/10.3390/app122312031
  • Society of American Archivists (2011). Encoded Archival Context - Corporate bodies, Persons, and Families (EAC-CPF). https://www2.archivists.org/node/23669
  • Vafaie, M., Bruns, O., Pilz, N., Dessí, D. & Sack, H. (2021). Modelling archival hierarchies in practice: Key aspects and lessons learned. CEUR Workshop Proceedings, 2981. https://doi.org/10.34657/8006
  • Zhang, S., Wu, J., & Yang, K. (2020). A webpage segmentation method based on node information entropy of DOM tree. Journal of Physics: Conference Series, 1624(3), 032023. https://doi.org/10.1088/1742-6596/1624/3/032023