Assessing opportunities of SYCL for biological sequence alignment on GPU-based systems

Costanzo, Manuel; Rucci, Enzo; García-Sanchez, Carlos; Naiouf, Marcelo; Prieto-Matías, Manuel

doi:10.1007/S11227-024-05907-2

Assessing opportunities of SYCL for biological sequence alignment on GPU-based systems

Costanzo, Manuel
Rucci, Enzo
García-Sanchez, Carlos
Naiouf, Marcelo
Prieto-Matías, Manuel

Revista:

The Journal of Supercomputing

ISSN: 0920-8542, 1573-0484

Año de publicación: 2024

Volumen: 80

Número: 9

Páginas: 12599-12622

Tipo: Artículo

DOI: 10.1007/S11227-024-05907-2 SCOPUS: 2-s2.0-85185296631 GOOGLE SCHOLAR Acceso abierto editor

Otras publicaciones en: The Journal of Supercomputing

Resumen

Bioinformatics and computational biology are two fields that have been exploiting GPUs for more than two decades, with being CUDA the most used programming language for them. However, as CUDA is an NVIDIA proprietary language, it implies a strong portability restriction to a wide range of heterogeneous architectures, like AMD or Intel GPUs. To face this issue, the Khronos group has recently proposed the SYCL standard, which is an open, royalty-free, cross-platform abstraction layer that enables the programming of a heterogeneous system to be written using standard, single-source C++ code. Over the past few years, several implementations of this SYCL standard have emerged, being oneAPI the one from Intel. This paper presents the migration process of the SW# suite, a biological sequence alignment tool developed in CUDA, to SYCL using Intel’s oneAPI ecosystem. The experimental results show that SW# was completely migrated with a small programmer intervention in terms of hand-coding. In addition, it was possible to port the migrated code between different architectures (considering multiple vendor GPUs and also CPUs), with no noticeable performance degradation on five different NVIDIA GPUs. Moreover, performance remained stable when switching to another SYCL implementation. As a consequence, SYCL and its implementations can offer attractive opportunities for the bioinformatics community, especially considering the vast existence of CUDA-based legacy codes.

€ Ver financiación

Información de financiación

Financiadores

Spanish MCIN/AEI
- PID2021-126576NB-I00
- PID2021-126576NB-I00
Universidad Complutense de Madrid

Referencias bibliográficas

Dally WJ, Turakhia Y, Han S (2020) Domain-specific hardware accelerators. Commun ACM 63(7):48–57. https://doi.org/10.1145/3361682
Robert D (2021) GPU shipments increase year-over-year in Q3. https://www.jonpeddie.com/press-releases/gpu-shipments-increase-year-over-year-in-q3
Nobile MS, Cazzaniga P, Tangherloni A, Besozzi D (2016) Graphics processing units in bioinformatics, computational biology and systems biology. Brief Bioinform 18(5):870–885. https://doi.org/10.1093/bib/bbw058
De Oilveira Sandes EF, Boukerche A, De Melo ACMA (2016) Parallel optimal pairwise biological sequence comparison: algorithms, platforms, and classification. ACM Comput Surv. https://doi.org/10.1145/2893488
Ohue M, Shimoda T, Suzuki S, Matsuzaki Y, Ishida T, Akiyama Y (2014) Megadock 4.0: an ultra-high-performance protein-protein docking software for heterogeneous supercomputers. Bioinformatics 30(22):3281–3283
Loukatou S, Papageorgiou L, Fakourelis P, Filntisi A, Polychronidou E, Bassis I, Megalooikonomou V, Makałowski W, Vlachakis D, Kossida S (2014) Molecular dynamics simulations through GPU video games technologies. J Mole Biochem 3(2):64
Mrozek D, Brożek M, Małysiak-Mrozek B (2014) Parallel implementation of 3d protein structure similarity searches using a GPU and the CUDA. J Mol Model 20(2):1–17
Group K (2009) The OpenCL specification. Version 1.0. https://www.khronos.org/registry/cl/specs/opencl-1.0.pdf
Jin Z, Vetter JS (2022) Performance portability study of epistasis detection using sycl on nvidia gpu. In: Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics. BCB ’22. Association for Computing Machinery, New York. https://doi.org/10.1145/3535508.3545591
Christgau S, Steinke T (2020) Porting a legacy CUDA stencil code to oneAPI. In: 2020 IEEE IPDPSW, pp 359–367. https://doi.org/10.1109/IPDPSW50202.2020.00070
Korpar M, Sikic M (2013) SW# - GPU-enabled exact alignments on genome scale. Bioinformatics 29(19):2494–2495. https://doi.org/10.1093/bioinformatics/btt410
Costanzo M, Rucci E, García-Sánchez C, Naiouf M, Prieto-Matías M (2022) Migrating CUDA to oneAPI: a smith-waterman case study. In: Rojas I, Valenzuela O, Rojas F, Herrera LJ, Ortuño F (eds) Bioinform Biomed Eng. Springer, Cham, pp 103–116
De O, Sandes EF, Miranda G, Martorell X, Ayguade E, Teodoro G, De Melo ACMA (2016) Masa: a multiplatform architecture for sequence aligners with block pruning. ACM Trans Parallel Comput 2(4):28–12831. https://doi.org/10.1145/2858656
Needleman SB, Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48(3):443–453. https://doi.org/10.1016/0022-2836(70)90057-4
Smith TF, Waterman MS (1981) Identification of common molecular subsequences. J Mol Biol 147(1):195–197
Hasan L, Al-Ars Z (2011) In: Lopes H, Cruz L (eds) An overview of hardware-based acceleration of biological sequence alignment, pp 187–202. Intech
Isaev A (2006) Introduction to mathematical methods in bioinformatics. Universitext, 1st edn. Springer, Heidelberg
Daily J (2016) Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments. BMC Bioinform. https://doi.org/10.1186/s12859-016-0930-z
Mneimneh S (2024) Computational biology lecture 4: overlap detection, Local Alignment, Space Efficient Needleman–Wunsch
Korpar M, Sosic M, Blazeka D, Sikic M (2016) SWdb: GPU-accelerated exact sequence similarity database search. PLoS ONE 10(12):1–11. https://doi.org/10.1371/journal.pone.0145857
Khoo AA, Ogrizek-Tomaš M, Bulović A, Korpar M, Gürler E, Slijepčević I, Šikić M, Mihalek I (2013) ExoLocator-an online view into genetic makeup of vertebrate proteins. Nucl Acids Res 42(D1):879–881. https://doi.org/10.1093/nar/gkt1164
Ghorpade J, Parande J, Kulkarni M, Bawaskar A (2012) Gpgpu processing in CUDA architecture. arXiv:1202.4347
Software (2023) ComputeCpp Comunity Edition. https://developer.codeplay.com/products/computecpp/ce/home
Intel Corp (2021) Intel oneAPI. https://software.intel.com/en-us/oneapi
The triSYCL project. https://github.com/triSYCL/triSYCL (2023)
Alpay: OpenSYCL implementation. https://github.com/AdaptiveCpp/AdaptiveCpp (2023)
Alpay A, Soproni B, Wünsche H, Heuveline V (2022) Exploring the possibility of a hipsycl-based implementation of oneapi. In: International workshop on OpenCL. IWOCL’22. Association for Computing Machinery, New York. https://doi.org/10.1145/3529538.3530005
Alpay A, Heuveline V (2023) One pass to bind them: The first single-pass sycl compiler with unified code representation across backends. In: Proceedings of the 2023 international workshop on OpenCL. IWOCL ’23. Association for Computing Machinery, New York. https://doi.org/10.1145/3585341.3585351
Rucci E, Garcia C, Botella G, Giusti AED, Naiouf M, Prieto-Matias M (2018) Oswald: Opencl smith-waterman on altera’s FPGA for large protein databases. Int J High Perform Comput Appl 32(3):337–350. https://doi.org/10.1177/1094342016654215
Rucci E, Garcia C, Botella G, De Giusti A, Naiouf M, Prieto-Matias M (2018) SWIFOLD: Smith-waterman implementation on FPGA with OpenCL for long DNA sequences. BMC Syst Biol 12(Suppl 5):96. https://doi.org/10.1186/s12918-018-0614-6
NVIDIA (2022) Nsight Compute. https://developer.nvidia.com/nsight-compute
Tsai YM, Cojean T, Anzt H (2021) Porting a sparse linear algebra math library to Intel GPUs
Costanzo M, Rucci E, Sanchez CG, Naiouf M (2021) Early experiences migrating cuda codes to oneapi. In: Short Papers of the 9th Conference on Cloud Computing Conference, Big Data and Emerging Topics, pp 14–18. http://sedici.unlp.edu.ar/handle/10915/125138
Martínez PA, Peccerillo B, Bartolini S, García JM, Bernabé G (2022) Applying intel’s oneAPI to a machine learning case study. Concurrency Comput Pract Exp 34(13):6917. https://doi.org/10.1002/cpe.6917
Faqir-Rhazoui Y, García C (2023) Exploring the performance and portability of the k-means algorithm on SYCL across CPU and GPU architectures. J Supercomput 79(16):18480–18506. https://doi.org/10.1007/s11227-023-05373-2
Jin Z, Vetter J (2021) Evaluating cuda portability with HIPCL and DPCT. In: 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp 371–376. https://doi.org/10.1109/IPDPSW52791.2021.00065
Castaño G, Faqir-Rhazoui Y, García C, Prieto-Matías M (2022) Evaluation of intel’s DPC++ compatibility tool in heterogeneous computing. J Parall Distrib Comput 165:120–129. https://doi.org/10.1016/j.jpdc.2022.03.017
Yong W, Yongfa Z, Scott W, Wang Y, Qing X, Chen W (2021) Developing medical ultrasound imaging application across gpu, fpga, and CPU using oneapi. In: International workshop on OpenCL. IWOCL’21. Association for Computing Machinery, New York. https://doi.org/10.1145/3456669.3456680
Marinelli E, Appuswamy R (2021) Xjoin: portable, parallel hash join across diverse xpu architectures with OneaPI. In: Proceedings of the 17th international workshop on data management on new hardware. DAMON ’21. Association for Computing Machinery, New York. https://doi.org/10.1145/3465998.3466012
Jin Z, Vetter JS (2022) Understanding performance portability of bioinformatics applications in sycl on an nvidia gpu. In: 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp 2190–2195. https://doi.org/10.1109/BIBM55620.2022.9995222
Haseeb M, Ding N, Deslippe J, Awan M (2021) Evaluating performance and portability of a core bioinformatics kernel on multiple vendor GPUS. In: 2021 International workshop on performance, portability and productivity in HPC (P3HPC), pp 68–78. https://doi.org/10.1109/P3HPC54578.2021.00010
Solis-Vasquez L, Mascarenhas E, Koch A (2023) Experiences migrating cuda to sycl: a molecular docking case study. In: Proceedings of the 2023 international workshop on OpenCL. IWOCL ’23. Association for Computing Machinery, New York. https://doi.org/10.1145/3585341.3585372
Marinelli E, Appuswamy R (2021) OneJoin: cross-architecture, scalable edit similarity join for DNA data storage using oneAPI. In: ACM (ed) ADMS 2021, 12th international workshop on accelerating analytics and data management systems using modern processor and storage architectures, in conjunction with VLDB 2021, 16 August 2021, Copenhagen, Denmark, Copenhagen
Johnston B, Vetter JS, Milthorpe J (2020) Evaluating the performance and portability of contemporary sycl implementations. In: 2020 IEEE/ACM international workshop on performance, portability and productivity in HPC (P3HPC), pp 45–56. https://doi.org/10.1109/P3HPC51967.2020.00010
Breyer M, Daiß G, Pflüger D (2021) Performance-portable distributed k-nearest neighbors using locality-sensitive hashing and sycl. In: International workshop on OpenCL. IWOCL’21. Association for Computing Machinery, New York. https://doi.org/10.1145/3456669.3456692
Shilpage WR, Wright SA (2023) An investigation into the performance and portability of sycl compiler implementations. In: Bienz A, Weiland M, Baboulin M, Kruse C (eds) High performance computing. Springer, Cham, pp 605–619
Rognes T (2011) Faster Smith–Waterman database searches with inter-sequence SIMD parallelization. BMC Bioinform 12:221
Constantinescu D-A, Navarro A, Corbera F, Fernández-Madrigal J-A, Asenjo R (2021) Efficiency and productivity for decision making on low-power heterogeneous cpu+gpu socs. J Supercomput 77(1):44–65. https://doi.org/10.1007/s11227-020-03257-3
Nozal R, Bosque JL (2021) Exploiting co-execution with OneAPI: heterogeneity from a modern perspective. In: Sousa L, Roma N, Tomás P (eds) Euro-Par 2021: parallel processing. Springer, Cham, pp 501–516
Marowka A (2022) Reformulation of the performance portability metric. Softw Pract Exp 52(1):154–171. https://doi.org/10.1002/spe.3002