Assessing opportunities of SYCL for biological sequence alignment on GPU-based systems
- Costanzo, Manuel
- Rucci, Enzo
- García-Sánchez, Carlos
- Naiouf, Marcelo
- Prieto-Matías, Manuel
ISSN: 0920-8542, 1573-0484
Year of publication: 2024
Volume: 80
Issue: 9
Pages: 12599-12622
Type: Article
Other publications in: The Journal of Supercomputing
Abstract
Bioinformatics and computational biology are two fields that have been exploiting GPUs for more than two decades, with CUDA being the most widely used programming language for them. However, because CUDA is proprietary to NVIDIA, it severely restricts portability to the wide range of other heterogeneous architectures, such as AMD or Intel GPUs. To address this issue, the Khronos Group has recently proposed the SYCL standard, an open, royalty-free, cross-platform abstraction layer that enables heterogeneous systems to be programmed in standard, single-source C++ code. Over the past few years, several implementations of the SYCL standard have emerged, oneAPI being the one from Intel. This paper presents the migration of the SW# suite, a biological sequence alignment tool developed in CUDA, to SYCL using Intel's oneAPI ecosystem. The experimental results show that SW# was completely migrated with little programmer intervention in terms of hand-coding. In addition, the migrated code could be ported between different architectures (multiple vendors' GPUs and also CPUs), with no noticeable performance degradation on five different NVIDIA GPUs. Moreover, performance remained stable when switching to another SYCL implementation. Consequently, SYCL and its implementations can offer attractive opportunities for the bioinformatics community, especially given the large body of CUDA-based legacy code.
Funding information
Funders
- Spanish MCIN/AEI
- PID2021-126576NB-I00
- Universidad Complutense de Madrid