Publications co-authored with Enrique Salvador Quintana Ortí (81)

2024

  1. Algorithm 1039: Automatic Generators for a Family of Matrix Multiplication Routines with Apache TVM

    ACM Transactions on Mathematical Software, Vol. 50, No. 1

  2. Automatic generation of ARM NEON micro-kernels for matrix multiplication

    Journal of Supercomputing, Vol. 80, No. 10, pp. 13873-13899

  3. Experiences with nested parallelism in task-parallel applications using malleable BLAS on multicore processors

    International Journal of High Performance Computing Applications, Vol. 38, No. 2, pp. 55-68

  4. Inference with Transformer Encoders on ARM and RISC-V Multicore Processors

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

  5. Parallel GEMM-based convolutions for deep learning on multicore ARM and RISC-V architectures

    Journal of Systems Architecture, Vol. 153

2022

  1. Anatomy of the BLIS Family of Algorithms for Matrix Multiplication

    Proceedings - 30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2022

  2. NUMA-Aware Dense Matrix Factorizations and Inversion with Look-Ahead on Multicore Processors

    Proceedings - Symposium on Computer Architecture and High Performance Computing

  3. QR Factorization Using Malleable BLAS on Multicore Processors

    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

2021

  1. A New Generation of Task-Parallel Algorithms for Matrix Inversion in Many-Threaded CPUs

    Proceedings of the 12th International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM 2021

  2. Low precision matrix multiplication for efficient deep learning in NVIDIA Carmel processors

    Journal of Supercomputing, Vol. 77, No. 10, pp. 11257-11269

  3. Scalable Hybrid Loop- and Task-Parallel Matrix Inversion for Multicore Processors

    2021 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2021 - In conjunction with IEEE IPDPS 2021

2020

  1. Integration and exploitation of intra-routine malleability in BLIS

    Journal of Supercomputing, Vol. 76, No. 4, pp. 2860-2875

  2. Programming parallel dense matrix factorizations with look-ahead and OpenMP

    Cluster Computing, Vol. 23, No. 1, pp. 359-375