FRANCISCO DANIEL
IGUAL PEÑA

Associate Professor (Profesor Titular de Universidad)

Universidad Politécnica de Valencia

Valencia, Spain

Publications in collaboration with researchers from Universidad Politécnica de Valencia (26)

2024

Algorithm 1039: Automatic Generators for a Family of Matrix Multiplication Routines with Apache TVM
ACM Transactions on Mathematical Software, Vol. 50, No. 1
Automatic generation of ARM NEON micro-kernels for matrix multiplication
Journal of Supercomputing
Experiences with nested parallelism in task-parallel applications using malleable BLAS on multicore processors
International Journal of High Performance Computing Applications, Vol. 38, No. 2, pp. 55-68

2023

Automatic Generation of Micro-kernels for Performance Portability of Matrix Multiplication on RISC-V Vector Processors
ACM International Conference Proceeding Series
Fine-grain task-parallel algorithms for matrix factorizations and inversion on many-threaded CPUs
Concurrency and Computation: Practice and Experience
Micro-kernels for portable and efficient matrix multiplication in deep learning
Journal of Supercomputing, Vol. 79, No. 7, pp. 8124-8147
Programming parallel dense matrix factorizations and inversion for new-generation NUMA architectures
Journal of Parallel and Distributed Computing, Vol. 175, pp. 51-65

2022

Anatomy of the BLIS Family of Algorithms for Matrix Multiplication
Proceedings - 30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2022
NUMA-Aware Dense Matrix Factorizations and Inversion with Look-Ahead on Multicore Processors
Proceedings - Symposium on Computer Architecture and High Performance Computing
QR Factorization Using Malleable BLAS on Multicore Processors
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

2021

A New Generation of Task-Parallel Algorithms for Matrix Inversion in Many-Threaded CPUs
Proceedings of the 12th International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM 2021
Low precision matrix multiplication for efficient deep learning in NVIDIA Carmel processors
Journal of Supercomputing, Vol. 77, No. 10, pp. 11257-11269
Scalable Hybrid Loop- And Task-Parallel Matrix Inversion for Multicore Processors
2021 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2021 - In conjunction with IEEE IPDPS 2021

2020

Integration and exploitation of intra-routine malleability in BLIS
Journal of Supercomputing, Vol. 76, No. 4, pp. 2860-2875
Programming parallel dense matrix factorizations with look-ahead and OpenMP
Cluster Computing, Vol. 23, No. 1, pp. 359-375

2018

Optimized Fundamental Signal Processing Operations for Energy Minimization on Heterogeneous Mobile Devices
IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 65, No. 5, pp. 1614-1627

2017

Solving Weighted Least Squares (WLS) problems on ARM-based architectures
Journal of Supercomputing, Vol. 73, No. 1, pp. 530-542

2015

Time and energy modeling of high-performance Level-3 BLAS on x86 architectures
Simulation Modelling Practice and Theory, Vol. 55, pp. 77-94
Vectorization of binaural sound virtualization on the ARM Cortex-A15 architecture
2015 23rd European Signal Processing Conference, EUSIPCO 2015

2014

Enhancing performance and energy consumption of runtime schedulers for dense linear algebra
Concurrency and Computation: Practice and Experience