Publications in collaboration with researchers from University of Texas at Austin (17)

2016

  1. Analytical modeling is enough for high-performance BLIS

    ACM Transactions on Mathematical Software, Vol. 43, No. 2

  2. The BLIS framework: Experiments in portability

    ACM Transactions on Mathematical Software, Vol. 42, No. 2

2013

  1. Implementing triple adjacent Error Correction in double error correction Orthogonal Latin Squares Codes

    Proceedings - IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems

  2. Scheduling algorithms-by-blocks on small clusters

    Concurrency and Computation: Practice and Experience, Vol. 25, No. 3, pp. 367-384

2012

  1. A runtime system for programming out-of-core matrix algorithms-by-tiles on multithreaded architectures

    ACM Transactions on Mathematical Software, Vol. 38, No. 4

  2. Level-3 BLAS on a GPU: Picking the low hanging fruit

    AIP Conference Proceedings

  3. Level-3 BLAS on the TI C6678 multi-core DSP

    Proceedings - Symposium on Computer Architecture and High Performance Computing

  4. The FLAME approach: From dense linear algebra algorithms to high-performance multi-accelerator implementations

    Journal of Parallel and Distributed Computing, Vol. 72, No. 9, pp. 1134-1143

  5. Unleashing the high-performance and low-power of multi-core DSPs for general-purpose HPC

    International Conference for High Performance Computing, Networking, Storage and Analysis, SC

2011

  1. Power-aware dense linear algebra implementations on multi-core and many-core processors

    3rd Many-Core Applications Research Community Symposium, MARC 2011

2010

  1. Retargeting PLAPACK to clusters with hardware accelerators

    Proceedings of the 2010 International Conference on High Performance Computing and Simulation, HPCS 2010

2009

  1. Out-of-core solution of linear systems on graphics processors

    International Journal of Parallel, Emergent and Distributed Systems, Vol. 24, No. 6, pp. 521-538

  2. Solving dense linear systems on platforms with multiple hardware accelerators

    ACM SIGPLAN Notices, Vol. 44, No. 4, pp. 121-129

  3. Solving dense linear systems on platforms with multiple hardware accelerators

    Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP