Performance evaluation of applications for heterogeneous systems by means of performance probes

Strube, Alexandre Otto

Supervised by:

Emilio Luque Fadón (Director)

Defending university: Universitat Autònoma de Barcelona

Defense date: 15 July 2011

Examination committee:

Francisco Tirado Fernández (Chair)
Juan Touriño (Secretary)
Felix Wolf (Member)

Type: Thesis

Teseo: 312215

Abstract

This doctoral thesis describes a novel way to select the best computing node, out of a pool of available and potentially heterogeneous nodes, for the execution of computational tasks. This is a basic and difficult problem of computer science, and computing centres have tried to avoid it by using only homogeneous compute clusters. This usually fails: like any technical equipment, clusters are extended, adapted or repaired over time, and they end up in a heterogeneous configuration. Until now, the solutions to this were:
• to leave it to the users to choose the right node(s) for execution, or
• to run extensive tests, executing and measuring every task on every type of computing node available in the pool.
In the typical case, where a large number of tasks would need to be tested on many different types of nodes, this can consume a lot of computing resources, sometimes even more than the execution one wants to optimize. In one specific setting (hierarchical multi-clusters) the problem is worse: the configuration of the cluster changes over time, so the execution tests would have to be repeated every time the configuration changes. I developed a novel and elegant solution for this problem, named "Performance Probe", or just "Probe" for short. A probe is a stripped-down version of a computational task that retains all the important characteristics of the original task, but can be executed in a much shorter time (seconds instead of hours), is much smaller than the original task (about 5% of the original size in the worst cases), and still allows the execution time of the original to be predicted within reasonable bounds (around 90% accuracy).
These results are important because scheduling is a basic problem of computer science: they can be used not only in the setting described by the thesis (selecting the right compute node for tasks in a hierarchical multi-cluster), but also in many different contexts whenever scheduling and/or selection decisions have to be made, such as selecting where a computational task would run most efficiently (which cluster at which centre), picking the right execution nodes in a large complex system (grid, cloud), workflows, and many more.
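The abstract does not state how a probe's measurement is turned into a prediction for the full task. The sketch below illustrates probe-based node selection under one simple, explicitly assumed model: the full task's run time scales linearly with the probe's run time across node types, calibrated by a single full run on a reference node. The names `Node`, `predict_runtime` and `select_node` are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A hypothetical node type in the pool (illustrative only)."""
    name: str
    probe_seconds: float  # measured run time of the probe on this node type


def predict_runtime(probe_seconds: float, ref_probe_seconds: float,
                    ref_full_seconds: float) -> float:
    """Predict the full task's run time on a node.

    ASSUMPTION: the full task scales like the probe across node types, so
    predicted = probe time on node * (full time / probe time) on the reference node.
    """
    if ref_probe_seconds <= 0:
        raise ValueError("reference probe time must be positive")
    return probe_seconds * (ref_full_seconds / ref_probe_seconds)


def select_node(nodes, ref_probe_seconds: float, ref_full_seconds: float):
    """Return the node with the smallest predicted full run time and that prediction."""
    predictions = {
        n.name: predict_runtime(n.probe_seconds, ref_probe_seconds, ref_full_seconds)
        for n in nodes
    }
    best = min(nodes, key=lambda n: predictions[n.name])
    return best, predictions[best.name]


if __name__ == "__main__":
    # Made-up numbers: on the reference node the probe takes 10 s and the full task 3600 s.
    pool = [Node("A", 10.0), Node("B", 6.0), Node("C", 12.0)]
    best, seconds = select_node(pool, ref_probe_seconds=10.0, ref_full_seconds=3600.0)
    print(best.name, seconds)
```

With these invented numbers, only the short probe runs on each node type, and node B is chosen with a predicted 2160 s. The linear model is the simplest possible choice; a real predictor would have to account for effects the probe captures differently from the full task.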