Coordinated scheduling and resource management for heterogeneous clusters and grid systems

Rodero Castro, Iván

Supervised by:

Julita Corbalán González Supervisor

Defending university: Universitat Politècnica de Catalunya (UPC)

Defense date: 5 February 2009

Examination committee:

Ignacio Martín Llorente Chair
David Carrera Pérez Secretary
Juan Fernández Peinador Member
Xavier Martorell Bofill Member
Santiago Montero Herrero Member

Type: Thesis

Teseo: 275549 DIALNET

Abstract

Job scheduling strategies have been extensively studied over the last decades. The increasing demand for resources in High Performance Computing (HPC) systems has led to new forms of collaboration among distributed systems. In these new distributed scenarios, such as grid systems, traditional scheduling techniques have evolved into more complex and sophisticated approaches that take into account other factors, such as the heterogeneity of resources or their geographical distribution. In these architectures, existing HPC applications, which were previously developed and probably parallelized, become inefficient when they are executed in a system that does not fit their original specifications. Moreover, due to the number of software layers and components involved in a job execution, the global system becomes very complex. Besides the overhead of the new components, which can decrease job execution performance and resource utilization, the system is also more sensitive to failures. Since the information and control available at the highest scheduling levels are far more limited than those available at local scheduling levels, the global scheduling and resource management of these systems becomes extremely complex and tedious. This Thesis aims to provide good low-level support in order to improve scheduling at higher levels. To do this, we propose a coordinated architecture that considers all the scheduling layers that can be involved in a grid job execution, from brokering in interoperable grid systems to local resource scheduling. We also propose a set of coordination mechanisms and well-defined APIs between the scheduling layers. Finally, we propose scheduling policies to improve application execution performance and to enhance resource utilization in three different scenarios: the cluster scheduling scenario, the grid scheduling scenario, and the interoperable grid scenario.
The main contributions of this Thesis are summarized as follows:
- In the cluster scheduling scenario we have developed an infrastructure based on the coordination between job scheduling and processor allocation tools. We also propose and evaluate scheduling policies based on co-allocation and dynamic processor allocation techniques. The scheduling strategy aims to improve the performance of CPU-intensive parallel applications in heterogeneous clusters based on SMP architectures.
- In the grid scheduling scenario we have developed a grid resource management system and two different coordination mechanisms between the grid and the cluster scheduling layers. We also propose and evaluate job scheduling policies based on backfilling, and resource selection policies that consider dynamic performance information from the cluster scheduling layer.
- In the interoperable grid scenario we propose and evaluate broker selection strategies that use aggregated resource information and dynamic performance information from the grid scheduling layer. Since these strategies rely on aggregated resource information, we also propose and evaluate different resource aggregation algorithms.
We have evaluated the cluster scheduling scenario in a real execution system, and we have used trace-driven simulations to evaluate the policies in the grid scenarios. The results obtained clearly support the argument that coordinating the different scheduling layers can improve application execution performance and resource utilization in the different scenarios considered in this Thesis. These results encourage us to continue our research in grid environments, especially in grid interoperability. They also motivate the extension of the work done in this Thesis to newer paradigms such as service-oriented architectures, virtualization, and cloud computing.