New contributions for modeling and simulating high performance computing applications on parallel and distributed architectures

  1. Núñez Covarrubias, Alberto
Supervised by:
  1. Javier Fernández Muñoz (Supervisor)
  2. Jesús Carretero Pérez (Supervisor)

University of defense: Universidad Carlos III de Madrid

Date of defense: 4 February 2011

Examination committee:
  1. Félix García Carballeira (Chair)
  2. Jose Daniel Garcia Sanchez (Secretary)
  3. Emilio Luque Fadón (Member)
  4. Antonio Plaza Miguel (Member)
  5. Claudia Casali (Member)

Type: Thesis

Abstract

In this thesis we propose a new simulation platform specifically designed for modeling parallel and distributed architectures. It integrates models of the four basic systems (storage, memory, processing and network) into a single simulation platform. The main characteristics of this platform are flexibility, to embrace the widest possible range of designs; scalability, to explore the limits of extending an architecture design; and the necessary trade-offs between execution time and the accuracy obtained. The platform is intended to model both existing and new designs of HPC architectures and applications. Depending on the user's requirements, a model can focus on a subset of the basic systems or, on the contrary, on the complete system. A complete distributed system can therefore be modeled by integrating those basic systems, each with its own level of detail. This makes it possible to build a wide range of architectures with different configurations while keeping a good compromise between accuracy and performance. The proposed simulation platform has been validated by comparing the results obtained on real architectures with those obtained in the analogous simulated environments. Furthermore, a set of experiments has been carried out to evaluate and analyze how both scalability and bottlenecks evolve on a typical HPC multi-core architecture under different configurations. These experiments consist of executing two application models (HPC and checkpointing applications) on several HPC architectures. Finally, performance results for the simulation itself have been obtained while executing these experiments.
The main purpose of this process is to calculate both the amount of time and the amount of memory needed to execute a specific simulation, depending on the size of the environment to be modeled and the hardware resources available for executing each simulation.