Speeding up sequential applications on multicore platforms

RANJAN, RAKESH

Supervised by:

Antonio González Colás Director
Fernando Latorre Salinas Co-director
Pedro Marcuello Pascual Co-director

University of defense: Universitat Politècnica de Catalunya (UPC)

Defense date: November 11, 2010

Examination committee:

Francisco Tirado Fernández Chair
Jordi Tubella Murgadas Secretary
Mario Daniel Nemirovsky Member
José Francisco Duato Marin Member
Enric Gibert Codina Member

Type: Thesis

Teseo: 111475 DIALNET

Abstract

For the past several decades, Moore's law has enabled the semiconductor industry to double the number of transistors on a chip roughly every 18 months. For a long time, this continuous increase in transistor budget translated into increased performance, as processors continued to exploit Instruction Level Parallelism (ILP) in programs. This pattern hit a roadblock in the early 2000s, when ILP reached the point of diminishing returns and designing larger, more complex cores became difficult because of power and complexity constraints. As a way out of this problem, designers started building Multicore processors, which include several cores on the same chip. With Moore's law still continuing, the doubling of transistors has now translated into roughly doubling the number of cores on a chip every 18 months. Multicore processors improve the performance of applications by exploiting Thread Level Parallelism (TLP), while the ILP exploited by each individual core is limited. While this platform is very good for multithreaded and multiprogrammed workloads, it is poorly suited to sequential applications, which conventionally rely on the ILP improvements of a single core. To let sequential applications benefit from Multicore platforms, the research community has followed two main directions: Speculative Multithreading and Non-Speculative Clustered architectures. While Speculative Multithreading splits a sequential application into speculative threads, the latter schemes partition the instructions among the cores based on data dependences but avoid a large degree of speculation. Although there has been a significant amount of work on both approaches, the performance improvements shown by these techniques have not been very impressive. In this thesis we study the primary bottlenecks of a state-of-the-art speculative multithreading architecture based on p-slices and propose some novel solutions to alleviate them.
We also propose a novel hardware-only architecture that takes the best of both worlds, i.e., Speculative Multithreading and Clustered schemes, and readily adapts the Multicore resources to the TLP and ILP available in the application.