Thermal aware microarchitectures

CHAPARRO MONFERRER, PEDRO

Supervised by:

Antonio González Colás, Director
José González González, Co-director

University of defense: Universitat Politècnica de Catalunya (UPC)

Date of defense: 27 February 2008

Examination committee:

José María Llaberia Griño, President
Ramon Canal Corretger, Secretary
Manuel Prieto Matías, Member
Margaret Martonosi, Member
José Francisco Duato Marin, Member

Type: Thesis

Teseo: 174401 DIALNET

Abstract

Power density and heat removal are increasing challenges in each processor generation. Keeping silicon at an operating temperature is becoming more challenging and expensive as the power density of microprocessors increases. Higher temperatures increase the cost of the package and the thermal solution of a processor, increase its leakage power, and penalize its performance. Moreover, localized hot spots may create transient high temperatures in a restricted area of the chip, which is a source of faults and reduces chip reliability. In such circumstances, techniques to reduce the performance loss due to thermal emergency mechanisms (dynamic thermal management, DTM) are crucial. This work proposes several microarchitectural innovations, both at core scope and in multicore systems, for managing temperature with several objectives: reduce cooling and packaging costs, improve performance, and reduce power. We first propose techniques for the backend of a processor based on clustering: redesigning a monolithic core into a 4-cluster microarchitecture reduces the maximum temperature by 27% and leakage by 27%. This comes at the expense of a 20% reduction in the average number of instructions committed per cycle. We also propose novel thermal-aware instruction steering schemes (steering is the logic that decides the destination cluster of each instruction). This, combined with a cluster hopping scheme, which consists of disabling particular clusters for a period of time and then rotating the disabled clusters, reduces the maximum temperature by 8% and leakage by 30% with a slowdown of just 5%. For the frontend, we propose a mechanism to partition the rename and commit logic, which reduces temperature by more than 30% with an impact on performance of 2%. Moreover, a banked design with a bank hopping scheme is proposed for the trace cache; it is also enhanced with a thermal-aware address-to-bank mapping that attempts to balance temperature among banks.
The combination of both trace cache techniques reduces the maximum and the average temperature by 14% and 17%, respectively. When the partitioned rename and commit logic is combined with the thermal-aware trace cache, the temperature benefits range between 25% and 35% for the different blocks. Furthermore, fine-grain dynamic voltage and frequency scaling (DVFS) is analyzed as a thermal management technique in a Multiple-Clock Domain architecture, which allows for independent DVFS in different parts of the chip. The performance improvement in high-power applications ranges between 6% and 18%. We analyze in depth the impact of a range of parameters in multicore designs. In many cases, global (chip-wide) DVFS is the least effective technique, since it slows down the whole chip, whereas per-core DVFS is the most effective scheme in almost all configurations. However, global DVFS combined with thread migration (TM) is competitive against per-core DVFS with significantly less complexity. Furthermore, we propose novel TM schemes that provide performance improvements (which depend on each particular configuration) over existing schemes. Finally, we explore techniques that use microarchitecturally controlled Thin Film Thermoelectric cooling devices (TFTECs) combined with DVFS and TM. Our novel schemes provide either a low-complexity controller that provides a significant performance boost over classical DTM, or a system with significantly less complexity that can perform as well as the best classical DTM. We show that a microprocessor with TFTECs can perform within 8% of the performance achieved when an ideal thermal solution is implemented (with impressive speedups over classical DTM).
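
As an illustration of the cluster hopping idea mentioned in the abstract (disable some clusters for a period of time, then rotate which clusters are disabled), the following is a minimal sketch and not the thesis's actual mechanism. The function name `hopping_schedule`, its parameters, and the plain round-robin rotation order are assumptions made for this example; the abstract does not specify how the rotation is chosen or how long each interval lasts.

```python
def hopping_schedule(num_clusters, num_disabled, num_intervals):
    """Return, for each interval, the sorted list of disabled cluster indices.

    Hypothetical round-robin policy: the window of disabled clusters
    advances by `num_disabled` positions at every interval, so each
    cluster periodically gets an idle period in which it can cool down.
    """
    if not 0 <= num_disabled < num_clusters:
        raise ValueError("num_disabled must be in [0, num_clusters)")

    schedule = []
    for interval in range(num_intervals):
        # First disabled cluster for this interval (wraps around).
        start = (interval * num_disabled) % num_clusters
        disabled = sorted((start + k) % num_clusters for k in range(num_disabled))
        schedule.append(disabled)
    return schedule


def enabled_clusters(num_clusters, disabled):
    """Clusters that may still receive instructions in this interval."""
    return [c for c in range(num_clusters) if c not in disabled]


if __name__ == "__main__":
    # 4-cluster backend, one cluster resting per interval (assumed values).
    for i, disabled in enumerate(hopping_schedule(4, 1, 5)):
        print(i, "disabled:", disabled, "enabled:", enabled_clusters(4, disabled))
```

In a real design the steering logic would send instructions only to the enabled clusters during each interval, and the rotation could be driven by temperature sensors rather than a fixed order; both refinements are outside what this sketch models.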