Memory disambiguation hardware: a Review

  1. Castro, Fernando
  2. Chaver, Daniel
  3. Piñuel, Luis
  4. Prieto, Manuel
  5. Tirado Fernández, Francisco
Journal:
Journal of Computer Science and Technology

ISSN: 1666-6038

Year of publication: 2008

Issue title: Twenty-Fourth Issue

Volume: 8

Issue: 3

Pages: 132-138

Type: Article


Abstract

One of the main challenges of modern processor designs is the implementation of scalable and efficient mechanisms to detect memory access order violations as a result of out-of-order execution. Conventional structures performing this task are complex, inefficient and power-hungry. This fact has generated a large body of work on optimizing address-based memory disambiguation logic, namely the load-store queue. In this paper we review the most significant proposals in this research field, focusing on our own contributions.

References

  • [1] J. Tendler, J. Dodson, J. Fields, H. Le and B. Sinharoy, “Power4 System Microarchitecture”, IBM Journal of Research and Development, Vol. 46, No. 1, 2002, pp. 5-25.
  • [2] R. Kessler, “The Alpha 21264 Microprocessor”, IEEE Micro, Vol. 19, No. 2, 1999, pp. 24-36.
  • [3] A. Moshovos, S. Breach, T. Vijaykumar and G. Sohi. “Dynamic Speculation and Synchronization of Data Dependences”. In Int’l Symp. on Computer Architecture, 1997, pp. 181-193.
  • [4] G. Chrysos and J. Emer. “Memory Dependence Prediction using Store Sets”. In Int’l Symp. on Computer Architecture, 1998, pp. 142-153.
  • [5] S. Subramaniam and G. Loh. “Store Vectors for Scalable Memory Dependence Prediction and Scheduling”. In Int’l Symp. on High-Performance Computer Architecture, 2006, pp. 65-76.
  • [6] M. Goshima, K. Nishino, Y. Nakashima, S. Mori, T. Kitamura and S. Tomita. “A High-Speed Dynamic Instruction Scheduling Scheme for Superscalar Processors”. In Int’l Symp. on Microarchitecture, 2001, pp. 225-236.
  • [7] C. Fang, S. Carr, S. Onder and Z. Wang. “Feedback-Directed Memory Disambiguation through Store Distance Analysis”. In Int’l Conference on Supercomputing, 2006, pp. 278-287.
  • [8] S. Sethumadhavan, R. Desikan, D. Burger, C. R. Moore, S. W. Keckler. “Scalable Hardware Memory Disambiguation for High ILP Processors”. In Int’l Symp. on Microarchitecture, 2003, pp. 399-410.
  • [9] B. Bloom, “Space/Time Trade-offs in Hash Coding with Allowable Errors”, Communications of the ACM, Vol. 13, No. 7, 1970, pp. 422-426.
  • [10] I. Park, C. L. Ooi, T. N. Vijaykumar. “Reducing Design Complexity of the Load-Store Queue”. In Int’l Symp. on Microarchitecture, 2003, pp. 411-422.
  • [11] T. Sha, M. M. K. Martin, A. Roth. “Scalable Store–Load Forwarding via Store Queue Index Prediction”. In Int’l Symp. on Microarchitecture, 2005, pp. 159-170.
  • [12] L. Baugh and C. Zilles, “Decomposing the Load-Store Queue by Function for Power Reduction and Scalability”, IBM Journal of Research and Development, Vol. 50, No. 2-3, 2006, pp. 287-298.
  • [13] A. Roth. “A High-Bandwidth Load-Store Unit for Single- and Multi-Threaded Processors”. Technical report (CIS), Department of Computer and Information Science, University of Pennsylvania, 2004.
  • [14] S. S. Stone, K. M. Woley and M. I. Frank. “Address-Indexed Memory Disambiguation and Store-to-Load Forwarding”. In Int’l Symp. on Microarchitecture, 2005, pp. 171-182.
  • [15] H. Akkary, R. Rajwar and S. Srinivasan. “Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors”. In Int’l Symp. on Microarchitecture, 2003, pp. 423-434.
  • [16] E. Torres, P. Ibañez, V. Viñals and J. Llaberia. “Store Buffer Design in First-Level Multibanked Data Caches”. In Int’l Symp. on Computer Architecture, 2005, pp. 469-480.
  • [17] S. Sethumadhavan, F. Roesner, J. S. Emer, D. Burger and S. W. Keckler. “Late-Binding: Enabling Unordered Load-Store Queues”. In Int’l Symp. on Computer Architecture, 2007, pp. 347-357.
  • [18] H. W. Cain and M. H. Lipasti. “Memory Ordering: a Value-Based Approach”. In Int’l Symp. on Computer Architecture, 2004, pp. 90-101.
  • [19] A. Roth. “Store Vulnerability Window (SVW): Re-Execution Filtering for Enhanced Load Optimization”. In Int’l Symp. on Computer Architecture, 2005, pp. 458-468.
  • [20] S. Subramaniam and G. Loh. “Fire-and-Forget: Load-Store Scheduling with no Store Queue”. In Int’l Symp. on Microarchitecture, 2006, pp. 273-284.
  • [21] F. Castro, D. Chaver, L. Piñuel, M. Prieto, M. Huang and F. Tirado. “Load-Store Queue Management: an Energy-Efficient Design Based on a State-Filtering Mechanism”. In Int’l Conference on Computer Design, 2005, pp. 617-624.
  • [22] A. Garg, F. Castro, M. Huang, L. Piñuel, D. Chaver and M. Prieto. “Substituting Associative Load Queue with Simple Hash Table in Out-of-Order Microprocessors”. In Int’l Symp. on Low-Power Electronics, 2006, pp. 268-273.
  • [23] F. Castro, L. Piñuel, D. Chaver, M. Prieto, M. Huang and F. Tirado. “DMDC: Delayed Memory Dependence Checking through Age-Based Filtering”. In Int’l Symp. on Microarchitecture, 2006, pp. 297-308.
  • [24] F. Castro, R. Noor, A. Garg, D. Chaver, M. Huang, L. Piñuel, M. Prieto and F. Tirado. “Replacing Associative Load Queues: a Timing-Centric Approach”. To appear in IEEE Transactions on Computers, 2008.