
Ph.D. Dissertation Defense - Vinson Young


Title: Intelligent Cache Management for 3D Memory Systems

Committee:

Dr. Moinuddin Qureshi, ECE, Chair, Advisor

Dr. Hyesoon Kim, CoC

Dr. Sudhakar Yalamanchili, ECE

Dr. Aamer Jaleel, NVIDIA

Dr. Milos Prvulovic, ECE

Abstract:

DRAM caches are important for enabling effective heterogeneous memory systems that can transparently provide the bandwidth of high-bandwidth memories (HBM), the latency of lower-latency memories (DRAM), and the capacity of high-capacity memories (DRAM/3D-XPoint). We investigate enabling intelligent cache management for DRAM caches similar to the one already implemented in Intel Knights Landing. Such DRAM caches use a direct-mapped design, co-locate the tag and data within the DRAM array, and stream out the tag and the data concurrently on an access. However, such a direct-mapped organization can suffer from low hit-rate. We can attempt to improve hit-rate and performance with traditional methods such as associativity, intelligent replacement, or prefetching. However, simply applying traditional "well-understood" cache and memory designs to stacked memory results in low bandwidth utilization, high latency, and low overall system performance. To fully utilize the potential of stacked memory, we must architect systems to exploit the unique latency and bandwidth characteristics offered by stacked DRAM. Throughout our work, we investigate and show how to enable associativity, intelligent replacement, and cache compression in a bandwidth-efficient and scalable, low-SRAM-cost manner to improve the performance of DRAM caches. (1) Associativity can help improve cache hit-rate, but we find that it cannot come at the cost of latency or bandwidth, or it risks degrading performance. Through ACCORD, we show how to scale way-prediction to giga-scale DRAM caches (by coordinating way-install and way-prediction) to achieve performance similar to an ideal 2-way associative cache with <1KB of SRAM. (2) Intelligent replacement policies, such as RRIP, can be used to improve cache hit-rate.
Through RRIP-AOB + ETR, we show how to achieve intelligent replacement in direct-mapped caches by formulating replacement policies as bypassing policies and by reducing state-update cost through coordinating replacement across sets. (3) Cache compression can also be used to improve DRAM-cache performance. Through DICE, we show how to use cache compression to achieve bandwidth-free prefetching. (4) Looking forward, we find that future hybrid memory systems containing DRAM + 3D-XPoint will be even more bandwidth-bound, as the two memories are likely to share DDR4 channels. To overcome DRAM-cache bandwidth bloat, we propose a Dual-Tag approach that enables bandwidth-efficient and scalable, low-cost DRAM cache management. Finally, we combine the proposed techniques to achieve a bandwidth-efficient and scalable cache with intelligent replacement and prefetching that enables near-ideal DRAM cache performance with only 34KB of SRAM storage in the memory controller. Such scalable, low-cost, high-performance DRAM cache controllers can make DRAM caching suitable for widespread deployment and can improve caches built from future memory technologies.
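A toy model of the Knights Landing-style organization described above, in which each set holds a single (tag, data) pair that streams out in one access, might look like the following sketch. The class name, sizes, and the always-install miss policy are illustrative assumptions for this sketch, not the dissertation's implementation:

```python
LINE_SIZE = 64   # bytes per cache line (assumed)
NUM_SETS = 1024  # toy capacity; real DRAM caches hold millions of sets

class DirectMappedDramCache:
    """Simplified direct-mapped DRAM cache with tag/data co-location."""

    def __init__(self):
        # Each set stores one (tag, data) pair, mirroring how the tag
        # and data are co-located within the same DRAM row.
        self.sets = [None] * NUM_SETS

    def access(self, addr):
        """Model one access; tag and data stream out together. Returns True on hit."""
        line = addr // LINE_SIZE
        idx = line % NUM_SETS
        tag = line // NUM_SETS
        entry = self.sets[idx]  # single DRAM access yields tag + data
        if entry is not None and entry[0] == tag:
            return True  # hit: data is already in hand, no second access needed
        # Miss: fetch from backing memory and always install. Bypass-based
        # policies (e.g., replacement formulated as bypassing, as in RRIP-AOB)
        # would sometimes skip this install instead.
        self.sets[idx] = (tag, b"\x00" * LINE_SIZE)
        return False
```

In this model, two lines that map to the same set index evict each other on every miss, which illustrates why the abstract pursues associativity via way-prediction (ACCORD) and bypass-based replacement (RRIP-AOB) to raise hit-rate without extra bandwidth.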

 
