Energy efficient cache architectures for single, multi and many core processors
Thucanakkenpalayam Sundararajan, Karthik
Sundararajan, Karthik T.
MetadataShow full item record
With each technology generation we get more transistors per chip. Whilst processor frequencies have increased over the past few decades, memory speeds have not kept pace. Therefore, more and more transistors are devoted to on-chip caches to reduce latency to data and help achieve high performance. On-chip caches consume a significant fraction of the processor energy budget but need to deliver high performance. Therefore cache resources should be optimized to meet the requirements of the running applications. Fixed configuration caches are designed to deliver low average memory access times across a wide range of potential applications. However, this can lead to excessive energy consumption for applications that do not require the full capacity or associativity of the cache at all times. Furthermore, in systems where the clock period is constrained by the access times of level-1 caches, the clock frequency for all applications is effectively limited by the cache requirements of the most demanding phase within the most demanding application. This motivates the need for dynamic adaptation of cache configurations in order to optimize performance while minimizing energy consumption, on a per-application basis. First, this thesis proposes an energy-efficient cache architecture for a single core system, along with a run-time support framework for dynamic adaptation of cache size and associativity through the use of machine learning. The machine learning model, which is trained offline, profiles the application’s cache usage and then reconfigures the cache according to the program’s requirement. The proposed cache architecture has, on average, 18% better energy-delay product than the prior state-of-the-art cache architectures proposed in the literature. Next, this thesis proposes cooperative partitioning, an energy-efficient cache partitioning scheme for multi-core systems that share the Last Level Cache (LLC), with a core to LLC cache way ratio of 1:4. The proposed cache partitioning scheme uses small auxiliary tags to capture each core’s cache requirements, and partitions the LLC according to the individual cores cache requirement. The proposed partitioning uses a way-aligned scheme that helps in the reduction of both dynamic and static energy. This scheme, on an average offers 70% and 30% reduction in dynamic and static energy respectively, while maintaining high performance on par with state-of-the-art cache partitioning schemes. Finally, when Last Level Cache (LLC) ways are equal to or less than the number of cores present in many-core systems, cooperative partitioning cannot be used for partitioning the LLC. This thesis proposes a region aware cache partitioning scheme as an energy-efficient approach for many core systems that share the LLC, with a core to LLC way ratio of 1:2 and 1:1. The proposed partitioning, on an average offers 68% and 33% reduction in dynamic and static energy respectively, while again maintaining high performance on par with state-of-the-art LLC cache management techniques.