Design and performance evaluation of cache memories for high-performance computers
Cache memory is an important level of the memory hierarchy, and its performance and implementation cost have dramatic effects on overall system performance. This dissertation presents studies on the design and the performance/cost evaluation of cache memories for modern high-performance computers.
Through extensive simulations on widely accepted benchmarks, we find that cache line conflicts caused by the poor address sequentiality of vector references dominate vector cache misses. Vector caches can benefit greatly from a prime-mapped cache and a limited amount of stride-directed prefetching on each cache miss. A properly designed prime-mapped cache can double or even triple the performance of existing vector computers, measured by overall execution time. We conclude that cache memory can improve the performance of vector processing and is a cost-effective enhancement towards a smooth memory hierarchy for vector computers.
Implementation cost is a vital factor for on-chip cache memory performance. Taking advantage of the address locality that exists in the majority of program executions, we add a new dimension to cache design by introducing an additional caching level in the tag area to reduce the tag area overhead. We call this new design the Caching-Address-Tags (CAT) cache. Our performance results show that by keeping only a limited number of distinct tags of cached data, rather than as many tags as cache lines, the CAT cache can reduce the cost of implementing tag memory by an order of magnitude with no noticeable performance difference from ordinary caches.
A CAT-skewed cache (CATS cache for short) is proposed in this research to improve data area utilization in the CAT organization by evenly distributing cached data in an on-chip cache memory. Compared to the conventional cache and the CAT cache, the CATS cache improves the cache miss ratio by 10% to 60%.
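The tag-sharing idea behind the CAT cache can be illustrated with some simple storage arithmetic. The sketch below is not the dissertation's design: it assumes, purely for illustration, that each cache line stores a short pointer into a small table of distinct tags, and every parameter value (256 lines, 19-bit tags, 16 table entries) is hypothetical. The resulting figures depend entirely on those assumed values and do not reproduce the results reported in the dissertation.

```python
def conventional_tag_bits(num_lines: int, tag_bits: int) -> int:
    """Tag storage when every cache line keeps its own full tag."""
    return num_lines * tag_bits


def shared_tag_bits(num_lines: int, tag_bits: int, num_tag_entries: int) -> int:
    """Tag storage under the ASSUMED pointer scheme: a small table of
    distinct tags plus one pointer per cache line into that table."""
    # Bits needed to address any entry of the tag table.
    pointer_bits = (num_tag_entries - 1).bit_length()
    return num_tag_entries * tag_bits + num_lines * pointer_bits


# Hypothetical parameters, chosen only to make the arithmetic concrete.
NUM_LINES = 256
TAG_BITS = 19
NUM_TAG_ENTRIES = 16

full = conventional_tag_bits(NUM_LINES, TAG_BITS)
shared = shared_tag_bits(NUM_LINES, TAG_BITS, NUM_TAG_ENTRIES)
print(f"one tag per line: {full} bits")    # 4864 bits
print(f"shared tag table: {shared} bits")  # 1328 bits
```

The saving grows as the tag gets wider relative to the pointer and as more lines share each tag, which is where the address locality mentioned above matters.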
The cache coherence issues of the CATS cache have also been investigated for cache-coherent multiprocessors. Two efficient designs, based on a directory coherence protocol, are presented for cache-based multiprocessors using CATS caches. Simulation results show that the CATS cache can be applied to cache-coherent multiprocessors with advantages similar to those for single-chip processors. Furthermore, better performance is observed if each coherence directory itself employs the CAT organization.
An execution-driven simulation method for both uniprocessor and multiprocessor systems is developed on the IBM 3090VF architecture in this research. The performance results reported in this dissertation are obtained using this simulation method.
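The line-conflict problem that the prime-mapped cache addresses can be shown with a toy simulation. This is a minimal sketch under stated assumptions, not the dissertation's simulator: it models a direct-mapped cache with one-word lines, indexes it with a plain modulo, and uses hypothetical sizes (8192 lines versus the Mersenne prime 8191) and a hypothetical vector of 64 elements accessed twice with stride 512.

```python
def count_misses(num_lines: int, stride: int, num_elements: int, passes: int = 2) -> int:
    """Count misses for repeated strided sweeps over a vector in a
    direct-mapped cache with one-word lines (a deliberately simple model)."""
    cache = [None] * num_lines  # cache[i] holds the address currently stored in line i
    misses = 0
    for _ in range(passes):
        for i in range(num_elements):
            addr = i * stride
            index = addr % num_lines  # line selection by modulo
            if cache[index] != addr:
                misses += 1
                cache[index] = addr
    return misses


STRIDE = 512        # hypothetical power-of-two vector stride
NUM_ELEMENTS = 64   # hypothetical vector length

# Power-of-two line count: the 64 elements collide on only 16 lines.
print(count_misses(8192, STRIDE, NUM_ELEMENTS))  # 128
# Prime line count: every element gets its own line, so the second pass hits.
print(count_misses(8191, STRIDE, NUM_ELEMENTS))  # 64
```

With 8192 lines, a stride of 512 can reach only 8192 / 512 = 16 distinct lines, so the second sweep misses on every access. With 8191 lines the stride shares no common factor with the line count, so only the first sweep's compulsory misses remain.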
Engineering, Electronics and Electrical|Computer Science
Dissertations and Master's Theses (Campus Access).