Performance of Cache Memories for Vector Computers
Recent studies have shown that the memory system is the major bottleneck of vector computers. Performance measurements on CRAY-2 computers indicate that memory latency can degrade performance by as much as a factor of 4.4. In this paper, we study the performance effect of incorporating cache memories into vector computers. We consider two cache organizations that we proposed: the direct-mapped cache and the prime-mapped cache. We analyze the caching behavior of three typical blocked algorithms for numerical applications: matrix multiplication, Gaussian elimination, and FFT. By analyzing the algorithm structures in conjunction with the system architectures, we develop analytical models based on real applications rather than on statistical estimates. Our performance models give the expected execution time of an algorithm, averaged over a wide range of problem sizes. We validate the analysis with performance measurements of the algorithms on a real machine. Numerical results show that the prime-mapped cache minimizes the cache miss ratio caused by line interferences, which are critical for numerical applications. © 1993 Academic Press. All rights reserved.
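The abstract does not spell out how the two organizations map addresses to cache lines. As a rough illustration of line interference under strided vector access, the sketch below assumes that a direct-mapped cache indexes by address modulo a power-of-two number of lines, and that a prime-mapped cache indexes by address modulo a prime number of lines. The sizes (8192 and 8191 lines), the strides, and the helper name `lines_touched` are all hypothetical choices for this example and are not taken from the paper. Addresses are counted in units of cache lines.

```python
def lines_touched(num_lines, stride, accesses):
    """Count the distinct cache lines used by a strided sweep.

    Assumption: the cache index is simply (line address) mod num_lines.
    """
    return len({(i * stride) % num_lines for i in range(accesses)})


# Hypothetical sizes chosen for illustration only.
DIRECT_LINES = 2 ** 13      # 8192 lines, power-of-two indexing
PRIME_LINES = 2 ** 13 - 1   # 8191 lines, a prime number
VECTOR_LENGTH = 1024        # number of strided accesses in one sweep

for stride in (1, 64, 512, 8192):
    direct = lines_touched(DIRECT_LINES, stride, VECTOR_LENGTH)
    prime = lines_touched(PRIME_LINES, stride, VECTOR_LENGTH)
    print(f"stride {stride:5d}: direct uses {direct:4d} lines, "
          f"prime uses {prime:4d} lines")
```

Under these assumptions, power-of-two strides fold the sweep onto a small set of lines in the power-of-two cache (16 lines at stride 512, a single line at stride 8192), so elements of the same vector evict one another. With a prime number of lines, each of these strides still spreads the 1024 accesses over 1024 distinct lines.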
Journal of Parallel and Distributed Computing
Yang, Qing. "Performance of Cache Memories for Vector Computers." Journal of Parallel and Distributed Computing 19, 3 (1993): 163-178. doi:10.1006/jpdc.1993.1102.