The impact of wrong-path memory references in cache-coherent multiprocessor systems
Document Type
Article
Date of Original Version
12-1-2007
Abstract
The core of current-generation high-performance multiprocessor systems is out-of-order execution processors with aggressive branch prediction. Despite their relatively high branch prediction accuracy, these processors still execute many memory instructions down mispredicted paths. Previous work that focused on uniprocessors showed that these wrong-path (WP) memory references may pollute the caches and increase the amount of cache and memory traffic. On the positive side, however, they may prefetch data into the caches for memory references on the correct path. While computer architects have thoroughly studied the impact of WP effects in uniprocessor systems, there is no comparable work for multiprocessor systems. In this paper, we explore the effects of WP memory references on the memory system behavior of shared-memory multiprocessor (SMP) systems for both broadcast and directory-based cache coherence. Our results show that these WP memory references can increase the amount of cache-to-cache transfers by 32%, invalidations by 8% and 20% for broadcast and directory-based SMPs, respectively, and the number of writebacks by up to 67% for both systems. In addition to the extra coherence traffic, WP memory references also increase the number of cache line state transitions by 21% and 32% for broadcast and directory-based SMPs, respectively. In order to reduce the performance impact of these WP memory references, we introduce two simple mechanisms (filtering WP blocks that are not likely to be used, and WP-aware cache replacement) that yield speedups of up to 37%. © 2007 Elsevier Inc. All rights reserved.
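The abstract names WP-aware cache replacement but does not describe how it works. As a rough illustration only, the sketch below shows one plausible reading of the idea: a block fetched by a wrong-path reference stays marked until a correct-path reference touches it, and marked blocks are preferred as eviction victims. The class names, the marking rule, and the LRU fallback are all assumptions made for this example, not the mechanism evaluated in the paper.

```python
from dataclasses import dataclass


@dataclass
class Line:
    tag: int
    wrong_path: bool  # True while only wrong-path references have touched the block
    last_use: int     # logical timestamp used for LRU ordering


class WPAwareSet:
    """One cache set with a hypothetical WP-aware LRU policy (illustrative only)."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = {}  # tag -> Line
        self.clock = 0

    def access(self, tag, wrong_path):
        """Return True on a hit. A correct-path hit clears the wrong-path mark."""
        self.clock += 1
        line = self.lines.get(tag)
        if line is not None:
            line.last_use = self.clock
            if not wrong_path:
                line.wrong_path = False
            return True
        if len(self.lines) >= self.ways:
            self._evict()
        self.lines[tag] = Line(tag, wrong_path, self.clock)
        return False

    def _evict(self):
        # Assumption: evict the least recently used block that is still marked
        # wrong-path; fall back to plain LRU when no block is marked.
        marked = [l for l in self.lines.values() if l.wrong_path]
        pool = marked or list(self.lines.values())
        victim = min(pool, key=lambda l: l.last_use)
        del self.lines[victim.tag]
```

In a two-way set, accessing block 1 on the correct path, block 2 on the wrong path, and then block 3 on the correct path evicts block 2 under this sketch, whereas plain LRU would evict block 1. Whether such a policy helps depends on how often wrong-path blocks turn out to be useful prefetches, which is the trade-off the abstract describes.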
Publication Title
Journal of Parallel and Distributed Computing
Volume
67
Issue
12
Citation/Publisher Attribution
Sendag, Resit, Ayse Yilmazer, Joshua J. Yi, and Augustus K. Uht. "The impact of wrong-path memory references in cache-coherent multiprocessor systems." Journal of Parallel and Distributed Computing 67, 12 (2007): 1256-1269. doi: 10.1016/j.jpdc.2007.03.005.