Date of Original Version
Electrical, Computer and Biomedical Engineering
Lookup operations for in-memory databases are heavily memory bound because they often rely on pointer-chasing traversals of linked data structures. They also contain many hard-to-predict branches, because key lookups are random. In this study, we show that although cache misses are the primary bottleneck for these applications, prefetching alone achieves only a small fraction of the potential performance benefit unless branch mispredictions are also eliminated. We propose the Node Tracker (NT), a novel programmable prefetcher/pre-execution unit that is highly effective in exploiting inter key-lookup parallelism to improve single-thread performance. We extend NT with branch outcome streaming (BOS) to reduce branch mispredictions and show that this achieves an extra 3× speedup. We then evaluate NT as a pre-execution unit and demonstrate that it further improves performance in both single- and multi-threaded execution modes. Our results show that, on average, NT improves single-thread performance by 4.1× when used as a prefetcher, 11.9× as a prefetcher with BOS, 14.9× as a pre-execution unit, and 18.8× as a pre-execution unit with BOS. Finally, with 24 cores of the latter version, we achieve speedups of 203× and 11× over the single-core and 24-core baselines, respectively.
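The access pattern the abstract describes can be illustrated with a minimal sketch. It assumes a chained hash table as the linked data structure. The names `Node`, `ChainedTable`, and `lookup` are hypothetical and are not taken from the paper, and the code models only the software lookup loop, not the Node Tracker hardware. Each step of the loop has to read the current node before the address of the next one is known (pointer chasing), and the key comparison is a branch whose outcome depends on the data just loaded.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    key: int
    value: str
    next: Optional["Node"] = None  # link to the next node in the same bucket


class ChainedTable:
    """Hypothetical chained hash table, used only to model the access pattern."""

    def __init__(self, num_buckets: int) -> None:
        self.buckets: List[Optional[Node]] = [None] * num_buckets

    def insert(self, key: int, value: str) -> None:
        idx = key % len(self.buckets)
        # Insert at the head of the bucket's chain.
        self.buckets[idx] = Node(key, value, self.buckets[idx])

    def lookup(self, key: int) -> Tuple[Optional[str], int]:
        """Return (value or None, number of nodes visited)."""
        node = self.buckets[key % len(self.buckets)]
        visited = 0
        while node is not None:
            visited += 1
            if node.key == key:  # data-dependent branch on the loaded key
                return node.value, visited
            node = node.next     # dependent load: next address comes from this node
        return None, visited


# Twelve keys spread over four buckets give chains of three nodes each.
table = ChainedTable(num_buckets=4)
for k in range(12):
    table.insert(k, f"v{k}")

hit = table.lookup(1)    # key 1 sits at the tail of its chain (9 -> 5 -> 1)
miss = table.lookup(12)  # key 12 is absent, so the whole chain is walked
```

In this sketch, lookups for different keys walk unrelated chains and share no data dependences. That independence is the kind of inter key-lookup parallelism the abstract refers to.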
ACM Transactions on Architecture and Code Optimization
Cavus, M., Shatnawi, M., Sendag, R., & Uht, A. K. (2021). Fast Key-Value Lookups with Node Tracker. ACM Transactions on Architecture and Code Optimization, 18(3), 34. https://doi.org/10.1145/3452099
Available at: https://doi.org/10.1145/3452099
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.