Registor: A platform for unstructured data processing inside SSD storage
Document Type
Conference Proceeding
Date of Original Version
6-4-2018
Abstract
This paper presents REGISTOR, a platform for regular expression grabbing inside storage. The main idea of Registor is to accelerate regular expression (regex) search inside the storage device where large data sets are stored, eliminating the I/O bottleneck. A special hardware engine for regex search is designed and integrated into a flash SSD, where it processes data on the fly during transmission from NAND flash to the host. To make the speed of regex search match the internal bus speed of a modern SSD, the Registor hardware uses a deep pipeline consisting of a file semantics extractor, a matching candidates finder, regex matching units (REMUs), and a results organizer. Furthermore, each stage of the pipeline exploits as much parallelism as possible. To make Registor readily usable by high-level applications, we have developed a set of APIs and libraries in Linux that allow Registor to process files in the SSD by efficiently recombining separate data blocks into files. A working prototype of Registor has been built in our newly designed NVMe SSD. Extensive experiments and analyses show that Registor achieves high throughput and reduces the I/O bandwidth requirement by up to 97% and CPU utilization by as much as 82% for regex search in large data sets.
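The four pipeline stages named in the abstract can be illustrated with a small software analogy. The sketch below is not Registor's hardware design or its Linux API; every function name is hypothetical, and the use of a literal substring as the candidate prefilter is an assumption made only for illustration. It shows the general idea of streaming blocks through record reassembly, a cheap candidate filter, full regex matching, and result collection.

```python
import re
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[int, bytes]  # (1-based line number, raw line bytes)


def extract_records(blocks: Iterable[bytes]) -> Iterator[Record]:
    """Stage 1 (analogy to the file semantics extractor):
    rebuild newline-delimited records from a stream of raw blocks,
    including records that span a block boundary."""
    pending = b""
    line_no = 0
    for block in blocks:
        pending += block
        # Everything before the last newline is complete; the tail is carried over.
        *complete, pending = pending.split(b"\n")
        for record in complete:
            line_no += 1
            yield line_no, record
    if pending:
        line_no += 1
        yield line_no, pending


def find_candidates(records: Iterable[Record], literal: bytes) -> Iterator[Record]:
    """Stage 2 (analogy to the matching candidates finder):
    keep only records containing a cheap-to-test literal (an assumed prefilter)."""
    for line_no, record in records:
        if literal in record:
            yield line_no, record


def match_records(candidates: Iterable[Record], pattern: "re.Pattern[bytes]") -> Iterator[Record]:
    """Stage 3 (analogy to the regex matching units):
    run the full regex only on the surviving candidates."""
    for line_no, record in candidates:
        found = pattern.search(record)
        if found:
            yield line_no, found.group(0)


def organize_results(matches: Iterable[Record]) -> List[Tuple[int, str]]:
    """Stage 4 (analogy to the results organizer):
    collect matches into a compact list to hand back to the caller."""
    return [(line_no, text.decode("utf-8", "replace")) for line_no, text in matches]


def regex_pipeline(blocks: Iterable[bytes], literal: bytes, regex: bytes) -> List[Tuple[int, str]]:
    """Chain the four stages; generators keep the data streaming block by block."""
    pattern = re.compile(regex)
    records = extract_records(blocks)
    candidates = find_candidates(records, literal)
    return organize_results(match_records(candidates, pattern))


if __name__ == "__main__":
    # Two blocks; the second log line is split across the block boundary.
    blocks = [b"GET /a 200\nPOST /lo", b"gin 500\nGET /b 404"]
    for line_no, text in regex_pipeline(blocks, b"GET", rb"GET /\w+ \d{3}"):
        print(line_no, text)
```

Only the matched lines leave the pipeline, which mirrors, in software terms, why filtering close to the data can reduce the volume transferred to the host.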
Publication Title, e.g., Journal
SYSTOR 2018 - Proceedings of the 11th ACM International Systems and Storage Conference
Citation/Publisher Attribution
Pei, Shuyi, Jing Yang, and Qing Yang. "Registor: A platform for unstructured data processing inside SSD storage." SYSTOR 2018 - Proceedings of the 11th ACM International Systems and Storage Conference (2018): 13-25. doi: 10.1145/3211890.3211900.