Processing in Storage, the next Generation of Storage System

Dongyang Li, University of Rhode Island


In conventional computation models, software relies on the CPU to run applications and to assign computation tasks to heterogeneous accelerators such as GPUs, TPUs, and FPGAs. The CPU must first fetch raw data from the storage device and then move it to the accelerators. After the accelerators finish the computation, the results are written back to the main memory of the host server for use by software applications. This architecture has one major disadvantage: the accelerators are located far from the raw data storage. Data must move across a system bus such as NVMe/PCIe, which costs considerable transmission time and bus bandwidth. As data travels back and forth on the storage data bus, overall performance decreases. This dissertation presents an in-storage processing (ISP) architecture that offloads computation tasks into the storage device. The proposed ISP architecture eliminates the back-and-forth data movement on the system bus. It delivers only the computation results to the host main memory, which saves storage bus bandwidth. The ISP uses an FPGA as its data processing unit to process computation tasks in real time. Because of the parallel and pipelined architecture of the FPGA implementation, the ISP architecture can process data with short latency and has minimal effect on the data flow of the original storage system. In this dissertation, we propose four ISP applications. The first is Hardware Object Deserialization in SSD (HODS), which is tailored to high-speed data conversion inside the storage device. HODS shows a visible difference in application execution time compared with software object deserialization when running MATLAB, 3D modeling, and scientific computations. The second ISP application is CISC: Coordinating Intelligent SSD and CPU. It speeds up Minimum Spanning Tree (MST) applications in graph processing. 
The CISC coordinates the computing power inside SSD storage with the host CPU cores. It achieves a 35% speedup over the traditional software MST implementation. The third application speeds up data fingerprint computation inside the storage device. By pipelining multiple data computation units, the proposed architecture can compute the Rabin fingerprint at the wire speed of the storage data bus. The scheme is extensible to other types of fingerprints and to CRC computations, and is readily applicable to primary storage and to caches in hybrid storage systems. The fourth application is storage data deduplication. It eliminates duplicate data inside storage and provides at least a 6x speedup in throughput or latency over software data deduplication running on state-of-the-art servers. The four ISP applications serve as a proof of concept for computational storage. In the future, more computation-intensive tasks can be offloaded into the storage device instead of being processed on the CPU or on heterogeneous accelerators (GPU, TPU, FPGA). From our previous work, we can infer that ISP computational storage can be the next generation of storage architecture.
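
The fingerprint and deduplication stages named in the abstract can be illustrated in software. The following is a minimal Python sketch under stated assumptions, not the dissertation's FPGA design: the function names `rabin_fingerprint` and `deduplicate` are hypothetical, the loop is bit-serial where hardware would use pipelined parallel units, and the 8-bit modulus 0x11B (x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)) is chosen only to keep the example small. A practical system would use a much higher-degree polynomial.

```python
def rabin_fingerprint(data: bytes, poly: int = 0x11B) -> int:
    """Treat `data` as a polynomial over GF(2) and reduce it modulo `poly`."""
    degree = poly.bit_length() - 1
    fp = 0
    for byte in data:
        for bit in range(7, -1, -1):
            # Shift in the next message bit (multiply by x, add the bit).
            fp = (fp << 1) | ((byte >> bit) & 1)
            # If the degree reached that of the modulus, subtract (XOR) it.
            if fp >> degree:
                fp ^= poly
    return fp


def deduplicate(chunks):
    """Store each distinct chunk once; return the stored chunks and a recipe.

    The recipe lists, for every input chunk, the position of its stored copy.
    """
    unique = []   # stored chunks in first-seen order
    index = {}    # fingerprint -> positions in `unique` sharing that fingerprint
    recipe = []
    for chunk in chunks:
        fp = rabin_fingerprint(chunk)
        for pos in index.get(fp, []):
            # Fingerprints can collide, so confirm with a byte comparison.
            if unique[pos] == chunk:
                recipe.append(pos)
                break
        else:
            index.setdefault(fp, []).append(len(unique))
            recipe.append(len(unique))
            unique.append(chunk)
    return unique, recipe
```

For example, `deduplicate([b"abcd", b"efgh", b"abcd"])` stores two chunks and records that the third input reuses the first. The fingerprint serves only as a fast filter here; the byte comparison keeps the result correct even when two different chunks share a fingerprint.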

Subject Area

Electrical engineering

Recommended Citation

Dongyang Li, "Processing in Storage, the next Generation of Storage System" (2019). Dissertations and Master's Theses (Campus Access). Paper AAI13807380.