Date of Award


Degree Type


Degree Name

Doctor of Philosophy in Electrical Engineering


Electrical, Computer, and Biomedical Engineering

First Advisor

Qing Yang


In conventional computer systems, software relies on the CPU to run applications and to assign computation tasks to heterogeneous accelerators such as GPUs, TPUs, and FPGAs. The CPU must fetch data from the storage device and move it to the accelerators. After the accelerators complete their computation tasks, the results are written back to the main memory of the host server for use by software applications. In this architecture, the heterogeneous accelerators are located far from the storage device, so data must cross the system bus (NVM Express/PCI Express), which costs considerable transmission time and bus bandwidth. Moving data back and forth on the storage data bus decreases the overall performance of the storage system.

This dissertation presents an in-storage processing (ISP) architecture that offloads computation tasks into the storage device. The proposed ISP architecture eliminates the back-and-forth data movement on the system bus: it delivers only the computation results to host memory, saving storage bus bandwidth. The ISP uses an FPGA as the data processing unit to process computation tasks in real time. Owing to the parallel, pipelined architecture of the FPGA implementation, the ISP architecture processes data with low latency and has minimal effect on the data flow of the original storage system.

In this dissertation, we propose four ISP applications. The first is Hardware Object Deserialization in SSD (HODS), which is tailored to high-speed data conversion inside the storage device. HODS shows visible differences in application execution time compared to software object deserialization when running MATLAB, 3D modeling, and other scientific computations. The second is CISC (Coordinating Intelligent SSD and CPU), which speeds up Minimum Spanning Tree (MST) applications in graph processing. CISC coordinates the computing power inside the SSD with the host CPU cores and outperforms the traditional software MST by 35%. The third application speeds up data fingerprint computation inside the storage device. By pipelining multiple data computation units, the proposed architecture computes Rabin fingerprints at the wire speed of the storage data bus. The scheme is extensible to other types of fingerprint/CRC computations and is readily applicable to primary storage and to caches in hybrid storage systems. The fourth application is data deduplication. It eliminates duplicate data inside the storage device and provides at least a sixfold throughput speedup over software.
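As a rough software illustration of how fingerprints drive deduplication, the sketch below splits a byte stream into chunks, fingerprints each chunk, and stores each unique chunk only once. It is not the dissertation's FPGA design: the fixed 4096-byte chunk size, the use of SHA-256 in place of a Rabin fingerprint, and the names `deduplicate` and `restore` are all assumptions made for this example.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and keep each unique chunk once.

    Assumption: SHA-256 stands in for the Rabin fingerprint named in the
    abstract, and chunks are fixed-size rather than content-defined.
    """
    store = {}   # fingerprint -> chunk bytes (unique chunks only)
    recipe = []  # ordered fingerprints needed to rebuild the original data
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).digest()
        store.setdefault(fingerprint, chunk)  # store only the first copy
        recipe.append(fingerprint)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original byte stream from the chunk store and recipe."""
    return b"".join(store[fingerprint] for fingerprint in recipe)


if __name__ == "__main__":
    # Three identical chunks followed by one different chunk.
    sample = b"A" * 4096 * 3 + b"B" * 4096
    store, recipe = deduplicate(sample)
    print(len(recipe), "chunks referenced,", len(store), "chunks stored")
    assert restore(store, recipe) == sample
```

In this software form, every chunk still travels to the host before it is fingerprinted; the ISP approach described above moves that per-chunk computation into the storage device so that only results cross the bus.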

The ISP applications proposed in this dissertation demonstrate the concept of computational storage. In the future, more compute-intensive tasks can be deployed in the storage device instead of being processed by the CPU or heterogeneous accelerators (GPU, TPU, FPGA). The ISP is extensible to primary storage and applicable to the next generation of storage systems.