Date of Award

2016

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Engineering

First Advisor

Qing Yang

Abstract

Yearly gains in computer performance have diminished in recent years, largely because transistors, the building blocks of computers, no longer deliver the rate of improvement seen in the 1980s and 1990s. Accelerator architectures, which depart from traditional CPU design, have been shown to offer a potentially untapped solution. These architectures implement custom hardware to speed up specific tasks, such as graphics processing. The studies undertaken for this dissertation examine the ability of custom accelerator hardware to provide improved power and speed performance over traditional approaches, with an emphasis on classification tasks.

In the first study, the Lempel-Ziv-Oberhumer (LZO) 1x-1-15 compression algorithm is analyzed and documented. This algorithm family has seen widespread use and can be found in the NASA Mars rover and the B-tree Linux file system (Btrfs). A thorough analysis of the algorithm yields x86 vector and other CPU parallelization improvements that can be used for acceleration. Real-world datasets are used to benchmark the improved performance.

The second study shifts the focus from CPU instruction acceleration to optimized hardware acceleration. A real-world embedded application of machine learning involving Support Vector Machine (SVM) accelerated hardware is examined. Prior work by URI’s Biomedical Engineering department investigated the use of a state-of-the-art SVM-based algorithm to control an artificial limb in real time. That algorithm was evaluated on general-purpose processors: a Core i7 CPU and an Intel Atom mobile CPU. This study builds on the prior work, investigating the performance advantages gained by implementing the SVM decision function in hardware and combining it with a hardware-based feature extractor on a Field Programmable Gate Array (FPGA). The design is evaluated for both accuracy and real-time response to determine whether the FPGA implementation is a better choice for a power-limited cyber-physical system.

The third study examines in further detail the SVM classification portion of the FPGA design that was constructed for the artificial limb. A general-purpose hardware architecture for fast, accurate SVM classification, R2SVM, is proposed. While several similar architectures have been published, our architecture is shown to be superior in several ways. To demonstrate the performance, accuracy, and power consumption of the architecture, a prototype is constructed and multiple machine learning datasets are run and analyzed.

The final study examines the creation of a smart city architecture. A novel multi-tiered hierarchical architecture, Reflex Tree, is proposed as a solution for automated city management in the future. The four layers of the architecture perform massively parallel sensing, pattern recognition, spatial-temporal association, and system-wide behavioral analysis. Like the human nervous system, each layer in the hierarchy is able to detect specific events and inject feedback without the need for higher-level intervention. The architecture is simulated in two scenarios: a gas pipeline and a city power supply network.
