Doctor of Philosophy in Computer Science
Computer Science and Statistics
This dissertation presents the culmination of six years of research into developing parallel and stochastic implementations of the University of Rhode Island (URI) Computer Science Department's Vectorized Self-Organizing Map (VSOM) algorithm. The Parallel VSOM (Par-VSOM) and High-Level Synthesis VSOM (HLS-VSOM) algorithms are inspired by ideas from tensor algebra and are implemented using parallel kernels and vectorization on modern hardware accelerators.
The quality of the maps generated by the algorithm is of significant importance, since higher-quality maps provide the in-depth knowledge that allows researchers to identify clusters of information within their datasets. Equally important is developing a more efficient and scalable parallel solution that can execute on newer hardware accelerator architectures. The URI Computer Science Department addressed part of these challenges with its Vectorized Self-Organizing Map Central Processing Unit (CPU) solution. The VSOM CPU solution was the first vectorized SOM algorithm with sufficient processing throughput to run 60 times faster than Kohonen's iterative algorithm. In addition, the VSOM produced maps whose quality matched Kohonen's SOM and exceeded the quality of the maps produced by the BatchSOM.
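To make the vectorization idea concrete, the following is a minimal NumPy sketch of one vectorized SOM training step in the spirit described above; it is an illustration only, not the dissertation's actual implementation, and the function name and parameters (`vsom_step`, `lr`, `sigma`, `grid`) are hypothetical.

```python
import numpy as np

def vsom_step(weights, x, lr, sigma, grid):
    """One vectorized SOM training step (illustrative sketch).

    weights: (n_neurons, dim) codebook matrix
    x:       (dim,) training vector
    lr:      learning rate
    sigma:   neighborhood radius
    grid:    (n_neurons, 2) neuron coordinates on the map lattice
    """
    # Vectorized BMU search: squared distances from x to every neuron at once
    diff = weights - x                                  # broadcasts over neurons
    bmu = np.argmin(np.einsum('ij,ij->i', diff, diff))  # row-wise squared norms
    # Vectorized neighborhood: Gaussian falloff from the BMU on the map grid
    d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
    h = np.exp(-d2 / (2.0 * sigma ** 2))
    # A single matrix update replaces the per-neuron loop of the iterative SOM
    weights += lr * h[:, None] * (x - weights)
    return weights
```

Expressing the BMU search and the neighborhood update as whole-matrix operations is what allows the same step to map onto SIMD units, GPU kernels, or FPGA pipelines.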
Due to the significant results achieved with the VSOM and the algorithm's inherently vectorized nature, we chose the VSOM as the starting point for our proposed algorithms. Furthermore, state-of-the-art hardware accelerators offer hardware vectorization capabilities and serve as the ideal environment for improving on the speedups previously obtained on CPUs.
The successor to the VSOM CPU-based algorithm pushes the state of the art further by providing a Graphics Processing Unit (GPU) parallel solution that has been tested in the Amazon Web Services (AWS) cloud. The GPU solution generates the same map quality as the VSOM CPU-based solution and provides scalable speedups over both the original Kohonen SOM algorithm and the VSOM CPU implementation on large maps. This scalable speedup makes the GPU solution URI's fastest for larger maps. More importantly, the GPU algorithm provides a roadmap to a higher-performance algorithm hosted on a Field-Programmable Gate Array (FPGA).
URI's successor to the GPU Par-VSOM algorithm provides an embedded accelerator architecture solution in an FPGA environment. The FPGA experimental results demonstrate that map accuracy is not sacrificed for performance. The FPGA solution provides a speedup over both the VSOM CPU and Kohonen's SOM implementations on maps and datasets with the same dimensionality constraints. In addition, the HLS-VSOM outperforms the SOM GPU variants by two or more orders of magnitude.
Two schools of thought stand out in our literature search of other groups' state-of-the-art parallel SOM solutions: network partitioning and data partitioning. The network partitioning strategy splits the map into multiple sections to obtain some level of parallelism; separate threads compute the winning-neuron updates for their respective map partitions. This yields faster execution, but splitting the map into subsections complicates analysis of the data and may lower the quality of the merged map. The data partitioning methodology is a more common approach, in which the data are distributed among individual threads for faster parallel execution. A very popular variant is the BatchSOM, which executes the best matching unit (BMU) search in parallel but is unable to preserve a consistent map quality. This approach provides a good option for parallelism but does not allow for a fully parallel solution like the Par-VSOM and HLS-VSOM variants.
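The data partitioning idea can be sketched as follows: the input vectors are split into chunks and each worker searches BMUs on its own chunk, which is the step the BatchSOM family parallelizes. This is an illustrative sketch only; the helper names (`bmu_indices`, `parallel_bmu`) and the thread-pool approach are assumptions, not the implementations discussed in the dissertation.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def bmu_indices(weights, batch):
    """Find the best matching unit for every input vector in a batch."""
    # (batch, n_neurons) squared-distance matrix in one vectorized expression
    d2 = ((batch[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def parallel_bmu(weights, data, n_workers=4):
    """Data-partitioning sketch: split the inputs across workers, each
    searching BMUs on its own chunk of the dataset."""
    chunks = np.array_split(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(lambda c: bmu_indices(weights, c), chunks)
    return np.concatenate(list(parts))
```

Note that only the BMU search is parallel here; the subsequent weight update still has to be serialized or merged, which is why this style of parallelism falls short of a fully parallel solution.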
URI's Par-VSOM and HLS-VSOM solutions have advantages over other state-of-the-art parallel SOM algorithms, the most notable being their high performance. This can be attributed to fully vectorized data structures, neighborhood caching, asynchronous memory transfers, pipelining, loop unrolling, and array partitioning. As a result, URI's parallel algorithm solutions lead the way toward highly optimized SOM implementations, providing a high-performance alternative to the classic SOM algorithm.
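Of the optimizations listed above, neighborhood caching is easy to illustrate in isolation: because the map lattice is fixed, the Gaussian neighborhood row for every possible BMU can be precomputed once per radius, turning a per-step `exp()` evaluation into a table lookup. The sketch below is an assumption about how such a cache could look, not the dissertation's actual code.

```python
import numpy as np

def cache_neighborhoods(grid, sigma):
    """Neighborhood caching sketch: precompute the Gaussian neighborhood
    row for every possible BMU on a fixed map lattice.

    grid: (n_neurons, 2) neuron coordinates
    Returns an (n_neurons, n_neurons) table H where H[bmu] is the
    neighborhood vector to apply when that neuron wins.
    """
    # All pairwise squared lattice distances, computed once up front
    d2 = ((grid[:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))
```

During training, the update step then reads `H[bmu]` instead of recomputing the exponential, a trade of memory for arithmetic that pays off on accelerators where on-chip lookups are cheap.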
Rivera Morales, Omar X., "PARALLELIZATION OF VECTORIZED SELF-ORGANIZING MAPS IN HARDWARE ACCELERATOR ARCHITECTURES" (2022). Open Access Dissertations. Paper 1393.