Date of Award
Master of Science in Electrical Engineering (MSEE)
Electrical, Computer, and Biomedical Engineering
Using two full applications with different characteristics, this thesis explores the performance and energy efficiency of CUDA-enabled GPUs and multi-core SIMD CPUs. Our implementations efficiently exploit both SIMD and thread-level parallelism on multi-core CPUs and the computational capabilities of CUDA-enabled GPUs. We discuss general optimization techniques and present a cost comparison for our CPU-only and CPU-GPU platforms. Finally, we present an evaluation of the implementation effort required to efficiently utilize multi-core SIMD CPUs and CUDA-enabled GPUs. One of the applications, seam carving, has been widely used for content-aware resizing of images and videos with little to no perceptible distortion. The gradient kernel, which accounts for the largest share of the seam carving execution time, was optimized and achieves a speedup of over 102x on the GPU. The overall resizing operation achieves a 32x speedup on a multi-core SIMD CPU. The time to resize one minute of 1920x1080 video with seam carving was reduced from 6 hours to 17 minutes on a heterogeneous CPU-GPU system. The second application, numerical simulation of cardiac action potential propagation (CAPPS), is a valuable tool for understanding the mechanisms that promote arrhythmias that may degenerate into spiral wave propagation. Our implementation of CAPPS reduces the simulation time from 10 days (single-core implementation) to approximately 4 hours and 8 minutes. This is 54% faster than the execution time of CAPPS on a 60-core CPU-only cluster using MPI. Moreover, our implementation is 18.4x more energy-efficient than the 60-core cluster implementation.
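The two end-to-end timing reductions quoted in the abstract can be restated as speedup factors. The short sketch below is only illustrative arithmetic on the figures given above; the helper name `speedup` is a hypothetical convenience, not something taken from the thesis, and "approximately 4 hours and 8 minutes" is treated as exactly 248 minutes.

```python
def speedup(baseline_minutes: float, improved_minutes: float) -> float:
    """Return how many times faster the improved run is than the baseline."""
    return baseline_minutes / improved_minutes


# Seam carving: one minute of 1920x1080 video, 6 hours down to 17 minutes.
seam_carving = speedup(6 * 60, 17)

# CAPPS: 10 days (single core) down to about 4 hours and 8 minutes.
# Assumption: the approximate figure is taken as exactly 248 minutes.
capps = speedup(10 * 24 * 60, 4 * 60 + 8)

print(f"seam carving video resize: {seam_carving:.1f}x")  # 21.2x
print(f"CAPPS simulation: {capps:.1f}x")                  # 58.1x
```

Under these assumptions, the heterogeneous system is roughly 21x faster for the video resize and roughly 58x faster for the CAPPS simulation than the respective baselines.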
Duarte, Ronald, "Improving Performance of Data-Parallel Applications on CPU-GPU Heterogeneous Systems" (2013). Open Access Master's Theses. Paper 48.