New and efficient FFT algorithm for distributed memory systems

Document Type

Conference Proceeding

Date of Original Version



This paper presents a new and optimal parallel implementation of the multidimensional fast Fourier transform (FFT) algorithm on distributed memory multiprocessors. Its optimality is obtained by minimizing the number of message-passing operations required, at the cost of an increase in message length. This distinctive feature of the new algorithm effectively exploits an important architectural property of most of today's distributed memory multiprocessors: wormhole routing for interprocessor communication. Using the algebra of stride permutations and tensor products as a mathematical tool, we derive and formulate an efficient data partition and communication scheme that reduces the communication cost from the O(N^2) required by the best known FFT to O(N) on an N^2-processor machine. Our data partition scheme is natural and efficient for solving discretized boundary value problems, such as those arising in partial differential equations and finite element analysis. To evaluate the actual performance of our new algorithm in comparison with other existing parallel FFT algorithms, we have carried out implementation experiments on the Intel Touchstone Delta machine.
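The abstract does not spell out the stride-permutation and tensor-product algebra, so the following is only a minimal sketch of the standard textbook identity that this notation usually refers to, not the paper's algorithm. It assumes the Cooley-Tukey factorization F_n = (F_r (x) I_s) T (I_r (x) F_s) L for n = r*s, where L is a stride permutation and T is a diagonal twiddle matrix. The function names `dft`, `stride_permute`, and `fft_tensor` are hypothetical, chosen for illustration. In a distributed setting the stride permutation is the step that corresponds to interprocessor data movement, while the two tensor-product stages are local computation.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT; serves as the small kernel and as the reference."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def stride_permute(x, stride):
    """Stride permutation: gather x[0::stride], then x[1::stride], and so on.

    Equivalent to transposing a row-major (n/stride) x stride array.
    """
    n = len(x)
    return [x[j] for i in range(stride) for j in range(i, n, stride)]


def fft_tensor(x, r, s):
    """DFT of length n = r*s via (F_r (x) I_s) T (I_r (x) F_s) L."""
    n = r * s
    assert len(x) == n

    # L: regroup the input into r contiguous blocks of length s.
    y = stride_permute(x, r)

    # I_r (x) F_s: an independent length-s DFT on each contiguous block.
    y = [v for i in range(r) for v in dft(y[i * s:(i + 1) * s])]

    # T: twiddle factors w_n^(i*k) for block i, position k.
    y = [
        y[i * s + k] * cmath.exp(-2j * cmath.pi * i * k / n)
        for i in range(r)
        for k in range(s)
    ]

    # F_r (x) I_s: a length-r DFT across blocks, at stride s.
    out = [0j] * n
    for k in range(s):
        column = dft([y[i * s + k] for i in range(r)])
        for m in range(r):
            out[m * s + k] = column[m]
    return out


if __name__ == "__main__":
    signal = [complex(t % 5, -t) for t in range(12)]
    direct = dft(signal)
    factored = fft_tensor(signal, 3, 4)
    print(max(abs(a - b) for a, b in zip(direct, factored)))
```

Running the script prints the largest difference between the direct and factored transforms, which should be at the level of floating-point rounding. The sketch is serial; it shows only how the factorization separates local DFT stages from a single data-reordering stage.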

Publication Title, e.g., Journal

Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS