Major

Computer Science

Minor(s)

Mathematics

Advisor

Noah M. Daniels

Advisor Department

Computer Science and Statistics

Date

12-2020

Keywords

Phonetic; Algorithms; Performance

Abstract

AARON SCHNEIDEREIT (Computer Science BS) Phonetic Algorithm Performance Sponsors: Noah Daniels (Computer Science and Statistics)

Phonetic algorithms classify words based on their pronunciation. They are used in many text-to-speech technologies and spell-checkers to ensure that a word can be correctly recognized despite minor spelling or pronunciation errors. The process of encoding a word to its phonetic code is known as phonetic matching. Only a handful of phonetic algorithms have been created since 1918. The three main algorithms from which other phonetic algorithms are built are Soundex, the New York State Identification and Intelligence System (NYSIIS), and Metaphone. This research explores how these algorithms perform at phonetic matching under specified test cases: homophones, swapping of specified vowels, and swapping of characters. The purpose of these tests is to understand which algorithm performs best when words differ slightly from their expected form. The tests simulate mispronunciation of words, common spelling errors, and characters that may have been removed due to background noise. The results indicate that no algorithm performs perfectly under these test cases; however, some algorithms perform better than others in specific circumstances. Soundex performed perfectly under the swapped-vowel tests, while Metaphone and NYSIIS produced results similar to each other and underperformed Soundex. However, Metaphone performed very well when matching homophones, marginally outperforming NYSIIS and Soundex. These results display the strengths and weaknesses of these algorithms and provide some insight into how certain types of words should be approached when searching for their phonetic matches.
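To make the idea of a phonetic code concrete, the following is a minimal sketch of the commonly described American Soundex rules, written in Python. It is an illustration only and is not the implementation evaluated in this research; the function name `soundex` and the table `CODES` are chosen here for the example. Because Soundex discards vowels after the first letter, swapping a non-initial vowel leaves the code unchanged, which is one plausible reading of the swapped-vowel result reported above.

```python
# Illustrative sketch of American Soundex (assumed rule set, not the study's code).
# Consonant groups map to digits 1-6; vowels and y are dropped but separate
# repeated codes; h and w are dropped and do not separate repeated codes.
CODES = {}
for digit, group in enumerate(("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1):
    for ch in group:
        CODES[ch] = str(digit)


def soundex(word: str) -> str:
    """Return the four-character Soundex code for word, or '' if it has no letters."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""

    first = letters[0]
    result = first.upper()
    prev = CODES.get(first, "")  # code of the previously seen letter

    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w are ignored and keep the previous code active
        code = CODES.get(ch, "")  # vowels and y yield ""
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated consonant code counts again
        if len(result) == 4:
            break

    return result.ljust(4, "0")  # pad short codes with zeros


if __name__ == "__main__":
    for w in ("Robert", "Rabert", "Rupert", "knight", "night"):
        print(w, soundex(w))
```

In this sketch, "Robert", "Rabert", and "Rupert" all encode to R163, while the homophones "knight" and "night" receive different codes because Soundex always keeps the first letter.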
