Date of Award


Degree Type


Degree Name

Master of Science in Computer Science


Computer Science and Statistics

First Advisor

Lutz Hamel


As a type of artificial neural network, Self-Organizing Maps (SOMs) have been well researched since the 1980s and have been implemented in C, Fortran, R [1], and Python [2]. Python is a high-level language that has been widely used in machine learning for years, but most SOM packages written in Python perform only model construction and visualization. The POPSOM package, written in R, goes further: it can also evaluate a model's quality with statistical methods and plot the marginal probability distributions of the neurons. To make these capabilities available to Python users, this project migrates the POPSOM package to Python. This study describes the details of that implementation.

The implementation involved three major tasks: 1) migrating the POPSOM package from R to Python; 2) refactoring the source code from the procedural programming paradigm to the object-oriented programming paradigm; and 3) improving the package by adding normalization options to the model construction function. In addition, Fortran code is embedded in the Python package to significantly accelerate model construction.
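Model construction with a normalization option can be pictured with the minimal NumPy sketch below. It is not the POPSOM training routine, and it is not the thesis's Python/Fortran code. The function names `normalize` and `train_som`, the option values `"minmax"` and `"zscore"`, and the linearly decaying Gaussian neighborhood are all assumptions chosen for illustration. The sketch does make visible the two sources of randomness in SOM training: random initialization of the weight vectors and random selection of input vectors.

```python
import numpy as np


def normalize(data, method=None):
    """Hypothetical normalization option: None, "minmax", or "zscore" (per feature/column)."""
    data = np.asarray(data, dtype=float)
    if method is None:
        return data
    if method == "minmax":
        lo, hi = data.min(axis=0), data.max(axis=0)
        span = np.where(hi > lo, hi - lo, 1.0)  # constant columns map to 0 instead of dividing by 0
        return (data - lo) / span
    if method == "zscore":
        sd = data.std(axis=0)
        sd = np.where(sd > 0, sd, 1.0)  # constant columns map to 0 instead of dividing by 0
        return (data - data.mean(axis=0)) / sd
    raise ValueError(f"unknown normalization method: {method!r}")


def train_som(data, xdim=5, ydim=5, alpha=0.3, train=1000, norm=None, seed=None):
    """Illustrative online SOM training; returns an (xdim*ydim, n_features) neuron matrix.

    Assumed schedule (not necessarily POPSOM's): learning rate and Gaussian
    neighborhood radius both decay linearly over `train` steps.
    """
    rng = np.random.default_rng(seed)
    data = normalize(data, norm)
    n_neurons, n_features = xdim * ydim, data.shape[1]

    # Random factor 1: weight vectors start at uniform random points inside the data range.
    neurons = rng.uniform(data.min(axis=0), data.max(axis=0), size=(n_neurons, n_features))

    # (x, y) grid position of every neuron, in row-major order.
    grid = np.array([(i % xdim, i // xdim) for i in range(n_neurons)], dtype=float)
    sigma0 = max(xdim, ydim) / 2.0

    for t in range(train):
        frac = t / train
        lr = alpha * (1.0 - frac)                # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood radius shrinks, floored at 0.5

        # Random factor 2: each step trains on a randomly chosen input vector.
        x = data[rng.integers(len(data))]

        # Best-matching unit: the neuron closest to x in feature space.
        bmu = np.argmin(((neurons - x) ** 2).sum(axis=1))

        # Gaussian neighborhood on the map grid, centred on the BMU.
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))

        # Pull each neuron toward x in proportion to its neighborhood weight.
        neurons += lr * h[:, None] * (x - neurons)
    return neurons
```

In this sketch, all randomness flows through `seed`. Two runs with the same seed therefore produce identical neurons, and runs with different seeds do not. R and Python also use different random number generators, so their outputs can only be compared statistically, not element by element.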

With the program complete, its correctness must be verified. The most direct way to do so is to compare the output of the Python-based program with the output of the R-based program. For the model construction function, however, the SOM algorithm initializes the neurons' weight vectors randomly and then selects input vectors at random during training. Because of these two random factors, the same input (data set) cannot be expected to produce exactly the same output (neurons). Instead, two checks were proposed and applied in this project to show that the Python program works properly: 1) measuring the average difference between the neuron vectors generated by the R and Python functions, respectively; and 2) measuring the ratio of the variances and the difference of the feature means of the two sets of neurons. Apart from model construction, the visualization functions and other functions that take neurons as input should return the same results when given the same input (neurons). The details of this verification are presented in the following chapters.
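The two numerical checks can be sketched in NumPy as follows. This is one plausible reading of the checks, not the thesis's code. The sketch assumes each model is an array of shape (number of neurons, number of features) and that the R and Python maps have the same dimensions and are compared row by row. The function names `avg_vector_difference` and `feature_stats` are hypothetical. Variances use `ddof=1` to match R's `var()`, which divides by n - 1, whereas NumPy's default divides by n.

```python
import numpy as np


def avg_vector_difference(neurons_r, neurons_py):
    """Mean Euclidean distance between corresponding rows (neurons) of two maps."""
    a = np.asarray(neurons_r, dtype=float)
    b = np.asarray(neurons_py, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both neuron matrices must have the same shape")
    return float(np.linalg.norm(a - b, axis=1).mean())


def feature_stats(neurons_r, neurons_py):
    """Per-feature variance ratio (Python / R) and mean difference (Python - R).

    A feature that is constant in the R map gives a zero denominator (inf or nan).
    """
    a = np.asarray(neurons_r, dtype=float)
    b = np.asarray(neurons_py, dtype=float)
    var_ratio = b.var(axis=0, ddof=1) / a.var(axis=0, ddof=1)  # ddof=1 matches R's var()
    mean_diff = b.mean(axis=0) - a.mean(axis=0)
    return var_ratio, mean_diff


# Tiny worked example with two 2-neuron, 2-feature "maps".
r_map = np.array([[0.0, 0.0], [2.0, 2.0]])
py_map = np.array([[0.0, 1.0], [2.0, 2.0]])
print(avg_vector_difference(r_map, py_map))  # 0.5
var_ratio, mean_diff = feature_stats(r_map, py_map)
# var_ratio holds 1.0 and 0.25; mean_diff holds 0.0 and 0.5
```

Under this reading, differences near 0 and variance ratios near 1 indicate that the two implementations produce statistically similar maps.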