Master of Science in Electrical Engineering (MSEE)
Electrical, Computer, and Biomedical Engineering
Learning from imbalanced data has drawn growing attention in the machine learning and data mining communities. A skewed class distribution degrades the performance of many machine learning algorithms, especially those that require large amounts of data. To reduce the influence of skewed data distributions on discriminative models, various synthetic oversampling methods have been proposed to generate extra minority-class samples and balance the data. However, most classic oversampling algorithms, such as the Synthetic Minority Over-sampling Technique (SMOTE) and the Adaptive Synthetic Sampling Approach (ADASYN), were developed to balance low-dimensional data in binary (two-class) settings, which limits their application to high-dimensional multi-class data.
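For readers unfamiliar with SMOTE, its core step creates a synthetic sample at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbors. The sketch below is a minimal pure-Python illustration, not code from this thesis. The function name `smote`, the default k = 5, and drawing base samples at random (rather than cycling through every minority sample as the original formulation does) are assumptions made for brevity.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by SMOTE-style interpolation (sketch).

    minority: list of feature vectors (lists of floats) from one class.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        # Assumption: pick the base sample uniformly at random.
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbors of x (excluding x itself), Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(x, p))
        nn = rng.choice(others[:k])
        # Interpolate: x + gap * (nn - x), with gap drawn from [0, 1).
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
new_points = smote(minority, n_new=10, k=2, seed=1)
print(len(new_points))
```

Because every synthetic point is a convex combination of two existing minority samples, the method only fills in space between observed points in the raw feature space; it never learns the underlying data distribution.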
To address these deficiencies of current imbalanced learning methods, this thesis proposes a multi-class imbalanced learning algorithm based on deep generative models. Both a Variational Autoencoder (VAE) and a Generative Adversarial Network (GAN) are implemented as data generators to create high-dimensional image data. In addition, we design an Extended Nearest Neighbor (ENN) based selection process that adds the most relevant generated samples to the original imbalanced data set to further improve classification performance. Experiments on two data sets, with comparisons against traditional oversampling algorithms, demonstrate the effectiveness and robustness of our model.
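The abstract does not specify how the ENN-based selection scores candidate samples. As a rough illustration only, the sketch below swaps in a simpler rule, a plain k-nearest-neighbor consistency filter rather than the ENN classifier itself: a generated sample intended for class `target` is kept when at least a fraction `min_agree` of its k nearest real samples belong to that class. The function `select_generated` and its parameters are hypothetical.

```python
import math
from collections import Counter


def select_generated(real_x, real_y, gen_x, target, k=5, min_agree=0.6):
    """Keep generated samples whose k nearest real neighbors mostly share `target`.

    Hypothetical kNN filter standing in for an ENN-based selection step.
    real_x: list of real feature vectors; real_y: their class labels.
    gen_x: generated feature vectors, all intended to belong to class `target`.
    """
    k_eff = min(k, len(real_x))
    kept = []
    for g in gen_x:
        # Indices of the k_eff closest real samples to g (Euclidean distance).
        order = sorted(range(len(real_x)), key=lambda i: math.dist(g, real_x[i]))
        votes = Counter(real_y[i] for i in order[:k_eff])
        # Keep g only if enough of its neighbors carry the intended label.
        if votes[target] / k_eff >= min_agree:
            kept.append(g)
    return kept


real_x = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
real_y = [0, 0, 0, 1, 1, 1]
candidates = [[0.5, 0.5], [10.5, 10.5], [1.0, 1.5]]
print(select_generated(real_x, real_y, candidates, target=0, k=3))
```

A filter of this kind discards generated images that land in another class's region, which is the general motivation for selecting only the most relevant samples before adding them to the training set.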
Zhang, Yazhou, "DEEP GENERATIVE MODEL FOR MULTI-CLASS IMBALANCED LEARNING" (2018). Open Access Master's Theses. Paper 1277.