Imbalanced learning for pattern recognition: An empirical study

Document Type

Conference Proceeding

Date of Original Version



The imbalanced learning problem (learning from imbalanced data) presents a significant new challenge to the pattern recognition and machine learning society because in most instances real-world data is imbalanced. When considering military applications, the imbalanced learning problem becomes much more critical because such skewed distributions normally carry the most interesting and critical information. This critical information is necessary to support the decision-making process in battlefield scenarios, such as anomaly or intrusion detection. The fundamental issue with imbalanced learning is the ability of imbalanced data to compromise the performance of standard learning algorithms, which assume balanced class distributions or equal misclassification penalty costs. Therefore, when presented with complex imbalanced data sets these algorithms may not be able to properly represent the distributive characteristics of the data. In this paper we present an empirical study of several popular imbalanced learning algorithms on an army relevant data set. Specifically we will conduct various experiments with SMOTE (Synthetic Minority Over-Sampling Technique), ADASYN (Adaptive Synthetic Sampling), SMOTEBoost (Synthetic Minority Over-Sampling in Boosting), and AdaCost (Misclassification Cost-Sensitive Boosting method) schemes. Detailed experimental settings and simulation results are presented in this work, and a brief discussion of future research opportunities/challenges is also presented. © 2010 SPIE.

Publication Title

Proceedings of SPIE - The International Society for Optical Engineering