Nonstationary stream data learning with imbalanced class distribution

The imbalanced class distributions that are ubiquitous in real-world datasets have stirred considerable interest in the study of imbalanced learning. However, processing a nonstationary data stream with an imbalanced class distribution remains a relatively uncharted area. The difficulties in this case are generally twofold. First, a dynamically structured learning framework is required to keep up with the evolution of unstable class concepts, that is, concept drift. Second, an imbalanced class distribution over the data stream demands a mechanism that reinforces the underrepresented class concepts to improve overall performance. For instance, an intelligent spam filtering system must self-tune its learning parameters to keep pace with the rapid evolution of spam patterns, and it must also handle the fundamental problem that normal emails are severely outnumbered by spam emails in some situations, even though misclassifying a normal email as spam (for example, the confirmation of a business contract) is far more expensive than the reverse. This chapter introduces learning algorithms that were specifically proposed to tackle the problem of learning from nonstationary datasets with imbalanced class distributions. The system-level principles and a framework for these methods are described at an algorithmic level, and their soundness is further validated through theoretical analysis as well as simulations on both synthetic and real-world benchmarks with varied imbalance ratios and noise levels.

Publication Title

Imbalanced Learning: Foundations, Algorithms, and Applications