"Entropy-based Sampling Approaches for Multi-Class Imbalanced Problems" by Lusi Li, Haibo He et al.
 

Entropy-based Sampling Approaches for Multi-Class Imbalanced Problems

Document Type

Article

Date of Original Version

11-1-2020

Abstract

In data mining, large differences among multi-class distributions, known as class imbalance, are well known to hinder classification performance. Existing sampling methods have notable deficiencies: oversampling techniques can cause over-generation and class overlapping, while undersampling techniques can discard significant information. This paper presents three sampling approaches for imbalanced learning: entropy-based oversampling (EOS), entropy-based undersampling (EUS), and entropy-based hybrid sampling (EHS), which combines oversampling and undersampling. All three approaches are built on a new class imbalance metric, termed the entropy-based imbalance degree (EID), which considers the differences in information content between classes rather than the traditional imbalance ratio. Specifically, after evaluating the information influence degree of each instance, EOS balances the data set by generating new instances around difficult-to-learn instances and retaining only the informative ones, EUS removes easy-to-learn instances, and EHS does both simultaneously. Finally, all generated and remaining instances are used to train several classifiers. Extensive experiments on synthetic and real-world data sets demonstrate the effectiveness of our approaches.
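The abstract does not give the exact EID formula or the instance-scoring rule, so the following is only a minimal illustrative sketch of the general idea, assuming EID is derived from the Shannon entropy of the class-proportion distribution and that "easy-to-learn" instances are those whose k-nearest-neighbour label distribution has low entropy. The function names (entropy_imbalance_degree, neighborhood_entropy, entropy_undersample) and parameters (k, keep_ratio) are hypothetical, not from the paper; numpy and scikit-learn are assumed available.

```python
# Illustrative sketch only; the paper's actual EID metric and sampling rules may differ.
import numpy as np
from collections import Counter
from sklearn.neighbors import NearestNeighbors


def entropy_imbalance_degree(y):
    """Hypothetical EID: 1 - H(class proportions) / log(#classes).

    0 for a perfectly balanced label vector, approaching 1 as one class dominates.
    """
    counts = np.array(list(Counter(y).values()), dtype=float)
    p = counts / counts.sum()
    h = -(p * np.log(p)).sum()
    return 1.0 - h / np.log(len(counts))


def neighborhood_entropy(X, y, k=5):
    """Per-instance entropy of the label distribution among the k nearest neighbours."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    scores = np.empty(len(X))
    for i, neigh in enumerate(idx[:, 1:]):          # drop the point itself
        counts = np.array(list(Counter(y[neigh]).values()), dtype=float)
        p = counts / counts.sum()
        scores[i] = -(p * np.log(p)).sum()
    return scores


def entropy_undersample(X, y, keep_ratio=0.5, k=5):
    """Toy EUS-style step: within the majority class, drop the instances whose
    neighbourhoods are purest (lowest entropy), i.e. the easiest to learn."""
    scores = neighborhood_entropy(X, y, k)
    majority = Counter(y).most_common(1)[0][0]
    maj_idx = np.where(y == majority)[0]
    order = maj_idx[np.argsort(scores[maj_idx])]     # ascending: easiest first
    drop = set(order[: int(len(maj_idx) * (1 - keep_ratio))])
    keep = np.array([i for i in range(len(y)) if i not in drop])
    return X[keep], y[keep]
```

A symmetric EOS-style step would instead synthesize new minority-class instances near the high-entropy (difficult-to-learn) points, and a hybrid EHS-style step would apply both in one pass.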

Publication Title

IEEE Transactions on Knowledge and Data Engineering

Volume

32

Issue

11

