Electrical, Computer, and Biomedical Engineering Faculty Publications

Handling imbalanced dataset using SVM and k-NN approach

Yap Bee Wah, Universiti Teknologi MARA
Hezlin Aryani Abd Rahman, Universiti Teknologi MARA
Haibo He, Universiti Malaya
Awang Bulgiba, Universiti Malaya

Document Type

Conference Proceeding

Date of Original Version

6-21-2016

Abstract

Data mining classification methods are affected when the data are imbalanced, that is, when one class is larger than the other in a two-class (binary) dependent variable. Many new methods have been developed to handle imbalanced datasets. For binary classification tasks, the Support Vector Machine (SVM) is one of the methods reported to give higher predictive accuracy than other techniques such as Logistic Regression and Discriminant Analysis. The strength of SVM lies in the robustness of its algorithm and its ability to integrate kernel-based learning, which results in a more flexible analysis and an optimized solution. Another popular way to handle imbalanced data is sampling, such as random undersampling, random oversampling and synthetic sampling. Applying Nearest Neighbour techniques within the sampling approach has been seen as having a greater advantage over other methods, as it can handle both structured and non-structured data. Some studies that implement an ensemble of SVM and Nearest Neighbours report good results. This paper discusses various methods for handling imbalanced data and illustrates the use of SVM and k-Nearest Neighbours (k-NN) on a real dataset.
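The abstract names three sampling strategies (random undersampling, random oversampling and nearest-neighbour-based synthetic sampling) and k-NN as a classifier. The sketch below illustrates those ideas in plain Python; it is not the authors' implementation. It assumes a binary label and numeric feature vectors stored as lists of floats, and every function name (random_undersample, random_oversample, smote_like, knn_predict) is hypothetical. The synthetic-sampling function follows the general idea of SMOTE (interpolating between a minority point and one of its nearest minority neighbours), swapped in here as a well-known example of the technique.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Dataset = Tuple[List[Vector], List[int]]


def random_undersample(X: List[Vector], y: List[int], majority_label: int, seed: int = 0) -> Dataset:
    """Randomly drop majority-class rows until both classes are the same size.

    Hypothetical helper; assumes a binary label and that the majority class
    really is the larger one.
    """
    rng = random.Random(seed)
    majority = [i for i, label in enumerate(y) if label == majority_label]
    minority = [i for i, label in enumerate(y) if label != majority_label]
    kept = sorted(rng.sample(majority, len(minority)) + minority)
    return [X[i] for i in kept], [y[i] for i in kept]


def random_oversample(X: List[Vector], y: List[int], minority_label: int, seed: int = 0) -> Dataset:
    """Duplicate randomly chosen minority rows until both classes are the same size."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    n_extra = max((len(y) - len(minority)) - len(minority), 0)
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return X + [X[i] for i in extra], y + [y[i] for i in extra]


def smote_like(X_min: List[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Create n_new synthetic minority points, SMOTE-style (hypothetical helper).

    Each new point lies on the line segment between a random minority point
    and one of its k nearest minority neighbours (Euclidean distance).
    Requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(X_min))
        base = X_min[i]
        # Other minority points, ordered by distance to `base`.
        others = sorted((j for j in range(len(X_min)) if j != i),
                        key=lambda j: math.dist(base, X_min[j]))
        neighbour = X_min[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


def knn_predict(X_train: List[Vector], y_train: List[int], x: Vector, k: int = 3) -> int:
    """Classify x by majority vote among its k nearest training points."""
    nearest = sorted(range(len(X_train)), key=lambda j: math.dist(x, X_train[j]))[:k]
    votes = [y_train[j] for j in nearest]
    return max(set(votes), key=votes.count)
```

For example, calling smote_like on the minority rows with n_new equal to the class-size gap would bring the minority class up to parity before training a classifier. Plain k-NN on the raw imbalanced data tends to favour the majority class, which is one motivation for resampling first. In practice, libraries such as imbalanced-learn (its SMOTE class) and scikit-learn (SVC with class_weight="balanced") offer tested versions of these ideas; the abstract does not state which SVM or k-NN settings the paper itself used.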

Publication Title

AIP Conference Proceedings

Volume

1750

Citation/Publisher Attribution

Wah, Yap Bee, Hezlin Aryani Abd Rahman, Haibo He, and Awang Bulgiba. "Handling imbalanced dataset using SVM and k-NN approach." AIP Conference Proceedings 1750 (2016). doi: 10.1063/1.4954536.

DOI

https://doi.org/10.1063/1.4954536