Electrical, Computer, and Biomedical Engineering Faculty Publications

Comparisons of ADABOOST, KNN, SVM and logistic regression in classification of imbalanced dataset

Hezlin Aryani Abd Rahman, Universiti Teknologi MARA
Yap Bee Wah, Universiti Teknologi MARA
Haibo He, University of Rhode Island
Awang Bulgiba, Universiti Malaya

Document Type

Conference Proceeding

Date of Original Version

1-1-2015

Abstract

Data mining classification techniques are affected by class imbalance in the response variable. The difficulty of handling imbalanced data has led to an influx of methods that resolve the imbalance at either the data level or the algorithmic level. The R programming language is one of the many tools available for data mining. This paper compares several classification algorithms in R on an imbalanced medical data set. The classifiers ADABOOST, KNN, SVM-RBF and logistic regression were applied to the original, randomly oversampled and randomly undersampled data sets. Results show that ADABOOST, KNN and SVM-RBF exhibit over-fitting when applied to the original data set. No over-fitting occurs for the random oversampling data set, whereby SVM-RBF has the highest accuracy (Training: 91.5%, Testing: 90.6%), sensitivity (Training: 91.0%, Testing: 91.0%), specificity (Training: 92.0%, Testing: 90.2%) and precision (Training: 91.9%, Testing: 90.5%). For random undersampling, no over-fitting occurs only for ADABOOST and logistic regression. Logistic regression is the most stable classifier, exhibiting consistent training and testing results.
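The two data-level treatments named in the abstract (random oversampling and random undersampling) and the four reported metrics can be illustrated with a minimal sketch. The paper's experiments were run in R; the Python below is not the authors' code, and the function names `random_oversample`, `random_undersample` and `binary_metrics` are hypothetical. It assumes a binary response with the positive (minority) class coded as `1`, and uses plain lists so that only the standard library is needed.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows (sampling with replacement) from each
    smaller class until every class matches the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    majority_n = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(majority_n - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(y[i])
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset (without replacement) of each class, equal in
    size to the smallest class; all other rows are discarded."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority_n = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, minority_n))
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity and precision from confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # true positive rate
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # true negative rate
        "precision": tp / (tp + fp) if tp + fp else 0.0,    # positive predictive value
    }


if __name__ == "__main__":
    # Toy data: 8 majority rows (label 0) and 2 minority rows (label 1).
    X = [[i] for i in range(10)]
    y = [0] * 8 + [1] * 2
    _, y_over = random_oversample(X, y)
    _, y_under = random_undersample(X, y)
    print(Counter(y_over), Counter(y_under))
    print(binary_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))
```

Over-fitting, in the sense the abstract uses it, appears as training metrics noticeably higher than the corresponding testing metrics; calling `binary_metrics` on both the training and testing predictions gives the pair of figures to compare.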

Publication Title

Communications in Computer and Information Science

Volume

545

Citation/Publisher Attribution

Rahman, Hezlin Aryani Abd, Yap Bee Wah, Haibo He, and Awang Bulgiba. "Comparisons of ADABOOST, KNN, SVM and logistic regression in classification of imbalanced dataset." Communications in Computer and Information Science 545, (2015): 54-64. doi: 10.1007/978-981-287-936-3_6.

DOI

https://doi.org/10.1007/978-981-287-936-3_6