Toward Optimal Feature Selection in Naive Bayes for Text Categorization
Document Type
Article
Date of Original Version
9-1-2016
Abstract
Automated feature selection is important in text categorization to reduce the feature size and to speed up the learning process of classifiers. In this paper, we present a novel and efficient feature selection framework based on information theory, which aims to rank features by their discriminative capacity for classification. We first revisit two information measures, Kullback-Leibler divergence and Jeffreys divergence, for binary hypothesis testing, and analyze their asymptotic properties relating to the type I and type II errors of a Bayesian classifier. We then introduce a new divergence measure, called Jeffreys-Multi-Hypothesis (JMH) divergence, to measure multi-distribution divergence for multi-class classification. Based on the JMH divergence, we develop two efficient feature selection methods, termed maximum discrimination (MD) and MD-χ2, for text categorization. Promising results from extensive experiments demonstrate the effectiveness of the proposed approaches.
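For reference, the two classical measures the abstract revisits can be written in their standard forms below. The notation (class-conditional distributions p and q over a feature value x) is illustrative background only; the paper's multi-class JMH extension is not reproduced here.

```latex
% Standard definitions assumed here; the symbols p and q denote two
% class-conditional distributions and are not taken from the paper itself.
\[
  D_{\mathrm{KL}}(p \,\|\, q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)},
  \qquad
  J(p, q) = D_{\mathrm{KL}}(p \,\|\, q) + D_{\mathrm{KL}}(q \,\|\, p).
\]
```

Jeffreys divergence is simply the symmetric sum of the two directed KL divergences, which is why it can be related to both error types of the binary hypothesis test rather than to only one direction of discrimination.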
Publication Title
IEEE Transactions on Knowledge and Data Engineering
Volume
28
Issue
9
Citation/Publisher Attribution
Tang, Bo, Steven Kay, and Haibo He. "Toward Optimal Feature Selection in Naive Bayes for Text Categorization." IEEE Transactions on Knowledge and Data Engineering 28, 9 (2016): 2508-2521. doi: 10.1109/TKDE.2016.2563436.