Toward Optimal Feature Selection in Naive Bayes for Text Categorization
Document Type
Article
Date of Original Version
9-1-2016
Abstract
Automated feature selection is important in text categorization to reduce the feature size and to speed up the learning process of classifiers. In this paper, we present a novel and efficient feature selection framework based on information theory, which aims to rank features by their discriminative capacity for classification. We first revisit two information measures, Kullback-Leibler divergence and Jeffreys divergence, for binary hypothesis testing, and analyze their asymptotic properties relating to the type I and type II errors of a Bayesian classifier. We then introduce a new divergence measure, called Jeffreys-Multi-Hypothesis (JMH) divergence, to measure multi-distribution divergence for multi-class classification. Based on the JMH divergence, we develop two efficient feature selection methods, termed maximum discrimination (MD) and MD-χ2, for text categorization. Promising results from extensive experiments demonstrate the effectiveness of the proposed approaches.
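For reference, the two classical measures the abstract revisits can be written in their standard forms below. The notation (class-conditional distributions p and q over a feature value x) is illustrative background only; the paper's multi-class JMH extension is not reproduced here.

```latex
% Standard definitions assumed here; the symbols p and q denote two
% class-conditional distributions and are not taken from the paper itself.
\[
  D_{\mathrm{KL}}(p \,\|\, q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)},
  \qquad
  J(p, q) = D_{\mathrm{KL}}(p \,\|\, q) + D_{\mathrm{KL}}(q \,\|\, p).
\]
```

Jeffreys divergence is simply the symmetric sum of the two directed KL divergences, which is why it can be related to both error types of the binary hypothesis test rather than to only one direction of discrimination.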
Publication Title
IEEE Transactions on Knowledge and Data Engineering
Volume
28
Issue
9
Citation/Publisher Attribution
Tang, Bo, Steven Kay, and Haibo He. "Toward Optimal Feature Selection in Naive Bayes for Text Categorization." IEEE Transactions on Knowledge and Data Engineering 28, 9 (2016): 2508-2521. doi: 10.1109/TKDE.2016.2563436.