Toward Optimal Feature Selection in Naive Bayes for Text Categorization

Document Type

Article

Date of Original Version

9-1-2016

Abstract

Automated feature selection is important for text categorization to reduce the feature size and to speed up the learning process of classifiers. In this paper, we present a novel and efficient feature selection framework based on information theory, which aims to rank features by their discriminative capacity for classification. We first revisit two information measures, the Kullback-Leibler divergence and the Jeffreys divergence, for binary hypothesis testing, and analyze their asymptotic properties relating to the type I and type II errors of a Bayesian classifier. We then introduce a new divergence measure, called the Jeffreys-Multi-Hypothesis (JMH) divergence, to measure multi-distribution divergence for multi-class classification. Based on the JMH divergence, we develop two efficient feature selection methods, termed the maximum discrimination (MD) and MD-χ2 methods, for text categorization. The promising results of extensive experiments demonstrate the effectiveness of the proposed approaches.
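For context, the sketch below illustrates the general idea of divergence-based feature ranking described in the abstract. It is not the paper's MD or MD-χ2 criterion: the score used here is simply the standard Jeffreys (symmetrized Kullback-Leibler) divergence between the class-conditional Bernoulli distributions of each binary bag-of-words feature in a two-class setting, and all function names are illustrative assumptions.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D_KL(p || q) between two discrete distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def jeffreys_divergence(p, q):
    """Jeffreys divergence: the symmetrized KL divergence."""
    return kl_divergence(p, q) + kl_divergence(q, p)

def rank_features_by_divergence(X, y):
    """Rank binary (bag-of-words) features by the Jeffreys divergence between
    their class-conditional Bernoulli distributions (two-class case only).
    X: (n_docs, n_features) 0/1 matrix; y: array of class labels in {0, 1}."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    scores = []
    for j in range(X.shape[1]):
        # Estimate P(feature present | class) with Laplace smoothing.
        p0 = (X[y == 0, j].sum() + 1.0) / (np.sum(y == 0) + 2.0)
        p1 = (X[y == 1, j].sum() + 1.0) / (np.sum(y == 1) + 2.0)
        scores.append(jeffreys_divergence([p0, 1 - p0], [p1, 1 - p1]))
    # Return feature indices, most discriminative first.
    return np.argsort(scores)[::-1]
```

The top-ranked indices would then be kept as the reduced feature set fed to a naive Bayes classifier; extending the score to more than two classes is where a multi-distribution measure such as the paper's JMH divergence comes in.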

Publication Title

IEEE Transactions on Knowledge and Data Engineering

Volume

28

Issue

9
