An information criterion for use in predictive data mining

Eric Kyper, University of Rhode Island


This dissertation contains three related manuscripts. The first manuscript reviews the existing data mining literature in the areas of machine learning, rule induction, neural networks, case-based reasoning, genetic algorithms, and rough sets. For each area it provides a brief description of the topic, examples of applications, a summary of current research, and directions for future research. The second manuscript presents an information criterion for choosing between decision trees that exhibit different characteristics of accuracy and complexity. The information criterion allows decision-makers to choose between decision tree model subsets based on their preference for parsimony and their individual problem domain. The second manuscript also presents a metric that quantifies opportunity losses between decision trees, thereby providing quantitative data to better enable decision-making. Together, the proposed decision tree information criterion and opportunity loss measure provide decision support for managerial decision-making. The third manuscript details an implementation of the decision tree information criterion and opportunity loss measure developed in manuscript 2. It outlines the construction of a program that automates the discretization process and decision tree analysis. The program analyzes a dataset containing insurance company call center statistics, and the results confirm that the measures developed in manuscript 2 perform as predicted. Implications for managerial decision-making are then discussed.

Subject Area

Business Administration, Management; Information Science

Recommended Citation

Eric Kyper, "An information criterion for use in predictive data mining" (2006). Dissertations and Master's Theses (Campus Access). Paper AAI3225319.