A Novel Decision Tree Algorithm for Numeric Datasets - C 4.5*Stat

Title: A Novel Decision Tree Algorithm for Numeric Datasets - C 4.5*Stat
Author: Sumam, Mary Idicula; Sudheep, Elayidom M; Joseph, Alexander
Abstract: Decision trees are powerful tools for classification in data mining tasks that involve different types of attributes. Numeric data sets are usually converted to categorical types first and then classified using information gain. Information gain is a popular and useful measure that indicates whether splitting on a given attribute yields any benefit in terms of information content. This process, however, is computationally intensive for large data sets, and popular decision tree algorithms such as ID3 cannot handle numeric data sets. This paper proposes statistical variance as an alternative to information gain, together with the statistical mean to split attributes, in completely numerical data sets. Using the ANOVA test on the standard UCI benchmark datasets, the new algorithm is shown to be competitive with its information gain counterpart, C4.5, and with many existing decision tree algorithms. The specific advantages of the proposed algorithm are that it avoids the computational overhead of information gain computation for large data sets with many attributes, and that it avoids converting huge numeric data sets to categorical data, which is also a time-consuming task. In summary, huge numeric datasets can be submitted directly to this algorithm without any attribute mappings or information gain computations. The work also blends two closely related fields, statistics and data mining.
Description: International Journal of Advanced Computing, ISSN:2051-0845, Vol.36, Issue.1
URI: http://dyuthi.cusat.ac.in/purl/4105
Date: 2013
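
The abstract describes choosing splits from variance and mean rather than information gain, but it does not give the exact selection rule. The sketch below is one possible reading, not the authors' implementation: it assumes the attribute with the largest variance is selected and that its mean serves as the split threshold. The function names `choose_split` and `partition`, and the toy data, are hypothetical.

```python
from statistics import mean, pvariance


def choose_split(rows):
    """Return (attribute_index, threshold) for a list of numeric rows.

    Assumption (not stated in the abstract): the attribute with the
    largest population variance is chosen, and its mean is the threshold.
    """
    columns = list(zip(*rows))  # transpose rows into per-attribute columns
    variances = [pvariance(col) for col in columns]
    best = max(range(len(columns)), key=variances.__getitem__)
    return best, mean(columns[best])


def partition(rows, labels, attr, threshold):
    """Split (row, label) pairs into those at or below and above the threshold."""
    left = [(r, y) for r, y in zip(rows, labels) if r[attr] <= threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[attr] > threshold]
    return left, right


# Toy data: two numeric attributes, two classes (illustrative only).
rows = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (4.0, 40.0)]
labels = ["a", "a", "b", "b"]

attr, threshold = choose_split(rows)
left, right = partition(rows, labels, attr, threshold)
print(attr, threshold, [y for _, y in left], [y for _, y in right])
```

No discretization step or entropy calculation appears here: each candidate attribute costs a single pass to compute its mean and variance, which is the kind of saving the abstract claims. Raw variance depends on the scale of each attribute, so a practical version would likely need some normalization; that detail is also an assumption.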

