Now showing items 1-4 of 4
Abstract: | Decision trees are very powerful tools for classification in data mining tasks that involves different types of attributes. When coming to handling numeric data sets, usually they are converted first to categorical types and then classified using information gain concepts. Information gain is a very popular and useful concept which tells you, whether any benefit occurs after splitting with a given attribute as far as information content is concerned. But this process is computationally intensive for large data sets. Also popular decision tree algorithms like ID3 cannot handle numeric data sets. This paper proposes statistical variance as an alternative to information gain as well as statistical mean to split attributes in completely numerical data sets. The new algorithm has been proved to be competent with respect to its information gain counterpart C4.5 and competent with many existing decision tree algorithms against the standard UCI benchmarking datasets using the ANOVA test in statistics. The specific advantages of this proposed new algorithm are that it avoids the computational overhead of information gain computation for large data sets with many attributes, as well as it avoids the conversion to categorical data from huge numeric data sets which also is a time consuming task. So as a summary, huge numeric datasets can be directly submitted to this algorithm without any attribute mappings or information gain computations. It also blends the two closely related fields statistics and data mining |
Description: | International Journal of Advanced Computing, ISSN:2051-0845, Vol.36, Issue.1 |
URI: | http://dyuthi.cusat.ac.in/purl/4105 |
Files | Size |
---|---|
A Novel Decisio ... c Datasets - C 4.5Stat.pdf | (183.3Kb) |
Abstract: | Learning Disability (LD) is a classification including several disorders in which a child has difficulty in learning in a typical manner, usually caused by an unknown factor or factors. LD affects about 15% of children enrolled in schools. The prediction of learning disability is a complicated task since the identification of LD from diverse features or signs is a complicated problem. There is no cure for learning disabilities and they are life-long. The problems of children with specific learning disabilities have been a cause of concern to parents and teachers for some time. The aim of this paper is to develop a new algorithm for imputing missing values and to determine the significance of the missing value imputation method and dimensionality reduction method in the performance of fuzzy and neuro fuzzy classifiers with specific emphasis on prediction of learning disabilities in school age children. In the basic assessment method for prediction of LD, checklists are generally used and the data cases thus collected fully depends on the mood of children and may have also contain redundant as well as missing values. Therefore, in this study, we are proposing a new algorithm, viz. the correlation based new algorithm for imputing the missing values and Principal Component Analysis (PCA) for reducing the irrelevant attributes. After the study, it is found that, the preprocessing methods applied by us improves the quality of data and thereby increases the accuracy of the classifiers. The system is implemented in Math works Software Mat Lab 7.10. The results obtained from this study have illustrated that the developed missing value imputation method is very good contribution in prediction system and is capable of improving the performance of a classifier. |
Description: | I.J. Intelligent Systems and Applications, 2013, 12, 34-52 |
URI: | http://dyuthi.cusat.ac.in/purl/4209 |
Files | Size |
---|---|
Performance Imp ... in School-age Children.pdf | (779.6Kb) |
Abstract: | The aim of this study is to show the importance of two classification techniques, viz. decision tree and clustering, in prediction of learning disabilities (LD) of school-age children. LDs affect about 10 percent of all children enrolled in schools. The problems of children with specific learning disabilities have been a cause of concern to parents and teachers for some time. Decision trees and clustering are powerful and popular tools used for classification and prediction in Data mining. Different rules extracted from the decision tree are used for prediction of learning disabilities. Clustering is the assignment of a set of observations into subsets, called clusters, which are useful in finding the different signs and symptoms (attributes) present in the LD affected child. In this paper, J48 algorithm is used for constructing the decision tree and K-means algorithm is used for creating the clusters. By applying these classification techniques, LD in any child can be identified |
URI: | http://dyuthi.cusat.ac.in/purl/4192 |
Files | Size |
---|---|
Significance Of ... Learning Disabilities.pdf | (200.4Kb) |
Abstract: | In our study we use a kernel based classification technique, Support Vector Machine Regression for predicting the Melting Point of Drug – like compounds in terms of Topological Descriptors, Topological Charge Indices, Connectivity Indices and 2D Auto Correlations. The Machine Learning model was designed, trained and tested using a dataset of 100 compounds and it was found that an SVMReg model with RBF Kernel could predict the Melting Point with a mean absolute error 15.5854 and Root Mean Squared Error 19.7576 |
Description: | PROCEEDINGS OF ICETECT 2011 |
URI: | http://dyuthi.cusat.ac.in/purl/4235 |
Files | Size |
---|---|
Using Neural Ne ... Drug – like compounds.pdf | (1.156Mb) |
Now showing items 1-4 of 4
Dyuthi Digital Repository Copyright © 2007-2011 Cochin University of Science and Technology. Items in Dyuthi are protected by copyright, with all rights reserved, unless otherwise indicated.