Now showing items 19-21 of 21
Abstract: | Decision trees are very powerful tools for classification in data mining tasks that involve different types of attributes. Numeric data sets are usually first converted to categorical types and then classified using information gain. Information gain is a popular and useful measure that indicates whether splitting on a given attribute yields any benefit in terms of information content. However, computing it is computationally intensive for large data sets, and popular decision tree algorithms such as ID3 cannot handle numeric data sets. This paper proposes statistical variance as an alternative to information gain, and the statistical mean for splitting attributes, in completely numerical data sets. Using the ANOVA test on the standard UCI benchmark datasets, the new algorithm has been shown to be competitive with its information-gain counterpart C4.5 and with many existing decision tree algorithms. The specific advantages of the proposed algorithm are that it avoids the computational overhead of information gain computation for large data sets with many attributes, and that it avoids converting huge numeric data sets to categorical data, which is also time-consuming. In summary, huge numeric datasets can be submitted directly to this algorithm without any attribute mappings or information gain computations. The approach also blends two closely related fields: statistics and data mining. |
Description: | International Journal of Advanced Computing, ISSN:2051-0845, Vol.36, Issue.1 |
URI: | http://dyuthi.cusat.ac.in/purl/4105 |
Files | Size |
---|---|
A Novel Decisio ... c Datasets - C 4.5Stat.pdf | (183.3Kb) |
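The abstract does not spell out exactly which variance is minimized, so the following Python sketch is only one plausible reading, not the paper's algorithm: each numeric attribute is split at its own mean, and the attribute whose split leaves the lowest weighted variance of the class labels in the two child nodes is chosen. It assumes binary class labels encoded as 0/1 (for which the label variance p(1-p) acts as an impurity measure), and the helper names `weighted_child_variance` and `best_mean_split` are hypothetical.

```python
from statistics import fmean, pvariance


def weighted_child_variance(values, labels, threshold):
    """Weighted population variance of 0/1 labels in the two children
    produced by splitting `values` at `threshold` (left: value <= threshold).
    Assumption: label variance is the impurity measure being minimized."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    total = 0.0
    for part in (left, right):
        if part:  # an empty child contributes nothing to the weighted sum
            total += len(part) / n * pvariance(part)
    return total


def best_mean_split(rows, labels):
    """Return (attribute_index, threshold, score) for the attribute whose
    split at its own mean gives the lowest weighted label variance.
    `rows` is a non-empty list of equal-length numeric attribute lists."""
    best = None
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        threshold = fmean(column)  # statistical mean as the split point
        score = weighted_child_variance(column, labels, threshold)
        if best is None or score < best[2]:
            best = (j, threshold, score)
    return best
```

For example, with `rows = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]` and `labels = [0, 0, 1, 1]`, splitting attribute 0 at its mean 2.5 separates the classes perfectly, so `best_mean_split` returns `(0, 2.5, 0.0)`. Only means and variances are computed; no logarithms or categorical mappings are needed, which is the kind of saving the abstract describes.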
Abstract: | This paper presents a novel approach to recognizing Grantha, an ancient script of South India, and converting it to Malayalam, a prevalent language in South India, using an online character recognition mechanism. The work is motivated by (i) the need for a mechanism to recognize Grantha script in the modern world and (ii) the aim of affirming the strong connection between Grantha and Malayalam. A framework for recognizing Grantha script using online character recognition is designed and implemented. The features extracted from the Grantha script consist mainly of time-domain features based on writing direction and curvature. The recognized characters are mapped to the corresponding Malayalam characters. To recognize words and sentences, the framework was tested on a set of medium-length manuscripts containing 9-12 sample lines, and on printed pages of the book Soundarya Lahari by Sri Adi Shankara, written in Grantha script. On the manuscripts, the system achieved recognition rates of 92.11% for Grantha, 90.82% for Old Malayalam script, and 89.56% for new Malayalam script. On the printed book pages, the recognition rates were 96.16% for Grantha, 95.22% for Old Malayalam script, and 92.32% for new Malayalam script. These results demonstrate the effectiveness of the developed system. |
Description: | (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 3, No. 7, 2012 |
URI: | http://dyuthi.cusat.ac.in/purl/4106 |
Files | Size |
---|---|
An Online Chara ... ha Script to Malayalam.pdf | (548.4Kb) |
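The abstract names writing direction and curvature as the main time-domain features. A common way to compute such features in online handwriting recognition (assumed here; the paper's exact definitions may differ) is to take, at each pen sample, the cosine and sine of the local writing direction and of the turning angle between the incoming and outgoing segments. This Python sketch assumes the stroke is a time-ordered list of (x, y) points that has already been resampled and normalized; the function name `direction_curvature_features` is hypothetical.

```python
import math


def direction_curvature_features(points):
    """For each interior sample of a time-ordered pen stroke, return
    (cos_dir, sin_dir, cos_curv, sin_curv).
    Assumption: `points` is a resampled, normalized list of (x, y) tuples."""
    features = []
    for i in range(1, len(points) - 1):
        (x0, y0), (x1, y1), (x2, y2) = points[i - 1], points[i], points[i + 1]
        # Writing direction: unit vector from the previous to the next sample.
        dx, dy = x2 - x0, y2 - y0
        d = math.hypot(dx, dy) or 1.0  # avoid division by zero if prev == next
        cos_dir, sin_dir = dx / d, dy / d
        # Curvature: turning angle between the incoming and outgoing segments.
        a_in = math.atan2(y1 - y0, x1 - x0)
        a_out = math.atan2(y2 - y1, x2 - x1)
        turn = a_out - a_in
        features.append((cos_dir, sin_dir, math.cos(turn), math.sin(turn)))
    return features
```

Encoding each angle as a cosine/sine pair avoids the discontinuity at ±π, so the features change smoothly as the pen turns. For a straight horizontal stroke `[(0, 0), (1, 0), (2, 0)]`, the single interior sample yields `(1.0, 0.0, 1.0, 0.0)`: direction along +x and no turning.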
Abstract: | Due to the emergence of multiple-language support on the Internet, machine translation (MT) technologies are indispensable for communication between speakers of different languages. Recent research has started to explore tree-based machine translation systems that use syntactic and morphological information. This work aims at developing a syntax-based machine translation system from English to Malayalam that adds different case information during translation. The system identifies general rules for various sentence patterns in English. These rules are generated using the part-of-speech (POS) tag information of the texts. Word reordering based on the syntax tree is used to improve the translation quality of the system. The system uses a bilingual English–Malayalam dictionary for translation. |
Description: | 2012 International Conference on Data Science & Engineering (ICDSE) |
URI: | http://dyuthi.cusat.ac.in/purl/4108 |
Files | Size |
---|---|
Syntactic Based ... m English to Malayalam.pdf | (134.8Kb) |
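English is largely subject-verb-object (SVO), while Malayalam is verb-final (SOV) and uses postpositions and case suffixes where English uses prepositions. The Python sketch below illustrates, under that assumption, how two POS-tag-based reordering rules might be applied to a flat tagged sentence before dictionary lookup. It is not the paper's rule set, which operates on a syntax tree; the names `reorder_to_sov`, `VERB_TAGS`, and `NOUN_PHRASE_TAGS` are hypothetical, and tags follow the Penn Treebank convention.

```python
# Hypothetical tag groups (Penn Treebank tags).
VERB_TAGS = {"MD", "VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}
NOUN_PHRASE_TAGS = {"DT", "JJ", "NN", "NNS", "NNP", "NNPS", "PRP", "PRP$", "CD"}


def reorder_to_sov(tagged):
    """Reorder a simple English clause, given as (word, tag) pairs, toward
    Malayalam word order:
      Rule 1: move the verb group to the end of the clause (SVO -> SOV).
      Rule 2: preposition + noun phrase -> noun phrase + postposition."""
    verbs = [p for p in tagged if p[1] in VERB_TAGS]
    others = [p for p in tagged if p[1] not in VERB_TAGS]
    reordered = []
    i = 0
    while i < len(others):
        word, tag = others[i]
        if tag == "IN":
            # Rule 2: find the noun phrase that follows the preposition.
            j = i + 1
            while j < len(others) and others[j][1] in NOUN_PHRASE_TAGS:
                j += 1
            reordered.extend(others[i + 1:j])  # noun phrase first
            reordered.append((word, tag))  # then the preposition
            i = j
        else:
            reordered.append((word, tag))
            i += 1
    # Rule 1: the verb group goes last, in its original internal order.
    return [w for w, _ in reordered + verbs]
```

For example, "Ram reads a book in the library" becomes `['Ram', 'a', 'book', 'the', 'library', 'in', 'reads']`; a later dictionary-lookup and case-marking step (not shown) would render "the library in" as a Malayalam noun carrying a case suffix.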
Dyuthi Digital Repository Copyright © 2007-2011 Cochin University of Science and Technology. Items in Dyuthi are protected by copyright, with all rights reserved, unless otherwise indicated.