Now showing items 1-7 of 7
Abstract: | Thunderstorm is one of the most spectacular weather phenomena in the atmosphere. Many parts over the Indian region experience thunderstorms at higher frequency during pre-monsoon months (March- May), when the atmosphere is highly unstable because of high temperatures prevailing at lower levels. Most dominant feature of the weather during the pre-monsoon season over the eastern Indo-Gangetic plain and northeast India is the outburst of severe local convective storms, commonly known as ‘Nor’wester’ or ‘Kalbaishakhi’. The severe thunderstorms associated with thunder, squall line, lightning and hail cause extensive losses in agriculture, damage to structure and also loss of life. The casualty due to lightning associated with thunderstorms in this region is the highest in the world. The highest numbers of aviation hazards are reported during occurrence of these thunderstorms. In India, 72% of tornadoes are associated with this thunderstorm. |
Description: | Department of Computer Science, Cochin University of Science and Technology |
URI: | http://dyuthi.cusat.ac.in/purl/3709 |
Files | Size |
---|---|
Dyuthi-T1667.pdf | (7.709Mb) |
Abstract: | This is a Named Entity Based Question Answering System for Malayalam Language. Although a vast amount of information is available today in digital form, no effective information access mechanism exists to provide humans with convenient information access. Information Retrieval and Question Answering systems are the two mechanisms available now for information access. Information systems typically return a long list of documents in response to a user’s query which are to be skimmed by the user to determine whether they contain an answer. But a Question Answering System allows the user to state his/her information need as a natural language question and receives most appropriate answer in a word or a sentence or a paragraph. This system is based on Named Entity Tagging and Question Classification. Document tagging extracts useful information from the documents which will be used in finding the answer to the question. Question Classification extracts useful information from the question to determine the type of the question and the way in which the question is to be answered. Various Machine Learning methods are used to tag the documents. Rule-Based Approach is used for Question Classification. Malayalam belongs to the Dravidian family of languages and is one of the four major languages of this family. It is one of the 22 Scheduled Languages of India with official language status in the state of Kerala. It is spoken by 40 million people. Malayalam is a morphologically rich agglutinative language and relatively of free word order. Also Malayalam has a productive morphology that allows the creation of complex words which are often highly ambiguous. Document tagging tools such as Parts-of-Speech Tagger, Phrase Chunker, Named Entity Tagger, and Compound Word Splitter are developed as a part of this research work. No such tools were available for Malayalam language. Finite State Transducer, High Order Conditional Random Field, Artificial Immunity System Principles, and Support Vector Machines are the techniques used for the design of these document preprocessing tools. This research work describes how the Named Entity is used to represent the documents. Single sentence questions are used to test the system. Overall Precision and Recall obtained are 88.5% and 85.9% respectively. This work can be extended in several directions. The coverage of non-factoid questions can be increased and also it can be extended to include open domain applications. Reference Resolution and Word Sense Disambiguation techniques are suggested as the future enhancements |
Description: | Dept. Of Computer Science Cochin University Of Science And Technology |
URI: | http://dyuthi.cusat.ac.in/purl/3698 |
Files | Size |
---|---|
Dyuthi-T1663.pdf | (4.368Mb) |
Abstract: | Data mining is one of the hottest research areas nowadays as it has got wide variety of applications in common man’s life to make the world a better place to live. It is all about finding interesting hidden patterns in a huge history data base. As an example, from a sales data base, one can find an interesting pattern like “people who buy magazines tend to buy news papers also” using data mining. Now in the sales point of view the advantage is that one can place these things together in the shop to increase sales. In this research work, data mining is effectively applied to a domain called placement chance prediction, since taking wise career decision is so crucial for anybody for sure. In India technical manpower analysis is carried out by an organization named National Technical Manpower Information System (NTMIS), established in 1983-84 by India's Ministry of Education & Culture. The NTMIS comprises of a lead centre in the IAMR, New Delhi, and 21 nodal centres located at different parts of the country. The Kerala State Nodal Centre is located at Cochin University of Science and Technology. In Nodal Centre, they collect placement information by sending postal questionnaire to passed out students on a regular basis. From this raw data available in the nodal centre, a history data base was prepared. Each record in this data base includes entrance rank ranges, reservation, Sector, Sex, and a particular engineering. From each such combination of attributes from the history data base of student records, corresponding placement chances is computed and stored in the history data base. From this data, various popular data mining models are built and tested. These models can be used to predict the most suitable branch for a particular new student with one of the above combination of criteria. Also a detailed performance comparison of the various data mining models is done.This research work proposes to use a combination of data mining models namely a hybrid stacking ensemble for better predictions. A strategy to predict the overall absorption rate for various branches as well as the time it takes for all the students of a particular branch to get placed etc are also proposed. Finally, this research work puts forward a new data mining algorithm namely C 4.5 * stat for numeric data sets which has been proved to have competent accuracy over standard benchmarking data sets called UCI data sets. It also proposes an optimization strategy called parameter tuning to improve the standard C 4.5 algorithm. As a summary this research work passes through all four dimensions for a typical data mining research work, namely application to a domain, development of classifier models, optimization and ensemble methods. |
Description: | Department of Computer Science, Cochin University of Science and Technology |
URI: | http://dyuthi.cusat.ac.in/purl/3127 |
Files | Size |
---|---|
Dyuthi-T1101.pdf | (2.945Mb) |
Abstract: | Computational Biology is the research are that contributes to the analysis of biological data through the development of algorithms which will address significant research problems.The data from molecular biology includes DNA,RNA ,Protein and Gene expression data.Gene Expression Data provides the expression level of genes under different conditions.Gene expression is the process of transcribing the DNA sequence of a gene into mRNA sequences which in turn are later translated into proteins.The number of copies of mRNA produced is called the expression level of a gene.Gene expression data is organized in the form of a matrix. Rows in the matrix represent genes and columns in the matrix represent experimental conditions.Experimental conditions can be different tissue types or time points.Entries in the gene expression matrix are real values.Through the analysis of gene expression data it is possible to determine the behavioral patterns of genes such as similarity of their behavior,nature of their interaction,their respective contribution to the same pathways and so on. Similar expression patterns are exhibited by the genes participating in the same biological process.These patterns have immense relevance and application in bioinformatics and clinical research.Theses patterns are used in the medical domain for aid in more accurate diagnosis,prognosis,treatment planning.drug discovery and protein network analysis.To identify various patterns from gene expression data,data mining techniques are essential.Clustering is an important data mining technique for the analysis of gene expression data.To overcome the problems associated with clustering,biclustering is introduced.Biclustering refers to simultaneous clustering of both rows and columns of a data matrix. Clustering is a global whereas biclustering is a local model.Discovering local expression patterns is essential for identfying many genetic pathways that are not apparent otherwise.It is therefore necessary to move beyond the clustering paradigm towards developing approaches which are capable of discovering local patterns in gene expression data.A biclusters is a submatrix of the gene expression data matrix.The rows and columns in the submatrix need not be contiguous as in the gene expression data matrix.Biclusters are not disjoint.Computation of biclusters is costly because one will have to consider all the combinations of columans and rows in order to find out all the biclusters.The search space for the biclustering problem is 2 m+n where m and n are the number of genes and conditions respectively.Usually m+n is more than 3000.The biclustering problem is NP-hard.Biclustering is a powerful analytical tool for the biologist.The research reported in this thesis addresses the problem of biclustering.Ten algorithms are developed for the identification of coherent biclusters from gene expression data.All these algorithms are making use of a measure called mean squared residue to search for biclusters.The objective here is to identify the biclusters of maximum size with the mean squared residue lower than a given threshold. All these algorithms begin the search from tightly coregulated submatrices called the seeds.These seeds are generated by K-Means clustering algorithm.The algorithms developed can be classified as constraint based,greedy and metaheuristic.Constarint based algorithms uses one or more of the various constaints namely the MSR threshold and the MSR difference threshold.The greedy approach makes a locally optimal choice at each stage with the objective of finding the global optimum.In metaheuristic approaches particle Swarm Optimization(PSO) and variants of Greedy Randomized Adaptive Search Procedure(GRASP) are used for the identification of biclusters.These algorithms are implemented on the Yeast and Lymphoma datasets.Biologically relevant and statistically significant biclusters are identified by all these algorithms which are validated by Gene Ontology database.All these algorithms are compared with some other biclustering algorithms.Algorithms developed in this work overcome some of the problems associated with the already existing algorithms.With the help of some of the algorithms which are developed in this work biclusters with very high row variance,which is higher than the row variance of any other algorithm using mean squared residue, are identified from both Yeast and Lymphoma data sets.Such biclusters which make significant change in the expression level are highly relevant biologically. |
Description: | Dept.of Computer Science,Cochin University of Science and Technology |
URI: | http://dyuthi.cusat.ac.in/purl/2956 |
Files | Size |
---|---|
Dyuthi-T0947.pdf | (8.801Mb) |
Abstract: | Microarray data analysis is one of data mining tool which is used to extract meaningful information hidden in biological data. One of the major focuses on microarray data analysis is the reconstruction of gene regulatory network that may be used to provide a broader understanding on the functioning of complex cellular systems. Since cancer is a genetic disease arising from the abnormal gene function, the identification of cancerous genes and the regulatory pathways they control will provide a better platform for understanding the tumor formation and development. The major focus of this thesis is to understand the regulation of genes responsible for the development of cancer, particularly colorectal cancer by analyzing the microarray expression data. In this thesis, four computational algorithms namely fuzzy logic algorithm, modified genetic algorithm, dynamic neural fuzzy network and Takagi Sugeno Kang-type recurrent neural fuzzy network are used to extract cancer specific gene regulatory network from plasma RNA dataset of colorectal cancer patients. Plasma RNA is highly attractive for cancer analysis since it requires a collection of small amount of blood and it can be obtained at any time in repetitive fashion allowing the analysis of disease progression and treatment response. |
Description: | Department of Computer Science, Cochin University of Science and Technology |
URI: | http://dyuthi.cusat.ac.in/purl/3760 |
Files | Size |
---|---|
Dyuthi-T1722.pdf | (5.663Mb) |
Abstract: | This thesis summarizes the results on the studies on a syntax based approach for translation between Malayalam, one of Dravidian languages and English and also on the development of the major modules in building a prototype machine translation system from Malayalam to English. The development of the system is a pioneering effort in Malayalam language unattempted by previous researchers. The computational models chosen for the system is first of its kind for Malayalam language. An in depth study has been carried out in the design of the computational models and data structures needed for different modules: morphological analyzer , a parser, a syntactic structure transfer module and target language sentence generator required for the prototype system. The generation of list of part of speech tags, chunk tags and the hierarchical dependencies among the chunks required for the translation process also has been done. In the development process, the major goals are: (a) accuracy of translation (b) speed and (c) space. Accuracy-wise, smart tools for handling transfer grammar and translation standards including equivalent words, expressions, phrases and styles in the target language are to be developed. The grammar should be optimized with a view to obtaining a single correct parse and hence a single translated output. Speed-wise, innovative use of corpus analysis, efficient parsing algorithm, design of efficient Data Structure and run-time frequency-based rearrangement of the grammar which substantially reduces the parsing and generation time are required. The space requirement also has to be minimised |
Description: | Department of Computer Science, Cochin University of Science and Technology |
URI: | http://dyuthi.cusat.ac.in/purl/3808 |
Files | Size |
---|---|
Dyuthi-T1739.pdf | (5.310Mb) |
Abstract: | Handwriting is an acquired tool used for communication of one's observations or feelings. Factors that inuence a person's handwriting not only dependent on the individual's bio-mechanical constraints, handwriting education received, writing instrument, type of paper, background, but also factors like stress, motivation and the purpose of the handwriting. Despite the high variation in a person's handwriting, recent results from different writer identification studies have shown that it possesses sufficient individual traits to be used as an identification method. Handwriting as a behavioral biometric has had the interest of researchers for a long time. But recently it has been enjoying new interest due to an increased need and effort to deal with problems ranging from white-collar crime to terrorist threats. The identification of the writer based on a piece of handwriting is a challenging task for pattern recognition. The main objective of this thesis is to develop a text independent writer identification system for Malayalam Handwriting. The study also extends to developing a framework for online character recognition of Grantha script and Malayalam characters |
Description: | Department of Computer Science, Cochin University of Science and Technology |
URI: | http://dyuthi.cusat.ac.in/purl/3543 |
Files | Size |
---|---|
Dyuthi-T1511.pdf | (17.50Mb) |
Now showing items 1-7 of 7
Dyuthi Digital Repository Copyright © 2007-2011 Cochin University of Science and Technology. Items in Dyuthi are protected by copyright, with all rights reserved, unless otherwise indicated.