Author Identification in Malayalam using n-grams

Title: Author Identification in Malayalam using n-grams
Author: Sumam, Mary Idicula; Bindu, Baby Thomas; Sindhu, L
Abstract: Author identification is the problem of identifying, from a given set of authors, the author of an anonymous text or of a text whose authorship is in doubt. The works of different authors are strongly distinguished by quantifiable features of the text. This paper describes our attempts to identify the most likely author of a Malayalam text from a list of candidate authors. Malayalam is an agglutinative Dravidian language, and few successful tools have been developed to extract syntactic and semantic features from texts in this language. We have made a detailed study of the stylometric features that can be used to form an author's profile and have found that the frequencies of word collocations can clearly distinguish an author in a highly inflected language such as Malayalam. In our work we extract the word-level and character-level features present in the text to characterize the style of an author. Our first step was to create a profile for each candidate author whose texts were available to us, first from word n-gram frequencies and then from variable-length character n-gram frequencies. The profiles thus formed for the set of authors under consideration were then compared with the features extracted from the anonymous text to suggest the most likely author.
URI: http://dyuthi.cusat.ac.in/purl/4103
Date: 2009
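
The profile-and-compare procedure outlined in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the profile size, the n-gram lengths, or how profiles are compared, so everything below is an assumption made for illustration: the function names are hypothetical, tokens are split on whitespace, character n-grams are taken over Unicode code points (which can split Malayalam vowel signs from their base consonants), and the dissimilarity is a normalized squared difference of relative frequencies chosen only as one plausible measure. It should not be read as the authors' implementation.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return word n-grams; whitespace tokenization is an assumption."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n_min, n_max):
    """Return character n-grams of every length from n_min to n_max."""
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(text[i:i + n] for i in range(len(text) - n + 1))
    return grams


def build_profile(grams, size):
    """Map the `size` most frequent n-grams to their relative frequencies."""
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.most_common(size)}


def dissimilarity(p, q):
    """Sum of squared frequency differences, each normalized by the mean.

    Assumed measure: an n-gram missing from one profile counts as
    frequency 0 and contributes 4 to the score.
    """
    score = 0.0
    for g in set(p) | set(q):
        a, b = p.get(g, 0.0), q.get(g, 0.0)
        score += ((a - b) / ((a + b) / 2)) ** 2
    return score


def most_likely_author(author_texts, unknown_text, extract, size=500):
    """Return the candidate whose profile is closest to the unknown text."""
    unknown = build_profile(extract(unknown_text), size)
    profiles = {
        name: build_profile(extract(text), size)
        for name, text in author_texts.items()
    }
    return min(profiles, key=lambda name: dissimilarity(profiles[name], unknown))
```

Passing `lambda t: word_ngrams(t, 2)` as `extract` gives a word-bigram profile, while `lambda t: char_ngrams(t, 2, 4)` gives a variable-length character profile; the same comparison step serves both.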


Files in this item

File: Author Identifi ... alayalam using n-grams.pdf (379.0Kb, PDF)
