Author Identification in Malayalam using n-grams

Dyuthi/Manakin Repository

dc.contributor.author Sumam, Mary Idicula
dc.contributor.author Bindu, Baby Thomas
dc.contributor.author Sindhu, L
dc.date.accessioned 2014-07-18T04:58:29Z
dc.date.available 2014-07-18T04:58:29Z
dc.date.issued 2009
dc.identifier.uri http://dyuthi.cusat.ac.in/purl/4103
dc.description.abstract Author identification is the problem of identifying, from a given set of authors, the author of an anonymous text or of a text whose authorship is in doubt. The works of different authors are strongly distinguished by quantifiable features of the text. This paper describes attempts to identify the most likely author of a Malayalam text from a list of candidate authors. Malayalam is an agglutinative Dravidian language, and few successful tools have been developed to extract syntactic and semantic features from texts in this language. We have made a detailed study of the stylometric features that can be used to form an author's profile and have found that the frequencies of word collocations can clearly distinguish an author in a highly inflectional language such as Malayalam. In our work we extract the word-level and character-level features present in a text to characterize the style of its author. Our first step was to create a profile for each candidate author whose texts were available to us, first from word n-gram frequencies and then from variable-length character n-gram frequencies. The profiles of the candidate authors thus formed were then compared with the features extracted from the anonymous text to suggest the most likely author. en_US
dc.description.sponsorship Cochin University Of Science And Technology en_US
dc.language.iso en en_US
dc.subject stylometrics en_US
dc.subject feature extraction en_US
dc.subject author profile en_US
dc.subject lexical features en_US
dc.subject character features en_US
dc.subject collocations en_US
dc.subject classification en_US
dc.subject n-grams en_US
dc.subject distance measure en_US
dc.title Author Identification in Malayalam using n-grams en_US
dc.type Article en_US
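
The profile-and-compare procedure outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not state which distance measure, n-gram lengths, or profile size the paper uses, so the relative-difference dissimilarity, the 2 to 4 character range, and the 500-entry cutoff below are all assumptions chosen for illustration, and every function name is hypothetical.

```python
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Yield every character n-gram of length n_min..n_max (assumed range)."""
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            yield text[i:i + n]


def build_profile(text, size=500, n_min=2, n_max=4):
    """Map the `size` most frequent n-grams to their relative frequencies.

    The cutoff of 500 is an illustrative assumption, not a value from the paper.
    """
    counts = Counter(char_ngrams(text, n_min, n_max))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: c / total for gram, c in counts.most_common(size)}


def dissimilarity(p, q):
    """Sum of squared relative frequency differences over both profiles.

    This measure is an assumed stand-in for the paper's unspecified
    distance measure. Each n-gram comes from at least one profile,
    so the denominator is never zero.
    """
    score = 0.0
    for gram in set(p) | set(q):
        fp, fq = p.get(gram, 0.0), q.get(gram, 0.0)
        score += ((fp - fq) / ((fp + fq) / 2)) ** 2
    return score


def most_likely_author(author_texts, anonymous_text):
    """Return the candidate whose profile is closest to the anonymous text."""
    unknown = build_profile(anonymous_text)
    profiles = {a: build_profile(t) for a, t in author_texts.items()}
    return min(profiles, key=lambda a: dissimilarity(profiles[a], unknown))
```

Python strings are sequences of Unicode code points, so the sketch runs on Malayalam text unchanged, although code-point n-grams may split a consonant from its dependent vowel sign. A word n-gram profile could be built the same way by sliding the window over a list of tokens instead of over characters.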


Files in this item

File: Author Identifi ... alayalam using n-grams.pdf
Size: 379.0Kb
Format: PDF
