TamilCorpus

Open Source Tamil Corpus of 58M words

Primary language: Shell. License: GNU General Public License v3.0 (GPL-3.0).

Sources: Wikipedia, The Hindu (Tamil)

Usage

Run extract.sh to extract the compressed files.

P.S.: The extracted text may need a little cleaning before use.
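
This README does not say what kind of cleaning is needed, so the following is only a minimal sketch of one common first pass, not the author's procedure. It assumes the extracted files are plain UTF-8 text. The function name `clean_corpus` is hypothetical and not part of this repository. It strips carriage returns, collapses runs of whitespace, trims each line, and drops empty lines.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): a light whitespace
# clean-up for extracted corpus text. Reads stdin, writes stdout.
clean_corpus() {
  # 1. Remove carriage returns left over from Windows-style line endings.
  # 2. Collapse each run of spaces/tabs into a single space.
  # 3. Trim the leading and trailing space on every line.
  # 4. Drop lines that are empty after trimming.
  tr -d '\r' \
    | sed -e 's/[[:space:]]\{1,\}/ /g' -e 's/^ //' -e 's/ $//' \
    | grep -v '^$'
}
```

After sourcing the function, it can be used as a filter, for example `clean_corpus < corpus.txt > corpus.clean.txt`, where `corpus.txt` is a placeholder for whichever file the extraction step produces. Further steps, such as removing markup or non-Tamil lines, would depend on what the extracted text actually contains.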