
A toolkit for compiling a text corpus from various sources.

Primary language: Python. License: MIT.

banglakit/corpus-builder

A sufficiently large body of text is essential for NLP tasks; this tool is designed solely to build large collections of text documents from the web.

A practical understanding of Python and Scrapy is essential for using the tool.

Example Usage

scrapy crawl bangladesh_pratidin -a start_date='2016-06-01' -a end_date='2016-06-05' -o test3.csv
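
Scrapy hands each `-a name=value` pair to the spider as a string, so a date-bounded spider has to parse `start_date` and `end_date` itself before it can decide which pages to request. The sketch below shows one way that parsing could look, using only the standard library. The helper names `parse_day` and `days_between` are hypothetical and not taken from this repository, and treating both ends of the range as inclusive is an assumption; the actual spider may behave differently.

```python
from datetime import date, datetime, timedelta
from typing import Iterator


def parse_day(value: str) -> date:
    """Parse a 'YYYY-MM-DD' string, the format used in the command above."""
    return datetime.strptime(value, "%Y-%m-%d").date()


def days_between(start_date: str, end_date: str) -> Iterator[date]:
    """Yield each day from start_date to end_date.

    Assumption: both ends are inclusive; the real spider may differ.
    """
    start, end = parse_day(start_date), parse_day(end_date)
    if start > end:
        raise ValueError("start_date must not be after end_date")
    for offset in range((end - start).days + 1):
        yield start + timedelta(days=offset)


if __name__ == "__main__":
    # Same range as the example command.
    for day in days_between("2016-06-01", "2016-06-05"):
        print(day.isoformat())
```

Validating the range up front means a mistyped date fails immediately with a clear error instead of producing an empty or partial crawl.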