/scrapy-monkeylearn

A Scrapy pipeline to categorize items using MonkeyLearn

Primary LanguagePython

scrapy-monkeylearn

A Scrapy middleware to categorize items using MonkeyLearn.

Settings

     Option Name            |                         Value                         |    Example Value

------------------------------- | ----------------------------------------------------- | ------------------- MONKEYLEARN_BATCH_SIZE | The size of the item batches sent to MonkeyLearn. Default=200 | 200 MONKEYLEARN_MODULE | The ID of the monkeylearn module. | 'cl_oFKL5wft' MONKEYLEARN_USE_SANDBOX | In case of using a classifier, if the sandbox version should be used. Default=False | False MONKEYLEARN_TOKEN | The auth token. | 'TWFuIGlzIGRp...' MONKEYLEARN_FIELD_TO_PROCESS | A field or list of Item text fields to use for classification. | ['title', 'description'] MONKEYLEARN_FIELD_OUTPUT | The field where the MonkeyLearn output will be stored. | 'categories'

An example value of the MONKEYLEARN_FIELD_OUTPUT field after classification is: [{'label': 'English', 'probability': 0.321}].

Usage

In your settings.py file, add the previously described settings and add MonkeylearnMiddleware to your middlewares, e.g.:

SPIDER_MIDDLEWARES = {
    'scrapy_monkeylearn.middlewares.MonkeylearnMiddleware': 1000,
}

License

Copyright (c) 2015 MonkeyLearn.

Released under the MIT license.