A command line interface at your fingertips to work with the features of Scrapinghub.
You must install it using pip...
$ pip install shub-cli
... or pipsi
$ pipsi install shub-cli
Shub CLI will look for the .scrapinghub.yml file created by Scrapinghub in your home directory and read the default API_KEY and PROJECT_ID from it.
If you do not have that file, set it up according to the example below:
~/.scrapinghub.yml
apikeys:
  default: <API_KEY>
projects:
  default: <PROJECT_ID>
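For instance, with placeholder values filled in (the key and project id below are made up; replace them with your own):
apikeys:
  default: 0123456789abcdef0123456789abcdef
projects:
  default: 12345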
If you have set up the ~/.scrapinghub.yml file:
$ shub-cli repl
Otherwise...
$ shub-cli -api <API_KEY> -project <PROJECT_ID> repl
If you just want to run a single command:
$ shub-cli [credentials|spiders|job|jobs|schedule]
> credentials
> spiders
> job [-show|-cancel|-delete id]
> jobs [-spider spider] [-tag tag] [-lacks tag] [-state pending|running|finished|deleted] [-count count]
> schedule [-spider spider] [-tags tag1,tag2] [-priority 1|2|3|4]
Check what credentials are being used to connect to Scrapinghub.
> credentials
List all spiders available.
> spiders
List the last 10 jobs, or only those matching your criteria.
> jobs
> jobs -spider <spider> -tag <tag> -lacks <lacks> -state <[pending,finished,running,deleted]> -count <[0,1000]>
Example:
> jobs
> jobs -spider example -tag production -lacks consumed -state finished -count 100
Attention: by default, shub-cli will show the last 10 jobs. To override that behaviour, use the -count parameter with the number of jobs you want to list.
Show, delete or cancel a job by its id.
> job -show <id>
> job -show <id> --with-log
> job -delete <id>
> job -cancel <id>
Example:
> job -show 11/23/19801
> job -show 11/23/19801 --with-log
> job -delete 11/23/19801
> job -cancel 11/23/19801
Schedule a spider execution.
> schedule -spider <spider> -priority <[1,2,3,4]> -tags <tag1,tag2>
Example:
> schedule -spider my-spider
> schedule -spider my-spider -priority 4 -tags production,periodic
> schedule -spider my-spider -priority 3 -tags test
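Since schedule is also available as a direct command (see above), you can run it periodically from cron. The entry below is only a sketch: it assumes shub-cli is on cron's PATH, that ~/.scrapinghub.yml is readable by the cron user, and that the direct command accepts the same options as in the repl.
# run my-spider every day at 02:00 and tag the job as periodic
0 2 * * * shub-cli schedule -spider my-spider -tags periodic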
For help or suggestions, please open an issue on the GitHub Issues page.