Pinned Repositories
dateparser
Python parser for human-readable dates
extruct
Extract embedded metadata from HTML markup
frontera
A scalable frontier for web crawlers
portia
Visual scraping for Scrapy
python-crfsuite
A Python binding for CRFsuite
python-scrapinghub
A client interface for Scrapinghub's API
scrapyrt
HTTP API for Scrapy spiders
slackbot
A chat bot for Slack (https://slack.com).
spidermon
Scrapy extension for monitoring spider execution.
splash
Lightweight, scriptable browser as a service with an HTTP API
Scrapinghub's Repositories
scrapinghub/portia
Visual scraping for Scrapy
scrapinghub/splash
Lightweight, scriptable browser as a service with an HTTP API
scrapinghub/dateparser
Python parser for human-readable dates
scrapinghub/frontera
A scalable frontier for web crawlers
scrapinghub/extruct
Extract embedded metadata from HTML markup
scrapinghub/scrapyrt
HTTP API for Scrapy spiders
scrapinghub/python-crfsuite
A Python binding for CRFsuite
scrapinghub/spidermon
Scrapy extension for monitoring spider execution.
scrapinghub/price-parser
Extract price amount and currency symbol from a raw text string
scrapinghub/article-extraction-benchmark
Article extraction benchmark: dataset and evaluation scripts
scrapinghub/python-scrapinghub
A client interface for Scrapinghub's API
scrapinghub/shub
Scrapinghub Command Line Client
scrapinghub/number-parser
Parse numbers written in natural language
scrapinghub/scrapy-poet
Page Object pattern for Scrapy
scrapinghub/web-poet
Web scraping Page Objects core library
scrapinghub/scrapinghub-stack-scrapy
Software stack with the latest Scrapy and updated dependencies
scrapinghub/scrapy-frontera
A more flexible, feature-rich Frontera scheduler for Scrapy
scrapinghub/docker-images
scrapinghub/scrapinghub-entrypoint-scrapy
Scrapy entrypoint for Scrapinghub job runner
scrapinghub/andi
Library for annotation-based dependency injection
scrapinghub/shublang
Pluggable DSL that uses pipes to perform a series of linear transformations to extract data
scrapinghub/shub-workflow
scrapinghub/Formasaurus
Formasaurus tells you the type of an HTML form and its fields using machine learning
scrapinghub/hcf-backend
Crawl Frontier HCF backend
scrapinghub/varanus
A command line spider monitoring tool
scrapinghub/pgcontents
A Postgres-backed ContentsManager implementation for IPython
scrapinghub/social-app-django
Python Social Auth - Application - Django
scrapinghub/streamparse
streamparse lets you run Python code against real-time streams of data. Integrates with Apache Storm.
scrapinghub/sklearn-crfsuite
scikit-learn inspired API for CRFsuite
scrapinghub/woodpecker
An opinionated fork of the Drone CI system