Zyte (formerly Scrapinghub)
Access clean, valuable data with web scraping services that drive your business forward
Pinned Repositories
clear-html
Remove DIVs and styling, and normalize HTML while preserving structure information
flattering
Flatten, format, and export any JSON-like data to CSV (or any other string output).
python-zyte-api
Python client for Zyte API
spidyquotes
Example site for web scraping tutorials
web-snap
Create "perfect" snapshots of web pages
zyte-autoextract
Python clients for Zyte AutoExtract API
zyte-common-items
Contains the common item definitions used in Zyte.
zyte-smartproxy-headless-proxy
A complementary proxy to help use SPM (Smart Proxy Manager) with headless browsers
zyte-spider-templates
Spider templates for automatic crawlers.
zyte-spider-templates-project
Repositories of Zyte (formerly Scrapinghub)
zytedata/spidyquotes
Example site for web scraping tutorials
zytedata/web-snap
Create "perfect" snapshots of web pages
zytedata/zyte-spider-templates
Spider templates for automatic crawlers.
zytedata/python-zyte-api
Python client for Zyte API
zytedata/zyte-spider-templates-project
zytedata/zyte-parsers
zytedata/html-text
zytedata/zyte-common-items
Contains the common item definitions used in Zyte.
zytedata/clear-html
Remove DIVs and styling, and normalize HTML while preserving structure information
zytedata/url-matcher
zytedata/extract-summit-contest-solutions
Example solutions for the practice and contest websites of the Web Data Extraction Summit code contest.
zytedata/scrapy-time-machine
Run your spider against a site's snapshot
zytedata/sctools
Analyze your jobs on Scrapy Cloud
zytedata/web-scraping-tutorial-project
https://docs.zyte.com/web-scraping/tutorial/index.html
zytedata/duplicate-url-discarder
zytedata/kafka-manager
A tool for managing Apache Kafka.
zytedata/zyte-smartproxy-selenium
A wrapper over Selenium Wire that provides Zyte Smart Proxy Manager-specific functionality.
zytedata/installimage
Bash scripts to universally deploy various distributions
zytedata/onefile
Merge multiple files into one!
zytedata/spidermon-workshop
zytedata/zyte-api-workshop
zytedata/dj-cloud-task
Django Cloud Task Queue. Integrate your Django application with Google Cloud Tasks on Google Cloud Platform
zytedata/duplicate-url-discarder-rules
Contains rules for https://github.com/zytedata/duplicate-url-discarder.
zytedata/geventhttpclient
A high-performance, concurrent HTTP client library for Python with gevent
zytedata/http-parser
Fork of 'https://github.com/benoitc/http-parser'
zytedata/locust
Write scalable load tests in plain Python 🚗💨
zytedata/openvscode-server
Run upstream VS Code on a remote machine with access through a modern web browser from any device, anywhere.
zytedata/rrweb
Record and replay the web
zytedata/shifter-webhook-artifact-created
zytedata/unsloth_docker
A working Dockerfile that includes Unsloth along with all its other dependencies