Scrapers
A list of scrapers from around the web.
Use the Table of Contents to find your way around. It lists every entry, lets you jump straight to each tool's pros and cons, and links to each tool's website.
Please contribute by adding links, pros/cons, titles, or anything else you think would be helpful! Please help maintain alphabetical order.
Table of Contents
- Apifier (link)
- Beautiful Soup (link)
- Clearbit (link)
- Common Crawl (link)
- Crawly (link)
- Dexi.io (link)
- Diffbot (link)
- eLink (link)
- EliteProxySwitcher (link)
- Email Hunter (link)
- FiveFilters (link)
- FMiner (link)
- FullContact (link)
- Grabby (link)
- Import.io (link)
- Kimonolabs (link)
- lxml (link)
- Mozenda (link)
- Nutch (link)
- Octoparse (link)
- Outwit Hub (link)
- rvest (link)
- scrape-it (link)
- ScraperWiki (link)
- Scrapinghub (link)
- Screen Scraper (link)
- Toofr (link)
- UBot Studio (link)
- UiPath (link)
- Web Robots (link)
- Web Scraper (link)
- WrapAPI (link)
- X-Ray (link)
Apifier
Description: Cloud-based scraper for JavaScript.
Pros
Cons
Applicable Language(s)
- JavaScript
Beautiful Soup
Description: A Python library for parsing and navigating HTML fetched from the Web. It allows searching the HTML tree for specific tags.
Pros
Cons
Applicable Language(s)
- Python
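As a minimal sketch of searching the HTML tree with Beautiful Soup, the snippet below parses an inline HTML string (a stand-in for a downloaded page) and collects every link. It assumes the `beautifulsoup4` package is installed; the sample markup and the `extract_links` helper are hypothetical, made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a page fetched from the Web.
HTML = """
<html>
  <head><title>Example page</title></head>
  <body>
    <ul id="links">
      <li class="item"><a href="https://example.com/a">First</a></li>
      <li class="item"><a href="https://example.com/b">Second</a></li>
      <li class="item"><a>No href here</a></li>
    </ul>
  </body>
</html>
"""


def extract_links(markup: str) -> list[tuple[str, str]]:
    """Return (text, href) pairs for every <a> tag that has an href attribute."""
    # "html.parser" is Python's built-in parser, so no extra parser package is needed.
    soup = BeautifulSoup(markup, "html.parser")
    # href=True keeps only <a> tags that actually carry an href attribute.
    return [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a", href=True)]


if __name__ == "__main__":
    soup = BeautifulSoup(HTML, "html.parser")
    print(soup.title.string)
    for text, href in extract_links(HTML):
        print(text, href)
```

Passing `"lxml"` instead of `"html.parser"` as the second argument should switch Beautiful Soup to the faster lxml parser, provided lxml is installed.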
Clearbit
Description: Service for looking up company and people information.
Pros
Cons
Applicable Language(s)
Common Crawl
Description: Open dataset of crawled websites.
Pros
Cons
Applicable Language(s)
Crawly
Description: Automatic service that turns a website into structured data in the form of JSON or CSV.
Pros
Cons
Applicable Language(s)
Dexi.io
Description: Website data extraction using a visual programming language.
Pros
Cons
Applicable Language(s)
Diffbot
Description: Automated tool for extracting structured information from pages, crawling websites, and turning a website into an API.
Pros
Cons
Applicable Language(s)
eLink
Description: Tool to mine LinkedIn profiles based on keywords.
Pros
Cons
Applicable Language(s)
EliteProxySwitcher
Description: Local software that downloads a proxy list and lets users choose which proxy to use.
Pros
Cons
Applicable Language(s)
Email Hunter
Description: API to find e-mail addresses for a given domain name.
Pros
Cons
Applicable Language(s)
FiveFilters
Description: Provides various website extraction and transformation tools, such as Full-Text RSS and Term Extraction, as services.
Pros
Cons
Applicable Language(s)
FMiner
Description: Local software for web scraping using a recorder and a visual programming language.
Pros
Cons
Applicable Language(s)
FullContact
Description: API to retrieve additional information about a person.
Pros
Cons
Applicable Language(s)
Grabby
Description: Service that searches a website for e-mails.
Pros
Cons
Applicable Language(s)
Import.io
Description: Automated tool to extract structured information from websites.
Pros
Cons
Applicable Language(s)
Kimonolabs
Description: Kimono was a cloud-based service for turning websites into structured APIs. It was acquired by Palantir, and a desktop-based alternative is now offered so users can continue to use its tools.
Pros
Cons
Applicable Language(s)
lxml
Description: A feature-rich, easy-to-use library for processing XML and HTML in Python, built on the C libraries libxml2 and libxslt.
Pros
- Incredibly fast (see: Python HTML Parser Performance)
Cons
Applicable Language(s)
- Python
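As a minimal sketch of extracting structured data with lxml and XPath, the snippet below parses a small HTML table and returns item prices. It assumes the `lxml` package is installed; the sample page, the `prices` table id, and the `table_prices` helper are hypothetical, made up for illustration.

```python
from lxml import html

# Hypothetical sample markup standing in for a downloaded page.
PAGE = """
<html><body>
  <table id="prices">
    <tr><th>Item</th><th>Price</th></tr>
    <tr><td>Widget</td><td>2.50</td></tr>
    <tr><td>Gadget</td><td>10.00</td></tr>
  </table>
</body></html>
"""


def table_prices(markup: str) -> dict[str, float]:
    """Map item name to price for each data row of the #prices table."""
    tree = html.fromstring(markup)
    prices = {}
    # "//tr" (rather than "/tr") still matches rows if a <tbody> wraps them.
    for row in tree.xpath('//table[@id="prices"]//tr'):
        cells = [cell.text_content().strip() for cell in row.xpath("./td")]
        # The header row uses <th>, so it yields no <td> cells and is skipped.
        if len(cells) == 2:
            name, price = cells
            prices[name] = float(price)
    return prices


if __name__ == "__main__":
    print(table_prices(PAGE))
```

XPath is used here because it ships with lxml; CSS selectors via `cssselect()` additionally require the separate `cssselect` package.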
Mozenda
Description: Extract structured information from HTML, PDF, Excel, and Word by clicking on document elements.
Pros
Cons
Applicable Language(s)
Nutch
Description: Web crawler that can be combined with the Hadoop ecosystem to run in a cluster.
Pros
Cons
Applicable Language(s)
Octoparse
Description: Free web scraping tool for extracting web page data into several structured file formats.
Pros
Cons
Applicable Language(s)
Outwit Hub
Description: Application that can extract information from a website and turn it into structured data (CSV, Excel, etc.).
Pros
Cons
Applicable Language(s)
rvest
Description: R package for scraping information from web pages. Inspired by libraries such as Beautiful Soup, it is designed to work with magrittr so that common web scraping tasks are easy to express.
Pros
Cons
Applicable Language(s)
- R
scrape-it
Description: A Node.js scraper for humans.
Pros
Cons
Applicable Language(s)
- JavaScript (Node.js)
ScraperWiki
Description: Write a scraper in the browser and run it on their cloud-based service. It is used by many news organisations.
Pros
Cons
Applicable Language(s)
Scrapinghub
Description: Cloud hosting for scrapers as a service. Developers can deploy their own scrapers to the platform and make use of its existing infrastructure.
Pros
Cons
Applicable Language(s)
Screen Scraper
Description: Local tool for scraping websites.
Pros
Cons
Applicable Language(s)
Toofr
Description: Service for looking up business e-mails.
Pros
Cons
Applicable Language(s)
UBot Studio
Description: Web automation software using a visual programming language and recorder.
Pros
Cons
Applicable Language(s)
UiPath
Description: Visual tool for GUI automation by recording.
Pros
Cons
Applicable Language(s)
Web Robots
Description: Data as a Service platform for web scraping.
Pros
- Scraping dynamic, JavaScript-heavy websites
- Logging in and filling in forms on websites
- Data normalization and validation
- Data uploads
Cons
- Currently in beta
- Possible payment model in the future
Applicable Language(s)
Web Scraper
Description: Browser extension that downloads websites and turns them into structured data. Data is selected by element or by specialised selectors (e.g., for tables).
Pros
Cons
Applicable Language(s)
WrapAPI
Description: Turn a website into an API. The structure of the data is defined by clicking elements or regular expressions.
Pros
Cons
Applicable Language(s)
X-Ray
Description: NPM module for scraping structured data via jQuery-like selectors.
Pros
Cons
Applicable Language(s)
- JavaScript (Node.js)