web-crawling

There are 298 repositories under the web-crawling topic.

apify/crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
Language: TypeScript 19.5k 117 1k 1k
apify/crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
Language: Python 6.3k 34 373439
omkarcloud/botasaurus
The All in One Framework to Build Undefeatable Scrapers
Language: Python 3k 24 197251
cxcscmu/Craw4LLM
Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining"
Language: Python 637 4 956
scrapehero-code/amazon-scraper
A simple web scraper to extract Product Data and Pricing from Amazon
Language: Python 387 9 11157
crwlrsoft/crawler
Library for Rapid (Web) Crawler and Scraper Development
Language: PHP 366 4 2013
spyboy-productions/omnisci3nt
Omnisci3nt – See What They’ve Tried to Hide. Extract deep intelligence from any domain. From subdomains to SSL certs, archived secrets to exposed ports, Omnisci3nt gives you the full picture in seconds.
Language: Python 299 3 237
jrbadiabo/Bet-on-Sibyl
Machine Learning Model for Sport Predictions (Football, Basketball, Baseball, Hockey, Soccer & Tennis)
Language: Jupyter Notebook 268 41 194
TurnerSoftware/InfinityCrawler
A simple but powerful web crawler library for .NET
Language: C# 252 10 1336
godkingjay/selenium-twitter-scraper
This is a Twitter Scraper which uses Selenium for scraping tweets. It is capable of scraping tweets from home, user profile, hashtag, query or search, and advanced searches.
Language: Jupyter Notebook 245 3 2061
ayakashi-io/ayakashi
⚡ Ayakashi.io - The next generation web scraping framework
Language: TypeScript 213 5 08
serpapi/clauneck
A tool for scraping emails, social media accounts, and much more information from websites using Google Search Results.
Language: Ruby 176 6 113
scrapinghub/scrapy-training
Scrapy Training companion code
Language: Python 174 116 345
brianmadden/krawler
A web crawling framework written in Kotlin
Language: Kotlin 128 6 1516
fintech-hub/bancocentralbrasil
💵 💰 🇧🇷 Information on the official daily rates for inflation, Selic, savings accounts (Poupança), Dollar, PTAX Dollar, Euro, and PTAX Euro from the Banco Central do Brasil website
Language: Python 124 11 1734
MaxValue/Terpene-Profile-Parser-for-Cannabis-Strains
Parser and database to index the terpene profile of different strains of Cannabis from online databases
Language: Python 123 17 018
my8100/scrapyd-cluster-on-heroku
Set up a free and scalable Scrapyd cluster for distributed web crawling with just a few clicks.
Language: Python 122 6 1288
maxmindlin/scout-lang
A web crawling programming language
Language: Rust 113 2 36
SoheilKhodayari/JAW
JAW: A Graph-based Security Analysis Framework for Client-side JavaScript
Language: JavaScript 105 2 1516
jonasjacek/robots.txt
Simple robots.txt template. Keeps unwanted robots out (disallow). Whitelists (allow) legitimate user agents. Useful for all websites.
87 7 038
ScrapingAnt/amazon_scraper
Amazon product scraper using rotating proxies and headless Chrome from ScrapingAnt
Language: JavaScript 85 5 719
alyakhtar/Katastrophe
Command Line Tool to download torrents
Language: Python 83 6 1112
spyboy-productions/PhantomCrawler
Boost website hits by generating requests from multiple proxy IPs.
Language: Python 73 1 09
GoTrained/Scrapy-Craigslist
Web Scraping Craigslist's Engineering Jobs in NY with Scrapy
Language: Python 66 4 137
sushantPatrikar/Amazon-Flipkart-Price-Comparison-Engine
Compares the price of a product entered by the user across the e-commerce sites Amazon and Flipkart 💰 📊
Language: Python 64 4 535
dongweiming/daenerys
Scraping and Web Crawling Framework For Zhihu Live
Language: Python 63 6 130
jgujerry/python-frameworks
Another curated list of Python frameworks
Language: Python 61 2 04
MohamedHmini/tweetsOLAPing
Implements an end-to-end tweet ETL/analysis pipeline.
Language: Python 57 3 07
ScaleUnlimited/flink-crawler
Continuous scalable web crawler built on top of Flink and crawler-commons
Language: Java 52 10 11118
mike-gee/webtranspose
Web scraping API for building AI applications.
Language: Python 41 1 42
Cheng-Lin-Li/KnowledgeGraph
This repository is for web crawling, information extraction, and knowledge graph construction.
Language: Julia 33 2 04
chrislicodes/udacity-data-analyst-nanodegree
Repository for the projects needed to complete the Data Analyst Nanodegree.
Language: Jupyter Notebook 33 1 023
ScrapingAnt/zoominfo_scraper
Zoominfo scraper using rotating proxies and headless Chrome from ScrapingAnt
Language: Python 32 4 89
HRN-Projects/amazon-captcha-solver
A TensorFlow (Deep Learning - CNN) based solution for tackling captcha when collecting data from Amazon.
Language: Python 31 2 214
zytedata/spidyquotes
Example site for web scraping tutorials
Language: Julia 31 9 418
kapilkchaurasia/Data-mining-python-script
Contains various scripts for web crawling and data mining of the social web (RSS, Facebook, Twitter, LinkedIn)
Language: Python 29 2 019
