web-crawling
There are 298 repositories under the web-crawling topic.
apify/crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
apify/crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
omkarcloud/botasaurus
The All in One Framework to Build Undefeatable Scrapers
cxcscmu/Craw4LLM
Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining"
scrapehero-code/amazon-scraper
A simple web scraper to extract Product Data and Pricing from Amazon
crwlrsoft/crawler
Library for Rapid (Web) Crawler and Scraper Development
spyboy-productions/omnisci3nt
Omnisci3nt – See What They’ve Tried to Hide. Extract deep intelligence from any domain. From subdomains to SSL certs, archived secrets to exposed ports, Omnisci3nt gives you the full picture in seconds.
jrbadiabo/Bet-on-Sibyl
Machine Learning Model for Sport Predictions (Football, Basketball, Baseball, Hockey, Soccer & Tennis)
TurnerSoftware/InfinityCrawler
A simple but powerful web crawler library for .NET
godkingjay/selenium-twitter-scraper
This is a Twitter Scraper which uses Selenium for scraping tweets. It is capable of scraping tweets from home, user profile, hashtag, query or search, and advanced searches.
ayakashi-io/ayakashi
⚡ Ayakashi.io - The next generation web scraping framework
serpapi/clauneck
A tool for scraping emails, social media accounts, and much more information from websites using Google Search Results.
scrapinghub/scrapy-training
Scrapy Training companion code
brianmadden/krawler
A web crawling framework written in Kotlin
fintech-hub/bancocentralbrasil
💵 💰 🇧🇷 Information on official daily rates for inflation, Selic, savings, dollar, PTAX dollar, euro, and PTAX euro from the Banco Central do Brasil website
MaxValue/Terpene-Profile-Parser-for-Cannabis-Strains
Parser and database to index the terpene profile of different strains of Cannabis from online databases
my8100/scrapyd-cluster-on-heroku
Set up a free and scalable Scrapyd cluster for distributed web crawling with just a few clicks.
maxmindlin/scout-lang
A web crawling programming language
SoheilKhodayari/JAW
JAW: A Graph-based Security Analysis Framework for Client-side JavaScript
jonasjacek/robots.txt
Simple robots.txt template. Keeps unwanted robots out (disallow). Whitelists (allow) legitimate user agents. Useful for all websites.
ScrapingAnt/amazon_scraper
Amazon product scraper using rotating proxies and headless Chrome from ScrapingAnt
alyakhtar/Katastrophe
Command Line Tool to download torrents
spyboy-productions/PhantomCrawler
Boost website hits by generating requests from multiple proxy IPs.
GoTrained/Scrapy-Craigslist
Web Scraping Craigslist's Engineering Jobs in NY with Scrapy
sushantPatrikar/Amazon-Flipkart-Price-Comparison-Engine
Compares the price of a user-entered product across the e-commerce sites Amazon and Flipkart 💰 📊
dongweiming/daenerys
Scraping and Web Crawling Framework For Zhihu Live
jgujerry/python-frameworks
Another curated list of Python frameworks
MohamedHmini/tweetsOLAPing
Implementing an end-to-end tweet ETL/analysis pipeline.
ScaleUnlimited/flink-crawler
Continuous scalable web crawler built on top of Flink and crawler-commons
mike-gee/webtranspose
Web scraping API for building AI applications.
Cheng-Lin-Li/KnowledgeGraph
This repository is for web crawling, information extraction, and knowledge graph construction.
chrislicodes/udacity-data-analyst-nanodegree
Repository for the projects needed to complete the Data Analyst Nanodegree.
ScrapingAnt/zoominfo_scraper
ZoomInfo scraper using rotating proxies and headless Chrome from ScrapingAnt
HRN-Projects/amazon-captcha-solver
A TensorFlow (Deep Learning - CNN) based solution for tackling captcha when collecting data from Amazon.
zytedata/spidyquotes
Example site for web scraping tutorials
kapilkchaurasia/Data-mining-python-script
Contains various scripts for web crawling and data mining of the social web (RSS, Facebook, Twitter, LinkedIn)