apify

There are 107 repositories under the apify topic.

  • apify/crawlee

    Crawlee: a web scraping and browser automation library for Node.js for building reliable crawlers in JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP, in both headful and headless mode, with proxy rotation.

    Language: TypeScript
  • apify/crawlee-python

    Crawlee: a web scraping and browser automation library for Python for building reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP, in both headful and headless mode, with proxy rotation.

    Language: Python
  • apify/apify-sdk-js

    Apify SDK monorepo

    Language: TypeScript
  • apify/apify-cli

    The Apify command-line interface helps you create, develop, build, and run Apify actors, and manage the Apify cloud platform.

    Language: TypeScript
  • apify/apify-sdk-python

    The Apify SDK for Python is the official library for creating Apify Actors in Python. It provides useful features like actor lifecycle management, local storage emulation, and actor event handling.

    Language: Python
  • apify/actor-scraper

    House of Apify Scrapers. Generic scraping actors with a simple UI to handle complex web crawling and scraping use cases.

    Language: JavaScript
  • superryeti/Hands-on-WebScraping

    This repo is part of a blog series on web scraping projects that explores techniques for crawling data, from simple websites to websites using advanced protection.

    Language: Python
  • VaclavRut/actor-amazon-crawler

    Amazon crawler: this configuration extracts items for the keywords that you specify in the input and automatically extracts all pages for each keyword. You can specify multiple keywords in the input for one run.

    Language: JavaScript
  • maxCopell/tripadvisor-scraper

    Scrape Tripadvisor restaurants, hotels, and places.

    Language: JavaScript
  • apify/apify-client-python

    Apify API client for Python

    Language: Python
  • MrXujiang/crawel

    A rather interesting crawler platform built on Apify, Node, and React.

    Language: JavaScript
  • JuroOravec/crawlee-one

    Professional scrapers that provide full control to the users. Crawlee One builds on top of Crawlee and Apify and extends them with features for robust and highly configurable web scrapers.

    Language: TypeScript
  • bernardro/actor-youtube-scraper

    Apify actor to scrape YouTube search results. You can set the maximum number of videos to scrape per page as well as the date from which to start scraping.

    Language: JavaScript
  • apify/actor-content-checker

    You can use this act to monitor any page's content and get a notification when content changes.

    Language: JavaScript
  • apify/super-scraper

    Generic, open-source REST API for scraping websites. Drop-in replacement for the ScrapingBee, ScrapingAnt, and ScraperAPI services.

    Language: TypeScript
  • metalwarrior665/actor-google-sheets

    No more dealing with the Google API. A simple Node.js program to automate access to Google Sheets.

    Language: JavaScript
  • lhotanok/actor-ticketmaster-scraper

    Apify actor for scraping events from Ticketmaster based on their categories

    Language: JavaScript
  • sauermar/web-browser-recorder

    Web application for recording, managing, and editing intelligent RPA workflows using Playwright.

    Language: TypeScript
  • pocesar/actor-shopify-scraper

    Automate price monitoring on Shopify, the most popular solution for building online stores and selling products online. Crawl arbitrary Shopify-powered online stores and extract a list of all products in a structured form, including product title, price, description, etc.

    Language: JavaScript
  • devblack/curlx

    CurlX, a basic Curl syntax

    Language: PHP
  • pocesar/actor-twitter-scraper

    Scrape any Twitter user profile. Extract tweets, retweets, replies, favorites, and conversation threads with no Twitter API limits

    Language: TypeScript
  • metalwarrior665/actor-article-extractor-smart

    Combines Apify's crawling system and article parsing with the unfluff library.

    Language: JavaScript
  • metalwarrior665/actor-rust-scraper

    Experimental scraper written in Rust, suited for running locally or on the Apify platform. Inspired by the Apify SDK.

    Language: Rust
  • petrpatek/airbnb-scraper

    Apify public actor for scraping Airbnb homes.

    Language: JavaScript
  • pocesar/apify-login-session

    Grab a session for any website for use in your own actor

    Language: TypeScript
  • apify/actor-scrapy-executor

    Apify actor to run web spiders written in Python with the Scrapy library

    Language: Python
  • lhotanok/zalando-scraper

    Apify actor extracting data from Zalando

    Language: TypeScript
  • apify-projects/store-website-checker

    Analyzes a target website for anti-scraping protections and performance. Saves screenshots and HTML snapshots.

    Language: TypeScript
  • apify/apify-zapier-integration

    Apify integration for Zapier

    Language: JavaScript
  • cermak-petr/act-anti-captcha-recaptcha

    Apify act for solving Google reCAPTCHA using the anti-captcha.com service.

    Language: JavaScript
  • Nikolay-Lysenko/servifier

    An easy-to-use tool for making a web service with an API from your own Python functions.

    Language: Python
  • ScaleLeap/zine-not-amazon-scraper

    How to Scrape Amazon Search Results

    Language: JavaScript
  • apify/actor-legacy-phantomjs-crawler

    The actor implements the legacy Apify Crawler product. It uses the PhantomJS headless browser to recursively crawl websites and extract data from them using a piece of JavaScript code.

    Language: JavaScript
  • ganevdev/actor-webdesignernews-scraper

    Scraper for www.webdesignernews.com, using Apify.

    Language: TypeScript
  • lhotanok/erasmus-plus-organisation-scraper

    Search for organisations from the Erasmus+ and European Solidarity Corps programmes. Define search criteria such as legal name, business name, hyperlink, PIC or an OID from: https://webgate.ec.europa.eu/erasmus-esc/index/organisations/search-for-an-organisation

    Language: TypeScript
  • Strajk/actor-github-repositories-search-scraper

    Apify actor for extracting repositories from GitHub based on search queries

    Language: JavaScript