Pinned Repositories
actor-page-analyzer
Apify actor that opens a web page in headless Chrome and analyzes the HTML and JavaScript objects, looks for schema.org microdata and JSON-LD metadata, analyzes AJAX requests, etc.
actor-scraper
House of Apify Scrapers. Generic scraping actors with a simple UI to handle complex web crawling and scraping use cases.
apify-cli
The Apify command-line interface helps you create, develop, build and run Apify actors, and manage the Apify cloud platform.
apify-client-js
Apify API client for JavaScript / Node.js.
apify-sdk-js
Apify SDK monorepo
apify-sdk-python
The Apify SDK for Python is the official library for creating Apify Actors in Python. It provides useful features like actor lifecycle management, local storage emulation, and actor event handling.
crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
fingerprint-suite
Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.
got-scraping
HTTP client made for scraping based on got.
proxy-chain
Node.js implementation of a proxy server (think Squid) with support for SSL, authentication and upstream proxy chaining.
Apify's Repositories
apify/crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
apify/proxy-chain
Node.js implementation of a proxy server (think Squid) with support for SSL, authentication and upstream proxy chaining.
apify/fingerprint-suite
Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.
apify/apify-cli
The Apify command-line interface helps you create, develop, build and run Apify actors, and manage the Apify cloud platform.
apify/apify-sdk-python
The Apify SDK for Python is the official library for creating Apify Actors in Python. It provides useful features like actor lifecycle management, local storage emulation, and actor event handling.
apify/apify-sdk-js
Apify SDK monorepo
apify/apify-actor-docker
Base Docker images for Apify actors.
apify/apify-client-js
Apify API client for JavaScript / Node.js.
apify/apify-client-python
Apify API client for Python.
apify/apify-docs
This project is the home of Apify's documentation.
apify/actor-templates
This project is the 🏠 home of Apify actor template projects to help users quickly get started.
apify/actor-web-automation-agent
This is the experimental version of Web Automation Agent. The agent uses natural language instructions to browse the web and extract data.
apify/apify-shared-js
Utilities and constants shared across Apify projects.
apify/super-scraper
Generic REST API for scraping websites. Drop-in replacement for ScrapingBee, ScrapingAnt, and ScraperAPI services. And it is open-source!
apify/crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
apify/workflows
Apify's reusable GitHub workflows.
apify/apify-storage-local-js
Local emulation of the apify-client NPM package, which enables local use of the Apify SDK.
apify/langchain
⚡ Building applications with LLMs through composability ⚡
apify/llama-hub
A library of data loaders for LLMs made by the community, to be used with GPT Index and/or LangChain.
apify/llama_index
LlamaIndex (GPT Index) is a project that provides a central interface to connect your LLMs with external data.
apify/phantomjs
Scriptable Headless WebKit
apify/actor-html-to-pdf
Apify Actor that converts an HTML string to PDF.
apify/actor-vector-database-integrations
Transfer data from Apify Actors to vector databases (Pinecone, Chroma)
apify/algolite
An implementation of Algolia that emulates its REST API.
apify/aws-ecr-action
This Action allows you to create Docker images and push them to an ECR repository.
apify/docusaurus-plugin-typedoc-api
Apify's fork of `docusaurus-plugin-typedoc-api`, customized for our Python documentation.
apify/gecko-dev
A fork of the official repository of Mozilla Firefox, which is used for Apify's custom Firefox build.
apify/langchainjs
apify/openapi
An OpenAPI specification for the Apify API.
apify/s3-node-20-bug-repro
A temporary repository with a reproduction of a bug affecting uploads to AWS S3.