
Pydoll is a library for automating chromium-based browsers without a WebDriver, offering realistic interactions.

Pydoll: The Evasion-First Web Automation Framework

A 100% Typed, async-native automation library built for modern bot evasion and high-performance scraping.

  📖 Full Documentation •   🚀 Getting Started •   ⚡ Advanced Features •   🧠 Deep Dives •   💖 Support This Project

Pydoll is built on a simple philosophy: powerful automation shouldn't require you to fight the browser.

Forget broken webdrivers, compatibility headaches, and being flagged by navigator.webdriver=true. Pydoll connects directly to the Chrome DevTools Protocol (CDP), providing a natively asynchronous, robust, and fully typed architecture.

It's designed for modern scraping, combining an intuitive high-level API (for productivity) with low-level control over network and browser behavior (for evasion), allowing you to bypass complex anti-bot defenses.

The Pydoll Philosophy

  • Stealth-by-Design: Pydoll is built for evasion. Our human-like interactions simulate real user clicks, typing, and scrolling to pass behavioral analysis, while granular Browser Preferences control lets you patch your browser fingerprint.
  • Async & Typed Architecture: Built from the ground up on asyncio and 100% type-checked with mypy. This means top-tier I/O performance for concurrent tasks and a fantastic Developer Experience (DX) with autocompletion and error-checking in your IDE.
  • Total Network Control: Go beyond basic HTTP proxies. Pydoll gives you tools to intercept (to block ads/trackers) and monitor traffic, plus deep documentation on why SOCKS5 is essential to prevent DNS leaks.
  • Hybrid Automation (The Game-Changer): Use the UI automation to log in, then use tab.request to make blazing-fast API calls that inherit the entire browser session.
  • Ergonomics Meets Power: Easy for the simple, powerful for the complex. Use tab.find() for 90% of cases and tab.query() when you need raw CSS or XPath selectors (see the sketch after this list).
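
A minimal sketch of the two lookup styles, assuming a started tab as in the Getting Started example below (the selectors and attribute values are illustrative, not taken from a real page):

# find(): locate elements by their HTML attributes
login_button = await tab.find(tag_name='button', id='login-btn', timeout=5)

# query(): drop to a raw CSS selector or an XPath expression when attributes aren't enough
pending_row = await tab.query('//table[@id="orders"]//tr[contains(@class, "pending")]')
nav_link = await tab.query('nav.main > ul li:nth-child(2) a')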

📦 Installation

pip install pydoll-python

That's it. No webdrivers, no external dependencies. Python 3.10+ is required.

🚀 Getting Started in 60 Seconds

Thanks to its async architecture and context managers, Pydoll is clean and efficient.

import asyncio
from pydoll.browser import Chrome
from pydoll.constants import Key

async def google_search(query: str):
    # The async context manager guarantees the browser is shut down on exit
    async with Chrome() as browser:
        tab = await browser.start()
        await tab.go_to('https://www.google.com')

        # Intuitive finding API: find by HTML attributes
        search_box = await tab.find(tag_name='textarea', name='q')
        
        # "Human-like" interactions simulate typing
        await search_box.insert_text(query)
        await search_box.press_keyboard_key(Key.ENTER)

        # Find by text and click (simulates mouse movement)
        first_result = await tab.find(
            tag_name='h3',
            text='autoscrape-labs/pydoll', # Supports partial text matching
            timeout=10,
        )
        await first_result.click()

        # Wait for an element to confirm navigation
        await tab.find(id='repository-container-header', timeout=10)
        print(f"Page loaded: {await tab.title}")

asyncio.run(google_search('pydoll python'))

⚡ The Pydoll Feature Ecosystem

Pydoll is a complete toolkit for professional automation.

1. Hybrid Automation (UI + API): The Game-Changer

Tired of manually extracting and managing cookies to use requests or httpx? Pydoll solves this.

Use UI automation to get through a complex login flow (CAPTCHAs, JS challenges, etc.), then use tab.request to make authenticated API calls that automatically inherit all cookies, headers, and session state from the browser. It's the best of both worlds: the robustness of UI automation for auth, and the speed of direct API calls for data extraction.

# 1. Log in via the UI (handles CAPTCHAs, JS, etc.)
await tab.go_to('https://my-site.com/login')
await (await tab.find(id='username')).type_text('user')
await (await tab.find(id='password')).type_text('pass123')
await (await tab.find(id='login-btn')).click()

# 2. Now, use the browser's session to hit the API!
# This request automatically INHERITS the login cookies
response = await tab.request.get('https://my-site.com/api/user/profile')
user_data = response.json()
print(f"Welcome, {user_data['name']}!")

📖 Read more about Hybrid Automation

2. Total Network Control: Monitor & Intercept Traffic

Take full control of the network stack. Pydoll lets you monitor traffic to reverse-engineer APIs and intercept requests in real time.

Use this to block ads, trackers, images, or CSS to dramatically speed up your scraping and save bandwidth, or even to modify headers and mock API responses for testing.

import asyncio
from pydoll.browser.chromium import Chrome
from pydoll.protocol.fetch.events import FetchEvent, RequestPausedEvent
from pydoll.protocol.network.types import ErrorReason

async def block_images():
    async with Chrome() as browser:
        tab = await browser.start()

        async def block_resource(event: RequestPausedEvent):
            request_id = event['params']['requestId']
            resource_type = event['params']['resourceType']
            url = event['params']['request']['url']

            # Block images and stylesheets
            if resource_type in ['Image', 'Stylesheet']:
                await tab.fail_request(request_id, ErrorReason.BLOCKED_BY_CLIENT)
            else:
                # Continue other requests
                await tab.continue_request(request_id)

        await tab.enable_fetch_events()
        await tab.on(FetchEvent.REQUEST_PAUSED, block_resource)

        await tab.go_to('https://example.com')
        await asyncio.sleep(3)
        await tab.disable_fetch_events()

asyncio.run(block_images())
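
The monitoring side follows the same event-driven pattern. A minimal sketch, assuming the NetworkEvent enum and enable_network_events mirror the fetch API shown above (worth double-checking against the monitoring docs), run inside the same async context as a started tab:

from pydoll.protocol.network.events import NetworkEvent

async def log_response(event):
    # CDP Network.responseReceived carries the response metadata in event params
    response = event['params']['response']
    print(response['status'], response['url'])

await tab.enable_network_events()
await tab.on(NetworkEvent.RESPONSE_RECEIVED, log_response)
await tab.go_to('https://example.com')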

📖 Network Monitoring Docs | 📖 Request Interception Docs

3. Deep Browser Control: The Fingerprint Evasion Manual

A User-Agent isn't enough. Pydoll gives you granular control over Browser Preferences, allowing you to modify hundreds of internal Chrome settings to build a robust and consistent fingerprint.

Our documentation doesn't just give you the tool; it explains in detail how canvas, WebGL, font, and timezone fingerprinting works, and how to use these preferences to defend your automation.

from pydoll.browser.chromium import Chrome
from pydoll.browser.options import ChromiumOptions

options = ChromiumOptions()

# Create a realistic and clean browser profile
options.browser_preferences = {
    'profile': {
        'default_content_setting_values': {
            'notifications': 2,       # Block notification popups
            'geolocation': 2,        # Block location requests
        },
        'password_manager_enabled': False # Disable "save password" prompt
    },
    'intl': {
        'accept_languages': 'en-US,en', # Make consistent with your proxy IP
    },
    'browser': {
        'check_default_browser': False,   # Don't ask to be default browser
    }
}
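
The preferences take effect when the options object is handed to the browser at startup; a short sketch following the same pattern as the earlier examples:

async with Chrome(options=options) as browser:
    tab = await browser.start()
    await tab.go_to('https://example.com')  # pages now load with the patched profile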

📖 Full Guide to Browser Preferences

4. Built for Scale: Concurrency, Contexts & Remote Connections

Pydoll is built for scale. Its async architecture allows you to manage multiple tabs and browser contexts (isolated sessions) in parallel.

Furthermore, Pydoll excels in production architectures. You can run your browser in a Docker container and connect to it remotely from your Python script, decoupling the controller from the worker. Our documentation includes guides on how to set up your own remote server.

# Example: scrape two sites in parallel
import asyncio
from pydoll.browser.chromium import Chrome

async def scrape_page(url, tab):
    await tab.go_to(url)
    return await tab.title

async def concurrent_scraping():
    async with Chrome() as browser:
        tab_google = await browser.start()
        tab_ddg = await browser.new_tab() # Create a new tab

        # Execute both scraping tasks concurrently
        tasks = [
            scrape_page('https://google.com/', tab_google),
            scrape_page('https://duckduckgo.com/', tab_ddg)
        ]
        results = await asyncio.gather(*tasks)
        print(results)

asyncio.run(concurrent_scraping())

📖 Multi-Tab Management Docs | 📖 Remote Connection Docs

5. Robust Engineering: `@retry` Decorator & 100% Typed

Reliable Engineering: Pydoll is fully typed, providing a fantastic Developer Experience (DX) with full autocompletion in your IDE and error-checking before you even run your code. Read about our Type System.

Robust-by-Design: The @retry decorator turns fragile scripts into production-ready automations. It doesn't just "try again"; it lets you execute custom recovery logic (on_retry), like refreshing the page or rotating a proxy, before the next attempt.

from pydoll.decorators import retry
from pydoll.exceptions import ElementNotFound, NetworkError

@retry(
    max_retries=3,
    exceptions=[ElementNotFound, NetworkError], # Only retry on these specific errors
    on_retry=my_recovery_function,          # Run your custom recovery logic
    exponential_backoff=True              # Wait 2s, 4s, 8s...
)
async def scrape_product(self, url: str):
    # ... your scraping logic ...
    ...
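
For illustration, a hypothetical recovery callback for the on_retry hook (my_recovery_function is just the placeholder name from the snippet above; the zero-argument signature is an assumption, so check the decorator docs for the exact contract):

import asyncio

async def my_recovery_function():
    # Hypothetical recovery step: rotate a proxy, clear state, or simply
    # back off for a moment before the decorator retries the call.
    await asyncio.sleep(1)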

📖 @retry Decorator Docs


🧠 More Than an API: A Knowledge Base

Pydoll is not a black box. We believe that to defeat anti-bot systems, you must understand them. Our documentation is one of the most comprehensive public resources on the subject, teaching you not just the "how," but the "why."

1. The Battle Against Fingerprinting (Strategic Guide)

Understand how bots are detected and how Pydoll is designed to win.

2. The Advanced Networking Manual (The Foundation)

Proxies are more than just --proxy-server.
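
As a taste of what the manual covers, a minimal sketch of pointing a session at a SOCKS5 proxy with standard Chromium flags (the address is a placeholder, and the host-resolver rule is the commonly used guard against local DNS lookups; the networking docs explain why):

options = ChromiumOptions()
options.add_argument('--proxy-server=socks5://127.0.0.1:1080')  # placeholder proxy address
# Keep hostname resolution from leaking outside the proxy tunnel:
options.add_argument('--host-resolver-rules=MAP * ~NOTFOUND , EXCLUDE 127.0.0.1')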

3. Transparent Architecture (Software Engineering)

Software engineering you can trust.


🤝 Contributing

We would love your help to make Pydoll even better! Check out our contribution guidelines to get started.

💖 Support This Project

If you find Pydoll useful, consider sponsoring my work on GitHub. Every contribution helps keep the project alive and drives new features!

📄 License

Pydoll is licensed under the MIT License.

  Pydoll — Web automation, taken seriously.