A 100% Typed, async-native automation library built for modern bot evasion and high-performance scraping.
Pydoll is built on a simple philosophy: powerful automation shouldn't require you to fight the browser.
Forget broken webdrivers, compatibility issues, or being blocked by navigator.webdriver=true. Pydoll connects directly to the Chrome DevTools Protocol (CDP), providing a natively asynchronous, robust, and fully typed architecture.
It's designed for modern scraping, combining an intuitive high-level API (for productivity) with deep-level control over the network and browser behavior (for evasion), allowing you to bypass complex anti-bot defenses.
- Stealth-by-Design: Pydoll is built for evasion. Our human-like interactions simulate real user clicks, typing, and scrolling to pass behavioral analysis, while granular Browser Preferences control lets you patch your browser fingerprint.
- Async & Typed Architecture: Built from the ground up on `asyncio` and 100% type-checked with `mypy`. This means top-tier I/O performance for concurrent tasks and a fantastic Developer Experience (DX) with autocompletion and error-checking in your IDE.
- Total Network Control: Go beyond basic HTTP proxies. Pydoll gives you tools to intercept (to block ads/trackers) and monitor traffic, plus deep documentation on why SOCKS5 is essential to prevent DNS leaks.
- Hybrid Automation (The Game-Changer): Use the UI automation to log in, then use `tab.request` to make blazing-fast API calls that inherit the entire browser session.
- Ergonomics Meets Power: Easy for the simple, powerful for the complex. Use `tab.find()` for 90% of cases and `tab.query()` for complex CSS/XPath selectors.
```bash
pip install pydoll-python
```

That's it. No webdrivers. No external dependencies.
Thanks to its async architecture and context managers, Pydoll is clean and efficient.
```python
import asyncio

from pydoll.browser import Chrome
from pydoll.constants import Key


async def google_search(query: str):
    # Context manager handles browser start() and stop()
    async with Chrome() as browser:
        tab = await browser.start()
        await tab.go_to('https://www.google.com')

        # Intuitive finding API: find by HTML attributes
        search_box = await tab.find(tag_name='textarea', name='q')

        # "Human-like" interactions simulate typing, key by key
        await search_box.type_text(query)
        await search_box.press_keyboard_key(Key.ENTER)

        # Find by text and click (simulates mouse movement)
        first_result = await tab.find(
            tag_name='h3',
            text='autoscrape-labs/pydoll',  # Supports partial text matching
            timeout=10,
        )
        await first_result.click()

        # Wait for an element to confirm navigation
        await tab.find(id='repository-container-header', timeout=10)
        print(f"Page loaded: {await tab.title}")


asyncio.run(google_search('pydoll python'))
```

Pydoll is a complete toolkit for professional automation.
1. Hybrid Automation (UI + API): The Game-Changer
Tired of manually extracting and managing cookies to use requests or httpx? Pydoll solves this.
Use the UI automation to pass a complex login (with CAPTCHAs, JS challenges, etc.) and then use tab.request to make authenticated API calls that automatically inherit all cookies, headers, and session state from the browser. It's the best of both worlds: the robustness of UI automation for auth, and the speed of direct API calls for data extraction.
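Doing this by hand means reimplementing the browser's cookie-selection rules yourself. The sketch below is a simplified, hypothetical helper (`cookie_header_for` is our name, not a Pydoll API) that builds a `Cookie` header from dicts shaped like CDP's `Network.Cookie` objects (`name`, `value`, `domain`, `path`, `secure`). It deliberately ignores expiry, SameSite, and host-only cookies, details that a request made from inside the browser session never has to reimplement.

```python
from typing import Any, Dict, List
from urllib.parse import urlsplit


def _domain_matches(host: str, cookie_domain: str) -> bool:
    # Simplification: treat every cookie as a domain cookie, so ".my-site.com"
    # and "my-site.com" both cover my-site.com and its subdomains
    domain = cookie_domain.lstrip(".").lower()
    return host == domain or host.endswith("." + domain)


def _path_matches(request_path: str, cookie_path: str) -> bool:
    # RFC 6265 path-match: identical, or a prefix that ends on a "/" boundary
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False


def cookie_header_for(url: str, cookies: List[Dict[str, Any]]) -> str:
    """Build a Cookie header for `url` from CDP-style cookie dicts (simplified)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    pairs = []
    for cookie in cookies:
        if not _domain_matches(host, cookie["domain"]):
            continue
        if not _path_matches(path, cookie.get("path", "/")):
            continue
        if cookie.get("secure") and parts.scheme != "https":
            continue  # Secure cookies are only sent over HTTPS
        pairs.append(f"{cookie['name']}={cookie['value']}")
    return "; ".join(pairs)
```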
```python
import asyncio

from pydoll.browser import Chrome


async def main():
    async with Chrome() as browser:
        tab = await browser.start()

        # 1. Log in via the UI (handles CAPTCHAs, JS, etc.)
        await tab.go_to('https://my-site.com/login')
        await (await tab.find(id='username')).type_text('user')
        await (await tab.find(id='password')).type_text('pass123')
        await (await tab.find(id='login-btn')).click()
        # In practice, wait for a post-login element before calling the API

        # 2. Now, use the browser's session to hit the API!
        # This request automatically INHERITS the login cookies
        response = await tab.request.get('https://my-site.com/api/user/profile')
        user_data = response.json()
        print(f"Welcome, {user_data['name']}!")


asyncio.run(main())
```

2. Total Network Control: Monitor & Intercept Traffic
Take full control of the network stack. Pydoll allows you to not only monitor traffic for reverse-engineering APIs but also to intercept requests in real-time.
Use this to block ads, trackers, images, or CSS to dramatically speed up your scraping and save bandwidth, or even to modify headers and mock API responses for testing.
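Keeping the block-or-continue decision in a small pure function makes it easy to test without launching a browser. A minimal sketch, assuming CDP resource-type names such as `Image` and `Stylesheet` (the same strings the interception example checks) plus an illustrative host blocklist; `should_block` and both constants are hypothetical names, not part of Pydoll.

```python
from typing import AbstractSet
from urllib.parse import urlsplit

# Illustrative defaults only; real ad/tracker blocklists are far longer
BLOCKED_RESOURCE_TYPES = frozenset({"Image", "Stylesheet", "Font", "Media"})
BLOCKED_HOSTS = frozenset({"doubleclick.net", "google-analytics.com"})


def should_block(
    resource_type: str,
    url: str,
    blocked_types: AbstractSet[str] = BLOCKED_RESOURCE_TYPES,
    blocked_hosts: AbstractSet[str] = BLOCKED_HOSTS,
) -> bool:
    """Return True if a paused request should be failed instead of continued."""
    if resource_type in blocked_types:
        return True
    host = (urlsplit(url).hostname or "").lower()
    # Match each listed host itself and any of its subdomains
    return any(host == h or host.endswith("." + h) for h in blocked_hosts)
```

Inside a `block_resource` handler you could call `should_block(resource_type, url)` and pick between `tab.fail_request(...)` and `tab.continue_request(...)` based on the result.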
```python
import asyncio

from pydoll.browser.chromium import Chrome
from pydoll.protocol.fetch.events import FetchEvent, RequestPausedEvent
from pydoll.protocol.network.types import ErrorReason


async def block_images():
    async with Chrome() as browser:
        tab = await browser.start()

        async def block_resource(event: RequestPausedEvent):
            request_id = event['params']['requestId']
            resource_type = event['params']['resourceType']
            url = event['params']['request']['url']

            # Block images and stylesheets
            if resource_type in ['Image', 'Stylesheet']:
                await tab.fail_request(request_id, ErrorReason.BLOCKED_BY_CLIENT)
            else:
                # Continue other requests
                await tab.continue_request(request_id)

        await tab.enable_fetch_events()
        await tab.on(FetchEvent.REQUEST_PAUSED, block_resource)

        await tab.go_to('https://example.com')
        await asyncio.sleep(3)

        await tab.disable_fetch_events()


asyncio.run(block_images())
```

3. Deep Browser Control: The Fingerprint Evasion Manual
A User-Agent isn't enough. Pydoll gives you granular control over Browser Preferences, allowing you to modify hundreds of internal Chrome settings to build a robust and consistent fingerprint.
Our documentation doesn't just give you the tool; it explains in detail how canvas, WebGL, font, and timezone fingerprinting works, and how to use these preferences to defend your automation.
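Chromium stores preferences as nested JSON in the profile's `Preferences` file, but names each one by a dotted path such as `intl.accept_languages`. A small helper that flattens a `browser_preferences`-style dict into those paths makes it easier to audit or diff profiles for consistency. This is a hypothetical utility (`flatten_prefs` is our name), not part of Pydoll.

```python
from typing import Any, Dict


def flatten_prefs(prefs: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested preference dicts into {"dotted.path": value} pairs."""
    flat: Dict[str, Any] = {}
    for key, value in prefs.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_prefs(value, path))  # descend into nested groups
        else:
            flat[path] = value  # leaf: record the full dotted name
    return flat
```

Applied to the profile in the next example, `flatten_prefs(options.browser_preferences)` would list keys such as `profile.default_content_setting_values.notifications` and `intl.accept_languages`.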
```python
from pydoll.browser.options import ChromiumOptions

options = ChromiumOptions()

# Create a realistic and clean browser profile
options.browser_preferences = {
    'profile': {
        'default_content_setting_values': {
            'notifications': 2,  # Block notification popups
            'geolocation': 2,  # Block location requests
        },
        'password_manager_enabled': False,  # Disable "save password" prompt
    },
    'intl': {
        'accept_languages': 'en-US,en',  # Make consistent with your proxy IP
    },
    'browser': {
        'check_default_browser': False,  # Don't ask to be default browser
    },
}

# Then launch with these options, e.g. `async with Chrome(options=options) as browser:`
```

4. Built for Scale: Concurrency, Contexts & Remote Connections
Pydoll is built for scale. Its async architecture allows you to manage multiple tabs and browser contexts (isolated sessions) in parallel.
Furthermore, Pydoll excels in production architectures. You can run your browser in a Docker container and connect to it remotely from your Python script, decoupling the controller from the worker. Our documentation includes guides on how to set up your own remote server.
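`asyncio.gather` on its own starts every task at once; when scraping many URLs you usually want a ceiling on how many tabs are open. A minimal, stdlib-only sketch of that pattern with `asyncio.Semaphore` follows. `run_bounded` and `demo` are hypothetical names, and the fake jobs stand in for coroutines that would each open a tab (for instance via `browser.new_tab()`) and scrape it.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def run_bounded(jobs: Iterable[Callable[[], Awaitable[T]]], limit: int) -> List[T]:
    """Run zero-argument async jobs concurrently, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job: Callable[[], Awaitable[T]]) -> T:
        async with semaphore:  # waits here while `limit` jobs are already running
            return await job()

    # gather() returns results in the same order as the jobs were given
    return list(await asyncio.gather(*(guarded(job) for job in jobs)))


async def demo(n_jobs: int = 10, limit: int = 3):
    """Stand-in for scraping: each fake job records how many jobs run at once."""
    active = 0
    peak = 0

    async def fake_scrape(i: int) -> int:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # pretend to wait on the network
        active -= 1
        return i * 2

    # `i=i` binds the current loop value into each lambda
    results = await run_bounded([lambda i=i: fake_scrape(i) for i in range(n_jobs)], limit)
    return results, peak
```

Running `asyncio.run(demo())` returns the results in input order while never having more than three fake jobs in flight.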
```python
import asyncio

from pydoll.browser import Chrome


# Example: Scrape 2 sites in parallel
async def scrape_page(url, tab):
    await tab.go_to(url)
    return await tab.title


async def concurrent_scraping():
    async with Chrome() as browser:
        tab_google = await browser.start()
        tab_ddg = await browser.new_tab()  # Create a new tab

        # Execute both scraping tasks concurrently
        tasks = [
            scrape_page('https://google.com/', tab_google),
            scrape_page('https://duckduckgo.com/', tab_ddg),
        ]
        results = await asyncio.gather(*tasks)
        print(results)


asyncio.run(concurrent_scraping())
```

5. Robust Engineering: `@retry` Decorator & 100% Typed
Reliable Engineering: Pydoll is fully typed, providing a fantastic Developer Experience (DX) with full autocompletion in your IDE and error-checking before you even run your code. Read about our Type System.
Robust-by-Design: The @retry decorator turns fragile scripts into production-ready automations. It doesn't just "try again"; it lets you execute custom recovery logic (on_retry), like refreshing the page or rotating a proxy, before the next attempt.
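To make the mechanism concrete, here is a minimal stdlib sketch of what a retry decorator with a recovery hook does. It is not Pydoll's implementation: `retry_sketch` is a hypothetical name, and its semantics (one initial attempt plus up to `max_retries` retries, a zero-argument async `on_retry`, a base `delay` doubled per attempt) are our assumptions.

```python
import asyncio
import functools
from typing import Awaitable, Callable, Optional, Sequence, Type


def retry_sketch(
    max_retries: int = 3,
    exceptions: Sequence[Type[BaseException]] = (Exception,),
    on_retry: Optional[Callable[[], Awaitable[None]]] = None,
    delay: float = 1.0,
    exponential_backoff: bool = False,
):
    """Retry an async function on selected exceptions (illustrative only)."""
    catch = tuple(exceptions)

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            # One initial attempt plus up to `max_retries` retries (assumed semantics)
            for attempt in range(max_retries + 1):
                try:
                    return await func(*args, **kwargs)
                except catch:
                    if attempt == max_retries:
                        raise  # Out of retries: surface the last error
                    if on_retry is not None:
                        await on_retry()  # e.g. refresh the page or rotate a proxy
                    # Fixed delay, or delay * 1, 2, 4, ... with exponential backoff
                    wait = delay * (2 ** attempt) if exponential_backoff else delay
                    await asyncio.sleep(wait)

        return wrapper

    return decorator
```

Exceptions outside `exceptions` are not caught, so they propagate on the first failure, which is what makes the "only retry on these specific errors" filter meaningful.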
```python
from pydoll.decorators import retry
from pydoll.exceptions import ElementNotFound, NetworkError


@retry(
    max_retries=3,
    exceptions=[ElementNotFound, NetworkError],  # Only retry on these specific errors
    on_retry=my_recovery_function,  # Your own recovery logic (e.g. refresh the page)
    exponential_backoff=True,  # Wait 2s, 4s, 8s...
)
async def scrape_product(self, url: str):  # Typically a method on your scraper class
    # ... your scraping logic ...
    ...
```

Pydoll is not a black box. We believe that to defeat anti-bot systems, you must understand them. Our documentation is one of the most comprehensive public resources on the subject, teaching you not just the "how," but the "why."
Understand how bots are detected and how Pydoll is designed to win.
- Evasion Techniques Guide: Our unified 3-layer evasion strategy.
- Network Fingerprinting: How your IP address, TCP/IP header values such as TTL, and your TLS handshake (JA3) give you away.
- Browser Fingerprinting: How `canvas`, WebGL, and fonts create your unique ID.
- Behavioral Fingerprinting: Why mouse/keyboard telemetry is the new front line of detection.
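JA3 condenses a TLS ClientHello into a single hash: the TLS version, cipher suites, extensions, elliptic curves, and point formats are written as decimal numbers (list items joined with `-`, the five fields joined with `,`), GREASE values (RFC 8701) are dropped, and the resulting string is MD5-hashed. The sketch below covers only that final step and assumes the ClientHello has already been parsed; the function names and sample values are illustrative, not a real browser's fingerprint.

```python
import hashlib
from typing import Iterable


def is_grease(value: int) -> bool:
    # GREASE values are 0x0A0A, 0x1A1A, ..., 0xFAFA: equal bytes ending in nibble A
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(
    version: int,
    ciphers: Iterable[int],
    extensions: Iterable[int],
    curves: Iterable[int],
    point_formats: Iterable[int],
) -> str:
    def join(values: Iterable[int]) -> str:
        return "-".join(str(v) for v in values if not is_grease(v))

    return ",".join(
        [str(version), join(ciphers), join(extensions), join(curves), join(point_formats)]
    )


def ja3_hash(*fields) -> str:
    """MD5 of the JA3 string, as a lowercase hex digest."""
    return hashlib.md5(ja3_string(*fields).encode("ascii")).hexdigest()
```

Because two clients with identical field lists hash identically, an automation stack whose TLS library differs from the browser it claims to be can stand out even with a perfect User-Agent.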
Proxies are more than just --proxy-server.
- HTTP vs. SOCKS5: Why SOCKS5 is superior (it solves DNS leaks).
- Proxy Detection: How sites know you're using a proxy (WebRTC Leaks).
- Build Your Own Proxy: Yes, we even teach you how to build your own SOCKS5 proxy server in Python.
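The DNS-leak point comes down to one byte in the SOCKS5 CONNECT request (RFC 1928): with address type `0x03` the client sends the hostname itself, so the proxy performs the DNS lookup instead of your local resolver. (In tools like curl, this is the difference between the `socks5://` and `socks5h://` schemes.) A minimal sketch that builds only that request, assuming the method-negotiation step has already completed; `socks5_connect_request` is our own name.

```python
import struct


def socks5_connect_request(host: str, port: int) -> bytes:
    """Build a SOCKS5 CONNECT request that lets the proxy resolve `host`."""
    encoded = host.encode("idna")
    if not 0 < len(encoded) <= 255:
        raise ValueError("hostname must be 1-255 bytes once encoded")
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port out of range")
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3 (domain name), then the name length
    header = struct.pack("!BBBBB", 5, 1, 0, 3, len(encoded))
    # Hostname bytes, then the port in network byte order
    return header + encoded + struct.pack("!H", port)
```

No local DNS query is needed to produce these bytes, which is why the hostname never reaches your ISP's resolver.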
Software engineering you can trust.
- Domain-Driven Design (OOP): The clean architecture behind `Browser`, `Tab`, and `WebElement`.
- The FindElements Mixin: The magic behind the intuitive `find()` API.
- The Connection Layer: How Pydoll manages `asyncio` and the CDP.
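At the wire level, CDP is JSON messages: commands carry an `id`, a `method`, and `params`; responses echo the `id` with a `result` or `error`; events carry a `method` and `params` but no `id`. An async client therefore needs to pair each response with the coroutine awaiting it. The sketch below shows that idea with one `asyncio.Future` per command id. It is a hypothetical illustration, not Pydoll's connection layer: `CDPMessageRouter` is our name, and the transport (normally a WebSocket) is abstracted as an injected `send` coroutine.

```python
import asyncio
import itertools
import json
from typing import Any, Awaitable, Callable, Dict, List, Optional

Handler = Callable[[Dict[str, Any]], Awaitable[None]]


class CDPMessageRouter:
    """Match CDP responses to pending commands by id; route events by method."""

    def __init__(self, send: Callable[[str], Awaitable[None]]):
        self._send = send  # e.g. a WebSocket send coroutine (assumed transport)
        self._ids = itertools.count(1)
        self._pending: Dict[int, asyncio.Future] = {}
        self._handlers: Dict[str, List[Handler]] = {}

    async def execute(self, method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        command_id = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self._pending[command_id] = future
        await self._send(json.dumps({"id": command_id, "method": method, "params": params or {}}))
        return await future  # resolved later by dispatch()

    def on(self, method: str, handler: Handler) -> None:
        self._handlers.setdefault(method, []).append(handler)

    async def dispatch(self, raw: str) -> None:
        """Feed every incoming message here (normally from a receive loop)."""
        message = json.loads(raw)
        if "id" in message:  # a response to one of our commands
            future = self._pending.pop(message["id"], None)
            if future is not None and not future.done():
                if "error" in message:
                    future.set_exception(RuntimeError(message["error"]))
                else:
                    future.set_result(message.get("result", {}))
        else:  # an event: fan out to registered handlers
            for handler in self._handlers.get(message.get("method", ""), []):
                await handler(message)


async def demo():
    sent: List[Dict[str, Any]] = []

    async def fake_send(raw: str) -> None:
        sent.append(json.loads(raw))

    router = CDPMessageRouter(fake_send)
    events: List[str] = []

    async def on_load(msg: Dict[str, Any]) -> None:
        events.append(msg["method"])

    router.on("Page.loadEventFired", on_load)
    task = asyncio.create_task(router.execute("Page.navigate", {"url": "https://example.com"}))
    await asyncio.sleep(0)  # let execute() send its command and start waiting
    await router.dispatch(json.dumps({"method": "Page.loadEventFired", "params": {}}))
    await router.dispatch(json.dumps({"id": sent[0]["id"], "result": {"frameId": "F1"}}))
    return await task, sent, events
```

Because responses can arrive in any order and interleave with events, keying futures by id is what lets many commands be awaited concurrently over a single connection.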
We would love your help to make Pydoll even better! Check out our contribution guidelines to get started.
If you find Pydoll useful, consider sponsoring my work on GitHub. Every contribution helps keep the project alive and drives new features!
Pydoll is licensed under the MIT License.
Pydoll — Web automation, taken seriously.
