Scrapy & Playwright Crawler

Environment Setup

Linux, Mac

# Activate your virtual environment first
pip install scrapy-playwright
playwright install chromium

Windows

scrapy-playwright can't be used directly on Windows, so we fall back to Docker instead.

docker run -it --ipc=host -v="<this project's directory on your computer, e.g. C:\Users\blueb\Repo\dynamic_crawler_venv\dynamic_crawler_tutorial>":/var mcr.microsoft.com/playwright/python:v1.35.0-jammy tail -f /dev/null
docker exec -it <container id or name> /bin/bash # find the ID or name with: docker ps
# Now inside the container:
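cd /var/dynamic_crawler_tutorial # go to the project directory (inside the container it is /var/dynamic_crawler_tutorial, already linked to the project directory on your machine)
pip install scrapy-playwright # install the package
playwright install chromium # install the Chromium browser used by Playwright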
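Whichever platform you use, scrapy-playwright only takes over downloads once it is enabled in the project's settings.py. The project presumably already does this; for reference, the two settings that scrapy-playwright documents look like the excerpt below. The `PLAYWRIGHT_BROWSER_TYPE` line is an added assumption chosen to match the `playwright install chromium` step (Chromium is also its default).

```python
# settings.py (excerpt): route HTTP(S) downloads through scrapy-playwright.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright needs Twisted's asyncio-based reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Assumption: launch Chromium, the browser installed in the step above.
PLAYWRIGHT_BROWSER_TYPE = "chromium"
```

Only requests created with `meta={"playwright": True}` are rendered in the browser; all other requests still go through Scrapy's regular downloader.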

Running

Quotes

# Run the spider named quote and write its output to results.json (at the same level as settings.py)
scrapy crawl quote -O results.json
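With `-O results.json`, Scrapy overwrites the file with a single JSON array holding one object per scraped item. A small stdlib script like the one below can sanity-check the output. The helper name `summarize_feed` is hypothetical, and the field names it reports are whatever the quote spider actually yields.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_feed(path):
    """Load a Scrapy JSON feed (a list of item dicts) and report basic stats.

    Hypothetical helper: returns (item count, {field: number of items containing it}).
    """
    items = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(items, list):
        raise ValueError(f"{path} is not a JSON array; was it written with -O to a .json file?")
    # Count how many items contain each field, to spot missing data.
    field_counts = Counter(key for item in items for key in item)
    return len(items), dict(field_counts)


if __name__ == "__main__":
    feed = Path("results.json")
    if feed.exists():
        count, fields = summarize_feed(feed)
        print(f"{count} items")
        for field, n in sorted(fields.items()):
            print(f"  {field}: present in {n} items")
    else:
        print("results.json not found; run the crawl first")
```

A field that appears in fewer items than the total count usually points to a selector that fails on some pages.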

Google Reviews

scrapy crawl googlereview -O results.json
⚠️ This spider saves video recordings of the browser session under the tutorial directory.
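Playwright records video per browser context when the context is created with `record_video_dir`, and scrapy-playwright forwards the entries of its `PLAYWRIGHT_CONTEXTS` setting to the browser's `new_context` call. The sketch below shows one way such recording could be configured, assuming the default context and a hypothetical `tutorial/videos` directory; the Google review spider in this project may set it up differently (for example per request). The helper `list_recordings` is also hypothetical. Playwright writes the recordings as `.webm` files, and each video is only finalized once its page or context is closed.

```python
from pathlib import Path

# settings.py (excerpt, assumption): keyword arguments for the "default" browser
# context, which scrapy-playwright passes to Playwright's new_context().
PLAYWRIGHT_CONTEXTS = {
    "default": {
        "record_video_dir": "tutorial/videos",  # hypothetical output directory
    },
}


def list_recordings(video_dir):
    """Return the recorded session videos (.webm files) in video_dir, newest first."""
    videos = Path(video_dir).glob("*.webm")
    return sorted(videos, key=lambda p: p.stat().st_mtime, reverse=True)
```

After a crawl, `list_recordings("tutorial/videos")` returns the most recent session first, which is usually the one worth watching when a run misbehaves.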