> **Note:** Downloading the audio in tweets is not supported yet.
> **Warning:** Do not leak your `cookie.json`; doing so could lead to your Twitter account being stolen.
- This tool automatically drives a browser to crawl a user's tweets and save all static resources (videos, pictures) locally, without calling the Twitter API.
- The crawled data is also saved with sqlite3 as an index file for easy querying.
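As a sketch of what "easy querying" of the sqlite3 index file can look like, here is a minimal example; the table and column names below are illustrative assumptions, not the tool's actual schema:

```python
import sqlite3

# Hypothetical index layout: one row per crawled tweet, pointing at the
# locally saved media file (the real tool writes an index file on disk;
# an in-memory database is used here for the demonstration).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tweets (id TEXT, text TEXT, media_path TEXT)")
con.execute("INSERT INTO tweets VALUES ('1', 'hello', 'output/res/1.jpg')")

# Look up where a tweet's media was saved.
row = con.execute("SELECT media_path FROM tweets WHERE id = '1'").fetchone()
print(row[0])  # → output/res/1.jpg
```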
- Install Python 3.10+
- Install Poetry
- Install Chrome 119.0+
- Run `poetry install` in the directory containing `pyproject.toml`
- Configure `config.yaml`
- Edit line 69 of `/twitter_user_tweet_crawler/__main__.py`
- Prepare the Chrome user data folder (this example sets `data_dir` to `/twitter_user_tweet_crawler/userdata/`)
  - Create new folders under `/twitter_user_tweet_crawler/userdata/`
  - If you need n browser instances running at the same time, create n+1 folders
  - For example, if you need 3 threads working at the same time, create:
    - `/twitter_user_tweet_crawler/userdata/1`
    - `/twitter_user_tweet_crawler/userdata/2`
    - `/twitter_user_tweet_crawler/userdata/3`
    - `/twitter_user_tweet_crawler/userdata/4`
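The folder layout above (n + 1 folders for n threads) can be created in one step; the paths below follow the example `data_dir`, and the folder count is the 3-thread example:

```python
from pathlib import Path

# Create n+1 numbered profile folders for n worker threads
# (here n = 3, so folders 1..4), matching the example data_dir above.
data_dir = Path("twitter_user_tweet_crawler/userdata")
n_threads = 3
for i in range(1, n_threads + 2):
    (data_dir / str(i)).mkdir(parents=True, exist_ok=True)

print(sorted(p.name for p in data_dir.iterdir()))  # → ['1', '2', '3', '4']
```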
- Pre-configure Chrome
  - Run `/usr/bin/google-chrome-stable --user-data-dir=<data_dir>/1`
  - Install the Tampermonkey extension
  - Open the Tampermonkey extension interface, create a new script, copy in the contents of `script.js`, and press Ctrl+S to save
  - Change the browser's save path to `/twitter_user_tweet_crawler/output/res`
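Since each profile folder must be pre-configured in turn, the launch command above can be generated per folder; this is a sketch, with `<data_dir>` substituted by the example path used earlier:

```python
# Build the Chrome launch command for each numbered profile folder.
# Assumes the example data_dir from above; actually starting Chrome
# (e.g. via subprocess) is left to the user.
data_dir = "twitter_user_tweet_crawler/userdata"
cmds = [
    f"/usr/bin/google-chrome-stable --user-data-dir={data_dir}/{i}"
    for i in range(1, 5)  # folders 1..4 from the 3-thread example
]
print(cmds[0])  # → /usr/bin/google-chrome-stable --user-data-dir=twitter_user_tweet_crawler/userdata/1
```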
- ...and so on, until every profile folder is configured
- Run `poetry run python3 -m twitter_user_tweet_crawler` in the top-level directory containing `pyproject.toml`
- Log in to Twitter
- Press the Enter key
- Done.