A fast and reliable tool for exporting YouTube data.
Features
- Exports comments
- Exports live chat
- Efficient parsing using
fastjson
- Detailed access to channels and videos
- Scripting-friendly, favors stdio and works well with
mlr
andjq
. - Streaming-friendly, produces CSV/TSVs and NDJSON dumps.
If you need a large-scale data export, visit https://files.mine.terorie.dev/yt before starting a crawl.
Only use this tool to the extent permitted by the YouTube ToS.
- Battle-tested exporter
- Outputs an NDJSON stream
- With positional arguments: Dumps the comments of specified videos
- Without arguments: Reads the video IDs from stdin
- Example
ytwrk video comments < video_list.txt | jq -r '.content | .[0].text'
- Example
ytwrk video comments < video_list.txt | go run ./scripts/commments_to_csv.go | zstdmt -T8 ...
- Takes a video ID, fetches and parses it
- Outputs some information in JSON
- Not stable, might change
- Outputs an NDJSON stream
- With positional arguments: Dumps the specified videos
- Without arguments: Reads the video IDs from stdin
- Discovers additional videos through the related section (
-n=<count>
flag) - Optionally outputs found but not crawled videos (
-r=<file>
flag) - Remembers already crawled videos, so memory usage is O(n)
- Not stable, might change
- Example
ytwrk video dump < video_list.txt | nc ...
- Dumps a set of videos in raw tar format
- Useful for lossless crawls and analyzing data in-post
- Output is a
.tar
archive stream - Reads the video IDs to dump from stdin
- Example
ytwrk video dumpraw < video_list.txt | tar -xO | jq -r '...'
- Example
ytwrk video dumpraw < video_list.txt | zstdmt -T8 ...
- Consumes a tar stream produced by
video dumpraw
- Produces NDJSON like the
dump
command
- Experimental
- Exports live chat messages
- Get all uploads of channels
- Prints
channel_id<TAB>video_id
lines - With positional arguments: Dumps the specified channels
- Without arguments: Reads the channel IDs from stdin
- Example
ytwrk channel dump < channel_list.txt > video_list.tsv
- Dumps a set of channels in raw tar format
- Useful for lossless crawls and analyzing data in-post
- Output is a
.tar
archive stream - Reads the channel IDs to dump from stdin
- Example
ytwrk channel dumpraw < channel_list.txt | tar -xO | jq -r '...'
- Example
ytwrk channel dumpraw < channel_list.txt | zstdmt -T8 ...
- Experimental! Might change without warning
- Writes all video IDs in the specified playlist to stdout