A lightweight, dependency-free (besides libcurl) command-line tool written in C to download the transcript of any YouTube video. It directly calls YouTube's internal API, mimicking an iOS client, to fetch the transcript data without requiring an API key.
- No API Key Needed: Fetches transcripts by mimicking a legitimate client request.
- Language Selection: Specify the desired language for the transcript (e.g., "en", "es", "fr").
- Lightweight: Written in plain C with `libcurl` as the only external dependency.
- Simple Output: Prints the transcript text directly to standard output for easy piping and redirection.
- Self-Contained: The required `cJSON` library is included in the source.
To build this project, you will need:
- A C compiler (like `gcc` or `clang`)
- `make`
- `libcurl` (development version)
You can install libcurl on Debian/Ubuntu with:
sudo apt-get update
sudo apt-get install libcurl4-openssl-dev
On openSUSE:
sudo zypper install libcurl-devel
On macOS (using Homebrew):
brew install curl
- Clone the repository:
  git clone https://github.com/Zibri/youtube_transcript
  cd youtube_transcript
- Compile the code: Simply run `make` to use the provided `Makefile`.
  make
  This will compile the source files (`youtube_transcript.c`, `cJSON.c`) and create an executable named `youtube_transcript`. The executable is stripped of debug symbols to reduce its size.
- Clean up build files: To remove the compiled object files and the executable, run:
  make clean
Run the program from your terminal with the YouTube video ID as the first argument. You can optionally provide a language code as the second argument (defaults to "en").
Syntax:
./youtube_transcript <video_id> [language_code]
- Get a transcript in English (default):
  ./youtube_transcript dQw4w9WgXcQ
- Get a transcript in French:
  ./youtube_transcript dQw4w9WgXcQ fr
- Save the transcript to a file:
  ./youtube_transcript dQw4w9WgXcQ > transcript.txt
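The argument handling described above (a required video ID and an optional language code that defaults to "en") could be sketched in C roughly as follows. The `cli_args` struct and the `parse_args` function are hypothetical names chosen for illustration; they are not taken from the tool's source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the two command-line values. */
struct cli_args {
    const char *video_id;  /* required: first argument */
    const char *language;  /* optional: second argument, defaults to "en" */
};

/* Fills *out from argv. Returns 0 on success, -1 if the video ID is missing. */
int parse_args(int argc, char **argv, struct cli_args *out)
{
    assert(argv != NULL && out != NULL);

    if (argc < 2 || argv[1][0] == '\0')
        return -1;

    out->video_id = argv[1];
    /* Fall back to English when no (or an empty) language code is given. */
    out->language = (argc >= 3 && argv[2][0] != '\0') ? argv[2] : "en";
    return 0;
}
```

A caller would typically print the syntax line to standard error and exit with a non-zero status when `parse_args` returns -1.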
This program works by reverse-engineering the API call made by the YouTube iOS application to fetch video transcripts.
- Protobuf Simulation: The tool manually constructs a binary protobuf (Protocol Buffers) message. This message contains the video ID and the desired language parameters, formatted exactly as the official app would send them.
- Encoding: The protobuf data is Base64 encoded and then URL-encoded to ensure it can be safely transmitted within a JSON payload.
- JSON Payload: A JSON object is created, which includes a `context` block to identify the client as a specific version of the YouTube iOS app. The encoded protobuf data is included as the `params` value.
- API Request: A POST request is sent to YouTube's internal API endpoint (`/youtubei/v1/get_transcript`) using `libcurl`. The request includes specific `User-Agent` and `X-Youtube-Client` headers to appear as a legitimate request from an iPhone.
- Response Parsing: The JSON response from the API is parsed using the included `cJSON` library to navigate the complex object structure and extract the transcript segments.
- Output: Each transcript segment is printed to the standard output.
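
The first two steps (hand-building a protobuf message, then Base64- and URL-encoding it) can be illustrated with a minimal sketch in plain C. This is not the tool's actual code: the function names are hypothetical, and the field numbers (1 for the video ID, 2 for the language) are assumptions made for the example, not the layout YouTube expects. Only the general protobuf wire format (varints and length-delimited fields) and standard Base64 are taken as given.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write v as a protobuf base-128 varint; returns the number of bytes written. */
size_t put_varint(uint8_t *dst, uint64_t v)
{
    size_t n = 0;
    while (v >= 0x80) {
        dst[n++] = (uint8_t)(v | 0x80);  /* low 7 bits plus continuation bit */
        v >>= 7;
    }
    dst[n++] = (uint8_t)v;
    return n;
}

/* Write a length-delimited (wire type 2) string field: tag, length, bytes. */
size_t put_string_field(uint8_t *dst, uint32_t field, const char *s)
{
    size_t len = strlen(s);
    size_t n = put_varint(dst, ((uint64_t)field << 3) | 2);
    n += put_varint(dst + n, len);
    memcpy(dst + n, s, len);
    return n + len;
}

/* Hypothetical message layout (assumed): field 1 = video ID, field 2 = language.
 * Returns the message length, or 0 if buf is too small. */
size_t build_params(uint8_t *buf, size_t cap, const char *video_id, const char *lang)
{
    assert(buf != NULL && video_id != NULL && lang != NULL);
    /* Conservative bound: 1 tag byte + up to 10 length bytes per field. */
    size_t need = 2 * 11 + strlen(video_id) + strlen(lang);
    if (cap < need)
        return 0;
    size_t n = put_string_field(buf, 1, video_id);
    n += put_string_field(buf + n, 2, lang);
    return n;
}

/* Standard Base64 with '=' padding. Returns the output length, or 0 if out is too small. */
size_t base64_encode(const uint8_t *src, size_t len, char *out, size_t cap)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t i = 0, o = 0;
    if (cap < 4 * ((len + 2) / 3) + 1)
        return 0;
    for (; i + 2 < len; i += 3) {
        uint32_t v = ((uint32_t)src[i] << 16) | ((uint32_t)src[i + 1] << 8) | src[i + 2];
        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = tbl[(v >> 6) & 63];
        out[o++] = tbl[v & 63];
    }
    if (i < len) {  /* one or two trailing bytes */
        uint32_t v = (uint32_t)src[i] << 16;
        if (i + 1 < len)
            v |= (uint32_t)src[i + 1] << 8;
        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = (i + 1 < len) ? tbl[(v >> 6) & 63] : '=';
        out[o++] = '=';
    }
    out[o] = '\0';
    return o;
}

/* Percent-encode everything except RFC 3986 unreserved characters.
 * Returns the output length, or 0 if out is too small. */
size_t url_encode(const char *src, char *out, size_t cap)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t o = 0;
    if (cap == 0)
        return 0;
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        int unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                         (c >= '0' && c <= '9') ||
                         c == '-' || c == '_' || c == '.' || c == '~';
        if (o + 4 > cap)  /* room for "%XX" plus the terminating NUL */
            return 0;
        if (unreserved) {
            out[o++] = (char)c;
        } else {
            out[o++] = '%';
            out[o++] = hex[c >> 4];
            out[o++] = hex[c & 15];
        }
    }
    out[o] = '\0';
    return o;
}
```

Chaining `build_params`, `base64_encode`, and `url_encode` yields a string that could be placed in the `params` field of the JSON payload. Sending the request and parsing the response would additionally need `libcurl` and `cJSON`, which are left out here to keep the sketch dependency-free.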
This tool uses an unofficial, internal YouTube API. This API can change at any time without warning, which may cause this tool to stop working. This project is intended for educational purposes to demonstrate how to interact with web APIs in C. Use at your own risk.