- Take a URL, single video, list of URLs, or list of local videos + URLs and feed it into the script and have each video transcribed (and audio downloaded if not local) using faster-whisper.
- Transcriptions can then be shuffled off to an LLM API endpoint of your choice, whether that be local or remote.
- Rolling summaries (i.e. chunking up input and doing a chain of summaries) is supported only through OpenAI currently, though the scripts here will let you do it with exllama or vLLM.
- Any site supported by yt-dl is supported, so you can use this with sites besides just youtube. ( https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md )
For commercial API usage, I personally recommend Sonnet. It's great quality and relatively inexpensive.
As for personal offline usage, Microsoft Phi-3 Mini 128k is great if you don't have a lot of VRAM and want to self-host. (I think it's better than anything up to 70B for summarization - I do not have actual evidence for this)
-
Transcribe audio from a Youtube URL:
python summarize.py https://www.youtube.com/watch?v=4nd1CDZP21s
-
Transcribe audio from a Youtube URL & Summarize it using (
anthropic
/cohere
/openai
/llama
(llama.cpp)/ooba
(oobabooga/text-gen-webui)/kobold
(kobold.cpp)/tabby
(Tabbyapi)) API:python summarize.py https://www.youtube.com/watch?v=4nd1CDZP21s -api <your choice of API>
- Make sure to put your API key into
config.txt
under the appropriate API variable
- Make sure to put your API key into
-
Transcribe a list of Youtube URLs & Summarize them using (
anthropic
/cohere
/openai
/llama
(llama.cpp)/ooba
(oobabooga/text-gen-webui)/kobold
(kobold.cpp)/tabby
(Tabbyapi)) API:python summarize.py ./ListofVideos.txt -api <your choice of API>
- Make sure to put your API key into
config.txt
under the appropriate API variable
- Make sure to put your API key into
-
Transcribe & Summarize a List of Videos on your local filesytem with a text file:
python summarize.py -v ./local/file_on_your/system
-
Download a Video with Audio from a URL:
python summarize.py -v https://www.youtube.com/watch?v=4nd1CDZP21s
-
Run it as a WebApp
python summarize.py -gui
- This requires you to either stuff your API keys into theconfig.txt
file, or pass them into the app every time you want to use it.- It will expose every CLI option (not currently/is planned)
- Has an option to download the generated transcript, and summary as text files.
- Can also download video/audio as files if selected in the UI (WIP - doesn't currently work)
- Use the script to (download->)transcribe(->summarize) a local file or remote (supported) url.
- What can you transcribe and summarize?
- Any youtube video. Or video hosted at any of these sites: https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md
- (Playlists you have to use the
Get_Playlist_URLs.py
withGet_Playlist_URLs.py <Playlist URL>
and it'll create a text file with all the URLs for each video, so you can pass the text file as input and they'll all be downloaded. Pull requests are welcome.) - Any url youtube-dl supports should work.
- (Playlists you have to use the
- Local Videos
- Pass in the filepath to any local video file, and it will be transcribed.
- You can also pass in a text file containing a list of videos for batch processing.
- Any youtube video. Or video hosted at any of these sites: https://github.com/yt-dlp/yt-dlp/blob/master/supportedsites.md
- How does it Summarize?
- Remote Summarization
- Pass an API name (anthropic/cohere/grok/openai/) as an argument, ex:
-api anthropic
- Add your API key to the
config.txt
file - The script when ran, will detect that you passed an API name, and will perform summarization of the resulting transcription.
- Pass an API name (anthropic/cohere/grok/openai/) as an argument, ex:
- Local Summarization
- Alternatively, you can pass
llama
/ooba
/kobold
/tabby
as the API name and have the script perform a request to your local API endpoint for summarization.- You will need to modify the
<endpoint_name>_api_IP
value in theconfig.txt
to reflect theIP:Port
of your local server. - Or pass the
--api_url
argument with theIP:Port
to avoid making changes to theconfig.txt
file. - If the self-hosted server requires an API key, modify the appropriate api_key variable in the
config.txt
file.
- You will need to modify the
- Alternatively, you can pass
- The current approach to summarization is currently 'dumb'/naive, and will likely be replaced or additional functionality added to reflect actual practices and not just 'dump txt in and get an answer' approach. This works for big context LLMs, but not everyone has access to them, and some transcriptions may be even longer, so we need to have an approach that can handle those cases.
- Remote Summarization
- APIs Currently Supported
- Anthropic - https://www.anthropic.com/api
- Cohere - https://docs.cohere.com/reference/about
- Groq - https://docs.api.groq.com/index.html
- Llama.cpp - https://github.com/ggerganov/llama.cpp & https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md
- Kobold.cpp - https://github.com/LostRuins/koboldcpp
- Oobabooga - https://github.com/oobabooga/text-generation-webui
- HuggingFace - https://huggingface.co/docs/api-inference/en/index
- Planned to Support
- TabbyAPI - https://github.com/theroyallab/tabbyAPI
- vLLM - https://github.com/vllm-project/vllm
- Linux
- Download necessary packages (Python3, ffmpeg[sudo apt install ffmpeg / dnf install ffmpeg], ?)
- Create a virtual env:
python -m venv ./
- Launch/activate your virtual env:
. .\scripts\activate.sh
- See
Linux && Windows
- Windows
- Download necessary packages (Python3, ffmpeg, ?)
- Create a virtual env:
python -m venv .\
- Launch/activate your virtual env:
. .\scripts\activate.ps1
- See
Linux && Windows
- Linux && Windows
pip install -r requirements.txt
- may take a bit of time...- Run
python ./summarize.py <video_url>
- The video URL does not have to be a youtube URL. It can be any site that ytdl supports. - You'll then be asked if you'd like to run the transcription through GPU(1) or CPU(2).
- Next, the video will be downloaded to the local directory by ytdl.
- Then the video will be transcribed by faster_whisper. (You can see this in the console output) * The resulting transcription output will be stored as both a json file with timestamps, as well as a txt file with no timestamps.
- Finally, you can have the transcription summarized through feeding it into an LLM of your choice.
- For running it locally, pass the '--local' argument into the script. This will download and launch a local inference server as part of the script. * This will take up at least 6 GB of space. (WIP - not in place yet)
- Single file (remote URL) transcription
- Single URL:
python summarize.py https://example.com/video.mp4
- Single URL:
- Single file (local) transcription)
- Transcribe a local file:
python summarize.py /path/to/your/localfile.mp4
- Transcribe a local file:
- Multiple files (local & remote)
- List of Files(can be URLs and local files mixed):
python summarize.py ./path/to/your/text_file.txt"
- List of Files(can be URLs and local files mixed):
Save time and use the config.txt
file, it allows you to set these settings and have them used when ran.
usage: summarize.py [-h] [-v] [-api API_NAME] [-key API_KEY] [-ns NUM_SPEAKERS] [-wm WHISPER_MODEL] [-off OFFSET] [-vad]
[-log {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [-ui] [-demo] [-prompt CUSTOM_PROMPT] [-overwrite] [-roll]
[-detail DETAIL_LEVEL]
[input_path]
Transcribe and summarize videos.
positional arguments:
input_path Path or URL of the video
options:
-h, --help show this help message and exit
-v, --video Download the video instead of just the audio
-api API_NAME, --api_name API_NAME
API name for summarization (optional)
-key API_KEY, --api_key API_KEY
API key for summarization (optional)
-ns NUM_SPEAKERS, --num_speakers NUM_SPEAKERS
Number of speakers (default: 2)
-wm WHISPER_MODEL, --whisper_model WHISPER_MODEL
Whisper model (default: small.en)
-off OFFSET, --offset OFFSET
Offset in seconds (default: 0)
-vad, --vad_filter Enable VAD filter
-log {DEBUG,INFO,WARNING,ERROR,CRITICAL}, --log_level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
Log level (default: INFO)
-ui, --user_interface
Launch the Gradio user interface
-demo, --demo_mode Enable demo mode
-prompt CUSTOM_PROMPT, --custom_prompt CUSTOM_PROMPT
Pass in a custom prompt to be used in place of the existing one. (Probably should just modify the script itself...)
-overwrite, --overwrite
Overwrite existing files
-roll, --rolling_summarization
Enable rolling summarization
-detail DETAIL_LEVEL, --detail_level DETAIL_LEVEL
Mandatory if rolling summarization is enabled, defines the chunk size. Default is 0.01(lots of chunks) -> 1.00 (few
chunks) Currently only OpenAI works.
-Download Audio only from URL -> Transcribe audio:
>python summarize.py https://www.youtube.com/watch?v=4nd1CDZP21s
-Transcribe audio from a Youtube URL & Summarize it using (anthropic/cohere/openai/llama (llama.cpp)/ooba (oobabooga/text-gen-webui)/kobold (kobold.cpp)/tabby (Tabbyapi)) API:
>python summarize.py https://www.youtube.com/watch?v=4nd1CDZP21s -api <your choice of API>
- Make sure to put your API key into `config.txt` under the appropriate API variable
-Download Video with audio from URL -> Transcribe audio from Video:
>python summarize.py -v https://www.youtube.com/watch?v=4nd1CDZP21s
-Download Audio+Video from a list of videos in a text file (can be file paths or URLs) and have them all summarized:
>python summarize.py --video ./local/file_on_your/system --api_name <API_name>
-Transcribe & Summarize a List of Videos on your local filesytem with a text file:
>python summarize.py -v ./local/file_on_your/system
-Run it as a WebApp:
>python summarize.py -gui
By default videos, transcriptions and summaries are stored in a folder with the video's name under './Results', unless otherwise specified in the config file.
- Setting up Local LLM Runner
- Llama.cpp
- Linux & Mac
git clone https://github.com/ggerganov/llama.cpp
make
in thellama.cpp
folder./server -m ../path/to/model -c <context_size>
- Windows
git clone https://github.com/ggerganov/llama.cpp
- Download + Run: https://github.com/skeeto/w64devkit/releases
- cd to
llama.cpp
folder makein the
llama.cpp` folder server.exe -m ..\path\to\model -c <context_size>
- Linux & Mac
- Kobold.cpp - c/p'd from: https://github.com/LostRuins/koboldcpp/wiki
- Windows
- Download from here: https://github.com/LostRuins/koboldcpp/releases/latest
Double click KoboldCPP.exe and select model OR run "KoboldCPP.exe --help" in CMD prompt to get command line arguments for more control.
Generally you don't have to change much besides the Presets and GPU Layers. Run with CuBLAS or CLBlast for GPU acceleration.
Select your GGUF or GGML model you downloaded earlier, and connect to the displayed URL once it finishes loading.
- Linux
On Linux, we provide a koboldcpp-linux-x64 PyInstaller prebuilt binary on the releases page for modern systems. Simply download and run the binary.
- Alternatively, you can also install koboldcpp to the current directory by running the following terminal command:
curl -fLo koboldcpp https://github.com/LostRuins/koboldcpp/releases/latest/download/koboldcpp-linux-x64 && chmod +x koboldcpp
- When you can't use the precompiled binary directly, we provide an automated build script which uses conda to obtain all dependencies, and generates (from source) a ready-to-use a pyinstaller binary for linux users. Simply execute the build script with
./koboldcpp.sh dist
and run the generated binary.
- Windows
- oobabooga - text-generation-webui - https://github.com/oobabooga/text-generation-webui
- Clone or download the repository.
- Clone:
git clone https://github.com/oobabooga/text-generation-webui
- Download: https://github.com/oobabooga/text-generation-webui/releases/latest -> Download the
Soruce code (zip)
file -> Extract -> Continue below.
- Run the
start_linux.sh
,start_windows.bat
,start_macos.sh
, orstart_wsl.bat
script depending on your OS. - Select your GPU vendor when asked.
- Once the installation ends, browse to http://localhost:7860/?__theme=dark.
- Exvllama2
- Llama.cpp
- Setting up a Local LLM Model
- microsoft/Phi-3-mini-128k-instruct - 3.8B Model/7GB base, 4GB Q8 - https://huggingface.co/microsoft/Phi-3-mini-128k-instruct
- Meta Llama3-8B - 8B Model/16GB base, 8.5GB Q8 - https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
- Workflow
- Setup python + packages
- Setup ffmpeg
- Run
python summarize.py <video_url>
orpython summarize.py <List_of_videos.txt>
- If you want summarization, add your API keys (if not using a local LLM) to the
config.txt
file, and then re-run the script, passing in the name of the API [or URL endpoint - to be added] to the script.
python summarize.py https://www.youtube.com/watch?v=4nd1CDZP21s --api_name anthropic
- This will attempt to download the video, then upload the resulting json file to the anthropic API endpoint, referring to values set in the config file (API key and model) to request summarization.
- Anthropic:
claude-3-opus-20240229
claude-3-sonnet-20240229
claude-3-haiku-20240307
- Cohere:
command-r
command-r-plus
- Groq
llama3-8b-8192
llama3-70b-8192
mixtral-8x7b-32768
- HuggingFace:
CohereForAI/c4ai-command-r-plus
meta-llama/Meta-Llama-3-70B-Instruct
meta-llama/Meta-Llama-3-8B-Instruct
- Supposedly you can use any model on there, but this is for reference for the free demo instance, in case you'd like to host your own.
- OpenAI:
gpt-4-turbo
gpt-4-turbo-preview
gpt-4
- What's in the repo?
summarize.py
- download, transcribe and summarize audio- First uses yt-dlp to download audio(optionally video) from supplied URL
- Next, it uses ffmpeg to convert the resulting
.m4a
file to.wav
- Then it uses faster_whisper to transcribe the
.wav
file to.txt
- After that, it uses pyannote to perform 'diarorization'
- Finally, it'll send the resulting txt to an LLM endpoint of your choice for summarization of the text.
chunker.py
- break text into parts and prepare each part for LLM summarizationroller-*.py
- rolling summarization- can-ai-code - interview executors to run LLM inference
compare.py
- prepare LLM outputs for webappcompare-app.py
- summary viewer webapp
- https://github.com/Dicklesworthstone/bulk_transcribe_youtube_videos_from_playlist/tree/main
- https://github.com/akashe/YoutubeSummarizer