TL;DR: text extraction, transcription, punctuation restoration, translation, summarization and text-to-speech.
The goal of this project is to extend the functionality of Fabric. I'm particularly interested in building pipelines that use utilities like yt as a source and chain them with the | operator on the command line.
However, there is a major limitation: every operation is constrained by the LLM's context size. When extracting information from books, lengthy documents, or long video transcripts, content may get truncated.
To address this, I started adding a summarization step that runs before a Fabric pattern is applied, depending on the document's length.
I also explored transcription, translation, and listening to the pipeline result or saving it as an audio file for later.
yt --transcript url | tp --cb | tts
tp --ebullets https://en.wikipedia.org/wiki/Text_processing
yt --transcript --lang en url | tp --cb --tr fr | tts
tp my_book.txt --eb | fabric --p extract_wisdom | tts --o my_book_wisdom.mp3
echo "Hello world!" | tp --tr zh | tts
tp doc_fr.txt --tr es > doc_es.txt
tp en.mp4 --tr fr
tp fr.mp3 --tr es | tts
tp es.mp3 --tr fr | tts --o fr.mp3 | tp fr.mp3 --tr en --o tr_en.txt
tp en.mp3 | fabric --p extract_ideas | tp --tr fr --o idées.txt
tp image.png
tp document.docx
tp receives its input from stdin or as its first command-line argument.
It accepts:
- Text.
- A URL, such as a web page.
- A file path. Supported formats are: .aiff, .bmp, .cs, .csv, .doc, .docx, .eml, .epub, .flac, .gif, .htm, .html, .jpeg, .jpg, .json, .log, .md, .mkv, .mobi, .mp3, .mp4, .msg, .odt, .ogg, .pdf, .png, .pptx, .ps, .psv, .py, .rtf, .sql, .tff, .tif, .tiff, .tsv, .txt, .wav, .xls, .xlsx
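One way tp could decide how to treat its input is sketched below. This is an assumption, not tp's actual code: the function name resolve_input and the rule that anything which is neither a URL nor an existing file with a supported extension is treated as literal text are both hypothetical.

```python
import sys
from pathlib import Path
from typing import Optional, TextIO

# Extensions listed in the supported-formats line above.
SUPPORTED_EXTENSIONS = {
    ".aiff", ".bmp", ".cs", ".csv", ".doc", ".docx", ".eml", ".epub", ".flac",
    ".gif", ".htm", ".html", ".jpeg", ".jpg", ".json", ".log", ".md", ".mkv",
    ".mobi", ".mp3", ".mp4", ".msg", ".odt", ".ogg", ".pdf", ".png", ".pptx",
    ".ps", ".psv", ".py", ".rtf", ".sql", ".tff", ".tif", ".tiff", ".tsv",
    ".txt", ".wav", ".xls", ".xlsx",
}


def resolve_input(arg: Optional[str], stdin: TextIO = sys.stdin) -> tuple[str, str]:
    """Classify the input as ("url", ...), ("file", ...) or ("text", ...)."""
    # No positional argument: read whatever was piped in.
    if arg is None:
        return "text", stdin.read()
    if arg.startswith(("http://", "https://")):
        return "url", arg
    path = Path(arg)
    # Only existing files with a supported extension count as paths.
    if path.suffix.lower() in SUPPORTED_EXTENSIONS and path.is_file():
        return "file", str(path)
    # Everything else is treated as literal text (hypothetical fallback rule).
    return "text", arg
```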
tp accepts unformatted content, such as automatically generated YouTube transcripts. If the text lacks punctuation, tp restores it before further processing, because punctuation is needed for chunking and text-to-speech.
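How tp decides that text "lacks punctuation" is not described here. A minimal sketch, assuming a simple ratio of punctuation marks to words: the function lacks_punctuation and its min_ratio threshold are hypothetical, and the restoration model itself is not shown.

```python
import re


def lacks_punctuation(text: str, min_ratio: float = 0.02) -> bool:
    """Heuristic: True when punctuation marks are rare relative to word count."""
    words = text.split()
    if not words:
        return False
    marks = len(re.findall(r"[.!?,;:]", text))
    # Auto-generated transcripts often contain almost no marks at all.
    return marks / len(words) < min_ratio
```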
tp transcribes audio and video files to text using Whisper.
The primary aim is to summarize books, large documents, or long video transcripts using an LLM with an 8K context size. Several summarization levels are available.
Extended bullet summary (--ebullets, --eb):
- Splits the text into chunks.
- Summarizes each chunk as bullet points.
- Concatenates all bullet summaries.
The goal is to retain as much information as possible.
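The extended bullet summary steps above can be sketched as a map over chunks. This assumes whole sentences are packed into chunks (one reason punctuation matters), word counts stand in for token counts, and the LLM call is passed in as a summarize callable; chunk_sentences, extended_bullet_summary and the 3000-word default are hypothetical, not tp's real code.

```python
import re
from typing import Callable


def chunk_sentences(text: str, max_words: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words still becomes its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk when this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def extended_bullet_summary(
    text: str, summarize: Callable[[str], str], max_words: int = 3000
) -> str:
    """Summarize every chunk as bullet points, then concatenate the results."""
    return "\n".join(summarize(chunk) for chunk in chunk_sentences(text, max_words))
```

In practice summarize would wrap a call to the small-context model with a "summarize as bullet points" prompt; passing it in keeps the chunking logic testable without an LLM.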
Condensed bullet summary (--cbullets, --cb): runs as many extended bullet summary phases as needed to end up with a bullet summary smaller than the LLM's context size.
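The condensed bullet summary amounts to a loop that reapplies the extended pass until the result fits. A sketch under assumptions: extended_pass stands in for one full extended bullet summary phase, a word count is used as a rough token count, and the max_rounds guard (hypothetical) stops the loop if a pass fails to shrink the text.

```python
from typing import Callable


def word_count(text: str) -> int:
    """Rough token estimate: whitespace-separated words (an assumption)."""
    return len(text.split())


def condensed_bullet_summary(
    text: str,
    extended_pass: Callable[[str], str],
    context_limit: int,
    count_tokens: Callable[[str], int] = word_count,
    max_rounds: int = 10,
) -> str:
    """Repeat extended bullet summary passes until the result fits the context."""
    summary = extended_pass(text)  # at least one phase always runs
    rounds = 1
    while count_tokens(summary) > context_limit:
        if rounds >= max_rounds:
            raise RuntimeError("summary did not shrink below the context limit")
        summary = extended_pass(summary)
        rounds += 1
    return summary
```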
Textual summary (--text, --t): a simple summarization that does not rely on bullet points.
Translation (--translate, --tr): translates the output text to the desired language. Use a two-letter language code such as en or fr.
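For --translate, the language code is presumably handed to the LLM together with the text. A hypothetical sketch of that step: build_translation_prompt, the two-letter validation rule, and the prompt wording are assumptions, not tp's real prompt.

```python
import re


def build_translation_prompt(text: str, target_lang: str) -> str:
    """Build a hypothetical LLM prompt for --translate / --tr."""
    code = target_lang.strip().lower()
    # Codes like "en", "fr", "es" or "zh" are two lowercase letters.
    if not re.fullmatch(r"[a-z]{2}", code):
        raise ValueError(f"expected a two-letter language code, got {target_lang!r}")
    return (
        f"Translate the following text to the language with code '{code}'. "
        "Return only the translation.\n\n" + text
    )
```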
usage: tp [-h] [--ebullets] [--cbullets] [--text] [--lang LANG] [--translate TRANSLATE] [--output_text_file_path OUTPUT_TEXT_FILE_PATH] [text_or_path]
tp (text processing) provides transcription, punctuation restoration, translation and summarization from stdin, text, url, or file path. Supported file formats are: .aiff, .bmp, .cs, .csv, .doc, .docx, .eml, .epub, .flac, .gif, .htm, .html, .jpeg, .jpg, .json, .log, .md, .mkv, .mobi, .mp3, .mp4, .msg, .odt, .ogg, .pdf, .png, .pptx, .ps, .psv, .py, .rtf, .sql, .tff, .tif, .tiff, .tsv, .txt, .wav, .xls, .xlsx
positional arguments:
text_or_path plain text; file path; file url
options:
-h, --help show this help message and exit
--ebullets, --eb Output an extended bullet summary
--cbullets, --cb Output a condensed bullet summary
--text, --t Output a textual summary
--lang LANG, --l LANG
Forced processing language. Disables the automatic detection.
--translate TRANSLATE, --tr TRANSLATE
Language to translate to
--output_text_file_path OUTPUT_TEXT_FILE_PATH, --o OUTPUT_TEXT_FILE_PATH
output text file path
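The help output above maps directly onto Python's standard argparse module. The following reconstruction reproduces the documented flags and their short aliases; it is inferred from the help text, not copied from tp's source, and build_parser is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Recreate tp's documented command-line interface with argparse."""
    parser = argparse.ArgumentParser(
        prog="tp",
        description="tp (text processing) provides transcription, punctuation "
        "restoration, translation and summarization",
    )
    parser.add_argument("text_or_path", nargs="?", help="plain text; file path; file url")
    # Boolean switches selecting the summarization level.
    parser.add_argument("--ebullets", "--eb", action="store_true",
                        help="Output an extended bullet summary")
    parser.add_argument("--cbullets", "--cb", action="store_true",
                        help="Output a condensed bullet summary")
    parser.add_argument("--text", "--t", action="store_true",
                        help="Output a textual summary")
    # Options that take a value; dest comes from the first long flag.
    parser.add_argument("--lang", "--l",
                        help="Forced processing language. Disables the automatic detection.")
    parser.add_argument("--translate", "--tr", help="Language to translate to")
    parser.add_argument("--output_text_file_path", "--o", help="output text file path")
    return parser
```

For example, build_parser().parse_args(["--eb", "--tr", "fr", "doc.txt"]) sets ebullets to True, translate to "fr" and text_or_path to "doc.txt". argparse prints optional arguments before positionals in the usage line, which matches the usage string shown above.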
tts (text to speech) lets you listen to the pipeline result or save it as an audio file to listen to later. tts can also read text files, automatically detecting their language.
usage: tts.py [-h] [--output_file_path OUTPUT_FILE_PATH] [--lang LANG] [input_text_or_path]
tts (text to speech) reads text aloud or to mp3 file
positional arguments:
input_text_or_path Text to read or path of the text file to read.
options:
-h, --help show this help message and exit
--output_file_path OUTPUT_FILE_PATH, --o OUTPUT_FILE_PATH
Output file path. If none, read aloud.
--lang LANG, --l LANG
Forced language. Uses language detection if not provided.
GROQ_API_KEY=gsk_
LITE_LLM_URI='http://localhost:4000/'
SMALL_CONTEXT_MODEL_NAME="groq/llama3-8b-8192"
SMALL_CONTEXT_MAX_TOKENS=8192
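The settings above could be loaded as follows. This sketch assumes the values come from the process environment (whether a .env file is used is not stated); the Settings class, load_settings, and the fallback defaults (copied from the sample values) are hypothetical. GROQ_API_KEY has no default because a real key must be supplied.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass
class Settings:
    groq_api_key: str
    lite_llm_uri: str
    small_context_model_name: str
    small_context_max_tokens: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the configuration variables, falling back to the sample values."""
    return Settings(
        groq_api_key=env["GROQ_API_KEY"],  # required: raises KeyError if unset
        lite_llm_uri=env.get("LITE_LLM_URI", "http://localhost:4000/"),
        small_context_model_name=env.get("SMALL_CONTEXT_MODEL_NAME", "groq/llama3-8b-8192"),
        # Environment values are strings, so convert the token budget to int.
        small_context_max_tokens=int(env.get("SMALL_CONTEXT_MAX_TOKENS", "8192")),
    )
```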
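- Make the script executable (it needs a shebang line such as #!/usr/bin/env python3 to run this way):
chmod +x tts.py
- Create a symlink in a directory that's in your PATH. Run this from the directory containing tts.py and use an absolute target; a relative target like tts.py would be resolved relative to /usr/local/bin and produce a broken link:
sudo ln -s "$(pwd)/tts.py" /usr/local/bin/tts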