This document describes the playscribe
automation and manual services that is provided for DevSecOps Engineer and User.
The audience for this document includes:
-
User who will send a command with YouTube URL via a Telegram Bot, and receive a text with summarization of its transcript.
-
DevSecOps Engineer who will design system workflows, configure and manage any SaaS or selfhosted services, and plan for disaster recovery.
-
This summarization-as-a-service leverages on and applies a similar design from the SudokuCliBot automation to its workflow.
-
This service uses resources in both a free tier and a pay-per-usage tier by cloud SaaS providers:
Free | Pay-per-usage |
---|---|
Pipedream | ChatGPT |
GitHub | |
Telegram |
-
This service is inexpensive, easy to implement, and can be duplicated for other services.
-
This service does not require implementing a custom frontend as the User will interact with a Telegram Bot.
-
This service does not require a backend server as the processing will be performed using ephemeral resources.
This project uses several methods and products to optimize your workflow.
- Uses a version control system (GitHub) to track your changes and collaborate with others.
- Uses a diagram as code tool (Mermaid) to draw any system design or diagram.
- Uses a Python LLM-enabled CLI (fabric) to automate summarization of a transcript.
- Uses a cloud LLM (ChatGPT) to provide an LLM API function.
- Uses a build tool (Makefile) to automate your build tasks.
- Uses a containerization platform (Docker) to run your application in any environment.
- Uses a continuous delivery pipeline (GitHub Actions) to download and summarize a YouTube transcript.
- Uses a continuous integration pipeline (GitHub Actions) to automate your Docker image build and deployment.
- Uses an artifactory (Docker Hub) to store and pull your image.
- Uses a cloud workflow automation and integration tool (Pipedream) to integrate the different services.
- Uses a cloud message application (Telegram) as a chat messaging service.
- Uses a cloud messaging bot (Telegram Bot) for a summarization as a service.
This project has several limitations:
- The Telegram Bot has a content limit for messages sent to the User.
- The pipeline job may exceed the time of the fixed delayed step in the workflow.
- The message is truncated after
40
lines and will include a link to the fileexamples/result.txt
. However, the content of this file may be overridden by another pipeline. - The workflow system does not allow conditional branches.
- A manual approval is required by at least one specified GitHub user in the CD pipeline to deploy a Docker image to Docker Hub.
- A verification step that checks the
username
with a list of allowed Telegram users in the Pipedream workflow.
This is a high level overview of the system design and resources:
-
The User sends a command with YouTube URL, i.e.
/<command> <URL>
, to a Telegram Bot that will trigger an automation workflow. -
The workflow updates the file content of this GitHub project, which will trigger a CI pipeline.
-
The CI pipeline runs the application and saves the summary in a file
examples/result.txt
. -
After a delayed step, the workflow automatically reads, and truncates the file content of the summary.
-
The workflow then replies the User with a message containing the truncated summary and a link.
%%{ init: { 'theme': 'dark' } }%%
flowchart TD
user01["Sends a command with YouTube URL"]
user02["Receives a message with truncated summary and link"]
subgraph User
start("Start Telegram Bot")-->user01-->user02-->finish("End Telegram Bot")
end
flow_step01["Telegram Bot Command Received"]
flow_step03["GitHub Update File Content"]
flow_step05["GitHub Get File Content"]
flow_step07["Telegram Bot Reply Message"]
subgraph Pipedream Workflow
user01-->flow_step01
flow_step01-->step02["Verification"]-->flow_step03-->step04("Workflow Delay")-->flow_step05-->step06("Python Truncate Summary")-->flow_step07
flow_step07-->user02
end
subgraph GitHub CI Actions
flow_step03-->job01["Setup Python on Ubuntu"]-->job02["Install Python dependencies"]-->job03["Pull custom Docker image"]-->job04["Run application and save summary"]-->flow_step05
end
subgraph GitHub CD Actions
cd01["Build Docker image"]-->cd02["Publish Docker image"]-->cd03["Manual Approval"]-->cd04["Validate Docker image"]
end
Category | Activity | User | DevSecOps |
---|---|---|---|
Installation & Configuration | Add a Telegram user to the verification step | R,A | |
Execution | Download an autogenerated subtitle file from YouTube | R,A | |
Execution | Convert a subtitle file to text | R,A | |
Maintenance & Updates | Schedule a pipeline to build Docker image | R,A |
-
Open a browser and navigate to your Pipedream console.
-
Select your project
Telegram-Bot
> Resources > Select your workflow. -
Edit the code for
Verification
> Add ausername
to the list of allowed Telegram users. -
Click Deploy to update any changes your workflow.
- Open a terminal and run the command
yt-dlp --version
to check if it is installed.
2023.10.1
- Type the following command to download the autogenerated subtitle of a YouTube video. Replace the YOUTUBE_URL with any YouTube link, e.g.
https://www.youtube.com/watch?v=x3vnCKivCjs
.
yt-dlp --write-auto-sub --skip-download [YOUTUBE_URL]
You should see an output similar to below.
[youtube] Extracting URL: https://www.youtube.com/watch?v=x3vnCKivCjs
[youtube] x3vnCKivCjs: Downloading webpage
[youtube] x3vnCKivCjs: Downloading ios player API JSON
[youtube] x3vnCKivCjs: Downloading android player API JSON
[youtube] x3vnCKivCjs: Downloading m3u8 information
[info] x3vnCKivCjs: Downloading subtitles: en
[info] x3vnCKivCjs: Downloading 1 format(s): 22
[info] Writing video subtitles to: The Fastest Way to Lose Belly Fat [x3vnCKivCjs].en.vtt
[download] Destination: The Fastest Way to Lose Belly Fat [x3vnCKivCjs].en.vtt
[download] 100% of 85.84KiB in 00:00:00 at 879.88KiB/s
The subtitle file name is created using the video title and code, e.g. The Fastest Way to Lose Belly Fat [x3vnCKivCjs].en.vtt
.
If you open the vtt
file, you'll notice that it contains metatags that makes it hard to read. You'll have to convert this vtt
file to a text file using a Python script.
Note: The Python script below has been deprecated in favour of a JSON method using
jq
.
Fortunately, there is a Python script that converts youtube subtitle file (vtt
) to plain text. Credit to glasslion for making it open-source.
- Open a terminal and run the command
jq --version && yt-dlp --version
to check if both apps are installed.
jq-1.6
2023.10.1
- In your terminal, type the following command. Replace the YOUTUBE_URL with any YouTube link, e.g.
https://www.youtube.com/watch?v=x3vnCKivCjs
.
yt-dlp --skip-download --write-auto-sub --quiet --sub-format json3 [YOUTUBE_URL]
- Type the following command to convert the subtitle file to text. Replace the JSON_FILE with your subtitle file, e.g.
The\ Fastest\ Way\ to\ Lose\ Belly\ Fat\ \[x3vnCKivCjs\].en.json3
.
jq -r '.events[]|select(.segs and.segs[0].utf8!="\n")|[.segs[].utf8]|join("")' [JSON_FILE] \
|paste -sd\ -|fold -sw60
- If successful, you should see an output similar to below.
today I'm going to share with you the absolute fastest way
to lose your belly now you could have the best willpower
the best discipline really want it really bad and never
really see any results because you're missing the technique
you're missing the strategy I'm the perfect example I took
guitar lessons for six years right as a teenager and I
never really progressed or never really went anywhere
because the techniques that were taught to me were just not
that great great the same thing happened with tennis in
college I was never taught the right technique and so I
...
- Edit the
.github/workflows/dockerhub.yml
file.
on:
push:
branches: [ main ]
paths: [ Dockerfile, .github/workflows/dockerhub.yml ]
+ schedule: [ cron: '1 0 * * 5' ] # At 01:00 on Friday
-
Run the
git
add, commit and push to update the file in GitHub repository. -
If the CD pipeline fails, you may receive a notification based on your settings.
The following resources were used as a single-use reference.
Title | Author |
---|---|
Look Mum, No Servers! Create a Telegram Bot to communicate with GitHub Actions | Dennis Lee |
How to extract closed caption transcript from YouTube video? | StackOverflow |