This app creates podcast-style content from the files you provide, such as a paper, a lecture, a project description, or a personal resume.
I also wrote a blog post about this project; make sure to check out "How to use generative AI to create podcast-style content from any input".
- Provide one or more files.
- Optionally, customize the voices of the guest and the host; you can check out the voice samples here.
- Click on "Generate podcast" and wait a few moments.
- Play the audio, and feel free to follow along with the text transcript.
Podcast generated from my other project "AI beats"
podcast-ai_beast.mp4
Podcast generated from my other project "AI trailer"
podcast-ai_trailer.mp4
podcast-andrew_hubermans.mp4
podcast-resume.mp4
- Clone the GitHub repository
git clone https://github.com/dimitreOliveira/PodfAI.git
cd PodfAI
- Create a new venv
python -m venv .venvs/podfai
- Activate the venv
source .venvs/podfai/bin/activate
- Install the requirements
make build
Alternatively, you can also install them using pip:
pip install -r requirements.txt
- Set up the Google API dependencies
- Follow this guide or this other one.
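For local development, the guides above typically boil down to authenticating with the gcloud CLI. A typical flow looks like the sketch below; `YOUR_PROJECT_ID` is a placeholder for your own Google Cloud project:

```shell
# Create application-default credentials for local development
gcloud auth application-default login
# Point the SDK at your Google Cloud project (placeholder ID)
gcloud config set project YOUR_PROJECT_ID
```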
To start the app, run the Make command below:
make app
Alternatively, you can also run it using plain Python:
streamlit run src/app.py
Feel free to change the default configs to adjust the app's behavior to your needs.
vertex:
  project: {VERTEX_AI_PROJECT}
  location: {VERTEX_AI_LOCATION}
transcript:
  model_id: gemini-1.5-pro-002
  transcript_len: 5000
  max_output_tokens: 8192
  temperature: 1
  top_p: 0.95
  top_k: 32
- vertex
  - project: Project name used by Vertex AI.
  - location: Project location used by Vertex AI.
- transcript
  - model_id: Model used to create the podcast transcript.
  - transcript_len: Suggested transcript length.
  - max_output_tokens: Maximum number of tokens generated by the model.
  - temperature: Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that expect a true or correct response, while higher temperatures can lead to more diverse or unexpected results. With a temperature of 0, the highest-probability token is always selected.
  - top_p: Top-p changes how the model selects tokens for output. Tokens are selected from most probable to least until the sum of their probabilities equals the top-p value. For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1 and the top-p value is 0.5, then the model will select either A or B as the next token (using temperature).
  - top_k: Top-k changes how the model selects tokens for output. A top-k of 1 means the selected token is the most probable among all tokens in the model's vocabulary (also called greedy decoding), while a top-k of 3 means that the next token is selected from among the 3 most probable tokens (using temperature).
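To make these options concrete, here is a minimal sketch of how they could map onto a Vertex AI call with the Python SDK. This is illustrative only: the project ID, location, and prompt are placeholders, and the actual wiring lives in src/app.py.

```python
# Illustrative sketch: how the config values above map onto a Vertex AI
# request. Project, location, and prompt are placeholders.
import vertexai
from vertexai.generative_models import GenerationConfig, GenerativeModel

vertexai.init(project="your-project", location="us-central1")  # vertex.project / vertex.location

model = GenerativeModel("gemini-1.5-pro-002")  # transcript.model_id
response = model.generate_content(
    "Write a podcast transcript between a host and a guest about ...",
    generation_config=GenerationConfig(
        max_output_tokens=8192,  # max_output_tokens
        temperature=1.0,         # temperature
        top_p=0.95,              # top_p
        top_k=32,                # top_k
    ),
)
print(response.text)
```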
- Support voice cloning
- Support other languages
- Support other input types (images, videos, YouTube URLs)
- Add example notebook to run in Colab
- Reproduce workflow with open-source models
- Experiment with agentic workflows to improve the podcast transcript
- Google Cloud - Text-to-Speech client libraries
- Set up Google Cloud TTS locally
- Google Cloud TTS Voice List
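For reference, speech synthesis with the Google Cloud TTS client boils down to something like the sketch below. The voice name is just one example from the public voice list linked above, not necessarily the app's default.

```python
# Minimal Google Cloud TTS sketch; the voice name is an example from the
# public voice list, not necessarily what this app uses by default.
from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()
response = client.synthesize_speech(
    input=texttospeech.SynthesisInput(text="Welcome to the show!"),
    voice=texttospeech.VoiceSelectionParams(
        language_code="en-US",
        name="en-US-Neural2-D",
    ),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3,
    ),
)
with open("speech.mp3", "wb") as f:
    f.write(response.audio_content)
```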
If you are interested in contributing to this project, thanks a lot! Before creating your PR, make sure to lint your code by running the command below:
make lint
- Google Cloud credits are provided for this project. This project was made possible thanks to the support of Google's ML Developer Programs team.
- This project was based on Google's NotebookLM, which, aside from podcast-style content, has many other features; make sure to check it out.