paper2audio

Convert research papers to audio files. Currently, it reads the paper aloud one paragraph at a time while generating audio files for the entire paper in the cache directory.

Important

The repo is hosted on Codeberg, a true FOSS alternative to GitHub. The GitHub version is a mirror. Please contribute or open issues on Codeberg when possible. Why give up on GitHub?

Features

Layout analysis for noise removal (native PDF files only). The following elements are removed by default:
- Footnote
- Page-header
- Page-footer
- Table
- Formula
- Picture
Rule-based noise reduction:
- References
- Citations
TTS with GCP (this requires an active GCP project with the TTS feature enabled; make sure you are aware of the cost of doing so)
Convert the paper to a beautiful yet minimal HTML file so you can use whatever TTS you prefer
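
The two noise-removal features above can be pictured with a minimal sketch. It is not the project's actual implementation: the `(label, text)` block format, the `clean_blocks` helper, and all three regular expressions are assumptions made for illustration. Only the set of removed labels comes from the list above.

```python
import re

# Labels listed above as removed by default.
NOISE_LABELS = {"Footnote", "Page-header", "Page-footer", "Table", "Formula", "Picture"}

# Hypothetical patterns; the real rules in paper2audio may differ.
NUMERIC_CITATION = re.compile(r"\s*\[\d+(?:\s*[,\-–]\s*\d+)*\]")
AUTHOR_YEAR_CITATION = re.compile(r"\s*\([A-Z][^()]*?\b(?:19|20)\d{2}[a-z]?\)")
REFERENCES_HEADING = re.compile(r"^\s*(?:references|bibliography)\s*$", re.IGNORECASE)


def clean_blocks(blocks):
    """Return readable text from (label, text) pairs produced by a layout model.

    The pair format is an assumption for this sketch.
    """
    kept = []
    for label, text in blocks:
        if label in NOISE_LABELS:
            continue  # layout-based removal
        if REFERENCES_HEADING.match(text):
            break  # rule-based: drop the reference list and everything after it
        text = NUMERIC_CITATION.sub("", text)      # e.g. " [1, 2]"
        text = AUTHOR_YEAR_CITATION.sub("", text)  # e.g. " (Chen et al., 2024)"
        kept.append(text.strip())
    return kept
```

Stopping at the references heading is a deliberately blunt rule: it would also drop any appendix that follows the reference list, which may or may not be what you want for listening.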

Usage

Convert a paper to audio files

python -m paper2audio to-audio "/Users/chenghao/Zotero/storage/QFKMKFMV/Chen et al. - 2024 - Orion-14B Open-source Multilingual Large Language Models.pdf"

Convert a paper to an HTML file

python -m paper2audio to-html "/Users/chenghao/Zotero/storage/QFKMKFMV/Chen et al. - 2024 - Orion-14B Open-source Multilingual Large Language Models.pdf" --output "output.html"
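
To convert a whole folder of papers, the `to-html` command above can be wrapped in a short script. This is a sketch under assumptions: `build_command` and `convert_folder` are hypothetical helpers, not part of paper2audio, and the only CLI arguments used are the ones shown above (`to-html`, the PDF path, and `--output`).

```python
import subprocess
import sys
from pathlib import Path


def build_command(pdf_path, output_path):
    """Build the same to-html invocation shown above for one PDF."""
    return [
        sys.executable, "-m", "paper2audio", "to-html",
        str(pdf_path), "--output", str(output_path),
    ]


def convert_folder(folder, out_dir, run=subprocess.run):
    """Convert every PDF in `folder`, writing one HTML file per paper.

    `run` is injectable so the loop can be exercised without invoking the CLI.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        target = out_dir / (pdf.stem + ".html")
        run(build_command(pdf, target), check=True)  # raises if a conversion fails
        outputs.append(target)
    return outputs
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces, such as the Zotero path in the example above.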

Examples

Feel free to check out the HTML output in examples. Here are some example preview links:

Orion-14B Open-source Multilingual Large Language Models (HTML, PDF)

Acknowledgement

Thanks to pierreguillou for the layout model.

License

MIT

ChenghaoMou/paper2audio