Convert research papers to audio files. Currently, it reads one paragraph at a time while generating audio files for the entire file in the cache directory.
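As a rough illustration of the paragraph-at-a-time idea, here is a minimal sketch of how paragraphs could be mapped to files in a cache directory. The function names, the content-hash file naming, and the `.mp3` suffix are assumptions made for this example; they are not taken from the paper2audio code.

```python
import hashlib
from pathlib import Path


def split_paragraphs(text: str) -> list[str]:
    """Split text on blank lines and drop empty chunks."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def cache_path(cache_dir: Path, paragraph: str, suffix: str = ".mp3") -> Path:
    """Map a paragraph to a stable file name derived from its content (hypothetical scheme)."""
    digest = hashlib.sha256(paragraph.encode("utf-8")).hexdigest()[:16]
    return cache_dir / f"{digest}{suffix}"


def plan_audio_files(text: str, cache_dir: Path) -> list[tuple[str, Path]]:
    """Pair every paragraph with the cache file its audio would be stored in."""
    return [(p, cache_path(cache_dir, p)) for p in split_paragraphs(text)]
```

With a plan like this, a player could read the first paragraph as soon as its file exists while the remaining files are still being generated.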
Important
The repo is hosted on Codeberg, a true FOSS alternative to GitHub. The GitHub version is a mirror. Please contribute or open issues on Codeberg when possible. Why give up on GitHub?
- Layout analysis for noise removal (native PDF files only). The following elements are removed by default:
  - Footnote
  - Page-header
  - Page-footer
  - Table
  - Formula
  - Picture
- Rule-based noise reduction:
  - References
  - Citations
- TTS with GCP (this requires an active GCP project with the TTS feature enabled; make sure you are aware of the cost of doing so)
- Convert the paper to a beautiful yet minimal HTML file so you can use whichever TTS you prefer
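The noise removal steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the `(label, text)` block shape, the function names, and the regular expressions are assumptions made for this example. Only the six removed labels come from the list above.

```python
import re

# Layout labels removed by default, as listed in the feature list.
NOISE_LABELS = {"Footnote", "Page-header", "Page-footer", "Table", "Formula", "Picture"}

# Hypothetical patterns for inline citations such as "[12]", "[3, 4]" or "(Chen et al., 2024)".
NUMERIC_CITATION = re.compile(r"\s*\[\d+(?:\s*[,-]\s*\d+)*\]")
AUTHOR_YEAR_CITATION = re.compile(r"\s*\([A-Z][^()]*?,? \d{4}[a-z]?\)")


def drop_noise_blocks(blocks):
    """Keep (label, text) blocks whose layout label is not in NOISE_LABELS."""
    return [(label, text) for label, text in blocks if label not in NOISE_LABELS]


def strip_citations(text):
    """Remove inline numeric and author-year citations from a paragraph."""
    text = NUMERIC_CITATION.sub("", text)
    return AUTHOR_YEAR_CITATION.sub("", text)


def cut_references(paragraphs):
    """Drop everything from the first 'References' or 'Bibliography' heading onward."""
    for i, paragraph in enumerate(paragraphs):
        if paragraph.strip().lower() in {"references", "bibliography"}:
            return paragraphs[:i]
    return paragraphs
```

Simple rules like these will miss unusual citation styles and may occasionally remove a legitimate parenthetical, so treat them as a starting point.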
python -m paper2audio to-audio "/Users/chenghao/Zotero/storage/QFKMKFMV/Chen et al. - 2024 - Orion-14B Open-source Multilingual Large Language Models.pdf"
python -m paper2audio to-html "/Users/chenghao/Zotero/storage/QFKMKFMV/Chen et al. - 2024 - Orion-14B Open-source Multilingual Large Language Models.pdf" --output "output.html"
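To give a feel for what a minimal HTML output involves, here is a small sketch that wraps cleaned paragraphs in a bare page using only the standard library. The `render_html` function and the page layout are assumptions made for this example; the file produced by `to-html` may look different.

```python
from html import escape


def render_html(title, paragraphs):
    """Return a bare-bones HTML page with one <p> element per paragraph."""
    body = "\n".join(f"    <p>{escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        "  <head>\n"
        '    <meta charset="utf-8">\n'
        f"    <title>{escape(title)}</title>\n"
        "  </head>\n"
        "  <body>\n"
        f"{body}\n"
        "  </body>\n"
        "</html>\n"
    )
```

Escaping the text keeps characters such as `<` and `&` from the paper from being interpreted as markup, which matters when a browser or screen reader consumes the page.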
Feel free to check out the HTML output in examples. Here are some example preview links:
Thanks to pierreguillou for the layout model.
MIT