DrewThomasson/ebook2audiobook

Feature Requests!

Closed this issue · 26 comments

Obviously no expectation that you do these but here are some requests!

  1. Batch processing - Allow for loading in multiple books so that the app starts working on the next book immediately
  2. Do not reset the UI on refresh - Right now if I mistakenly reload the web UI, it resets to the default screen (even though the tts is still working in the terminal). This means I can't download the file once it is done from the web UI (have to find it in the Docker files lol)
  3. Ability to set a default output folder for the audiobooks (automatic downloading)

Thx!

  1. Done in the next version.
  2. Still to do. I started developing it but wasn't happy with the result and reverted the change. We need to dig deeper to find a good compromise.
  3. Done in the next version: all conversions will go to the output folders audiobooks/cli and audiobooks/gui/[gradio|host].
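
For readers following along, the batch behaviour from request 1 could be sketched roughly as below. This is not the project's implementation: `convert_batch` and `fake_convert` are hypothetical names, and `fake_convert` only stands in for whatever the app really does to turn one ebook into one audiobook.

```python
from collections import deque
from pathlib import Path
from typing import Callable, Iterable, List


def convert_batch(books: Iterable[str], convert_book: Callable[[Path], Path]) -> List[Path]:
    """Convert books one after another, starting the next as soon as one finishes."""
    queue = deque(Path(b) for b in books)
    finished: List[Path] = []
    while queue:
        book = queue.popleft()
        try:
            finished.append(convert_book(book))
        except Exception as err:
            # Keep going so that one bad file does not stall the whole batch.
            print(f"skipped {book.name}: {err}")
    return finished


def fake_convert(book: Path) -> Path:
    # Hypothetical stand-in for the real conversion: it only derives an output name.
    return Path("audiobooks") / (book.stem + ".m4b")
```

Calling `convert_batch(["a.epub", "b.epub"], fake_convert)` walks the queue in order and returns the output paths of the books that converted.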

Awesome! Another huge addition would be the ability to use the new F5-TTS model. It seems to have fewer hallucinations.

Good idea! 👍

  • We do plan on integrating other TTS services into ebook2audiobook.
  • But it's being put on the back burner until we figure out a standardized way of integrating many different TTS engines into ebook2audiobook

Best of luck!

About request #2 though,

#38

  • Are you not able to click the download audiobooks button in the GUI to load download links to all the generated audiobook files in the output folder?

  • Cause that button should just show everything in the output folder? Irrespective of whether the web GUI session was reloaded or not? 🤔

Example from the Hugging Face GUI demo ⬇️

image

image

Okay that is awesome, I never thought about clicking that button 😅. Maybe those should load automatically or work like a drop down menu in the UI? Or maybe I should learn to read better lol

@scriptpony that's also on the way in the next version

Short answer: Gradio is extremely restrictive about reloading anything in the GUI.

  • Easiest is to just give the user a reload/update button at the moment lol
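
A minimal sketch of why a reload/update button sidesteps the session problem: the handler just re-reads the output folder from disk each time it is clicked, so it does not matter whether the page was refreshed. The function name and the set of file suffixes are assumptions for illustration, not the project's actual code.

```python
from pathlib import Path
from typing import List

# Assumed set of output formats; the real app may produce others.
AUDIO_SUFFIXES = {".m4b", ".mp3", ".wav"}


def list_converted_files(output_dir: str) -> List[str]:
    """Return the audio files currently in output_dir, sorted by path."""
    folder = Path(output_dir)
    if not folder.is_dir():
        return []  # nothing converted yet, or the folder was never created
    return sorted(
        str(p) for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
    )
```

A function like this could be used as the click handler of the button, with its return value fed to a file-list component, since it depends only on the folder contents and not on any per-session state.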

I'll look at renaming the button to make it clearer what it does tho lol,

  • That would probs help

Perhaps renamed to

"View Completed Audiobooks"? 🤔

  • Or "View Created Audiobooks?"

What do you think? Any suggestions?

or show converted files

Show converted files or audiobooks would make most sense to me!

Changed to "Show Converted Files"

for this git shard

905cb21

:)

After a lot of trial and error I was able to modify your code to get it working with F5-TTS!

I am running into issues getting it on Hugging Face (I am new to this) but here it is if you wanted to check it out:
https://drive.google.com/file/d/1tWWxoPX66JOoHcZVkf3idx9ZNpU28g9n/view?usp=sharing

👀

If you make a fork of this repo and then apply your changes, it'll be easier for us to see the changes you made to it

In the meantime we're hard at work on a version 2.0 where the code base is heavily modified 😅

I'll look into your code tho to see how it performs!

We'll probs look into asking you for help in the future when adding it smoothly to ebook2audiobook v2.0

I'll give you a heads up on the functions we would need you to make once we get our v2.0 stuff finalized 😅

In the meantime, here's me testing out your code as a Hugging Face Space

https://huggingface.co/spaces/drewThomasson/ebook2audiobook_F5-TTS

I'll probs do some testing with it tomorrow to see if it works

Awesome! Glad it works on hf. Fair warning the code is hot garbage ChatGPT slop but it works so 🤷‍♂️. I just wanted to get something working asap

Super slow on CPU btw, actually idk if it even works on CPU I didn't test that

It works well on CPU: much slower for sure, but OK. Feature 2 is implemented in v2.0.0, but only for a crash on the server side, not the client. That means if the Gradio client loses the connection while the process is still running on the server, it's still not possible to reconnect Gradio to the same process, and I'm not sure that's possible without low-level access to the OS.

NEXT VERSION!

https://github.com/jondana/eBook_to_Audiobook_with_F5-TTS/tree/main

  • Batch convert eBooks
  • Pull the cover image correctly and place it in the mp3 "icon" slot (I was having trouble with your .m4b files stopping randomly in my audiobook player; mp3 seems more stable)
  • Updated inferencing for more natural phrasing with better pauses
  • UI improvements

I spent way too much time trying to improve the progress bar to show wav stitching but it didn't work out
I also tried to make an estimated time to completion feature but that also didn't work out (I feel like Gradio should have that feature built in?? total steps/seconds to complete each step)
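
The "total steps / seconds per step" idea can be sketched as a simple average-based estimate. This is only an illustration with a hypothetical function name; it assumes every step takes roughly the same time, which may not hold when sentence lengths vary a lot.

```python
def estimate_remaining_seconds(steps_done: int, total_steps: int, elapsed_seconds: float) -> float:
    """Estimate time left from the average duration of the steps finished so far."""
    if steps_done <= 0:
        raise ValueError("need at least one finished step to estimate")
    average_per_step = elapsed_seconds / steps_done
    return average_per_step * (total_steps - steps_done)


# 25 of 100 steps finished after 50 seconds: 2 s per step, 75 steps left.
print(estimate_remaining_seconds(25, 100, 50.0))  # 150.0
```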

Don't spend too much effort on the current version. The next version is a full code rebase, and integrating what you did into v2 would cause a lot of complications. Better to wait for v2, then you can push a PR.

Sadly agreed 😅

Sounds good! I am definitely done for now. Excited to see what you guys come up with

Updated the Hugging Face demo of your code with your new app.py in the meantime though

https://huggingface.co/spaces/drewThomasson/ebook2audiobook_F5-TTS

Is point 3 already implemented?

  3. Ability to set a default output folder for the audiobooks (automatic downloading)

@leet1994 yes, it's set in ./lib/conf.py; change it to your own folder.
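
A sketch of what that edit might look like, assuming conf.py defines the folder as a plain module-level string named `audiobooks_dir` (the variable name given later in this thread). The `~/Audiobooks` path is only an example, and the real file may build the value differently.

```python
import os

# Assumption: conf.py holds the output folder as a simple string variable.
# Replace the example path with your own folder.
audiobooks_dir = os.path.abspath(os.path.join(os.path.expanduser("~"), "Audiobooks"))
```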

@quantumlump

  1. DONE
  2. DONE, but after a refresh you have to restart the conversion; it will then resume automatically from the next good sentence.
  3. DONE: check the "audiobooks_dir" variable in ./lib/conf.py.
    Be aware that the next git update will modify conf.py, so you will have to set it again.
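
The thread does not say how the resume in item 2 is implemented. One common way to get that behaviour, sketched here under that caveat, is to write one audio file per sentence and skip any sentence whose file already exists when the conversion is restarted. All names below (`convert_with_resume`, `synthesize`, the `sentence_00000.wav` naming) are hypothetical.

```python
from pathlib import Path
from typing import Callable, List


def convert_with_resume(sentences: List[str], work_dir: str,
                        synthesize: Callable[[str, Path], None]) -> int:
    """Synthesize only sentences without an existing output; return how many were generated."""
    folder = Path(work_dir)
    folder.mkdir(parents=True, exist_ok=True)
    generated = 0
    for index, sentence in enumerate(sentences):
        target = folder / f"sentence_{index:05d}.wav"
        if target.exists() and target.stat().st_size > 0:
            continue  # already produced by an earlier run
        synthesize(sentence, target)  # hypothetical TTS call that writes the file
        generated += 1
    return generated
```

Restarting after an interruption then only redoes the sentences that are missing or empty, instead of the whole book.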