jeff-hughes/shellcaster

Option to convert downloaded episode into another format

Opened this issue · 5 comments

First of all, thank you for the application!

I've been wondering whether it would be possible to add an option to automatically convert a podcast episode to another format after it is downloaded, using an external program such as ffmpeg.

I've been listening to a podcast whose episodes are MP3 files with a constant bitrate of 192 kb/s. After I downloaded one of them (which is 89 minutes long) with shellcaster, its file size turned out to be ~123 MB. I converted it to Ogg Opus with the command ffmpeg -i "ep02.mp3" -ab 50k "ep02.opus", and the new file was ~32 MB with a variable bitrate of about 50 kb/s. Since the podcast is mostly speech, there's hardly any noticeable degradation in quality (if any), and yet the file is much smaller, which makes a big difference when a lot of episodes are stored locally.
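As a rough sanity check on those numbers, the size of an audio file is approximately duration times average bitrate. The sketch below assumes that "MB" above means MiB (1024 × 1024 bytes) and ignores container and metadata overhead; the function name is made up for illustration.

```python
def audio_size_bytes(minutes: float, kbit_per_s: float) -> float:
    """Approximate payload size for a given duration and average bitrate."""
    seconds = minutes * 60
    bits = seconds * kbit_per_s * 1000
    return bits / 8


MIB = 1024 * 1024

mp3 = audio_size_bytes(89, 192)   # 89-minute episode, 192 kb/s MP3
opus = audio_size_bytes(89, 50)   # same episode, ~50 kb/s Opus

print(f"MP3 at 192 kb/s: {mp3 / MIB:.1f} MiB")
print(f"Opus at 50 kb/s: {opus / MIB:.1f} MiB")
print(f"Opus/MP3 size ratio: {opus / mp3:.2f}")
```

This gives roughly 122 MiB and 32 MiB, which is in line with the sizes I saw, so the saving is about what the bitrate ratio (50/192) predicts.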

While I could convert them manually, of course, being able to initiate playing or delete the converted files from shellcaster would be more convenient.

Thanks for the suggestion! In principle, I like the idea. But I do foresee a few practical hurdles:

  1. The application stores the filepath in the database, so any conversion process would either need to keep the same filename, or else communicate the new one back to shellcaster to update the database. This isn't a dealbreaker, but it does largely rule out a general-purpose "put whatever compression command you want to use here and shellcaster will run it when files are downloaded" option.
  2. There's a lot of variability when it comes to podcasts. Sure, most podcasts are speech, but many have music for theme songs, background music, etc. Plus there are literally podcasts about music. In addition, while the vast majority of podcasts are .mp3 files, there are also "vodcasts" that involve video files. So when choosing a compression format, it would be important not to optimize for speech in a way that might degrade other forms of audio. And at a minimum it would need to skip over video files (or else have a separate compression method for them).
  3. It's easy enough to have a general flag in the config file to choose whether to use compression or not at a global level. But given the above about speech/music/video, I can immediately foresee that the next GitHub issue that someone is going to open here is "can we turn compression on and off for individual podcasts?" And that's just... a whole lot of extra work, and I'd have to think through how to do that in a reasonable way.
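To make points 2 and 3 concrete, here is a minimal Python sketch of the decision that would have to be made per file. It is not shellcaster code: the function name, the extension list, and the idea of a per-podcast override are all hypothetical, and a real implementation might look at the enclosure's MIME type from the feed rather than the file extension.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical list of audio extensions worth re-encoding. Video files
# (.mp4, .mkv, ...) and files that are already Opus are left out on purpose.
CONVERTIBLE_AUDIO_EXTS = {".mp3", ".m4a", ".aac", ".ogg", ".flac", ".wav"}


def should_convert(
    path: str,
    enabled_globally: bool,
    per_podcast: Optional[bool] = None,
) -> bool:
    """Decide whether a downloaded file should be re-encoded.

    `per_podcast` is an assumed per-podcast override: None means
    "fall back to the global setting".
    """
    enabled = enabled_globally if per_podcast is None else per_podcast
    if not enabled:
        return False
    ext = PurePosixPath(path).suffix.lower()
    return ext in CONVERTIBLE_AUDIO_EXTS


print(should_convert("ep02.mp3", enabled_globally=True))
print(should_convert("talk.mp4", enabled_globally=True))
print(should_convert("ep02.mp3", enabled_globally=True, per_podcast=False))
```

Even this small version needs a place to store the per-podcast setting, which is the extra work mentioned in point 3.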

Those are the issues I can foresee with this, but none of those are absolute dealbreakers. If there's a reasonable format that works well for both speech and music, I'd be open to implementing it. I would just need to play around with audio formats/bitrate/compression with a variety of audio to test what would work well across a wide range. Audio engineering is not my specialty, so if you have any thoughts on that, I'd be happy to hear them.

> I would just need to play around with audio formats/bitrate/compression with a variety of audio to test what would work well across a wide range. Audio engineering is not my specialty, so if you have any thoughts on that, I'd be happy to hear them.

I wouldn't call myself an expert in this field either, but I've experimented with converting plenty of lossless CD audio music to various formats, including Ogg Vorbis and Opus.

For finding the right bitrate for Opus, I started with reading the settings recommended for stereo music here. In my experience, having a dedicated soundcard (a SoundBlaster Live 24-bit from 2003), a Yamaha AV receiver and a custom-made speaker set, I had to increase the bitrate to as much as ~180-208 kbps depending on the music style and particular songs before I couldn't notice any difference at all. I temporarily switched to an onboard soundcard (because my PC was upgraded and my old PCI soundcard was no longer compatible), and the quality was just as if I had been listening to MP3. I switched to HDMI, and the quality difference was just like with the SB Live! soundcard.

Vorbis is another decent (and also free) format, which has a better bitrate/quality ratio compared to MP3, but it's still not as good as that of Opus. (I think Opus was actually meant to be a successor to Vorbis.) I would only use this format if the software in question doesn't support Opus for some reason (which should be very rare in 2022).

To sum up, I think Opus has the best bitrate/quality ratio and is very usable for storing both speech and music. Finding the right bitrate depends a lot on your current hardware setup and your personal preferences.

PS.: If the podcast has extensive music in it (and it's also the main topic of the podcast), then I probably would skip conversion to preserve the quality as it was.

What if a callback were added that, similar to the play command, runs a command with some interpolation? For example:

on_download: convert %path

That way people can specify a script or command that can convert downloaded episodes.

@a-kenji, that was an option I had considered, but as I mentioned above, the issue is that the filepath needs to be stored in the database. The "play" action just needs to pass the filename on to the command that's stored in the config. But if you want to convert from an MP3 file to, say, an Opus file, you'd end up either (a) having a bunch of Opus files that still have the ".mp3" extension, or (b) renaming the file, after which shellcaster no longer knows where the correct file is. I don't think either of those is an ideal scenario.

It might be possible to set up the config like so:

on_download: ffmpeg -i %path.mp3 <other options> %path.opus
download_output: %path.opus

to let shellcaster know what the final path is, but that seems not particularly elegant, and also doesn't provide any flexibility for dealing with different types of files (e.g., as I mentioned above, video files, non-mp3 audio, etc.).
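For illustration only, here is a rough Python sketch of how those two config values could be expanded. None of this exists in shellcaster: the function names are invented, %path is assumed to mean the downloaded path without its extension, and "-b:a 50k" stands in for the unspecified ffmpeg options.

```python
import shlex
from pathlib import PurePosixPath
from typing import List, Tuple


def expand(template: str, stem: str) -> List[str]:
    """Split a command template into arguments, then substitute %path.

    Substituting after splitting keeps paths containing spaces intact.
    """
    return [token.replace("%path", stem) for token in shlex.split(template)]


def plan_conversion(
    downloaded: str, on_download: str, download_output: str
) -> Tuple[List[str], str]:
    """Return the command to run and the path to store in the database."""
    # Assumption: %path is the downloaded path with its extension removed.
    stem = str(PurePosixPath(downloaded).with_suffix(""))
    argv = expand(on_download, stem)
    final_path = download_output.replace("%path", stem)
    return argv, final_path


argv, final_path = plan_conversion(
    "/podcasts/My Show/ep02.mp3",
    "ffmpeg -i %path.mp3 -b:a 50k %path.opus",
    "%path.opus",
)
print(argv)
print(final_path)
# A caller would then run argv (e.g. subprocess.run(argv, check=True)) and
# store final_path only if the command succeeded and the file exists.
```

The sketch also shows the limitation: the ".mp3" in the template is hard-coded, so a download that arrives as .m4a or .mp4 would produce a command pointing at a file that doesn't exist.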

Oh yes, that makes sense. I somehow missed that in your original reply.
Thanks for elaborating!