[Enhancement/Feature Request] Ability to Pitch-Shift A Song (Both at User and System Levels)

Question

ctlw83 opened this issue 3 years ago · 5 comments

Older Karaoke tracks from DK and other Karaoke providers weren't produced at the original pitch of the song. Singers also sometimes want to pitch-shift songs to fit their own vocal ranges.

Overall Request

Give the admin the ability to set a pitch shift for a song that automatically carries over to any user who selects it, so that the song plays at its original pitch even if the karaoke track company produced it at a different pitch.
- Create a database column or table to track the system-level pitch setting, if any, for each track
- If no pitch change is set, the system should assume no pitch shift is required and play the track as produced
- Pitch shift should not change the playback speed of the song

Give users the ability to set their own preferred pitch shift per song
- Create a database column or table associated with the user account to track and remember their preferred pitch shift
- User pitch shift settings take precedence over any system-level pitch shift
- If no user pitch shift is set, the system should use the system-level pitch shift if one is set, or otherwise play the song normally
- Pitch shift should not change the playback speed of the song
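
The precedence rules above (user setting first, then system setting, then no shift) could be sketched roughly as follows. This is only an illustration: the `PitchSettings` shape and its field names are hypothetical, not the project's actual schema, and it assumes shifts are stored in semitones and that a user value of 0 means "play the track as produced".

```typescript
// Hypothetical shapes: field names are illustrative, not the project's schema.
type Semitones = number;

interface PitchSettings {
  /** Admin-set correction for the track; null/undefined when unset. */
  systemShift?: Semitones | null;
  /** The singer's saved preference for this song; null/undefined when unset. */
  userShift?: Semitones | null;
}

/** Returns the shift to apply: user setting, else system setting, else 0. */
function resolvePitchShift(settings: PitchSettings): Semitones {
  const { systemShift, userShift } = settings;
  // Check for "unset" explicitly rather than by truthiness, so that a
  // user value of 0 (assumed to mean "as produced") still takes precedence.
  if (userShift !== null && userShift !== undefined) return userShift;
  if (systemShift !== null && systemShift !== undefined) return systemShift;
  return 0;
}
```

Keeping "unset" distinct from 0 in storage (for example, a nullable column) is what lets a user opt out of a system-level correction.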

Answer 1 · 2022-04-15T04:48:24.000Z

Definitely want this feature! I like the global + user pitch offset as well.

The good news is that we're already using WebAudio for gain-related things. I haven't seen a WebAudio implementation of pitch shifting that maintains tempo in a musical (low-artifact) way, but it's been a while since I last researched this.

Answer 2 · 2022-04-21T07:19:47.000Z

I am not sure whether this is possible here, but in our PiKaraoke implementation (https://github.com/xuancong84/pikaraoke), it is done by invoking VLC media player with --drawable-xid (or --drawable-hwnd on Windows), which takes the handle of the target window. In other words, VLC can draw into another application's window. You could try getting the browser window's handle and passing it to VLC. But if you don't need video, it should be simpler and more straightforward.

The rest is simple: just relaunch VLC (with the new pitch offset) and resume playing at the previous seek offset.

Answer 3 · 2022-04-21T23:13:54.000Z

That's pretty neat @xuancong84, thanks for sharing! In our case the player is 100% in-browser, but I did find a WebAudio implementation that I think has potential: https://github.com/olvb/phaze

Looks like it'll need some work to properly step up/down in semitones. Anyone want to have a go at it? :)
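
For reference, stepping in semitones usually means converting the step count to a frequency ratio of 2^(n/12) (12-tone equal temperament). A minimal sketch, assuming the shifter ultimately takes a ratio-style pitch factor; the function names are illustrative and not part of phaze or any other library:

```typescript
// Hypothetical helpers; names are illustrative, not from phaze or any library.

/** Frequency ratio for a shift of `semitones` in 12-tone equal temperament. */
function semitonesToRatio(semitones: number): number {
  return Math.pow(2, semitones / 12);
}

/** Inverse: the semitone offset that corresponds to a frequency ratio. */
function ratioToSemitones(ratio: number): number {
  if (ratio <= 0) throw new RangeError("ratio must be positive");
  return (12 * Math.log(ratio)) / Math.LN2;
}
```

With this mapping, +12 semitones doubles the frequency (one octave up) and -12 halves it, so a stored per-song offset in whole semitones can be converted to a factor at playback time.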

Answer 4 · 2023-01-03T20:27:22.000Z

When I was developing a Karaoke Player for Windows 8, I implemented a pitch-shifting algorithm (unreleased). The method I used was to create a Windows Media Foundation Transform driver that was injected into the stream graph when the app loaded. Clearly that is impossible for iPhones or other devices that don't have Windows MFT support, and possibly also impossible for a browser, even in Windows browsers.

The mechanism is relatively easy to understand. The MP3 data is an audio stream that has gone through a Fast Fourier Transform (FFT), been saved in the frequency domain, and then compressed. When the MP3 file is played, it is first decompressed (which is important because some MP3 files are compressed with a variable bit rate) into a constant-rate byte stream. You end up with a byte stream of pairs of values for each channel containing a volume and frequency value. That byte stream is then sent to a PCM encoding filter graph. The PCM encoding uses an FFT to convert the frequency-domain byte streams back into the time domain, and that goes to the hardware (digital-to-analog converters, amplifier circuitry, etc.).

Some hardware chipsets handle MP3 data directly, and the software PCM encoder step isn't used. This poses a problem: you have to force the stream to be decoded in software so that you can modify the byte stream while it's still in the frequency domain. If you can do that, then it's simple. You just offset the frequency values up or down mid-stream and send the result on to the PCM encoder or the hardware (if it supports direct MP3 streams).

It must be possible, because YouTube and other websites are able to adjust the playback speed of videos without changing the pitch. So they have to be decoding the compressed MPEG audio data streams into a constant byte rate and simply setting a different byte rate for the downstream filters, which leaves the pitch unaffected since they modify the stream before it is converted to PCM.

That all being said, I don't have any experience in JS programming and have no idea how to even begin to implement that kind of thing in a browser. But if someone can find a way to get a byte stream of uncompressed frequency domain MPEG audio data, in JS, then the pitch shifting is pretty simple.

Answer 5 · 2023-01-04T19:39:10.000Z

Good info! Take a look at the link in my previous post. That'll work for any (web) audio stream, so it's probably the way forward. Might be fun to hack on :)