Exported WAV files are low quality
maxpereira opened this issue · 4 comments
It appears that the exported WAV files (produced by the `this.download` function) are only 8-bit. They sound much worse than the spoken in-browser output. I spent some time modifying both the WAV file header and the `this.buf8` and `this.buf32` functions, but couldn't find a way to make it sound better.
I haven't used the WAV-generating part of S.A.M. in ages. I'll have to check; maybe we broke something in the recent optimizations.
The resulting bit depth (8-bit) is correct, though, as S.A.M. IS an 8-bit TTS originating from the C64, so no bits have been wasted 😸
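For reference, an 8-bit WAV differs from a 16-bit one only in a few header fields (bits per sample, block align, byte rate) and in storing each sample as one unsigned byte centred on 128. The sketch below wraps a `Uint8Array` of such samples in a minimal RIFF/WAVE header. `wav8FromSamples` is a hypothetical helper, not S.A.M.'s `RenderBuffer`, and the 22050 Hz default sample rate is an assumption.

```javascript
// Hypothetical helper (not part of S.A.M.): wrap unsigned 8-bit mono PCM
// samples in a minimal RIFF/WAVE container. The 22050 Hz default is an
// assumption; pass the rate the renderer actually uses.
function wav8FromSamples(samples, sampleRate = 22050) {
  const channels = 1;
  const bitsPerSample = 8; // one unsigned byte per sample, silence = 128
  const blockAlign = (channels * bitsPerSample) / 8;
  const pad = samples.length % 2; // RIFF chunks are padded to an even size
  const bytes = new Uint8Array(44 + samples.length + pad);
  const view = new DataView(bytes.buffer);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) bytes[offset + i] = text.charCodeAt(i);
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + samples.length + pad, true); // size of everything after this field
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for plain PCM
  view.setUint16(20, 1, true); // format tag 1 = PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // byte rate
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, samples.length, true); // data size excludes the pad byte
  bytes.set(samples, 44); // copy the 8-bit samples after the header
  return bytes;
}
```

For example, `wav8FromSamples(sam.buf8('Hello world'))` would yield the bytes of a playable file, provided the sample rate matches what S.A.M. renders. Switching the header to 16-bit would also require re-encoding the samples, and even then they would still have only 256 distinct levels.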
Interesting. I wonder what is causing the difference in sound. If 8-bit output is correct, what is the difference between the following two calls from README.md:
```javascript
// Render the passed text as 8bit wave buffer array (Uint8Array).
const buf8 = sam.buf8('Hello world');
// Render the passed text as 32bit wave buffer array (Float32Array).
const buf32 = sam.buf32('Hello world');
```
Based on that sample code, I assumed that the following code from samjs.js "speaks" a high-quality 32-bit waveform but downloads a lower-quality 8-bit WAV file. Thoughts?
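```javascript
// Speaking plays the Float32Array returned by buf32.
this.speak = function (text, phonetic) {
  return PlayBuffer(this$1.buf32(text, phonetic));
};
// Downloading writes the Uint8Array returned by buf8 to a WAV file.
this.download = function (text, phonetic) {
  RenderBuffer(this$1.buf8(text, phonetic));
};
```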
Quite the opposite: we always generate a low-quality 8-bit waveform. For speaking, it is converted to 32-bit floats, which keeps its original low quality.
See https://github.com/discordier/sam/blob/master/src/sam/sam.es6#L44
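```javascript
export const SamBuffer = (input, options) => {
  // Render the 8-bit waveform; false signals a failure.
  const buffer = SamProcess(input, options);
  if (false === buffer) {
    return false;
  }
  // Widen the 8-bit samples to a Float32Array; no detail is added.
  return UInt8ArrayToFloat32Array(buffer);
}
```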
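To see why the 32-bit buffer is no better than the 8-bit one, here is a minimal sketch of an 8-bit-to-float conversion of the kind `UInt8ArrayToFloat32Array` performs. The helper names and the `(v - 128) / 128` scaling are assumptions, not taken from the S.A.M. source. Each output float comes from one of at most 256 input byte values, so the wider type adds precision but no new amplitude levels.

```javascript
// Hypothetical stand-in for S.A.M.'s UInt8ArrayToFloat32Array; the exact
// scaling used by the library is an assumption here.
function uint8ToFloat32(bytes) {
  const out = new Float32Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) {
    // Unsigned 8-bit PCM is centred on 128; map 0..255 onto -1..127/128.
    out[i] = (bytes[i] - 128) / 128;
  }
  return out;
}

// Count distinct sample values: widening cannot create new amplitude levels.
function distinctLevels(samples) {
  return new Set(samples).size;
}
```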
> They sound much worse than the spoken in-browser output.
This is impossible, given the code above: the spoken output is produced from the 8-bit waveform, not the other way around.
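The same point can be checked in the other direction: if the float samples are quantised back to bytes, the original 8-bit waveform comes back unchanged, so the 32-bit buffer used for speaking carries nothing that the 8-bit download lacks. A minimal sketch, again assuming a `(v - 128) / 128` scaling; `toFloat`, `toBytes` and `roundTripIsLossless` are hypothetical names.

```javascript
// Hypothetical check (helper names are not from S.A.M.): widen 8-bit samples
// to floats with an assumed (v - 128) / 128 scaling, then quantise back.
function toFloat(bytes) {
  return Float32Array.from(bytes, (v) => (v - 128) / 128);
}

function toBytes(floats) {
  // Clamp to the valid unsigned 8-bit range after rounding.
  return Uint8Array.from(floats, (f) =>
    Math.min(255, Math.max(0, Math.round(f * 128 + 128)))
  );
}

// True when every byte survives the float round trip unchanged.
function roundTripIsLossless(bytes) {
  const back = toBytes(toFloat(bytes));
  return back.length === bytes.length && back.every((b, i) => b === bytes[i]);
}
```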
Closing, as there has been no further feedback and this does not seem to be a bug. I could not hear any difference in quality myself (though I'm not a trained audiophile, so I might be missing something).