discordier/sam

Exported WAV files are low quality

maxpereira opened this issue · 4 comments

It appears that the exported WAV files (created with the this.download function) are only 8-bit. They sound much worse than the spoken in-browser output. I spent some time modifying the WAV file header as well as the this.buf8 and this.buf32 functions, but I couldn't find a way to make the export sound better.

I have not used the WAV generating part of S.A.M. for ages. I have to check, maybe we broke something lately in the optimizations.

The resulting bit depth (8 bit) is correct though, as S.A.M. IS an 8-bit TTS originating from the C64, and therefore no bits have been wasted 😸
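
For reference, an 8-bit PCM WAV file stores unsigned samples (silence at 128), and the bits-per-sample field in the header has to match the data. Changing only the header to 16 or 32 bits makes a player misread the same bytes instead of improving them. Below is a minimal sketch of such a header. The helper name `wrapAsWav8` is hypothetical and the 22050 Hz default sample rate is an assumption for illustration; this is not the library's own export code.

```javascript
// Hypothetical helper: wraps unsigned 8-bit mono PCM samples in a 44-byte WAV header.
// The default sample rate of 22050 Hz is an assumption made for this sketch.
function wrapAsWav8(samples, sampleRate = 22050) {
  const header = new DataView(new ArrayBuffer(44));
  // Write a four-character ASCII tag at the given byte offset.
  const writeTag = (offset, tag) => {
    for (let i = 0; i < tag.length; i++) {
      header.setUint8(offset + i, tag.charCodeAt(i));
    }
  };

  writeTag(0, 'RIFF');
  header.setUint32(4, 36 + samples.length, true); // size of everything after this field
  writeTag(8, 'WAVE');
  writeTag(12, 'fmt ');
  header.setUint32(16, 16, true);         // fmt chunk size for plain PCM
  header.setUint16(20, 1, true);          // audio format 1 = PCM
  header.setUint16(22, 1, true);          // channels: mono
  header.setUint32(24, sampleRate, true); // sample rate
  header.setUint32(28, sampleRate, true); // byte rate = rate * channels * 1 byte per sample
  header.setUint16(32, 1, true);          // block align: 1 byte per frame
  header.setUint16(34, 8, true);          // bits per sample
  writeTag(36, 'data');
  header.setUint32(40, samples.length, true); // size of the sample data

  // Concatenate header and samples into one byte array.
  const wav = new Uint8Array(44 + samples.length);
  wav.set(new Uint8Array(header.buffer), 0);
  wav.set(samples, 44);
  return wav;
}

const wav = wrapAsWav8(new Uint8Array([128, 255, 0, 128]));
```

The sketch ignores the RIFF rule that odd-length chunks get a padding byte, which only matters for an odd number of samples.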

Interesting. I wonder what is causing the difference in sound. If the output being 8-bit is correct, what is the difference between the following two lines from README.md:

// Render the passed text as 8bit wave buffer array (Uint8Array).
const buf8 = sam.buf8('Hello world');

// Render the passed text as 32bit wave buffer array (Float32Array).
const buf32 = sam.buf32('Hello world');

Based on that sample code, my assumption was that in the following code from samjs.js we are "speaking" a high-quality 32-bit waveform but downloading a lower-quality 8-bit WAV file. Thoughts?

    this.speak = function (text, phonetic) {
      return PlayBuffer(this$1.buf32(text, phonetic));
    };

    this.download = function (text, phonetic) {
      RenderBuffer(this$1.buf8(text, phonetic));
    };

> Based on that sample code, my assumption was that in the following code from samjs.js we are "speaking" a high-quality 32-bit waveform but downloading a lower-quality 8-bit WAV file. Thoughts?

Quite the opposite: we always generate a low-quality 8-bit waveform, which is converted to 32-bit floats for speaking and keeps its original low quality.
See https://github.com/discordier/sam/blob/master/src/sam/sam.es6#L44

export const SamBuffer = (input, options) => {
  const buffer = SamProcess(input, options);
  if (false === buffer) {
    return false;
  }

  return UInt8ArrayToFloat32Array(buffer);
}
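
To illustrate why widening the samples cannot add quality, here is a minimal sketch of an 8-bit to float conversion. The helpers `uint8ToFloat32` and `float32ToUint8` are hypothetical and are not the library's actual UInt8ArrayToFloat32Array; the scaling by 128 is an assumption about how such a mapping is commonly done.

```javascript
// Hypothetical helper: maps unsigned 8-bit samples (0..255, silence at 128)
// to floats in the range [-1, 1).
function uint8ToFloat32(samples) {
  const out = new Float32Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    out[i] = (samples[i] - 128) / 128;
  }
  return out;
}

// Inverse mapping, included only to show that the conversion is reversible.
function float32ToUint8(samples) {
  const out = new Uint8Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    out[i] = Math.round(samples[i] * 128) + 128;
  }
  return out;
}

const original = new Uint8Array([0, 64, 128, 192, 255]);
const asFloat = uint8ToFloat32(original);   // -1, -0.5, 0, 0.5, 0.9921875
const roundTrip = float32ToUint8(asFloat);  // identical to original
```

Under this mapping the float buffer can only ever contain 256 distinct values, so it carries exactly the same information as the 8-bit buffer.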

> They sound much worse than the spoken in-browser output.

This is impossible, given the code above: the spoken output is produced from the 8-bit waveform, not the other way around.

Closing, as no further feedback came in and this does not seem to be a bug. I could not hear any difference in quality myself (though I'm not a trained audiophile and might be missing something).