Is the ddsp implementation different from the original tensorflow ddsp?

Question

Closed this issue 2 years ago · 3 comments

980202006 commented 2 years ago

I ran the DDSP from this project and the TensorFlow DDSP on the same input and got different output, especially from filtered_noise_synth. Below is my code.

    import numpy as np
    import soundfile as sf
    import torch

    # Additive and FilteredNoise are this project's synthesizer classes, and
    # plt_fig is a plotting helper; their imports/definitions are not shown here.

    # Additive (harmonic) synthesizer test.
    n_frames = 1000
    hop_size = 64
    n_samples = n_frames * hop_size
    sample_rate = 16000
    # Amplitude [batch, n_frames, 1].
    # Make amplitude linearly decay over time.
    amps = np.linspace(1.0, -3.0, n_frames)
    amps = amps[np.newaxis, :, np.newaxis]
    # Harmonic Distribution [batch, n_frames, n_harmonics].
    # Make harmonics decrease linearly with frequency.
    n_harmonics = 20
    harmonic_distribution = np.ones([n_frames, 1]) * np.linspace(1.0, -1.0, n_harmonics)[np.newaxis, :]
    harmonic_distribution = harmonic_distribution[np.newaxis, :, :]
    # Fundamental frequency in Hz [batch, n_frames, 1].
    f0_hz = 440.0 * np.ones([1, n_frames, 1])
    additive_synth = Additive(n_samples=n_samples, sample_rate=sample_rate)
    # Convert everything to float32 so all inputs share one dtype
    # (torch.tensor on a NumPy float64 array would otherwise give float64).
    amps = torch.tensor(amps, dtype=torch.float32)
    harmonic_distribution = torch.tensor(harmonic_distribution, dtype=torch.float32)
    f0_hz = torch.tensor(f0_hz, dtype=torch.float32)
    audio = additive_synth(amps, harmonic_distribution, f0_hz)
    sf.write('tes_ddsp_torch_code2.wav', audio.detach().cpu().numpy()[0], sample_rate)

    # Filtered noise synthesizer test.
    n_frames = 250
    n_frequencies = 1000
    n_samples = 64000
    # Bandpass filters, [n_batch, n_frames, n_frequencies].
    magnitudes = [torch.sin(torch.linspace(0.0, w, n_frequencies)) for w in np.linspace(8.0, 80.0, n_frames)]
    magnitudes = 0.5 * torch.stack(magnitudes)**4.0
    magnitudes = magnitudes[None, :, :]
    # scale_fn=None: the magnitudes are passed to the synthesizer unscaled.
    filtered_noise_synth = FilteredNoise(n_samples=n_samples, scale_fn=None)
    plt_fig(magnitudes.detach().numpy()[0])
    # Generate some audio.
    audio = filtered_noise_synth(magnitudes)
    sf.write('tes_ddsp_noise.wav', audio.detach().cpu().numpy()[0], sample_rate)

Answer 1 · 2022-11-21T09:00:42.000Z

The upper picture is the output of TensorFlow DDSP, and the lower picture is the output of realtimeDDSP.
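Because the filtered-noise output is random, the two implementations cannot be compared sample by sample; one option is to compare time-averaged spectra instead. The following is only a minimal NumPy sketch of that idea. The helper name `mean_log_spectrum` and the frame settings are hypothetical, and the two synthetic noise arrays stand in for the audio written by each library.

```python
import numpy as np

def mean_log_spectrum(audio, n_fft=1024, hop=256):
    """Average power spectrum in dB over Hann-windowed frames (hypothetical helper)."""
    audio = np.asarray(audio, dtype=np.float64)
    window = np.hanning(n_fft)
    starts = range(0, len(audio) - n_fft + 1, hop)
    frames = np.stack([audio[s:s + n_fft] * window for s in starts])
    power = np.mean(np.abs(np.fft.rfft(frames, axis=-1)) ** 2, axis=0)
    return 10.0 * np.log10(power + 1e-12)

# Stand-ins for the two outputs: white noise at two different levels.
rng = np.random.default_rng(0)
noise_a = rng.standard_normal(64000)
noise_b = 0.5 * rng.standard_normal(64000)

# Per-frequency level difference in dB between the two signals.
gap_db = mean_log_spectrum(noise_a) - mean_log_spectrum(noise_b)
print(float(np.mean(gap_db)))
```

A roughly constant gap across frequency would point to an overall gain or scaling difference, while a frequency-dependent gap would point to a difference in the filter itself.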

Answer 2 · 2022-12-03T09:08:54.000Z

Sorry for the late reply. I think the scaling of the synthesis parameters is different from Google's DDSP.
This isn't much of a problem, but we might consider making it the same as Google's for compatibility.
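To illustrate the kind of scaling in question: as far as I recall, Google's DDSP by default passes raw amplitudes and noise magnitudes through an exponentiated sigmoid (`exp_sigmoid` in `ddsp.core`) before synthesis, whereas the test code above passes `scale_fn=None` to this project's FilteredNoise. The NumPy reimplementation below is a sketch under that assumption. The default values (exponent 10, maximum 2.0, threshold 1e-7) are quoted from memory and should be checked against the ddsp source.

```python
import numpy as np

def exp_sigmoid(x, exponent=10.0, max_value=2.0, threshold=1e-7):
    """Exponentiated sigmoid: max_value * sigmoid(x)**log(exponent) + threshold.

    Assumed to mirror ddsp.core.exp_sigmoid; defaults are quoted from memory.
    """
    x = np.asarray(x, dtype=np.float64)
    sigmoid = 1.0 / (1.0 + np.exp(-x))
    return max_value * sigmoid ** np.log(exponent) + threshold

# Raw amplitudes like those in the question: they decay from 1.0 to -3.0.
raw_amps = np.linspace(1.0, -3.0, 5)
scaled_amps = exp_sigmoid(raw_amps)
print(raw_amps)
print(scaled_amps)
```

With this scaling the raw values, including the negative ones, map to strictly positive amplitudes. If one implementation applies it and the other does not, identical inputs will produce different audio.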

Answer 3 · 2022-12-04T11:24:27.000Z

Thank you for your reply.