athena-team/athena

The delta pitch feature seems to be strange

cantabile-kwok opened this issue · 4 comments

Hi, and thanks for the fascinating work! I am using Athena to extract MFCC and pitch features without Kaldi. It went smoothly, but when I inspected the values of the delta-pitch feature (the last dimension of the default 3-dim pitch feature), I got confused. Here is what I found:

  1. Although the pitch feature and the warped NCCF (POV feature) values differ slightly from those extracted by Kaldi, the trajectories are similar and the difference is small. That is acceptable, but with default settings the delta-pitch feature is very different from the Kaldi output. I could not find any difference between the default pitch settings of Athena and Kaldi. The delta pitch extracted by Athena looks like a noise sequence.
  2. I therefore lowered the standard deviation of the noise added to delta pitch, all the way down to 0. Athena then outputs a sequence of almost all zeros, so I believe there is something wrong in the source code.

Below are some results.

This plot is the 4-dim pitch extracted by Kaldi.
[Image: plot of the 4-dim pitch features extracted by Kaldi]

This plot is the 3-dim pitch extracted by Athena (in default settings).
[Image: plot of the 3-dim pitch features extracted by Athena, default settings]

After setting the delta_pitch_noise_stddev to 0, I get the result below.
[Image: plot of the 3-dim pitch features extracted by Athena with delta_pitch_noise_stddev set to 0]
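For reference, here is a minimal sketch of what a delta-pitch value would be expected to look like on a track whose pitch actually moves. It is not Athena's or Kaldi's code. The helper names (`regression_delta`, `delta_pitch`), the 2-frame window, the scale of 10, the noise standard deviation of 0.005, and the choice to add noise before scaling are all assumptions made for illustration and should be checked against the real implementations.

```python
import numpy as np

def regression_delta(x, window=2):
    """Regression-based delta over +/- `window` frames, with edge frames replicated."""
    x = np.asarray(x, dtype=np.float64)
    padded = np.pad(x, window, mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    delta = np.zeros_like(x)
    for n in range(1, window + 1):
        # Weighted difference x[t + n] - x[t - n] for every frame t.
        ahead = padded[window + n : window + n + len(x)]
        behind = padded[window - n : window - n + len(x)]
        delta += n * (ahead - behind)
    return delta / denom

def delta_pitch(pitch_hz, scale=10.0, noise_stddev=0.005, seed=0):
    """Hypothetical delta-pitch: scaled delta of log pitch plus optional Gaussian noise."""
    delta = regression_delta(np.log(pitch_hz))
    if noise_stddev > 0:
        rng = np.random.default_rng(seed)
        # Assumption: noise is added before scaling.
        delta = delta + rng.normal(0.0, noise_stddev, size=delta.shape)
    return delta * scale

# Synthetic contour gliding from 120 Hz to 180 Hz over 100 frames.
pitch_hz = np.linspace(120.0, 180.0, 100)
clean = delta_pitch(pitch_hz, noise_stddev=0.0)
noisy = delta_pitch(pitch_hz)
print("noise-free delta range:", clean.min(), clean.max())
print("std of added noise after scaling:", np.std(noisy - clean))
```

Under these assumptions, a steadily rising pitch gives a noise-free delta that stays well above zero on every frame, so an all-zero delta with the noise disabled would only be expected for a flat pitch track.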

Hi,
I tried extracting the pitch feature from the athena-transform example audio, 'examples/sm1_cln.wav', with the following configuration:
'window_length': 0.025,
'soft_min_f0': 10.0,
'delta_pitch_noise_stddev': 0,
and the output looks normal:
Transform: [[3.8811225e-02 3.0000305e-01 3.5762787e-07]
[6.7564729e-03 3.0000973e-01 3.5762787e-07]
[2.4553644e-02 3.0001450e-01 3.5762787e-07]
[2.4535857e-02 3.0002213e-01 3.5762787e-07]
[3.4553111e-02 3.0003071e-01 3.5762787e-07]
[4.2932931e-02 3.0004215e-01 3.5762787e-07]]

@JianweiSun007 Thanks for trying this. I ran the same experiment, and my result is close to yours.

But there is still a problem with this result. As we can see, the last dimension of the pitch output is very close to 0 (on the order of 1e-7) and identical in every frame. Since this wav clearly has pitch variation, the delta-pitch feature should not stay at such a small value throughout. This also differs substantially from Kaldi's delta pitch.
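The point can be checked directly on the six frames posted above. The sketch below only re-enters those printed numbers into NumPy. Storing them as float32 is an assumption based on the printed precision, and reading the constant as rounding residue is a guess, not a diagnosis of the Athena code.

```python
import numpy as np

# Values copied from the "Transform:" output quoted earlier in this thread.
feats = np.array([
    [3.8811225e-02, 3.0000305e-01, 3.5762787e-07],
    [6.7564729e-03, 3.0000973e-01, 3.5762787e-07],
    [2.4553644e-02, 3.0001450e-01, 3.5762787e-07],
    [2.4535857e-02, 3.0002213e-01, 3.5762787e-07],
    [3.4553111e-02, 3.0003071e-01, 3.5762787e-07],
    [4.2932931e-02, 3.0004215e-01, 3.5762787e-07],
], dtype=np.float32)

delta_col = feats[:, 2]   # last dimension: the delta-pitch feature
middle_col = feats[:, 1]  # middle dimension of the 3-dim pitch feature

# The delta column holds one repeated value, while the middle column keeps moving.
distinct_deltas = np.unique(delta_col)
middle_steps = np.diff(middle_col)
print("distinct delta values:", distinct_deltas)
print("frame-to-frame change in middle column:", middle_steps)

# Compare the constant against float32 machine epsilon (about 1.19e-07).
eps_ratio = float(delta_col[0]) / float(np.finfo(np.float32).eps)
print("delta value / float32 eps:", eps_ratio)
```

The delta column contains a single repeated value that is about three times float32 machine epsilon, while the middle column rises in every frame. That is consistent with the delta being numerically zero plus a rounding-level offset.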

stale commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale commented

This issue is closed. You can also re-open it if needed.