qute012/Wav2Keyword

Inference

codeghees opened this issue · 15 comments

Hi! Great work with this. Was able to reproduce your results I think.
@qute012
Two questions - what is the best way to run inference on the trained model? Any sample you have?
Secondly, I was getting an error when fine-tuning a model trained on Google Speech Commands on my Urdu dataset:
cfg = convert_namespace_to_omegaconf(state_dict['args'])
The error was a KeyError: 'args'. What am I doing wrong?
I was passing the .pth model, and I checked that the model was being loaded.
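
One possible cause, mirroring fairseq's own checkpoint-loading logic, is that newer fairseq checkpoints store their config under 'cfg' (already an OmegaConf object) rather than 'args'. A guard like this sketch, with 'model.pth' as a placeholder path, may help:

import torch
from fairseq.dataclass.utils import convert_namespace_to_omegaconf

state_dict = torch.load('model.pth', map_location='cpu')
if 'args' in state_dict and state_dict['args'] is not None:
    # older fairseq checkpoints: an argparse Namespace under 'args'
    cfg = convert_namespace_to_omegaconf(state_dict['args'])
elif 'cfg' in state_dict:
    # newer fairseq checkpoints: an OmegaConf config stored directly
    cfg = state_dict['cfg']
else:
    raise KeyError("checkpoint contains neither 'args' nor 'cfg'")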

Any help would be appreciated.

The test accuracy for 10 samples for each keyword is over 94 percent. Sounds too good to be true.

Hello, @codeghees. Could you please provide a requirements.txt or a conda environment.yml for the environment you used while reproducing the results? I tried to reproduce the results on the Google Speech Commands v2 dataset and ran into the same errors.

Hi~ @codeghees @BeardyMan37

Thanks for your interest in this project. Honestly, I can't afford to maintain this project right now, and I also can't access the server ;(
If I have time, I'd like to extend this project to support inference. But you can reproduce it yourselves by referring to the hyperparameters and model architecture.

Sorry 😐

Can you point me in a direction for inference?

I can build it myself.

@BeardyMan37 I used Google Colab.

@codeghees

  1. Extract the loudest section
    Most important for accuracy, because this model only takes a 1-second raw audio clip. So you have to check that the extracted signal actually contains voice.
def extract_loudest_section(wav, win_len=30):
    # Slide a 1-second (16000-sample at 16 kHz) window over the signal in
    # steps of win_len samples and keep the window with the largest energy.
    wav_len = len(wav)
    temp = abs(wav)

    st, et = 0, 0
    max_dec = 0

    for ws in range(0, wav_len, win_len):
        cur_dec = temp[ws:ws+16000].sum()  # energy of the current window
        if cur_dec >= max_dec:
            max_dec = cur_dec
            st, et = ws, ws + 16000
        if ws + 16000 > wav_len:
            break  # the window has reached the end of the signal

    return wav[st:et]
  2. Post-process (in fairseq)
    You don't need to normalize the raw audio. I think it does nothing here; I just added it for the Wav2Vec 2.0 pipeline. I'm not sure, but it should be fine to remove this function.
import torch
import torch.nn.functional as F

def postprocess(self, feats, curr_sample_rate):
    # Collapse multi-channel audio to mono by averaging the channels.
    if feats.dim() == 2:
        feats = feats.mean(-1)

    if curr_sample_rate != self.sample_rate:
        raise Exception(f"sample rate: {curr_sample_rate}, need {self.sample_rate}")

    assert feats.dim() == 1, feats.dim()

    # Optional per-utterance normalization, as in the Wav2Vec 2.0 pipeline.
    if self.normalize:
        with torch.no_grad():
            feats = F.layer_norm(feats, feats.shape)
    return feats
  3. Make a single batch to feed to the model.

  4. Predict the class from the argmax of the model output (see the sketch after this list).
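
Putting the four steps together, a minimal inference sketch. It is an illustration only: `model` is assumed to be the trained Wav2Keyword model already loaded in eval mode, extract_loudest_section is the function from step 1 used standalone, CLASSES is a hypothetical ordered label list, and the input is 16 kHz mono audio read with soundfile:

import torch
import soundfile as sf

wav, sr = sf.read('sample.wav')            # 'sample.wav' is a placeholder path
assert sr == 16000, 'model expects 16 kHz audio'

loudest = extract_loudest_section(wav)     # step 1: keep the loudest 1-sec window
feats = torch.from_numpy(loudest).float()  # step 2 (postprocess) is optional, per above

batch = feats.unsqueeze(0)                 # step 3: batch of one, shape (1, 16000)
with torch.no_grad():
    logits = model(batch)                  # step 4: assumed to return class logits

pred_idx = logits.argmax(dim=-1).item()
print(CLASSES[pred_idx])                   # CLASSES: ordered label list from training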

Also, how do we know which index represents which class, i.e. 0 for "UP"? Is that the position of the item in the index array?

@codeghees

Yes, right! Like any other simple classification method :D

Oh, I meant: how do we know the mapping? Does that come from the CLASSES array?

Thanks!

Yes. If you can reproduce the training environment, could you open a PR for others?
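
To make the mapping concrete, a minimal sketch; the ordering below is only an example, what matters is that CLASSES matches the label order used at training time:

# Hypothetical ordered label list: output index i corresponds to CLASSES[i].
CLASSES = ['yes', 'no', 'up', 'down', 'left', 'right', 'on', 'off', 'stop', 'go',
           'unknown', 'silence']

pred_idx = logits.argmax(dim=-1).item()  # index of the highest-scoring class
keyword = CLASSES[pred_idx]              # map the index back to its keyword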

I will go back and check - I just opened Colab and followed the instructions. What is the exact error, @BeardyMan37?

Managed to resolve it. @codeghees

@qute012 attaching both the requirements.txt and the environment.yml file for your reference.

Hello @codeghees. I encountered the same error while trying to fine-tune a Hugging Face wav2vec model with fairseq. Have you found a way to convert a Hugging Face model (.bin) to a fairseq checkpoint (.pt)?

@codeghees can you please guide me or share a link to your Colab notebook? I want to reproduce this result and apply the same approach to Urdu.