YoungSeng/DiffuseStyleGesture

Regarding codebook

Closed this issue · 10 comments

Hey, this is a fantastic repo that I found during my research over the last few weeks. I am trying to understand some parts of your code; could you please help with the issues written below?

  1. The codebook is missing. Will it be produced after training the model? I have looked through the code, but it is not created anywhere.
  2. Can I use the same codebook that is present in CODEBOOK?
  3. After getting the BVH output, is there any way to convert it into a rendered human avatar?

Waiting for the solution :)

Thanks
Sai

Dear Sai,

Sorry for the confusing code. You should use sample.py rather than inference.py; I have deleted main/mydiffusion_zeggs/inference.py. Also, this work does not use a codebook.

Best wishes.

Hi YoungSeng, thanks for your reply.

Taking your reply into consideration, I started playing with sample.py:

  1. First, it worked fine with a file named in the format 015_Happy_4_x_1_0.wav.
  2. When I tried a plain name like `1.wav`, sample.py throws the error below:
```
Traceback (most recent call last):
  File "/content/drive/MyDrive/DiffuseStyleGesture/main/mydiffusion_zeggs/sample.py", line 418, in <module>
    main(config, save_dir, config.model_path, audio_path=None, mfcc_path=None, audiowavlm_path=config.audiowavlm_path, max_len=config.max_len)
  File "/content/drive/MyDrive/DiffuseStyleGesture/main/mydiffusion_zeggs/sample.py", line 378, in main
    style = style2onehot[audiowavlm_path.split('/')[-1].split('_')[1]]
IndexError: list index out of range
```

Is there a particular format that the input file name needs to follow? Could you please help me with this?

Regarding Input Format

What format does the input need to be in, and what are the expected size and shape of the input file? Could you please help with this as well?

Thanks
Sai

Dear Sai,

The code is a hard-coded demo. If you want to use your own audio, you can comment out

```python
style = style2onehot[audiowavlm_path.split('/')[-1].split('_')[1]]
```

and uncomment any of the following lines

```python
# style = [0, 0, 1, 0, 0, 0]
# style = style2onehot['Neutral']
```

to choose your own style and intensity from:

```python
style2onehot = {
    'Happy':   [1, 0, 0, 0, 0, 0],
    'Sad':     [0, 1, 0, 0, 0, 0],
    'Neutral': [0, 0, 1, 0, 0, 0],
    'Old':     [0, 0, 0, 1, 0, 0],
    'Angry':   [0, 0, 0, 0, 1, 0],
    'Relaxed': [0, 0, 0, 0, 0, 1],
}
```

Hope this will help you!
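To make the failure with `1.wav` concrete, here is a minimal sketch of the filename parsing (the `parse_style_key` helper is illustrative, not part of the repo):

```python
def parse_style_key(audiowavlm_path):
    # sample.py takes the second underscore-separated token of the file name as the style key
    return audiowavlm_path.split('/')[-1].split('_')[1]

print(parse_style_key('015_Happy_4_x_1_0.wav'))  # -> 'Happy'

# '1.wav' contains no underscores, so split('_') yields ['1.wav'],
# and index 1 raises IndexError: list index out of range.
try:
    parse_style_key('1.wav')
except IndexError as err:
    print('IndexError:', err)
```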

Hi YoungSeng, thanks for your time and reply.

I am facing a shape error. Could you please tell me the expected shape and size of the input file?

```
Traceback (most recent call last):
  File "/content/drive/MyDrive/DiffuseStyleGesture/main/mydiffusion_zeggs/sample.py", line 420, in <module>
    main(config, save_dir, config.model_path, audio_path=None, mfcc_path=None, audiowavlm_path=config.audiowavlm_path, max_len=config.max_len)
  File "/content/drive/MyDrive/DiffuseStyleGesture/main/mydiffusion_zeggs/sample.py", line 384, in main
    inference(args, wavlm_model, mfcc, sample_fn, model, n_frames=max_len, smoothing=True, SG_filter=True, minibatch=True, skip_timesteps=0, style=style, seed=123456)  # style2onehot['Happy']
  File "/content/drive/MyDrive/DiffuseStyleGesture/main/mydiffusion_zeggs/sample.py", line 233, in inference
    audio_reshape = torch.from_numpy(audio).to(torch.float32).reshape(num_subdivision, int(stride_poses * 16000 / 20)).to(mydevice).transpose(0, 1)  # mfcc[:, :-2]
RuntimeError: shape '[4, 64000]' is invalid for input of size 237867
```

Looking forward :)

Model file: `./model000450000.pt`

Thanks
Sai

It seems to be a problem with the shape of the audio. Did you set a max_len greater than the length of the real audio? You may try setting max_len to 0. If you still have this problem, please upload the audio file and I will check it.
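If max_len is already 0 and the error persists, one workaround worth trying is to zero-pad the audio to the next multiple of the chunk size before inference. This is a sketch under the assumption, taken from the traceback, that the reshape expects chunks of int(stride_poses * 16000 / 20) = 64000 samples; `pad_audio` is a hypothetical helper, not repo code:

```python
import numpy as np

def pad_audio(audio, chunk=64000):
    """Zero-pad a 1-D audio array so its length is a multiple of `chunk`."""
    missing = -len(audio) % chunk  # samples needed to reach the next multiple
    return np.pad(audio, (0, missing))

audio = np.zeros(237867, dtype=np.float32)  # length from the traceback above
padded = pad_audio(audio)
print(padded.shape)                    # (256000,)
print(padded.reshape(-1, 64000).shape) # (4, 64000) -- the reshape now succeeds
```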

Hi YoungSeng, thanks for your time.

  1. I have run the code and the BVH file was generated in "./sample_dir". Is there any way to convert it into an .mkv video?
  2. I would like to convert the BVH directly into a rendered mp4 video of a human avatar. Is this possible, and what is the process? I will work on it.

Thanks
Sai

Hey Sai,

In practice, I highly recommend using Blender to visualize the BVH. Similar software includes Maya and MotionBuilder; I have tried them and found Blender the most user-friendly. You can easily import audio, render video, or even write a script like Trimodal does.

You can also get a video of the skeleton in Python; please refer to this issue.

There are also some visualization repositories you can try, such as PyMO, npybvh, and Python_BVH_viewer, although I don't particularly recommend them.

Good luck!

Hi YoungSeng,

I have tried a lot, but I cannot figure out how to convert this BVH file into a 3D video with audio; I need a little help. Is there any repo, model, or code for what I need?

Thanks
Sai

I recommend the method I use:

  • Download Blender (it is free!) and install it.
  • Import the .bvh file and you can play it:
  • For rendering, set some parameters:
  • Then render:
  • To add audio:
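For the last step, a common way to mux the rendered video with the driving audio is ffmpeg on the command line (the file names here are placeholders for your own render and audio):

```shell
# Copy the video stream unchanged, encode the audio to AAC,
# and stop at the end of the shorter stream.
ffmpeg -i render.mp4 -i 015_Happy_4_x_1_0.wav -c:v copy -c:a aac -shortest output.mp4
```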

I also encountered this problem. My audio is about 2 seconds long and I set max_len=0, but I still get this error:
```
Traceback (most recent call last):
  File "sample.py", line 442, in <module>
    main(config, save_dir, config.model_path, audio_path=None, mfcc_path=None, audiowavlm_path=config.audiowavlm_path, max_len=config.max_len)
  File "sample.py", line 406, in main
    inference(args, wavlm_model, mfcc, sample_fn, model, n_frames=max_len, smoothing=True, SG_filter=True, minibatch=True, skip_timesteps=0, style=style, seed=123456)  # style2onehot['Happy']
  File "sample.py", line 237, in inference
    audio_reshape = torch.from_numpy(audio).to(torch.float32).reshape(num_subdivision, int(stride_poses * 16000 / 20)).to(mydevice).transpose(0, 1)  # mfcc[:, :-2]
RuntimeError: shape '[4, 64000]' is invalid for input of size 36480
```