neonbjb/DL-Art-School

The details of ctc code generation?

MlWoo opened this issue · 2 comments

MlWoo commented

The work is very impressive and thanks a lot.
I'm following your TTS work. Some modules are introduced into the pipelines, but the pipeline configurations are coupled together.
Could you provide your training configuration and the public datasets used for CTC code generation? The relevant code is Wav2VecWrapper, correct? Thank you again.

Hey, thanks for the kind words. I'm not really sure how to answer the question. What are you looking to train? A wav2vec2 model? While I configured DLAS to be able to do this, I wouldn't really recommend doing so. I'm assuming huggingface has a better way to do this type of training.

MlWoo commented

@neonbjb
Sorry for my unclear wording. I want to train a model to generate CTC codes, but it should be compatible with Tortoise. Your DVAE runs at 25 Hz, but the mainstream wav2vec models (like huggingface's) run at 50 Hz or more, so the two conflict. Moreover, Tortoise uses your self-trained BPE tokenizer, while the public CTC wav2vec checkpoints ship with their own BPE tokenizers.
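For context, the frame-rate mismatch can be checked with quick arithmetic. Assuming the standard wav2vec2 convolutional feature-extractor strides (5, 2, 2, 2, 2, 2, 2) on 16 kHz audio, the total hop is 320 samples:

```python
# Rough frame-rate arithmetic behind the mismatch described above.
# The stride list below is the standard wav2vec2 base/large conv config;
# the 25 Hz figure for the Tortoise DVAE is taken from the comment above.
SAMPLE_RATE = 16000

def frames_per_second(sample_rate, strides):
    """Output frame rate of a strided conv stack: sample_rate / product(strides)."""
    hop = 1
    for s in strides:
        hop *= s
    return sample_rate / hop

w2v_fps = frames_per_second(SAMPLE_RATE, (5, 2, 2, 2, 2, 2, 2))
print(w2v_fps)        # 50.0 frames/sec
print(w2v_fps / 25)   # 2.0 -> twice the DVAE's code rate
```

So even before the tokenizer mismatch, wav2vec2 emits two frames for every DVAE code, which is why the two code streams can't be swapped directly.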

```python
class Wav2VecWrapper(nn.Module):
    """
    Basic wrapper class that makes Wav2Vec2 usable by DLAS.
    """
    def __init__(self,
                 vocab_size=148,
                 basis_model='facebook/wav2vec2-large',
                 freeze_transformer=False,
                 output_wer=True,
                 checkpointing_enabled=True,
                 provide_attention_mask=False,
                 spec_augment=True,
                 remove_feature_extractor=False,
                 ramp_dropout_mode=False,
                 ramp_dropout_end=20000,
                 ramp_dropout_min=.1,
                 ramp_dropout_max=.5,
                 layer_drop_pct=.1):
```

The code in your repo starts with the above. Of course, the wav2vec models from huggingface or facebook are a good choice for `basis_model`; I want to figure out the other parameter settings of Wav2VecWrapper. Could you provide them?
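For reference, the only configuration visible so far is the defaults in the signature quoted above. A sketch of those as a kwargs dict (hypothetical, copied verbatim from the excerpt's defaults, not a confirmed training recipe from the author):

```python
# Default Wav2VecWrapper arguments, transcribed from the signature above.
# These are NOT known to be the settings actually used to train the
# Tortoise-compatible CTC model -- that is exactly what this issue asks about.
wav2vec_wrapper_opts = {
    'vocab_size': 148,
    'basis_model': 'facebook/wav2vec2-large',
    'freeze_transformer': False,
    'output_wer': True,
    'checkpointing_enabled': True,
    'provide_attention_mask': False,
    'spec_augment': True,
    'remove_feature_extractor': False,
    'ramp_dropout_mode': False,
    'ramp_dropout_end': 20000,
    'ramp_dropout_min': 0.1,
    'ramp_dropout_max': 0.5,
    'layer_drop_pct': 0.1,
}
print(len(wav2vec_wrapper_opts))  # 13 parameters in total
```

In a DLAS YAML config these would presumably appear under the model definition for the wrapper, but the exact keys used in training remain the open question here.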