
Preprocessing Stages:

  • Step 1 :

    • The data folder contains three subfolders: binary, processed and raw.
      • Inside the raw folder, the LJSpeech-1.1 folder must contain the training dataset:
        the wavs folder (audio wave files) and metadata.csv.
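The layout described above can be checked before moving on. The following is a minimal sketch, not part of the repository: the helper name check_raw_layout and the default root "data" are assumptions made for illustration, and only the folder and file names listed in Step 1 are tested.

```python
from pathlib import Path


def check_raw_layout(data_root="data"):
    """Return a list of problems found in the folder layout from Step 1.

    Hypothetical helper: it only checks the names mentioned in Step 1.
    """
    root = Path(data_root)
    problems = []

    # The three top-level subfolders named in Step 1.
    for sub in ("binary", "processed", "raw"):
        if not (root / sub).is_dir():
            problems.append(f"missing folder: {root / sub}")

    # The raw dataset: a wavs folder with audio files plus metadata.csv.
    lj = root / "raw" / "LJSpeech-1.1"
    wavs = lj / "wavs"
    if not wavs.is_dir():
        problems.append(f"missing folder: {wavs}")
    elif not any(wavs.glob("*.wav")):
        problems.append(f"no .wav files in: {wavs}")
    if not (lj / "metadata.csv").is_file():
        problems.append(f"missing file: {lj / 'metadata.csv'}")

    return problems


if __name__ == "__main__":
    for problem in check_raw_layout():
        print(problem)
```

An empty result means every folder and file named in Step 1 was found; it says nothing about whether the contents are valid.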
  • Step 2 :

  • Step 3 :

    • Next, create metadata_phone.csv and dict.txt in the data/processed/ljspeech folder.
      metadata_phone.csv creation code:
      ############################## metadata_phone.csv -> creation from metadata.csv #####################
      import os
      import pandas as pd
      # load the original data ('|'-separated; the first row must be a header
      # naming the ID, grapheme and phoneme columns used below)
      df = pd.read_csv('data/raw/LJSpeech-1.1/metadata.csv', sep='|')
      # define the base path for the audio files
      base_path = 'data/raw/LJSpeech-1.1/wavs/'
      # transform the data
      df['item_name'] = df['ID']
      df['spk'] = 'SPK1'  # or use appropriate speaker information if available
      df['txt'] = df['grapheme']
      df['txt_raw'] = df['grapheme']
      df['ph'] = '<BOS> ' + df['phoneme'] + ' <EOS>'  # wrap the phonemes in sentence markers
      df['wav_fn'] = base_path + df['ID'] + '.wav'
      # select the required columns and save to a new csv file in the processed folder
      df = df[['item_name', 'spk', 'txt', 'txt_raw', 'ph', 'wav_fn']]
      out_dir = 'data/processed/ljspeech'
      os.makedirs(out_dir, exist_ok=True)
      df.to_csv(os.path.join(out_dir, 'metadata_phone.csv'), index=False)
      dict.txt creation code:
      import csv
      def get_unique_chars(csv_file, output_file):
          # collect every distinct phoneme token found in the 'ph' column
          unique_chars = set()
          with open(csv_file, 'r', encoding='utf-8', newline='') as f:
              reader = csv.DictReader(f)
              for row in reader:
                  ph = row['ph'].split()
                  unique_chars.update(ph)
          # write one "<token> <token>" line per token, in sorted order
          with open(output_file, 'w', encoding='utf-8') as f:
              for char in sorted(unique_chars):
                  f.write(f'{char} {char}\n')
      csv_file = 'data/processed/ljspeech/metadata_phone.csv'
      output_file = 'data/processed/ljspeech/dict.txt'
      get_unique_chars(csv_file, output_file)
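Before binarizing, it may be worth confirming that dict.txt covers every token in the ph column of metadata_phone.csv. The sketch below is illustrative only: the function names load_dict_symbols and find_missing_symbols are hypothetical, and it assumes dict.txt holds one whitespace-separated entry per line with the symbol in the first field, as written by the code above.

```python
import csv


def load_dict_symbols(dict_file):
    """Read the first whitespace-separated field of every non-empty line."""
    symbols = set()
    with open(dict_file, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if parts:
                symbols.add(parts[0])
    return symbols


def find_missing_symbols(csv_file, dict_file):
    """Return the sorted 'ph' tokens in csv_file that dict_file does not list."""
    known = load_dict_symbols(dict_file)
    missing = set()
    with open(csv_file, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            missing.update(tok for tok in row["ph"].split() if tok not in known)
    return sorted(missing)


if __name__ == "__main__":
    # Paths follow Step 3; adjust them if the files live elsewhere.
    print(find_missing_symbols(
        "data/processed/ljspeech/metadata_phone.csv",
        "data/processed/ljspeech/dict.txt",
    ))
```

An empty list means the dictionary covers every token; any token that is printed would need to be added to dict.txt or corrected in the metadata.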
  • Step 4 :

    • Run the binarization step (for faster I/O) as follows:
      export PYTHONPATH=.
      CUDA_VISIBLE_DEVICES=0 python data_gen/tts/ --config ./modules/FastDiff/config/FastDiff.yaml
      This will populate the training files inside the data/processed/ljspeech folder.

Training Stages:

  • Step 1 :
    • Train the ProDiff teacher from the command line:
       CUDA_VISIBLE_DEVICES=0 python tasks/ --config ./modules/FastDiff/config/FastDiff.yaml  --exp_name kallu_mama --reset
      The trained weights will be saved in the checkpoints folder.