declare-lab/tango

What is the expected loss value? My train and val loss is around 6.5–6.6 and does not drop.

BingliangLi opened this issue · 0 comments

Hi, thanks for open-sourcing this great project! I fine-tuned the Tango model on my own dataset for about 20 epochs, but the train and validation loss does not drop at all. Since the loss stays around 6.7, I think this means my model is generating random results.
May I ask what your train and validation loss values were on AudioCaps?
All my data are 10-second, 48 kHz, 2-channel audio files:

Input #0, wav, from 'qslWda0kTxA_70000_80000.wav':
  Metadata:
    encoder         : Lavf59.27.100
  Duration: 00:00:10.00, bitrate: 1536 kb/s
  Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, 2 channels, s16, 1536 kb/s
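As a quick sanity check before training, each file's sample rate and channel count can be verified in bulk with the stdlib `wave` module (a hypothetical helper, not part of the Tango repo — if the model's VAE expects a different rate or mono audio, which is worth confirming against the repo's configs, files like the one above would need resampling first):

```python
import wave

def describe_wav(path):
    """Return (sample_rate, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        duration = w.getnframes() / rate
    return rate, channels, duration

# Build a tiny 48 kHz stereo s16le file as a stand-in for the real data,
# then confirm the reader reports the same format ffprobe showed above.
def make_test_wav(path, rate=48000, channels=2, seconds=0.01):
    with wave.open(path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)  # 16-bit samples, i.e. pcm_s16le
        w.setframerate(rate)
        nframes = int(rate * seconds)
        w.writeframes(b"\x00\x00" * channels * nframes)

make_test_wav("check.wav")
print(describe_wav("check.wav"))  # → (48000, 2, 0.01)
```

Looping `describe_wav` over the dataset and flagging any file whose rate or channel count differs from the rest catches format mismatches cheaply.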

Here is the template command and my training command:

# Continue training the LDM from our checkpoint using the --hf_model argument
accelerate launch train.py \
--train_file="data/train_audiocaps.json" --validation_file="data/valid_audiocaps.json" --test_file="data/test_audiocaps_subset.json" \
--hf_model "declare-lab/tango" --unet_model_config="configs/diffusion_model_config.json" --freeze_text_encoder \
--gradient_accumulation_steps 8 --per_device_train_batch_size=1 --per_device_eval_batch_size=2 --augment \
--learning_rate=3e-5 --num_train_epochs 40 --snr_gamma 5 \
--text_column captions --audio_column location --checkpointing_steps="best"

# Continue training on my dataset
HF_ENDPOINT=https://hf-mirror.com accelerate launch train.py \
--hf_model "declare-lab/tango" --unet_model_config="configs/diffusion_model_config.json" --freeze_text_encoder \
--train_file="data/dataset_train.json" --validation_file="data/dataset_val.json" --test_file="data/dataset_test.json" \
--gradient_accumulation_steps 8 --per_device_train_batch_size=1 --per_device_eval_batch_size=1 \
--learning_rate=3e-5 --num_train_epochs 20 --snr_gamma 5
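Since the template command passes `--text_column captions --audio_column location`, the `--train_file` / `--validation_file` JSON presumably holds one object per example with those two keys. A minimal sketch of writing such a file (the exact schema is an assumption here — check how the repo's own `data/train_audiocaps.json` is laid out):

```python
import json

# Hypothetical dataset manifest: keys must match --text_column and
# --audio_column (the template uses "captions" and "location").
examples = [
    {"captions": "a dog barks while rain falls",
     "location": "data/audio/qslWda0kTxA_70000_80000.wav"},
]

# One JSON object per line (JSONL), a common layout for HF-style loaders.
with open("dataset_train.json", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Round-trip check: every line parses and carries both expected keys.
with open("dataset_train.json") as f:
    loaded = [json.loads(line) for line in f]
print(loaded[0]["captions"])  # → a dog barks while rain falls
```

Note that the second command above omits `--text_column` and `--audio_column`, so whatever defaults `train.py` uses must match the keys in the custom dataset files — a silent mismatch there could also stall training.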

Any help would be much appreciated!