What is the proper loss value? My train and val losses are around 6.5-6.6 and do not drop.
BingliangLi opened this issue · 0 comments
BingliangLi commented
Hi, thanks for open-sourcing this great project! I fine-tuned the Tango model on my own dataset for about 20 epochs, but the train and val losses do not drop at all. Since the loss stays around 6.7, I think this means my model is generating random results.
May I ask what loss values you get for train and val on AudioCaps?
All my data are 10-second audio clips at 48 kHz with 2 channels:
Input #0, wav, from 'qslWda0kTxA_70000_80000.wav':
Metadata:
encoder : Lavf59.27.100
Duration: 00:00:10.00, bitrate: 1536 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, 2 channels, s16, 1536 kb/s
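To confirm that every clip really matches the format shown above, here is a minimal sketch that reads the same properties with Python's standard `wave` module. The function name `describe_wav` is my own, and the sketch assumes plain PCM WAV files like the one in the ffprobe output; it is a quick check, not part of the Tango code base.

```python
import wave


def describe_wav(path):
    """Return basic PCM properties of a WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "sample_rate": rate,                         # e.g. 48000
            "channels": wf.getnchannels(),               # e.g. 2
            "bits_per_sample": wf.getsampwidth() * 8,    # e.g. 16 for pcm_s16le
            "duration_s": frames / rate,                 # e.g. 10.0
        }
```

Running it over the whole dataset should show whether any file deviates from 48000 Hz, 2 channels, 16 bits, and 10 seconds.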
Here is the template command and my training command:
# Continue training the LDM from our checkpoint using the --hf_model argument
accelerate launch train.py \
--train_file="data/train_audiocaps.json" --validation_file="data/valid_audiocaps.json" --test_file="data/test_audiocaps_subset.json" \
--hf_model "declare-lab/tango" --unet_model_config="configs/diffusion_model_config.json" --freeze_text_encoder \
--gradient_accumulation_steps 8 --per_device_train_batch_size=1 --per_device_eval_batch_size=2 --augment \
--learning_rate=3e-5 --num_train_epochs 40 --snr_gamma 5 \
--text_column captions --audio_column location --checkpointing_steps="best"
# Continue training on my dataset
HF_ENDPOINT=https://hf-mirror.com accelerate launch train.py \
--hf_model "declare-lab/tango" --unet_model_config="configs/diffusion_model_config.json" --freeze_text_encoder \
--train_file="data/dataset_train.json" --validation_file="data/dataset_val.json" --test_file="data/dataset_test.json" \
--gradient_accumulation_steps 8 --per_device_train_batch_size=1 --per_device_eval_batch_size=1 \
--learning_rate=3e-5 --snr_gamma 5 --num_train_epochs 20
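One difference from the template is that my command does not pass `--text_column captions --audio_column location`. The sketch below checks that each record in a dataset file has both keys. It assumes the file holds one JSON object per line and that `captions` and `location` are the expected key names, as in the template command; the helper `check_columns` is hypothetical, and I have not confirmed what `train.py` uses by default when the flags are omitted.

```python
import json


def check_columns(path, text_column="captions", audio_column="location"):
    """Return (line number, missing keys) for each record lacking an expected key.

    Assumes a JSON-lines file: one JSON object per non-empty line.
    """
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            missing = [k for k in (text_column, audio_column) if k not in record]
            if missing:
                problems.append((lineno, missing))
    return problems
```

For example, `check_columns("data/dataset_train.json")` returning an empty list would mean every record has both keys.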
Any help would be much appreciated!