facebookresearch/swav

About the loss

Opened this issue · 1 comment

I fine-tuned a pre-trained model on my own dataset with the SwAV framework and obtained the loss curve below. Is this normal? Since our dataset has 7 classes, we set the number of prototypes to 21. In the first few epochs (please see the epoch column), the loss is approximately ln(21) ≈ 3.044522, and it eventually settles around 2.88.
I browsed all the issues and the README.md and found that in one of your replies the loss declined from 8.006 to about 4, so I would like to know whether my curve is normal.
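For context on why the curve starts near ln(21): with K prototypes and an essentially untrained head, the swapped cross-entropy against near-uniform assignments is roughly ln(K); the 8.006 → 4 curve in that reply presumably corresponds to the default 3000 prototypes, since ln(3000) ≈ 8.006. A minimal sketch of this starting point, using my config values (21 prototypes, temperature 0.1); this is my own illustration, not code from the repo:

```python
# My own illustration: with K prototypes and a head that still produces roughly
# uniform scores, the cross-entropy against near-uniform targets is about ln(K).
import math
import torch

K = 21                                           # nmb_prototypes in my config
scores = torch.zeros(1, K)                       # untrained head -> uniform logits
log_p = torch.log_softmax(scores / 0.1, dim=1)   # temperature 0.1, as in my config
q = torch.full((1, K), 1.0 / K)                  # assignments are ~uniform early on
loss = -(q * log_p).sum(dim=1).mean()

print(loss.item(), math.log(K))                  # both ≈ 3.0445
```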

I show the content of stats0.pkl below.
    epoch      loss
0   --------------
1   --------------
2   --------------
3     0.0  3.044831
4     1.0  3.031930
5     2.0  3.014766
6     3.0  3.005409
7     4.0  3.001633
8     5.0  2.997627
9     6.0  2.993937
10    7.0  2.990014
11    8.0  2.986019
12    9.0  2.980988
13   10.0  2.960421
14   11.0  2.952940
15   12.0  2.944062
16   13.0  2.935696
17   14.0  2.929111
18   15.0  2.921501
19   16.0  2.914384
20   17.0  2.908237
21   18.0  2.902894
22   19.0  2.898225
23   20.0  2.895764
24   21.0  2.891767
25   22.0  2.890472
26   23.0  2.889500
27   24.0  2.887900
28   25.0  2.886966
29   26.0  2.886783
30   27.0  2.888465
31   28.0  2.887260
32   29.0  2.885608
[Figure: model_swa_loss_curve — training loss vs. epoch]
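For reference, the curve can be reproduced from stats0.pkl with something like the sketch below, assuming (my assumption) that stats0.pkl is the pandas-pickled DataFrame written by the repository's PD_Stats logger with columns epoch and loss:

```python
# Sketch for reproducing the curve above; assumes stats0.pkl is a pandas
# DataFrame (columns: epoch, loss) saved by the repo's PD_Stats logger.
import pandas as pd
import matplotlib.pyplot as plt

stats = pd.read_pickle("stats0.pkl")
print(stats.to_string())                  # the table shown above

plt.plot(stats["epoch"], stats["loss"], marker="o")
plt.xlabel("epoch")
plt.ylabel("training loss")
plt.title("SwAV fine-tuning loss")
plt.savefig("model_swa_loss_curve.png")
```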
I also plotted the confusion matrix of the cluster IDs against the true labels, which is as follows:
[Figure: model_swa_cluster_real_label — confusion matrix of cluster IDs vs. true labels]
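In case it helps with interpreting the matrix, here is a sketch of how such a matrix can be built; the assumption (mine) is that the model head returns (embedding, prototype_scores) as in the reference SwAV models, and that a sample's cluster ID is the argmax over those scores:

```python
# Sketch: build a (true label) x (cluster id) count matrix, assuming the model
# head returns (embedding, prototype_scores) and the cluster id is the argmax.
import numpy as np
import torch

@torch.no_grad()
def build_confusion(model, loader, n_classes=7, n_prototypes=21, device="cuda"):
    cm = np.zeros((n_classes, n_prototypes), dtype=int)
    model.eval()
    for x, y in loader:
        _, scores = model(x.to(device))              # prototype scores, shape (B, 21)
        clusters = scores.argmax(dim=1).cpu().numpy()
        for t, c in zip(y.numpy(), clusters):
            cm[t, c] += 1                            # rows: true label, cols: cluster
    return cm
```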
The following are the parameters I used.
INFO - 10/06/23 10:41:17 - 0:00:00 - Building data done with 20304 audios loaded.
INFO - 10/06/23 10:41:24 - 0:00:07 - Building model done.
INFO - 10/06/23 10:41:24 - 0:00:07 - Building optimizer done.
INFO - 10/06/23 10:41:24 - 0:00:07 - Initializing mixed precision done.
INFO - 10/06/23 10:41:25 - 0:00:08 - ============ Starting epoch 0 ... ============
INFO - 10/06/23 10:55:10 - 0:13:53 - ============ Starting epoch 1 ... ============
INFO - 10/06/23 11:09:12 - 0:27:55 - ============ Starting epoch 2 ... ============
INFO - 10/06/23 11:23:10 - 0:41:53 - ============ Starting epoch 3 ... ============
INFO - 10/06/23 11:37:11 - 0:55:54 - ============ Starting epoch 4 ... ============
INFO - 10/06/23 11:51:14 - 1:09:56 - ============ Starting epoch 5 ... ============
INFO - 10/06/23 12:05:11 - 1:23:54 - ============ Starting epoch 6 ... ============
INFO - 10/06/23 12:19:07 - 1:37:50 - ============ Starting epoch 7 ... ============
INFO - 10/06/23 12:32:58 - 1:51:41 - ============ Starting epoch 8 ... ============
INFO - 10/06/23 12:46:48 - 2:05:31 - ============ Starting epoch 9 ... ============
INFO - 10/06/23 13:00:29 - 2:19:12 - ============ Starting epoch 10 ... ============
INFO - 10/06/23 13:14:08 - 2:32:51 - ============ Starting epoch 11 ... ============
INFO - 10/06/23 13:27:47 - 2:46:30 - ============ Starting epoch 12 ... ============
INFO - 10/06/23 13:41:27 - 3:00:10 - ============ Starting epoch 13 ... ============
INFO - 10/06/23 13:55:02 - 3:13:45 - ============ Starting epoch 14 ... ============
INFO - 10/06/23 14:08:41 - 3:27:23 - ============ Starting epoch 15 ... ============
INFO - 10/06/23 14:22:21 - 3:41:03 - ============ Starting epoch 16 ... ============
INFO - 10/06/23 14:35:59 - 3:54:42 - ============ Starting epoch 17 ... ============
INFO - 10/06/23 14:49:39 - 4:08:22 - ============ Starting epoch 18 ... ============
INFO - 10/06/23 15:03:18 - 4:22:01 - ============ Starting epoch 19 ... ============
INFO - 10/06/23 15:16:57 - 4:35:39 - ============ Starting epoch 20 ... ============
INFO - 10/06/23 15:30:35 - 4:49:17 - ============ Starting epoch 21 ... ============
INFO - 10/06/23 15:44:12 - 5:02:55 - ============ Starting epoch 22 ... ============
INFO - 10/06/23 15:57:51 - 5:16:34 - ============ Starting epoch 23 ... ============
INFO - 10/06/23 16:11:29 - 5:30:11 - ============ Starting epoch 24 ... ============
INFO - 10/06/23 16:25:06 - 5:43:49 - ============ Starting epoch 25 ... ============
INFO - 10/06/23 16:38:44 - 5:57:27 - ============ Starting epoch 26 ... ============
INFO - 10/06/23 16:52:29 - 6:11:12 - ============ Starting epoch 27 ... ============
INFO - 10/06/23 17:06:07 - 6:24:50 - ============ Starting epoch 28 ... ============
INFO - 10/06/23 17:19:39 - 6:38:22 - ============ Starting epoch 29 ... ============
INFO - 10/07/23 16:08:14 - 0:00:00 - ============ Initialized logger ============
INFO - 10/07/23 16:08:14 - 0:00:00 - audioset_pretrain: True
base_lr: 1e-05
batch_size: 22
checkpoint_freq: 25
crops_for_assign: [0, 1]
dist_url: env://
dump_checkpoints: ./checkpoints
dump_path: .
epoch_queue_starts: 10
epochs: 30
epsilon: 0.03
feat_dim: 768
final_lr: 0.0
freeze_prototypes_niters: 2000
freqm: 24
fstride: 10
gpu_to_work_on: 1
hidden_mlp: 2048
imagenet_pretrain: True
is_slurm_job: False
local_rank: 1
max_scale_crops: [1.0, 1.0]
min_scale_crops: [0.8, 0.8]
mixup: 0.0
nmb_crops: [2, 0]
nmb_prototypes: 21
queue_length: 1000
rank: 1
seed: 31
sinkhorn_iterations: 3
size_crops: [1, 1]
start_warmup: 0
sync_bn: pytorch
syncbn_process_group_size: 8
temperature: 0.1
timem: 96
tstride: 10
use_fp16: True
warmup_epochs: 3
wd: 1e-06
workers: 10
world_size: 2
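One detail I double-checked in these settings is the queue: to my understanding, the reference main_swav.py rounds queue_length down to a multiple of batch_size * world_size before allocating the feature queue, so the effective length with the values above would be:

```python
# Small sanity check (my own sketch; the rounding mirrors, to my understanding,
# what the reference main_swav.py does before building the feature queue).
def effective_queue_length(queue_length, batch_size, world_size):
    return queue_length - queue_length % (batch_size * world_size)

print(effective_queue_length(1000, 22, 2))   # 968 with the batch size listed above
```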
Would you please tell me whether the loss curve and the confusion matrix look normal, or what I should do to improve my performance?

Correction: the batch size is 4, not 22, in the settings above. Thank you for your valuable reading time! Best wishes!