google-research/leaf-audio

Why do I get the opposite conclusion from the paper?

Joseph-sun opened this issue · 2 comments

I compared two configs; the results for each are below:


**********USE LEAF CONFIG:
"
AudioClassifier.frontend = @leaf()
AudioClassifier.encoder = @pann()
PANN.depth = 7
PANN.dropout_rate = 0.3
"
**********RESULT:
Epoch 1/10
2021-04-06 15:54:15.856190: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-04-06 15:54:16.161494: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
1665/1665 [==============================] - 483s 290ms/step - loss: 0.5501 - sparse_categorical_accuracy: 0.8316 - val_loss: 0.0369 - val_sparse_categorical_accuracy: 1.0000
Epoch 2/10
1665/1665 [==============================] - 480s 288ms/step - loss: 0.2049 - sparse_categorical_accuracy: 0.9349 - val_loss: 0.0120 - val_sparse_categorical_accuracy: 1.0000
Epoch 3/10
1665/1665 [==============================] - 480s 288ms/step - loss: 0.1594 - sparse_categorical_accuracy: 0.9496 - val_loss: 0.0132 - val_sparse_categorical_accuracy: 1.0000
Epoch 4/10
1665/1665 [==============================] - 479s 288ms/step - loss: 0.1363 - sparse_categorical_accuracy: 0.9572 - val_loss: 0.0063 - val_sparse_categorical_accuracy: 1.0000
Epoch 5/10
1665/1665 [==============================] - 480s 288ms/step - loss: 0.1175 - sparse_categorical_accuracy: 0.9635 - val_loss: 0.0018 - val_sparse_categorical_accuracy: 1.0000
Epoch 6/10
1665/1665 [==============================] - 480s 288ms/step - loss: 0.1048 - sparse_categorical_accuracy: 0.9676 - val_loss: 0.0021 - val_sparse_categorical_accuracy: 1.0000
Epoch 7/10
1665/1665 [==============================] - 480s 288ms/step - loss: 0.0920 - sparse_categorical_accuracy: 0.9715 - val_loss: 1.4136e-04 - val_sparse_categorical_accuracy: 1.0000
Epoch 8/10
1665/1665 [==============================] - 480s 288ms/step - loss: 0.0820 - sparse_categorical_accuracy: 0.9743 - val_loss: 9.0474e-04 - val_sparse_categorical_accuracy: 1.0000
Epoch 9/10
1665/1665 [==============================] - 479s 288ms/step - loss: 0.0749 - sparse_categorical_accuracy: 0.9768 - val_loss: 0.0013 - val_sparse_categorical_accuracy: 1.0000
Epoch 10/10
1665/1665 [==============================] - 479s 288ms/step - loss: 0.0697 - sparse_categorical_accuracy: 0.9786 - val_loss: 1.9908e-04 - val_sparse_categorical_accuracy: 1.0000


**********USE MEL CONFIG:
"
AudioClassifier.frontend = @MelFilterbanks()
AudioClassifier.encoder = @pann()
PANN.depth = 7
PANN.dropout_rate = 0.3
"

**********RESULT:
2/1665 [..............................] - ETA: 2:14 - loss: 3.8187 - sparse_categorical_accuracy: 0.3047
WARNING:tensorflow:Callbacks method on_train_batch_end is slow compared to the batch time (batch time: 0.0491s vs on_train_batch_end time: 0.1113s). Check your callbacks.
W0406 14:51:43.080209 139667338893120 callbacks.py:325] Callbacks method on_train_batch_end is slow compared to the batch time (batch time: 0.0491s vs on_train_batch_end time: 0.1113s
1665/1665 [==============================] - 280s 168ms/step - loss: 0.5950 - sparse_categorical_accuracy: 0.8227 - val_loss: 0.0011 - val_sparse_categorical_accuracy: 1.0000
Epoch 2/10
1665/1665 [==============================] - 280s 168ms/step - loss: 0.1996 - sparse_categorical_accuracy: 0.9364 - val_loss: 7.6344e-04 - val_sparse_categorical_accuracy: 1.0000
Epoch 3/10
1665/1665 [==============================] - 281s 169ms/step - loss: 0.1499 - sparse_categorical_accuracy: 0.9533 - val_loss: 8.7197e-04 - val_sparse_categorical_accuracy: 1.0000
Epoch 4/10
1665/1665 [==============================] - 281s 169ms/step - loss: 0.1268 - sparse_categorical_accuracy: 0.9611 - val_loss: 0.0029 - val_sparse_categorical_accuracy: 1.0000
Epoch 5/10
1665/1665 [==============================] - 280s 168ms/step - loss: 0.1081 - sparse_categorical_accuracy: 0.9671 - val_loss: 0.0014 - val_sparse_categorical_accuracy: 1.0000
Epoch 6/10
1665/1665 [==============================] - 282s 169ms/step - loss: 0.0981 - sparse_categorical_accuracy: 0.9698 - val_loss: 0.0017 - val_sparse_categorical_accuracy: 1.0000
Epoch 7/10
1665/1665 [==============================] - 280s 168ms/step - loss: 0.0863 - sparse_categorical_accuracy: 0.9732 - val_loss: 0.1274 - val_sparse_categorical_accuracy: 0.9669
Epoch 8/10
1665/1665 [==============================] - 282s 169ms/step - loss: 0.0793 - sparse_categorical_accuracy: 0.9755 - val_loss: 0.0349 - val_sparse_categorical_accuracy: 1.0000
Epoch 9/10
1665/1665 [==============================] - 280s 168ms/step - loss: 0.0713 - sparse_categorical_accuracy: 0.9782 - val_loss: 0.0388 - val_sparse_categorical_accuracy: 0.9917
Epoch 10/10
1665/1665 [==============================] - 282s 169ms/step - loss: 0.0638 - sparse_categorical_accuracy: 0.9802 - val_loss: 0.0225 - val_sparse_categorical_accuracy: 1.0000

Which dataset did you use?

I also encountered a similar problem. I suspect both models are already overfitting.
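One way to check this against the logs above is to pull the per-epoch numbers out of the Keras output and compare training accuracy with validation accuracy for each frontend. The snippet below is only a rough sketch, not part of leaf-audio: `parse_epochs` and `accuracy_gap` are hypothetical helper names, and it assumes the logs were saved as plain text in the same format as pasted in the issue. The two sample lines are the epoch 10 lines from the LEAF and Mel runs.

```python
import re

# Matches "name: value" pairs in a Keras end-of-epoch line, including
# values in scientific notation such as 1.9908e-04.
METRIC = re.compile(r"(\w+): (\d+(?:\.\d+)?(?:e[+-]?\d+)?)")


def parse_epochs(log_text):
    """Return one dict of metrics per completed epoch.

    Only lines containing "val_loss" are kept, which skips progress
    updates, warnings, and library-loading messages.
    """
    epochs = []
    for line in log_text.splitlines():
        if "val_loss" not in line:
            continue
        epochs.append({name: float(value) for name, value in METRIC.findall(line)})
    return epochs


def accuracy_gap(epoch):
    """Training accuracy minus validation accuracy for one epoch.

    A clearly positive gap is one common sign of overfitting; a gap
    near zero or below zero does not show it.
    """
    return (epoch["sparse_categorical_accuracy"]
            - epoch["val_sparse_categorical_accuracy"])


# Final-epoch lines copied from the two runs in this issue.
LEAF_LOG = (
    "1665/1665 [==============================] - 479s 288ms/step - "
    "loss: 0.0697 - sparse_categorical_accuracy: 0.9786 - "
    "val_loss: 1.9908e-04 - val_sparse_categorical_accuracy: 1.0000"
)
MEL_LOG = (
    "1665/1665 [==============================] - 282s 169ms/step - "
    "loss: 0.0638 - sparse_categorical_accuracy: 0.9802 - "
    "val_loss: 0.0225 - val_sparse_categorical_accuracy: 1.0000"
)

for name, log in [("leaf", LEAF_LOG), ("mel", MEL_LOG)]:
    last = parse_epochs(log)[-1]
    print(name, "val_loss:", last["val_loss"], "gap:", round(accuracy_gap(last), 4))
```

Running it on the full logs instead of the last line gives the gap for every epoch. Since validation accuracy is 1.0000 in almost every epoch for both frontends, the validation split may simply be too easy or too small to separate them, so comparing on a held-out test set would probably be more informative.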