Question about ASL and oidv6
Leterax opened this issue · 6 comments
I am trying to replicate your findings with the oidv6 dataset. So far I have managed to load the oidv6 dataset (only about 1.7 million valid images) with its labels. I am using an EfficientNetV2 backbone with a final dense layer (no activation function; 9605 classes lead to 11 million parameters on the last layer!). To augment the data I am using AutoAugment and RandAugment. I shuffle the data before training.
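(None of the data-loading code appears in this thread, so for context, here is a purely hypothetical sketch of what a create_dataset for OIDv6-style TFRecords could look like; every name, feature key, and constant below is an assumption, not Leterax's actual code:)

import tensorflow as tf

IMG_SIZE, NUM_CLASSES = 240, 9605  # hypothetical; the real values live in hyperparams.py

def create_dataset(pattern="oidv6-*.tfrecord", batch_size=64):
    """Hypothetical sketch: decode images, build multi-hot labels, shuffle, batch."""
    def parse(example):
        feats = tf.io.parse_single_example(example, {
            "image": tf.io.FixedLenFeature([], tf.string),
            "labels": tf.io.VarLenFeature(tf.int64),  # sparse list of class indices
        })
        img = tf.image.resize(tf.io.decode_jpeg(feats["image"], channels=3),
                              (IMG_SIZE, IMG_SIZE)) / 255.0
        idx = tf.sparse.to_dense(feats["labels"])
        # Multi-hot label vector; reduce_sum handles images with zero labels gracefully.
        labels = tf.clip_by_value(
            tf.reduce_sum(tf.one_hot(idx, NUM_CLASSES), axis=0), 0.0, 1.0)
        return img, labels

    ds = tf.data.TFRecordDataset(tf.io.gfile.glob(pattern))
    ds = ds.shuffle(10_000).map(parse, num_parallel_calls=tf.data.AUTOTUNE)
    # AutoAugment / RandAugment would be mapped in here as well.
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)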
My problems:
For some reason the loss is extremely high (100k+), and after every epoch it "resets" to a high level. The accuracy only improves by ~0.15% per epoch, starting from around 25% after the first epoch.
Batch loss: [plot attached in the original issue]
Current training run: https://tensorboard.dev/experiment/Xre2GIvmSJGReZktz5Qfwg/#scalars&_smoothingWeight=0&runSelectionState=eyJmaXQvMjAyMTEwMDQtMDg0MzM4L3RyYWluIjpmYWxzZSwiZml0LzIwMjExMDA0LTEwMzkxMC90cmFpbiI6dHJ1ZX0%3D
I was wondering if you had any tips/ideas as to why this is happening. Thank you in advance!
We intend to publish an article in a couple of weeks in which open-images-v6 is the main dataset.
I believe we will release the full training code, and also our processed dataset, which contains more images than yours (~6M, I think).
Tal
Thanks Tal, I'll watch out for that!
Is it normal for the loss to have these magnitudes?
The scale of the loss doesn't really matter (and it depends on whether you do reduce_mean or reduce_sum),
but a jump in the loss after an epoch is a clear sign of a bug. Triple-check your code.
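(To illustrate the reduction point: the two reductions differ by a factor of batch_size * num_classes, which alone can explain a 100k-scale loss. A quick sketch, using a (32, 9605) shape as an example matching a batch of oidv6 labels:)

import tensorflow as tf

per_element = tf.random.uniform((32, 9605))   # stand-in for per-class loss terms of one batch
print(tf.reduce_sum(per_element).numpy())     # ~32 * 9605 * 0.5 ≈ 154k: huge, but harmless
print(tf.reduce_mean(per_element).numpy())    # ~0.5, independent of batch/class count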
Thanks for the quick answer! I'm going to try out reduce_mean next. Luckily I have access to a V100, so training goes pretty quickly, but it's still too slow to tell if it's working ;) I'm also going to try training on only the first 1000 classes to see if that makes a difference.
I don't really know what else to check... The whole training file is very short, and I leave most of the heavy lifting to TensorFlow.
import tensorflow as tf
import tensorflow_hub as hub

from loss import AsymmetricLossOptimizedTF
from hyperparams import *  # IMG_SIZE, NUM_CLASSES, EPOCHS, ...
from data import create_dataset


def build_model():
    # Frozen EfficientNetV2-B1 feature extractor from TF Hub.
    url = 'https://tfhub.dev/google/imagenet/efficientnet_v2_imagenet21k_b1/feature_vector/2'
    feature_extractor_layer = hub.KerasLayer(url, trainable=False)
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3)),
        feature_extractor_layer,
        # Raw logits; the sigmoid lives inside the loss.
        tf.keras.layers.Dense(NUM_CLASSES, name="output"),
    ])
    return model


model = build_model()
model.build([None, IMG_SIZE, IMG_SIZE, 3])
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=AsymmetricLossOptimizedTF(7.0, 0.0),
    metrics=['accuracy'],
)

data = create_dataset()
history = model.fit(data, epochs=EPOCHS)
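(A hedged side note on the metrics: with raw logits and multi-hot labels, the bare 'accuracy' string leaves Keras to pick an accuracy variant heuristically, which may not be meaningful for multi-label output. An explicit alternative that could be passed to compile instead, thresholding logits at 0, i.e. a sigmoid probability of 0.5:)

import tensorflow as tf

# Hypothetical replacement for metrics=['accuracy'] above:
# threshold=0.0 on logits corresponds to a sigmoid probability of 0.5.
metrics = [
    tf.keras.metrics.BinaryAccuracy(threshold=0.0),
    tf.keras.metrics.AUC(multi_label=True),  # per-label ROC AUC, averaged
]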
@Leterax Have you written the loss function in TensorFlow? Is it open-sourced?
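(The AsymmetricLossOptimizedTF used above isn't shown in this thread, but a minimal TensorFlow sketch of the asymmetric loss from the ASL paper, ported from the reference PyTorch implementation, might look like the following; the class name, default hyperparameters, and the per-sample reduce_mean are assumptions:)

import tensorflow as tf

class AsymmetricLossTF(tf.keras.losses.Loss):
    """Sketch of ASL (Ben-Baruch et al.) for multi-hot labels and raw logits."""

    def __init__(self, gamma_neg=4.0, gamma_pos=0.0, clip=0.05, eps=1e-8, name="asl"):
        super().__init__(name=name)
        self.gamma_neg = gamma_neg
        self.gamma_pos = gamma_pos
        self.clip = clip
        self.eps = eps

    def call(self, y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        xs_pos = tf.sigmoid(y_pred)
        xs_neg = 1.0 - xs_pos
        # Asymmetric clipping (probability shifting) for the negatives.
        xs_neg = tf.minimum(xs_neg + self.clip, 1.0)
        # Binary cross-entropy terms.
        los_pos = y_true * tf.math.log(tf.maximum(xs_pos, self.eps))
        los_neg = (1.0 - y_true) * tf.math.log(tf.maximum(xs_neg, self.eps))
        loss = los_pos + los_neg
        # Asymmetric focusing: down-weight easy examples, more aggressively for negatives.
        pt = xs_pos * y_true + xs_neg * (1.0 - y_true)
        gamma = self.gamma_pos * y_true + self.gamma_neg * (1.0 - y_true)
        loss = loss * tf.pow(1.0 - pt, gamma)
        # reduce_mean over classes keeps the reported loss O(1), per Tal's note;
        # Keras then averages over the batch.
        return -tf.reduce_mean(loss, axis=-1)

(Instantiating it as AsymmetricLossTF(gamma_neg=7.0, gamma_pos=0.0) would mirror the (7.0, 0.0) arguments in the training script above.)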