Alibaba-MIIL/ASL

Question about ASL and oidv6

Leterax opened this issue · 6 comments

I am trying to replicate your findings with the oidv6 dataset. So far I have managed to load the oidv6 dataset (only about 1.7 million valid images) with its labels. I am using an EfficientNetV2 backbone with a final dense layer (no activation function; 9605 classes lead to 11 million parameters in the last layer!). To augment the data I am using AutoAugment and RandAugment, and I shuffle the data before training.
My problems:
For some reason the loss is extremely high (100k+), and after every epoch it "resets" to a high level. The accuracy improves by only ~0.15% per epoch, starting from around 25% after the first epoch.
Batch loss:
[screenshot: batch loss curve]
Current training run: https://tensorboard.dev/experiment/Xre2GIvmSJGReZktz5Qfwg/#scalars&_smoothingWeight=0&runSelectionState=eyJmaXQvMjAyMTEwMDQtMDg0MzM4L3RyYWluIjpmYWxzZSwiZml0LzIwMjExMDA0LTEwMzkxMC90cmFpbiI6dHJ1ZX0%3D

I was wondering if you had any tips/ideas as to why this is happening. Thank you in advance!

mrT23 commented

We intend to publish an article in a couple of weeks in which open-images-v6 is a main dataset.
I believe we will release full training code, and also our processed dataset, which contains more images than yours (~6M, I think)

Tal

Leterax commented

Thanks Tal, I'll watch out for that!
Is it normal for the loss to have these magnitudes?

mrT23 commented

The scale of the loss doesn't really matter (and it depends on whether you do reduce_mean or reduce_sum),
but a jump in the loss after an epoch is a clear sign of a bug. Triple-check your code.
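To illustrate the reduction point (the numbers here are illustrative, not from the thread): with reduce_sum over a 9605-class output and a batch, even a modest per-element binary cross-entropy lands in the 100k+ range, while reduce_mean stays near the per-element value:

```python
import math

NUM_CLASSES = 9605   # output size from the thread
BATCH_SIZE = 32      # hypothetical batch size

# Per-element binary cross-entropy at p = 0.5 (an untrained sigmoid output).
per_element_bce = -math.log(0.5)  # ~0.693

# reduce_sum adds every element across classes and batch -- huge numbers.
loss_sum = per_element_bce * NUM_CLASSES * BATCH_SIZE

# reduce_mean divides by the element count, staying near the per-element value.
loss_mean = per_element_bce
```

So a six-figure loss alone says nothing about training health; only the jump between epochs does.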

Leterax commented

Thanks for the quick answer! I'm going to try out reduce_mean next. Luckily I have access to a V100, so training goes pretty quickly, but it's still slow to tell if it's working ;) I'm also going to try training on only the first 1000 classes to see if that makes a difference.
I don't really know what else to check... The whole training file is very short, and I leave most of the heavy lifting to TensorFlow.

import tensorflow as tf
import tensorflow_hub as hub

from loss import AsymmetricLossOptimizedTF

from hyperparams import *
from data import create_dataset


def build_model():
    # Frozen EfficientNetV2-B1 feature extractor from TF Hub.
    url = 'https://tfhub.dev/google/imagenet/efficientnet_v2_imagenet21k_b1/feature_vector/2'
    feature_extractor_layer = hub.KerasLayer(url, trainable=False)
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3)),
        feature_extractor_layer,
        # No activation: the loss expects one raw logit per class.
        tf.keras.layers.Dense(NUM_CLASSES, name="output"),
    ])
    return model


model = build_model()
model.build([None, IMG_SIZE, IMG_SIZE, 3])
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=AsymmetricLossOptimizedTF(7.0, 0.0),
    metrics=['accuracy']
)


data = create_dataset()

history = model.fit(data, epochs=EPOCHS)

Leterax commented

It turns out I had some issues with my data loading; it seems to work better now:
[screenshot: improved loss curve]
I'm only getting an accuracy of ~40%, but I will investigate this later. Maybe a few more epochs will help. Thank you for your help and the awesome paper!
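A side note on the metric (my addition, not from the thread): with 9605 sigmoid outputs, Keras's plain 'accuracy' is hard to interpret for multi-label data, and results on OpenImages are usually reported as mAP instead. A minimal numpy sketch of per-class average precision:

```python
import numpy as np

def average_precision(scores, labels):
    """AP for one class: precision averaged at each positive's rank."""
    order = np.argsort(-scores)          # rank images by predicted score
    labels = np.asarray(labels)[order]
    cum_pos = np.cumsum(labels)
    precision = cum_pos / np.arange(1, len(labels) + 1)
    # Average precision only at the positions of the positives.
    return float(np.sum(precision * labels) / max(labels.sum(), 1))

# mAP is the mean of average_precision over all classes.
```

A perfect ranking of the positives gives an AP of 1.0 for that class regardless of the score magnitudes.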

@Leterax Have you written the loss function in TensorFlow? Is it open-sourced?
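For reference, the ASL formula from the paper can be sketched in TensorFlow roughly as below. This is my own illustrative port working on raw logits, not the AsymmetricLossOptimizedTF used in the thread; the default gammas follow the paper's common setting, not the (7.0, 0.0) used above.

```python
import tensorflow as tf

def asymmetric_loss(y_true, y_pred, gamma_neg=4.0, gamma_pos=0.0, clip=0.05, eps=1e-8):
    """Rough sketch of ASL (Ben-Baruch et al.) on raw multi-label logits."""
    y_true = tf.cast(y_true, y_pred.dtype)
    p = tf.sigmoid(y_pred)
    # Probability shifting (margin) for negatives: p_m = max(p - clip, 0).
    p_m = tf.clip_by_value(p - clip, 0.0, 1.0)
    # Asymmetric focusing: separate focusing exponents for positives/negatives.
    loss_pos = y_true * tf.pow(1.0 - p, gamma_pos) * tf.math.log(p + eps)
    loss_neg = (1.0 - y_true) * tf.pow(p_m, gamma_neg) * tf.math.log(1.0 - p_m + eps)
    return -tf.reduce_mean(loss_pos + loss_neg)
```

Such a function can be passed directly as `loss=` in `model.compile(...)`, in place of the `AsymmetricLossOptimizedTF` class in the script above.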