Alibaba-MIIL/ASL

Question about ASL and oidv6

Leterax opened this issue · 6 comments

I am trying to replicate your findings with the oidv6 dataset. So far I have managed to load the oidv6 dataset (only about 1.7 million valid images) with its labels. I am using an EfficientNetV2 backbone with a final dense layer (no activation function; 9605 classes lead to 11 million parameters in the last layer!). To augment the data I am using AutoAugment and RandAugment, and I shuffle the data before training.
My problems:
For some reason the loss is extremely high (100k+), and after every epoch it "resets" to a high level. The accuracy improves by only ~0.15% per epoch, starting from around 25% after the first epoch.
Batch loss:
[screenshot: batch loss curve]
Current training run: https://tensorboard.dev/experiment/Xre2GIvmSJGReZktz5Qfwg/#scalars&_smoothingWeight=0&runSelectionState=eyJmaXQvMjAyMTEwMDQtMDg0MzM4L3RyYWluIjpmYWxzZSwiZml0LzIwMjExMDA0LTEwMzkxMC90cmFpbiI6dHJ1ZX0%3D

I was wondering if you had any tips/ideas as to why this is happening. Thank you in advance!

mrT23 commented

We intend to publish an article in a couple of weeks in which open-images-v6 is a main dataset.
I believe we will release full training code, and also our processed dataset, which contains more images than yours (~6M, I think)

Tal

Leterax commented

Thanks Tal, I'll watch out for that!
Is it normal for the loss to have these magnitudes?

mrT23 commented

The scale of the loss doesn't really matter (and it depends on whether you do reduce_mean or reduce_sum),
but a jump in the loss after an epoch is a clear sign of a bug. Triple-check your code.
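To illustrate the reduction point (the numbers here are illustrative, not from the thread): with reduce_sum over a 9605-class output and a batch, even a modest per-element binary cross-entropy lands in the 100k+ range, while reduce_mean stays near the per-element value:

```python
import math

NUM_CLASSES = 9605   # output size from the thread
BATCH_SIZE = 32      # hypothetical batch size

# Per-element binary cross-entropy at p = 0.5 (an untrained sigmoid output).
per_element_bce = -math.log(0.5)  # ~0.693

# reduce_sum adds every element across classes and batch -- huge numbers.
loss_sum = per_element_bce * NUM_CLASSES * BATCH_SIZE

# reduce_mean divides by the element count, staying near the per-element value.
loss_mean = per_element_bce
```

So a six-figure loss alone says nothing about training health; only the jump between epochs does.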

Leterax commented

Thanks for the quick answer! I'm going to try out reduce_mean next. Luckily I have access to a V100, so training goes pretty quickly, but it's still slow to tell if it's working ;) I'm also going to try training on only the first 1000 classes to see if that makes a difference.
I don't really know what else to check... The whole training file is very short, and I leave most of the heavy lifting to TensorFlow.

import tensorflow as tf
import tensorflow_hub as hub

from loss import AsymmetricLossOptimizedTF

from hyperparams import *
from data import create_dataset


def build_model():
    # Frozen EfficientNetV2-B1 feature extractor from TF Hub.
    url = 'https://tfhub.dev/google/imagenet/efficientnet_v2_imagenet21k_b1/feature_vector/2'
    feature_extractor_layer = hub.KerasLayer(url, trainable=False)
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3)),
        feature_extractor_layer,
        # No activation: the loss expects one raw logit per class.
        tf.keras.layers.Dense(NUM_CLASSES, name="output"),
    ])
    return model


model = build_model()
model.build([None, IMG_SIZE, IMG_SIZE, 3])
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=AsymmetricLossOptimizedTF(7.0, 0.0),
    metrics=['accuracy']
)


data = create_dataset()

history = model.fit(data, epochs=EPOCHS)

Leterax commented

It turns out I had some issues with my data loading; it seems to work better now:
[screenshot: improved loss curve]
I'm only getting an accuracy of ~40%, but I will investigate this later. Maybe a few more epochs will help. Thank you for your help and the awesome paper!
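A side note on the metric (my addition, not from the thread): with 9605 sigmoid outputs, Keras's plain 'accuracy' is hard to interpret for multi-label data, and results on OpenImages are usually reported as mAP instead. A minimal numpy sketch of per-class average precision:

```python
import numpy as np

def average_precision(scores, labels):
    """AP for one class: precision averaged at each positive's rank."""
    order = np.argsort(-scores)          # rank images by predicted score
    labels = np.asarray(labels)[order]
    cum_pos = np.cumsum(labels)
    precision = cum_pos / np.arange(1, len(labels) + 1)
    # Average precision only at the positions of the positives.
    return float(np.sum(precision * labels) / max(labels.sum(), 1))

# mAP is the mean of average_precision over all classes.
```

A perfect ranking of the positives gives an AP of 1.0 for that class regardless of the score magnitudes.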

@Leterax Have you written the loss function in TensorFlow? Is it open-sourced?
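For reference, the ASL formula from the paper can be sketched in TensorFlow roughly as below. This is my own illustrative port working on raw logits, not the AsymmetricLossOptimizedTF used in the thread; the default gammas follow the paper's common setting, not the (7.0, 0.0) used above.

```python
import tensorflow as tf

def asymmetric_loss(y_true, y_pred, gamma_neg=4.0, gamma_pos=0.0, clip=0.05, eps=1e-8):
    """Rough sketch of ASL (Ben-Baruch et al.) on raw multi-label logits."""
    y_true = tf.cast(y_true, y_pred.dtype)
    p = tf.sigmoid(y_pred)
    # Probability shifting (margin) for negatives: p_m = max(p - clip, 0).
    p_m = tf.clip_by_value(p - clip, 0.0, 1.0)
    # Asymmetric focusing: separate focusing exponents for positives/negatives.
    loss_pos = y_true * tf.pow(1.0 - p, gamma_pos) * tf.math.log(p + eps)
    loss_neg = (1.0 - y_true) * tf.pow(p_m, gamma_neg) * tf.math.log(1.0 - p_m + eps)
    return -tf.reduce_mean(loss_pos + loss_neg)
```

Such a function can be passed directly as `loss=` in `model.compile(...)`, in place of the `AsymmetricLossOptimizedTF` class in the script above.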