Code for "ENLD: Efficient Noisy Label Detection in Data Lake".
Module steps: Data Preprocess -> Model Generate -> Fine-grained Noisy Label Detection
Dataset download URLs: EMNIST, CIFAR100, Tiny-Imagenet
Split each dataset into inventory data and incremental data, then add label noise, path:/data_preprocess/noise_generate/:
python noise_generate.py --dataset DATASET --data_path DATA_PATH --save_path SAVE_PATH
usage: noise_generate.py [-h] [--dataset DATASET] [--data_path DATA_PATH] [--save_path SAVE_PATH]
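As a rough illustration of what adding label noise means, the sketch below flips a fixed fraction of labels to a different class chosen uniformly at random (symmetric noise). This is an assumption for illustration only: the noise model actually used by noise_generate.py is not described here, and the function name `add_symmetric_noise` and its parameters are hypothetical.

```python
import random

def add_symmetric_noise(labels, noise_rate, num_classes, seed=0):
    """Flip a `noise_rate` fraction of labels to a different class (hypothetical helper)."""
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(noise_rate * len(noisy))
    flip_idx = rng.sample(range(len(noisy)), n_flip)
    for i in flip_idx:
        # An offset in [1, num_classes - 1] guarantees the new label differs from the old one.
        noisy[i] = (noisy[i] + rng.randrange(1, num_classes)) % num_classes
    # Return the noisy labels plus the ground-truth set of corrupted indices.
    return noisy, set(flip_idx)

clean = [i % 10 for i in range(1000)]
noisy, flipped = add_symmetric_noise(clean, noise_rate=0.2, num_classes=10)
```

Keeping the set of flipped indices alongside the noisy labels makes it possible to score a detector later against known ground truth.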
Generate unbalanced incremental datasets from the incremental data, path:/data_preprocess/divide_inremental/:
python split.py --dataset DATASET --data_path DATA_PATH --save_path SAVE_PATH
usage: split.py [-h] [--dataset DATASET] [--data_path DATA_PATH] [--save_path SAVE_PATH]
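One possible way to obtain unbalanced subsets is to give every class its own random mixing weights over the subsets, so that each subset ends up with a skewed class distribution. The sketch below assumes this scheme; it is not taken from split.py, and `split_unbalanced` and `num_parts` are hypothetical names.

```python
import random
from collections import defaultdict

def split_unbalanced(labels, num_parts, seed=0):
    """Partition sample indices into `num_parts` subsets with skewed class mixes."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    parts = [[] for _ in range(num_parts)]
    for idxs in by_class.values():
        # Random per-class weights spread each class unevenly over the subsets.
        weights = [rng.expovariate(1.0) for _ in range(num_parts)]
        for idx in idxs:
            target = rng.choices(range(num_parts), weights=weights)[0]
            parts[target].append(idx)
    return parts

labels = [i % 5 for i in range(500)]
parts = split_unbalanced(labels, num_parts=4)
```

Every sample lands in exactly one subset, so the subsets together still cover the full incremental data.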
Initialize the general model, path:/model_gen/:
python generate_model.py --dataset DATASET --save_path SAVE_PATH --noise_rate NOISE_RATE
usage: generate_model.py [-h] [--dataset DATASET] [--data_path DATA_PATH] [--save_path SAVE_PATH] [--noise_rate NOISE_RATE]
Run and evaluate the fine-grained noisy label detection method:
python fine_grained_noisy_label_detection.py --dataset DATASET --model_path MODEL_PATH --vote VOTE --size SIZE --iteration ITERATION --noise_rate NOISE_RATE
usage: fine_grained_noisy_label_detection.py [-h] [--dataset DATASET] [--data_path DATA_PATH] [--model_path MODEL_PATH] [--vote VOTE] [--size SIZE] [--batch_size_set BATCH_SIZE_SET] [--iteration ITERATION] [--noise_rate NOISE_RATE]
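The usage line above has the shape that Python's argparse prints, so a parser along the following lines would accept the same flags. This is a minimal sketch: the flag names come from the usage line, but the argument types and the example values (`cifar100`, `./model.pt`, and the numbers) are assumptions.

```python
import argparse

def build_parser():
    # Flag names follow the usage line; the types are assumed, not confirmed by the script.
    p = argparse.ArgumentParser(prog="fine_grained_noisy_label_detection.py")
    p.add_argument("--dataset", type=str)
    p.add_argument("--data_path", type=str)
    p.add_argument("--model_path", type=str)
    p.add_argument("--vote", type=int)
    p.add_argument("--size", type=int)
    p.add_argument("--batch_size_set", type=int)
    p.add_argument("--iteration", type=int)
    p.add_argument("--noise_rate", type=float)
    return p

# Illustrative invocation with made-up values.
args = build_parser().parse_args([
    "--dataset", "cifar100", "--model_path", "./model.pt",
    "--vote", "5", "--size", "100", "--iteration", "3", "--noise_rate", "0.2",
])
```

Flags left off the command line, such as `--data_path` here, come back as `None` unless the script defines its own defaults.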
Replace the sample selection strategy in fine-grained noisy label detection, path:/enld_policy/:
python ENLD_random.py (or ENLD_entropy.py, ENLD_confidence.py, ENLD_pseudo.py)
usage: the same as Fine-grained Noisy Label Detection
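To show how such selection strategies typically differ, the sketch below ranks samples by predicted class probabilities using random, entropy-based, and confidence-based selection. It assumes the most uncertain samples are preferred; the scripts in /enld_policy/ may rank in the opposite direction, and the pseudo strategy is left out because its definition is not given here. `select_samples` is a hypothetical helper.

```python
import math
import random

def entropy(probs):
    """Shannon entropy of one probability vector (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_samples(prob_rows, k, strategy="entropy", seed=0):
    """Return sorted indices of k samples chosen by the given strategy."""
    n = len(prob_rows)
    if strategy == "random":
        return sorted(random.Random(seed).sample(range(n), k))
    if strategy == "entropy":
        score = [entropy(p) for p in prob_rows]  # higher entropy = more uncertain
    elif strategy == "confidence":
        score = [-max(p) for p in prob_rows]     # lower top probability = more uncertain
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    ranked = sorted(range(n), key=lambda i: score[i], reverse=True)
    return sorted(ranked[:k])

probs = [[0.9, 0.05, 0.05], [0.4, 0.3, 0.3], [0.34, 0.33, 0.33], [1.0, 0.0, 0.0]]
picked_entropy = select_samples(probs, 2, "entropy")
picked_confidence = select_samples(probs, 2, "confidence")
```

On this toy input both uncertainty-based strategies pick the two near-uniform rows, while the random strategy depends only on the seed.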
Conduct the ablation study, path:/ablation_study/: ENLD-1 to ENLD-5
Run detection with missing labels, path:/missing_label/:
python ENLD_missing_label.py --ratio 0.75
usage: ENLD_missing_label.py [--ratio MISSING_RATE]
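Assuming `--ratio` is the fraction of samples whose labels are missing, the setting can be pictured as masking that share of labels with a sentinel value. The sketch below is illustrative only; `drop_labels` and the sentinel `-1` are hypothetical and not taken from ENLD_missing_label.py.

```python
import random

def drop_labels(labels, ratio, missing=-1, seed=0):
    """Replace a `ratio` fraction of labels with the `missing` sentinel."""
    rng = random.Random(seed)
    n_drop = round(ratio * len(labels))
    drop = set(rng.sample(range(len(labels)), n_drop))
    # Masked positions get the sentinel; all other labels are kept unchanged.
    return [missing if i in drop else y for i, y in enumerate(labels)]

labels = [i % 10 for i in range(100)]
masked = drop_labels(labels, ratio=0.75)
```

With a ratio of 0.75, only a quarter of the samples keep their original labels.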