What is the format of the `dataset` in the given example of Data Augmentation?
In the Data Augmentation example, could you please tell me exactly what format the dataset should be in, given that it is the first parameter of tu.dataset.MixupDataset?
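# Imports are assumed from the aliases used below (AT, tu)
import albumentations
from albumentations import pytorch as AT
import torch_utils as tu
# Augmentation pipeline applied to each training image
train_transform = albumentations.Compose([
    albumentations.Resize(IMAGE_SIZE, IMAGE_SIZE),
    albumentations.HorizontalFlip(p=0.5),
    tu.dataset.randAugment(image_size=IMAGE_SIZE, N=2, M=12, p=0.9, mode='all', cut_out=False),
    albumentations.Normalize(),
    albumentations.Cutout(num_holes=8, max_h_size=IMAGE_SIZE//8, max_w_size=IMAGE_SIZE//8, fill_value=0, p=0.25),
    AT.ToTensorV2(),
])
# `dataset` is the object this question is about
mixup_dataset = tu.dataset.MixupDataset(dataset, alpha=1.0, prob=0.1, mixup_to_cutmix=0.3)
# 0.07 mixup and 0.03 cutmix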
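The closing comment suggests how `prob` and `mixup_to_cutmix` combine. The sketch below is only a reading of that comment, not taken from the library source: it assumes `prob` is the chance that a sample is mixed at all, and `mixup_to_cutmix` is the share of those mixed samples that use CutMix instead of Mixup. The helper name `split_mix_probabilities` is hypothetical.

```python
def split_mix_probabilities(prob: float, mixup_to_cutmix: float) -> tuple:
    """Split the overall mixing probability into (mixup, cutmix) parts.

    Assumption: mixup_to_cutmix is the fraction of mixed samples that
    are switched from Mixup to CutMix.
    """
    cutmix = prob * mixup_to_cutmix
    mixup = prob * (1.0 - mixup_to_cutmix)
    return mixup, cutmix


mixup_p, cutmix_p = split_mix_probabilities(prob=0.1, mixup_to_cutmix=0.3)
print(round(mixup_p, 2), round(cutmix_p, 2))  # 0.07 0.03
```

Under that assumption, the remaining 90% of samples would pass through unmixed.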
Reminder: image_size=IMAGE_SIZE in tu.dataset.randAugment is redundant.
In the latest version, redundant parameters have been removed (see TorchUtils/torch_utils/dataset/common_aug.py, lines 7 to 16 at commit 4573472).
The dataset passed to MixupDataset should be formatted like this:
In the Dataset class, __getitem__ should return an img tensor (after to_tensor, shape c,h,w) and a label tensor/array/list (after one-hot encoding or as a soft label, shape (num_classes,)).
An example: __getitem__ returns (img, label), where img has shape (3,224,224) and label is a list such as [0,1,0,0,0] or, after label smoothing, [0.02, 0.92, 0.02, 0.02, 0.02].
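To make the label format concrete, here is a minimal, dependency-free sketch. The names `one_hot`, `smooth` and `SoftLabelDataset` are hypothetical and not part of TorchUtils. It assumes uniform label smoothing with eps = 0.1, which reproduces the five-class numbers quoted in this thread. In real use the class would normally subclass torch.utils.data.Dataset and each image would already be a tensor of shape (c, h, w).

```python
from typing import List, Sequence, Tuple


def one_hot(index: int, num_classes: int) -> List[float]:
    """Return a one-hot list, e.g. index=1, num_classes=5 -> [0, 1, 0, 0, 0]."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def smooth(label: Sequence[float], eps: float = 0.1) -> List[float]:
    """Uniform label smoothing: (1 - eps) * y + eps / num_classes."""
    k = len(label)
    return [(1.0 - eps) * y + eps / k for y in label]


class SoftLabelDataset:
    """Hypothetical dataset whose __getitem__ returns (img, label).

    `images` holds already-preprocessed images (in practice, tensors of
    shape (c, h, w)); `class_indices` holds integer class ids.
    """

    def __init__(self, images: Sequence, class_indices: Sequence[int],
                 num_classes: int, eps: float = 0.0):
        assert len(images) == len(class_indices)
        self.images = images
        self.class_indices = class_indices
        self.num_classes = num_classes
        self.eps = eps

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, i: int) -> Tuple[object, List[float]]:
        # The label is a vector of length num_classes, never a bare int.
        label = one_hot(self.class_indices[i], self.num_classes)
        if self.eps > 0:
            label = smooth(label, self.eps)
        return self.images[i], label
```

The label has to be a vector rather than an integer class index because mixing two samples blends their labels as well, which only makes sense for one-hot or soft labels.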
Finally, thanks for asking. This repo is a work in progress. I will add more documentation and examples later.
Thanks for your guidance ~
By the way, would it be convenient for you to provide a full deep learning script that uses this repo? I would really like to learn from it.
Thank you again and have a good day~
Here is an example using this repo: https://github.com/seefun/TorchUtils/blob/master/examples/kaggle_leaves_classification.ipynb
It's a training Jupyter notebook for this dataset: https://www.kaggle.com/c/classify-leaves
With this training pipeline, we can usually get a strong baseline score on Kaggle.
Hope this helps. GLHF~
Thanks soooo much~