
How to continue to train from scratch?

BruceTangLin opened this issue · 8 comments

Can someone help me? I have just started learning face recognition.
Issue: I have already trained the model for 171 epochs, and I want to continue training from the latest saved model. How should I set
BACKBONE_RESUME_ROOT = './', # the root to resume training from a saved checkpoint
HEAD_RESUME_ROOT = './', # the root to resume training from a saved checkpoint
in config.py?

Thank you very much.

I would really appreciate it if someone could teach me.

Fill in the config entries above with the paths to your saved model files, for example:
------ config.py ------
MODEL_ROOT = '/home/face.evoLVe.PyTorch/output_models',
BACKBONE_RESUME_ROOT = './output_models/backbone_ir50_ms1m_epoch120.pth',
HEAD_RESUME_ROOT = './output_models/head_arcface.pth',
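
To resume from the most recent checkpoint without typing the file name by hand, something like the following could pick it out of the output directory. This is only a sketch: `find_latest_checkpoint` is a hypothetical helper, not part of the repo, and it assumes the checkpoint file names contain `epoch<N>` (or `Epoch_<N>`), as in the example above.

```python
import os
import re

# Hypothetical helper, not part of face.evoLVe.PyTorch.
# Assumption: checkpoint file names contain "epoch<N>" or "Epoch_<N>".
_EPOCH_RE = re.compile(r"epoch_?(\d+)", re.IGNORECASE)


def find_latest_checkpoint(model_root, prefix):
    """Return the .pth file under model_root whose name starts with prefix
    and carries the highest epoch number, or None if there is no match."""
    if not os.path.isdir(model_root):
        return None
    best_epoch, best_path = -1, None
    for name in os.listdir(model_root):
        lower = name.lower()
        if not (lower.startswith(prefix.lower()) and lower.endswith(".pth")):
            continue
        match = _EPOCH_RE.search(name)
        if match is None:
            continue  # no epoch number in the file name, skip it
        epoch = int(match.group(1))
        if epoch > best_epoch:
            best_epoch, best_path = epoch, os.path.join(model_root, name)
    return best_path
```

The returned paths could then be used as the values of BACKBONE_RESUME_ROOT and HEAD_RESUME_ROOT, for example `find_latest_checkpoint('./output_models', 'backbone')` and `find_latest_checkpoint('./output_models', 'head')`.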

Hi @BruceTangLin @changxinC,
I am trying to train, but I cannot figure out the data format required for training. Currently my data is inside

Inside the dataV1 directory, the data is laid out as follows:
-> id1/
   -> 1.jpg
   -> ...
-> id2/
   -> 1.jpg
   -> ...
-> ...
The data is already aligned and resized to 112 using the align script provided in the repo.
When I run train.py, I get a file-not-found error. I have seen that a lot of people are facing the same issue and cannot get the data format right.

It would help a lot if you could explain how to get the correct dataset format for training. Any help would be much appreciated, thank you.

If the file cannot be found, your path setting is wrong.

In config.py
DATA_ROOT = '/database/face/CASIA_Face_Recognition/'

In train.py line 121
dataset_train = datasets.ImageFolder(os.path.join(DATA_ROOT, 'train'), train_transform)

My actual dataset path:
/database/face/CASIA_Face_Recognition/train/id1/1.jpg 2.jpg 3.jpg
/database/face/CASIA_Face_Recognition/train/id2/1.jpg 2.jpg 3.jpg
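
Since ImageFolder treats each sub-directory of its root as one class, a quick way to confirm the layout is to list the identity folders under DATA_ROOT/train and count their images before starting train.py. A minimal sketch, assuming the layout shown above; `summarize_train_dir` and the extension list are hypothetical and not part of the repo:

```python
import os

# Assumption: these are the image extensions used in the dataset.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")


def summarize_train_dir(data_root, split="train"):
    """Map each identity folder under data_root/split to its image count."""
    train_dir = os.path.join(data_root, split)
    if not os.path.isdir(train_dir):
        raise FileNotFoundError("expected a directory at " + train_dir)
    counts = {}
    for entry in sorted(os.listdir(train_dir)):
        class_dir = os.path.join(train_dir, entry)
        if not os.path.isdir(class_dir):
            continue  # loose files directly under train/ are not classes
        counts[entry] = sum(
            1 for f in os.listdir(class_dir)
            if f.lower().endswith(IMAGE_EXTENSIONS)
        )
    return counts
```

If this raises FileNotFoundError, or returns an empty dict, DATA_ROOT probably does not point at the parent of the train folder.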

Hi @changxinC, thanks for replying, appreciate the help.

My DATA_ROOT = 'D:/face.evoLVe.PyTorch/data/dataV1'
Actual data:
D:/face.evoLVe.PyTorch/data/dataV1/Id1/1.jpg 2.jpg .....
D:/face.evoLVe.PyTorch/data/dataV1/Id2/1.jpg 2.jpg .....

How do I generate the meta and sizes files? I don't have any files other than .jpg files inside the dataV1 directory.

Here's the exact output when I run train.py:

Overall Configurations:
{'SEED': 1337, 'DATA_ROOT': 'D:/face.evoLVe.PyTorch/data/dataV1', 'MODEL_ROOT': './model', 'LOG_ROOT': './log', 'BACKBONE_RESUME_ROOT': './model/weights/backbone_ir50_asia.pth', 'HEAD_RESUME_ROOT': './', 'BACKBONE_NAME': 'IR_50', 'HEAD_NAME': 'ArcFace', 'LOSS_NAME': 'Focal', 'INPUT_SIZE': [112, 112], 'RGB_MEAN': [0.5, 0.5, 0.5], 'RGB_STD': [0.5, 0.5, 0.5], 'EMBEDDING_SIZE': 512, 'BATCH_SIZE': 512, 'DROP_LAST': True, 'LR': 0.1, 'NUM_EPOCH': 125, 'WEIGHT_DECAY': 0.0005, 'MOMENTUM': 0.9, 'STAGES': [35, 65, 95], 'DEVICE': device(type='cpu'), 'MULTI_GPU': True, 'GPU_ID': [0, 1], 'PIN_MEMORY': True, 'NUM_WORKERS': 0}

Number of Training Classes: 5749
Traceback (most recent call last):
  File "train.py", line 84, in <module>
    lfw, cfp_ff, cfp_fp, agedb, calfw, cplfw, vgg2_fp, lfw_issame, cfp_ff_issame, cfp_fp_issame, agedb_issame, calfw_issame, cplfw_issame, vgg2_fp_issame = get_val_data(DATA_ROOT)
  File "D:\srikarRnD\dev\face.evoLVe.PyTorch\util\utils.py", line 63, in get_val_data
    lfw, lfw_issame = get_val_pair(data_path, 'lfw')
  File "D:\srikarRnD\dev\face.evoLVe.PyTorch\util\utils.py", line 56, in get_val_pair
    carray = bcolz.carray(rootdir = os.path.join(path, name), mode = 'r')
  File "bcolz/carray_ext.pyx", line 1067, in bcolz.carray_ext.carray.__cinit__
  File "bcolz/carray_ext.pyx", line 1369, in bcolz.carray_ext.carray._read_meta
FileNotFoundError: [Errno 2] No such file or directory: 'D:/face.evoLVe.PyTorch/data/dataV1\lfw\meta\sizes'

See this: https://github.com/TreB1eN/InsightFace_Pytorch#323-prepare-dataset--for-training
The missing file seems to come with the LFW evaluation dataset download; alternatively, you can use prepare_data.py in ./backup to generate it. I have never tried this, because I use my own dataset and rewrote the evaluation code, so I commented out line 84 and lines 239 to 255.
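
To see in advance which evaluation sets train.py will fail on, one could check for the meta/sizes file that the traceback above complains about. This is a rough sketch: `missing_val_sets` is a hypothetical helper, the set names are copied from the get_val_data call in the traceback, and it only checks the one file named in the error message, not everything the loader may need.

```python
import os

# Names taken from the get_val_data(DATA_ROOT) line in the traceback.
VAL_SET_NAMES = ["lfw", "cfp_ff", "cfp_fp", "agedb", "calfw", "cplfw", "vgg2_fp"]


def missing_val_sets(data_root, names=VAL_SET_NAMES):
    """Return the validation set names that have no <name>/meta/sizes file
    under data_root (the file reported missing in the FileNotFoundError)."""
    missing = []
    for name in names:
        sizes_file = os.path.join(data_root, name, "meta", "sizes")
        if not os.path.isfile(sizes_file):
            missing.append(name)
    return missing
```

An empty list means every set has the file; any names returned are the ones to download or generate, or whose evaluation code to disable.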

@DrewdropLife Hi, do you know where to get the head resume pth file? I can find only the backbone resume. Thank you!


https://pan.baidu.com/s/1-9sFB3H1mL8bt2jH7EagtA#list/path=%2Fms1m-ir152 pw:b197