How to continue to train from scratch?

Question


BruceTangLin opened this issue 3 years ago · 8 comments

Can someone help me? I have just started learning face recognition.
Issue: I have already trained the model for 171 epochs, and I want to continue training from the latest checkpoint. How should I set the following options in config.py?
BACKBONE_RESUME_ROOT = './', # the root to resume training from a saved checkpoint
HEAD_RESUME_ROOT = './', # the root to resume training from a saved checkpoint

Thank you very much.

Answer 1 · 2021-06-01T08:08:02.000Z

I would really appreciate it if someone could help me.

Answer 2 · 2021-06-02T09:02:19.000Z

Fill in the config options above with the paths to your saved checkpoints, for example:
------ config.py ------
MODEL_ROOT = '/home/face.evoLVe.PyTorch/output_models',
BACKBONE_RESUME_ROOT = './output_models/backbone_ir50_ms1m_epoch120.pth',
HEAD_RESUME_ROOT = './output_models/head_arcface.pth',
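Since the question is about resuming from the latest model, here is a minimal stdlib-only sketch that picks the newest epoch for which both a backbone and a head checkpoint exist, so the two resume paths always match. It assumes a hypothetical naming convention: file names start with "backbone" or "head" (any case) and contain "epoch" followed by the epoch number. Note that head_arcface.pth from the example above has no epoch number and would not be matched. The function names are illustrative only and are not part of face.evoLVe.PyTorch.

```python
import os
import re

# Hypothetical naming convention: the file name starts with "backbone" or "head"
# (any case) and contains "epoch", an optional "_", then the epoch number,
# e.g. "backbone_ir50_ms1m_epoch120.pth" or "Head_ArcFace_Epoch_171_x.pth".
EPOCH_PATTERN = re.compile(r"epoch_?(\d+)", re.IGNORECASE)


def checkpoints_by_epoch(model_root, prefix):
    """Map epoch number -> checkpoint path for .pth files starting with `prefix`."""
    found = {}
    # Sorted so that, if several files share an epoch, the result is deterministic
    # (the last one in alphabetical order wins).
    for name in sorted(os.listdir(model_root)):
        if not (name.lower().startswith(prefix.lower()) and name.endswith(".pth")):
            continue
        match = EPOCH_PATTERN.search(name)
        if match:
            found[int(match.group(1))] = os.path.join(model_root, name)
    return found


def latest_resume_pair(model_root):
    """Return (epoch, backbone_path, head_path) for the newest epoch that has BOTH files."""
    backbones = checkpoints_by_epoch(model_root, "backbone")
    heads = checkpoints_by_epoch(model_root, "head")
    common = set(backbones) & set(heads)
    if not common:
        raise FileNotFoundError(f"no matching backbone/head checkpoint pair in {model_root!r}")
    epoch = max(common)
    return epoch, backbones[epoch], heads[epoch]
```

Calling latest_resume_pair('./output_models') and copying the two returned paths into BACKBONE_RESUME_ROOT and HEAD_RESUME_ROOT gives a consistent pair. Resuming both matters because the head (e.g. ArcFace) has its own trained weights. If train.py restores only the model weights and not the epoch counter, the LR schedule defined by STAGES will start over from LR, so you may want to adjust those values when continuing.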

Answer 3 · 2021-07-27T03:48:52.000Z

Hi @BruceTangLin @changxinC,
I am trying to train, but I cannot figure out the data format required for training. My data is currently inside

D:/face.evoLVe.PyTorch/data/dataV1/
Inside the dataV1 directory, the data is organized as follows:
dataV1/
    id1/
        1.jpg
        ...
    id2/
        1.jpg
        ...
    ...
The data is already aligned and resized to 112 × 112 using the alignment script provided in the repo.
When I run train.py, I get a file-not-found error. I have seen many people facing the same issue because they cannot get the data format right.

It would help a lot if you could explain how to prepare the dataset in the correct format. Any help would be much appreciated, thank you.

Answer 4 · 2021-07-27T04:04:50.000Z

@sriktrako
If the file cannot be found, your path setting is wrong.

In config.py
DATA_ROOT = '/database/face/CASIA_Face_Recognition/'

In train.py (line 121), the training set is loaded from the train subfolder of DATA_ROOT:
dataset_train = datasets.ImageFolder(os.path.join(DATA_ROOT, 'train'), train_transform)

My actual dataset path:
/database/face/CASIA_Face_Recognition/train/id1/1.jpg 2.jpg 3.jpg
/database/face/CASIA_Face_Recognition/train/id2/1.jpg 2.jpg 3.jpg
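To check a dataset folder against this layout before launching train.py, a small stdlib-only sketch can list the identity folders under os.path.join(DATA_ROOT, 'train') and count their images. The function names are hypothetical; the class-discovery rule mirrors torchvision's ImageFolder (one sub-folder per identity, classes indexed in sorted name order), and the extension list is only a subset of what ImageFolder accepts.

```python
import os

# Assumption: like torchvision's ImageFolder, every sub-directory of the split
# folder is one identity (class), and classes are indexed in sorted name order.
# This extension list is a subset of what ImageFolder accepts.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")


def summarize_split(data_root, split="train"):
    """Return {identity_folder: image_count} for <data_root>/<split>/.

    Raises FileNotFoundError if <data_root>/<split> is missing, which is what
    happens when identity folders sit directly under DATA_ROOT."""
    split_dir = os.path.join(data_root, split)
    if not os.path.isdir(split_dir):
        example = os.path.join(split_dir, "id1", "1.jpg")
        raise FileNotFoundError(f"{split_dir!r} not found; expected e.g. {example!r}")
    summary = {}
    for entry in sorted(os.scandir(split_dir), key=lambda e: e.name):
        if entry.is_dir():
            # Count only files with an accepted image extension.
            summary[entry.name] = sum(
                1 for f in os.listdir(entry.path) if f.lower().endswith(IMAGE_EXTENSIONS)
            )
    return summary


def empty_identities(summary):
    """Identity folders with no usable images; worth fixing before training."""
    return [name for name, count in summary.items() if count == 0]
```

If train.py appends 'train' as in the version quoted here, the layout from the previous question (id1/, id2/ directly inside dataV1) would make summarize_split('D:/face.evoLVe.PyTorch/data/dataV1') raise; moving the identity folders into dataV1/train/ matches the layout shown above.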

Answer 5 · 2021-07-27T09:26:32.000Z

Hi @changxinC, thanks for replying, appreciate the help.

My DATA_ROOT = 'D:/face.evoLVe.PyTorch/data/dataV1'
Actual data:
D:/face.evoLVe.PyTorch/data/dataV1/Id1/1.jpg 2.jpg .....
D:/face.evoLVe.PyTorch/data/dataV1/Id2/1.jpg 2.jpg .....

How do I generate the meta/sizes files? I don't have any files other than the .jpg images inside the dataV1 directory.

Here's the exact output when I run train.py:

Overall Configurations:
{'SEED': 1337, 'DATA_ROOT': 'D:/face.evoLVe.PyTorch/data/dataV1', 'MODEL_ROOT': './model', 'LOG_ROOT': './log', 'BACKBONE_RESUME_ROOT': './model/weights/backbone_ir50_asia.pth', 'HEAD_RESUME_ROOT': './', 'BACKBONE_NAME': 'IR_50', 'HEAD_NAME': 'ArcFace', 'LOSS_NAME': 'Focal', 'INPUT_SIZE': [112, 112], 'RGB_MEAN': [0.5, 0.5, 0.5], 'RGB_STD': [0.5, 0.5, 0.5], 'EMBEDDING_SIZE': 512, 'BATCH_SIZE': 512, 'DROP_LAST': True, 'LR': 0.1, 'NUM_EPOCH': 125, 'WEIGHT_DECAY': 0.0005, 'MOMENTUM': 0.9, 'STAGES': [35, 65, 95], 'DEVICE': device(type='cpu'), 'MULTI_GPU': True, 'GPU_ID': [0, 1], 'PIN_MEMORY': True, 'NUM_WORKERS': 0}

Number of Training Classes: 5749
Traceback (most recent call last):
  File "train.py", line 84, in <module>
    lfw, cfp_ff, cfp_fp, agedb, calfw, cplfw, vgg2_fp, lfw_issame, cfp_ff_issame, cfp_fp_issame, agedb_issame, calfw_issame, cplfw_issame, vgg2_fp_issame = get_val_data(DATA_ROOT)
  File "D:\srikarRnD\dev\face.evoLVe.PyTorch\util\utils.py", line 63, in get_val_data
    lfw, lfw_issame = get_val_pair(data_path, 'lfw')
  File "D:\srikarRnD\dev\face.evoLVe.PyTorch\util\utils.py", line 56, in get_val_pair
    carray = bcolz.carray(rootdir = os.path.join(path, name), mode = 'r')
  File "bcolz/carray_ext.pyx", line 1067, in bcolz.carray_ext.carray.__cinit__
  File "bcolz/carray_ext.pyx", line 1369, in bcolz.carray_ext.carray._read_meta
FileNotFoundError: [Errno 2] No such file or directory: 'D:/face.evoLVe.PyTorch/data/dataV1\lfw\meta\sizes'

Answer 6 · 2021-07-27T10:13:13.000Z

@sriktrako
You can see about this: https://github.com/TreB1eN/InsightFace_Pytorch#323-prepare-dataset--for-training
This file seems to come with the downloaded LFW evaluation data (it is read with bcolz, as the traceback shows), or you can generate it with prepare_data.py in ./backup. I have never tried this myself, because I use my own dataset and rewrote the evaluation code, so I disabled line 84 and lines 239 to 255 of train.py.
https://github.com/ZhaoJ9014/face.evoLVe.PyTorch/blob/a33a9121198ed354eb6b0d7c214443f09908ccc1/train.py#L84
https://github.com/ZhaoJ9014/face.evoLVe.PyTorch/blob/a33a9121198ed354eb6b0d7c214443f09908ccc1/train.py#L239
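As a hedged alternative to disabling those lines by hand, a guard like the sketch below could skip validation only when the bcolz folders are missing. The set names are inferred from the variables unpacked at train.py line 84 in the traceback above; the actual folder names passed to get_val_pair in util/utils.py may differ, so check them first. Both function names are hypothetical.

```python
import os

# Assumption: names inferred from the variables unpacked at train.py line 84;
# the folder names actually passed to get_val_pair may differ (check util/utils.py).
VAL_SETS = ("lfw", "cfp_ff", "cfp_fp", "agedb", "calfw", "cplfw", "vgg2_fp")


def missing_val_sets(data_root, names=VAL_SETS):
    """Return the names whose bcolz metadata file <data_root>/<name>/meta/sizes is absent."""
    return [
        n for n in names
        if not os.path.isfile(os.path.join(data_root, n, "meta", "sizes"))
    ]


def should_run_validation(data_root, names=VAL_SETS):
    """Hypothetical guard: validate only when every bcolz folder is present."""
    missing = missing_val_sets(data_root, names)
    if missing:
        print(f"Skipping validation; missing bcolz data for: {', '.join(missing)}")
        return False
    return True
```

In train.py, both the get_val_data(DATA_ROOT) call at line 84 and the evaluation block at lines 239 to 255 would need to sit under the same `if should_run_validation(DATA_ROOT):` check, because the evaluation block uses the variables that line 84 creates.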

Answer 7 · 2022-03-11T08:01:45.000Z

@DrewdropLife Hi, do you know where to get the head resume .pth file? I can only find the backbone checkpoint. Thank you!

Answer 8 · 2022-03-11T11:02:22.000Z


You can find it here (Baidu Pan, folder ms1m-ir152): https://pan.baidu.com/s/1-9sFB3H1mL8bt2jH7EagtA#list/path=%2Fms1m-ir152 (password: b197)