Unofficial implementation of the arXiv paper "Single-Stage Multi-Person Pose Machines". Detailed information can be found in the paper, or check this CSDN link (for reference only).
- Custom distributed training does not work well yet: I trained for 10 epochs and nothing was learned at all, so if anyone is familiar with this, please help me check it and make it work. The custom distributed training code itself appears to be correct, but in it I wrote checkpoint = tf.train.Checkpoint(optimizer=optimizer, model=model), which differs from spm_model.py, where I wrote checkpoint = tf.train.Checkpoint(optimizer=optimizer, net=model). The keyword passed to tf.train.Checkpoint is different (model=model vs. net=model), so a checkpoint saved with model=model cannot be restored with net=model. Keep the tf.train.Checkpoint arguments identical everywhere. It is also a good idea to call checkpoint.restore(ckpt_path).assert_existing_objects_matched() so that restore errors surface as soon as possible (see the sketch after this list).
- Use tf.keras to run distributed training
- Add COCO eval while training
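A minimal, self-contained sketch of the checkpoint point above (the model, optimizer and directory are placeholders, not this repo's objects): the keyword given to tf.train.Checkpoint must match between the saving and restoring code, and assert_existing_objects_matched makes a mismatch fail loudly.

```python
import tensorflow as tf

# Stand-in model/optimizer so the snippet runs on its own.
model = tf.keras.Sequential([tf.keras.layers.Dense(4)])
model.build(input_shape=(None, 8))          # create variables so there is something to save
optimizer = tf.keras.optimizers.Adam()

# Use the SAME keyword (here net=) everywhere a checkpoint is created.
checkpoint = tf.train.Checkpoint(optimizer=optimizer, net=model)
manager = tf.train.CheckpointManager(checkpoint, './ckpts', max_to_keep=3)
save_path = manager.save()

# On restore, assert immediately: if the checkpoint was written with a
# different keyword (e.g. model=), this raises instead of silently
# restoring nothing.
status = checkpoint.restore(save_path)
status.assert_existing_objects_matched()
```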
- tensorflow 2.0.0
- python 3.6
- cuda 10
- imgaug == 0.3.0
- pycocotools
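If it helps, the pinned Python packages above can be installed in one step (tensorflow-gpu is the CUDA 10 build of TensorFlow 2.0.0; adjust the package name if you installed TensorFlow differently):

pip3 install tensorflow-gpu==2.0.0 imgaug==0.3.0 pycocotools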
We use the first 12 keypoints of the AI Challenger format, which can be found on this website (the link may no longer work). The MSCOCO dataset is OK too, but you need to delete the five head keypoints and change its format to match AI Challenger. Note that we still use pycocotools to load data, so if you use AI Challenger you need to convert its annotation file into COCO annotation format; a conversion script is linked here just for reference, and a rough sketch of the idea follows.
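This sketch only illustrates the conversion idea, not the repo's actual converter. The AI Challenger field names (image_id, keypoint_annotations, human_annotations) and the 1/2/3 visibility flags are assumptions about that format; the output is a minimal COCO-style dict.

```python
import json

# AI Challenger visibility (1 visible, 2 occluded, 3 not labeled)
# mapped to COCO visibility (2 visible, 1 occluded, 0 not labeled).
AIC_TO_COCO_VIS = {1: 2, 2: 1, 3: 0}

def convert(aic_json, out_json, num_kps=12):
    with open(aic_json) as f:
        records = json.load(f)
    images, annotations, ann_id = [], [], 1
    for img_id, rec in enumerate(records, 1):
        images.append({'id': img_id, 'file_name': rec['image_id'] + '.jpg'})
        for human, kps in rec['keypoint_annotations'].items():
            x1, y1, x2, y2 = rec['human_annotations'][human]
            coco_kps = []
            for i in range(num_kps):                    # keep only the first 12 joints
                x, y, v = kps[3 * i: 3 * i + 3]
                coco_kps += [x, y, AIC_TO_COCO_VIS.get(v, 0)]
            annotations.append({
                'id': ann_id, 'image_id': img_id, 'category_id': 1,
                'bbox': [x1, y1, x2 - x1, y2 - y1],
                'area': float((x2 - x1) * (y2 - y1)),
                'num_keypoints': num_kps, 'keypoints': coco_kps,
                'iscrowd': 0,
            })
            ann_id += 1
    coco = {'images': images, 'annotations': annotations,
            'categories': [{'id': 1, 'name': 'person'}]}
    with open(out_json, 'w') as f:
        json.dump(coco, f)
```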
This repo uses HRNet as the body (backbone) network, but you can replace the body with any other network you like. Please check nets/spm_model.py. A minimal sketch of swapping in a different backbone is shown below.
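The sketch below only illustrates the swap-the-backbone idea; the real model lives in nets/spm_model.py. MobileNetV2 is a stand-in for the HRNet body, and the output layout (one root-joint heatmap plus 2 * num_joints displacement channels) is an assumption about an SPM-style head, not this repo's exact architecture.

```python
import tensorflow as tf

def build_spm_model(input_shape=(512, 512, 3), num_joints=12):
    # Any backbone that yields a dense feature map will do; MobileNetV2 is
    # only a placeholder for the HRNet body used in this repo.
    backbone = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights=None)
    features = backbone.output
    # SPM-style heads: one root-joint (center) heatmap and x/y displacement
    # maps for every joint.
    center = tf.keras.layers.Conv2D(1, 1, activation='sigmoid',
                                    name='center_map')(features)
    offsets = tf.keras.layers.Conv2D(2 * num_joints, 1,
                                     name='displacement_maps')(features)
    return tf.keras.Model(inputs=backbone.input, outputs=[center, offsets])

model = build_spm_model()
print([o.shape for o in model.outputs])
```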
python3 main.py
All config can be found in config/center_config.py
python3 distribute/custom_train.py
Note that if you have four GPUs with ids [0, 1, 2, 3] and you want to use GPU ids [2, 3], that does not work directly for now; only id lists starting from 0, such as [0, 1] or [0, 1, 2], work fine. The reason is explained below.
The reason we can set os.environ['CUDA_VISIBLE_DEVICES'] = '2, 3'
but cannot use gpu_ids = [2, 3]
is that TensorFlow re-numbers the visible GPUs, so physical GPUs 2 and 3 become logical GPUs 0 and 1. So, if we want to train on physical GPUs 2 and 3, just write:
os.environ['CUDA_VISIBLE_DEVICES'] = '2, 3'
gpu_ids = [0, 1]
devices = ['/device:GPU:{}'.format(i) for i in gpu_ids]
strategy = tf.distribute.MirroredStrategy(devices=devices)
when using distributed training.
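Putting the pieces together, a minimal sketch of the same remapping (the model and optimizer are placeholders; the point is that variable-creating objects must be built inside the strategy scope so they are mirrored across the selected GPUs):

```python
import os
import tensorflow as tf

# Physical GPUs 2 and 3 are exposed to TensorFlow, which renumbers them 0 and 1.
os.environ['CUDA_VISIBLE_DEVICES'] = '2, 3'
gpu_ids = [0, 1]                               # logical ids after remapping
devices = ['/device:GPU:{}'.format(i) for i in gpu_ids]
strategy = tf.distribute.MirroredStrategy(devices=devices)

# Anything that creates variables (model, optimizer, checkpoint objects)
# must be constructed inside the scope so the variables are mirrored.
with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(4)])   # stand-in model
    optimizer = tf.keras.optimizers.Adam()
```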
python3 tools/spm_model_test.py
Create the predictions json file:
python3 tools/model_val.py
Evaluate:
python3 tools/ai_format_kps_eval.py --ref true_label.json --submit predict.json
detailed information can be found here
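The evaluator above follows the AI Challenger format. If your ground truth and predictions are instead in COCO keypoint format, pycocotools can compute keypoint AP directly; this is a sketch only, and the 12 OKS sigmas are placeholder values for the 12-joint layout, not tuned constants.

```python
import numpy as np
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO('true_label.json')            # ground truth, COCO keypoint format
coco_dt = coco_gt.loadRes('predict.json')    # predictions json
coco_eval = COCOeval(coco_gt, coco_dt, iouType='keypoints')
# pycocotools ships sigmas for the 17 COCO joints; a 12-joint layout needs
# its own per-joint sigmas (placeholder values here).
coco_eval.params.kpt_oks_sigmas = np.full(12, 0.07)
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
```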
In the spm_loss function, you need to carefully set the weights of the two different kinds of losses so that they are numerically balanced; a hedged sketch is shown below.
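This sketch only illustrates what "balancing" means here. The exact loss forms (L2 on the root-joint heatmap, masked L1 on the displacement maps) and the weights are illustrative assumptions, not the values used in this repo's spm_loss.

```python
import tensorflow as tf

def spm_loss_sketch(gt_center, pred_center,
                    gt_offsets, pred_offsets, offset_mask,
                    center_weight=1.0, offset_weight=0.01):
    # Mean squared error on the root-joint (center) heatmap.
    center_loss = tf.reduce_mean(tf.square(gt_center - pred_center))
    # L1 error on the displacement maps, averaged only over pixels that
    # actually carry a ground-truth offset.
    masked = tf.abs(gt_offsets - pred_offsets) * offset_mask
    offset_loss = tf.reduce_sum(masked) / (tf.reduce_sum(offset_mask) + 1e-6)
    # Scale the two terms so neither dominates the total numerically.
    return center_weight * center_loss + offset_weight * offset_loss
```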
- right_shoulder
- right_elbow
- right_wrist
- left_shoulder
- left_elbow
- left_wrist
- right_hip
- right_knee
- right_ankle
- left_hip
- left_knee
- left_ankle