Thank you for your work. Can you share your pretrained model for training? I cannot train and I get an error.
Closed this issue · 21 comments
Hi, thanks for your interest. We did not save the optimizer state in model_coco.pth, because it would have made the file much larger.
If you have replaced the storage address and other information of the coco dataset with your own dataset's, you can choose to train from scratch or load the trained weights from the coco dataset and fine-tune the model on your dataset.
- Train the model from scratch.
python train_net.py coco --bs ${batch_size}
- Load the model_coco.pth and fine-tune on your dataset.
python train_net.py coco --bs ${batch_size} --checkpoint ${path_to_checkpoint} --type finetune
If the categories of your dataset are different from those of the COCO dataset, you should change line 40 of train_net.py from
begin_epoch = load_network(network, model_dir=args.checkpoint)
to
begin_epoch = load_network(network, model_dir=args.checkpoint, strict=False)
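To illustrate why a different number of categories matters when loading model_coco.pth: the ct_hm head presumably has one output channel per category, so its weights no longer have the same shape as in the checkpoint. One common way to fine-tune in that situation is to load only the entries whose shapes still match. The sketch below shows that idea in plain Python, using shape tuples as stand-ins for tensors. It is not the repository's load_network, and the parameter names are hypothetical.

```python
def filter_matching(checkpoint_shapes, model_shapes):
    """Split checkpoint entries into loadable and skipped ones.

    Both arguments map parameter names to shape tuples (stand-ins for tensors).
    An entry is loadable only if the model has the same name with the same shape.
    """
    loadable, skipped = {}, []
    for name, shape in checkpoint_shapes.items():
        if model_shapes.get(name) == shape:
            loadable[name] = shape
        else:
            skipped.append(name)
    return loadable, skipped


# Hypothetical parameter names: an 80-class COCO head vs. a 1-class head.
checkpoint = {
    "backbone.conv.weight": (64, 3, 7, 7),
    "ct_hm.out.weight": (80, 256, 1, 1),
}
model = {
    "backbone.conv.weight": (64, 3, 7, 7),
    "ct_hm.out.weight": (1, 256, 1, 1),
}

loadable, skipped = filter_matching(checkpoint, model)
print(sorted(loadable))  # entries that can be copied from the checkpoint
print(skipped)           # entries that keep their fresh initialization
```

The skipped entries are the ones that have to be learned on the new dataset.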
If you encounter any other problems in the future, please feel free to continue asking questions. @cuiwenting87
I don't know what caused this. Can you provide some details of the error? It would be nice to have some screenshots like the one above.
When you load the trained weights from model_coco.pth, what goes wrong? Can you share the error output? (If --type finetune is added, it shouldn't report an error.) Also, can you provide some training logs to help me analyze why the AP is 0 when training from scratch?
I can't see anything below “Those are the error when I load the trained weights from the model_coco.pth.”. @cuiwenting87
You can visit https://github.com/zhang-tao-whu/e2ec/issues/3 and upload some screenshots.
The ct_loss is not working properly, but the other losses look fine. E2EC's ct_loss is calculated in exactly the same way as CenterNet's. I have some suggestions for you to try:
- Increase batch size, preferably over 8.
- Check your config file. For the COCO dataset, images are resized to (512, 512) for training and kept at their original size for testing. If the size of your images differs significantly from (512, 512), please change ${scale} and ${input_h, input_w}. Also make sure that your dataset categories are consistent with ${ct_hm}.
- Don't join dml at the start; please keep ${start_epoch} at 10.
- Continue to iterate to see if ct_loss can be reduced to below 10.
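As a rough way to judge whether a dataset's image sizes differ significantly from the training input size, one can compute the uniform scale factor needed to fit each image into the input. The sketch below is only an illustration under assumed sizes. The helper names are hypothetical and it does not reproduce the repository's preprocessing.

```python
def resize_factor(img_w, img_h, input_w, input_h):
    """Largest uniform scale at which the image still fits inside the input size."""
    return min(input_w / img_w, input_h / img_h)


def scale_points(points, factor):
    """Scale (x, y) image-coordinate points by the same factor as the image."""
    return [(x * factor, y * factor) for x, y in points]


# Assumed example: a (512, 512) training input and two hypothetical image sizes.
input_w, input_h = 512, 512
for img_w, img_h in [(512, 512), (1024, 768)]:
    factor = resize_factor(img_w, img_h, input_w, input_h)
    print((img_w, img_h), factor)
```

A factor far from 1.0 suggests the input size in the config deserves a second look. Whatever factor is applied to the image has to be applied to its annotations as well.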
@cuiwenting87 Hi, I might know what went wrong. You may have used matrix coordinates instead of image coordinates when creating your dataset. The first index of a matrix coordinate (the row) corresponds to y in image coordinates, not x. You can use the showAnns() method of the pycocotools COCO class to check the annotations. It would also be helpful if you could show me an image from your dataset and the corresponding annotations to help me analyse the problem better.
My email is zhang_tao@whu.edu.cn. You can upload the data to BaiDuYun and send me the link. Or you can send your contact details to my email and transfer the data online.
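Besides visual inspection, the coordinate mix-up described above can sometimes be caught with a simple bounds check. The sketch below assumes COCO-style polygons stored as flat lists [x1, y1, x2, y2, ...] and known image width and height. The function names are hypothetical. It only flags swaps that push points outside the image, so it cannot detect a swap on square images or on points that fit either way.

```python
def polygon_in_bounds(polygon, width, height):
    """polygon is a flat list [x1, y1, x2, y2, ...] in image coordinates."""
    xs, ys = polygon[0::2], polygon[1::2]
    return all(0 <= x <= width for x in xs) and all(0 <= y <= height for y in ys)


def looks_swapped(polygon, width, height):
    """True if the polygon is out of bounds as given but fits once x and y are exchanged."""
    swapped = [v for x, y in zip(polygon[0::2], polygon[1::2]) for v in (y, x)]
    return (not polygon_in_bounds(polygon, width, height)
            and polygon_in_bounds(swapped, width, height))


# A 640x480 image (width x height) with a polygon written as (row, col) pairs.
row_col_polygon = [400, 600, 450, 600, 450, 630]
x_y_polygon = [600, 400, 600, 450, 630, 450]

print(looks_swapped(row_col_polygon, 640, 480))  # True
print(looks_swapped(x_y_polygon, 640, 480))      # False
```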
Incorrect training resolution and test resolution settings caused this problem, which has now been fixed.
Hello, I have the same problem. The AP is always zero when I train on my own dataset.
python train_net.py coco_finetune --bs 12 --type finetune --checkpoint data/model/model_coco.pth
This is my config/_finetune.py:
from .base import commen, data, model, train, test
import numpy as np

data.mean = np.array([0.44726229, 0.43802511, 0.27905645],
                     dtype=np.float32).reshape(1, 1, 3)
std = np.array([0.22784984, 0.21254292, 0.16168552],
               dtype=np.float32).reshape(1, 1, 3)
scale = np.array([640, 480])
input_w, input_h = (640, 480)

model.heads['ct_hm'] = 1

train.optimizer = {'name': 'sgd', 'lr': 1e-4, 'weight_decay': 1e-4,
                   'milestones': [150, ], 'gamma': 0.1}
train.batch_size = 12
train.epoch = 160
train.dataset = 'coco_train'

test.dataset = 'coco_val'

class config(object):
    commen = commen
    data = data
    model = model
    train = train
    test = test
Perhaps you should use the following command:
python train_net.py _finetune --bs 12 --type finetune --checkpoint data/model/model_coco.pth
Sorry, it was my negligence. My config file's name is indeed coco_finetune.py.
Is it possible that the AP is 0 because of too few epochs?
No, there must be a mistake somewhere. The AP is 0 because the detection branch does not detect any instances, but I don't know what is causing that. If you can, please give me some information about your dataset, such as the image resolution, and ideally show a demo.
Thanks for your reply.
There are two image resolutions (640x480 and 1280x960) in my dataset.
There are many instances in each image, ranging from a few to several hundred.
{'115.jpg': 62, '142.jpg': 14, '154.jpg': 54, '103.jpg': 100, '178.jpg': 16, '206.jpg': 14, '98.jpg': 138, '77.jpg': 179, '139.jpg': 17, '61.jpg': 246, '181.jpg': 18, '36.jpg': 159, '119.jpg': 52, '41.jpg': 122, '230.jpg': 13, '16.jpg': 60, '57.jpg': 157, '226.jpg': 3, '174.jpg': 8, '94.jpg': 95, '123.jpg': 71, '82.jpg': 89, '6.jpg': 129, '135.jpg': 34, '162.jpg': 44, '163.jpg': 17, '7.jpg': 230, '83.jpg': 83, '95.jpg': 78, '122.jpg': 76, '175.jpg': 13, '56.jpg': 143, '159.jpg': 62, '17.jpg': 149, '231.jpg': 19, '40.jpg': 167, '180.jpg': 16, '37.jpg': 148, '138.jpg': 53, '211.jpg': 13, '207.jpg': 14, '196.jpg': 10, '179.jpg': 5, '21.jpg': 228, '102.jpg': 103, '155.jpg': 58, '143.jpg': 27, '114.jpg': 79, '47.jpg': 175, '10.jpg': 171, '109.jpg': 79, '51.jpg': 163, '220.jpg': 40, '172.jpg': 13, '125.jpg': 57, '0.jpg': 138, '133.jpg': 52, '164.jpg': 9, '113.jpg': 64, '152.jpg': 46, '105.jpg': 99, '191.jpg': 9, '26.jpg': 392, '200.jpg': 15, '129.jpg': 48, '71.jpg': 185, '216.jpg': 16, '88.jpg': 75, '67.jpg': 153, '168.jpg': 11, '30.jpg': 170, '169.jpg': 27, '31.jpg': 154, '89.jpg': 102, '66.jpg': 94, '70.jpg': 184, '27.jpg': 263, '104.jpg': 39, '153.jpg': 89, '145.jpg': 21, '112.jpg': 49, '1.jpg': 404, '85.jpg': 102, '93.jpg': 66, '124.jpg': 49, '173.jpg': 24, '108.jpg': 130, '50.jpg': 298, '149.jpg': 19, '11.jpg': 109, '46.jpg': 193, '166.jpg': 6, '189.jpg': 23, '218.jpg': 20, '69.jpg': 164, '131.jpg': 40, '2.jpg': 135, '127.jpg': 28, '28.jpg': 219, '170.jpg': 12, '222.jpg': 11, '53.jpg': 130, '45.jpg': 154, '32.jpg': 218, '185.jpg': 11, '65.jpg': 132, '214.jpg': 14, '73.jpg': 227, '202.jpg': 24, '24.jpg': 189, '150.jpg': 74, '146.jpg': 18, '49.jpg': 326, '111.jpg': 106, '48.jpg': 171, '110.jpg': 56, '147.jpg': 24, '151.jpg': 66, '106.jpg': 72, '25.jpg': 45, '192.jpg': 10, '72.jpg': 230, '64.jpg': 144, '33.jpg': 267, '184.jpg': 8, '44.jpg': 204, '13.jpg': 91, '52.jpg': 184, '223.jpg': 8, '29.jpg': 186, '171.jpg': 16, '126.jpg': 55, '91.jpg': 93, '68.jpg': 163, 
'130.jpg': 32, '87.jpg': 76, '3.jpg': 323, '219.jpg': 4, '167.jpg': 13, '34.jpg': 96, '183.jpg': 7, '63.jpg': 214, '8.jpg': 303, '212.jpg': 16, '75.jpg': 89, '204.jpg': 27, '59.jpg': 107, '101.jpg': 95, '228.jpg': 16, '156.jpg': 61, '117.jpg': 71, '38.jpg': 263, '160.jpg': 54, '137.jpg': 77, '4.jpg': 238, '208.jpg': 13, '79.jpg': 114, '121.jpg': 77, '96.jpg': 116, '176.jpg': 10, '199.jpg': 30, '55.jpg': 136, '232.jpg': 27, '43.jpg': 121, '15.jpg': 63, '225.jpg': 20, '177.jpg': 10, '198.jpg': 8, '120.jpg': 35, '136.jpg': 47, '5.jpg': 162, '39.jpg': 198, '161.jpg': 42, '19.jpg': 179, '141.jpg': 15, '229.jpg': 21, '58.jpg': 97, '100.jpg': 59, '23.jpg': 243, '194.jpg': 7, '205.jpg': 20, '213.jpg': 17, '62.jpg': 251, '9.jpg': 183, '182.jpg': 15}
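When describing a dataset like this, a short summary of the per-image instance counts is often easier to read than the full dictionary. The sketch below shows one way to compute it, using only five entries copied from the dictionary above for brevity. The summarize helper is hypothetical.

```python
from statistics import mean, median

# Five entries taken from the per-image counts above (not the full dataset).
counts = {'115.jpg': 62, '142.jpg': 14, '226.jpg': 3, '26.jpg': 392, '1.jpg': 404}


def summarize(counts):
    """Return basic statistics of the number of instances per image."""
    values = list(counts.values())
    return {
        "images": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }


summary = summarize(counts)
print(summary)
```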