serend1p1ty/SeqNet

When I train on the CUHK-SYSU dataset, the result is not the same as the one you provide

Closed this issue · 11 comments

My setup is a single 2080 Ti GPU. With batchsize=1 I get mAP=85.43% and top-1=86.72%. Do you know why?

You can get the same result as the paper when you set batchsize=5, which requires about 28GB of GPU memory.
You can try batchsize=2, which will achieve better results than batchsize=1.
In the future, distributed training will be supported to work around the shortage of GPU memory.

One more thing: batchsize=1 should be paired with learning rate=0.0006.
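For reference, these numbers follow a linear learning-rate scaling rule (lr proportional to batch size, i.e. 0.0006 per image, which would put the default batchsize=5 at lr=0.003). A minimal sketch; `scaled_lr` is a hypothetical helper, not part of this repo:

```python
def scaled_lr(batch_size: int, base_lr: float = 0.003, base_batch_size: int = 5) -> float:
    """Scale the learning rate linearly with the training batch size
    (0.0006 per image, matching the values quoted in this thread)."""
    return base_lr * batch_size / base_batch_size

print(scaled_lr(1))  # 0.0006 -> suggested above for batchsize=1
print(scaled_lr(2))  # 0.0012
print(scaled_lr(4))  # 0.0024 -> the bs=4 setting used later in this thread
```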

Thank you very much

I changed to lr=0.0006 with batchsize=1, but the result got worse:
all detection:
recall = 82.22%
ap = 80.70%
search ranking:
mAP = 85.20%
top-1 = 86.24%
top-5 = 94.17%
top-10 = 95.45%
Maybe I should train for more epochs?

BatchNorm (BN) is unstable with batchsize=1.
You can freeze the BN layers or try a larger batch size.
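For example, freezing BN in PyTorch can look like this (a generic sketch; `freeze_bn` is a hypothetical helper, not part of this repo):

```python
import torch.nn as nn

def freeze_bn(model: nn.Module) -> None:
    """Put every BatchNorm layer in eval mode (so the running statistics
    stay fixed) and stop gradient updates to its affine parameters."""
    for m in model.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            m.eval()  # use stored running mean/var instead of batch stats
            if m.affine:
                m.weight.requires_grad_(False)
                m.bias.requires_grad_(False)
```

Note that `model.train()` switches BN back to training mode, so `freeze_bn(model)` should be called again after it, e.g. at the start of each epoch.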

The results are great: I achieved almost the same results as the original paper reported (and even better) with only two training runs:

| Best model | CUHK-SYSU (epoch 18), mAP / top-1 | PRW (epoch 17), mAP / top-1 |
| --- | --- | --- |
| vanilla | 94.13% / 94.72% | 46.65% / 83.91% |
| CBGM | 94.80% / 95.31% | 47.42% / 86.92% |

Thanks a lot for sharing this solid codebase.

I got a similar result to the one @Yx1322441675 mentioned. Do you still remember how you achieved the same result as the original paper? @Yx1322441675 @ZhengPeng7

He set the batch size to 1. I didn't change any settings in the project. You may want to check that you kept the original batch size and image size, and use a close PyTorch version (I used 1.8.1, but I think all versions < 1.11 are okay) when running this project again.
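A quick way to sanity-check the environment against that advice (a sketch; the `< 1.11` bound comes from this comment, not from the repo's requirements):

```python
import torch

# Warn if the installed PyTorch is 1.11 or newer; versions below 1.11
# (e.g. 1.8.1) were reported to work in this thread.
major, minor = (int(x) for x in torch.__version__.split(".")[:2])
if (major, minor) >= (1, 11):
    print(f"Warning: PyTorch {torch.__version__} may hit the NaN issue "
          "mentioned below; consider 1.8.1.")
```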

Thanks. I used bs=4 and lr=0.0024 without changing any other configs, and my PyTorch version is 1.11. Maybe I should downgrade it and have another try.

You're welcome. I mentioned the PyTorch version because a friend of mine ran into NaN problems with a version that was too new (see this issue). And the batch size also plays a role in getting exactly the same results.

@Wuzimeng Could you share your train log?