ultralytics/yolov5

Different results from train.py and val.py

xibici opened this issue · 1 comment

Search before asking

Question

After train.py finishes, I get the following validation result:

Validating runs/train/exp212/weights/best.pt...
Fusing layers...
summary: 360 layers, 4124308 parameters, 0 gradients, 10.9 GFLOPs
Class Images Instances P R mAP50 mAP50-95: 100%|██████████| 7/7 [00:12<00:00, 1.80s/it]
all 1629 5983 0.853 0.765 0.844 0.386

but when I run val.py on its own I get:

summary: 360 layers, 4124308 parameters, 0 gradients, 10.9 GFLOPs
val: Scanning /home/xjh/Pro/PythonPro/datasets/ContactHands/labels/val.cache... 1629 images, 0 backgrounds, 0 corrupt: 100%|██████████| 1629/1629 [00:00<?, ?it/s]
Class Images Instances P R mAP50 mAP50-95: 100%|██████████| 7/7 [00:38<00:00, 5.48s/it]
all 1629 5983 0.849 0.773 0.844 0.386
Speed: 0.1ms pre-process, 1.1ms inference, 0.8ms NMS per image at shape (256, 3, 640, 640)

Why is the mAP the same while P and R are different? How can I reproduce the original result from best.pt?

P: 0.853 (train.py) vs 0.849 (val.py)
R: 0.765 (train.py) vs 0.773 (val.py)

Additional

No response

Hey there! 👋

It looks like you're seeing slight differences in Precision (P) and Recall (R) between the final validation run by train.py and a standalone run of val.py, while the mAP scores remain consistent.

This variation is quite normal. P and R are more sensitive to the exact operating point than mAP: YOLOv5 reports P and R at a single confidence cutoff (the one that maximizes F1, see ap_per_class in utils/metrics.py), while mAP averages precision over the whole precision-recall curve. A tiny change in the model's confidence on a few borderline detections can therefore nudge P and R without noticeably moving mAP. Such tiny changes are expected here, because the validation pass that train.py runs at the end of training and a bare val.py call may not use identical defaults (for example batch size or FP16 vs FP32 inference). The sketch below illustrates the effect.
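If it helps to see why an area-under-curve metric is more stable than a single operating point, here is a small self-contained NumPy sketch with synthetic detections. It is not YOLOv5's actual metrics code; the detections, perturbation size, and ground-truth count are all made up for illustration:

```python
# Toy illustration (plain NumPy): compare P/R read off at the best-F1 cutoff
# with the area under the full precision-recall curve, for two runs that
# differ only by tiny confidence perturbations.
import numpy as np

def pr_curve(conf, is_tp, n_gt):
    """Precision and recall at every confidence cutoff, detections sorted by confidence."""
    order = np.argsort(-conf)
    tp = np.cumsum(is_tp[order])
    fp = np.cumsum(~is_tp[order])
    return tp / (tp + fp), tp / n_gt  # precision, recall

def best_f1_index(p, r):
    """Index of the cutoff that maximizes F1 (the kind of point P/R are reported at)."""
    f1 = 2 * p * r / (p + r + 1e-16)
    return f1.argmax()

rng = np.random.default_rng(0)
conf = rng.random(2000)
is_tp = rng.random(2000) < conf                  # higher-confidence detections are more often correct

# Run 1: original confidences. Run 2: the same detections with small confidence
# perturbations, standing in for minor numerical differences between two validation runs.
p1, r1 = pr_curve(conf, is_tp, n_gt=1800)
p2, r2 = pr_curve(conf + rng.normal(0, 0.01, conf.shape), is_tp, n_gt=1800)

i1, i2 = best_f1_index(p1, r1), best_f1_index(p2, r2)
print(f"P/R at best-F1 point:  run 1 {p1[i1]:.3f}/{r1[i1]:.3f}   run 2 {p2[i2]:.3f}/{r2[i2]:.3f}")
print(f"AP (area under curve): run 1 {np.trapz(p1, r1):.3f}        run 2 {np.trapz(p2, r2):.3f}")
```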

To get matching numbers, make sure both runs use the same dataset split, the same weights (your best.pt), and the same evaluation settings: image size, batch size, confidence and NMS IoU thresholds, and FP16 vs FP32 inference. An example follows.
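For instance, you can call val.py's run() function directly (from the yolov5 repo root) with settings chosen to mirror train.py's end-of-training validation. This is a sketch, not a drop-in command: the dataset YAML path is a placeholder for your ContactHands config, and exact defaults can vary between YOLOv5 versions, so check them against your checkout.

```python
# Sketch: standalone validation with settings that mirror train.py's final
# validation as closely as possible. Paths are placeholders -- adjust to your setup.
import val  # yolov5/val.py; run this from the yolov5 repo root

metrics, maps, timings = val.run(
    data="data/ContactHands.yaml",                 # placeholder dataset YAML
    weights="runs/train/exp212/weights/best.pt",
    imgsz=640,                                     # same image size as training
    batch_size=32,                                 # set to the batch size train.py validated with
    conf_thres=0.001,                              # low threshold so the full PR curve is swept
    iou_thres=0.60,                                # NMS IoU; train.py's final val uses 0.60 (0.65 for COCO)
    half=True,                                     # train.py validates in FP16 when AMP is enabled
    task="val",
)
# metrics = (P, R, mAP50, mAP50-95, val losses...), averaged over classes
print(metrics[:4])
```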

If a slight variation in P and R persists even with matched settings, it is generally not a cause for concern as long as your mAP scores are stable. Still, checking that every metric lines up is a good way to confirm your evaluation setup is consistent.

Hope this helps clear things up! Keep exploring with YOLOv5, and don't hesitate to reach out if you have more questions. Happy detecting! 🚀