Yolov4-tiny not showing detections
Grench6 opened this issue · 19 comments
The image window opens and the picture is displayed, but no detections are drawn on it. I used the following command:
user@user-pc:~/darknet$ ./darknet detector test cfg/coco.data cfg/yolov4-tiny.cfg weights/yolov4-tiny.weights data/dog.jpg
Device IDs: 1
Device ID: 0
Device name: Ellesmere
Device vendor: Advanced Micro Devices, Inc.
Device opencl availability: OpenCL 1.2 AMD-APP (3180.7)
Device opencl used: 3180.7
Device double precision: YES
Device max group size: 256
Device address bits: 64
layer filters size input output
0 conv 32 3 x 3 / 2 416 x 416 x 3 -> 208 x 208 x 32 0.075 BFLOPs
1 conv 64 3 x 3 / 2 208 x 208 x 32 -> 104 x 104 x 64 0.399 BFLOPs
2 conv 64 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 64 0.797 BFLOPs
3 route 2
Unused field: 'groups = 2'
Unused field: 'group_id = 1'
4 conv 32 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 32 0.399 BFLOPs
5 conv 32 3 x 3 / 1 104 x 104 x 32 -> 104 x 104 x 32 0.199 BFLOPs
6 route 5 4
7 conv 64 1 x 1 / 1 104 x 104 x 64 -> 104 x 104 x 64 0.089 BFLOPs
8 route 2 7
9 max 2 x 2 / 2 104 x 104 x 128 -> 52 x 52 x 128
10 conv 128 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 128 0.797 BFLOPs
11 route 10
Unused field: 'groups = 2'
Unused field: 'group_id = 1'
12 conv 64 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 64 0.399 BFLOPs
13 conv 64 3 x 3 / 1 52 x 52 x 64 -> 52 x 52 x 64 0.199 BFLOPs
14 route 13 12
15 conv 128 1 x 1 / 1 52 x 52 x 128 -> 52 x 52 x 128 0.089 BFLOPs
16 route 10 15
17 max 2 x 2 / 2 52 x 52 x 256 -> 26 x 26 x 256
18 conv 256 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 256 0.797 BFLOPs
19 route 18
Unused field: 'groups = 2'
Unused field: 'group_id = 1'
20 conv 128 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 128 0.399 BFLOPs
21 conv 128 3 x 3 / 1 26 x 26 x 128 -> 26 x 26 x 128 0.199 BFLOPs
22 route 21 20
23 conv 256 1 x 1 / 1 26 x 26 x 256 -> 26 x 26 x 256 0.089 BFLOPs
24 route 18 23
25 max 2 x 2 / 2 26 x 26 x 512 -> 13 x 13 x 512
26 conv 512 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x 512 0.797 BFLOPs
27 conv 256 1 x 1 / 1 13 x 13 x 512 -> 13 x 13 x 256 0.044 BFLOPs
28 conv 512 3 x 3 / 1 13 x 13 x 256 -> 13 x 13 x 512 0.399 BFLOPs
29 conv 255 1 x 1 / 1 13 x 13 x 512 -> 13 x 13 x 255 0.044 BFLOPs
30 yolo4
[yolo4] params: iou loss: ciou (4), iou_norm: 0.07, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000
31 route 27
32 conv 128 1 x 1 / 1 13 x 13 x 256 -> 13 x 13 x 128 0.011 BFLOPs
33 upsample 2x 13 x 13 x 128 -> 26 x 26 x 128
34 route 33 23
35 conv 256 3 x 3 / 1 26 x 26 x 384 -> 26 x 26 x 256 1.196 BFLOPs
36 conv 255 1 x 1 / 1 26 x 26 x 256 -> 26 x 26 x 255 0.088 BFLOPs
37 yolo4
[yolo4] params: iou loss: ciou (4), iou_norm: 0.07, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000
Loading weights from weights/yolov4-tiny.weights...Done!
data/dog.jpg: Predicted in 0.393254 seconds.
user@user-pc:~/darknet$
Yolo3, yolo3-tiny and yolo4 are working as expected. Is this because yolo4-tiny is not supported?
I re-ported the route layer from the YOLO4 repo one more time (your output shows some of its fields as unused), but it is still not detecting objects... I will commit it soon. Maybe the threshold is too high?
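For anyone following along: the "Unused field" lines for groups and group_id in the log suggest this build forwards all channels of the routed layer. As I understand the reference YOLOv4 implementation, a grouped route splits the channels into `groups` equal parts and forwards only part number `group_id`. The Python sketch below only illustrates that indexing; the function name is hypothetical and this is not code from this repo.

```python
def grouped_route_channels(in_channels, groups=1, group_id=0):
    """Return the (start, stop) channel range a grouped route would forward.

    Assumption: channels are split into `groups` equal contiguous parts
    and only part `group_id` is passed on.
    """
    if groups < 1 or in_channels % groups != 0:
        raise ValueError("in_channels must be divisible by groups")
    if not 0 <= group_id < groups:
        raise ValueError("group_id must be in [0, groups)")
    part = in_channels // groups
    start = group_id * part
    return start, start + part


# Layer 2 in the log outputs 64 channels; layer 3 is "route 2"
# with groups = 2 and group_id = 1.
start, stop = grouped_route_channels(64, groups=2, group_id=1)
print(stop - start)                # 32 channels forwarded
print(grouped_route_channels(64))  # (0, 64): no grouping, everything forwarded
```

In the log above, layer 4 receives 64 channels after `route 2`. With a grouped route it would receive 32, so the pretrained weights for that conv would presumably not line up with what the engine expects.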
Lowering the threshold has no effect
Maybe you should try to train this model on your own? Thx!
Ok, I will try that. I will update results as soon as I have them.
I still can't train yolo4-tiny. Before posting the issue I was able to train yolo3 and yolo3-tiny, but now I cannot train either of those...
Here is the output:
user@user-pc:~/darknet2$ ./darknet detector train data/obj.data yolo-obj.cfg yolov3-tiny.conv.11
Device IDs: 1
Device ID: 0
Device name: Ellesmere
Device vendor: Advanced Micro Devices, Inc.
Device opencl availability: OpenCL 1.2 AMD-APP (3180.7)
Device opencl used: 3180.7
Device double precision: YES
Device max group size: 256
Device address bits: 64
yolo-obj
layer filters size input output
0 conv 16 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 16 0.150 BFLOPs
1 max 2 x 2 / 2 416 x 416 x 16 -> 208 x 208 x 16
2 conv 32 3 x 3 / 1 208 x 208 x 16 -> 208 x 208 x 32 0.399 BFLOPs
3 max 2 x 2 / 2 208 x 208 x 32 -> 104 x 104 x 32
4 conv 64 3 x 3 / 1 104 x 104 x 32 -> 104 x 104 x 64 0.399 BFLOPs
5 max 2 x 2 / 2 104 x 104 x 64 -> 52 x 52 x 64
6 conv 128 3 x 3 / 1 52 x 52 x 64 -> 52 x 52 x 128 0.399 BFLOPs
7 max 2 x 2 / 2 52 x 52 x 128 -> 26 x 26 x 128
8 conv 256 3 x 3 / 1 26 x 26 x 128 -> 26 x 26 x 256 0.399 BFLOPs
9 max 2 x 2 / 2 26 x 26 x 256 -> 13 x 13 x 256
10 conv 512 3 x 3 / 1 13 x 13 x 256 -> 13 x 13 x 512 0.399 BFLOPs
11 max 2 x 2 / 1 13 x 13 x 512 -> 13 x 13 x 512
12 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BFLOPs
13 conv 256 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 256 0.089 BFLOPs
14 conv 512 3 x 3 / 1 13 x 13 x 256 -> 13 x 13 x 512 0.399 BFLOPs
15 conv 21 1 x 1 / 1 13 x 13 x 512 -> 13 x 13 x 21 0.004 BFLOPs
16 yolo
17 route 13
18 conv 128 1 x 1 / 1 13 x 13 x 256 -> 13 x 13 x 128 0.011 BFLOPs
19 upsample 2x 13 x 13 x 128 -> 26 x 26 x 128
20 route 19 8
21 conv 256 3 x 3 / 1 26 x 26 x 384 -> 26 x 26 x 256 1.196 BFLOPs
22 conv 21 1 x 1 / 1 26 x 26 x 256 -> 26 x 26 x 21 0.007 BFLOPs
23 yolo
Loading weights from yolov3-tiny.conv.11...Done!
Learning Rate: 0.001, Momentum: 0.9, Decay: 0.0005
Saving weights to backup/yolo-obj.start.conv.weights
Resizing
384
Segmentation fault (core dumped)
user@user-pc:~/darknet2$
I really have no idea what is wrong. I used the exact same files and even recreated them from scratch, but it still does not work. I have run out of ideas here; training yolo3-tiny was working a few days ago...
I followed AlexeyAB's training instructions multiple times, in different ways.
- Images were generated using yolo-mark and they worked before, so I doubt the problem is there.
- I downloaded the initial weights for yolo3-tiny from here
- yolo-obj.cfg:
[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=16
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.001
burn_in=1000
max_batches = 6000
policy=steps
steps=4800,5400
scales=.1,.1
[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=1
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
###########
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=21
activation=linear
[yolo]
mask = 3,4,5
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
classes=2
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
[route]
layers = -4
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[upsample]
stride=2
[route]
layers = -1, 8
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=21
activation=linear
[yolo]
mask = 0,1,2
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
classes=2
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
No matter what I change, the result is the same.
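As a sanity check on the cfg above: the convolutional layer right before each [yolo] layer needs (classes + 5) filters per anchor listed in `mask` (4 box coordinates, 1 objectness score, plus the class scores). A quick Python check, with a hypothetical helper name:

```python
def yolo_head_filters(classes, mask):
    """Filters for the conv layer feeding a [yolo] layer.

    Each anchor in `mask` predicts 4 box coordinates, 1 objectness
    score and one score per class.
    """
    return (classes + 5) * len(mask)


# Values from the cfg above: classes=2, mask = 3,4,5
print(yolo_head_filters(2, [3, 4, 5]))   # 21
# COCO has 80 classes, matching the "conv 255" layers in the first log
print(yolo_head_filters(80, [0, 1, 2]))  # 255
```

So filters=21 is consistent with classes=2 here, and that part of the cfg does not look like the cause of the crash.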
Should I use a specific branch or version? Is the master branch safe to clone? Do the images used for training need to be a specific size (in pixels)? Is there a limit? Do I need a different procedure to train with this repo? Those are other questions I have too.
@Grench6 the code is fine and so is the compilation; your GPU needs a rest. Turn off your PC, unplug the power cord, leave it for about 1-2 hours, and everything should be fine again :D. I often hit a similar issue after many runs where OpenCL is initialized without being deinitialized. I checked on my computer, and all the training runs you mentioned work just fine. On your end, there is garbage in VRAM that has to be cleaned up. Hope that helps.
@Grench6 btw, gdb is your friend: if you build with the -g flag or DEBUG=1, you can start gdb, run your training command inside it, and see where it crashes. If it is in opencl.c, my last comment is very probably relevant :).
@Grench6 there was an error in freeing OpenCL resources in the Route layer. I have just fixed and committed it. Thx!
Sorry for the late reply.
Detection is still not showing anything.
As for training, at least I no longer get the segmentation fault, but now something else is wrong.
Training is not working at all; I get the following output: out.pdf
avg is NaN, and it does not change no matter how many iterations I let it run.
Still the same with NaN: out.pdf
Here is the config file, in case it is useful: yolov4-tiny-custom.txt
I assume the dataset and everything else are in good condition, because yolov3-tiny can be trained successfully with them.
I will look into it soon. For now I am training other models. The answer is probably in the model: I have to compare it with yolo4 and look for any additional layer or activation function I may not have in the engine. Sorry for the inconvenience.
Ok, no problem man. I will wait for any update.
Could someone kindly share the data/names.list file? Thanks.
I'm a newbie.
/darknet detector test cfg/yolov3.cfg weights/yolov3.weights data/dog.jpg ./data/coco.names
Device IDs: 2
Device ID: 0
Device name: Intel(R) HD Graphics 630
Device vendor: Intel Inc.
Device opencl availability: OpenCL 1.2
Device opencl used: 1.2(Apr 13 2021 00:47:18)
Device double precision: NO
Device max group size: 256
Device address bits: 64
names: Using default 'data/names.list'
Couldn't open file: data/names.list
@aiXia121 That has nothing to do with this issue, but what you are looking for is in this link:
https://github.com/pjreddie/darknet/blob/master/data/coco.names.
Download that file, place it where darknet looks for it (data/names.list, according to your log), and rename it accordingly. Next time, please open a new issue.
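A side note on where that default comes from: compared with the command at the top of this thread, the one above passes no .data file after `test`, which may be why the `names` entry was not found and the default 'data/names.list' was used. The Python sketch below mimics that lookup under this assumption. The function names and the abbreviated file contents are hypothetical, and this is not darknet's actual parser.

```python
def read_data_options(text):
    """Parse simple 'key = value' lines, skipping blanks, comments and headers."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def names_path(options, default="data/names.list"):
    """Return the configured names file, or the default seen in the log."""
    return options.get("names", default)


# Abbreviated, hypothetical contents of a .data file
coco_data = "classes = 80\nnames = data/coco.names\n"
print(names_path(read_data_options(coco_data)))           # data/coco.names

# A network cfg has no 'names' key, so the default is used
print(names_path(read_data_options("[net]\nbatch=64\n")))  # data/names.list
```

If that is what happened, passing cfg/coco.data as the first argument after `test`, as in the original post, might be enough on its own, provided data/coco.names exists.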
Thank you! Right now I don't have my graphics card, but I will test it as soon as I have it. 👍🏾