sowson/darknet

Yolov4-tiny not showing detections

Grench6 opened this issue · 19 comments

The image window opens and the picture is displayed, but I cannot see any detections. I use the following command:

user@user-pc:~/darknet$ ./darknet detector test cfg/coco.data cfg/yolov4-tiny.cfg weights/yolov4-tiny.weights data/dog.jpg 
Device IDs: 1
Device ID: 0
Device name: Ellesmere
Device vendor: Advanced Micro Devices, Inc.
Device opencl availability: OpenCL 1.2 AMD-APP (3180.7)
Device opencl used: 3180.7
Device double precision: YES
Device max group size: 256
Device address bits: 64
layer     filters    size              input                output
    0 conv     32  3 x 3 / 2   416 x 416 x   3   ->   208 x 208 x  32  0.075 BFLOPs
    1 conv     64  3 x 3 / 2   208 x 208 x  32   ->   104 x 104 x  64  0.399 BFLOPs
    2 conv     64  3 x 3 / 1   104 x 104 x  64   ->   104 x 104 x  64  0.797 BFLOPs
    3 route  2
Unused field: 'groups = 2'
Unused field: 'group_id = 1'
    4 conv     32  3 x 3 / 1   104 x 104 x  64   ->   104 x 104 x  32  0.399 BFLOPs
    5 conv     32  3 x 3 / 1   104 x 104 x  32   ->   104 x 104 x  32  0.199 BFLOPs
    6 route  5 4
    7 conv     64  1 x 1 / 1   104 x 104 x  64   ->   104 x 104 x  64  0.089 BFLOPs
    8 route  2 7
    9 max          2 x 2 / 2   104 x 104 x 128   ->    52 x  52 x 128
   10 conv    128  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x 128  0.797 BFLOPs
   11 route  10
Unused field: 'groups = 2'
Unused field: 'group_id = 1'
   12 conv     64  3 x 3 / 1    52 x  52 x 128   ->    52 x  52 x  64  0.399 BFLOPs
   13 conv     64  3 x 3 / 1    52 x  52 x  64   ->    52 x  52 x  64  0.199 BFLOPs
   14 route  13 12
   15 conv    128  1 x 1 / 1    52 x  52 x 128   ->    52 x  52 x 128  0.089 BFLOPs
   16 route  10 15
   17 max          2 x 2 / 2    52 x  52 x 256   ->    26 x  26 x 256
   18 conv    256  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 256  0.797 BFLOPs
   19 route  18
Unused field: 'groups = 2'
Unused field: 'group_id = 1'
   20 conv    128  3 x 3 / 1    26 x  26 x 256   ->    26 x  26 x 128  0.399 BFLOPs
   21 conv    128  3 x 3 / 1    26 x  26 x 128   ->    26 x  26 x 128  0.199 BFLOPs
   22 route  21 20
   23 conv    256  1 x 1 / 1    26 x  26 x 256   ->    26 x  26 x 256  0.089 BFLOPs
   24 route  18 23
   25 max          2 x 2 / 2    26 x  26 x 512   ->    13 x  13 x 512
   26 conv    512  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x 512  0.797 BFLOPs
   27 conv    256  1 x 1 / 1    13 x  13 x 512   ->    13 x  13 x 256  0.044 BFLOPs
   28 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512  0.399 BFLOPs
   29 conv    255  1 x 1 / 1    13 x  13 x 512   ->    13 x  13 x 255  0.044 BFLOPs
   30 yolo4
[yolo4] params: iou loss: ciou (4), iou_norm: 0.07, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000 
   31 route  27
   32 conv    128  1 x 1 / 1    13 x  13 x 256   ->    13 x  13 x 128  0.011 BFLOPs
   33 upsample            2x    13 x  13 x 128   ->    26 x  26 x 128
   34 route  33 23
   35 conv    256  3 x 3 / 1    26 x  26 x 384   ->    26 x  26 x 256  1.196 BFLOPs
   36 conv    255  1 x 1 / 1    26 x  26 x 256   ->    26 x  26 x 255  0.088 BFLOPs
   37 yolo4
[yolo4] params: iou loss: ciou (4), iou_norm: 0.07, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000 
Loading weights from weights/yolov4-tiny.weights...Done!
data/dog.jpg: Predicted in 0.393254 seconds.
user@user-pc:~/darknet$

Yolo3, yolo3-tiny and yolo4 are working as expected. Is this because yolo4-tiny is not supported?

I re-ported the route layer from the YOLO4 repo one more time (your output shows unused fields there), but it is still not detecting objects. I will commit it soon. Maybe the threshold is too high?

Lowering the threshold has no effect

Maybe you should try to train this model on your own? Thx!

Ok, I will try that. I will update results as soon as I have them.

I still can't train yolo4-tiny. Before posting the issue I was able to train yolo3 and yolo3-tiny, but now I cannot train any of those either.
Here is the output:

user@user-pc:~/darknet2$ ./darknet detector train data/obj.data yolo-obj.cfg yolov3-tiny.conv.11
Device IDs: 1
Device ID: 0
Device name: Ellesmere
Device vendor: Advanced Micro Devices, Inc.
Device opencl availability: OpenCL 1.2 AMD-APP (3180.7)
Device opencl used: 3180.7
Device double precision: YES
Device max group size: 256
Device address bits: 64
yolo-obj
layer     filters    size              input                output
    0 conv     16  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  16  0.150 BFLOPs
    1 max          2 x 2 / 2   416 x 416 x  16   ->   208 x 208 x  16
    2 conv     32  3 x 3 / 1   208 x 208 x  16   ->   208 x 208 x  32  0.399 BFLOPs
    3 max          2 x 2 / 2   208 x 208 x  32   ->   104 x 104 x  32
    4 conv     64  3 x 3 / 1   104 x 104 x  32   ->   104 x 104 x  64  0.399 BFLOPs
    5 max          2 x 2 / 2   104 x 104 x  64   ->    52 x  52 x  64
    6 conv    128  3 x 3 / 1    52 x  52 x  64   ->    52 x  52 x 128  0.399 BFLOPs
    7 max          2 x 2 / 2    52 x  52 x 128   ->    26 x  26 x 128
    8 conv    256  3 x 3 / 1    26 x  26 x 128   ->    26 x  26 x 256  0.399 BFLOPs
    9 max          2 x 2 / 2    26 x  26 x 256   ->    13 x  13 x 256
   10 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512  0.399 BFLOPs
   11 max          2 x 2 / 1    13 x  13 x 512   ->    13 x  13 x 512
   12 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024  1.595 BFLOPs
   13 conv    256  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 256  0.089 BFLOPs
   14 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512  0.399 BFLOPs
   15 conv     21  1 x 1 / 1    13 x  13 x 512   ->    13 x  13 x  21  0.004 BFLOPs
   16 yolo
   17 route  13   18 conv    128  1 x 1 / 1    13 x  13 x 256   ->    13 x  13 x 128  0.011 BFLOPs
   19 upsample            2x    13 x  13 x 128   ->    26 x  26 x 128
   20 route  19 8   21 conv    256  3 x 3 / 1    26 x  26 x 384   ->    26 x  26 x 256  1.196 BFLOPs
   22 conv     21  1 x 1 / 1    26 x  26 x 256   ->    26 x  26 x  21  0.007 BFLOPs
   23 yolo
Loading weights from yolov3-tiny.conv.11...Done!
Learning Rate: 0.001, Momentum: 0.9, Decay: 0.0005
Saving weights to backup/yolo-obj.start.conv.weights
Resizing
384
Segmentation fault (core dumped)
user@user-pc:~/darknet2$

I really have no idea what is wrong. I used the exact same files and even recreated them from scratch, but it is still not working. I have run out of ideas here; training yolo3-tiny was working a few days ago.
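
Editorial note: the `Resizing` / `384` lines just before the crash look like the multi-scale step that `random=1` in a `[yolo]` section turns on. The sketch below assumes this fork keeps the upstream pjreddie rule `dim = (rand() % 10 + 10) * 32`; that is an assumption about the fork, not something confirmed in this thread. The helper name `multiscale_sizes` is hypothetical.

```python
# Sketch of the input sizes that multi-scale training can pick.
# Assumption: the fork uses the upstream rule dim = (rand() % 10 + 10) * 32.

def multiscale_sizes(step=32, low_multiple=10, count=10):
    """Return every square input size the resize step could choose."""
    return [(low_multiple + k) * step for k in range(count)]


sizes = multiscale_sizes()
print(sizes)         # candidate widths/heights in pixels
print(384 in sizes)  # the size printed right before the segfault
```

If that assumption holds, the crash happens on the first resize of the network, so setting `random=0` in both `[yolo]` sections might be a quick way to check whether the resize path is involved.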

I followed all the instructions of AlexeyAB to train, multiple times, in different ways.

  • Images were generated using yolo-mark, and they worked before, so I doubt the problem is there.
  • I downloaded the initial weights for yolo3-tiny from here
  • yolo-obj.cfg:
[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=16
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 6000
policy=steps
steps=4800,5400
scales=.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

###########

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=21
activation=linear



[yolo]
mask = 3,4,5
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes=2
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 8

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=21
activation=linear

[yolo]
mask = 0,1,2
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes=2
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

No matter what I change, the result is the same.
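
Editorial note: one thing that can be checked offline is whether the convolutional layer in front of each `[yolo]` section has the expected number of filters. The commonly documented rule for YOLOv3-style heads is `filters = (classes + 5) * number_of_masks`. The sketch below applies it to the values from the config above; the function name `expected_filters` and the `heads` list are hypothetical helpers, not part of darknet.

```python
# Sanity check for YOLOv3-style detection heads.
# Rule: filters = (classes + 4 box coordinates + 1 objectness) * masks.

def expected_filters(classes, masks, coords=4):
    """Filters required in the conv layer right before a [yolo] section."""
    return (classes + coords + 1) * masks


# Values copied by hand from the two [yolo] heads in yolo-obj.cfg above.
heads = [
    {"filters": 21, "classes": 2, "mask": [3, 4, 5]},
    {"filters": 21, "classes": 2, "mask": [0, 1, 2]},
]

for index, head in enumerate(heads):
    want = expected_filters(head["classes"], len(head["mask"]))
    status = "ok" if head["filters"] == want else "MISMATCH"
    print(f"head {index}: filters={head['filters']} expected={want} {status}")
```

Both heads pass this check, which fits the report that the same config trained fine earlier.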

Should I use a specific branch or version? Is the master branch safe to clone? Do the images used for training need to be a specific size (in pixels)? Is there a limit? Do I need a different procedure to train with this repo? Those are other questions I have too.

@Grench6 the code is fine and so is the compilation; your GPU needs rest. Turn off your PC, unplug the power cord, give it about 1-2 hours of rest, and everything will be fine again :D. I often have a similar issue after many tries that init OpenCL without a deinit. I checked on my computer, and all the mentioned training runs work just fine. On your end, you have garbage in VRAM that has to be cleaned up. Hope that helps.

@Grench6 btw, gdb is your friend. If you build with the -g flag or DEBUG=1, you can run your training command under gdb and see where it fails. If it is in opencl.c, my last comment is very likely relevant :).

@Grench6 there was an error in freeing OpenCL resources in the Route layer. I have just fixed and committed it. Thx!

Sorry for the late reply.

Detection is still not showing anything.

Screenshot from 2020-11-24 17-42-08

And with training... well, at least now I don't get the segmentation fault, but now something else is wrong.
Training is not working at all. I get the following output: out.pdf
avg is NaN, and it doesn't change no matter how many iterations I let it run.
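
Editorial note: to see whether the loss is NaN from the very first iteration or only after some steps, the training output can be scanned with a small script. This is a minimal sketch that assumes progress lines in the upstream darknet shape `<iter>: <loss>, <avg> avg, <rate> rate, ...`; the fork's exact format is not shown in this thread, and the sample lines below are made up for illustration.

```python
import math
import re

# Assumed progress-line shape: "<iter>: <loss>, <avg> avg, <rate> rate, ..."
LINE_RE = re.compile(r"^\s*(\d+):\s*([^\s,]+),\s*([^\s,]+)\s+avg")


def first_nan_iteration(lines):
    """Return the first iteration whose loss or avg is NaN, or None."""
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip layer tables, "Resizing", and other output
        iteration, loss, avg = match.groups()
        try:
            values = (float(loss), float(avg))
        except ValueError:
            continue  # not numeric, so not a progress line
        if any(math.isnan(v) for v in values):
            return int(iteration)
    return None


# Made-up sample, not taken from out.pdf.
sample_log = [
    "Resizing",
    "1: 812.4, 812.4 avg, 0.000000 rate, 4.9 seconds, 64 images",
    "2: 790.1, 810.2 avg, 0.000000 rate, 4.8 seconds, 128 images",
    "3: -nan, -nan avg, 0.000000 rate, 4.8 seconds, 192 images",
]
print(first_nan_iteration(sample_log))
```

NaN at iteration 1 would point at the network itself (a layer or activation), while NaN appearing later is more often a learning-rate or data problem.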

@Grench6 can you pls try removing yolov4-tiny.conv.29 from the train command? Thx!

Still the same with NaN: out.pdf
Here is the config file, in case it is useful: yolov4-tiny-custom.txt
I assume the data set and everything else are in good condition, because yolov3-tiny can be trained successfully with them.

I will look into it soon; for now, I am training other models. The answer is probably in the model: I have to compare it with yolo4 and look for any additional layer or activation function I may not have in the engine. Sorry for the inconvenience.

Ok, no problem man. I will wait for any update.

Could some kind person share the data/names.list file? Thx.
I'm a newbie.

/darknet detector test cfg/yolov3.cfg weights/yolov3.weights data/dog.jpg ./data/coco.names
Device IDs: 2
Device ID: 0
Device name: Intel(R) HD Graphics 630
Device vendor: Intel Inc.
Device opencl availability: OpenCL 1.2
Device opencl used: 1.2(Apr 13 2021 00:47:18)
Device double precision: NO
Device max group size: 256
Device address bits: 64
names: Using default 'data/names.list'
Couldn't open file: data/names.list

@aiXia121 That has nothing to do with this issue, but what you are looking for is in this link:

https://github.com/pjreddie/darknet/blob/master/data/coco.names

Download that file, place it where it belongs and rename it. Next time open a new issue.
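
Editorial note: the log line `names: Using default 'data/names.list'` suggests that no `names` key was found in the file passed as the first argument after `detector test`. In upstream darknet that position is the data file (for example cfg/coco.data), and the posted command has cfg/yolov3.cfg there instead. Whether this fork parses arguments the same way is an assumption. The sketch below mimics the lookup in plain Python; `names_path` is a hypothetical helper, and the file contents are abbreviated examples.

```python
# Mimics how a "names" entry is looked up in a darknet .data file,
# falling back to a default when the key is absent.
DEFAULT_NAMES = "data/names.list"


def names_path(data_cfg_text, default=DEFAULT_NAMES):
    """Return the value of the 'names' key, or the default if missing."""
    for raw in data_cfg_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;[":
            continue  # blank lines, comments, and section headers
        key, sep, value = line.partition("=")
        if sep and key.strip() == "names":
            return value.strip()
    return default


# Abbreviated example of a data file that does set "names".
coco_data = "classes = 80\nnames = data/coco.names\nbackup = backup/\n"
# Abbreviated example of a network cfg, which has no "names" key.
network_cfg = "[net]\nbatch=1\nwidth=416\nheight=416\n"

print(names_path(coco_data))    # path taken from the data file
print(names_path(network_cfg))  # falls back to the default
```

So besides copying coco.names to data/names.list, passing a data file such as cfg/coco.data as the first argument may avoid the fallback altogether.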

@Grench6 you may check now :-).
predictions

Thank you! Right now I don't have my graphics card, but I will test it as soon as I have it. 👍🏾