DanielCoelho112/synfeal

Replicate data augmentation as done by current state of the art

Closed this issue · 42 comments

Discussion of what method would be better to implement this feature:

Current ideas:

First, I tried using the same dataset, detecting the objects and then placing black boxes on them, but I had many false positives or missed detections.

I then tried reading the camera movement from a dataset with objects and replicating it. But in that case the black boxes will be placed randomly.

Hi @andrefdre ,

just so I understand better: the goal is to have images augmented with a 3D model in the case of our approach, and then to have the exact same images augmented with a black box.

Is that it? I mean, if you do not need the exact same images you could just generate a random black box of variable position and size...

Is that it?

Yes that's it.

During the last meeting, I don't remember who said it, but someone said it would be better to have the black boxes covering the objects. Then, when arguing that our model is better, we could say both datasets have augmentation in the same places, ours just being more realistic thanks to the 3D model.

So my suggestion is to run a yolo detector on the images looking for persons and chairs, and wherever you detect persons paint the box black.
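A minimal sketch of that idea, assuming the pretrained yolov5s model from torch.hub and COCO class names (the model choice and file names are my assumption, not something fixed in this thread):

```python
# Sketch: detect persons and chairs with a pretrained YOLOv5 model and paint
# their bounding boxes black. The model choice (yolov5s), the COCO class names
# and the file names are illustrative assumptions.
import cv2
import torch

model = torch.hub.load('ultralytics/yolov5', 'yolov5s')  # downloads pretrained weights

def black_out_objects(image_path, output_path, wanted=('person', 'chair')):
    image = cv2.imread(image_path)              # BGR image
    results = model(image[..., ::-1])           # the YOLOv5 hub model expects RGB
    for *xyxy, conf, cls in results.xyxy[0].tolist():
        if results.names[int(cls)] in wanted:
            x1, y1, x2, y2 = map(int, xyxy)
            image[y1:y2, x1:x2] = 0             # paint the detected box black
    cv2.imwrite(output_path, image)

black_out_objects('frame-00001.rgb.png', 'frame-00001.rgb.blackbox.png')
```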

So my suggestion is to run a yolo detector on the images looking for persons and chairs, and wherever you detect persons paint the box black.

I didn't try with YOLO yet; I will try it tomorrow. Today I only tested with MediaPipe.

Hi @miguelriemoliveira and @andrefdre,

I would say to compare our approach to the state-of-the-art, which I think is this paper: Random Erasing Data Augmentation. https://arxiv.org/pdf/1708.04896.pdf

image

In pytorch you can use: https://pytorch.org/vision/main/generated/torchvision.transforms.RandomErasing.html
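A minimal sketch of how that baseline plugs into a torchvision pipeline (the parameter values shown are torchvision's defaults, with value=0 so the erased patch is black like the boxes in our method):

```python
# Sketch: Random Erasing baseline via torchvision.
# RandomErasing operates on tensors, so it must come after ToTensor().
from torchvision import transforms

augmentation = transforms.Compose([
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.5,               # probability of erasing an image
                             scale=(0.02, 0.33),  # erased area as a fraction of the image
                             ratio=(0.3, 3.3),    # aspect ratio range of the erased box
                             value=0),            # fill value (black)
])
```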

I would say to compare our approach to the state-of-the-art, which I think is this paper: Random Erasing Data Augmentation.

So your suggestion is to just do data augmentation randomly (in a dataset without objects) instead of covering the objects?

So your suggestion is to just do data augmentation randomly (in a dataset without objects) instead of covering the objects?

I think the baseline should be the random erasing. Additionally, we can also compare with the method you described, but I have never seen anyone doing that... I doubt that there is a paper we can cite. Nevertheless, we can always say that it is an improvement of the Random Erasing Data Augmentation.

I would say that we could do both at first: 1 - our method (3D objects), 2 - SoA random erasing. Then, if everything goes smoothly, we could try removing the objects directly in the images as an alternative.

Since the training is done based on the RGB images, I guess the 3D method vs. segmentation and removal of objects in 2D images should yield similar results (?). The only difference will be the use of a box instead of the real objects.

Another relevant question, in my opinion, is what to test the trained models on... The random method might work well with random images, but how is it going to work when tested with real 3D objects projected onto 2D images? Need to think a little bit about it!

Anyway for now I would say: Our methods vs random erasing and then we will see.

I'm currently creating a way of replicating the camera steps of another dataset. With that implemented I will train two models one with our method and another with random erasing.

I agree we can try both the random and the bbox where the objects are.

Since the training is done based on the RGB images, I guess the 3D method vs. segmentation and removal of objects in 2D images should yield similar results (?). The only difference will be the use of a box instead of the real objects.

I hope not, I hope results are much better with the objects in 3D. That is actually the reasoning behind this work, that using the 3D objects instead of rough bboxes will improve the results.

Hi @andrefdre ,

I'm currently creating a way of replicating the camera steps of another dataset. With that implemented I will train two models one with our method and another with random erasing.

not sure I understand this. I mean, for creating a new dataset using random erasing, don't you just have to copy the entire dataset with objects, and then go through all images, randomize a box position and size, and put it on the image?

Sorry @andrefdre , now I understand why we need what you are discussing. Because for the random boxes they should be inserted in images without the objects.

If we placed the black boxes on top of the objects, we could just create a copy of the dataset. But from my attempts, it doesn't seem like I can automate detecting the objects; at least YOLO doesn't have classes for all the objects I have.

These are very strange images ...

I started getting results for lights and our method seems good, but the augmentation as done by the state of the art, and both methods combined, seem very weird.

image

I used this for augmentation as suggested previously:

transforms.ColorJitter(brightness=.5, hue=0)
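For context, a minimal sketch of how that line would sit in a training transform; only the ColorJitter call is taken from this thread, the rest of the pipeline is an assumption:

```python
# Sketch of the brightness augmentation inside a torchvision pipeline.
# Only the ColorJitter parameters come from the discussion above; the rest
# is illustrative.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.ColorJitter(brightness=.5, hue=0),  # jitter brightness, keep hue fixed
    transforms.ToTensor(),
])
```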

@DanielCoelho112 and @miguelriemoliveira do you have any theory?

Video with brightness augmentation https://www.youtube.com/watch?v=Y8KI9bcpeD4

Thanks. I think the darkest images are too dark. Can you post some of the darkest images?

Can you post some of the darkest images?

frame-00254 rgb

Too dark, don't you think?

Too dark, don't you think?

I agree; I will generate a video with 0.3.

Darkest image with 0.3:

frame-00115 rgb

A video? You mean a dataset? Also, am I supposed to know what the 0.3 means :- ) ?

The image is still very dark, I think. A bit less perhaps ... @DanielCoelho112? What do you say?

A video? You mean a dataset? Also, am I supposed to know what the 0.3 means :- ) ?

According to the documentation, 0.3 is the amount by which to jitter the brightness. Video with 0.3: https://youtu.be/1r5tloG-RGo

The image is still very dark, I think. A bit less perhaps ... @DanielCoelho112? What do you say?

It just occurred to me that we are only darkening the images, but don't we also want the opposite? Comparing with the dataset with lights, that one has brighter images as well as darker ones.

Hi @miguelriemoliveira, @andrefdre,

It just occurred to me that we are only darkening the images, but don't we also want the opposite? Comparing with the dataset with lights, that one has brighter images as well as darker ones.

We want both cases. Darker and lighter images.

@DanielCoelho112 and @miguelriemoliveira do you have any theory?

Since the model is worse than the baseline, I would follow the suggestion of @miguelriemoliveira and reduce the magnitude of the augmentation applied.

When we apply extreme augmentations, the models usually perform poorly (https://research.unl.pt/ws/portalfiles/portal/44736222/Brightness_as_an_Augmentation_Technique_for_Image_Classification.pdf).

Right. I agree. Also brighten the images and reduce the maximum magnitude of the transformation.

I get dizzy when watching the video ...

Right. I agree. Also brighten the images and reduce the maximum magnitude of the transformation.

I read through the ColorJitter documentation again and finally understood how to increase the brightness. Now the brightness varies between 0.5× and 2× the original image (see the sketch below). The video with these settings: https://youtu.be/buwznRVJDWY. I also lowered the FPS in the video; I hope it looks better now.
Should I increase the maximum brightness or lower it?
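A minimal sketch of that parameterization, assuming torchvision's ColorJitter; passing a (min, max) tuple makes the brightness factor be drawn uniformly from that interval:

```python
# Sketch: with a (min, max) tuple, the brightness factor is sampled uniformly
# from [0.5, 2.0], so images are both darkened (down to 0.5x) and brightened
# (up to 2x).
from torchvision import transforms

brightness_jitter = transforms.ColorJitter(brightness=(0.5, 2.0), hue=0)
```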

After the current training, I will train with these new settings if @miguelriemoliveira agrees with this configuration.

Right now I continued training the models that don't require augmentation for lights and objects study.

The video looks much better now. In any case, it's hard to check the appearance of the darkest and brightest images. Can you post them?

Darkened:
frame-00037 rgb

Original:
frame-00037 rgb

Brightened:
frame-00027 rgb

Original:
frame-00027 rgb

Not sure... I think you should compare that image with an image taken from the scene without lights. That way we can see how dark the images can get.

I think you should compare that image with an image taken from the scene without lights

Without lights? This is from a dataset without any lighting, and then data augmentation was used.

Hm, forgot about that... I think we should've applied the data augmentation on a dataset with some lights on to simulate what happens in reality. But it's not critical.

Given this, I agree with @miguelriemoliveira. I think we should reduce the level of darkness applied.

Maybe this is the reason why the results were so bad. We are choosing the darkest images, and then we are darkening them even more. And then in the test set all images are brighter.

These images are from a dataset with lights. I don't know if it's because they are darker; one thing is for sure, the results weren't good because I was only darkening the images, which I will now also brighten.
Dark:
frame-00059 rgb

Bright:
frame-00025 rgb

Not sure I understood what you said ...

The augmentation I was doing was only darkening images, not brightening them.

Ok, so I think we have a good reason why the training could be going wrong.

But again, my point is that we cannot have images as dark as the one you have above. We should darken and brighten, but not so much that the image becomes almost all black or white. In those extreme cases we cannot expect good localization.

But again, my point is that we cannot have images as dark as the one you have above. We should darken and brighten, but not so much that the image becomes almost all black or white. In those extreme cases we cannot expect good localization.

For the last images we actually have good results, since they come from our implementation. I didn't explain that properly before; in any case, I'm hopeful it will improve now.

Hi, sorry for the silence these days, but I was off for a couple of days and had many other duties. If nothing else, André can at least open a virtual nightclub in a church with strobe effects :-)! I'm not sure I'm following everything, but the question of "what is too dark" is not easy... The room might be darker and this can influence the localization, but this is true for any system. Maybe the easiest would be to consider the model as a baseline and only add light, at least to validate the model. This is an engineering problem, I guess: if we dim the light, localization will be worse (even for a human), not because of our approach but because the camera has less information... At this point the important thing is to think about a method to validate our approach. Just increasing the light should do the trick: if it improves, it shows the system is more robust to light changes. Actually, the best validation will always be with real data from a real room at different times of day. Not an easy answer!

I don't agree that the problem is the room being too dark, since with our method we have good results. The issue here is that when we apply data augmentation to a dataset without any light manipulation, the results are worse than without any augmentation. Also, when we apply augmentation in conjunction with our method, the results also get worse. Moreover, our method has dark images and still performs well. I was tweaking the augmentation parameters to try to get similar maximum and minimum brightness between the dataset generated using the simulator and the augmented dataset.

When we apply extreme augmentations, the models usually perform poorly (https://research.unl.pt/ws/portalfiles/portal/44736222/Brightness_as_an_Augmentation_Technique_for_Image_Classification.pdf).

I think Daniel has a point here. I just don't know if it's worth trying to tweak the augmentation parameters, or if we should simply accept the results that the state-of-the-art method gives, not just for the thesis but also already thinking about the paper.

Table with tests; it's currently a bit confusing, but the first five lines use the same parameters as before, with a 0.5 brightness threshold, then one line with 0.3, one with 0.2, and lastly one with a minimum threshold of 0.5 and a maximum threshold of 2. Only the last one actually brightens the images, while the others only darken them, since I hadn't understood that yet at the time.

image