
Thesis Normals Estimation

Engineering Thesis - Surface Normals Estimation using U-Net CNN

Table of contents

  • Quick start
  • General info
  • Results
  • Comparison against a RealSense camera
  • Robot picking the item
  • Notebooks
Quick start

A nice virtualenv tutorial can be found here.

pip3 install --upgrade pip
which python3.7
mkvirtualenv -p <path to python3.7> <name>
workon <name>
pip install -r requirements.txt

General info

The goal of this thesis is to show that a model trained purely on synthetic data can be transferred to the real world. I conduct a qualitative test showing that a robot is able to pick an item using the presented network.

The network is based on the U-Net architecture.

[figure: Model 2 - prod]

The models were trained on two datasets. Each dataset contained RGB and depth images of items inside containers and consisted of 10,000 images, 1,000 of which were held out as a validation set. The first dataset was quite simple: mostly boxes of different sizes lying flat in the container. In the second dataset, item types varied and their positions were diversified.

[figure: Dataset]
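The 10,000/1,000 train/validation split described above can be sketched as follows. This is an illustrative snippet, not the thesis code; the function name and the fixed seed are my own choices.

```python
import numpy as np

def train_val_split(n_images=10000, n_val=1000, seed=0):
    """Shuffle image indices and hold out a validation set.

    Mirrors the split described above: 10,000 images per dataset,
    1,000 of them reserved for validation. Hypothetical helper,
    not taken from the repository.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_images)
    return idx[n_val:], idx[:n_val]  # train indices, validation indices

train_idx, val_idx = train_val_split()
```

Shuffling before splitting matters here because the synthetic images are likely generated in ordered batches of similar scenes.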

Normal vectors are computed directly from the depth maps.

[figure: Normals1]
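A common way to derive normals from depth, and a plausible reading of the step above, is to treat the depth map as a height field and take the cross product of its image-axis tangents. The thesis uses its own (later slightly modified) algorithm, so this is only a minimal sketch of the general technique, not the exact method used:

```python
import numpy as np

def depth_to_normals(depth):
    """Estimate per-pixel unit normals from a depth map.

    Treats depth as a height field z(u, v). The tangent vectors along
    the image axes are (1, 0, dz/du) and (0, 1, dz/dv); their cross
    product gives a normal proportional to (-dz/du, -dz/dv, 1).
    """
    dz_dv, dz_du = np.gradient(depth.astype(float))  # rows (v), cols (u)
    normals = np.dstack((-dz_du, -dz_dv, np.ones_like(depth, dtype=float)))
    # Normalise each pixel's vector to unit length.
    return normals / np.linalg.norm(normals, axis=2, keepdims=True)
```

For a flat depth map this yields (0, 0, 1) everywhere; a plane tilted along one image axis yields the correspondingly tilted constant normal. A full camera model would additionally account for the intrinsics when converting pixel offsets to metric tangents.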


Results

Three main models were trained.

  • Model 1 - using only Synthetic Dataset 1
  • Model 2 - using both Synthetic Dataset 1 and Synthetic Dataset 2
  • Model 3 - same as Model 2, but normals were generated using a slight modification of the algorithm

Model 1 results:

[figures: Model 1 - prod, Model 1 - lab]

Model 2 results:

[figures: Model 2 - prod, Model 2 - lab]

Model 3 results:

[figures: Model 3 - prod, Model 3 - lab]

Comparison against a RealSense camera

The normal vector from my model at the grasp point was [0.157, 0.514, 0.282], which corresponds to an Euler ZYX rotation of (-8.9, 15.0, -61.2) degrees. When I instead averaged the normal vectors within a radius of 15 pixels around the grasp point, I got the vector [0.1, 0.21, 0.54], equal to an Euler ZYX rotation of (-1.9, 9.6, -21.9) degrees. The normals generated from the RealSense are much worse in this case: at the grasp point the vector was [-0.7, -0.7, 0.14], giving an Euler ZYX rotation of (-37, -44.4, 78.6) degrees, and averaging within a 15-pixel radius gives the vector [-0.27, -0.66, 0.31], which equals a rotation of (-13.09, -20.44, 64.96) degrees. It is worth mentioning that the robot would fail to pick an item at a rotation above roughly 45-50 degrees, as it would either miss the item or hit the tote.

[figure: Model 3 - lab]


Robot picking the item

[figure: Robot picking 1]

[figure: Robot picking 2]


Notebooks

  1. Data analysis here.
  2. U-Net demo here.
  3. Dataset augmentation here.
  4. Depth-to-normals conversion here.
  5. Normals from the model vs. normals from RealSense depth here.
  6. Comparison with the MiDaS model here.