Repository for the paper Finding Your (3D) Center: 3D object detection using a learned loss. We provide an implementation for the single class approach reported in the paper for simplicity and clarity. We choose to do this as it offers better focus on our primary contribution, a new approach to model training. A multi-class pipeline will be provided in a separate repository which will also contain multiple improvements. In general this repository can be adapted with the addition of only a few extra files which can be supplied if required.
The repository is built and tested with the following setup:
python == 3.6.8
cuda == 10.0
In addition there are a number python dependencies which can be installed through the requirements.txt
. The most straight forward way to install these is through a virutal environment:
virtualenv -p /usr/bin/python3 env
source env/bin/activate
pip install -r requirements.txt
The PointNet++ backbone requires the compilation of 3 custom ops. A shell script for compiling these ops is located at tf_ops/compile_ops.sh
. Ensure CUDA_ROOT
and EIGEN_LIB
point to your local installations.
The ops can then be comiled from the root directory by running:
chmod u+x ./tf_ops/compile_ops.sh
sh ./tf_ops/compile_ops.sh
We provide preprocessing code for both the Stanford 3D-S dataset and ScanNet. First create a new folder in the root directory called data (./data
).
Download the S3DIS dataset and place the downloaded folder into ./data
.
The loss network training data consists of small patches with a random but known offset. The script for generating training data is found in datasets/s3dis/s3dis_patches.py
. The training data generation can be customised using the config
dictionary at the bottom of the file. We leave the deault values to those used in the paper. To change the percentage of training data used change the train_ratio
variable, where 1. and .05 would be 100% and 5% of available training data respectively. n_crops_per_object
and n_rotations_per_object
are data augmentation settings. Setting these high will give more varied data samples from the available training data. n_negative_samples
is the number of data samples with where the object is not present. We find best results are achieved when this is balanced with n_crops_per_object
.
Once the config
is correctly configured, you can generate the loss network train .tfrecord
files by running:
python datasets/s3dis/s3dis_patches.py
The dataset must first be processed for scene level supervision. First ensure the config file is set correctly and run:
python datasets/s3dis/process_scenes.py
The config
dictionary should be configured to point to the dataset root directory and the processed data directory. box_size
and overlap
are added together to get total scene size (i.e. box_size: (1.5, 1.5)
and overlap: (1.5, 1.5)
would result in a scene size (3., 3.)
. If rotate
is set to True
, 3 rotation augmentations will be performed at 90, 180, and 270 degrees. Finall ensure the config points to the correct folders and run:
python datasets/s3dis/s3dis_scenes.py
Download ScanNet and place in ./data
.
See the Loss network section for Stanford 3D-S data for more details on configs. Once configured generate patches and scenes with:
python datasets/scannet/scannet_patches.py
python datasets/scannet/scannet_scenes.py
Before the scene network can be trained you must first train a loss network, which in turn is used to train the scene network. Before training you must configure the config
dictionary at the bottom of the train file loss_network/train.py
. train_dataset
and test_dataset
should point to the .tfrecord
files created in dataset generation step. test_freq
determines how many steps a test batch is run. log_freq
sets the frequency that the metrics are logged to tensorboard.
Once configured, training can be initalised with:
python loss_network/train.py
Logs will be written to the log directory and can be visualised in tensorboard with:
tensorboard --logdir=/path/to/logs
The best model will be saved by deafult as: /path/to/logs/models/weight.ckpt
. This is the model to use for training the scene network.
Note: The model overwrites itself each time the mean loss passes below the previous best. Training will run indefinitely until cancelled. Once the loss curves in the tensorboard have converged simply kill the training (i.e. ctrl + c
).
Once training is complete you can evaluate the model performance on the entire test set by configuring and running:
python loss_network/evaluate.py
To train the scene network point the config
dictionary at the bottom of scene_network/train.py
to the .tfrecord
files generated above. loss_weights
should also point to the trained loss network model. loss_points
should match the number of points the loss network was trained with. To train the network using the learned loss set loss_network_inf: True
and unsupervised: True
. To train using the supervised chamfer loss function set unsupervised: False
. If loss_network_inf
is still True
then the gradients will be updated from the supervised loss function, but the patches will still be passed through the loss network and metrics collected. This is useful for comparison. However, it is slower than using the supervised loss network by itself. Therefore, if to train supervised it is also advisable to set loss_network_inf: False
. Once the config
is configured correctly, start scene network training with:
python scene_network/train.py
As with the loss network, logs will be written to the log_dir
and training can be visualised in tensorboard by running:
tensorboard --logdir=/path/to/logs
Note: Supervised losses are plotted for training using the learned loss function, however, it is important to remember these have different objective functions. As such, when training using the learned loss the supervised loss curves do not necessarily indicate model performance.
After successful training you can evaluate the performance of the model using the scene_network/inference.py
script. The config
dictionary can be used to point towards the trained scene network model. score_thresh
is a hyperparameter which is used to determine the at which confidence the occupancy branch needs to be for to classify the prediction as a positive. Increasing this therefore improves precision at the cost of recall. For the results presented in the paper we use the default 0.9
. iou_thresh
is the minimum required IoU for a prediction and a ground truth box such that the prediction is classified as true positive. For example if iou_thresh=0.25
the ap score is mAP@0.25. The evaluation can be run by:
python scene_network/evaluation.py
The final results will be printed to the terminal. A .csv
file will also be saved in the log_dir
for the current model being evaluated with per sample results.
We also provide a full room inference script for S3DIS dataset to run the scene network over an entire room with no redundency. This is purely for visualisation purposes and do not necessarily reflect the results reported in the paper with respect to mAP@.25. In the paper we report results on individual samples of scene network training data. To run first configure the config
dictionary at the bottom of scene_network/inference.py
to point to the area and room you wish to run and the trained model. Then perform inference with:
python scene_network/inference.py