Sketch23D - Master's Thesis Kerstin Hofer

This repository contains the source code and utility code for my master's thesis "Sketch to 3D Model Using Deep Learning and Differentiable Rendering", which was completed at the University of Applied Sciences Salzburg, Austria. The results of the thesis and how to use the code are presented briefly below. For further questions, please contact me via email at kerstin_hofer90@gmx.at.


Usage

Installation

The provided source code works on both Windows 10 and 11 as well as Ubuntu 22.04. The required Python packages can be installed via Anaconda using the conda requirements files (Linux or Windows). A version of NVIDIA CUDA (preferably CUDA 11.4 on Windows or 11.5 on Linux) is required; refer to the NVIDIA website for installation instructions.
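For example, the environment setup could look like the following; the requirements file and environment names are placeholders for the respective files in the repository.

    conda env create -f requirements_linux.yml
    conda activate sketch23d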

Datasets

To create the datasets from scratch, follow the instructions in util/dataset_ShapeNet and util/dataset_Thingy10k_ABC, respectively.

Run code

Each module can be run individually by calling its respective runnable file; main.py runs the entire pipeline with all three modules. The input parameters are explained in each runnable file, and the required dataset file structures are described in the respective util folders.
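A full pipeline run could then look like this; the flags are purely illustrative, see main.py for the actual parameters.

    python main.py --input sketch.png --output out/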


Thesis

The intent of the thesis is to create a system that is capable of turning simple 2D line art into a 3D model. The main inspiration for this work is Xiang et al. (2020); their idea was expanded by adding a depth map that is created and used in the same way as the normal map in their work.

Basic overview of the pipeline

Furthermore, multiple base shapes based on different genera were provided (as depicted above), and the genus was determined prior to the differentiable rendering process. The aim of these modifications is to make the reconstruction process applicable to a greater variety of shapes and less restricted to certain classes.

Structure

Basic overview of the pipeline

A pipeline with three modules was created:

  • A segmentation module using flood fill and the Euler number to determine the genus. Its output is a silhouette image as well as a base mesh matching the determined genus of the object (a minimal sketch of this step follows the list).
  • An image-translation network using a WGAN, inspired by the works of Isola et al. and Su et al. Two networks were trained, one for the depth map and one for the normal map; the checkpoints used in the thesis can be downloaded here (a sketch of the critic loss is shown below).
  • A reconstruction module using the differentiable renderer Mitsuba 3. The filled image from the first module and the normal and depth maps from the translation network are used as ground truth for the silhouette, normal, and depth losses; in addition, smoothness and edge losses are computed (a sketch of the optimization loop is shown below).
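For illustration, the genus determination of the first module can be sketched in a few lines of Python. This is a minimal sketch under assumptions, not the thesis code: the input file name and seed point are placeholders, and scikit-image is assumed as the image library.

    import numpy as np
    from skimage.io import imread
    from skimage.measure import euler_number
    from skimage.segmentation import flood_fill

    sketch = imread("sketch.png", as_gray=True)     # placeholder line-art input
    strokes = (sketch < 0.5).astype(np.uint8)       # dark strokes -> 1

    # Flood filling from a seed inside the object marks the enclosed interior
    # as foreground; holes enclosed by inner strokes stay background.
    filled = flood_fill(strokes, seed_point=(128, 128), new_value=1)
    silhouette = filled.astype(bool)

    # For a single connected 2D object, Euler number E = objects - holes,
    # so the number of holes (used here as the genus) is 1 - E.
    e = euler_number(silhouette, connectivity=2)
    genus = 1 - e
    print(f"Euler number {e} -> estimated genus {genus}")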
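The translation networks are trained with a Wasserstein objective. The following is a minimal sketch of a WGAN critic loss in PyTorch, assuming gradient-penalty regularization (WGAN-GP); the critic module and tensors are placeholders, not the thesis' actual architecture.

    import torch

    def critic_loss(critic, real_maps, fake_maps, gp_weight=10.0):
        # Wasserstein estimate: the critic should score real maps high
        # and generated maps low.
        w_loss = critic(fake_maps).mean() - critic(real_maps).mean()

        # Gradient penalty on random interpolates between real and fake samples.
        alpha = torch.rand(real_maps.size(0), 1, 1, 1, device=real_maps.device)
        mixed = (alpha * real_maps + (1 - alpha) * fake_maps).requires_grad_(True)
        grads = torch.autograd.grad(critic(mixed).sum(), mixed, create_graph=True)[0]
        penalty = ((grads.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()
        return w_loss + gp_weight * penalty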
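The reconstruction module optimizes the mesh by backpropagating image losses through the renderer. Below is a minimal sketch of such an optimization loop with Mitsuba 3, reduced to a single L2 term; the scene file, parameter key, and ground-truth map are placeholders, and the thesis combines silhouette, normal, depth, smoothness, and edge losses instead.

    import drjit as dr
    import mitsuba as mi

    mi.set_variant("cuda_ad_rgb")

    scene = mi.load_file("scene.xml")       # scene containing the genus-based base mesh
    params = mi.traverse(scene)
    key = "mesh.vertex_positions"           # placeholder key of the optimized parameter

    opt = mi.ad.Adam(lr=0.01)
    opt[key] = params[key]
    params.update(opt)

    # Ground-truth map predicted by the translation network (placeholder file).
    ref = mi.TensorXf(mi.Bitmap("normal_gt.exr"))

    for it in range(200):
        img = mi.render(scene, params, spp=16)
        loss = dr.mean(dr.sqr(img - ref))   # e.g. an L2 loss on the rendered map
        dr.backward(loss)
        opt.step()                          # gradient step on the vertex positions
        params.update(opt)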

Output

The thesis was evaluated qualitatively and quantitatively (IoU and Chamfer distance) in two parts: a comparison to a state-of-the-art method and an ablation study. For the comparison, the Neural Mesh Renderer by Kato, Ushiku, and Harada (2018) is used, with ShapeNetv1 as the dataset. Instructions on how to set up both, and the resources needed to do so, can be found in util/NMR and util/dataset_ShapeNet. For the ablation study, four variants are used (a sketch of the metrics follows the list):

  • The proposed variant, using both the depth map and the base mesh of the determined genus
  • One using the genus and the respective base mesh, but not the depth map
  • One using the predicted depth map, but not the genus-based base mesh
  • One using neither the depth map nor the base mesh of the determined genus
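For reference, the two quantitative metrics can be computed roughly as follows. This is a minimal sketch assuming the meshes are voxelized into boolean arrays and sampled into (N, 3) point arrays; SciPy is assumed for the nearest-neighbour queries.

    import numpy as np
    from scipy.spatial import cKDTree

    def iou(vox_a, vox_b):
        # Intersection over union of two boolean voxel grids.
        inter = np.logical_and(vox_a, vox_b).sum()
        union = np.logical_or(vox_a, vox_b).sum()
        return inter / union

    def chamfer_distance(pts_a, pts_b):
        # Symmetric Chamfer distance between two point sets,
        # via nearest-neighbour lookups in k-d trees.
        d_ab, _ = cKDTree(pts_b).query(pts_a)
        d_ba, _ = cKDTree(pts_a).query(pts_b)
        return (d_ab ** 2).mean() + (d_ba ** 2).mean()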

The results of the studies, including the reconstructed models, are presented in data. Generally, the proposed model performs better than the other tested variants, but has problems reconstructing the overall shape.

Comparison results

360° view of the reconstructed object · Normal maps from the views of the training images · Normal maps from a random view

The comparison to the state-of-the-art model does not suggest that this thesis' method is better regarding the overall shape; however, there are improvements in the reconstruction of details and normals. As seen in the images above, the pipeline's reconstruction only works well for the views of the input images, while the state-of-the-art model reconstructs all perspectives equally well. This is caused by the lack of regularizers in the differentiable rendering process, which could be addressed in future versions.

Ablation results

360° view of the reconstructed object · Normal maps of the reconstructed objects · Depth maps of the reconstructed objects

The images above show that adding the depth map and the topology-based base mesh does improve the reconstruction. Furthermore, it stabilises the reconstruction process, leading to fewer invalid normals that can crash the system, and makes it more robust to weight and learning-rate adjustments.

Conclusion

In general, the reconstruction has some flaws regarding the topology, since only the genus was considered, not the relation of the holes to each other or their size. Furthermore, the input must be clean and properly seeded, which puts more work on the user. The pipeline does a reasonable job of reconstructing the surface given by the predicted maps (which were themselves flawed, especially on the dataset created for this thesis with its great variety of shapes), but a poor job on the overall shape. Since the method is single-view, these results were expected. However, the results show that the additions do improve the base approach, so it can reasonably be assumed that, with further refinements, these methods can be on par with or even better than current state-of-the-art methods. Easy first improvements could include mirroring the reconstructed part (which only works for symmetrical datasets with aligned objects, e.g. ShapeNet), determining the relation of the holes, cleaning up the input sketch, choosing better-fitting base meshes, and remeshing the output to avoid inverted and excess faces.