This repository contains the source code of our team (Tiktorch) for tracks 2 and 3 of the SHREC2023 challenge, a competition that evaluates how well computer algorithms can recognize and retrieve 3D shapes based on different types of input, such as sketches or text descriptions.
Our team achieved the best performance in both tracks of Sketch-based and Text-based 3D Animal Fine-Grained Retrieval, with the following results:
| Track | NN (public) | P@10 (public) | NDCG (public) | mAP (public) | NN (private) | P@10 (private) | NDCG (private) | mAP (private) |
|---|---|---|---|---|---|---|---|---|
| 2 (Sketch-based) | 0.533 | 0.280 | 0.708 | 0.570 | 0.470 | 0.255 | 0.665 | 0.512 |
| 3 (Text-based) | 0.520 | 0.220 | 0.651 | 0.527 | 0.460 | 0.238 | 0.647 | 0.525 |
Our main approach to both tracks (sketch-based and text-based animal retrieval) is to treat them as a contrastive learning problem. From two different domains (3D objects and 2D sketches/text prompts), we learn embedding vectors for both objects and queries in a common vector space, in which the embeddings of similar objects and queries lie close to each other and those of dissimilar pairs lie far apart. The similarity score between an object and a query is then computed as the cosine similarity of their embedding vectors.
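As a small illustration of this scoring step, here is a minimal sketch; the dimensions and embedder outputs are placeholders, not our exact implementation:

```python
import torch
import torch.nn.functional as F

common_dim = 256                          # assumed size of the common embedding space

obj_emb = torch.randn(8, common_dim)      # placeholder embeddings of 8 objects
query_emb = torch.randn(4, common_dim)    # placeholder embeddings of 4 sketch/text queries

# L2-normalize so that the dot product equals cosine similarity
obj_emb = F.normalize(obj_emb, dim=-1)
query_emb = F.normalize(query_emb, dim=-1)

# (4, 8) matrix of similarity scores; sorting each row gives the retrieval ranking
scores = query_emb @ obj_emb.T
ranking = scores.argsort(dim=-1, descending=True)
```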
We process the 3D objects with a multi-view method. Each 3D object is represented by a set of 7 rings; each ring holds a collection of images of the object captured while moving a camera around it along a specific trajectory. An illustration of this idea can be seen in the figure below.
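For intuition only, a ring-view input can be organized as a nested tensor like the sketch below; the 7 rings follow the description above, while the number of views per ring and the image size are assumed values:

```python
import torch

num_rings, views_per_ring = 7, 12         # 7 rings as described; 12 views per ring is an assumed value

# One object as a (rings, views, channels, height, width) tensor of rendered images
ring_views = torch.zeros(num_rings, views_per_ring, 3, 224, 224)

# Typical processing order: a CNN encodes every single view, a view-sequence embedder
# summarizes each ring, and a ring-sequence embedder summarizes the whole object.
views_for_cnn = ring_views.flatten(0, 1)  # (rings * views, 3, 224, 224)
print(views_for_cnn.shape)                # torch.Size([84, 3, 224, 224])
```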
To help the model learn better, we generate a moderate amount of new queries and their corresponding results. For the sketch-based track, we apply Canny edge detection and the Artline model to create the sketches.
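As an example of the Canny-based part of this augmentation, here is a minimal sketch using OpenCV; the thresholds are illustrative, and Artline (a separate deep-learning sketch model) is not shown:

```python
import cv2

def render_to_sketch(image_path: str, out_path: str, low: int = 50, high: int = 150) -> None:
    """Turn a rendered ring-view image into a rough sketch-like image via Canny edges."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    img = cv2.GaussianBlur(img, (5, 5), 0)   # reduce noise before edge detection
    edges = cv2.Canny(img, low, high)        # thresholds are illustrative
    cv2.imwrite(out_path, 255 - edges)       # invert: dark strokes on a white background

render_to_sketch("view.png", "view_sketch.png")
```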
The overall architecture of our method is presented in the figures below. We build two feature extractors, one for objects and one for queries (sketches and prompts). From these extractors, we obtain two feature vectors with U and V dimensions, respectively (U and V can differ). They are embedded into the common vector space of P dimensions by two Multi-Layer Perceptron (MLP) networks. The contrastive loss we use to learn the parameters of both models simultaneously is a customized version of the Normalized Temperature-scaled Cross Entropy Loss (NT-Xent).
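A minimal sketch of an NT-Xent-style loss over a batch of matched object/query pairs is shown below; this is a generic formulation for illustration, not our exact customized version:

```python
import torch
import torch.nn.functional as F

def nt_xent(obj_emb: torch.Tensor, query_emb: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric NT-Xent-style loss: the i-th query in the batch should match
    the i-th object and be pushed away from all other objects (and vice versa)."""
    obj_emb = F.normalize(obj_emb, dim=-1)
    query_emb = F.normalize(query_emb, dim=-1)
    logits = query_emb @ obj_emb.T / temperature          # (B, B) scaled cosine similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_q2o = F.cross_entropy(logits, targets)           # query -> object direction
    loss_o2q = F.cross_entropy(logits.T, targets)         # object -> query direction
    return 0.5 * (loss_q2o + loss_o2q)
```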
You can read more about our solutions in these working note papers: (Track 2), (Track 3).
Before installing the repo, we need to install a CUDA driver with version >= 11.6.
We will create a new conda environment:
$ conda env create -f animar.yml
$ conda activate animar
$ pip install utils/pointnet2_ops_lib/.
From the original data provided by the challenge's organizers, follow the official baseline repository to generate the "ring-view" images of the 3D objects (folder `generated_models`).
The folder structure should look like this:
SHREC2023-ANIMAR
├─ data/
│  ├─ TextANIMAR2023/
│  │  ├─ 3D_Model_References/
│  │  │  └─ References/
│  │  └─ Train/
│  │     ├─ *GT_Train.csv
│  │     └─ *Train.csv
│  └─ SketchANIMAR2023/
│     ├─ 3D_Model_References/
│     │  ├─ References/
│     │  └─ generated_models/
│     └─ Train/
│        ├─ SketchQuery_Train/
│        ├─ *GT_Train.csv
│        └─ *Train.csv
└─ ...
From the ring-view images in the folder `generated_models`, we generate sketch-like versions of them. The resulting images will be in the folder `generated_sketches`. We only use these images for training the sketch-based model.
$ python data/ring_to_sketch.py \
data/SketchANIMAR2023/Train/generated_models \
data/SketchANIMAR2023/Train/generated_sketches
To generate the new queries for training (images and csv files for track 2, texts and csv files for track 3), you can follow the notebook `data/gen_query.ipynb`. Assume that we have generated them and put them in these folders:

- Sketch-based: the images folder `data/SketchANIMAR2023/Train/new_skt_query` and 2 csv files `data/csv/new_train_skt.csv`, `data/csv/new_test_skt.csv`.
- Text-based: 2 csv files `data/csv/new_train_tex.csv`, `data/csv/new_test_tex.csv`.
Also, for the sketch-based track, we crop the sketch images to ensure that the sketches are centered in the images.
$ python data/crop_sketch_query.py \
data/SketchANIMAR2023/Train/new_skt_query \
data/SketchANIMAR2023/Train/cropped_new_skt_query
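The idea of this cropping step can be sketched roughly as follows; this is an assumption about what `crop_sketch_query.py` does, shown only for intuition:

```python
import cv2
import numpy as np

def crop_to_content(sketch_path: str, out_path: str, margin: int = 10) -> None:
    """Crop a sketch image to the bounding box of its dark strokes (plus a margin)."""
    img = cv2.imread(sketch_path, cv2.IMREAD_GRAYSCALE)
    ys, xs = np.where(img < 200)                 # assume dark strokes on a light background
    if len(xs) == 0:                             # blank image: keep it unchanged
        cv2.imwrite(out_path, img)
        return
    y0, y1 = max(ys.min() - margin, 0), min(ys.max() + margin, img.shape[0])
    x0, x1 = max(xs.min() - margin, 0), min(xs.max() + margin, img.shape[1])
    cv2.imwrite(out_path, img[y0:y1, x0:x1])
```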
Now, the structure of our data folder will look like this:
SHREC2023-ANIMAR
├─ data/
│  ├─ TextANIMAR2023/
│  │  ├─ 3D_Model_References/
│  │  │  ├─ generated_models/
│  │  │  └─ ...
│  │  └─ Train/
│  │     ├─ new_train_tex.csv
│  │     ├─ new_test_tex.csv
│  │     └─ ...
│  └─ SketchANIMAR2023/
│     ├─ 3D_Model_References/
│     │  ├─ generated_sketches/
│     │  └─ ...
│     └─ Train/
│        ├─ cropped_new_skt_query/
│        ├─ new_train_skt.csv
│        ├─ new_val_skt.csv
│        └─ ...
└─ ...
Our best results are achieved with the ring-view methods, so we only describe the training of these models. For the point-cloud methods, the commands are very similar.
Currently available options:

- CNN backbone: ResNet (`resnetXX`), EfficientNet (`efficientnet_bX`, `efficientnet_v2_X`)
- View sequence embedder: LSTM/BiLSTM (`bilstm`) or T-Encoder (`mha`)
- Ring sequence embedder: T-Encoder (1 or more blocks)

The MLP network for embedding into the common space can be in the shrinking (default) or expanding mode, as sketched below.
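For example, a shrinking projection head could look like this; the layer widths are placeholders, and the real head is configured by the training arguments:

```python
import torch.nn as nn

def make_projection_head(in_dim: int, latent_dim: int, mode: str = "shrinking") -> nn.Sequential:
    """Project a backbone feature vector into the common embedding space."""
    hidden = in_dim // 2 if mode == "shrinking" else in_dim * 2
    return nn.Sequential(
        nn.Linear(in_dim, hidden),
        nn.ReLU(inplace=True),
        nn.Linear(hidden, latent_dim),
    )

# e.g. map a 1280-d CNN feature (U or V) to the 256-d common space (P)
head = make_projection_head(1280, 256, mode="shrinking")
```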
NOTE:
- Add the flag `--reduce-lr` to use the learning rate schedule.
The meaning and default values of the arguments:
$ python train_sketch_ringview.py --help
We can use the processed ring-view images (`generated_sketches`) for training, or use the default ring-view images (`generated_models`).
For example:
$ python train_sketch_ringview.py \
--view-cnn-backbone efficientnet_v2_s \
--skt-cnn-backbone efficientnet_v2_s \
--rings-path data/SketchANIMAR2023/3D_Model_References/generated_sketches \
--used-rings 2,3,4 \
--skt-data-path data/SketchANIMAR2023/Train/cropped_new_skt_query \
--train-csv-path data/csv/new_train_skt.csv \
--test-csv-path data/csv/new_test_skt.csv \
--batch-size 2 \
--epochs 50 \
--latent-dim 256 \
--output-path skt_exps \
--view-seq-embedder mha \
--num-rings-mhas 2 \
--num-heads 4 \
--lr-obj 1e-4 \
--lr-skt 1e-4
The results of the training process will be saved in the folder `skt_exps/ringview_exp_{num}` (`num` is counted from 0).
Similar to the sketch-based track, you can read the meaning and default values of all arguments by running this command:
$ python train_prompt_ringview.py --help
Example training command:
$ python train_prompt_ringview.py \
--view-cnn-backbone efficientnet_v2_s \
--rings-path data/TextANIMAR2023/3D_Model_References/generated_models \
--used-rings 2,3,4 \
--train-csv-path data/csv/new_train_tex.csv \
--test-csv-path data/csv/new_test_tex.csv \
--batch-size 2 \
--epochs 100 \
--latent-dim 256 \
--output-path tex_exps \
--view-seq-embedder mha \
--num-rings-mhas 2 \
--num-heads 4 \
--lr-obj 3e-5 \
--lr-txt 3e-5
The results of the training process will be saved in the folder `tex_exps/ringview_exp_{num}` (`num` is counted from 0).
For example, assume our training results are in the directory `./skt_exps/ringview_exp_0/` and the test data is in the directory `./data/SketchANIMAR2023/Public Test/`.
Firstly, we will crop the public test sketch images:
$ python data/crop_sketch_query.py \
data/SketchANIMAR2023/Public\ Test/Sketches/ \
data/SketchANIMAR2023/Public\ Test/cropped_sketches
Then run the retrieval command:
$ python retrieve_sketch_ringview.py \
--info-json ./skt_exps/ringview_exp_0/args.json \
--rings-path data/SketchANIMAR2023/3D_Model_References/generated_sketches \
--obj-csv-path ./data/SketchANIMAR2023/3D_Model_References/References.csv \
--skt-data-path ./data/SketchANIMAR2023/Public\ Test/cropped_sketches \
--skt-csv-path ./data/SketchANIMAR2023/Public\ Test/SketchQuery_Test.csv \
--obj-weight ./skt_exps/ringview_exp_0/weights/best_obj_embedder.pth \
--skt-weight ./skt_exps/ringview_exp_0/weights/best_query_embedder.pth \
--output-path skt_predicts
The retrieval results will be in the directory `skt_predicts/ringview_predict_{num}`, which contains 2 files: `query_results.json` and `submission.csv`. The JSON file can be used for ensembling the results later.
This is quite similar to the sketch-based track. Assume that our training results are in the directory `./tex_exps/ringview_exp_1/` and the test data is in the directory `./data/TextANIMAR2023/Public Test/`.
The retrieval command:
$ python retrieve_prompt_ringview.py \
--info-json tex_exps/ringview_exp_1/args.json \
--rings-path data/TextANIMAR2023/3D_Model_References/generated_models \
--obj-csv-path data/TextANIMAR2023/3D_Model_References/References/References.csv \
--txt-csv-path data/TextANIMAR2023/Public\ Test/TextQuery_Test.csv \
--obj-weight tex_exps/ringview_exp_1/weights/best_obj_embedder.pth \
--txt-weight tex_exps/ringview_exp_1/weights/best_query_embedder.pth \
--output-path text_predicts
The retrieval results will be in the directory `text_predicts/ringview_predict_{num}`, with similar files to the sketch-based track.
We provide a max-voting method to ensemble the query results of multiple models.
For example, suppose the folder `skt_predicts/test1` currently has this structure:
skt_predicts/test1/
├─ a_rand_name/
│  ├─ query_results.json
│  └─ ...
├─ other_rand_name/
│  ├─ query_results.json
│  └─ ...
└─ ...
Ensemble the results:
$ python utils/ensemble_results.py \
--input-folder skt_predicts/test1 \
--output-folder skt_predicts/test1_ensembled
After running the above command, we get the result folder `skt_predicts/test1_ensembled` storing 2 files: `query_results.json` and `submission.csv`.
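The max-voting idea can be sketched as follows. The `query_results.json` format assumed here (a mapping from query ID to a ranked list of object IDs) and the helper name are illustrative assumptions, not necessarily the exact behavior of `utils/ensemble_results.py`:

```python
import json
from collections import Counter
from pathlib import Path

def max_vote(pred_dirs: list[str], top_k: int = 10) -> dict[str, list[str]]:
    """For each query, count how often every object appears in the top-k list
    of each model, then rank objects by their vote counts."""
    votes: dict = {}
    for pred_dir in pred_dirs:
        results = json.loads(Path(pred_dir, "query_results.json").read_text())
        for query_id, ranked_objects in results.items():
            counter = votes.setdefault(query_id, Counter())
            for obj_id in ranked_objects[:top_k]:
                counter[obj_id] += 1                     # one vote per model per object
    return {q: [obj for obj, _ in c.most_common()] for q, c in votes.items()}

# e.g. combine every sub-folder of skt_predicts/test1 shown above
ensembled = max_vote([str(p) for p in Path("skt_predicts/test1").iterdir() if p.is_dir()])
```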
This repository is based on the official baseline of the organizers.