
ViT_depth

This repository contains code for testing the Vision Transformer (ViT) model with depth/4th channel for three different tasks:

  1. Image Classification
  2. Single Object Detection
  3. Multi Object Detection

Sample Data

The repository provides sample data for each task. Here are the sample images:

  1. Sample Data for a Single RGBD Image from Random Noise with Depth (see the sketch after this list): multi_class_objdet_data

  2. Sample Data for Single Class Object Detection: multi_class_objdet_data

  3. Sample Data for Multi Class Object Detection (up to 5 classes): multi_class_objdet_data
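The random-noise RGBD sample referenced in item 1 can be reproduced along these lines; this is a sketch assuming NumPy and a 224x224 image, and the bundled sample data may have been generated differently:

```python
import numpy as np

height, width = 224, 224
rgb = np.random.rand(height, width, 3).astype(np.float32)    # random-noise RGB channels
depth = np.random.rand(height, width, 1).astype(np.float32)  # random-noise depth channel
rgbd = np.concatenate([rgb, depth], axis=-1)                  # (H, W, 4) RGBD sample
print(rgbd.shape)  # (224, 224, 4)
```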

Model Architecture

The input images are preprocessed by flattening and concatenating the RGB and depth channels into a single vector. This vector represents the input sequence of tokens.
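As a rough illustration of this preprocessing step, here is a minimal sketch assuming PyTorch tensors and a 224x224 input; the tensor layout and image size are assumptions, not taken from the repository code:

```python
import torch

# A hypothetical batch of two images: RGB (B, 3, H, W) and depth (B, 1, H, W)
rgb = torch.rand(2, 3, 224, 224)
depth = torch.rand(2, 1, 224, 224)

# Concatenate along the channel axis -> (B, 4, H, W), then flatten per image
rgbd = torch.cat([rgb, depth], dim=1)   # (2, 4, 224, 224)
tokens = rgbd.flatten(start_dim=1)      # (2, 200704) = (2, 4 * 224 * 224)
print(tokens.shape)
```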

The concatenated vector is then passed through an embedding layer to project it into the desired embedding size, capturing important features from the input sequence.

The embedded sequence is reshaped to have dimensions (batch_size, 1, sequence_length) and passed through a series of transformer blocks, which allow each position in the sequence to attend to every other position.
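A minimal sketch of the embedding and transformer stages, using PyTorch's built-in nn.TransformerEncoder and illustrative sizes; the repository's own block implementation and dimensions may differ:

```python
import torch
import torch.nn as nn

height, width, channels = 64, 64, 4            # small sizes to keep the example light
embedding_size, num_layers = 256, 4            # illustrative values only
input_dim = channels * height * width          # length of the flattened RGBD vector

embed = nn.Linear(input_dim, embedding_size)   # embedding / projection layer
encoder_layer = nn.TransformerEncoderLayer(d_model=embedding_size, nhead=8,
                                            batch_first=True)
blocks = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)

x = torch.rand(2, input_dim)                   # a flattened RGBD batch of size 2
x = embed(x)                                   # (2, embedding_size)
x = x.unsqueeze(1)                             # treated as a length-1 sequence
x = blocks(x)                                  # transformer blocks, shape unchanged
print(x.shape)                                 # torch.Size([2, 1, 256])
```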

After the transformer blocks, global average pooling is applied to the output sequence, aggregating information from different positions and reducing the sequence length to 1.

Finally, the pooled output is passed through fully connected layers (self.fc_bbox and self.fc_class) to produce predictions for bounding boxes and class scores.

Please refer to the code in this repository for more details on the implementation.
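The pooling step and the two heads together might look roughly like the following sketch; the sizes and the use of nn.Linear are assumptions, and only the names fc_bbox and fc_class come from the repository:

```python
import torch
import torch.nn as nn

embedding_size, num_classes = 256, 5            # illustrative values only

# Hypothetical heads mirroring the self.fc_bbox / self.fc_class names above
fc_bbox = nn.Linear(embedding_size, 4)          # (x, y, w, h) per image
fc_class = nn.Linear(embedding_size, num_classes)

x = torch.rand(2, 1, embedding_size)            # output of the transformer blocks
pooled = x.mean(dim=1)                          # global average pooling -> (2, 256)
bbox_pred = fc_bbox(pooled)                     # (2, 4)
class_scores = fc_class(pooled)                 # (2, num_classes)
print(bbox_pred.shape, class_scores.shape)
```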

For each stage of the RGBDViT, ObjectDetectionViT, and MultiObjectDetectionViT models, the vector shapes and sizes can be described as follows (a short shape walk-through sketch follows the list):

  1. RGBDViT:

    • Input Vector Shape: (batch_size, height, width, 4)
    • Input Vector Size: height x width x 4
    • Embedded Sequence Shape: (batch_size, sequence_length, embedding_size)
    • Embedded Sequence Size: sequence_length x embedding_size
    • Output Sequence Shape: (batch_size, 1, sequence_length)
    • Output Sequence Size: sequence_length
  2. ObjectDetectionViT:

    • Input Vector Shape: (batch_size, height, width, 3)
    • Input Vector Size: height x width x 3
    • Embedded Sequence Shape: (batch_size, sequence_length, embedding_size)
    • Embedded Sequence Size: sequence_length x embedding_size
    • Output Sequence Shape: (batch_size, 1, sequence_length)
    • Output Sequence Size: sequence_length
  3. MultiObjectDetectionViT:

    • Input Vector Shape: (batch_size, height, width, 3)
    • Input Vector Size: height x width x 3
    • Embedded Sequence Shape: (batch_size, sequence_length, embedding_size)
    • Embedded Sequence Size: sequence_length x embedding_size
    • Output Sequence Shape: (batch_size, 1, sequence_length)
    • Output Sequence Size: sequence_length
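The following sketch walks through one plausible reading of these shapes with dummy tensors; the sizes are illustrative and the real models may organize the sequence differently:

```python
import torch
import torch.nn as nn

def shape_walkthrough(channels, height=64, width=64, batch_size=2,
                      embedding_size=256):
    """Print the tensor shape at each stage for a given channel count (illustrative only)."""
    x = torch.rand(batch_size, height, width, channels)
    print("input:", tuple(x.shape))                            # (B, H, W, C)
    flat = x.flatten(start_dim=1)                              # (B, H*W*C)
    embedded = nn.Linear(flat.shape[1], embedding_size)(flat)  # (B, E)
    seq = embedded.unsqueeze(1)                                # (B, 1, E) sequence
    print("embedded sequence:", tuple(seq.shape))
    pooled = seq.mean(dim=1)                                   # (B, E) after pooling
    print("pooled output:", tuple(pooled.shape))

shape_walkthrough(channels=4)   # RGBDViT-style RGBD input
shape_walkthrough(channels=3)   # ObjectDetectionViT / MultiObjectDetectionViT input
```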

Please note that we use values from the original ViT for batch_size, height, width, sequence_length, and embedding_size; the exact values will depend on the specific implementation and configuration of the models.
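For reference, the original ViT-Base/16 configuration (Dosovitskiy et al., 2020) uses the following values; the models in this repository may override any of them:

```python
# Reference values from the original ViT-Base/16 configuration (for orientation only)
vit_base_config = {
    "image_size": 224,        # input height and width
    "patch_size": 16,         # 224 / 16 = 14 patches per side
    "sequence_length": 196,   # 14 * 14 patches (197 including the class token)
    "embedding_size": 768,    # hidden dimension
    "num_layers": 12,         # transformer blocks
    "num_heads": 12,          # attention heads per block
}
```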

Tests

This repo also provides a set of test cases to verify the functionality of the project's code. Each test case is designed to test a specific aspect of the code and ensure that it behaves as expected.

The unit tests are implemented using the Python unittest framework, which provides a convenient way to define and run tests. Each test case is defined as a subclass of unittest.TestCase and contains one or more test methods that perform the actual testing.
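For example, a test case in this style might look like the following; this is a hypothetical test, not one of the repository's actual test cases:

```python
import unittest
import torch

class TestForwardShapes(unittest.TestCase):
    """Hypothetical example of the unittest style described above."""

    def test_flattened_rgbd_length(self):
        rgb = torch.rand(1, 3, 224, 224)
        depth = torch.rand(1, 1, 224, 224)
        flat = torch.cat([rgb, depth], dim=1).flatten(start_dim=1)
        self.assertEqual(flat.shape[1], 4 * 224 * 224)

if __name__ == "__main__":
    unittest.main()
```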

Future Features:

  • Add additional test cases to cover more scenarios and edge cases.
  • Implement test fixtures to set up and tear down test environments (see the sketch below).
  • Integrate with a continuous integration system to automatically run tests on code changes.
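A fixture of the kind described in the second bullet could eventually look roughly like this; it is purely a sketch of the idea, not existing code:

```python
import tempfile
import unittest
import numpy as np

class TestWithSampleData(unittest.TestCase):
    """Hypothetical fixture: build a throwaway RGBD sample before each test."""

    def setUp(self):
        self.tmpdir = tempfile.TemporaryDirectory()
        self.sample = np.random.rand(224, 224, 4).astype(np.float32)

    def tearDown(self):
        self.tmpdir.cleanup()

    def test_sample_has_four_channels(self):
        self.assertEqual(self.sample.shape[-1], 4)

if __name__ == "__main__":
    unittest.main()
```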