
Siamese Masked Autoencoders - Learning and Exploration


Short Description

  • Course: DD2412 Deep Learning, Advanced Course at KTH
  • Project Team: Friso de Kruiff, Magnus Tibbe, and Casper Augustsson Savinov
  • Focus: Self-supervised learning and computer vision with SiamMAE — replicating the paper's core results and exploring potential research extensions.
  • Project Duration: December 2023

Usage

  • Fill in here

For more details, see Documentation.

Documentation

TODO

  • Pretraining
    • Checkpoint loading and saving
    • Script to convert data to JPEG
    • Dataclass
    • Data augmentation
    • Data loading
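As a reference point for the data-augmentation and data-loading items above, here is a minimal sketch of SiamMAE-style frame-pair sampling and asymmetric masking. The gap bounds and the 95% mask ratio follow the paper's reported defaults, but the function names and exact interface are our own assumptions:

```python
import numpy as np

def sample_frame_pair(num_frames, min_gap=4, max_gap=48, rng=None):
    """Sample indices for a past/future frame pair separated by a random
    temporal gap, as in SiamMAE pretraining (gap bounds are assumptions)."""
    rng = rng or np.random.default_rng()
    gap = int(rng.integers(min_gap, max_gap + 1))
    gap = min(gap, num_frames - 1)       # clamp for short clips
    t1 = int(rng.integers(0, num_frames - gap))
    return t1, t1 + gap

def random_patch_mask(num_patches, mask_ratio=0.95, rng=None):
    """Boolean mask over patch tokens (True = masked). SiamMAE masks a very
    high ratio of the *future* frame only; the past frame stays intact."""
    rng = rng or np.random.default_rng()
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    masked_idx = rng.choice(num_patches, size=num_masked, replace=False)
    mask[masked_idx] = True
    return mask
```

In a full pipeline these two helpers would feed a PyTorch `Dataset` that returns the unmasked past frame, the masked future frame, and the mask itself as the reconstruction target.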

Relevant Research Papers

Possible Research Extensions

In addition to replicating the core results from the paper and gaining a deep understanding of Siamese Masked Autoencoders, our project leaves room for several research extensions that would probe the model's capabilities and applications further. Here are the potential research directions we are considering:

  1. Exploring Additional Data Augmentations: We aim to investigate whether the Siamese Masked Autoencoder method can be extended to work with different data augmentations, specifically rotation. Given the success of rotation in contrastive learning, we plan to experiment with various degrees of rotation to determine its impact on model performance.
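The rotation experiment above could start from something as simple as the following sketch, which applies the same random 90-degree rotation to both frames of a pair. Whether the two frames should share the rotation, and which angle set to use, are exactly the design choices we would want to vary:

```python
import numpy as np

def rotate_pair(frame1, frame2, rng=None):
    """Apply one shared random 90-degree rotation to both frames of a pair
    (hypothetical augmentation sketch; arrays are assumed channels-last,
    i.e. shape (H, W, C))."""
    rng = rng or np.random.default_rng()
    k = int(rng.integers(0, 4))  # number of counter-clockwise 90-degree turns
    # rot90 acts on the spatial axes (H, W) and leaves channels untouched.
    rot1 = np.rot90(frame1, k, axes=(0, 1)).copy()
    rot2 = np.rot90(frame2, k, axes=(0, 1)).copy()
    return rot1, rot2
```

Arbitrary (non-multiple-of-90) angles would need an interpolating rotation, e.g. `torchvision.transforms.functional.rotate`, and a decision about how to handle the padded corners.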

  2. Complex Datasets: To test the model's scalability and adaptability, we plan to evaluate Siamese Masked Autoencoders on more complex datasets, such as VSPW, UVO, or KITTI, which contain more objects per frame and pose unique challenges.

  3. Multi-Frame Prediction: Building on the foundation of self-supervised learning, we want to explore the feasibility of predicting multiple future frames in video sequences. This extension would involve training the model to predict not just the immediate future frame but multiple frames ahead, potentially enhancing its temporal understanding.
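On the sampling side, the multi-frame extension mainly changes how targets are drawn: instead of one future frame, we would sample one past frame plus several future frames at increasing gaps. A sketch (the number of targets and gap bounds are assumptions to tune):

```python
import numpy as np

def sample_multi_frame_targets(num_frames, num_targets=3,
                               min_gap=4, max_gap=16, rng=None):
    """Sample one past frame index and several future target indices at
    strictly increasing random gaps (sketch of the multi-frame extension)."""
    rng = rng or np.random.default_rng()
    # Cumulative sum of per-step gaps gives strictly increasing offsets.
    gaps = np.cumsum(rng.integers(min_gap, max_gap + 1, size=num_targets))
    t0 = int(rng.integers(0, num_frames - int(gaps[-1])))
    return t0, [t0 + int(g) for g in gaps]
```

The decoder side is the open question: one could run the cross-attention decoder once per target frame, or condition it on the target's temporal offset.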

  4. Uncertainty Estimation for Improved Performance: We plan to leverage uncertainty estimation techniques to enhance the model's performance. This could involve generating uncertainty heatmaps of predicted pixel values, which may help in identifying areas where the model is more confident or uncertain. We can explore using this uncertainty estimate for "smart masking," prioritizing regions with high certainty or uncertainty to improve prediction quality or increase the difficulty of the task.
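Given per-patch uncertainty scores (which could come, for example, from MC-dropout variance over reconstructions), the "smart masking" idea reduces to sorting patches by uncertainty and masking from one end of the ranking. A minimal sketch under those assumptions:

```python
import numpy as np

def smart_mask(uncertainty, mask_ratio=0.95, prioritize="uncertain"):
    """Build a patch mask from per-patch uncertainty scores (sketch of the
    'smart masking' idea). prioritize='uncertain' masks the highest-
    uncertainty patches to harden the task; 'certain' masks the most
    confident patches instead."""
    num_patches = uncertainty.shape[0]
    num_masked = int(round(num_patches * mask_ratio))
    order = np.argsort(uncertainty)          # ascending uncertainty
    if prioritize == "uncertain":
        order = order[::-1]                  # highest uncertainty first
    mask = np.zeros(num_patches, dtype=bool)
    mask[order[:num_masked]] = True
    return mask
```

In practice we would likely mix this deterministic ranking with some random masking, since a fully deterministic mask lets the model exploit the selection rule itself.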

These research extensions are aligned with our core interests in self-supervised learning, generative models, and uncertainty estimation. While the primary focus is on replicating the core results of the paper, we believe these extensions have the potential to contribute to the broader understanding of Siamese Masked Autoencoders and their practical applications.

Please note that the feasibility and scope of these extensions may evolve as the project progresses, and we will adapt our plans accordingly.

References

Resources, Links, and YouTube Videos