A multitask model for building height estimation and shadow/footprint mask generation, trained on the unbalanced GRSS-2023 dataset, domain-adapted for India using Google Open Buildings data, and augmented with a novel seam-carving-based technique. The model is a UNet with an ASPP-based encoder and two decoders: a regression decoder for height estimation and a segmentation decoder for shadow and footprint mask generation. The regression decoder uses windowed cross-attention to query shadow and footprint information from the segmentation decoder.
Report
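A minimal PyTorch sketch of the dual-decoder idea. Module sizes and names are illustrative, and plain (non-windowed) cross-attention stands in for the windowed variant, so this is a sketch of the design rather than the project's actual code:

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Regression-decoder features attend to segmentation-decoder features.

    Simplified stand-in: real windowed cross-attention would partition the
    feature map into local windows before attending.
    """
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, reg_feat, seg_feat):
        # (B, C, H, W) -> (B, H*W, C) token sequences
        b, c, h, w = reg_feat.shape
        q = reg_feat.flatten(2).transpose(1, 2)
        kv = seg_feat.flatten(2).transpose(1, 2)
        out, _ = self.attn(q, kv, kv)  # queries come from the regression branch
        return out.transpose(1, 2).view(b, c, h, w)

class TwoHeadUNet(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        # Placeholders for the ASPP encoder and UNet decoder stages.
        self.encoder = nn.Sequential(nn.Conv2d(3, dim, 3, padding=1), nn.ReLU())
        self.seg_decoder = nn.Conv2d(dim, dim, 3, padding=1)
        self.reg_decoder = nn.Conv2d(dim, dim, 3, padding=1)
        self.fuse = CrossAttentionFusion(dim)
        self.seg_head = nn.Conv2d(dim, 2, 1)  # shadow + footprint masks
        self.reg_head = nn.Conv2d(dim, 1, 1)  # per-pixel height

    def forward(self, x):
        f = self.encoder(x)
        seg = self.seg_decoder(f)
        reg = self.fuse(self.reg_decoder(f), seg)
        return self.reg_head(reg), self.seg_head(seg)
```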
The project focuses on seam carving, a content-aware image resizing technique that adjusts image dimensions by removing or inserting seams while preserving important content. It covers an implementation of seam carving that uses the Discrete Cosine Transform (DCT) for the energy map and dynamic programming to find minimum-energy seams. Performance was improved by removing multiple seams simultaneously, reducing runtime by 68%. The report also covers a deep learning-based seam carving approach using a VGG16-based UNet model and a perceptual loss, showing promising results with a high structural similarity index (SSIM) of 0.8186.
Report
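For reference, the dynamic-programming seam search can be sketched as follows; this is the classic algorithm in generic form, not the report's exact implementation:

```python
import numpy as np

def find_vertical_seam(energy):
    """Return the column index of the minimum-energy vertical seam per row.

    energy: (H, W) non-negative energy map (e.g., gradient- or DCT-based).
    """
    h, w = energy.shape
    cost = energy.astype(np.float64)  # cumulative minimum cost table
    for i in range(1, h):
        left = np.r_[np.inf, cost[i - 1, :-1]]
        right = np.r_[cost[i - 1, 1:], np.inf]
        cost[i] += np.minimum(np.minimum(left, cost[i - 1]), right)
    # Backtrack from the cheapest bottom-row cell.
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        seam[i] = lo + int(np.argmin(cost[i, lo:hi]))
    return seam  # seam[i] = column to remove in row i
```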
In this project, a Generative Adversarial Network (GAN) was implemented from scratch to generate images using the CIFAR-10 dataset. The architecture features a generator that produces images from random noise and a discriminator that classifies them as real or fake, built with batch normalization and LeakyReLU layers. The model reached an Inception Score of 3.5, indicating moderate image quality and diversity; while it performed reasonably well, there is room for improvement in image sharpness and variety across classes.
Report
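A compact DCGAN-style sketch of such a generator/discriminator pair for 32x32 CIFAR-10 images; the layer widths are illustrative, not the project's exact architecture:

```python
import torch.nn as nn

latent_dim = 100  # size of the random noise vector

generator = nn.Sequential(
    nn.ConvTranspose2d(latent_dim, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(),  # 4x4
    nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),         # 8x8
    nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),           # 16x16
    nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),                                 # 32x32 RGB
)

discriminator = nn.Sequential(
    nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),                                  # 16x16
    nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),           # 8x8
    nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2),          # 4x4
    nn.Conv2d(256, 1, 4, 1, 0), nn.Flatten(), nn.Sigmoid(),                        # real/fake
)
```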
In this project, a Super Resolution Generative Adversarial Network (SRGAN) was implemented to upsample satellite images. The model enhances image resolution while preserving texture and detail. The architecture consists of a generator that uses convolutional layers and residual blocks for upsampling, and a discriminator that classifies images as real or fake. Adversarial, perceptual, and total variation losses were combined to improve image quality. The SRGAN effectively generates high-resolution satellite images, enabling better interpretation of geographical features and improving the utility of satellite data.
Report
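The combined generator objective can be sketched as follows; the loss weights, the VGG feature layer, and the assumption of logit outputs from the discriminator are illustrative choices, not the project's tuned values:

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19, VGG19_Weights

# Frozen VGG19 backbone for the perceptual loss (inputs assumed
# ImageNet-normalized; layer cutoff is an illustrative choice).
vgg_features = vgg19(weights=VGG19_Weights.DEFAULT).features[:36].eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

def total_variation(img):
    """Encourages spatial smoothness in the super-resolved output."""
    return (img[..., 1:, :] - img[..., :-1, :]).abs().mean() + \
           (img[..., :, 1:] - img[..., :, :-1]).abs().mean()

def generator_loss(sr, hr, disc_sr_logits):
    perceptual = F.mse_loss(vgg_features(sr), vgg_features(hr))
    adversarial = F.binary_cross_entropy_with_logits(
        disc_sr_logits, torch.ones_like(disc_sr_logits))
    return perceptual + 1e-3 * adversarial + 2e-8 * total_variation(sr)
```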
This project demonstrates neural network inversion and adversarial attacks on the MNIST dataset. It utilizes a simple sigmoid-based convolutional neural network (CNN) model trained on this dataset. The project employs image inversion through gradient descent to reconstruct input images that yield desired outputs from the model. Additionally, it showcases a targeted adversarial attack, where images are manipulated to resemble a specific target class, fooling the model into consistently predicting this class. The effectiveness of the attack is evaluated using a confusion matrix, highlighting the model's vulnerabilities.
Report
The following confusion matrix (rows: true digit, columns: predicted digit) illustrates the effectiveness of the targeted attack: the model consistently predicts the target class 2 regardless of the input.

| True \ Pred | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 858 | 0 | 4629 | 188 | 4 | 142 | 48 | 0 | 53 | 1 |
| 1 | 1 | 0 | 6692 | 37 | 0 | 0 | 0 | 6 | 6 | 0 |
| 2 | 36 | 53 | 5405 | 81 | 91 | 16 | 60 | 85 | 100 | 31 |
| 3 | 232 | 3 | 5009 | 385 | 12 | 102 | 236 | 7 | 135 | 10 |
| 4 | 16 | 4 | 5193 | 217 | 144 | 18 | 13 | 3 | 161 | 73 |
| 5 | 52 | 0 | 4173 | 963 | 7 | 53 | 53 | 8 | 89 | 23 |
| 6 | 31 | 0 | 5755 | 45 | 4 | 18 | 52 | 0 | 13 | 0 |
| 7 | 144 | 7 | 5686 | 95 | 8 | 58 | 47 | 21 | 63 | 136 |
| 8 | 35 | 3 | 5494 | 138 | 10 | 39 | 37 | 2 | 87 | 6 |
| 9 | 28 | 3 | 5362 | 390 | 108 | 6 | 0 | 33 | 12 | 7 |
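The targeted attack itself can be sketched as plain gradient descent on the input; the step size and iteration count here are illustrative, not the project's exact settings:

```python
import torch
import torch.nn.functional as F

def targeted_attack(model, image, target_class=2, steps=100, lr=0.01):
    """Nudge `image` by gradient descent until the model predicts `target_class`.

    image: (1, 1, 28, 28) MNIST input in [0, 1].
    """
    adv = image.clone().detach().requires_grad_(True)
    target = torch.tensor([target_class])
    for _ in range(steps):
        loss = F.cross_entropy(model(adv), target)
        loss.backward()
        with torch.no_grad():
            adv -= lr * adv.grad   # descend toward the target class
            adv.clamp_(0, 1)       # keep a valid pixel range
        adv.grad.zero_()
    return adv.detach()
```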
The project develops a multi-modal model on a book cover dataset to predict book genres, combining BERT for text and ResNet for images; the two representations are fused in the latent space before final classification. The model achieved a training accuracy of 54%, with Class 15 performing best (precision 0.87, recall 0.95) and Class 21 struggling significantly (precision 0.35, recall 0.02). On test data, overall accuracy was 48%, with average precision and recall of 49%.
Report
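A late-fusion sketch of this kind of model; the backbone choices (bert-base, ResNet-18) and the 30-class head are assumptions for illustration, not the project's exact configuration:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights
from transformers import BertModel

class GenreClassifier(nn.Module):
    """Concatenate BERT and ResNet embeddings, then classify."""

    def __init__(self, num_classes=30):  # assumed class count
        super().__init__()
        self.text = BertModel.from_pretrained("bert-base-uncased")
        vision = resnet18(weights=ResNet18_Weights.DEFAULT)
        vision.fc = nn.Identity()        # expose 512-d image features
        self.vision = vision
        self.head = nn.Linear(768 + 512, num_classes)

    def forward(self, input_ids, attention_mask, image):
        txt = self.text(input_ids=input_ids,
                        attention_mask=attention_mask).pooler_output  # (B, 768)
        img = self.vision(image)                                      # (B, 512)
        return self.head(torch.cat([txt, img], dim=1))   # fuse, then classify
```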
This assignment had three parts.
Report
The goal was to transform real photos into non-photorealistic, painting-like images by enhancing light-shadow contrast, emphasizing lines, and using vivid colors. This process consists of two main steps: Artistic Enhancement and Color Adjustment.
- Artistic Enhancement Step:
- Shadow Map Generation: Convert the input image from RGB to HSI color space and create a shadow map by assigning flags based on light or shadow areas. Merge this shadow map with the original image.
- Line Draft Generation: Convert the original image to grayscale, apply bilateral filtering to smooth it while preserving edges, and use Sobel filters to detect edges. Produce a binary line draft image through thresholding to emphasize important lines (sketched after this list).
- Color Adjustment Step: Create a chromatic map by decomposing the image in LAB color space to remove lightness influence and enhance the RGB components with this map. Apply linear saturation correction to address brightness changes.
- Final Output: Combine the enhanced image and the line draft to create the final artistic rendered image, effectively balancing color and line emphasis.
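A minimal OpenCV sketch of the line-draft step; the filter and threshold parameters are illustrative defaults, not the report's tuned values:

```python
import cv2
import numpy as np

def line_draft(image_bgr, threshold=60):
    """Grayscale -> bilateral filter -> Sobel magnitude -> binary line draft."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Smooth flat regions while preserving edges.
    smooth = cv2.bilateralFilter(gray, d=9, sigmaColor=75, sigmaSpace=75)
    gx = cv2.Sobel(smooth, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(smooth, cv2.CV_64F, 0, 1, ksize=3)
    magnitude = np.clip(cv2.magnitude(gx, gy), 0, 255).astype(np.uint8)
    # Strong edges become black lines on a white background.
    _, lines = cv2.threshold(magnitude, threshold, 255, cv2.THRESH_BINARY_INV)
    return lines
```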
This part analyzes rendering output through image quantization, specifically using the median-cut method for color quantization. The process works with a 24-bit input image, which contains 8 bits each for the red, green, and blue components. For comparison, results are generated both with and without Floyd-Steinberg dithering.
The assignment requires implementing both the median-cut algorithm and Floyd-Steinberg dithering from scratch, avoiding the use of direct built-in functions. The median-cut quantization is approached using a divide-and-conquer strategy.
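For illustration, the error-diffusion half can be sketched as follows against a palette produced by median-cut; this is a generic sketch, not the assignment's exact code:

```python
import numpy as np

def floyd_steinberg(image, palette):
    """Dither `image` against a fixed palette by diffusing quantization error.

    image: (H, W, 3) array in [0, 255]; palette: (K, 3) median-cut colors.
    """
    img = image.astype(np.float64).copy()
    h, w, _ = img.shape
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            old = img[y, x]
            # Snap to the nearest palette color.
            new = palette[np.argmin(((palette - old) ** 2).sum(axis=1))]
            out[y, x] = new
            err = old - new
            # Push the error onto unvisited neighbours (7/16, 3/16, 5/16, 1/16).
            if x + 1 < w:
                img[y, x + 1] += err * 7 / 16
            if y + 1 < h and x > 0:
                img[y + 1, x - 1] += err * 3 / 16
            if y + 1 < h:
                img[y + 1, x] += err * 5 / 16
            if y + 1 < h and x + 1 < w:
                img[y + 1, x + 1] += err * 1 / 16
    return out.astype(np.uint8)
```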
The result below shows a 12-bit quantization of the original 24-bit image.
This part focuses on transferring color from a source image to a grayscale image using swatches. The objective is to exploit the similar intensity values between the source and target images within corresponding swatches to paint the entire target image.
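A simplified sketch of the swatch-matching step, assuming Lab color space and luminance-only matching (the array layout and sampling count are illustrative assumptions; the full method would then propagate color beyond the swatches):

```python
import numpy as np

def transfer_colors(source_lab, target_gray, swatch_pairs, n_samples=50):
    """For each target pixel inside a swatch, copy the chromatic (a, b)
    channels of the source-swatch sample with the closest luminance.

    source_lab: (H, W, 3) Lab source; target_gray: (H, W) luminance;
    swatch_pairs: list of ((sy0, sy1, sx0, sx1), (ty0, ty1, tx0, tx1)) boxes.
    """
    h, w = target_gray.shape
    out = np.zeros((h, w, 3))
    out[..., 0] = target_gray                 # keep the target's luminance
    for (sy0, sy1, sx0, sx1), (ty0, ty1, tx0, tx1) in swatch_pairs:
        patch = source_lab[sy0:sy1, sx0:sx1].reshape(-1, 3)
        idx = np.random.choice(len(patch), min(n_samples, len(patch)),
                               replace=False)
        samples = patch[idx]
        region = target_gray[ty0:ty1, tx0:tx1]
        # Nearest-luminance sample per target pixel within the swatch.
        match = np.abs(region[..., None] - samples[:, 0]).argmin(axis=-1)
        out[ty0:ty1, tx0:tx1, 1:] = samples[match][..., 1:]
    return out
```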
A basic GUI application was developed to select n swatches from both the source and the target image, as shown below.
Neural Style Transfer (NST) is a technique in computer vision that combines the content of one image with the artistic style of another, producing a new image that retains the original content's structure while adopting stylistic elements. In this part, I also implemented NST from scratch using a pre-trained convolutional neural network (CNN), specifically VGG16, extracting content features from the higher layers of the network and style features from the lower layers. The process optimizes a randomly initialized image by minimizing a loss function that balances content and style differences.
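A minimal PyTorch sketch of the loss; the layer indices and weights are illustrative choices, not the tuned values used here:

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

# Frozen VGG16 feature extractor; style from lower layers, content higher up.
features = vgg16(weights=VGG16_Weights.DEFAULT).features.eval()
for p in features.parameters():
    p.requires_grad_(False)

STYLE_LAYERS, CONTENT_LAYER = {3, 8, 15}, 22  # ReLU indices in vgg16.features

def gram(f):
    """Style representation: channel-correlation (Gram) matrix, batch size 1."""
    _, c, h, w = f.shape
    f = f.view(c, h * w)
    return f @ f.t() / (c * h * w)

def nst_loss(img, content, style, style_weight=1e6):
    loss, x, xc, xs = 0.0, img, content, style
    for i, layer in enumerate(features):
        x, xc, xs = layer(x), layer(xc), layer(xs)
        if i in STYLE_LAYERS:
            loss = loss + style_weight * F.mse_loss(gram(x), gram(xs))
        if i == CONTENT_LAYER:
            loss = loss + F.mse_loss(x, xc)
            break                 # no layers needed past the content layer
    return loss

# Usage: optimize a randomly initialized image with SGD.
# img = torch.rand(1, 3, 224, 224, requires_grad=True)
# opt = torch.optim.SGD([img], lr=1.0)
```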
Below is the result of running the iterative SGD version of Neural Style Transfer.
Implementing Linear Regression, Logistic Regression, GDA. Report
Classification using Naive Bayes, SVM. Report
Exploring Decision Trees, Random Forests, Gradient Boosted Trees and Implementing Neural Network. Report
Exploring Activation Functions, Transfer Learning, Optimisers. Report
UC Merced Classifier. Report
Implementing Bilinear Interpolation. Report
Face Expression Transfer. Report