This project explores machine perception, specifically the task of fitting both 2D images and 3D scenes with Multilayer Perceptron (MLP) networks. It comprises two parts, each with its own objectives and methodology.
- Implemented positional encoding, which maps continuous input coordinates into a higher-dimensional space of sinusoidal features, helping the network capture high-frequency color and texture variation.
- Designed a Multilayer Perceptron (MLP) with three linear layers, using ReLU activations for the first two layers and a Sigmoid activation for the final layer.
- Trained the MLP to reproduce the provided 2D image, using the Adam optimizer and Mean Squared Error (MSE) as the loss function.
- Utilized normalized pixel coordinates and transformed the network's output back into image format.
- Evaluated the MLP's performance by computing the Peak Signal-to-Noise Ratio (PSNR) between the original and reconstructed images.
- Computed the ray through each pixel from the camera-to-world transformation combined with the camera's intrinsic parameters.
- Sampled points along each ray at uniformly spaced depths between the near and far bounds.
- Developed a Neural Radiance Fields (NeRF) MLP that takes the position and viewing direction of each sampled point as input, with positional encoding applied to both.
- Implemented the volumetric rendering formula to compute each pixel's color, approximating the continuous ray-color integral with a discrete sum over the sampled points.
- Rendered an image by computing all the rays, sampling points along these rays, forwarding them through the neural network, and then applying the volumetric equation to generate the reconstructed image.
- Integrated all the steps above to train the NeRF model with the Adam optimizer and Mean Squared Error loss.
- Iteratively improved the model, experimenting with different positional encoding frequencies and evaluating their impact on reconstruction quality.
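The 2D image-fitting pipeline from the first part can be sketched as follows. This is a minimal NumPy sketch: the function names, layer widths, and frequency count are illustrative, and the actual training loop (Adam + MSE) is omitted.

```python
import numpy as np

def positional_encoding(x, num_freqs=10):
    """Map coordinates in [0, 1] to [x, sin(2^0 pi x), cos(2^0 pi x), ...]."""
    feats = [x]
    for i in range(num_freqs):
        feats.append(np.sin(2.0**i * np.pi * x))
        feats.append(np.cos(2.0**i * np.pi * x))
    return np.concatenate(feats, axis=-1)

def mlp_forward(x, params):
    """Three linear layers: ReLU, ReLU, then Sigmoid to keep RGB in [0, 1]."""
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])
    h = np.maximum(0.0, h @ params["W2"] + params["b2"])
    return 1.0 / (1.0 + np.exp(-(h @ params["W3"] + params["b3"])))

def psnr(original, reconstructed, max_val=1.0):
    """Peak Signal-to-Noise Ratio between two images with values in [0, max_val]."""
    mse = np.mean((original - reconstructed) ** 2)
    return 10.0 * np.log10(max_val**2 / mse)
```

With 10 frequencies, a 2D pixel coordinate expands to 2 + 2 × 2 × 10 = 42 features before entering the MLP.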
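The ray construction and sampling steps from the second part can be sketched as below, assuming a pinhole camera with intrinsic matrix K (focal lengths on the diagonal, principal point in the last column), a 4×4 camera-to-world pose, and a camera looking down +z. Axis sign conventions vary between datasets, and all names here are illustrative.

```python
import numpy as np

def get_rays(H, W, K, c2w):
    """World-space ray origins/directions from intrinsics K and a camera-to-world pose."""
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)  # pixel centers
    dirs_cam = np.stack([(u - K[0, 2]) / K[0, 0],
                         (v - K[1, 2]) / K[1, 1],
                         np.ones_like(u)], axis=-1)             # (H, W, 3) camera space
    rays_d = dirs_cam @ c2w[:3, :3].T                           # rotate into world space
    rays_o = np.broadcast_to(c2w[:3, 3], rays_d.shape)          # shared camera center
    return rays_o, rays_d

def sample_along_rays(rays_o, rays_d, near, far, n_samples, perturb=True):
    """Uniformly spaced depths in [near, far]; optional jitter during training."""
    t = np.linspace(near, far, n_samples)
    if perturb:
        t = t + np.random.rand(n_samples) * (far - near) / n_samples
    points = rays_o[..., None, :] + t[..., :, None] * rays_d[..., None, :]
    return points, t  # points: (H, W, n_samples, 3)
```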
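The volumetric rendering step corresponds to the standard quadrature C(r) ≈ Σᵢ Tᵢ (1 − exp(−σᵢδᵢ)) cᵢ, where Tᵢ is the accumulated transmittance and δᵢ the distance between adjacent samples. A NumPy sketch (names are illustrative; the densities and colors would come from the NeRF MLP):

```python
import numpy as np

def volume_render(sigmas, colors, t_vals):
    """Numerically approximate the ray-color integral.
    sigmas: (..., N) densities, colors: (..., N, 3), t_vals: (N,) sample depths."""
    deltas = np.concatenate([np.diff(t_vals), np.array([1e10])])  # last interval ~ open
    alphas = 1.0 - np.exp(-sigmas * deltas)                       # per-segment opacity
    trans = np.cumprod(1.0 - alphas + 1e-10, axis=-1)             # survival probability
    trans = np.concatenate([np.ones_like(trans[..., :1]),
                            trans[..., :-1]], axis=-1)            # shift: T_i excludes i
    weights = trans * alphas
    return np.sum(weights[..., None] * colors, axis=-2)
```

A quick sanity check on this discretization: an opaque first sample should return its own color, and an all-zero density field should render black.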
This project provided hands-on experience with machine learning applications in computer vision, including advanced concepts such as NeRF and volumetric rendering. The final model achieved a PSNR above 24.2 after 3000 iterations, demonstrating effective learning and a successful approximation of a 3D scene from 2D views.