Training model from scratch results in incompatibility with the testing script and inferior visual quality after adjustments

Question

albek00 opened this issue 3 years ago · 0 comments

I have tried to reproduce the quality of your pretrained model by training on a substantially expanded dataset. However, I have run into several differences between your training and testing dataset classes (UvitonDatasetFull and UvitonDatasetV19_test). Most notably, the training dataset, the network architecture and, consequently, every new model trained with the provided script use two arrays of normalized body part images as style encoding inputs (norm_img and norm_img_lower, with shapes 30x64x64 and 12x64x64). The testing dataset and the pretrained model use only one array of normalized body part images, concatenated with a normalized pose representation (norm_img and norm_pose or ‘stickman’, each with shape 30x64x64).
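To make the mismatch concrete, here is a minimal sketch that compares the two input layouts as I understand them. The array names and shapes are the ones reported above; `layout_diff` and `total_channels` are hypothetical helpers written for this issue and are not part of the repository.

```python
import numpy as np

# Style-encoder inputs as reported in this issue: name -> (channels, height, width).
TRAIN_LAYOUT = {"norm_img": (30, 64, 64), "norm_img_lower": (12, 64, 64)}
TEST_LAYOUT = {"norm_img": (30, 64, 64), "norm_pose": (30, 64, 64)}


def layout_diff(train, test):
    """Return (keys only in train, keys only in test, shared keys whose shapes differ)."""
    only_train = sorted(set(train) - set(test))
    only_test = sorted(set(test) - set(train))
    mismatched = sorted(k for k in set(train) & set(test) if train[k] != test[k])
    return only_train, only_test, mismatched


def total_channels(layout):
    """Channel count if all arrays in the layout were stacked along the channel axis."""
    return sum(shape[0] for shape in layout.values())


if __name__ == "__main__":
    print(layout_diff(TRAIN_LAYOUT, TEST_LAYOUT))
    print(total_channels(TRAIN_LAYOUT), total_channels(TEST_LAYOUT))
```

If the arrays in each layout are stacked along the channel axis, the two totals differ (42 versus 60), which is why a model trained with the unmodified script cannot be loaded by the testing script.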

So far I have tried to resolve these differences by bringing the training dataset class closer to the testing dataset class, mainly by modifying its normalize() function. After these dataset changes and some small changes to the training script to accommodate the new inputs, I was able to train a model whose style encoding input (norm_img and norm_pose) has shape 60x64x64, matching the pretrained model. However, as the attached images show, the visual quality is far below yours, even at relatively advanced stages of training (8000-12000 iterations). In particular, the shoulder area and the general body shape are deformed relative to the original pose, and using full-body images as either the person or the garment causes severe distortion.
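For reference, this is roughly how I assemble the 60-channel style input. It is a simplified sketch and not the actual code from the repository: `build_style_input` is a hypothetical helper, and the channel order (norm_img first, norm_pose second) is my assumption about what the pretrained model expects.

```python
import numpy as np

EXPECTED_SHAPE = (30, 64, 64)  # per-array shape reported for the testing dataset


def build_style_input(norm_img, norm_pose):
    """Stack body-part images and the pose map along the channel axis.

    Assumption: norm_img occupies channels 0-29 and norm_pose channels 30-59.
    """
    for name, arr in (("norm_img", norm_img), ("norm_pose", norm_pose)):
        if arr.shape != EXPECTED_SHAPE:
            raise ValueError(f"{name} has shape {arr.shape}, expected {EXPECTED_SHAPE}")
    return np.concatenate([norm_img, norm_pose], axis=0)


if __name__ == "__main__":
    # Dummy data, only to show the resulting shape.
    norm_img = np.zeros(EXPECTED_SHAPE, dtype=np.float32)
    norm_pose = np.ones(EXPECTED_SHAPE, dtype=np.float32)
    print(build_style_input(norm_img, norm_pose).shape)
```

If the pretrained model was trained with a different channel order or a different value range for norm_pose, that alone might explain part of the quality gap, so confirmation of the expected layout would help.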

Given this outcome, could you give me some feedback on the general direction of my efforts? Is there something I might have overlooked when bringing the two dataset classes in line? Are there any additional parameters in the training script that I should have paid more attention to?