xiezhy6/PASTA-GAN

Training model from scratch results in incompatibility with the testing script and inferior visual quality after adjustments

albek00 opened this issue · 0 comments

I have tried to reproduce the quality of your pretrained model by training on a substantially expanded dataset. However, I have found several discrepancies between your training and testing dataset classes (UvitonDatasetFull and UvitonDatasetV19_test). Most notably, the training dataset, the network architecture and, consequently, every new model trained with the provided script take two arrays of normalized body part images as style encoding inputs (norm_img and norm_img_lower, with shapes 30x64x64 and 12x64x64). The testing dataset and the pretrained model instead take a single array of normalized body part images concatenated with a normalized pose representation (norm_img and norm_pose, the 'stickman', each with shape 30x64x64).
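To make the mismatch concrete, here is a minimal sketch that tabulates the two sets of style encoding inputs using only the shapes quoted above. The dictionaries and the helper `total_channels` are my own illustration, not code from the repository, and summing the channels assumes the arrays would be stacked along the channel axis.

```python
# Shapes as reported in this issue (channels, height, width).
# These dictionaries are illustrative; they are not taken from the repository.
TRAIN_INPUTS = {"norm_img": (30, 64, 64), "norm_img_lower": (12, 64, 64)}
TEST_INPUTS = {"norm_img": (30, 64, 64), "norm_pose": (30, 64, 64)}


def total_channels(inputs):
    """Sum the channel dimension, assuming a stack along the channel axis."""
    return sum(shape[0] for shape in inputs.values())


train_channels = total_channels(TRAIN_INPUTS)
test_channels = total_channels(TEST_INPUTS)

# Input names that appear on only one side of the train/test split.
only_in_train = sorted(set(TRAIN_INPUTS) - set(TEST_INPUTS))
only_in_test = sorted(set(TEST_INPUTS) - set(TRAIN_INPUTS))

print("train:", train_channels, "channels, extra inputs:", only_in_train)
print("test: ", test_channels, "channels, extra inputs:", only_in_test)
```

Under these assumptions the two pipelines differ both in which arrays they expect and in the total channel count, so a checkpoint from one cannot be loaded by the other without changes.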

So far, I have tried to resolve these differences by bringing the training dataset class into closer alignment with the testing dataset class through modifications to its normalize() function. After these dataset adjustments and slight changes to the training script to accommodate the new inputs, I was able to train a model whose style encoding input (norm_img and norm_pose) has shape 60x64x64, fully in line with the pretrained model. However, as the attached images show, the resulting visual quality falls far short of yours, even at relatively advanced stages of training (8000-12000 iterations). In particular, the shoulder area and the general body shape are deformed relative to the original pose, and using full-body images as either the person or the garment causes severe distortion.
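For reference, the input I ended up feeding to the style encoder can be sketched as below. This is a simplified stand-in, not my actual normalize() changes: `build_style_input` is a hypothetical helper, and the channel order (norm_img first, norm_pose second) is an assumption that may not match the order the pretrained model expects.

```python
import numpy as np

EXPECTED_SHAPE = (30, 64, 64)  # per-array shape reported for the testing dataset


def build_style_input(norm_img, norm_pose):
    """Concatenate body part images and pose maps along the channel axis.

    Hypothetical helper for illustration; the channel order is an assumption.
    """
    for name, arr in (("norm_img", norm_img), ("norm_pose", norm_pose)):
        if arr.shape != EXPECTED_SHAPE:
            raise ValueError(f"{name} has shape {arr.shape}, expected {EXPECTED_SHAPE}")
    return np.concatenate([norm_img, norm_pose], axis=0)


# Dummy data standing in for real normalized patches and stickman maps.
norm_img = np.zeros(EXPECTED_SHAPE, dtype=np.float32)
norm_pose = np.ones(EXPECTED_SHAPE, dtype=np.float32)
style_input = build_style_input(norm_img, norm_pose)
print(style_input.shape)
```

If the pretrained model was trained with the opposite order, or with differently scaled pose maps, the shapes would still match while the inputs would not, which is one of the things I would like to rule out.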

Given this disappointing outcome, would it be possible for you to give some feedback on the general direction of my efforts? Is there something I might have overlooked in my attempts to bring the two dataset classes in line? Are there any additional parameters in the training script that should have received more of my attention?

1a95a642b4a8440a81c96cc928b40e1c__5c5b36b78f2341e48946768b5c490ef9
1a30787ed8a94293a2ba2a6b0eb0aa32__0f47cb9cb50a4dba8bcdaa68ae4da54d
1a30787ed8a94293a2ba2a6b0eb0aa32__2ec33e777b9d4905941b1b33d670045a
2d1c8a0485704f1291c6a02be96bdac8__1a7a5302159e48af84a690e3ad456489
2d1c8a0485704f1291c6a02be96bdac8__0bc002dc7e474daeb9a97688043fd83f
6b9c0d92645049bc9906c7d418c1f09c__4d21f15280db465db0169b5804620d64
6b9c0d92645049bc9906c7d418c1f09c__1ec58bc1d00a4542bddd59c03664b24e