jolibrain/joliGEN

Preserving the semantics during translation

habibian opened this issue · 6 comments

Hi,

I wonder what translation model should be used to maximize the semantic correctness during the translation. Any particular architecture, or discriminator that should be used?

Thanks!

beniz commented

Hi @habibian, semantic conservation, e.g. with masks, is controlled via the semantic network; see https://www.joligen.com/doc/options.html#semantic-segmentation-network

For instance, using a SegFormer B5 with --f_s_net segformer and --f_s_config_segformer models/configs/segformer/segformer_config_b5.json fits a rather large model to the semantic task, which can improve the correctness of mask conservation.

Other options are available depending on the use case, e.g. --train_mask_loss_out_mask forces conservation of pixels outside the masks. This is typically useful for local modifications with global pixel conservation.
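For concreteness, a hypothetical training invocation combining these flags could look like the sketch below. Only the f_s and mask flags come from the explanation above; the entry point, --dataroot, and --model_type values are placeholder assumptions, so check the options documentation for the exact set your model requires.

```sh
# Hypothetical joliGEN run; dataset path and model type are placeholders.
python3 train.py \
  --dataroot /path/to/dataset \
  --model_type cut \
  --f_s_net segformer \
  --f_s_config_segformer models/configs/segformer/segformer_config_b5.json \
  --train_mask_loss_out_mask
```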

habibian commented

Thanks for the response @beniz. Let's see if I understand you correctly:

I assume that by setting the --f_s_net flag you incorporate a semantic segmentation network into the GAN's training pipeline. However, I don't get how this network contributes to semantically correct generation. Is it used as an additional discriminator? Would such an additional discriminator be sufficient to guarantee semantic correctness in the generations?

Thanks!

beniz commented

Gradients from f_s backpropagate to G. The discriminators implement perceptual losses (fake vs. real), while f_s both constrains the generation and speeds up convergence, since it diminishes the solution space.
So the more accurate f_s is, the better the gradients to G, and the better G becomes in terms of semantic correctness.
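A minimal PyTorch sketch of this mechanism (illustrative only, not joliGEN's actual code; G, D, f_s, and lambda_sem are assumed names):

```python
import torch.nn.functional as F

def generator_step(G, D, f_s, real_A, mask_A, lambda_sem=1.0):
    """One illustrative generator update with a semantic constraint.

    G: generator, D: discriminator, f_s: segmentation net (nn.Modules).
    real_A: source images (N, C, H, W); mask_A: their labels (N, H, W).
    """
    fake_B = G(real_A)

    # Adversarial signal: D scores fake vs. real (non-saturating GAN loss).
    loss_gan = F.softplus(-D(fake_B)).mean()

    # Semantic constraint: f_s must still recover the source labels on the
    # translated image, so its gradients flow back into G and shrink the
    # space of acceptable translations.
    loss_sem = F.cross_entropy(f_s(fake_B), mask_A)

    loss_G = loss_gan + lambda_sem * loss_sem
    loss_G.backward()  # gradients from both D and f_s reach G's parameters
    return loss_G
```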

habibian commented

Got it, thanks for the explanation!

Is f_s the only solution available in joliGEN for semantic correctness?

beniz commented

f_s is the general mechanism for GANs. Formally it is generic, since the constraint's gradients are passed to G; f_s could be more than a supervised network, as long as it is differentiable.

In practice, there are several options to control whether it learns from A, B, or both, whether f_s is cross-domain or not, etc.

Most important are the losses used with f_s. joliGEN implements standard supervised losses for f_s, as they have proved sufficient so far. But more precise losses could benefit some objectives: typically, removing small artifacts outside of existing labels (e.g. false positives) requires adequate losses.

See #224 and #222 for PRs that we left aside.

Typically, adding more balanced losses, such as Dice and multi-class Dice, can help, as can weighting classes non-uniformly, etc. It should not be too difficult to add and test new losses, and you can let us know about this. When useful, they can later be added to the semantic controls.
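For instance, a generic multi-class soft Dice loss could be sketched as follows (an assumed implementation for illustration, not joliGEN's code):

```python
import torch.nn.functional as F

def multiclass_dice_loss(logits, target, eps=1e-6):
    """Soft multi-class Dice loss.

    logits: (N, K, H, W) raw scores from f_s; target: (N, H, W) int labels.
    """
    num_classes = logits.shape[1]
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(target.long(), num_classes).permute(0, 3, 1, 2).float()

    # Per-class overlap and size, accumulated over batch and space.
    dims = (0, 2, 3)
    intersection = (probs * one_hot).sum(dims)
    cardinality = probs.sum(dims) + one_hot.sum(dims)

    dice = (2.0 * intersection + eps) / (cardinality + eps)
    # Averaging over classes weights them equally, so small or rare labels
    # count as much as large ones, unlike plain cross-entropy.
    return 1.0 - dice.mean()
```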

habibian commented

Thanks for the great response @beniz!