How to apply NOT to discrete data?
pengzhangzhi opened this issue · 3 comments
Hi! First of all, thank you so much for the NOT! I am wondering how to use NOT for unpaired translation tasks on discrete data like text tokens. I guess the loss function would need to change, but I have no clue how. Would you be willing to discuss it?
Thanks again for making NOT available!
Best,
Zhangzhi
Hi, I do not see a straightforward way to apply the algorithm to discrete data such as tokens. However, it seems that one could work in the space of embeddings. In that case, the NOT algorithm should be applicable "as-is" with l2-like cost functions.
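For illustration only, a minimal sketch of what the embedding route could look like (all names here, e.g. `embed` and the stand-in `T`, are hypothetical and not from the repo; it assumes a learned or pretrained embedding table):

```python
import torch
import torch.nn as nn

# Hypothetical sketch: lift tokens into a continuous embedding space and
# run NOT there with a quadratic cost.
vocab_size, emb_dim = 32000, 128
embed = nn.Embedding(vocab_size, emb_dim)           # could be pretrained/frozen
T = nn.Sequential(                                  # stand-in transport map
    nn.Linear(emb_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim)
)

tokens_x = torch.randint(0, vocab_size, (64, 16))   # (batch, seq_len) source tokens
x_emb = embed(tokens_x)                             # (batch, seq_len, emb_dim)
y_emb = T(x_emb)                                    # transported embeddings
cost = 0.5 * ((x_emb - y_emb) ** 2).sum(-1).mean()  # l2-like cost on embeddings
```

Decoding the transported embeddings back to tokens (e.g., nearest neighbor in the embedding table) would then be a separate step.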
Best,
Alex
Hi, thank you Alex for the response. I am trying to apply this powerful tool to other data and facing obstacles. The training of NOT is similar to GAN training, and I find it really hard to get the T and f networks to converge on my task. I can successfully overfit the model on one sample, but when I scale up to two samples, the model does not converge. I am writing to ask if you have any ideas on how to train a NOT model on a new dataset. Any tricks or practical guidance would be much appreciated!
Hi. Over years of experiments, my colleagues and I have found several things (a rough training-loop sketch illustrating them follows the list):
(1) Usually, doing many more iterations for the transport map T than for the potential f is better. In the paper, we used 10T/1f, but sometimes even 25T/1f or so works better. Overall, it seems to be OK to "overfit" T for f.
(2) Sometimes, using beta1=0 in Adam for T and f really helps (this is not used in the repo).
(3) Schedulers are recommended (though there is no clear universal recipe here).
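Here is a rough, self-contained sketch of how (1)-(3) could fit together; all hyperparameters, samplers, and network shapes below are toy/illustrative, not the repo's actual loop:

```python
import torch
import torch.nn as nn

# Toy setup; everything here is illustrative.
dim, batch_size, max_steps, T_ITERS = 2, 64, 1000, 10  # tip (1): try 10-25 T steps per f step
T = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, dim))  # transport map
f = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, 1))    # potential

cost = lambda x, y: 0.5 * ((x - y) ** 2).sum(-1).mean()                 # quadratic cost
sample_source = lambda n: torch.randn(n, dim)                           # toy samplers
sample_target = lambda n: torch.randn(n, dim) + 2.0

T_opt = torch.optim.Adam(T.parameters(), lr=1e-4, betas=(0.0, 0.999))   # tip (2): beta1=0
f_opt = torch.optim.Adam(f.parameters(), lr=1e-4, betas=(0.0, 0.999))
T_sched = torch.optim.lr_scheduler.CosineAnnealingLR(T_opt, T_max=max_steps)  # tip (3)
f_sched = torch.optim.lr_scheduler.CosineAnnealingLR(f_opt, T_max=max_steps)

for step in range(max_steps):
    for _ in range(T_ITERS):                         # tip (1): "overfit" T for current f
        x = sample_source(batch_size)
        T_loss = cost(x, T(x)) - f(T(x)).mean()      # T minimizes cost minus potential
        T_opt.zero_grad(); T_loss.backward(); T_opt.step()
    x, y = sample_source(batch_size), sample_target(batch_size)
    f_loss = f(T(x).detach()).mean() - f(y).mean()   # f ascends E[f(y)] - E[f(T(x))]
    f_opt.zero_grad(); f_loss.backward(); f_opt.step()
    T_sched.step(); f_sched.step()
```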
Also, if you consider weak optimal transport, it is better to use weak kernel costs than the weak quadratic cost:
https://github.com/iamalexkorotin/KernelNeuralOptimalTransport
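For reference, a sample-based estimate of the weak quadratic cost C(x, mu) = E_{y~mu}[||x - y||^2 / 2] - (gamma/2) Var(mu) could look like the sketch below (my reading of the weak-OT formulation; the function name and its arguments are hypothetical). The kernel costs in the linked repo replace the squared Euclidean distance with a kernel-induced one:

```python
import torch

def weak_quadratic_cost(x, y_samples, gamma=1.0):
    """Sample estimate of C(x, mu) = E||x - y||^2 / 2 - (gamma/2) Var(mu).

    x:         (batch, dim)
    y_samples: (batch, Z, dim) -- Z stochastic outputs T(x, z) per input x
    """
    sq = ((x.unsqueeze(1) - y_samples) ** 2).sum(-1)            # (batch, Z)
    transport = 0.5 * sq.mean()
    # Var(mu) = 0.5 * E||y - y'||^2, estimated over off-diagonal sample pairs
    pair = ((y_samples.unsqueeze(1) - y_samples.unsqueeze(2)) ** 2).sum(-1)
    B, Z = y_samples.shape[0], y_samples.shape[1]
    var = 0.5 * pair.sum() / (B * Z * (Z - 1))
    return transport - 0.5 * gamma * var
```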
Hope this helps!
Best,
Alex