Some questions concerning the implementation
Hi, thanks for the interesting work and the beautiful visualisations.
I have looked at the implementation of the transforms in the code, and some of them seem to contradict the main paper text.
- For example, `ColorShift` is intended to perform the operation $\sigma x + \mu$, but according to the line it does the inverse operation, i.e. normalisation $(x - \mu) / \sigma$.
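To make the discrepancy concrete, here is a minimal NumPy sketch (the `mu`/`sigma` values are made up for illustration) showing that the two formulas are inverses of each other, not the same transform:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))  # toy CHW image

# Hypothetical per-channel statistics, just for illustration:
mu = np.array([0.5, 0.4, 0.3]).reshape(3, 1, 1)
sigma = np.array([0.2, 0.2, 0.2]).reshape(3, 1, 1)

denormalise = sigma * x + mu   # the operation described in the paper
normalise = (x - mu) / sigma   # what the linked line appears to compute

# One undoes the other, so applying one where the other is intended
# is a real behavioural difference, not a cosmetic one:
assert np.allclose(normalise * sigma + mu, x)
assert not np.allclose(denormalise, normalise)
```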
- The total variation weight is $0.0005$ instead of $0.00005$ (i.e. 10 times larger) according to the line.
- The operation done here looks a bit strange. The exclusion of the [CLS] token is clear, but at the end the batch size (0th dimension) is for some reason restricted to `min(batch_size, feature_dim)`, where `feature_dim` is `4 * embed_dim` in the transformer case, and then the resulting 1-D vector is turned into a 2-D diagonal matrix before taking the mean. Are these steps really needed, or could one simply take the mean without transforming the vector into a matrix?
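For what it's worth, the diagonal step is not a no-op: the mean of `np.diag(v)` averages over all $n^2$ entries, including the off-diagonal zeros, so it equals the plain vector mean divided by $n$. A small sketch with a made-up vector:

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0, 4.0])  # stand-in for the per-feature vector
n = v.shape[0]

mean_of_diag_matrix = np.diag(v).mean()  # averages n*n entries, mostly zeros
mean_of_vector = v.mean()                # averages the n actual values

# Embedding the vector on a diagonal scales the mean down by a factor of n:
assert np.isclose(mean_of_diag_matrix, mean_of_vector / n)
```

So the two variants differ by a constant factor, which an optimiser's learning rate could absorb, but they are not literally the same quantity.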
- The order of augmentations is different from the one in the paper according to the line: $Jitter(GS(CS(x)))$ instead of $GS(CS(Jitter(x)))$. Or does it not affect performance much?
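On this last point: whether the order matters depends on the transforms involved. A toy sketch with stand-in operations (not the repo's actual augmentations) showing that the two compositions are not interchangeable in general:

```python
import numpy as np

# Toy stand-ins for the three augmentations, for illustration only:
def jitter(x):
    return np.roll(x, 1, axis=-1)      # spatial shift

def cs(x):
    return 2.0 * x + 0.1               # pointwise affine map ("ColorShift"-like)

def gs(x):
    return x * np.array([1.0, 0.5])    # position-dependent scaling (toy "GS")

x = np.arange(6, dtype=float).reshape(3, 2)

order_code = jitter(gs(cs(x)))    # Jitter(GS(CS(x))) -- order in the code
order_paper = gs(cs(jitter(x)))   # GS(CS(Jitter(x))) -- order in the paper

# The pointwise map commutes with the shift, but the position-dependent
# one does not, so the two orders produce different outputs:
assert np.allclose(jitter(cs(x)), cs(jitter(x)))
assert not np.allclose(order_code, order_paper)
```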
Thanks in advance for the response.