Raincleared-Song/sparse_gpu_operator

Question about L1Norm when training

Closed this issue · 4 comments

Hi @Raincleared-Song ,
I'm confused about which dimension the L1 norm is taken over, given an input x of shape [bs, seq_len, hidden].
Is the L1 Norm output in the shape of [bs,] or [bs, seq_len]?
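To make the two candidate shapes concrete, here is a minimal sketch in plain Python (nested lists stand in for a tensor so it runs without dependencies). It is not taken from the repository or the paper; the function names `l1_per_token` and `l1_per_sample` are made up for illustration, and it does not settle which reduction formula 2 intends.

```python
def l1_per_token(x):
    """Sum |x| over the hidden dim only: [bs, seq_len, hidden] -> [bs, seq_len].

    With a PyTorch tensor this would correspond to x.abs().sum(dim=-1).
    """
    return [[sum(abs(v) for v in token) for token in seq] for seq in x]


def l1_per_sample(x):
    """Sum |x| over seq_len and hidden: [bs, seq_len, hidden] -> [bs].

    With a PyTorch tensor this would correspond to x.abs().sum(dim=(1, 2)).
    """
    return [sum(abs(v) for token in seq for v in token) for seq in x]


# Toy input with bs=2, seq_len=3, hidden=2 (values chosen arbitrarily).
x = [
    [[1.0, -2.0], [3.0, -4.0], [0.0, 5.0]],
    [[-1.0, -1.0], [2.0, 2.0], [0.5, -0.5]],
]

per_token = l1_per_token(x)    # shape [bs, seq_len] = [2, 3]
per_sample = l1_per_sample(x)  # shape [bs] = [2]

print(per_token)
print(per_sample)
```

Either result can be averaged down to a scalar for use as a loss term; the two differ only in whether the sequence dimension is reduced before or after that averaging.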
Thanks~

I'm sorry but our operator on the master branch does not support batch operations at present. In other words, the batch size must be 1 to run properly.
However, the operator with batch processing is already under active development in this repo. After the naive implementation is completed, it will be pushed to the current repo as a new branch.

Thanks for your reply.
But I'm actually asking about the output shape of the L1 norm in formula 2 during the training stage:
[Image: L1formula]
The batch size is unlikely to be 1 during training, right?

You are right. Our operators were designed for inference from the start, so we did not add batch processing. We also did not use the operators in the training stage.
Still, you may want to follow the feature we are developing in the following repo.

However, the operator with batch processing is already under active development in this repo. After the naive implementation is completed, it will be pushed to the current repo as a new branch.

OK, thanks~