mit-han-lab/lite-transformer
[ICLR 2020] Lite Transformer with Long-Short Range Attention
Language: Python · License: NOASSERTION
Issues
Export model to ONNX
#24 opened by suyuzhang - 2
About the global and local features in Fig. 3
#35 opened by sanwei111 - 5
TransformerEncoderLayer
#36 opened by sanwei111 - 2
wmt16_en_de dataset link
#41 opened by topbookcc - 1
model pruning
#42 opened by AIikai - 1
Can't find the CNN branch
#43 opened by gwyanCN - 1
about kernel size
#37 opened by sanwei111 - 1
About data!
#40 opened by veryhigh - 1
about dynamicconv_cuda
#38 opened by sanwei111 - 2
about padding!!!
#39 opened by sanwei111 - 1
Model size confusion
#12 opened by zml24 - 1
In paragraph 4 of
#33 opened by sanwei111 - 1
In paragraph 4 of the paper
#34 opened by sanwei111 - 2
How to measure the FLOPs/MACs?
#31 opened by ranery - 8
Error while testing the model
#29 opened by tomshalini - 9
Error while evaluating model
#25 opened by kishorepv - 1
Quantization
#22 opened by zilunpeng - 3
Transformer model with different parameters
#23 opened by ChuanyangZheng - 1
What functions are implemented in the .cu code? The .cu code is too hard for me to understand. Thank you.
#21 opened by guotong1988 - 1
Could you please point out the core code? There is too much fairseq code. Thank you!
#19 opened by guotong1988 - 1
CNN/DM dataset preprocessing (BPE 30K)
#17 opened by Wangt-CN - 3
wmt14 en-fr data processing problem
#16 opened by macn3388 - 1
Data preprocessing
#13 opened by swgu98 - 5
Model Compression
#4 opened by kalyangvs - 3
Wrong key_padding_mask position?
#6 opened by godweiyang - 2
DeprecationWarning
#7 opened - 1
training config for wikitext103
#2 opened by pichuang1984 - 4
Applying factorized embedding
#3 opened by asharma20 - 1
Summarization checkpoint release?
#1 opened by astariul
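
Issue #31 asks how to measure FLOPs/MACs. The repository's own measurement script is not shown here; the snippet below is a minimal, hypothetical sketch that counts multiply-accumulate operations with PyTorch forward hooks, covering only nn.Linear and nn.Conv1d and using a toy model for illustration.

```python
# Minimal sketch (not the repository's official measurement script) of counting
# MACs with PyTorch forward hooks. Only nn.Linear and nn.Conv1d are covered, and
# the toy model at the bottom is hypothetical.
import torch
import torch.nn as nn


def count_macs(model: nn.Module, *inputs) -> int:
    """Run one forward pass and sum multiply-accumulates of Linear/Conv1d layers."""
    macs = 0
    handles = []

    def linear_hook(module, inp, out):
        nonlocal macs
        # Each output element needs in_features multiply-accumulates.
        macs += out.numel() * module.in_features

    def conv1d_hook(module, inp, out):
        nonlocal macs
        # Each output element needs (in_channels / groups) * kernel_size MACs.
        kernel_ops = (module.in_channels // module.groups) * module.kernel_size[0]
        macs += out.numel() * kernel_ops

    for m in model.modules():
        if isinstance(m, nn.Linear):
            handles.append(m.register_forward_hook(linear_hook))
        elif isinstance(m, nn.Conv1d):
            handles.append(m.register_forward_hook(conv1d_hook))

    with torch.no_grad():
        model(*inputs)
    for h in handles:
        h.remove()
    return macs


if __name__ == "__main__":
    # Hypothetical feed-forward block, roughly the shape of a Transformer FFN.
    toy = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))
    x = torch.randn(10, 512)  # 10 token embeddings of width 512
    print(f"MACs per forward pass: {count_macs(toy, x):,}")
```

FLOPs are commonly reported as roughly 2 × MACs; general-purpose profilers such as thop or fvcore cover a wider range of operators than this sketch.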