mit-han-lab/lite-transformer
[ICLR 2020] Lite Transformer with Long-Short Range Attention
Language: Python · License: NOASSERTION
Issues
Export model to ONNX
#24 opened by suyuzhang - 2
About the global and local features in Fig. 3
#35 opened by sanwei111 - 5
TransformerEncoderLayer
#36 opened by sanwei111 - 2
wmt16_en_de dataset link
#41 opened by topbookcc - 1
model pruning
#42 opened by AIikai - 1
Can't find the CNN branch
#43 opened by gwyanCN - 1
about kernel size
#37 opened by sanwei111 - 1
About data!
#40 opened by veryhigh - 1
about dynamicconv_cuda
#38 opened by sanwei111 - 2
about padding!!!
#39 opened by sanwei111 - 1
Model size confusion
#12 opened by zml24 - 1
In paragraph 4 of
#33 opened by sanwei111 - 1
In paragraph 4 of the paper
#34 opened by sanwei111 - 2
How to measure the FLOPs/MACs?
#31 opened by ranery - 8
Error while testing the model
#29 opened by tomshalini - 9
Error while evaluating model
#25 opened by kishorepv - 1
Quantization
#22 opened by zilunpeng - 3
Transformer model with different parameters
#23 opened by ChuanyangZheng - 1
What functions are implemented in the .cu code? The .cu code is too hard for me to understand. Thank you.
#21 opened by guotong1988 - 1
Could you please point out the core code? There is too much fairseq code. Thank you!
#19 opened by guotong1988 - 1
CNN/DM dataset preprocessing (BPE 30K)
#17 opened by Wangt-CN - 3
wmt14 en-fr data processing problem
#16 opened by macn3388 - 1
Data preprocessing
#13 opened by swgu98 - 5
Model Compression
#4 opened by kalyangvs - 3
Wrong key_padding_mask position?
#6 opened by godweiyang - 2
DeprecationWarning
#7 opened - 1
training config for wikitext103
#2 opened by pichuang1984 - 4
Applying factorized embedding
#3 opened by asharma20 - 1
Summarization checkpoint release?
#1 opened by astariul
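
Issue #31 asks how to measure FLOPs/MACs. The repository's own measurement script is not shown here; the snippet below is a minimal, hypothetical sketch that counts multiply-accumulate operations with PyTorch forward hooks, covering only nn.Linear and nn.Conv1d and using a toy model for illustration.

```python
# Minimal sketch (not the repository's official measurement script) of counting
# MACs with PyTorch forward hooks. Only nn.Linear and nn.Conv1d are covered, and
# the toy model at the bottom is hypothetical.
import torch
import torch.nn as nn


def count_macs(model: nn.Module, *inputs) -> int:
    """Run one forward pass and sum multiply-accumulates of Linear/Conv1d layers."""
    macs = 0
    handles = []

    def linear_hook(module, inp, out):
        nonlocal macs
        # Each output element needs in_features multiply-accumulates.
        macs += out.numel() * module.in_features

    def conv1d_hook(module, inp, out):
        nonlocal macs
        # Each output element needs (in_channels / groups) * kernel_size MACs.
        kernel_ops = (module.in_channels // module.groups) * module.kernel_size[0]
        macs += out.numel() * kernel_ops

    for m in model.modules():
        if isinstance(m, nn.Linear):
            handles.append(m.register_forward_hook(linear_hook))
        elif isinstance(m, nn.Conv1d):
            handles.append(m.register_forward_hook(conv1d_hook))

    with torch.no_grad():
        model(*inputs)
    for h in handles:
        h.remove()
    return macs


if __name__ == "__main__":
    # Hypothetical feed-forward block, roughly the shape of a Transformer FFN.
    toy = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))
    x = torch.randn(10, 512)  # 10 token embeddings of width 512
    print(f"MACs per forward pass: {count_macs(toy, x):,}")
```

FLOPs are commonly reported as roughly 2 × MACs; general-purpose profilers such as thop or fvcore cover a wider range of operators than this sketch.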