jadore801120/attention-is-all-you-need-pytorch
A PyTorch implementation of the Transformer model in "Attention is All You Need".
Python · MIT License
Issues
Problem running `python preprocess.py -lang_src de -lang_trg en -save_data multi30k_de_en.pkl -share_vocab`
#222 opened by yyydfff - 4
Error when installing from requirements.txt
#215 opened by kmphuang - 1
Problem when running `python preprocess.py -lang_src de -lang_trg en -share_vocab -save_data m30k_deen_shr.pkl`
#218 opened by dapaolufuduizhang - 3
Error when executing the 'python preprocess.py -lang_src de -lang_trg en -share_vocab -save_data m30k_deen_shr.pkl' command
#216 opened by evilczy - 6
preprocess error
#202 opened by zhoup150344 - 0
About target mask
#217 opened by KimRass - 4
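A note on #217: the target-side mask in a Transformer decoder is usually the combination of a padding mask and a "subsequent" (causal) mask, so a position can attend neither to padding nor to later tokens. A minimal sketch of that idea; the function names and mask shapes are illustrative, not necessarily the ones used in this repo:

```python
import torch

def get_pad_mask(seq: torch.Tensor, pad_idx: int) -> torch.Tensor:
    """True where the token is real (not padding). Shape: (batch, 1, len)."""
    return (seq != pad_idx).unsqueeze(-2)

def get_subsequent_mask(seq: torch.Tensor) -> torch.Tensor:
    """Causal mask: position i may only attend to positions <= i. Shape: (1, len, len)."""
    length = seq.size(1)
    return (1 - torch.triu(torch.ones((1, length, length), device=seq.device), diagonal=1)).bool()

# Combined target mask broadcasts to (batch, len, len).
trg_seq = torch.tensor([[5, 7, 9, 0, 0]])   # 0 = padding index in this toy example
trg_mask = get_pad_mask(trg_seq, pad_idx=0) & get_subsequent_mask(trg_seq)
```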
ValueError: Cell is empty
#204 opened by Kznnd - 2
TranslationDataset is now deprecated in torchtext
#194 opened by imkzh - 4
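On #194: torchtext 0.9 moved the old Field/TranslationDataset API into a torchtext.legacy namespace, and later releases (0.12+) removed it entirely, which is why a fresh torchtext install breaks the data pipeline. Pinning the torchtext version from requirements.txt or switching the import are the usual workarounds; a hedged sketch of the import change:

```python
# torchtext <= 0.8:
#   from torchtext.datasets import TranslationDataset
# torchtext 0.9 - 0.11 (the legacy namespace was removed in 0.12):
from torchtext.legacy.datasets import TranslationDataset
```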
Dataset download error
#197 opened by qimg412 - 0
Possible mistakes in d_k, d_v of MultiheadAttention
#211 opened by SARIHUST - 3
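On #211 (see also #162 and #153 below): in the paper, queries and keys are projected to d_k dimensions and values to d_v, with d_k = d_v = d_model / n_head in the base model. The two sizes do not have to be equal in general, but the output projection must map n_head * d_v back to d_model. A shape sketch under those base-model settings (the names are illustrative):

```python
import torch
import torch.nn as nn

n_head, d_model, d_k, d_v = 8, 512, 64, 64            # base-model sizes from the paper

w_qs = nn.Linear(d_model, n_head * d_k, bias=False)   # queries -> d_k per head
w_ks = nn.Linear(d_model, n_head * d_k, bias=False)   # keys    -> d_k per head
w_vs = nn.Linear(d_model, n_head * d_v, bias=False)   # values  -> d_v per head
fc   = nn.Linear(n_head * d_v, d_model, bias=False)   # concatenated heads -> d_model

x = torch.randn(2, 10, d_model)                        # (batch, seq_len, d_model)
q = w_qs(x).view(2, 10, n_head, d_k)                   # (batch, seq_len, n_head, d_k)
v = w_vs(x).view(2, 10, n_head, d_v)                   # (batch, seq_len, n_head, d_v)
```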
Performance Confusion
#209 opened by Zarca - 0
The results of the translate function.
#176 opened by zshyang - 5
Attention value is strange
#182 opened by YPatrickW - 1
CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
#193 opened by lisp2047 - 1
some problem
#185 opened by zhLia - 1
My question
#195 opened by Messiz - 0
OverflowError
#198 opened by Daming-TF - 2
Can't find model 'en'
#169 opened by manhph2211 - 2
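On #169 (and several of the preprocessing errors above): 'en' and 'de' were spaCy 2.x shortcut links; spaCy 3.x removed shortcuts, so the tokenizer models must be downloaded and loaded under their full package names. A hedged sketch, assuming the small English/German pipelines:

```python
# spaCy 2.x:  python -m spacy download en               -> spacy.load('en')
# spaCy 3.x:  python -m spacy download en_core_web_sm   -> spacy.load('en_core_web_sm')
#             python -m spacy download de_core_news_sm  -> spacy.load('de_core_news_sm')
import spacy

nlp_en = spacy.load('en_core_web_sm')
nlp_de = spacy.load('de_core_news_sm')
```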
Confusion regarding embedding space
#186 opened by IamAdiSri - 0
Beam search torch.log
#191 opened by actforjason - 2
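On #191: beam search accumulates scores in log space, and F.log_softmax on the logits is numerically safer than taking torch.log of a softmax output, which can underflow to 0 and produce -inf scores. A small illustration of the difference:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[100.0, 0.0, -100.0]])
unsafe = torch.log(F.softmax(logits, dim=-1))  # last entry underflows -> -inf
safe = F.log_softmax(logits, dim=-1)           # finite log-probabilities
```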
test error
#181 opened by LonelyPlanetIoT - 0
Could the variable names be more readable?
#183 opened by Xelawk - 1
learning rate update before optimizer.step()
#188 opened by AlbertiPot - 0
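On #188: the paper's schedule sets lrate = d_model^-0.5 * min(step^-0.5, step * n_warmup^-1.5), and recent PyTorch versions warn when the learning rate is updated before optimizer.step() within an iteration. A hedged sketch of the warmup multiplier and the usual call order:

```python
def noam_lr(step: int, d_model: int = 512, n_warmup: int = 4000) -> float:
    """Learning-rate factor from 'Attention Is All You Need': linear warmup, then inverse-sqrt decay."""
    step = max(step, 1)  # guard against step == 0
    return (d_model ** -0.5) * min(step ** -0.5, step * n_warmup ** -1.5)

# Usual order inside the training loop (matching the PyTorch warning discussed in the issue):
#   loss.backward()
#   optimizer.step()
#   update the learning rate for the next step
#   optimizer.zero_grad()
```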
Resuming Training
#158 opened by kaiyon07 - 0
MultiHeadAttention input shape
#179 opened by Superklez - 2
Incorrect implementation?
#177 opened by weilueluo - 1
Input sequence dimensions of MultiHeadAttention
#178 opened by Superklez - 1
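On #178 (and #179 above): multi-head attention layers in this kind of implementation take q, k, v as (batch, seq_len, d_model) tensors, where q may come from the decoder and k, v from the encoder, so the query and key lengths can differ; the output keeps the query length. A small shape sketch (the exact mask shape may differ from the repo):

```python
import torch

batch, len_q, len_k, d_model = 2, 7, 9, 512

q = torch.randn(batch, len_q, d_model)                 # e.g. decoder states
k = torch.randn(batch, len_k, d_model)                 # e.g. encoder outputs
v = torch.randn(batch, len_k, d_model)                 # values share the key length
mask = torch.ones(batch, 1, len_k, dtype=torch.bool)   # broadcast over len_q
# A multi-head attention layer with these inputs returns (batch, len_q, d_model).
```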
If we change the code in Model.py like this, convergence is faster.
#175 opened by Sry2016 - 1
Why does the previous version train faster?
#157 opened by dwtenis - 0
How to deal with the UNK_TOKEN?
#173 opened by lhy2749 - 1
Why is the non-pad mask needed?
#166 opened by helloworld729 - 0
Training on a big dataset (8 GB)
#171 opened by JoeCoding - 1
raise ConnectionError(e, request=request)
#156 opened by KrisLee512 - 0
What is the meaning of trg_pad_idx in the label smoothing loss?
#165 opened by fakerhbj - 0
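On #165: trg_pad_idx is the index of the padding token in the target vocabulary; positions holding it are excluded from the loss so the model is not trained to predict padding. A minimal sketch of label-smoothed cross-entropy with that masking (the eps value and smoothing scheme are assumptions, not necessarily the repo's exact choices):

```python
import torch
import torch.nn.functional as F

def smoothed_loss(logits, gold, trg_pad_idx, eps=0.1):
    """logits: (n_tokens, n_class) raw scores; gold: (n_tokens,) target indices."""
    n_class = logits.size(1)
    one_hot = F.one_hot(gold, n_class).float()
    one_hot = one_hot * (1 - eps) + (1 - one_hot) * eps / (n_class - 1)
    log_prb = F.log_softmax(logits, dim=1)
    loss = -(one_hot * log_prb).sum(dim=1)   # per-token loss
    non_pad_mask = gold.ne(trg_pad_idx)      # drop padding positions
    return loss.masked_select(non_pad_mask).sum()
```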
Something wrong with the code
#164 opened by chenrxi - 1
What do n_head, d_model, d_k, d_v stand for?
#162 opened by seyeeet - 1
SyntaxError: invalid syntax
#163 opened by junzew - 0
Why is decoding needed during inference?
#160 opened by rajeevbaalwan - 0
Surprising PPL on WMT 17
#154 opened by luffycodes - 0
d_k not equal to d_v gives issues
#153 opened by luffycodes