jadore801120/attention-is-all-you-need-pytorch
A PyTorch implementation of the Transformer model in "Attention is All You Need".
Python · MIT License
Issues
Problem running `python preprocess.py -lang_src de -lang_trg en -save_data multi30k_de_en.pkl -share_vocab`
#222 opened by yyydfff - 4
Error when installing from requirements.txt
#215 opened by kmphuang - 1
Problem when running `python preprocess.py -lang_src de -lang_trg en -share_vocab -save_data m30k_deen_shr.pkl`
#218 opened by dapaolufuduizhang - 3
Error when executing the 'python preprocess.py -lang_src de -lang_trg en -share_vocab -save_data m30k_deen_shr.pkl' command
#216 opened by evilczy - 6
preprocess error
#202 opened by zhoup150344 - 0
About target mask
#217 opened by KimRass - 4
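A note on #217: the target-side mask in a Transformer decoder is usually the combination of a padding mask and a "subsequent" (causal) mask, so a position can attend neither to padding nor to later tokens. A minimal sketch of that idea; the function names and mask shapes are illustrative, not necessarily the ones used in this repo:

```python
import torch

def get_pad_mask(seq: torch.Tensor, pad_idx: int) -> torch.Tensor:
    """True where the token is real (not padding). Shape: (batch, 1, len)."""
    return (seq != pad_idx).unsqueeze(-2)

def get_subsequent_mask(seq: torch.Tensor) -> torch.Tensor:
    """Causal mask: position i may only attend to positions <= i. Shape: (1, len, len)."""
    length = seq.size(1)
    return (1 - torch.triu(torch.ones((1, length, length), device=seq.device), diagonal=1)).bool()

# Combined target mask broadcasts to (batch, len, len).
trg_seq = torch.tensor([[5, 7, 9, 0, 0]])   # 0 = padding index in this toy example
trg_mask = get_pad_mask(trg_seq, pad_idx=0) & get_subsequent_mask(trg_seq)
```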
ValueError: Cell is empty
#204 opened by Kznnd - 2
TranslationDataset is now deprecated in torchtext
#194 opened by imkzh - 4
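On #194: torchtext 0.9 moved the old Field/TranslationDataset API into a torchtext.legacy namespace, and later releases (0.12+) removed it entirely, which is why a fresh torchtext install breaks the data pipeline. Pinning the torchtext version from requirements.txt or switching the import are the usual workarounds; a hedged sketch of the import change:

```python
# torchtext <= 0.8:
#   from torchtext.datasets import TranslationDataset
# torchtext 0.9 - 0.11 (the legacy namespace was removed in 0.12):
from torchtext.legacy.datasets import TranslationDataset
```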
Dataset download error
#197 opened by qimg412 - 0
Possible mistakes in d_k, d_v of MultiheadAttention
#211 opened by SARIHUST - 3
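On #211 (see also #162 and #153 below): in the paper, queries and keys are projected to d_k dimensions and values to d_v, with d_k = d_v = d_model / n_head in the base model. The two sizes do not have to be equal in general, but the output projection must map n_head * d_v back to d_model. A shape sketch under those base-model settings (the names are illustrative):

```python
import torch
import torch.nn as nn

n_head, d_model, d_k, d_v = 8, 512, 64, 64            # base-model sizes from the paper

w_qs = nn.Linear(d_model, n_head * d_k, bias=False)   # queries -> d_k per head
w_ks = nn.Linear(d_model, n_head * d_k, bias=False)   # keys    -> d_k per head
w_vs = nn.Linear(d_model, n_head * d_v, bias=False)   # values  -> d_v per head
fc   = nn.Linear(n_head * d_v, d_model, bias=False)   # concatenated heads -> d_model

x = torch.randn(2, 10, d_model)                        # (batch, seq_len, d_model)
q = w_qs(x).view(2, 10, n_head, d_k)                   # (batch, seq_len, n_head, d_k)
v = w_vs(x).view(2, 10, n_head, d_v)                   # (batch, seq_len, n_head, d_v)
```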
Performance Confusion
#209 opened by Zarca - 0
The results of the translate function.
#176 opened by zshyang - 5
Attention value is strange
#182 opened by YPatrickW - 1
CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
#193 opened by lisp2047 - 1
some problem
#185 opened by zhLia - 1
My question
#195 opened by Messiz - 0
OverflowError
#198 opened by Daming-TF - 2
Can't find model 'en'
#169 opened by manhph2211 - 2
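On #169 (and several of the preprocessing errors above): 'en' and 'de' were spaCy 2.x shortcut links; spaCy 3.x removed shortcuts, so the tokenizer models must be downloaded and loaded under their full package names. A hedged sketch, assuming the small English/German pipelines:

```python
# spaCy 2.x:  python -m spacy download en               -> spacy.load('en')
# spaCy 3.x:  python -m spacy download en_core_web_sm   -> spacy.load('en_core_web_sm')
#             python -m spacy download de_core_news_sm  -> spacy.load('de_core_news_sm')
import spacy

nlp_en = spacy.load('en_core_web_sm')
nlp_de = spacy.load('de_core_news_sm')
```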
Confusion regarding embedding space
#186 opened by IamAdiSri - 0
Beam search torch.log
#191 opened by actforjason - 2
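On #191: beam search accumulates scores in log space, and F.log_softmax on the logits is numerically safer than taking torch.log of a softmax output, which can underflow to 0 and produce -inf scores. A small illustration of the difference:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[100.0, 0.0, -100.0]])
unsafe = torch.log(F.softmax(logits, dim=-1))  # last entry underflows -> -inf
safe = F.log_softmax(logits, dim=-1)           # finite log-probabilities
```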
test error
#181 opened by LonelyPlanetIoT - 0
Could the variable names be more readable?
#183 opened by Xelawk - 1
learning rate update before optimizer.step()
#188 opened by AlbertiPot - 0
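On #188: the paper's schedule sets lrate = d_model^-0.5 * min(step^-0.5, step * n_warmup^-1.5), and recent PyTorch versions warn when the learning rate is updated before optimizer.step() within an iteration. A hedged sketch of the warmup multiplier and the usual call order:

```python
def noam_lr(step: int, d_model: int = 512, n_warmup: int = 4000) -> float:
    """Learning-rate factor from 'Attention Is All You Need': linear warmup, then inverse-sqrt decay."""
    step = max(step, 1)  # guard against step == 0
    return (d_model ** -0.5) * min(step ** -0.5, step * n_warmup ** -1.5)

# Usual order inside the training loop (matching the PyTorch warning discussed in the issue):
#   loss.backward()
#   optimizer.step()
#   update the learning rate for the next step
#   optimizer.zero_grad()
```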
Resuming Training
#158 opened by kaiyon07 - 0
MultiHeadAttention input shape
#179 opened by Superklez - 2
Incorrect implementation?
#177 opened by weilueluo - 1
Input sequence dimensions of MultiHeadAttention
#178 opened by Superklez - 1
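On #178 (and #179 above): multi-head attention layers in this kind of implementation take q, k, v as (batch, seq_len, d_model) tensors, where q may come from the decoder and k, v from the encoder, so the query and key lengths can differ; the output keeps the query length. A small shape sketch (the exact mask shape may differ from the repo):

```python
import torch

batch, len_q, len_k, d_model = 2, 7, 9, 512

q = torch.randn(batch, len_q, d_model)                 # e.g. decoder states
k = torch.randn(batch, len_k, d_model)                 # e.g. encoder outputs
v = torch.randn(batch, len_k, d_model)                 # values share the key length
mask = torch.ones(batch, 1, len_k, dtype=torch.bool)   # broadcast over len_q
# A multi-head attention layer with these inputs returns (batch, len_q, d_model).
```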
If we change the code in Model.py like this, convergence is faster.
#175 opened by Sry2016 - 1
Why does the previous version train faster?
#157 opened by dwtenis - 0
How to deal with the UNK_TOKEN?
#173 opened by lhy2749 - 1
Why is the non-pad mask needed?
#166 opened by helloworld729 - 0
Training on a big dataset (8 GB)
#171 opened by JoeCoding - 1
raise ConnectionError(e, request=request)
#156 opened by KrisLee512 - 0
What is the meaning of trg_pad_idx in the label smoothing loss?
#165 opened by fakerhbj - 0
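On #165: trg_pad_idx is the index of the padding token in the target vocabulary; positions holding it are excluded from the loss so the model is not trained to predict padding. A minimal sketch of label-smoothed cross-entropy with that masking (the eps value and smoothing scheme are assumptions, not necessarily the repo's exact choices):

```python
import torch
import torch.nn.functional as F

def smoothed_loss(logits, gold, trg_pad_idx, eps=0.1):
    """logits: (n_tokens, n_class) raw scores; gold: (n_tokens,) target indices."""
    n_class = logits.size(1)
    one_hot = F.one_hot(gold, n_class).float()
    one_hot = one_hot * (1 - eps) + (1 - one_hot) * eps / (n_class - 1)
    log_prb = F.log_softmax(logits, dim=1)
    loss = -(one_hot * log_prb).sum(dim=1)   # per-token loss
    non_pad_mask = gold.ne(trg_pad_idx)      # drop padding positions
    return loss.masked_select(non_pad_mask).sum()
```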
Something wrong with the code
#164 opened by chenrxi - 1
What do n_head, d_model, d_k, d_v stand for?
#162 opened by seyeeet - 1
SyntaxError: invalid syntax
#163 opened by junzew - 0
Why is decoding needed during inference?
#160 opened by rajeevbaalwan - 0
Surprising PPL on WMT 17
#154 opened by luffycodes - 0
d_k not equal to d_v gives issues
#153 opened by luffycodes