Issues
No Field in torchtext.data
#35 opened by dtdo90 - 1 comment
Question about k × hk weight matrices
#30 opened by ramesaliyev - 1 comment
Why the use of log_softmax?
#36 opened by dsantiago - 0 comments
How to tokenize a testing phrase
#37 opened by edgarmg91 - 7 comments
Einsum to avoid transpose and reshape
#4 opened by azzever - 2 comments
Improve performance in deeper networks using your multi-head attention code
#29 opened by fatemehniknezhad - 1 comment
Module import error
#25 opened by Cybernetic1 - 1 comment
conda installation failed
#27 opened by bsun0802 - 0 comments
Understanding multiheaded attention.
#23 opened by anorak94 - 1 comment
Blog is down
#26 opened by JJGO - 8 comments
data.Field no longer supported in torchtext
#20 opened by jvdburgt - 5 comments
ModuleNotFoundError: No module named 'past'
#9 opened by jjong2ya - 1 comment
High compute time: what is a reasonable generator model to get somewhat good results to play with?
#22 opened by jplasser - 8 comments
Is narrow attention implemented correctly?
#13 opened by TheGrayFrost - 1 comment
Slide 50: v and q mixed up
#19 opened by ulf1 - 6 comments
token_embedding for non-text sequences
#18 opened by StolkArjen - 1 comment
Using the trained model
#14 opened by ShivanshuPurohit - 1 comment
How can we visualize the self-attention map?
#15 opened by amiltonwong - 4 comments
Accuracies on the examples
#12 opened by sf-wind - 1 comment
Issue with masking
#10 opened by mc-robinson - 2 comments
Comparing SelfAttention classes
#8 opened by sidneyaraujomelo - 5 comments
Masking done for the upper or lower triangle?
#6 opened by esvhd - 2 comments
Weight scaling should be for keys, not values?
#2 opened by esvhd