aladdinpersson/Machine-Learning-Collection

Question in self-attention from 'transformer from scratch'

nemo0526 opened this issue · 0 comments

Hello! Your video is very nice, but I still have some trouble when training.
I hit `RuntimeError: shape '[64, 1024, 8, 128]' is invalid for input of size 65536` at the step that splits the embedding into self.heads different pieces. My embed_dim is set to 1024, which is the same as value_len, key_len, and query_len. Or does that mean I have to set value_len to 1? Do you know how this happens? Thanks a lot.
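For context, here is a minimal sketch of the shape bookkeeping in that split step (assuming the SelfAttention forward from the video; the tensor names and sizes below are illustrative, not the actual training setup):

```python
import torch

# Illustrative sizes (not the real training config)
N, seq_len, embed_size, heads = 64, 10, 1024, 8
head_dim = embed_size // heads  # 128

# values must already be embedded, i.e. 3-D: (N, seq_len, embed_size)
values = torch.randn(N, seq_len, embed_size)
value_len = values.shape[1]  # the sequence length, not embed_size

# This works because N * value_len * heads * head_dim == values.numel()
values = values.reshape(N, value_len, heads, head_dim)
print(values.shape)  # torch.Size([64, 10, 8, 128])

# The reported error arises when the tensor being reshaped has only
# 64 * 1024 = 65536 elements (e.g. a 2-D tensor of raw token indices with
# shape (N, seq_len), or an (N, embed_size) tensor), which cannot be viewed
# as [64, 1024, 8, 128]: that view needs 64 * 1024 * 8 * 128 elements.
```

So the mismatch seems to be between the number of elements in the tensor that reaches the reshape and the (N, value_len, heads, head_dim) target, rather than something that setting value_len to 1 would fix.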