rasbt/LLMs-from-scratch

Bug in Exercise 5.6?

Closed this issue · 3 comments

Bug description

First of all, congratulations on writing such an amazing book! I went from tutorial to tutorial until this book finally made me understand everything.

In Exercise 5.6, I'm trying to experiment with GPT-2 models of different sizes. It works fine with "gpt2-small (124M)", but the other models raise the error from the __init__() method of the MultiHeadAttention class: "d_out must be divisible by num_heads".

For instance, with the "gpt2-xl (1558M)" model, we have d_out=1600 and num_heads=48, and indeed 1600 is not divisible by 48.
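For reference, this is roughly where it fails (a minimal sketch based on the book's MultiHeadAttention class; everything except the failing check is omitted):

```python
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_in, d_out, context_length, dropout, num_heads, qkv_bias=False):
        super().__init__()
        # This assertion raises the error when d_out=1600 and num_heads=48:
        assert d_out % num_heads == 0, "d_out must be divisible by num_heads"
        self.head_dim = d_out // num_heads  # 1600 / 48 does not divide evenly
        # ... rest of the attention setup omitted ...
```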

Is there a bug somewhere?

What operating system are you using?

Windows

Where do you run your code?

Local (laptop, desktop)

rasbt commented

Hi there,

I just gave it a try, and it works for me. I wonder if you accidentally swapped the number of layers with the number of heads? That is, the number of heads should be 25, not 48, and 1600 / 25 = 64.
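For reference, here are the per-model settings (this matches the configuration dictionary in the chapter 5 code):

```python
# Per-model settings used when loading the pretrained GPT-2 weights
model_configs = {
    "gpt2-small (124M)":  {"emb_dim": 768,  "n_layers": 12, "n_heads": 12},
    "gpt2-medium (355M)": {"emb_dim": 1024, "n_layers": 24, "n_heads": 16},
    "gpt2-large (774M)":  {"emb_dim": 1280, "n_layers": 36, "n_heads": 20},
    "gpt2-xl (1558M)":    {"emb_dim": 1600, "n_layers": 48, "n_heads": 25},
}
```

Note that emb_dim / n_heads = 64 for every model, so the divisibility check passes as long as n_heads (rather than n_layers) is used as the number of attention heads.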

Issue author commented

You're right!!! I swapped the number of layers with the number of heads. I spent so many hours trying to adjust the MultiHeadAttention class to make it fit; of course, that was impossible.

Sorry to have bothered you with my mistake, and thank you for answering.

rasbt commented

No worries, and I am glad that we were able to resolve the problem!