openai/automated-interpretability

Question about activation calculation

Daftstone opened this issue · 3 comments

I would like to know how neuron activations are calculated and how they are mapped to each input token. Alternatively, if you could point me to related work on calculating neuron activations, I would be very grateful.

Yes, I have the same question regarding the calculation of token-level activations.
It is not clear in either the paper or the code.
If anyone could give some hints, I would also be very grateful.

Dear authors,

I found that this section provides the definition of neuron-token connection weights. First, I want to confirm whether the token-level neuron activation is extracted based on this section. I am confused because this quantity does not seem to take context information into account. Specifically, according to the expression `h{l}.mlp.c_proj.w[:, n, :] @ diag(ln_f.g) @ wte[t, :]`, the output weight of a neuron (l, n) with respect to a token t appears to be independent of the other tokens in the input.

I would greatly appreciate it if someone could address my confusion and provide clarification on this matter.

Best,
Xuansheng
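
For reference, the quoted expression can be evaluated directly from the model weights. The sketch below is a minimal, weight-only example assuming Hugging Face GPT-2 parameter names (`transformer.h[l].mlp.c_proj.weight`, `transformer.ln_f.weight`, `transformer.wte.weight`); the layer/neuron/token choices are arbitrary and only illustrate the indexing, and the mapping to the repo's own notation (`h{l}.mlp.c_proj.w`, `ln_f.g`, `wte`) is an assumption.

```python
# Weight-based neuron-to-token connection, per the expression quoted above.
# Assumes Hugging Face GPT-2 naming; layer/neuron/token below are arbitrary examples.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

layer, neuron = 5, 131                   # example neuron (l, n)
token_id = tokenizer.encode(" cat")[0]   # example token t

with torch.no_grad():
    # c_proj.weight has shape [d_mlp, d_model] in HF's Conv1D, so row `neuron`
    # is that neuron's output direction in the residual stream.
    w_out = model.transformer.h[layer].mlp.c_proj.weight[neuron]  # [d_model]
    ln_g = model.transformer.ln_f.weight                          # [d_model]
    wte = model.transformer.wte.weight                            # [vocab, d_model]

    # Equivalent to w_out @ diag(ln_g) @ wte[t, :]. Note that only weights are
    # involved, so the score for token t is the same regardless of context.
    score = (w_out * ln_g) @ wte[token_id]

print(score.item())
```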

yes, that's right - it doesn't take context information into account. it would probably be better to use something activation-based instead of weight-based
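
For the activation-based view, one way to get a context-dependent, per-token value for a neuron is to run the model on some text and record the MLP hidden activations with a forward hook. The sketch below is a minimal example assuming Hugging Face's GPT-2 implementation, where the input to `mlp.c_proj` is the post-GELU activation of the MLP hidden layer; whether this matches exactly how the released activation datasets were produced would need to be checked against the repo.

```python
# Record per-token MLP neuron activations with a forward hook (context-dependent),
# assuming Hugging Face GPT-2 module names; layer/neuron are arbitrary examples.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

layer, neuron = 5, 131
captured = {}

def hook(module, inputs, output):
    # inputs[0] is the post-GELU MLP activation, shape [batch, seq, d_mlp];
    # c_proj then projects it back to d_model.
    captured["acts"] = inputs[0].detach()

# Hooking c_proj's input exposes every MLP neuron's activation at every position.
handle = model.transformer.h[layer].mlp.c_proj.register_forward_hook(hook)

text = "The quick brown fox"
enc = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    model(**enc)
handle.remove()

tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
acts = captured["acts"][0, :, neuron]   # one activation per input token
for tok, a in zip(tokens, acts.tolist()):
    print(f"{tok!r}: {a:.4f}")
```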