
Magnitude of Vectors on chart not taking into account number of vectors?


Hi - as far as I can tell from the code, there needs to be a division at the end of this snippet to account for the number of vectors used for the embedding, so that it gives the average magnitude across all vectors:

def get_vector_data_magnitude(data: dict[int, dict[int, Tensor]], step: int) -> float:
    value = 0
    for n in data[step]:
        value += pow(n, 2)
    value = math.sqrt(value)  # <------ needs a divisor here?
    return value

As it stands, this just squares every data point, sums the squares, and takes the square root of the total.

So something like this?

def get_vector_data_magnitude(data: dict[int, dict[int, Tensor]], step: int) -> float:
    value = 0
    for n in data[step]:
        value += pow(n, 2)
    vectors_per_token = len(data[step]) // DIMS_PER_VECTOR  # e.g. 1, 3, 10, etc.
    value = math.sqrt(value) / vectors_per_token
    return value

Yes, I believe so. I was trying to write this change but didn't realise you had to explicitly define the vectors per token again. I'm happy to test this tonight and let you know.


To update: I have tested your code change on a 3-vector TI, and all looks great. The new average vector magnitude (top image) reports as 4.5.. and not 3x that (13.7..), and the average vector strength is (rightly) unaffected.
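
For anyone who wants to check the arithmetic outside the project, here is a minimal standalone sketch. It assumes the step's data is a flat list of floats and uses a made-up `DIMS_PER_VECTOR = 4`; the helper names are hypothetical and not taken from the repository. It compares the original calculation, the divide-by-vector-count change discussed above, and, purely for comparison, a mean of per-vector norms.

```python
import math

DIMS_PER_VECTOR = 4  # assumed small value, for illustration only


def magnitude_original(values: list[float]) -> float:
    # Square root of the sum of squares over every component of every vector.
    return math.sqrt(sum(v * v for v in values))


def magnitude_divided(values: list[float]) -> float:
    # The change discussed above: divide the result by the number of vectors.
    vectors_per_token = len(values) // DIMS_PER_VECTOR
    return magnitude_original(values) / vectors_per_token


def mean_per_vector_magnitude(values: list[float]) -> float:
    # For comparison only: take each vector's norm separately, then average.
    chunks = [
        values[i:i + DIMS_PER_VECTOR]
        for i in range(0, len(values), DIMS_PER_VECTOR)
    ]
    norms = [math.sqrt(sum(v * v for v in chunk)) for chunk in chunks]
    return sum(norms) / len(norms)


# Three identical 4-dimensional vectors laid out back to back.
flat = [1.0, 1.0, 1.0, 1.0] * 3

print(magnitude_original(flat))
print(magnitude_divided(flat))
print(mean_per_vector_magnitude(flat))
```

The divided version is exactly the original value over the vector count, which matches the 3x difference reported in the test above. The per-vector mean can give a different number on the same input, so it may be worth confirming which of the two the chart is meant to show.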