ContextLab/quail

account for list length when computing temporal clustering (maybe other fingerprint dims)?

andrewheusser opened this issue · 7 comments

List length can bias our measure of temporal clustering (a point raised in Karl Healey's talk at CEMS 2017). For instance, short lists will tend to have higher clustering scores than long lists, because when recalling a short list the subject can only possibly recall nearby items.

To account for this, we could permute recall sequences and then measure where the 'real' clustering score falls with respect to this distribution.

Not clear to me whether this would also affect our other feature dimensions?

I think this will affect all of our clustering estimates. My guess is that we are particularly biased in our estimates of discrete categorical features (size, category, word length), although this would be interesting to explore. Perhaps the right way to handle this would be a wrapper around each clustering score (sketched in code after the list):

  • take the length of the observed recall sequence, n
  • select (with replacement) a random sequence of n words from the set of presented words, and compute the clustering score for that random sequence.
  • repeat this 1000 times (or 10,000 times?) to get a distribution of 1000 (or 10,000) clustering scores
  • the "corrected" clustering score is the proportion of clustering scores in that random distribution that were lower than the clustering score for the observed recall sequence.
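Something like this, maybe - a minimal sketch of that wrapper, where clustering_score(presented, sequence) is a placeholder standing in for any one of our per-feature clustering functions (not an actual quail function):

import numpy as np

def corrected_clustering_score(presented, recalled, clustering_score, n_perms=1000):
    # clustering score for the observed recall sequence
    observed = clustering_score(presented, recalled)
    n = len(recalled)

    # null distribution: scores for random length-n sequences sampled
    # (with replacement) from the presented words
    null_scores = [
        clustering_score(presented, list(np.random.choice(presented, size=n)))
        for _ in range(n_perms)
    ]

    # "corrected" score: proportion of null scores below the observed score
    return np.mean(np.array(null_scores) < observed)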

👍 that sounds great to me - seems worth doing before the memory fingerprint gets out into the wild as well.

I'm interested to see how different the answers are. If these permutation-based clustering functions are (a) quick, (b) relatively stable, and (c) systematically different from the raw clustering scores in a way that seems meaningful, then we should probably switch to using permuted clustering. We could potentially add a flag that toggles between permutation-based clustering and raw clustering values.
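For concreteness, such a flag might look something like this (a hypothetical interface sketch - the permute and n_perms parameter names are illustrative, not a committed API):

import quail

# hypothetical sketch: toggle permutation correction from the analyze call
egg = quail.load_example_data()
raw = quail.analyze(egg, analysis='fingerprint')
corrected = quail.analyze(egg, analysis='fingerprint', permute=True, n_perms=100)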

Implemented the permutation-based clustering. It takes quite a bit longer in general. With 100 permutations per list the analysis takes 27.06 seconds vs. 2.69 seconds without permutations.

BEFORE CORRECTION:
[figure: fingerprint clustering scores before permutation correction]

AFTER CORRECTION:
[figure: fingerprint clustering scores after permutation correction]

I'll run the 1000-perms version and post the results below.

Ah - and here is the code for the permutation. It could probably be streamlined a bit:

import numpy as np

def bootstrap_fingerprint(p, r, f, distances, n_perms=100):

    # clustering scores for the observed recall sequence (one per feature)
    r_real = compute_feature_weights(p, r, f, distances)

    # clustering scores for shuffled versions of the recall sequence
    r_perms = []
    for _ in range(n_perms):
        r_perm = list(np.random.permutation(r))
        r_perms.append(compute_feature_weights(p, r_perm, f, distances))

    # for each feature, the corrected score is the proportion of shuffled
    # scores that fall below the observed score
    r_perms_bool = np.array(r_perms) < np.array(r_real)
    return np.sum(r_perms_bool, axis=0) / n_perms
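One note on interpreting the output: each corrected score is effectively a percentile relative to the shuffled distribution, so values near 0.5 indicate chance-level clustering and values near 1 indicate stronger-than-chance clustering along that feature dimension.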

1000 perms per list took 228 seconds (3.8 minutes) - consistent with roughly linear scaling in the number of permutations.

[figure: fingerprint clustering scores after correction, 1000 permutations]

@jeremymanning this is implemented in #58 - it defaults to running the bootstrapping with 100 shuffles per recall list. When you get a chance, please close this issue if you're satisfied, or let me know what changes you'd like me to make.