Save cached latents as caching progresses
This is a complicated change. At the moment, we are holding all feature activations in RAM while caching, which becomes problematic when dealing with millions of tokens.
I think the way we want to do this is to use something like huggingface datasets.
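Something along these lines, perhaps (just a rough sketch using the `datasets` library; the `tokens`/`activations` column names and the `latents/shard_*` paths are made up for illustration, not anything currently in the repo):

```python
from datasets import Dataset, concatenate_datasets, load_from_disk
import numpy as np

# Write one shard of cached activations to disk (hypothetical column names).
shard = Dataset.from_dict({
    "tokens": np.arange(8 * 64).reshape(8, 64).tolist(),
    "activations": np.zeros((8, 64, 16), dtype="float32").tolist(),
})
shard.save_to_disk("latents/shard_000")

# Later, merge all shards; datasets are memory-mapped Arrow files,
# so this does not require holding everything in RAM at once.
merged = concatenate_datasets([
    load_from_disk("latents/shard_000"),
    # load_from_disk("latents/shard_001"), ...
])
```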
@SrGonao I would love to work on this. Can you provide more details about it?
Currently, we do feature caching by keeping the activations in memory before saving them (https://github.com/EleutherAI/sae-auto-interp/blob/v0.2/sae_auto_interp/features/cache.py#L208-L242). We could instead save a shard every X tokens and merge the shards at the end. This would allow people to do longer runs where the feature activations don't all fit in memory; a rough sketch of the idea is below.
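For illustration only, a minimal sketch of that shard-and-merge idea (not the actual `FeatureCache` code; the class name, `flush_interval`, and on-disk file layout here are all hypothetical, and it assumes activations arrive as torch tensors):

```python
from pathlib import Path

import torch


class ShardedActivationCache:
    """Hypothetical sketch: buffer activations in memory and flush a shard
    to disk every `flush_interval` tokens instead of holding everything in RAM."""

    def __init__(self, save_dir: str, flush_interval: int = 1_000_000):
        self.save_dir = Path(save_dir)
        self.save_dir.mkdir(parents=True, exist_ok=True)
        self.flush_interval = flush_interval
        self.buffer: list[torch.Tensor] = []
        self.tokens_in_buffer = 0
        self.shard_idx = 0

    def add(self, activations: torch.Tensor) -> None:
        # activations: (n_tokens, n_features) for one batch
        self.buffer.append(activations.cpu())
        self.tokens_in_buffer += activations.shape[0]
        if self.tokens_in_buffer >= self.flush_interval:
            self._flush()

    def _flush(self) -> None:
        if not self.buffer:
            return
        shard = torch.cat(self.buffer, dim=0)
        torch.save(shard, self.save_dir / f"shard_{self.shard_idx:05d}.pt")
        self.shard_idx += 1
        self.buffer.clear()
        self.tokens_in_buffer = 0

    def finalize(self) -> torch.Tensor:
        # Flush whatever is left, then merge the shards; if even the merged
        # tensor is too big for RAM, the shards could be kept as-is on disk.
        self._flush()
        shards = sorted(self.save_dir.glob("shard_*.pt"))
        return torch.cat([torch.load(p) for p in shards], dim=0)
```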
Okay, great. I will look into that. How can I test this approach?