Simulator scorer not in a working state
Closed this issue · 4 comments
With all_at_once=False and our default interpreter model (Llama 3.1), the simulator fails to produce any score files and logs a variety of errors:
ERROR:delphi.scorers.simulator.oai_autointerp.explanations.simulator:activation value incorrect type: None
ERROR:delphi.scorers.simulator.oai_autointerp.explanations.simulator:activation value out of range: 10.14
ERROR:delphi.scorers.simulator.oai_autointerp.explanations.simulator:activation value type error: float() argument must be a string or a real number, not 'NoneType'
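All three messages come from the activations the simulator parses back out of the interpreter model's response: they apparently arrive as None or slightly outside the expected range and get fed straight into float()/the scorer. A minimal sketch of the kind of guard that would avoid the crashes, assuming the simulator expects activations on the 0-10 scale of the original OpenAI autointerp format (coerce_predicted_activation is a hypothetical helper, not delphi's actual code):

```python
MIN_ACT, MAX_ACT = 0.0, 10.0  # assumed simulator scale, suggested by the "out of range: 10.14" error

def coerce_predicted_activation(raw, default: float = 0.0) -> float:
    """Hypothetical guard: turn a missing or malformed predicted activation
    into a clamped float instead of letting float(None) blow up inside the scorer."""
    if raw is None:
        return default
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return default
    # Clamp slightly out-of-range values such as 10.14 rather than discarding them.
    return max(MIN_ACT, min(MAX_ACT, value))
```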
With a more powerful interpreter model (Gemini 2.0 Flash), the simulator runs but consistently produces all-zero predicted activations:
"predicted_activations":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"ev_correlation_score":"nan","rsquared_score":0,"absolute_dev_explained_score":0
With all_at_once=True, the simulator crashes with
terminate called after throwing an instance of 'c10::DistBackendError'
what(): [PG ID 2 PG GUID 3 Rank 0] Process group watchdog thread terminated with exception: [Rank 0] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=28, OpType=BROADCAST, NumelIn=7196, NumelOut=7196, Timeout(ms)=600000) ran for 600016 milliseconds before timing out.
zsh: IOT instruction CUDA_VISIBLE_DEVICES="5,6" python -m delphi HuggingFaceTB/SmolLM2-135M
and often leaves zombie processes taking up VRAM.
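The watchdog timeout is what you typically get when one rank dies or stalls while another is still waiting in a collective (the BROADCAST above), and the abort then leaves workers behind holding VRAM. Not a fix for the underlying hang, but a minimal sketch of the usual hygiene, assuming the per-rank worker code is under our control (run_rank_safely is a hypothetical wrapper, not anything in delphi):

```python
import torch.distributed as dist

def run_rank_safely(rank_fn, *args, **kwargs):
    """Hypothetical wrapper: make sure a crashing rank still tears down its
    process group, so a failure is less likely to leave workers lingering
    with VRAM allocated after the NCCL watchdog fires."""
    try:
        return rank_fn(*args, **kwargs)
    finally:
        if dist.is_available() and dist.is_initialized():
            dist.destroy_process_group()
```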
Command to reproduce:
CUDA_VISIBLE_DEVICES="0,1" python -m delphi HuggingFaceTB/SmolLM2-135M <path_to_sae> --dataset_repo EleutherAI/SmolLM2-135M-20B --filter_bos --max_latents 100 --hookpoints layers.0.mlp layers.9.mlp layers.18.mlp layers.27.mlp --name test --scorers simulation --explainer default --explainer_model_max_len 7320
I mostly agree that the simulator needs an upgrade. I've fixed some problems that were clearly wrong, and it should now work, albeit slowly (on my end, using 2 GPUs, it takes about 2.30 minutes per feature). It will still have some errors, but that's just Llama being dumb.
Using all_at_once is what we want, but vLLM does not support caching of prompt_logprobs, so there isn't much we can do there.
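For context on why that matters: in vLLM you request logprobs over the prompt tokens via SamplingParams(prompt_logprobs=...), and since those are not cached, an all-at-once simulation pays for a full pass over the long, mostly shared prompt on every request. Rough illustration only; the model id is a stand-in for the interpreter model and this is not delphi's actual call:

```python
from vllm import LLM, SamplingParams

# Stand-in for the default interpreter model mentioned above.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

# prompt_logprobs asks for per-token logprobs over the prompt itself; with no
# caching, every simulated sequence re-processes the shared explanation/few-shot prefix.
params = SamplingParams(max_tokens=1, prompt_logprobs=5)
outputs = llm.generate(["<explanation + few-shot examples + sequence to simulate>"], params)
print(outputs[0].prompt_logprobs[:3])  # first entry is None, then dicts of top logprobs
```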
Right, okay. If we're not planning to support all_at_once=True, then let's remove it.
Let me think about this.