Error 4i evaluate.py

Question

Thapeachydude opened this issue a year ago · 7 comments

Hi,

I'm trying to run the simple 4i tutorial, but evaluate.py crashes halfway. Unfortunately, without more information on what each script does, it's difficult to troubleshoot this on my own.

I'm running python /pathto/cellot/scripts/evaluate.py --outdir /pathto/scripts/cellot_run/ --setting iid --where data_space

Traceback (most recent call last):
  File "/pathto/miniconda3/envs/cellot/lib/python3.9/site-packages/ml_collections/config_dict/config_dict.py", line 883, in __getitem__
    field = self._fields[key]
KeyError: 'data'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/pathto/miniconda3/envs/cellot/lib/python3.9/site-packages/ml_collections/config_dict/config_dict.py", line 807, in __getattr__
    return self[attribute]
  File "/pathto/miniconda3/envs/cellot/lib/python3.9/site-packages/ml_collections/config_dict/config_dict.py", line 889, in __getitem__
    raise KeyError(self._generate_did_you_mean_message(key, str(e)))
KeyError: "'data'"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/pathto/cellot/scripts/evaluate.py", line 183, in <module>
    app.run(main)
  File "/pathto/miniconda3/envs/cellot/lib/python3.9/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/pathto/miniconda3/envs/cellot/lib/python3.9/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/pathto/cellot/scripts/evaluate.py", line 173, in main
    evals = pd.DataFrame(
  File "/pathto/miniconda3/envs/cellot/lib/python3.9/site-packages/pandas/core/frame.py", line 563, in __init__
    data = list(data)
  File "/pathto/cellot/scripts/evaluate.py", line 62, in compute_evaluations
    for ncells, nfeatures, treated, imputed in iterator:
  File "/pathto/cellot/scripts/evaluate.py", line 118, in iterate_feature_slices
    _, treateddf, imputed = load_conditions(
  File "/pathto/cellot/cellot/utils/evaluate.py", line 271, in load_conditions
    embedding = read_embedding_context(
  File "/pathto/cellot/cellot/utils/evaluate.py", line 159, in read_embedding_context
    if "ae_emb" in config.data:
  File "/pathto/miniconda3/envs/cellot/lib/python3.9/site-packages/ml_collections/config_dict/config_dict.py", line 809, in __getattr__
    raise AttributeError(e)
AttributeError: "'data'"

Training finished without errors and the output directory /pathto/scripts/cellot_run/ looks like this:

config.yaml

cache:
last.pt  model.pt  scalars  status

Any insights would be appreciated,
Best,
M

Answer 1 · 2024-02-28T14:54:51.000Z

Can you share the contents of /pathto/scripts/cellot_run/config.yaml?

Answer 2 · 2024-02-28T15:52:45.000Z

Sure.

data:
  condition: drug
  features: /pathto/cellot/datasets/4i/features.txt
  path: /pathto/cellot/datasets/4i/8h.h5ad
  source: control
  target: cisplatin
  type: cell
dataloader:
  batch_size: 256
  shuffle: true
datasplit:
  groupby: drug
  name: train_test
  test_size: 0.2
model:
  g:
    fnorm_penalty: 1
  hidden_units:
  - 64
  - 64
  - 64
  - 64
  kernel_init_fxn:
    b: 0.1
    name: uniform
  latent_dim: 50
  name: cellot
  softplus_W_kernels: false
optim:
  beta1: 0.5
  beta2: 0.9
  lr: 0.0001
  optimizer: Adam
  weight_decay: 0
training:
  cache_freq: 1000
  eval_freq: 250
  logs_freq: 50
  n_inner_iters: 10
  n_iters: 100000

Answer 3 · 2024-02-29T13:06:34.000Z

Hmm, OK, I'm not able to reproduce this on my end. I think the script is not finding config.yaml, so some paths might be unspecified. Could you double-check that? I added some better asserts and warnings, so if you don't find anything, can you pull the new changes, re-run, and send the output if you still get an error?

Answer 4 · 2024-03-01T14:14:42.000Z

Hi, so I pulled the repo again.
Now there is a warning that it indeed can't find the config file. Is there a way to specify it?

WARNING: config path not found
Traceback (most recent call last):

Answer 5 · 2024-03-01T15:00:07.000Z

Hi M,

Aha, OK, I figured out what happened: we had assumed that outdirs are named like <experiment_name>/model-<model>. With that layout in mind I was able to reproduce your error, and I have fixed it. Let me know if this works for you.

best,
Stefan
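
For readers hitting the same error: the naming assumption described above meant that config.yaml was looked up in the wrong place for a flat output directory. The sketch below shows one way a lookup could tolerate both layouts and fail early with a clear message, rather than failing later with AttributeError: "'data'". It is not the actual cellot fix; the helper name find_config and its parameters are hypothetical and used only for illustration.

```python
from pathlib import Path


def find_config(outdir, name="config.yaml", max_levels=1):
    """Locate `name` in `outdir` or in up to `max_levels` parent directories.

    Hypothetical helper for illustration only; it is not part of cellot.
    """
    outdir = Path(outdir).resolve()
    # Check the directory itself first, then walk up a limited number of levels.
    candidates = [outdir] + list(outdir.parents)[:max_levels]
    for directory in candidates:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    # Fail early and say where we looked, instead of continuing with an
    # empty config that only breaks when a key such as `data` is accessed.
    searched = ", ".join(str(d) for d in candidates)
    raise FileNotFoundError(f"{name} not found; searched: {searched}")
```

With this sketch, find_config("/pathto/scripts/cellot_run/") would return the config.yaml sitting directly in the run directory, and a directory nested one level below it (for example one named model-cellot) would resolve to the same file.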

Answer 6 · 2024-03-01T17:01:48.000Z

So it finished with:

... storing 'transport' as categorical
I0301 17:15:05.610995 48003472152704 utils.py:145] Note: detected 192 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
I0301 17:15:05.611438 48003472152704 utils.py:148] Note: NumExpr detected 192 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.

Output looks like this:

ls cellot_run/evals_iid_data_space/
evals.csv  imputed.h5ad

Best

Answer 7 · 2024-03-15T22:47:47.000Z

Is your question resolved?