bedapub/besca

`IndexError: list index out of range` while running the cell annotation workbook

matchy233 opened this issue · 2 comments

I was trying to run the latest cell annotation workbook on Roche's sHPC platform using JupyterLab with the provided besca2.5.3 kernel. The notebook was expected to run without errors, but I encountered an `IndexError` in cell 46 (the cell number you get if you click "run all").

The error screenshot is attached below.

[image: error screenshot]

I've already identified the root cause of this error: it's related to the read_annotconfig function in besca/besca/tl/sig/_annot.py.

I'm not sure since when, but at least in pandas v2.0.2, `pd.read_csv` treats all NaN-like strings (including "None") as missing values and replaces them with NaN when reading a csv/tsv file.

So with the current implementation, the "None" entries in sigconfig are replaced by NaN. This breaks the construction of levs, so the function returns an empty levsk list, which in turn causes the index out of range error.
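Here is a minimal sketch of the mechanism, not the actual besca code. The tiny config, the `Term` column and the term names are made up for illustration; only the `Parent` column and the `== "None"` filter come from the function described above. On a pandas version where "None" is a default NA marker, the root filter matches nothing and indexing the empty result raises the same kind of error.

```python
import io

import pandas as pd

# Hypothetical miniature config: only the "Parent" column name comes from
# the issue; "Term" and the term names are invented for this sketch.
CONFIG_TSV = "Term\tParent\nBlood\tNone\nTcell\tBlood\nBcell\tBlood\n"

# Same kind of call as in the issue: default NA handling is active.
sigconfig = pd.read_csv(io.StringIO(CONFIG_TSV), sep="\t", index_col=0)

# True wherever read_csv turned the cell into NaN.
parent_is_nan = sigconfig["Parent"].isna()

# Root-level filter as described in the issue: NaN never equals "None".
roots = sigconfig.loc[sigconfig["Parent"] == "None"].index.tolist()

print("pandas version:", pd.__version__)
print("Parent parsed as NaN:", parent_is_nan.to_dict())
print("root terms found:", roots)

# Indexing an empty list is what surfaces as "list index out of range".
try:
    first_root = roots[0]
except IndexError as err:
    print("IndexError:", err)
```

The printed result depends on the installed pandas version, so no expected output is shown here.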

Any one of the following would fix this:

  1. Add keep_default_na=False to read_csv
  2. Add na_filter=False to read_csv
  3. Do not use sigconfig["Parent"] == "None" as the filtering criterion
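The three options can be sketched as follows. This is only an illustration on a made-up config (the `Term` column, the term names and the `root_terms` helper are hypothetical), not a patch against `read_annotconfig`. For option 3, I assume a missing `Parent` should count as a root term, which is one possible replacement criterion.

```python
import io

import pandas as pd

# Hypothetical miniature config; only the "Parent" column comes from the issue.
CONFIG_TSV = "Term\tParent\nBlood\tNone\nTcell\tBlood\nBcell\tBlood\n"


def root_terms(sigconfig):
    """Hypothetical helper: terms whose Parent is the literal string "None"."""
    return sigconfig.loc[sigconfig["Parent"] == "None"].index.tolist()


# Option 1: keep the default NA strings from being applied.
cfg1 = pd.read_csv(
    io.StringIO(CONFIG_TSV), sep="\t", index_col=0, keep_default_na=False
)
roots1 = root_terms(cfg1)

# Option 2: switch off NA detection entirely.
cfg2 = pd.read_csv(
    io.StringIO(CONFIG_TSV), sep="\t", index_col=0, na_filter=False
)
roots2 = root_terms(cfg2)

# Option 3: keep the default parsing, but accept NaN as "no parent"
# (assumption: a missing Parent means the term is a root).
cfg3 = pd.read_csv(io.StringIO(CONFIG_TSV), sep="\t", index_col=0)
is_root = cfg3["Parent"].isna() | (cfg3["Parent"] == "None")
roots3 = cfg3.loc[is_root].index.tolist()

print(roots1, roots2, roots3)
```

One thing to keep in mind: with options 1 and 2, empty cells in every column are read as empty strings rather than NaN, so any downstream code that relies on NaN for missing values would need a check. Option 3 leaves the parsing untouched and only changes the filter.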

Any of these fixes is pretty easy, so I can raise a PR for it once a dev has reviewed this issue.

Hi @matchy233, this sounds good to me. You can go ahead with your PR