theislab/chemCPA

KeyError: 'assinex' on lincs.ipynb with lincs_full.h5ad

sumanabasu opened this issue · 4 comments

Hi,
I'm running lincs.ipynb on lincs_full.h5ad, but I'm getting a KeyError when running this:

adata.uns['rank_genes_groups_cov'] = {cat: de_genes_quick[extract_drug(cat)] for cat in adata.obs.eval_category.unique() if extract_drug(cat) != 'DMSO'}

Error:

KeyError                                  Traceback (most recent call last)
/var/folders/yb/7q0vm8qx1tb5v2h73hv_ljz00000gn/T/ipykernel_81044/3681267745.py in <module>
----> 1 adata.uns['rank_genes_groups_cov'] = {cat: de_genes_quick[extract_drug(cat)] for cat in adata.obs.eval_category.unique() if extract_drug(cat) != 'DMSO'}

/var/folders/yb/7q0vm8qx1tb5v2h73hv_ljz00000gn/T/ipykernel_81044/3681267745.py in <dictcomp>(.0)
----> 1 adata.uns['rank_genes_groups_cov'] = {cat: de_genes_quick[extract_drug(cat)] for cat in adata.obs.eval_category.unique() if extract_drug(cat) != 'DMSO'}

KeyError: 'assinex'
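Rather than failing on the first missing key, one way to see every drug name that `extract_drug` produces but that `de_genes_quick` does not contain is to collect them all. This is a sketch with made-up toy data: plain Python lists and dicts stand in for `adata.obs.eval_category.unique()` and `de_genes_quick`, and the helper name `build_rank_genes_cov` is hypothetical, not part of the chemCPA code.

```python
def extract_drug(cond):
    # Same logic as the notebook helper quoted in this issue
    split = cond.split('_')
    if len(split) == 2:
        return split[-1]
    return '_'.join(split[1:-1])


def build_rank_genes_cov(categories, de_genes, control='DMSO'):
    # Hypothetical helper: map each category to the DE genes of its drug,
    # skipping the control and collecting drugs that have no DE entry
    mapping = {}
    missing = set()
    for cat in categories:
        drug = extract_drug(cat)
        if drug == control:
            continue
        if drug in de_genes:
            mapping[cat] = de_genes[drug]
        else:
            missing.add(drug)
    return mapping, missing


# Made-up stand-ins for adata.obs.eval_category.unique() and de_genes_quick
categories = ['A549_drugA', 'A549_DMSO', 'A549_assinex']
de_genes = {'drugA': ['GENE1', 'GENE2']}

mapping, missing = build_rank_genes_cov(categories, de_genes)
print(missing)  # {'assinex'}
```

On the real data, the contents of `missing` would indicate whether only a few drugs lack DE results or whether the keys follow a different naming scheme altogether.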

A little bit of background:
Code:

adata = adata[adata.obs.condition.isin(suff_drug_abundance)].copy()
adata 

Output:

AnnData object with n_obs × n_vars = 1023036 × 978
    obs: 'cell_id', 'det_plate', 'det_well', 'lincs_phase', 'pert_dose', 'pert_dose_unit', 'pert_id', 'pert_iname', 'pert_mfc_id', 'pert_time', 'pert_time_unit', 'pert_type', 'rna_plate', 'rna_well', 'condition', 'cell_type', 'dose_val', 'cov_drug_dose_name', 'cov_drug_name', 'eval_category', 'control'
    var: 'pr_gene_title', 'pr_is_lm', 'pr_is_bing'
    uns: 'cydata_pull'

Note: the original notebook shows 199620 × 928 here instead.

Code:

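def extract_drug(cond):
    # Split the category name on underscores
    split = cond.split('_')
    if len(split) == 2:
        return split[-1]  # two tokens: return the last one
    return '_'.join(split[1:-1])  # more tokens: join everything between the first and last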

adata.obs['cov_drug_dose_name'].apply(lambda s: len(s.split('_'))).value_counts()
adata.obs['eval_category'].apply(lambda s: len(s.split('_'))).value_counts()

Output:

2    1022382
3        654
Name: eval_category, dtype: int64

Note: This is again different from what is shown in the notebook:

2    199620
Name: eval_category, dtype: int64
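The value counts above show 654 cells whose `eval_category` splits into three tokens, while the notebook output has only two-token names. For a three-token name, `extract_drug` returns only the middle token, so if a drug name itself contained an underscore, the lookup key would be truncated. Whether that is what happens with lincs_full.h5ad is an assumption, not something confirmed in this thread. The sketch below uses made-up category names and hypothetical helper names (`token_counts`, `unexpected_categories`) to list the unexpected categories and show what `extract_drug` returns for them.

```python
from collections import Counter


def extract_drug(cond):
    # Same logic as the notebook helper quoted in this issue
    split = cond.split('_')
    if len(split) == 2:
        return split[-1]
    return '_'.join(split[1:-1])


def token_counts(categories):
    # Pure-Python analogue of .apply(lambda s: len(s.split('_'))).value_counts()
    return Counter(len(c.split('_')) for c in categories)


def unexpected_categories(categories, expected_tokens=2):
    # Unique category names whose token count differs from the expected one
    return sorted({c for c in categories if len(c.split('_')) != expected_tokens})


# Made-up names; 'drug_B' stands for a drug name that contains an underscore
cats = ['A549_drugA', 'A549_drugA', 'MCF7_drug_B']
print(token_counts(cats))  # Counter({2: 2, 3: 1})
for cat in unexpected_categories(cats):
    print(cat, '->', extract_drug(cat))  # MCF7_drug_B -> drug
```

Running `unexpected_categories` on `adata.obs.eval_category.unique()` would show which names make up the 654 three-token cells.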

Appreciate any help!

Hi @sumanabasu,

Thanks for reporting this issue. I am able to reproduce it and will get back to you once I have figured out what is going wrong!

Hi @MxMstrmn, thanks!
Meanwhile, is there anywhere I can find the lincs_full_pp.h5ad file so that I can proceed?

Hi, I ran into the same issue when running the preprocessing code. Is there a new version of the code where this is resolved?

See my post in #124.