I am trying to use CellphoneDB version 4.1 on a dataset of ~72k cells. The dataset was generated with single-cell multiome (RNA + ATAC) across multiple batches. Following the guides, I am using Seurat "RC normalized" counts from the RNA assay and created the h5ad object from them.
Following the vignette, running the command "list(adata.obs.index).sort() == list(metadata['barcode_sample']).sort()" returns TRUE.
However, every time I try to run cpdb_analysis_method as described in the vignette, I get the same error (please see attached).
I am not sure what the issue is with my files. Could you please suggest what might be wrong?

Thank you for using CellphoneDB. As the exception message above says, some bar codes in your meta file are not found in your counts matrix. The expectation is that your meta file should have the same cell bar codes as the ones in your counts matrix. Could you please check?




I generated the meta file and the h5ad file from the same Seurat object. When I read the h5ad file as an AnnData object and compare the barcodes using

"list(adata.obs.index).sort() == list(metadata['barcode_sample']).sort()" it returns TRUE.

I believe this step essentially compares the barcodes in the h5ad object with the barcodes in the metadata, and it found that all of them match.

Is that not the case?
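
A note on the comparison itself: in Python, `list.sort()` sorts in place and returns `None`, so the expression above compares `None == None` and evaluates to `True` whatever the barcodes are. The sketch below uses made-up barcodes (the names `adata_barcodes`, `meta_barcodes` and `same_barcodes` are hypothetical, not part of CellphoneDB) to show the vacuous check next to one based on `sorted()`, which returns the sorted list.

```python
# Toy barcodes, invented for illustration only.
adata_barcodes = ["cellC-1", "cellA-1", "cellB-1"]
meta_barcodes = ["cellX-1", "cellY-1"]

# list.sort() returns None, so this compares None == None.
vacuous_check = list(adata_barcodes).sort() == list(meta_barcodes).sort()


def same_barcodes(left, right):
    """Return True only if both collections hold exactly the same barcodes."""
    return sorted(left) == sorted(right)


real_check = same_barcodes(adata_barcodes, meta_barcodes)

print(vacuous_check)  # True, even though the barcodes differ
print(real_check)     # False
```

With real data, the same function could be called on `adata.obs.index` and `metadata['barcode_sample']`.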


The meta file should have just two columns: Cell and cell_type. I see 'barcode_sample' in your reply above. Could you please try fixing the column names in your meta file and running again?




I was following the tutorial, and since it uses barcode_sample as the name of the metadata column that corresponds to "Cell", I assumed this was not the cause of the error.

I changed the metadata column names as suggested; attached is a screenshot of what my files look like. I still get the same error.
Could you please suggest what else I can try?

My apologies for leading you up a garden path - on second inspection I see the underlying code is more resilient and should be able to cope with barcode_sample as the first column name. You can see the piece of code that throws the original exception in

if np.any(~meta.index.isin(counts.columns)): ...

hence there must be some cell bar codes in meta file that are not found in the counts file - we just need to get to the bottom of why. You could try to test yourself using the above, or alternatively share the counts and meta files with me and I will take a look? If you put them somewhere accessible, you can send me the link via
Either way, do let me know how you got on.
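
For anyone wanting to run that test themselves, here is a minimal sketch that applies the same condition to toy data and lists the offending barcodes. It assumes, as the snippet above implies, that `meta` is indexed by cell barcode and that `counts` has one column per cell. The toy values and the names `missing_mask` and `missing_barcodes` are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy counts matrix: genes as rows, cells as columns (assumed layout).
counts = pd.DataFrame(
    [[1, 0], [0, 2]],
    index=["GENE1", "GENE2"],
    columns=["cell1", "cell2"],
)

# Toy meta table indexed by barcode; "cell3" is deliberately absent from counts.
meta = pd.DataFrame(
    {"cell_type": ["T", "B", "NK"]},
    index=["cell1", "cell2", "cell3"],
)

# Same condition as the check quoted above.
missing_mask = ~meta.index.isin(counts.columns)
has_missing = bool(np.any(missing_mask))

# The barcodes that are present in meta but not in counts.
missing_barcodes = meta.index[missing_mask].tolist()

print(has_missing)       # True
print(missing_barcodes)  # ['cell3']
```

Printing a handful of the missing barcodes next to a handful of `counts.columns` usually makes any systematic difference in naming visible.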



Sorry for interrupting the thread. I was having the same issue and tracked down the problem. Your problem is your cell names. pandas doesn't like column names with dashes ("-") in them; they get changed to periods. Since the count data is converted to a pandas DataFrame (and transposed so that cells are columns), this ends up renaming your cells.

Take, for example, the cell "pool5A_AAACAGCCAAAGGTAC-1". In your metadata file, it is labeled "pool5A_AAACAGCCAAAGGTAC-1". In your counts file, it is labeled "pool5A_AAACAGCCAAAGGTAC.1". These two are not the same, so the code throws an error. You can fix it by going back to the code that generates your files and renaming the cells so that they don't have dashes in their names.
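
A rough sketch of that renaming, in plain Python: wherever the conversion actually happens (this post attributes it to pandas; it could also occur upstream during export, which is only an assumption), replacing both "-" and "." with the same character on both sides makes the names agree again. The helper `harmonise_barcodes` is hypothetical and not part of CellphoneDB.

```python
def harmonise_barcodes(barcodes):
    """Replace '-' and '.' with '_' and refuse to create duplicate names."""
    renamed = [b.replace("-", "_").replace(".", "_") for b in barcodes]
    if len(set(renamed)) != len(renamed):
        raise ValueError("renaming produced duplicate barcodes")
    return renamed


# The example cell from this thread, as it appears in each file.
meta_barcodes = ["pool5A_AAACAGCCAAAGGTAC-1"]
counts_barcodes = ["pool5A_AAACAGCCAAAGGTAC.1"]

match_before = set(meta_barcodes) == set(counts_barcodes)
match_after = set(harmonise_barcodes(meta_barcodes)) == set(
    harmonise_barcodes(counts_barcodes)
)

print(match_before)  # False
print(match_after)   # True
```

The renamed barcodes would then need to be written back to both the meta file and the counts object before rerunning the analysis.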

I'm not sure if you have any other problems, but this at least seems to be one.

Perhaps it would be helpful for others to add a note in the tutorials that cell names with dashes cause errors.

Many thanks for your kind help on this issue - much appreciated - I will amend accordingly.

Many thanks, yes, this worked. I stripped out all the hyphens and then the tool ran.

