NaN in column names of spillover matrix

Question

grst opened this issue 10 months ago · 8 comments

I encounter the following issue when reading in an FCS file:

The spillover matrix contains a column with NaN as column name:

This leads to a failure in pytometry.pp.compensate():

TypeError: '<' not supported between instances of 'float' and 'str'

Stacktrace

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[49], line 1
----> 1 pm.pp.compensate(adatas["FBG-XXX_CD45+"])

File ~/projects/scverse/pytometry/pytometry/preprocessing/_process_data.py:153, in compensate(adata, comp_matrix, matrix_type, inplace)
    147 # Ignore channels 'FSC-H', 'FSC-A', 'SSC-H', 'SSC-A',
    148 # 'FSC-Width', 'Time'
    149 # and compensate only the values indicated in the compensation matrix
    150 # Note:
    151 # the compensation matrix may have different index names than the adata.X matrix
    152 ref_col = adata.var.index
--> 153 idx_in = np.intersect1d(compens.columns, ref_col)
    154 if not idx_in.any():
    155     # try the adata.var['channel'] as reference
    156     ref_col = adata.var["channel"]

File <__array_function__ internals>:200, in intersect1d(*args, **kwargs)

File /data/clinbias_data6/tmp_sturmgre/conda/envs/1403-0001_pytometry/lib/python3.11/site-packages/numpy/lib/arraysetops.py:444, in intersect1d(ar1, ar2, assume_unique, return_indices)
    442         ar2, ind2 = unique(ar2, return_index=True)
    443     else:
--> 444         ar1 = unique(ar1)
    445         ar2 = unique(ar2)
    446 else:

File <__array_function__ internals>:200, in unique(*args, **kwargs)

File /data/clinbias_data6/tmp_sturmgre/conda/envs/1403-0001_pytometry/lib/python3.11/site-packages/numpy/lib/arraysetops.py:274, in unique(ar, return_index, return_inverse, return_counts, axis, equal_nan)
    272 ar = np.asanyarray(ar)
    273 if axis is None:
--> 274     ret = _unique1d(ar, return_index, return_inverse, return_counts, 
    275                     equal_nan=equal_nan)
    276     return _unpack_tuple(ret)
    278 # axis was specified and not None

File /data/clinbias_data6/tmp_sturmgre/conda/envs/1403-0001_pytometry/lib/python3.11/site-packages/numpy/lib/arraysetops.py:336, in _unique1d(ar, return_index, return_inverse, return_counts, equal_nan)
    334     aux = ar[perm]
    335 else:
--> 336     ar.sort()
    337     aux = ar
    338 mask = np.empty(aux.shape, dtype=np.bool_)

TypeError: '<' not supported between instances of 'float' and 'str'
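
The failure does not seem to depend on the FCS file itself. As a minimal sketch (the column labels below are made up for illustration), an object array that mixes strings with a float NaN cannot be sorted, and `np.intersect1d` sorts its inputs via `np.unique`, as the traceback shows:

```python
import numpy as np
import pandas as pd

# Hypothetical labels: one spill column name has been replaced by float NaN.
compens_columns = pd.Index(["FITC-A", np.nan, "APC-A"], dtype=object)
ref_col = pd.Index(["FITC-A", "APC-A", "Time"], dtype=object)

try:
    # intersect1d calls unique() on each input, which sorts the labels;
    # comparing a str with a float raises TypeError.
    np.intersect1d(compens_columns, ref_col)
    error_message = None
except TypeError as err:
    error_message = str(err)

print(error_message)
```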

Unfortunately, I can't share the FCS file, but maybe the space in the column name could be an issue? This is the corresponding adata.var:

Answer 1 · 2023-08-24T14:49:57.000Z

Hi Gregor,

Thanks for reporting! It seems you have two NAs in the spillover matrix: both `PerCP-…` and `APC-…` were missing, and the latter doesn't contain a space.

Could you try to load your dataset using the class `readfcs.ReadFCS` and check what's in `self._meta["spill"]`? If that looks alright, it's possible that your "channels" and the spill matrix index have some mismatches, which got mapped to NA because of these lines: https://github.com/laminlabs/readfcs/blob/main/readfcs/_core.py#L217-L219

I made a PR here to prevent mismatches from being converted into NAs: #32. But I'm not sure if that's a good fix, or whether it will still error downstream during compensation. Could you test if that fixes your issue?

~Sunny

Answer 2 · 2023-08-24T15:04:01.000Z

OK, I got to the bottom of this:

>>> import fcsparser
>>> meta, data = fcsparser.parse('<path>')
>>> data.columns
Index(['FSC-A', 'FSC-H', 'FSC-W', 'SSC-A', 'SSC-H', 'SSC-W', 'CD33',
       'PerCP-eFluor 710-A', 'SIRPa', 'Alexa-700-A', 'APC-eFluor-780-A',
       'CD1c', 'BV510-A', 'CD11c', 'CD11b', 'CD64', 'CD45', 'TIE2',
       'Dazzle-594-A', 'BUV395-A', 'BUV496-A', 'BUV737-A', 'Time'],
      dtype='object')
>>> meta["SPILL"]
'16,FITC-A,PercP-eFluor 710-A,APC-A,Alexa-700-A,APC-Alexa-750-A,BV421-A,BV510-A,BV605-A,BV650-A,BV711-A,BV786-A,PE-A,Dazzle-594-A,BUV395-A,BUV496-A,BUV737-A,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1'

It seems to be a case conflict (PercP vs. PerCP) and an entirely mismatched column name (APC-Alexa-750-A vs. APC-eFluor-780-A). I'll need to follow up with the data providers to see if they can fix it.
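
A check like this could be sketched in a few lines. The helpers below are hypothetical (they are not part of fcsparser or readfcs), and the SPILL string is a shortened three-channel stand-in for the real one. The sketch assumes the layout seen above: the number of channels, then the channel names, then the matrix values.

```python
def spill_channel_names(spill):
    """Return the channel names from a SPILL value ("n,name1,...,nameN,values...")."""
    fields = spill.split(",")
    n = int(fields[0])
    return fields[1 : n + 1]


def compare_channels(spill_names, data_columns):
    """Sort spill names into exact matches, case-only mismatches, and unmatched names."""
    by_lower = {col.lower(): col for col in data_columns}
    exact, case_only, missing = [], {}, []
    for name in spill_names:
        if name in data_columns:
            exact.append(name)
        elif name.lower() in by_lower:
            # Same name apart from upper/lower case: record the likely target.
            case_only[name] = by_lower[name.lower()]
        else:
            missing.append(name)
    return exact, case_only, missing


# Shortened, made-up example with a 3x3 identity matrix.
spill = "3,PercP-eFluor 710-A,Alexa-700-A,APC-Alexa-750-A,1,0,0,0,1,0,0,0,1"
columns = ["PerCP-eFluor 710-A", "Alexa-700-A", "APC-eFluor-780-A"]

exact, case_only, missing = compare_channels(spill_channel_names(spill), columns)
print(exact, case_only, missing)
```

Names that differ only in case could plausibly be fixed automatically, while names in the last group need someone who knows the panel to decide what they correspond to.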

On the readfcs side, I'm not sure what's best. Maybe raise an error?
Or keep the original names as you suggested and raise the error (with a better error message) in pytometry?

Answer 3 · 2023-08-24T15:17:08.000Z

Great that you identified the issue! Yes, raising an error early for such upstream data integrity issues is always good. Will do that!

Answer 4 · 2023-08-24T15:25:09.000Z

My only concern is that an error would prevent me from reading the data at all (for instance, to fix the matrix manually if they can't fix it upstream). So maybe a warning would be better?
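
One way to support both behaviours is a switch on the check itself. This is only a sketch: `check_spill_names` and its `on_mismatch` parameter are hypothetical and not part of the readfcs API.

```python
import warnings


def check_spill_names(spill_names, channels, on_mismatch="warn"):
    """Return spill-matrix names that are absent from the channel names.

    Hypothetical helper: on_mismatch="warn" emits a UserWarning and lets the
    caller continue; on_mismatch="raise" raises a ValueError instead.
    """
    known = set(channels)
    unmatched = [name for name in spill_names if name not in known]
    if unmatched:
        message = f"Spill matrix names not found among channels: {unmatched}"
        if on_mismatch == "raise":
            raise ValueError(message)
        warnings.warn(message, UserWarning, stacklevel=2)
    return unmatched


# Default: a warning is emitted, and the data can still be read and fixed.
unmatched = check_spill_names(
    ["PercP-eFluor 710-A", "Alexa-700-A"],
    ["PerCP-eFluor 710-A", "Alexa-700-A"],
)
print(unmatched)
```

With the warning as the default, the file still loads and the matrix can be repaired by hand; a stricter pipeline could opt into `on_mismatch="raise"`.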

Answer 5 · 2023-08-24T15:32:00.000Z

Oh, you can try manually fixing the mismatches using the ReadFCS class, something along these lines:

import readfcs

# datapath: path to the FCS file
fcsfile = readfcs.ReadFCS(datapath)
old_spill_matrix = fcsfile._meta["spill"]
# --- fix the index and columns of old_spill_matrix to obtain fixed_spill_matrix --- #
fcsfile._meta["spill"] = fixed_spill_matrix
adata = fcsfile.to_anndata()
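
The "fix the index and columns" step might look like the sketch below. It assumes the spill matrix is a pandas DataFrame labelled by channel name on both axes, and it uses a small stand-in matrix rather than a real file. The mapping of `APC-Alexa-750-A` to `APC-eFluor-780-A` is a guess that only the data providers can confirm.

```python
import numpy as np
import pandas as pd

# Stand-in for fcsfile._meta["spill"] (assumed to be a pandas DataFrame).
names = ["PercP-eFluor 710-A", "Alexa-700-A", "APC-Alexa-750-A"]
old_spill_matrix = pd.DataFrame(np.eye(3), index=names, columns=names)

# Hypothetical mapping from the names in the spill matrix to the channel names.
rename_map = {
    "PercP-eFluor 710-A": "PerCP-eFluor 710-A",  # case mismatch
    "APC-Alexa-750-A": "APC-eFluor-780-A",  # unconfirmed correspondence
}

# rename() returns a new DataFrame; the values are left untouched.
fixed_spill_matrix = old_spill_matrix.rename(index=rename_map, columns=rename_map)

# No label should be missing after the fix.
assert not fixed_spill_matrix.columns.isna().any()
print(list(fixed_spill_matrix.columns))
```

If the labels have already been replaced by NaN, renaming by label is ambiguous, and assigning the full list of labels by position is likely the safer route.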

Answer 6 · 2023-08-24T15:35:10.000Z

Ok, that would work for me :)

Answer 7 · 2023-08-25T11:07:24.000Z

Thanks again for your swift response and fixes!

Answer 8 · 2023-08-25T11:14:33.000Z

My pleasure!