theislab/scarches

Sizes of tensors must match except in dimension 1

Closed this issue · 9 comments

hi,

i have successfully run scarches with one pair of reference and query sets. now i am trying to do it with another one and i get the following error message at the query data loading stage:

In [123]: model = sca.models.SCANVI.load_query_data(
     ...:     target_adata.copy(),
     ...:     vae,
     ...: )
INFO     Using data from adata.X                                                             
INFO     Computing library size prior per batch                                              
/opt/anaconda3/lib/python3.7/site-packages/scvi/data/_anndata.py:795: UserWarning: adata.X does not contain unnormalized count data. Are you sure this is what you want?
  logger_data_loc
INFO     Registered keys:['X', 'batch_indices', 'local_l_mean', 'local_l_var', 'labels']     
INFO     Successfully registered anndata object containing 19755 cells, 6137 vars, 9 batches,
         17 labels, and 0 proteins. Also registered 0 extra categorical covariates and 0     
         extra continuous covariates.                                                        
WARNING  Make sure the registered X field in anndata contains unnormalized count data.       
Traceback (most recent call last):

  File "<ipython-input-123-d90f5f2be02a>", line 3, in <module>
    vae,

  File "/opt/anaconda3/lib/python3.7/site-packages/scvi/core/models/archesmixin.py", line 118, in load_query_data
    fixed_ten = torch.cat([load_ten, new_ten[..., -dim_diff:]], dim=-1)

RuntimeError: Sizes of tensors must match except in dimension 1. Got 9 and 17 in dimension 0 (The offending index is 1)

any advice on what may be causing the error is greatly appreciated!

Hi,

It seems that you are using normalized data in your adata.X? scANVI requires count data in adata.X.

  1. Are you trying to use the same reference model you used before, or are you creating a new reference?

  2. Do you use cell type labels in the query?

hi, thanks for getting back to me so quickly.

  1. i am using raw counts, but raw counts produced by alevin are not integer for whatever reason (my best guess is this is because some genes contain multiple isoforms), so scVI etc always display the warning about unnormalized counts. everything else still works fine - is there any reason i should be worried?
  2. no, it is a completely different reference
  3. yes, i do, should i remove them?
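
On the non-integer counts mentioned in point 1, a quick check like the one below can show how many entries actually deviate from whole numbers. This is a plain-Python sketch on a flat list of values, not the check scVI runs internally; the function name and the toy data are hypothetical.

```python
def non_integer_fraction(values):
    """Return the fraction of entries that are not whole numbers."""
    values = list(values)
    if not values:
        return 0.0
    bad = sum(1 for v in values if not float(v).is_integer())
    return bad / len(values)


# Hypothetical toy counts: fractional values next to whole counts.
toy_counts = [0.0, 3.0, 1.5, 2.0, 0.25, 7.0]
fraction = non_integer_fraction(toy_counts)
print(f"{fraction:.2f} of entries are non-integer")
```

For a real dataset the same idea would be applied to the values stored in adata.X (for a sparse matrix, to its stored non-zero values).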

by the way, the 9 and 17 in "Got 9 and 17 in dimension 0" do indeed refer to the number of clusters / cell types in the reference and the query, respectively
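
The rule behind the error message is that concatenation along one dimension requires every other dimension to agree. The sketch below mimics that rule on plain shape tuples; it is not PyTorch's implementation, and the shapes are hypothetical (only the 9 and 17 come from the traceback).

```python
def cat_shape(shape_a, shape_b, dim):
    """Return the shape obtained by concatenating two arrays along `dim`.

    Raises ValueError if any other dimension differs, mirroring the rule
    stated in the error message.
    """
    if len(shape_a) != len(shape_b):
        raise ValueError("arrays must have the same number of dimensions")
    dim = dim % len(shape_a)  # allow negative dims such as -1
    for i, (a, b) in enumerate(zip(shape_a, shape_b)):
        if i != dim and a != b:
            raise ValueError(f"sizes differ in dimension {i}: got {a} and {b}")
    out = list(shape_a)
    out[dim] = shape_a[dim] + shape_b[dim]
    return tuple(out)


# Hypothetical shapes: the column counts are made up for illustration.
print(cat_shape((9, 128), (9, 4), dim=-1))  # row counts agree, so this works

try:
    cat_shape((9, 128), (17, 4), dim=-1)  # 9 vs 17 rows cannot be joined
except ValueError as err:
    print(err)
```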

just to be clear, my ultimate goal is to transfer cell type labels from the reference onto the query

You need to treat the query data as unlabeled. If your query data include a new cell type, you will get this error. If you want to use a labeled query, the query cell types should be the same as, or a subset of, the reference cell types; otherwise consider using an unlabeled query.
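
Whether a labeled query satisfies the subset condition can be checked before loading it. A minimal sketch, assuming the labels are available as plain sequences of strings; the helper name and the example cell types are hypothetical.

```python
def labels_missing_from_reference(ref_labels, query_labels):
    """Return the sorted query labels that never occur in the reference."""
    return sorted(set(query_labels) - set(ref_labels))


# Hypothetical label columns for a reference and a query dataset.
ref_labels = ["B cell", "T cell", "Monocyte"]
query_labels = ["T cell", "NK cell", "B cell", "NK cell"]

missing = labels_missing_from_reference(ref_labels, query_labels)
if missing:
    print("Query-only cell types:", missing)
```

With AnnData objects, the two sequences would be the label columns of the reference and query, e.g. ref_adata.obs[labels_key] and target_adata.obs[labels_key]. A non-empty result means the query should be treated as unlabeled.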

Hi @bsierieb1 you should ensure that the query data has the cell type category that corresponds to unlabeled cells.

This can be done with:

target_adata.obs[cell_type_label_key] = vae.unlabeled_category_

where vae is the reference model. Do you currently have cell type labels in your target that are not in reference?

In other words, the full workflow would be:

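from scvi.data import setup_anndata  # import paths assume the scvi-tools version from the traceback
from scvi.model import SCANVI

setup_anndata(ref_adata, batch_key=batch_key, labels_key=labels_key)
# "Unknown" need not be in ref_adata.obs[labels_key]
vae = SCANVI(ref_adata, unlabeled_category="Unknown", **arches_params)
vae.train()
# mark every query cell as unlabeled before surgery
target_adata.obs[labels_key] = vae.unlabeled_category_
query_model = SCANVI.load_query_data(target_adata, vae)
query_model.train(...)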

thanks @M0hammadL for the link and thanks @adamgayoso! target_adata.obs[labels_key] = vae.unlabeled_category_ was exactly what i needed. perhaps it is worth adding this line to the tutorial? otherwise it is a little confusing - the corresponding tutorial section is called 'Perform surgery on reference model and train on query dataset without cell type labels' but in reality the query data set in the example does have cell type labels.

Hi,
I have a question concerning the same issue:
I trained a model with scanvea.train and saved it.

Now I want to use the same model to transfer the cell type labels to another dataset, but I only pass the path to the saved model instead of defining the model itself:

ref_path =
model = sca.models.SCANVI.load_query_data(
    target_adata,
    ref_path,
    freeze_dropout=True,
)
model._unlabeled_indices = np.arange(target_adata.n_obs)
model._labeled_indices = []
print("Labelled Indices: ", len(model._labeled_indices))
print("Unlabelled Indices: ", len(model._unlabeled_indices))

So target_adata.obs[labels_key] = vae.unlabeled_category_ doesn't work in that case, because there is no vae object to read the unlabeled category from.

I'd be very happy about any advice!
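
One possible approach, since the unlabeled category is just the string chosen when the reference model was created ("Unknown" in the workflow earlier in this thread), is to assign that string directly without a vae object. The sketch below illustrates the idea on a plain dict of columns standing in for target_adata.obs; the helper name is hypothetical, and "Unknown" is an assumption that must match whatever was passed as unlabeled_category at training time.

```python
# Assumption: the value passed as unlabeled_category when the reference was trained.
UNLABELED_CATEGORY = "Unknown"


def mark_all_unlabeled(obs, labels_key, unlabeled_category=UNLABELED_CATEGORY):
    """Return a copy of `obs` whose label column holds only the unlabeled category.

    `obs` is a dict mapping column names to equal-length lists, used here as a
    stand-in for the per-cell annotation table of the query dataset.
    """
    n_cells = len(next(iter(obs.values()))) if obs else 0
    new_obs = dict(obs)
    new_obs[labels_key] = [unlabeled_category] * n_cells
    return new_obs


# Hypothetical query annotations with existing cell type labels.
query_obs = {"batch": ["a", "b", "c"], "cell_type": ["T", "B", "NK"]}
print(mark_all_unlabeled(query_obs, "cell_type"))
```

With a real AnnData object the equivalent would be target_adata.obs[labels_key] = "Unknown", run before the call to load_query_data.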