Code correction
Closed this issue · 1 comment
mugpeng commented
Code: omicverse/omicverse/single/_simba.py at master · Starlitnightly/omicverse
At the step of choosing the largest dataset as a reference: Batch correction (multiple batches) — SIMBA 1.2 documentation
The author gets this result:
{'C': AnnData object with n_obs × n_vars = 8569 × 50,
'C2': AnnData object with n_obs × n_vars = 2127 × 50,
'C3': AnnData object with n_obs × n_vars = 2122 × 50,
'C4': AnnData object with n_obs × n_vars = 457 × 50,
'G': AnnData object with n_obs × n_vars = 7988 × 50,
'C5': AnnData object with n_obs × n_vars = 1492 × 50}
In that example 'C' is clearly the largest AnnData object, but 'C' is not guaranteed to be the largest. For my data:
## dict_adata
{'C18': AnnData object with n_obs × n_vars = 3285 × 50,
'C17': AnnData object with n_obs × n_vars = 761 × 50,
'G': AnnData object with n_obs × n_vars = 3000 × 50,
'C16': AnnData object with n_obs × n_vars = 2080 × 50,
'C5': AnnData object with n_obs × n_vars = 1988 × 50,
'C6': AnnData object with n_obs × n_vars = 1835 × 50,
'C15': AnnData object with n_obs × n_vars = 597 × 50,
'C13': AnnData object with n_obs × n_vars = 2418 × 50,
'C7': AnnData object with n_obs × n_vars = 659 × 50,
'C10': AnnData object with n_obs × n_vars = 3673 × 50,
'C8': AnnData object with n_obs × n_vars = 523 × 50,
'C20': AnnData object with n_obs × n_vars = 147 × 50,
'C22': AnnData object with n_obs × n_vars = 1038 × 50,
'C9': AnnData object with n_obs × n_vars = 2437 × 50,
'C12': AnnData object with n_obs × n_vars = 1298 × 50,
'C4': AnnData object with n_obs × n_vars = 1774 × 50,
'C21': AnnData object with n_obs × n_vars = 1165 × 50,
'C2': AnnData object with n_obs × n_vars = 1995 × 50,
'C11': AnnData object with n_obs × n_vars = 3527 × 50,
'C14': AnnData object with n_obs × n_vars = 1912 × 50,
'C19': AnnData object with n_obs × n_vars = 436 × 50,
'C3': AnnData object with n_obs × n_vars = 1379 × 50,
'C': AnnData object with n_obs × n_vars = 497 × 50}
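So the script should pick the reference by size instead:
# Map each key to its number of observations (rows)
batch_size_si = {key: adata.shape[0] for key, adata in dict_adata.items()}
# Use the entry with the most observations as the reference
adata_ref = dict_adata[max(batch_size_si, key=batch_size_si.get)]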
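One caveat: taking the maximum over every key also considers 'G'. Assuming, as in the SIMBA tutorials, that keys beginning with 'C' hold per-batch cell embeddings and 'G' holds gene embeddings, a dataset with more genes than cells in any single batch would end up with 'G' as the reference. A minimal sketch of a guard against that is below. The helper name pick_reference_key, its cell_prefix parameter, and the SimpleNamespace stand-ins (used so the example runs without anndata) are hypothetical and not part of omicverse or SIMBA.

```python
from types import SimpleNamespace


def pick_reference_key(dict_adata, cell_prefix="C"):
    """Return the key of the largest entry whose key starts with cell_prefix.

    Assumption: keys starting with cell_prefix are cell batches; other
    keys (for example 'G') are skipped.
    """
    cell_sizes = {
        key: adata.shape[0]
        for key, adata in dict_adata.items()
        if key.startswith(cell_prefix)
    }
    if not cell_sizes:
        raise ValueError(f"no keys starting with {cell_prefix!r}")
    return max(cell_sizes, key=cell_sizes.get)


# Stand-in objects exposing only .shape, sized like a few of the entries above
dict_adata = {
    "C18": SimpleNamespace(shape=(3285, 50)),
    "G": SimpleNamespace(shape=(3000, 50)),
    "C10": SimpleNamespace(shape=(3673, 50)),
    "C": SimpleNamespace(shape=(497, 50)),
}

ref_key = pick_reference_key(dict_adata)
adata_ref = dict_adata[ref_key]
print(ref_key)  # C10
```

With real data, dict_adata would be the dictionary of AnnData objects shown above. The filter only matters when 'G' has more rows than every cell batch, which is not the case in either listing here.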
Starlitnightly commented
Thanks for the correction and the pull request!
Zehua