Starlitnightly/omicverse

Code correction

Closed · 1 comment

Code: omicverse/omicverse/single/_simba.py at master · Starlitnightly/omicverse

At the step that chooses the largest dataset as the reference (SIMBA 1.2 documentation, "Batch correction (multiple batches)"), the author gets this result:

{'C': AnnData object with n_obs × n_vars = 8569 × 50,
 'C2': AnnData object with n_obs × n_vars = 2127 × 50,
 'C3': AnnData object with n_obs × n_vars = 2122 × 50,
 'C4': AnnData object with n_obs × n_vars = 457 × 50,
 'G': AnnData object with n_obs × n_vars = 7988 × 50,
 'C5': AnnData object with n_obs × n_vars = 1492 × 50}

Here the 'C' AnnData is clearly the largest, but that does not mean 'C' is always the largest. For my data, 'C' has only 497 observations while 'C10' has 3673:

## dict_adata
{'C18': AnnData object with n_obs × n_vars = 3285 × 50,
 'C17': AnnData object with n_obs × n_vars = 761 × 50,
 'G': AnnData object with n_obs × n_vars = 3000 × 50,
 'C16': AnnData object with n_obs × n_vars = 2080 × 50,
 'C5': AnnData object with n_obs × n_vars = 1988 × 50,
 'C6': AnnData object with n_obs × n_vars = 1835 × 50,
 'C15': AnnData object with n_obs × n_vars = 597 × 50,
 'C13': AnnData object with n_obs × n_vars = 2418 × 50,
 'C7': AnnData object with n_obs × n_vars = 659 × 50,
 'C10': AnnData object with n_obs × n_vars = 3673 × 50,
 'C8': AnnData object with n_obs × n_vars = 523 × 50,
 'C20': AnnData object with n_obs × n_vars = 147 × 50,
 'C22': AnnData object with n_obs × n_vars = 1038 × 50,
 'C9': AnnData object with n_obs × n_vars = 2437 × 50,
 'C12': AnnData object with n_obs × n_vars = 1298 × 50,
 'C4': AnnData object with n_obs × n_vars = 1774 × 50,
 'C21': AnnData object with n_obs × n_vars = 1165 × 50,
 'C2': AnnData object with n_obs × n_vars = 1995 × 50,
 'C11': AnnData object with n_obs × n_vars = 3527 × 50,
 'C14': AnnData object with n_obs × n_vars = 1912 × 50,
 'C19': AnnData object with n_obs × n_vars = 436 × 50,
 'C3': AnnData object with n_obs × n_vars = 1379 × 50,
 'C': AnnData object with n_obs × n_vars = 497 × 50}

So the script should pick the reference by size rather than assuming it is 'C':

# Map each key to its number of observations, then take the largest as reference.
batch_size_si = {key: adata.shape[0] for key, adata in dict_adata.items()}
adata_ref = dict_adata[max(batch_size_si, key=batch_size_si.get)]
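To check the selection logic without loading any data, here is a minimal sketch that uses stand-in objects exposing only a `.shape` attribute, with a few of the sizes from my `dict_adata` above. The helper name `largest_key` and its `exclude` parameter are hypothetical, not part of omicverse or SIMBA. `exclude` is only there in case some entries (for example 'G', if it is not a cell batch in your setup) should never be chosen as the reference.

```python
from types import SimpleNamespace


def largest_key(dict_adata, exclude=()):
    """Return the key whose object has the most rows (shape[0]).

    `exclude` is a hypothetical convenience: keys listed there are skipped.
    """
    candidates = {k: v for k, v in dict_adata.items() if k not in exclude}
    return max(candidates, key=lambda k: candidates[k].shape[0])


# Stand-ins for AnnData objects: only the .shape attribute matters here.
sizes = {"C18": 3285, "G": 3000, "C10": 3673, "C11": 3527, "C": 497}
dict_adata = {k: SimpleNamespace(shape=(n, 50)) for k, n in sizes.items()}

ref_key = largest_key(dict_adata)
adata_ref = dict_adata[ref_key]
print(ref_key, adata_ref.shape)  # C10 (3673, 50)
```

With these sizes the reference is 'C10', not 'C', which is the case the hard-coded key would get wrong.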

Thanks for the correction and the pull request!

Zehua