snap-stanford/GEARS

Question about how to generate condition colname?

Closed this issue · 3 comments

Hi Yusuf et al.

Assuming my adata has 3000 cells and I want to perturb two genes (geneA and geneB; I know it is not ideal to train a model with so few genes), is the following code reasonable?

import numpy as np

# Build one condition label per cell: 1000 geneA, 1000 geneB, 1000 control
tem = []
tem.extend(list(np.repeat("geneA+ctrl", 1000)))
tem.extend(list(np.repeat("geneB+ctrl", 1000)))
tem.extend(list(np.repeat("ctrl", 1000)))
adata.obs = adata.obs.assign(condition=tem)

Of the 3000 cells, 1000 are set as ctrl and the remaining cells are split evenly between geneA and geneB. Is it okay to do this?
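The labeling described above can be sketched as follows. This is only an illustration under stated assumptions: a plain pandas DataFrame named `obs` (hypothetical, with made-up cell names) stands in for adata.obs, and labels are assigned purely by position, so it only makes sense if the rows are already ordered to match the perturbation each cell actually received.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for adata.obs: 3000 cells with arbitrary names.
obs = pd.DataFrame(index=[f"cell{i}" for i in range(3000)])

# np.repeat on a list repeats each element in turn:
# 1000 x "geneA+ctrl", then 1000 x "geneB+ctrl", then 1000 x "ctrl".
labels = np.repeat(["geneA+ctrl", "geneB+ctrl", "ctrl"], 1000)

# The label vector must have exactly one entry per cell.
assert len(labels) == len(obs)

# Assigning a NumPy array is positional, so no index alignment takes place.
obs = obs.assign(condition=labels)
```

The explicit length check is there because a label vector that does not match the number of cells is an easy mistake to make when the counts are hard-coded.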

Syntactically this is fine, but it may not produce a strong model.

Setting other factors aside, can I use the code above to generate the condition column under this assumption?

your data: (screenshot)

my data: (screenshot)

In one of your datasets, the numbers of cells per perturbation and for 'ctrl' in adata.obs.condition seem to follow no pattern at all;
in my own adata.obs.condition, which I generate with the following code, the cell counts for the perturbations and 'ctrl' are almost equal:

image

My questions:

  1. Can the code above be used to generate the condition column?
  2. Does the number of cells per perturbation affect the results? If so, how should I decide how many cells to assign to each perturbation?
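For the second question, one way to at least inspect how cells are distributed across conditions is to tabulate counts and fractions. The sketch below uses a hypothetical `condition` Series built to mirror the 3000-cell example; in practice it would be adata.obs["condition"]. It only summarizes the labels and says nothing about how the distribution affects model quality.

```python
import pandas as pd

# Hypothetical condition column; in practice this would be adata.obs["condition"].
condition = pd.Series(
    ["geneA+ctrl"] * 1000 + ["geneB+ctrl"] * 1000 + ["ctrl"] * 1000,
    name="condition",
)

# Cells per condition, and the share of all cells each condition holds.
summary = pd.DataFrame({
    "n_cells": condition.value_counts(),
    "fraction": condition.value_counts(normalize=True),
})
print(summary)
```

Running the same summary on a published dataset and on your own makes the two distributions directly comparable.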

Looking forward to your reply.