snap-stanford/GEARS

Predictions: expression or delta expression?

Closed issue · 2 comments

Hi Yusuf et al., I have a really simple question. Running a small example for 5 epochs, I notice that about 20% of the predictions are negative, even though the training data are all nonnegative. Does GEARS predict expression directly, or an additive change in expression relative to control? Example below.

from gears import PertData, GEARS

# load the Dixit dataset and build the train/test split
pert_data = PertData('./data', default_pert_graph=False)
pert_data.load(data_name='dixit')
pert_data.prepare_split(split='simulation', seed=5)
pert_data.get_dataloader(batch_size=32, test_batch_size=128)

# set up and train a model
gears_model = GEARS(pert_data, device='cpu')
gears_model.model_initialize(hidden_size=64)
gears_model.train(epochs=5)

# predict
y = gears_model.predict([['USP13', 'USP15'], ['UTP6']])
# fraction of predicted values > 0 for the first perturbation
(list(y.values())[0] > 0).mean()  # 0.82

yhr91 commented

Thanks. Yes, this is a problem that we had not considered initially and that I only heard about recently. The final output is indeed expression, but it is obtained by adding the predicted delta expression to the control expression. In some datasets this sum can come out negative. The fix is to clip values at 0 so they never go below zero. I can add that in the next pip update.
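
To illustrate the point about control plus delta, here is a minimal sketch of how an additive prediction can dip below zero and how clipping at 0 removes the negatives. It is not the GEARS implementation: the function names `predict_expression` and `clip_at_zero` and all of the numbers are hypothetical, chosen only to show the arithmetic.

```python
# Illustrative sketch only; this is NOT GEARS internals.
# Function names and all values below are hypothetical.

def predict_expression(control, delta):
    """Additive model: predicted expression = control expression + predicted delta."""
    return [c + d for c, d in zip(control, delta)]


def clip_at_zero(values):
    """Replace negative predictions with 0.0 and leave the rest unchanged."""
    return [max(0.0, v) for v in values]


control = [0.5, 2.0, 0.0, 1.5]      # made-up control expression per gene
delta = [-0.75, 0.5, -0.25, -1.5]   # made-up predicted change per gene

raw = predict_expression(control, delta)
clipped = clip_at_zero(raw)

print(raw)      # [-0.25, 2.5, -0.25, 0.0]
print(clipped)  # [0.0, 2.5, 0.0, 0.0]
```

Until a fix lands in the package, the same clipping could presumably be applied to the output of `predict` on the user's side. Assuming each value in the returned dictionary is a NumPy array (as the `.mean()` call in the example above suggests), `np.clip(arr, 0, None)` would do the equivalent.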

Thank you for confirming!

I had actually assumed it was a fold change, which in hindsight was pretty boneheaded on my end. But I am getting much better performance now that I am using the predictions correctly. I'm excited to update all my experiments and see where things stand.