snap-stanford/GEARS

error about prepare_split

Closed this issue · 4 comments

This work is excellent and I am trying to replicate your results. However, I hit an error at the very beginning. I would be grateful if you could help me solve it. My code and the error are shown below:
[two images attached to the original post; not reproduced here]

Not sure why you see that error. I just ran this piece of code and everything seemed to work fine:

import sys
sys.path.append('../')

from gears import PertData

pert_data = PertData('./data') # specific saved folder
pert_data.load(data_name = 'adamson') # specific dataset name
pert_data.prepare_split(split = 'simulation', seed = 1) # get data split with seed
pert_data.get_dataloader(batch_size = 32, test_batch_size = 128) # prepare data loader

Thanks for your kindness and patience. I think I found the cause.
The code was probably written for macOS or Linux, but I am running it on a Windows laptop. There, the following line in your pertdata.py (line 166)
data_path = os.path.join(self.data_path, data_name)
produces './data\norman' instead of './data/norman'.
As a result,
self.dataset_name = data_path.split('/')[-1]
evaluates to 'data\norman' instead of 'norman'.

I worked around it by running
pert_data.dataset_name = 'norman'
before
pert_data.prepare_split(split = 'simulation', seed = 1)
pert_data.get_dataloader(batch_size = 32, test_batch_size = 128)

That fixed the problem, but I'm not sure whether there are other potential errors.
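
For anyone hitting the same thing, here is a minimal sketch of the path behaviour described above and one separator-independent way to get the dataset name. It is not the GEARS code: `dataset_name_from_path` is a hypothetical helper, and `ntpath` / `posixpath` are used only so that both the Windows and the POSIX behaviour can be reproduced on any machine.

```python
import ntpath      # Windows path rules, importable on any OS
import posixpath   # POSIX path rules, importable on any OS
from pathlib import PureWindowsPath


def dataset_name_from_path(data_path, path_module):
    """Hypothetical helper: last path component, independent of separator."""
    return path_module.basename(path_module.normpath(data_path))


# What os.path.join does on Windows (os.path is ntpath there).
win_path = ntpath.join('./data', 'norman')
# What os.path.join does on Linux/macOS (os.path is posixpath there).
posix_path = posixpath.join('./data', 'norman')

# Splitting on '/' only works when '/' was the separator used by join.
naive_win = win_path.split('/')[-1]
naive_posix = posix_path.split('/')[-1]

# basename/normpath follow the platform's own separator rules.
safe_win = dataset_name_from_path(win_path, ntpath)
safe_posix = dataset_name_from_path(posix_path, posixpath)

# pathlib gives the same answer through the .name attribute.
pathlib_win = PureWindowsPath(win_path).name

print(repr(win_path), repr(naive_win), repr(safe_win), repr(pathlib_win))
print(repr(posix_path), repr(naive_posix), repr(safe_posix))
```

In real code, `os.path.basename(os.path.normpath(data_path))` or `pathlib.Path(data_path).name` would pick the right rules for the current platform automatically, so the manual `pert_data.dataset_name = 'norman'` override should not be needed if the library derived the name that way.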

Also, I'm curious about the predictions of GEARS. As you mentioned in issue #47, the output has shape [perturbation_categories, genes]. But in your code, correct me if I'm wrong, you apply the new perturbation to all unperturbed cells (self.ctrl_adata) and then take the average over all of the resulting perturbed cells. Does that mean you could technically provide predictions at single-cell resolution? Could you please share why you give up that resolution and report the average instead?

Yes, we haven't tested our code on Windows machines so there may be some unexpected behavior in how paths are defined.

Thanks for your question regarding single-cell prediction. We avoid making predictions at the single-cell level because the mapping between control cells and perturbed cells is random. We don't have ground-truth information on how each individual cell responds to perturbation. Thus, both the metric computation and the model predictions are at the level of population averages.
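
To make the point about random pairing concrete, here is a toy NumPy sketch. It uses synthetic data and a hypothetical `toy_model` that simply adds a fixed shift to each control cell; none of it is GEARS code or GEARS data. It shows that a per-cell error depends on which control cell happens to be paired with which perturbed cell, while the population average does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ctrl, n_pert, n_genes = 200, 150, 5

# Synthetic control cells and, separately measured, perturbed cells.
# No cell is observed both before and after perturbation.
ctrl = rng.normal(1.0, 0.3, size=(n_ctrl, n_genes))
true_shift = np.array([0.5, 0.0, -0.4, 0.0, 0.2])
pert = rng.normal(1.0, 0.3, size=(n_pert, n_genes)) + true_shift


def toy_model(ctrl_cells):
    """Hypothetical stand-in for a trained model: one prediction per control cell."""
    return ctrl_cells + true_shift


per_cell_pred = toy_model(ctrl)           # shape (n_ctrl, n_genes)
pred_mean = per_cell_pred.mean(axis=0)    # shape (n_genes,)
true_mean = pert.mean(axis=0)

# Two equally arbitrary pairings of control cells to perturbed cells.
pairing_a = rng.permutation(n_ctrl)[:n_pert]
pairing_b = rng.permutation(n_ctrl)[:n_pert]
err_a = np.mean((per_cell_pred[pairing_a] - pert) ** 2)
err_b = np.mean((per_cell_pred[pairing_b] - pert) ** 2)

# The population-level comparison needs no pairing at all.
mean_err = np.mean((pred_mean - true_mean) ** 2)

# Reordering the control cells leaves the averaged prediction unchanged.
shuffled_mean = per_cell_pred[rng.permutation(n_ctrl)].mean(axis=0)

print("per-cell MSE, pairing A:", err_a)
print("per-cell MSE, pairing B:", err_b)
print("population-mean MSE:", mean_err)
```

The two per-cell errors differ only because of the arbitrary pairing, which is why a single-cell metric would be hard to interpret here, whereas the averaged prediction can be compared directly with the observed mean of the perturbed population.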