TencentAILabHealthcare/scBERT

Question about the SCDataset class


Hi there,

Thanks for the amazing work! While reading the source code, I ran into a question that I couldn't answer myself. It would be great if any of you could help!

In the SCDataset class, when a single cell is sampled, why does `__getitem__` append a 0 to the end of the gene expression vector (the line marked between the rows of `#` below)?

class SCDataset(Dataset):
    def __init__(self, data):
        super().__init__()
        self.data = data

    def __getitem__(self, index):
        rand_start = random.randint(0, self.data.shape[0]-1)
        full_seq = self.data[rand_start].toarray()[0]
        full_seq[full_seq > (CLASS - 2)] = CLASS - 2
        full_seq = torch.from_numpy(full_seq).long()
        ############################################################
        full_seq = torch.cat((full_seq, torch.tensor([0]))).to(device)
        ############################################################

        return full_seq

    def __len__(self):
        return self.data.shape[0]
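
To make the question concrete, here is a minimal sketch of what I understand the clipping and appending steps to do. It is a plain-Python restatement using lists instead of sparse matrices and tensors. The value `CLASS = 7`, the helper name `clip_and_pad`, and the toy counts are all made up for illustration and are not taken from the repository.

```python
CLASS = 7  # hypothetical number of expression bins, chosen only for this example


def clip_and_pad(counts, num_classes=CLASS):
    """Mirror the two steps in __getitem__ using plain lists instead of tensors."""
    # Cap every count at num_classes - 2, like
    # full_seq[full_seq > (CLASS - 2)] = CLASS - 2
    clipped = [min(c, num_classes - 2) for c in counts]
    # Append a single trailing 0, like
    # torch.cat((full_seq, torch.tensor([0])))
    return clipped + [0]


cell = [0, 3, 9, 1, 0]  # made-up expression counts for 5 genes
seq = clip_and_pad(cell)
print(len(cell), len(seq))  # 5 6
print(seq)                  # [0, 3, 5, 1, 0, 0]
```

So the returned sequence is one element longer than the number of genes, and the extra element has the same value as a gene with zero expression. What I would like to understand is what that last position is meant to represent.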

Thanks!