Question about Model

gdolsten opened this issue 2 years ago · 11 comments

Hi, I am reading through the code and trying to understand how the model works. Could you clarify what batch.pop("Y") means? What does batch.pop do, and what is the data type of batch?

class Module(pl.LightningModule):
    def training_step(self, batch, batch_idx):
        Y = batch.pop("Y")
        logits = self(**batch)
        loss = self.loss(logits, Y)
        self.log("train/loss", loss)
        return loss

Answer 1 · 2022-12-09T21:05:21.000Z

Never mind, I see now:

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        x = row.seq
        # x = x[400:600]
        # x = x[300:700]
        x = self.tokenizer(
            x,
            padding="max_length",
            max_length=self.max_length,
            return_token_type_ids=False,
            return_tensors="pt",
            truncation=True,
        )
        d = dict(
            input_ids=x["input_ids"].flatten(),
            attention_mask=x["attention_mask"].flatten(),
            Y=torch.tensor(row[self.features].values.astype(np.uint8)),
        )
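
For anyone landing here with the same question, a minimal sketch of what happens, using plain Python lists instead of tensors. It assumes the dataset returns the dict d built above; PyTorch's default collation then turns a list of such dicts into one dict with the same keys. The collate and model functions below are hypothetical stand-ins, not code from this repository.

```python
def collate(samples):
    """Toy stand-in for default collation: group values by key."""
    return {key: [s[key] for s in samples] for key in samples[0]}


def model(input_ids, attention_mask):
    """Stand-in for self(**batch): accepts exactly the remaining keys."""
    return [sum(ids) for ids in input_ids]


samples = [
    {"input_ids": [1, 2, 3], "attention_mask": [1, 1, 1], "Y": [0, 1]},
    {"input_ids": [4, 5, 0], "attention_mask": [1, 1, 0], "Y": [1, 1]},
]

batch = collate(samples)      # a dict with keys input_ids, attention_mask, Y
Y = batch.pop("Y")            # removes "Y" from the dict and returns its value
remaining_keys = sorted(batch)
logits = model(**batch)       # remaining entries are passed as keyword arguments

print(type(batch).__name__, remaining_keys)
```

Because pop removes the key, the labels are not passed to the model when the rest of the batch is unpacked with **batch.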

Answer 2 · 2022-12-09T23:48:19.000Z

Alright, will close for now. Let me know if you have any other questions!

Answer 3 · 2022-12-10T18:48:24.000Z

Thanks so much!

I was just wondering if you could explain why GPNDataModule is used in /chromatin/trainer.py, while Trainer is used with DataCollatorForLanguageModelingSpan in /mlm/run_mlm_custom.py. What is the difference between these two training scripts? Is /chromatin/ for fine-tuning and /mlm/ for the language modeling task?

Answer 4 · 2022-12-12T02:56:59.000Z

Another question – in run_mlm_custom you have:
config = CONFIG_MAPPING[model_args.model_type]()
And when running the model you have: --model_type ConvNet \
But I get the following error:

config = CONFIG_MAPPING['ConvNet']()
KeyError: 'ConvNet'

Answer 5 · 2022-12-12T16:42:33.000Z

One more question:

        self.conv = nn.Sequential(
            TransposeLayer(),
            nn.Conv1d(
                in_channels=hidden_size,
                out_channels=hidden_size,
                padding="same",
                **kwargs,
            ),
            TransposeLayer(),
            nn.GELU(),
            nn.LayerNorm(hidden_size),
        )

Why do you have a TransposeLayer() here? Won't the convolution output be the same with or without a transpose? (since the convolution itself would just be transposed, no?)

Secondly, you have:

class OneHotEmbedding(nn.Module):
    def __init__(self, hidden_size=None):
        super().__init__()
        self.hidden_size = hidden_size

    def forward(self, x):
        return F.one_hot(x, num_classes=self.hidden_size).float()

with

class ConvNetConfig(PretrainedConfig):
    def __init__(
        self,
        vocab_size=7,
        hidden_size=512,
        n_layers=30,
        kernel_size=9,
        dilation_double_every=1,
        dilation_max=32,
        dilation_cycle=6,
        initializer_range=0.02,
        **kwargs
    ):

Is there a reason you are initializing the OneHotEmbedding in 512 dimensions?

Answer 6 · 2022-12-12T20:42:54.000Z

That's right, everything in mlm/ is pre-training and everything in chromatin/ is fine-tuning. Unfortunately, we used a different framework for each: Hugging Face for pre-training and PyTorch Lightning for fine-tuning. I'm hoping to re-organize the code in the next 2 months, using just Hugging Face.

Answer 7 · 2022-12-12T20:45:14.000Z

Usually import gpn.mlm makes sure that 'ConvNet' is registered. Could you provide some more details on how you run the script?
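
To illustrate why the KeyError can appear, here is a toy sketch of import-time registration. The CONFIG_MAPPING dict, register function and import_gpn_mlm helper below are hypothetical stand-ins written for this example; they are not the transformers objects, and how gpn.mlm actually registers the config is an assumption (in transformers, custom configs are typically registered through AutoConfig.register).

```python
CONFIG_MAPPING = {}  # toy stand-in for the real mapping, starts without "ConvNet"


def register(model_type, config_cls):
    """Add a config class to the toy mapping under its model type."""
    CONFIG_MAPPING[model_type] = config_cls


class ConvNetConfig:
    model_type = "ConvNet"


def import_gpn_mlm():
    """Simulates the registration side effect of importing the package."""
    register(ConvNetConfig.model_type, ConvNetConfig)


# Looking up the model type before the registering import has run fails.
try:
    CONFIG_MAPPING["ConvNet"]()
    lookup_failed = False
except KeyError:
    lookup_failed = True

import_gpn_mlm()
config = CONFIG_MAPPING["ConvNet"]()  # works once registration has happened

print(lookup_failed, type(config).__name__)
```

If the lookup runs in a process where the registering import never executed, the key is simply missing, which matches the error above.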

Answer 8 · 2022-12-12T20:50:30.000Z

The transpose layers make sure the tensor dimensions are compatible with each operation. nn.Conv1d expects (batch, channels, position), while nn.LayerNorm normalizes over the last dimension and so expects (batch, position, channels).
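
A small shape check may make this concrete. The TransposeLayer below is a hypothetical re-implementation that swaps dimensions 1 and 2 (the repository's version may differ), and the sizes are made up for the example.

```python
import torch
import torch.nn as nn


class TransposeLayer(nn.Module):
    """Hypothetical stand-in: swaps the position and channel dimensions."""

    def forward(self, x):
        return x.transpose(1, 2)


batch, length, hidden = 2, 10, 8
x = torch.randn(batch, length, hidden)  # (batch, position, channels)

conv = nn.Conv1d(hidden, hidden, kernel_size=3, padding="same")

# Without a transpose, Conv1d reads dim 1 (position, size 10) as channels
# and rejects the input because it was built for 8 input channels.
try:
    conv(x)
    conv_failed = False
except RuntimeError:
    conv_failed = True

block = nn.Sequential(
    TransposeLayer(),       # -> (batch, channels, position) for Conv1d
    conv,
    TransposeLayer(),       # -> (batch, position, channels) for LayerNorm
    nn.GELU(),
    nn.LayerNorm(hidden),   # normalizes over the last (channel) dimension
)
out = block(x)

print(conv_failed, tuple(out.shape))
```

So the output is not the same with or without the transpose: without it, the convolution would mix along the wrong axis, or fail outright when the sequence length differs from hidden_size.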

The OneHotEmbedding into 512 dimensions is just a simplification so that all the convolutional layers have the same dimensions. It's certainly a waste of parameters and compute.
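
A quick sketch of what that waste looks like, assuming token ids lie in the range [0, vocab_size) with the default values from ConvNetConfig (vocab_size=7, hidden_size=512). The example token ids are made up.

```python
import torch
import torch.nn.functional as F

vocab_size, hidden_size = 7, 512  # defaults from ConvNetConfig

tokens = torch.tensor([[0, 3, 6, 2]])  # (batch, position), ids below vocab_size
embedded = F.one_hot(tokens, num_classes=hidden_size).float()

# Only the first vocab_size channels can ever be non-zero;
# the remaining hidden_size - vocab_size channels are always zero.
unused = embedded[..., vocab_size:]
all_unused_zero = bool((unused == 0).all())

print(tuple(embedded.shape), all_unused_zero)
```

The first convolution therefore carries weights for input channels that never receive a signal, which is the extra cost mentioned above.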

Answer 9 · 2022-12-13T01:03:09.000Z

Thanks, with your help I understand all of these! I now have the model running, and I just wanted to check: how long does one batch of 128 take for you (for the MLM task) on a single GPU?
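
In case it helps others compare numbers, this is roughly how a per-step time could be measured. It is only a sketch: time_step and fake_step are hypothetical names, and fake_step is a placeholder for one real training step on a batch of 128. On a GPU, a synchronization call such as torch.cuda.synchronize would need to be passed as sync, since CUDA kernels run asynchronously.

```python
import time


def time_step(step_fn, n_warmup=3, n_iters=10, sync=None):
    """Return the mean wall-clock seconds per call of step_fn after warm-up."""
    for _ in range(n_warmup):
        step_fn()
    if sync is not None:
        sync()  # make sure pending asynchronous work has finished
    start = time.perf_counter()
    for _ in range(n_iters):
        step_fn()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / n_iters


def fake_step():
    """Placeholder workload standing in for one training step."""
    sum(i * i for i in range(10_000))


mean_seconds = time_step(fake_step)
print(f"{mean_seconds * 1e3:.3f} ms per step")
```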

Answer 10 · 2022-12-18T14:09:54.000Z

Hey, would love to know this^ for benchmarking purposes

Answer 11 · 2022-12-21T14:28:49.000Z

Hey! Sorry, I don't have everything set up to easily check this scenario right now.