state-spaces/s4

Validation dataset is missing when training IMDB

mathematicallfs opened this issue · 7 comments

When training on the IMDB dataset, I get the error 'An invalid dataloader was returned from SequenceLightningModule.val_dataloader(). Found None.' This seems to be because the LRA paper uses the test set as the validation set for IMDB, but in the lra.py file
https://github.com/HazyResearch/state-spaces/blob/a246043077dbff8563a6b172426443dced9a9d96/src/dataloaders/lra.py#L71, self.dataset_val is set to None.
Should it be changed to self.dataset_test instead?
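If it helps, here is a minimal sketch of the change I'm suggesting (not the exact upstream code; attribute names follow the linked file, surrounding code elided):

```python
# Sketch of the proposed one-line change in src/dataloaders/lra.py,
# inside the IMDB dataset's setup().
def setup(self, stage=None):
    ...  # existing download/tokenization code
    # LRA evaluates IMDB on the test split, so reuse it for validation
    # instead of leaving the val slot empty:
    self.dataset_val = self.dataset_test  # was: self.dataset_val = None
```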

It ran fine for me. Are you on the correct version of PyTorch Lightning? I did have to make a small change so that the ModelCheckpoint callback monitors a different key (or, alternatively, you can turn off model checkpointing in the callbacks).
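For reference, a minimal sketch of that checkpoint change using the plain PyTorch Lightning API (the "val/accuracy" key is a placeholder; monitor whatever key your validation loop actually logs — in this repo the key comes from the Hydra config rather than code like this):

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

# Option 1: monitor a key that your validation step actually logs
# ("val/accuracy" here is a placeholder, not a key this repo guarantees).
checkpoint = ModelCheckpoint(monitor="val/accuracy", mode="max")
trainer = pl.Trainer(callbacks=[checkpoint])

# Option 2: disable model checkpointing entirely.
trainer = pl.Trainer(enable_checkpointing=False)
```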

I am using pytorch-lightning==2.0.8; is that a supported version?

I also tried another version of PyTorch Lightning (2.0.4), but got the same error that the val_dataloader is None.
With pytorch-lightning==2.0.8, I tried setting self.dataset_val = self.dataset_test, and it worked: I get 88% accuracy with the default training script. The intriguing thing is that when I train with the V1 version of the code, I only get 78% accuracy on IMDB, which is far worse than the result reported in the S4 paper. I have no idea why this happens.

I think this version of the repo was last updated for pytorch-lightning==1.9.3, which you can see from the requirements.txt file.

For training with versions of this codebase prior to v4, you need to recompile the Cauchy/Vandermonde CUDA kernels if you're using them; that might explain numerical issues. The very first version of the paper also used very suboptimal hyperparameters for IMDB (LRA-Text) that get much lower performance. Why are you using v1?
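If it's unclear whether the compiled kernel is actually being picked up, a quick check along these lines can help (the module name cauchy_mult is an assumption based on what extensions/cauchy/setup.py built in earlier releases; adjust it to whatever your setup.py produces):

```python
# Sanity check that the compiled CUDA kernel is importable.
# NOTE: "cauchy_mult" is an assumed extension name. If the import fails,
# the model falls back to a slower reference implementation, which can
# also differ numerically after a PyTorch/CUDA version mismatch.
try:
    import cauchy_mult  # compiled extension (assumed name)
    print("Compiled Cauchy kernel found:", cauchy_mult.__file__)
except ImportError:
    print("No compiled Cauchy kernel; rebuild it against your current "
          "PyTorch/CUDA (e.g. from the extension directory, "
          "`python setup.py install`).")
```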

Thanks for your comments. I am using v1 because the model in v1 is smaller, which makes it easier for me to tune some hyperparameters.

I would not recommend using earlier versions; a lot has changed since then. Model size is easily adjustable (set model.n_layers or model.d_model, e.g. as Hydra overrides on the training command).

I see, thanks for your suggestion!