chengkai-liu/Mamba4Rec

The results in the paper diverge significantly from the results reproduced by this repo.

Closed this issue · 11 comments

Hi authors.

I have downloaded the package and tested it on the ml-1m dataset. The full logs are provided in the attachments.

The GRU4Rec results are similar to those in the paper, but the Mamba4Rec results are poor.

Maybe something has not been updated in the current version?

reproduced: OrderedDict([('hit@10', 0.0096), ('ndcg@10', 0.0041), ('mrr@10', 0.0024)])
reported in the paper: hit@10: 0.3121, ndcg@10: 0.1822, mrr@10: 0.1425
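
For readers unfamiliar with the three metrics, here is a minimal sketch of how they are commonly computed. It assumes one held-out target item per user and takes the 1-based rank of that item among the scored candidates; `topk_metrics` is a hypothetical helper name, and the RecBole evaluator may differ in details.

```python
import math


def topk_metrics(ranks, k=10):
    """Compute hit@k, ndcg@k and mrr@k.

    ranks: 1-based rank of the single held-out item for each user
           (assumption: exactly one relevant item per user).
    """
    n = len(ranks)
    # A hit is counted when the target item appears in the top k.
    hit = sum(1 for r in ranks if r <= k) / n
    # With one relevant item the ideal DCG is 1, so NDCG reduces to 1 / log2(rank + 1).
    ndcg = sum(1.0 / math.log2(r + 1) for r in ranks if r <= k) / n
    # Reciprocal rank, counted as 0 when the item falls outside the top k.
    mrr = sum(1.0 / r for r in ranks if r <= k) / n
    return {f"hit@{k}": hit, f"ndcg@{k}": ndcg, f"mrr@{k}": mrr}


# Made-up ranks for three users, purely for illustration.
print(topk_metrics([1, 2, 11]))
```

Under this definition, a hit@10 of 0.0096 means the target item reached the top 10 for fewer than 1% of users.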

Mamba4Rec-ml-1m-Mar-09-2024_20-09-16-ae14e3.log
GRU4Rec-ml-1m-Mar-09-2024_20-46-13-876bfd.log

Hello,

The current version works well on my device. You can also follow the code and implement Mamba4Rec in your own framework.

Your settings look the same as mine, but your accuracy is close to 0 in every epoch. You may need to check your environment, in particular the RecBole and Mamba-ssm installations. My ML-1M log is attached.

If you find the issue or a bug, please post an update and I will fix it.
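
One quick way to compare environments is to print the installed versions of the relevant packages and post them alongside the logs. The sketch below uses only the standard library. The distribution names (`torch`, `recbole`, `mamba-ssm`, `causal-conv1d`) are assumptions about how the packages are registered with pip, and `report_versions` is a hypothetical helper, not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages):
    """Return {distribution name: installed version, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Assumed distribution names; adjust them to match your own installation.
for name, ver in report_versions(["torch", "recbole", "mamba-ssm", "causal-conv1d"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

A package reported as not installed, or a version that differs from the one in the shared environment file, would be the first thing to look at.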

Thank you.

Mamba4Rec-ml-1m.log

Thank you very much for your swift reply.

Over the past few days, we also invited peers from other research groups, using different hardware platforms, to run the code. Unfortunately, their results also diverge significantly from those reported in the paper. I am therefore reopening this issue to ask about your specific environment. (The issue is non-trivial, since the results should not change this much just because of package version differences.) Could you please share the environment config file so that the community can reproduce the results?

Big thanks!

Thank you. I will update the document and environment later.

Attached is the environment.txt file. You can convert it into a .yaml format. I have tried to reinstall the environment, and the training process and results are still correct on my device. Please feel free to reach out via email for further discussion.

Thank you for reporting the issue!

environment.txt

I have uploaded my environment.yaml to the repository.

My approach is to first create a new environment named mamba4rec with torch, torchaudio, and torchvision installed. Then I run conda env update -f environment.yaml --prune to update the environment with the remaining dependencies specified in the environment.yaml file.

Hi, I am also struggling to reproduce the results. What results did you get?
Here is my training log:

Mamba4Rec-ml-1m-Mar-13-2024_14-14-01-cb3b09.log

Thank you for your training log. I am looking into it; the gap could be due to differences in environment and package versions. Your results roughly match the expected performance of Mamba4Rec: at least the validation and test metrics are relatively high compared to SASRec and BERT4Rec. Still, the per-epoch numbers differ from the results on my device.

On the other hand, HowardZJU might have issues with environment configuration, and it's possible that the Mamba block failed to train successfully, resulting in NDCG scores close to 0 for each epoch.

It seems the results can fluctuate slightly even when I fix the global seed. In my ml-1m reproduction, different runs with the same seed differ by about 0.0002 after the first epoch. I guess this might be related to the hardware-level implementation of the Mamba block. So is this slight fluctuation OK?
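
To tell this kind of small run-to-run noise apart from a run that failed to train, one option is to compare the per-epoch metric curves of two logs against a tolerance. This is only a sketch: `max_abs_diff` and `same_within` are hypothetical helpers, the default tolerance is an arbitrary choice, and the numbers are made up for illustration rather than taken from the attached logs.

```python
def max_abs_diff(run_a, run_b):
    """Largest absolute per-epoch gap between two metric curves."""
    if len(run_a) != len(run_b):
        raise ValueError("runs must cover the same number of epochs")
    return max(abs(a - b) for a, b in zip(run_a, run_b))


def same_within(run_a, run_b, tol=5e-4):
    """True if the two runs agree to within tol at every epoch."""
    return max_abs_diff(run_a, run_b) <= tol


# Illustrative ndcg@10 values per epoch (not real results).
seed_run_1 = [0.1000, 0.1500, 0.1700]
seed_run_2 = [0.1002, 0.1499, 0.1701]
print(max_abs_diff(seed_run_1, seed_run_2))
print(same_within(seed_run_1, seed_run_2))
```

A gap on the order of 0.0002 passes this check, while a run whose metrics stay near 0 would fail it by a wide margin.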

Yes, the minor fluctuations also persist on my device with the fixed hyperparameters and random seeds. Across different devices and environments, the fluctuations may be higher.

The fluctuations can sometimes lead to higher validation performance, but the corresponding overfitting can also cause test performance to decrease.

I have re-run Mamba4Rec on ML-1M from scratch, starting by cloning the repository and setting up a new conda environment. The results have been reproduced with slight fluctuations. Hopefully, this new log file attached is helpful to you.

Mamba4Rec-ml-1m-Mar-14-2024_09-08-06-4dc4b9.log

Thank you for replying! It seems like all baselines also perform slightly worse on my device lol