IBM/TabFormer

Missing args.py

reactiv opened this issue · 7 comments

Hello!

The code and readme both reference an args.py but it doesn't seem to be committed?

Thanks for publishing your code!

jlko commented

Hello!
I would also like to run the code and have run into the same trouble as @reactiv.
If you could upload args.py, that would be great.
However, please make sure that the code actually works once args.py is uploaded.
Here are some things I have noticed that I think would need to change before the code can run successfully, even with args.py added (I guessed the values for args.py to get the code to run a bit):

  • You have to downgrade to transformers==3.
  • You have to download and put the dataset in the correct folder (It seemed to work for PRSA for me, but could you maybe add documentation for how exactly you expect that to be done?)
  • random_split_dataset has to be imported from misc.utils and not from dataset.dataset
  • tabformer_bert.py is missing some imports from transformers
  • But even if you get to this point and run TabFormerBertLM with the PRSA data, the code runs for a while (preprocessing the data, I think), but then I eventually get No such file or directory: 'output/vocab.nb' from vocab.py.
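
A possible workaround for the last point, assuming the error arises because the output/ directory does not exist yet when the vocabulary file is written (this thread does not confirm the cause), is to create the directory before training starts. The helper name `ensure_parent_dir` is hypothetical and not part of the repository; this is a minimal sketch only.

```python
import os


def ensure_parent_dir(file_path: str) -> str:
    """Create the directory that will hold file_path if it is missing.

    Hypothetical helper, not part of TabFormer.
    """
    parent = os.path.dirname(file_path)
    if parent:  # an empty string means the current working directory
        os.makedirs(parent, exist_ok=True)
    return file_path
```

Calling `ensure_parent_dir("output/vocab.nb")` before the vocabulary is saved would create output/ if needed; running `mkdir -p output` by hand has the same effect.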

Thanks for publishing the code. Could you maybe help us get it running? Any help is much appreciated!

Hi @reactiv and @jlko -
Thank you for your interest in the work. I think I overlooked adding args.py in the public version. In the latest commit, I have added it.

@jlko: I am looking at your other comments now and will update the code and documentation to make it easier to run.

Hello folks,

The latest PR should cover all the issues that you were facing. Give it a try and lemme know if you are still struggling.

Also, regarding the PRSA dataset: please download it from https://www.kaggle.com/sid321axn/beijing-multisite-airquality-data-set and place it in /data/prsa/. I have verified the runs for both the card-data and PRSA datasets on my end, and they are training properly. However, if you still struggle to get it to work, I would be more than happy to take a look.
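
Before starting a run, it may help to confirm that the dataset landed where the code expects it. The sketch below assumes the Kaggle download unpacks into CSV files placed directly in the dataset folder; the function name `find_csv_files` is hypothetical and not part of the repository.

```python
from pathlib import Path
from typing import List


def find_csv_files(data_dir: str) -> List[str]:
    """Return the sorted names of the CSV files found directly in data_dir.

    Hypothetical helper, not part of TabFormer. Raises FileNotFoundError
    when the directory itself is missing.
    """
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Dataset directory not found: {root}")
    return sorted(p.name for p in root.glob("*.csv"))
```

An empty list from `find_csv_files` on the PRSA folder would suggest that the archive was not extracted or was placed one level too deep.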

jlko commented

Dear ink-pad,

Wow, thank you, that is lovely. I will try it out soon and get back to you!
Thanks for getting back to us so quickly and comprehensively.

Best,
Jannik

Hi @ink-pad!

Thanks for getting back so quickly - I've had a look and managed to get it running for both datasets.

Thanks again for sharing your research!

Hello,
I have been trying to run the code, but I am getting an error regarding the output directory. Please let me know if I am making a mistake. The code creates the vocab.nb file and the logs folder in the output folder, but I still get this error. Kindly let me know.

Thanks and Regards
Navya

!python3 "/content/drive/My Drive/TabFormer-main/TabFormer-main/main.py" --do_train --mlm --field_ce --lm_type bert --field_hs 64 --data_type "prsa" --output_dir "/content/drive/My Drive/TabFormer-main/TabFormer-main/output/"

ValueError: Can't find a valid checkpoint at /content/drive/My Drive/TabFormer-main/TabFormer-main/output/
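
This message looks like the one the Hugging Face trainer raises when it is asked to resume from a directory that holds no saved weights. That reading is an assumption, as is the guess that a newer transformers release than the transformers==3 mentioned earlier in this thread is installed. To see whether the output folder contains anything to resume from, a small check like the following can help. It relies only on the `checkpoint-<step>` folder naming that the trainer uses; the function name `latest_checkpoint` is hypothetical.

```python
import os
import re
from typing import Optional

# Trainer checkpoints are saved in folders named "checkpoint-<step>".
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir: str) -> Optional[str]:
    """Return the checkpoint folder with the highest step, or None.

    Hypothetical helper, not part of TabFormer or transformers.
    """
    if not os.path.isdir(output_dir):
        return None
    best_step, best_path = -1, None
    for name in os.listdir(output_dir):
        match = _CHECKPOINT_RE.match(name)
        path = os.path.join(output_dir, name)
        if match and os.path.isdir(path) and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path
```

If this returns `None` for a fresh output folder, there is nothing to resume, and the run would need to start from scratch rather than from that path.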

Have you rectified it?