salesforce/decaNLP

something wrong

yuppx opened this issue · 10 comments

yuppx commented

First, when I run sst, I get [Errno 2] No such file or directory: '/decaNLP/.data/sst/train_binary_sent.csv'.
Second, when I run multinli.in.out, I get [Errno 2] No such file or directory: '/decaNLP/.data/multinli/multinli_1.0/train.jsonl'.
Third, '/decaNLP/.data/multinli/multinli_1.0/train.jsonl' is missing.
Finally, can you tell me where in the code these datasets are downloaded from? Thank you.

bmccann commented

Downloads should happen automatically, but the URLs are in text/torchtext/datasets/generic.py. Can you delete .data and send me all of the output from running on sst, from the start until you hit the error?
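For example (a sketch only; the image name follows the predict.py command later in this thread, and the train.py flags follow the README's training command, so they may differ from your setup):

rm -rf .data
nvidia-docker run -it --rm -v `pwd`:/decaNLP/ decanlp bash -c "python /decaNLP/train.py --train_tasks sst --gpus 0"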

yuppx commented

Is all the downloaded training data, like 'multinli_1.0.zip', in the '.embeddings' folder?

yuppx commented

What should I do if I want to switch the training data to another language, such as Japanese?

xuf12 commented

@bmccann WinogradSchema doesn't have a URL. I only found this data file, but it doesn't seem to match what the script does. Would you please update it? Thank you.

bmccann commented

@yuppx the data is downloaded into .data.
If you would like to add additional datasets, you can see how I do that in text/torchtext/datasets/generic.py; a rough sketch of the pattern follows.
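This is a hypothetical sketch, not the actual decaNLP code: the class name and file format are made up, the real classes in generic.py differ in detail, and for Japanese you would also need a Field whose tokenizer handles Japanese text.

import os
import json
from torchtext import data  # decaNLP bundles its own fork under text/torchtext

# Hypothetical dataset class for illustration: reads one JSON object per line
# with 'context', 'question', and 'answer' keys, mirroring how the tasks in
# generic.py map every example onto (context, question, answer) fields.
class MyJapaneseDataset(data.Dataset):

    @staticmethod
    def sort_key(ex):
        # Batch together examples with similar context/question lengths.
        return data.interleave_keys(len(ex.context), len(ex.question))

    def __init__(self, path, field, **kwargs):
        fields = [('context', field), ('question', field), ('answer', field)]
        examples = []
        with open(os.path.expanduser(path), encoding='utf-8') as f:
            for line in f:
                ex = json.loads(line)
                examples.append(data.Example.fromlist(
                    [ex['context'], ex['question'], ex['answer']], fields))
        super(MyJapaneseDataset, self).__init__(examples, fields, **kwargs)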

@xuf12 please see the Getting Started section of the README.md. Make sure to run the commands related to local_data/schema.txt. That's the file you are looking for.

xuf12 commented

Ah sorry I overlooked that part. Thanks!

yuppx commented

How do I make a test set? For example, after machine translation training is complete, how do I get the model to actually translate an English sentence?

bmccann commented

If you look in text/torchtext/datasets/generic.py, you can see how I add different datasets. The WinogradSchema dataset would be a good example of one that uses a local file. I'll see if I can add a generic data loader for a specific format so you don't have to add custom classes.

bmccann commented

@yuppx

This should help you test on new examples.

Inference on a Custom Dataset

Using a pretrained model or a model you have trained yourself, you can easily run on new, custom datasets by following the instructions below. In this example, we use the checkpoint for the best MQAN trained on the entirety of decaNLP (see the Pretrained Models section for how to get this checkpoint) to run on my_custom_dataset.

mkdir .data/my_custom_dataset/
touch .data/my_custom_dataset/val.jsonl
# TODO: add examples line by line to val.jsonl in the form of a JSON dict: {"context": "The answer is answer.", "question": "What is the answer?", "answer": "answer"}
nvidia-docker run -it --rm -v `pwd`:/decaNLP/  decanlp bash -c "python /decaNLP/predict.py --evaluate valid --path /decaNLP/mqan_decanlp_qa_first --checkpoint_name model.pth --gpu 0 --tasks my_custom_dataset"

If you want to run without any answers, you can leave those blank ("answer": ""), and just ignore the metrics. The predictions for each example in your custom dataset will still be written to file.
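To fill in the TODO above, a short sketch like this writes examples in the expected format (the example content is a placeholder; replace it with your own data):

import json

# Placeholder examples. Leave "answer" empty ("") if you only want
# predictions and plan to ignore the metrics.
examples = [
    {"context": "The answer is answer.", "question": "What is the answer?",
     "answer": "answer"},
]

# One JSON object per line, matching the format predict.py expects above.
with open('.data/my_custom_dataset/val.jsonl', 'w') as f:
    for ex in examples:
        f.write(json.dumps(ex) + '\n')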