something wrong
yuppx opened this issue · 10 comments
First, when I run sst: [Errno 2] No such file or directory: '/decaNLP/.data/sst/train_binary_sent.csv'
Second, when I run multinli.in.out: [Errno 2] No such file or directory: '/decaNLP/.data/multinli/multinli_1.0/train.jsonl'
Third, '/decaNLP/.data/multinli/multinli_1.0/train.jsonl' is missing. Finally, can you tell me the download URLs for these datasets? Thank you.
Downloads should happen automatically, but the URLs are in text/torchtext/datasets/generic.py. Can you delete .data and send me all of the output from running on sst, from the start until you get the error?
Is all of the downloaded training data, like 'multinli_1.0.zip', in the folder '.embeddings'?
What should I do if I want to switch the training data to another language, such as Japanese?
@bmccann WinogradSchema doesn't have a URL. I only found this data file, but it doesn't seem to match what the script does. Would you please update it? Thank you.
@yuppx the data is downloaded into .data
If you would like to add additional datasets, you can see how I do that in text/torchtext/datasets/generic.py
@xuf12 please see the Getting Started section of the README.md. Make sure to run the commands related to local_data/scheme.txt. That’s the file you are looking for.
Ah sorry I overlooked that part. Thanks!
How do I make a test set? For example, after machine translation training is complete, how can I have the model actually translate an English sentence?
If you look in text/torchtext/datasets/generic.py, you can see how I add different datasets. The WinogradSchema dataset would be a good example of one that uses a local file. I'll see if I can add a generic data loader for a specific format so you don't have to add custom classes.
This should help with your desire to test on new examples.
Inference on a Custom Dataset
Using a pretrained model or a model you have trained yourself, you can run on new, custom datasets easily by following the instructions below. In this example, we use the checkpoint for the best MQAN trained on the entirety of decaNLP (see the section on Pretrained Models for how to get this checkpoint) to run on my_custom_dataset.
mkdir .data/my_custom_dataset/
touch .data/my_custom_dataset/val.jsonl
#TODO add examples line by line to val.jsonl in the form of a JSON dict: {"context": "The answer is answer.", "question": "What is the answer?", "answer": "answer"}
nvidia-docker run -it --rm -v `pwd`:/decaNLP/ decanlp bash -c "python /decaNLP/predict.py --evaluate valid --path /decaNLP/mqan_decanlp_qa_first --checkpoint_name model.pth --gpu 0 --tasks my_custom_dataset"
If you want to run without any answers, you can leave those blank ("answer": ""), and just ignore the metrics. The predictions for each example in your custom dataset will still be written to file.
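As a concrete illustration of the steps above, here is a minimal Python sketch that builds `val.jsonl` in the expected one-JSON-dict-per-line format. The two example entries are hypothetical placeholders; replace them with your own context/question/answer triples.

```python
import json
import os

# Create the dataset directory if it does not already exist
# (equivalent to the mkdir step above).
os.makedirs(".data/my_custom_dataset", exist_ok=True)

# Hypothetical examples -- substitute your own data.
# Each entry must have "context", "question", and "answer" keys.
examples = [
    {"context": "The answer is 42.",
     "question": "What is the answer?",
     "answer": "42"},
    {"context": "Turing proposed the imitation game.",
     "question": "Who proposed the imitation game?",
     "answer": "Turing"},
]

# Write one JSON dict per line, as the instructions above describe.
with open(".data/my_custom_dataset/val.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

If you have no gold answers, set `"answer": ""` for each entry and ignore the reported metrics, as noted above.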