salesforce/decaNLP

something wrong

yuppx opened this issue · 10 comments

yuppx commented

First, when I run sst, I get [Errno 2] No such file or directory: '/decaNLP/.data/sst/train_binary_sent.csv'.
Second, when I run multinli.in.out, I get [Errno 2] No such file or directory: '/decaNLP/.data/multinli/multinli_1.0/train.jsonl'.
Third, '/decaNLP/.data/multinli/multinli_1.0/train.jsonl' is missing.
Finally, can you tell me where in the code these datasets are downloaded from? Thank you.

bmccann commented

Downloads should happen automatically, but the URLs are in text/torchtext/datasets/generic.py. Can you delete .data and send me all of the output from running on sst, from the start until you hit the error?
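For example (a sketch only; the image name follows the predict.py command later in this thread, and the train.py flags follow the README's training command, so they may differ from your setup):

rm -rf .data
nvidia-docker run -it --rm -v `pwd`:/decaNLP/ decanlp bash -c "python /decaNLP/train.py --train_tasks sst --gpus 0"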

yuppx commented

Is all the downloaded training data, like 'multinli_1.0.zip', in the '.embeddings' folder?

yuppx commented

What should I do if I want to switch the training data to another language, such as Japanese?

xuf12 commented

@bmccann WinogradSchema doesn't have a URL. I only found this data file, but it doesn't seem to match what the script does. Would you please update it? Thank you.

bmccann commented

@yuppx the data is downloaded into .data.
If you would like to add additional datasets, you can see how I do that in text/torchtext/datasets/generic.py; a rough sketch of the pattern follows.
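This is a hypothetical sketch, not the actual decaNLP code: the class name and file format are made up, the real classes in generic.py differ in detail, and for Japanese you would also need a Field whose tokenizer handles Japanese text.

import os
import json
from torchtext import data  # decaNLP bundles its own fork under text/torchtext

# Hypothetical dataset class for illustration: reads one JSON object per line
# with 'context', 'question', and 'answer' keys, mirroring how the tasks in
# generic.py map every example onto (context, question, answer) fields.
class MyJapaneseDataset(data.Dataset):

    @staticmethod
    def sort_key(ex):
        # Batch together examples with similar context/question lengths.
        return data.interleave_keys(len(ex.context), len(ex.question))

    def __init__(self, path, field, **kwargs):
        fields = [('context', field), ('question', field), ('answer', field)]
        examples = []
        with open(os.path.expanduser(path), encoding='utf-8') as f:
            for line in f:
                ex = json.loads(line)
                examples.append(data.Example.fromlist(
                    [ex['context'], ex['question'], ex['answer']], fields))
        super(MyJapaneseDataset, self).__init__(examples, fields, **kwargs)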

@xuf12 please see the Getting Started section of the README.md. Make sure to run the commands related to local_data/schema.txt. That's the file you are looking for.

xuf12 commented

Ah sorry I overlooked that part. Thanks!

yuppx commented

How do I make a test set? For example, after machine translation training is complete, how do I get the model to actually translate an English sentence?

bmccann commented

If you look in text/torchtext/datasets/generic.py, you can see how I add different datasets. The WinogradSchema dataset would be a good example of one that uses a local file. I'll see if I can add a generic data loader for a specific format so you don't have to add custom classes.

bmccann commented

@yuppx

This should help you test on new examples.

Inference on a Custom Dataset

Using a pretrained model or a model you have trained yourself, you can easily run on new, custom datasets by following the instructions below. In this example, we use the checkpoint for the best MQAN trained on the entirety of decaNLP (see the Pretrained Models section for how to get this checkpoint) to run on my_custom_dataset.

mkdir .data/my_custom_dataset/
touch .data/my_custom_dataset/val.jsonl
# TODO: add examples line by line to val.jsonl in the form of a JSON dict: {"context": "The answer is answer.", "question": "What is the answer?", "answer": "answer"}
nvidia-docker run -it --rm -v `pwd`:/decaNLP/  decanlp bash -c "python /decaNLP/predict.py --evaluate valid --path /decaNLP/mqan_decanlp_qa_first --checkpoint_name model.pth --gpu 0 --tasks my_custom_dataset"

If you want to run without any answers, you can leave those blank ("answer": ""), and just ignore the metrics. The predictions for each example in your custom dataset will still be written to file.
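To fill in the TODO above, a short sketch like this writes examples in the expected format (the example content is a placeholder; replace it with your own data):

import json

# Placeholder examples. Leave "answer" empty ("") if you only want
# predictions and plan to ignore the metrics.
examples = [
    {"context": "The answer is answer.", "question": "What is the answer?",
     "answer": "answer"},
]

# One JSON object per line, matching the format predict.py expects above.
with open('.data/my_custom_dataset/val.jsonl', 'w') as f:
    for ex in examples:
        f.write(json.dumps(ex) + '\n')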