UniFew

UniFew: Unified Few-shot Learning Model

Installation

The main dependency of this code is the fewshot package from the flex repo.
Please first follow the flex installation instructions to install the fewshot package. Additional dependencies are listed in requirements.txt.

git clone git@github.com:allenai/unifew.git
cd unifew

# optionally create a virtualenv with conda
conda create --name unifew python=3.8
# activate
conda activate unifew

# install flex from the flex repo
mkdir dependencies && cd dependencies
git clone git@github.com:allenai/flex.git
cd flex && pip install -e .

# then install main requirements
cd ../..
pip install -r requirements.txt

Meta-testing on Flex

You can meta-test the model on the flex benchmark with the following command:

CUDA_VISIBLE_DEVICES=0 python test.py challenge=flex +hydra.run_dir=output/

This will run the model and save predictions in the output/ directory.

You can use multiple GPUs to predict on different slices of the flex challenge by specifying additional start and stop arguments.

If you wish to use a model that you have previously meta-trained, simply provide the relevant model.ckpt_path=/full/path/to/checkpoint.ckpt argument.

Meta-trained checkpoint

You can download a meta-trained checkpoint with the following command:
wget https://fleet-public.s3.us-west-2.amazonaws.com/unifew-meta-trained.ckpt

Expected download size: 8.3G
md5: d44e3a7dc0658752035183d0805fdae9
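
Since the checkpoint is large, it may be worth confirming the download against the md5 listed above before loading it. The following is a minimal sketch using only the Python standard library; the local file name is an assumption (it is what wget would produce by default for the URL above), and the helper name md5_of_file is hypothetical, not part of this repo.

```python
import hashlib
from pathlib import Path

# md5 value published in this README for the meta-trained checkpoint.
EXPECTED_MD5 = "d44e3a7dc0658752035183d0805fdae9"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that a multi-gigabyte checkpoint never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Assumed location: wget's default output name in the current directory.
    ckpt = Path("unifew-meta-trained.ckpt")
    if ckpt.exists():
        actual = md5_of_file(ckpt)
        print("checksum OK" if actual == EXPECTED_MD5 else f"checksum MISMATCH: {actual}")
    else:
        print(f"{ckpt} not found; download it first")
```

A mismatch usually indicates an interrupted or corrupted download, in which case re-running the wget command is the simplest fix.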

Speed up meta-testing by parallelizing predictions

Meta-testing on a single GPU can be slow. Depending on the number of available GPUs, you can manually divide the meta-test episodes among the GPUs and get the corresponding predictions for each split.
To do so, use the start and stop arguments of the test.py script.
Suppose we have 4 GPUs. We launch the following commands:

CUDA_VISIBLE_DEVICES=0 python test.py challenge=flex +hydra.run_dir=output/ start=0 stop=500
CUDA_VISIBLE_DEVICES=1 python test.py challenge=flex +hydra.run_dir=output/ start=500 stop=1000
CUDA_VISIBLE_DEVICES=2 python test.py challenge=flex +hydra.run_dir=output/ start=1000 stop=1500
CUDA_VISIBLE_DEVICES=3 python test.py challenge=flex +hydra.run_dir=output/ start=1500

This runs the specified meta-test episodes on each GPU.
You will end up with four prediction files in the output directory, named predictions_[start]-[stop].
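
For a different number of GPUs, the start and stop boundaries can be computed rather than written by hand. The sketch below splits the episodes into contiguous, near-equal slices and prints one command per GPU in the same form as the commands above. The helper names plan_slices and build_commands are hypothetical, and the total of 2000 episodes is an illustrative assumption, since this README does not state the total. The sketch also assumes that stop is exclusive, which is what the boundaries in the commands above suggest.

```python
def plan_slices(num_episodes, num_gpus):
    """Split range(num_episodes) into num_gpus contiguous (start, stop) pairs.

    stop is treated as exclusive (an assumption), so each slice begins where
    the previous one ended. Any remainder is spread over the first slices.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if num_episodes < 0:
        raise ValueError("num_episodes must be non-negative")
    base, extra = divmod(num_episodes, num_gpus)
    slices = []
    start = 0
    for gpu in range(num_gpus):
        stop = start + base + (1 if gpu < extra else 0)
        slices.append((start, stop))
        start = stop
    return slices


def build_commands(num_episodes, num_gpus, run_dir="output/"):
    """Return one test.py command line per GPU, mirroring the commands above."""
    commands = []
    for gpu, (start, stop) in enumerate(plan_slices(num_episodes, num_gpus)):
        commands.append(
            f"CUDA_VISIBLE_DEVICES={gpu} python test.py challenge=flex "
            f"+hydra.run_dir={run_dir} start={start} stop={stop}"
        )
    return commands


# 2000 is a placeholder total, not a figure taken from the benchmark.
for command in build_commands(num_episodes=2000, num_gpus=4):
    print(command)
```

Unlike the hand-written commands, this sketch always passes an explicit stop for the last slice, so it needs the true episode count to cover the whole challenge.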

You can merge these predictions with the fewshot merge command from the flex repo:
fewshot merge predictions_* predictions-merged.json
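
When predictions come from several manually launched runs, a gap or overlap between slices is easy to introduce. The following is a rough sketch of a sanity check over the prediction file names, relying only on the predictions_[start]-[stop] naming described above. The function names are hypothetical, and the exact file names written by test.py (including the extension, and the name used when stop is omitted) are assumptions; names that do not match the pattern raise a ValueError instead of being guessed at.

```python
import re

# Assumed naming scheme: predictions_<start>-<stop>, with any extension.
NAME_PATTERN = re.compile(r"predictions_(\d+)-(\d+)")


def parse_slice(filename):
    """Extract (start, stop) as integers from a prediction file name."""
    match = NAME_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"unrecognized prediction file name: {filename}")
    return int(match.group(1)), int(match.group(2))


def find_coverage_problems(filenames):
    """Return descriptions of gaps or overlaps between consecutive slices.

    An empty list means the slices are contiguous, assuming stop is exclusive.
    """
    slices = sorted(parse_slice(name) for name in filenames)
    problems = []
    for (_, prev_stop), (next_start, _) in zip(slices, slices[1:]):
        if next_start > prev_stop:
            problems.append(f"gap: no predictions for episodes {prev_stop} to {next_start}")
        elif next_start < prev_stop:
            problems.append(
                f"overlap: slice starting at {next_start} begins before previous stop {prev_stop}"
            )
    return problems


# Illustrative file names only; the second and third slices leave a gap.
example = ["predictions_0-500.json", "predictions_500-1000.json", "predictions_1200-1500.json"]
for problem in find_coverage_problems(example):
    print(problem)
```

This check only inspects file names. Validating the merged file itself is covered by the flex scoring instructions.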

Then you can score predictions with the flex package:
https://github.com/allenai/flex#result-validation-and-scoring

Meta-training the model

To meta-train the model, use the following command:

CUDA_VISIBLE_DEVICES=0 python train.py hydra.run.dir=tmp/model/ trainer.max_steps=30000 query_batch_size=4 +sampler.min_way=2

Additional arguments can be found under 'conf/train.yaml' and 'conf/test.yaml'.

Citing

If you find this repo useful, please cite our preprint:

@misc{bragg2021flex,
      title={FLEX: Unifying Evaluation for Few-Shot NLP},
      author={Jonathan Bragg and Arman Cohan and Kyle Lo and Iz Beltagy},
      year={2021},
      eprint={2107.07170},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}