This repository details the steps to run inference on Meta's OPT 175B model using the Hugging Face Transformers library.
- The Jupyter notebook `OPT175B.ipynb` can be used to run inference on the OPT 175B model [1]. It uses model parallelism to split the model across 9 A100 GPUs on the internal server.
- The notebook should be runnable on the server after installing the dependencies listed below
- `weights_path` in the notebook should point to the directory where the model weights are stored (in HF format); a minimal loading sketch follows this list
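The sketch below outlines what the loading and generation step looks like, assuming a hypothetical local `weights_path` and the `transformers`/`accelerate` stack installed below; the notebook itself is the authoritative version.

```python
# Hedged sketch: load OPT 175B from a local HF-format checkpoint and
# shard it across the available GPUs via accelerate's device_map.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

weights_path = "/path/to/opt-175b-hf"  # hypothetical; use the real weights directory

tokenizer = AutoTokenizer.from_pretrained(weights_path)
model = AutoModelForCausalLM.from_pretrained(
    weights_path,
    device_map="auto",          # accelerate splits the layers across the GPUs
    torch_dtype=torch.float16,  # fp16 weights to fit in GPU memory
)

prompt = "Hello, my name is"
inputs = tokenizer(prompt, return_tensors="pt").to(0)  # inputs on the first GPU
output = model.generate(inputs.input_ids, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```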
Dependencies:

```sh
pip3 install torch --extra-index-url https://download.pytorch.org/whl/cu113
pip3 install transformers accelerate
pip3 install ipywidgets jupyterlab
```
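As an optional check (not part of the original steps), the installed stack can be verified before launching the notebook:

```python
# Optional sanity check: confirm CUDA is visible and all GPUs are detected.
import torch
import transformers
import accelerate

print("torch", torch.__version__, "- CUDA available:", torch.cuda.is_available())
print("visible GPUs:", torch.cuda.device_count())  # expect 9 on the internal server
print("transformers", transformers.__version__, "| accelerate", accelerate.__version__)
```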
Steps to convert OPT weights from Meta's format to HuggingFace's format (can be skipped if converted weights are already available):
- Obtain the OPT 175B weights from Meta
- Consolidate the sharded model parts into a single file using metaseq:

```sh
python metaseq/scripts/consolidate_fsdp_shards.py ${FOLDER_PATH}/checkpoint_last --new-arch-name transformer_lm_gpt --save-prefix ${FOLDER_PATH}/consolidated
```
- Use the conversion script from HF to convert the consolidated checkpoint into HF format:

```sh
python convert_opt_original_pytorch_checkpoint_to_pytorch.py --pytorch_dump_folder_path <path/to/dump/hf/model> --hf_config config.json --fairseq_path <path/to/restored.pt>
```
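As a cheap sanity check on the converted output (an assumption, not part of the original steps), the config can be loaded without touching the full set of weights:

```python
# Hedged sanity check: read only config.json from the dump folder;
# the full 175B weights are far too large to load casually.
from transformers import AutoConfig

dump_path = "<path/to/dump/hf/model>"  # same folder passed to --pytorch_dump_folder_path
config = AutoConfig.from_pretrained(dump_path)
print(config.model_type, config.num_hidden_layers, config.hidden_size)
```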
[1] The Jupyter notebook is an adaptation of this colab from HF.