
NeurIPS LLM Efficiency Challenge

This repo contains my submission for the NeurIPS LLM Efficiency Challenge. There are 3 submissions, and each has its own Dockerfile (Dockerfile, Dockerfile_2, Dockerfile_3) that runs a different combination of adapters on the same base model. Please make sure that no other process is using the GPU while you run this.
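To verify that nothing else is holding the GPU, you can check with nvidia-smi (standard NVIDIA tooling, not part of this repo); the query below lists any compute processes currently on the device:

nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv

An empty result (apart from the header) means the GPU is free.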

Info

How to Run

Note: If you also want to run the finetuning, follow the steps below. To build the image, run

docker build -f Dockerfile.train -t neurips_train .

To run the finetuning (it tunes multiple adapters with different configs on different datasets and can take close to 20 hours), run

docker run --gpus "device=0" -p 8080:80 --rm -ti neurips_train
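If you want the trained adapter artifacts to persist after the container exits, a bind mount along these lines may help; note that the container-side path /workspace/artifacts is an illustrative assumption, not a path taken from this repo:

docker run --gpus "device=0" -v $(pwd)/artifacts:/workspace/artifacts --rm -ti neurips_train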

Note: The submission that looks to have qualified is the 2nd one, which does not need books_adapter to be trained. The other two submissions (1 and 3) score better in many scenarios but unfortunately return NULL for a couple of datasets; those submissions do make use of books_adapter. If you want to train books_adapter, set TRAIN_BOOKS in Dockerfile.train to true. The likely cause of those NULLs is beam search. For the 2nd submission, training finishes in under 24 hours (without books_adapter). For the 1st and 3rd submissions, training also finishes in under 24 hours because they don't use cnn_adapter; it is basically a one-for-one swap.
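For reference, the toggle inside Dockerfile.train would look something like the line below (an illustrative sketch; the exact declaration in the file, ENV vs. ARG, may differ):

ENV TRAIN_BOOKS=true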

To run inference with the newly trained artifacts, build the image using

docker build -f Dockerfile.final -t neurips_repro .

Then run the image using

docker run --gpus "device=0" -p 8080:80 --rm -ti neurips_repro

To build the inference image (1st submission), run

docker build -f Dockerfile -t neurips_inference .

For the 2nd and 3rd submissions, build with the corresponding Dockerfile instead (each build uses the same tag, so the run command below stays the same):

docker build -f Dockerfile_2 -t neurips_inference .
docker build -f Dockerfile_3 -t neurips_inference .

To start the server and make it ready for inference, run

docker run --gpus "device=0" -p 8080:80 --rm -ti neurips_inference

This starts the server on host port 8080 (mapped to port 80 inside the container). Once the server is up, you can start sending requests via HELM.
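For a quick smoke test before running HELM, you can hit the server directly with curl; this sketch assumes the server exposes the challenge's standard /process endpoint and JSON schema, so verify the route and field names against the server code in this repo:

curl -X POST http://localhost:8080/process \
  -H "Content-Type: application/json" \
  -d '{"prompt": "The capital of France is", "max_new_tokens": 16}'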