This is my submission to the NVIDIA RTX 4090 track of the NeurIPS 2023 Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day.
There are three variants to be submitted, each with a Dockerfile
located in its directory:

- `inference`
- `inference2`
- `inference3`
Further information on the finetuning data and procedure is coming soon.
- Ensure the NVIDIA Container Toolkit is installed.
- Build the container:

  ```
  cd inference
  docker build -t neurips_submission .
  ```
- Run it:

  ```
  docker run --gpus all -p 8080:80 neurips_submission
  ```
- Example API request and response:

  ```
  curl -X POST -H "Content-Type: application/json" -d '{"prompt": "What is the meaning of life, the universe, and everything?","echo_prompt":0}' http://localhost:8080/process
  ```

  ```
  {"text":"The answer is 42.","tokens":[],"logprob":0.0,"request_time":0.766957417014055}
  ```
Evaluation is performed with the HELM project.
- Install HELM:

  ```
  pip install git+https://github.com/stanford-crfm/helm.git
  ```
- Run an evaluation with a `run_specs.conf` file (a minimal example is sketched after this list), then summarize the results:

  ```
  helm-run --conf-paths run_specs.conf --suite v1 --max-eval-instances 10
  helm-summarize --suite v1
  ```
- View the results:

  ```
  helm-server
  ```
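For reference, a minimal `run_specs.conf` could look like the sketch below. The run-spec description and the `neurips/local` model name are assumptions and should be adapted to the benchmarks being evaluated:

```
entries: [
  {description: "mmlu:subject=anatomy,model=neurips/local", priority: 1}
]
```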