This is my submission to the NVIDIA RTX 4090 track of the NeurIPS 2023 Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day.
There are three variants to be submitted, each with a Dockerfile
located in its directory:

- `inference`
- `inference2`
- `inference3`
Further information on the finetuning data and procedure is coming soon.
- Ensure the NVIDIA Container Toolkit is installed.
- Build the container:

  ```
  cd inference
  docker build -t neurips_submission .
  ```
- Run it:

  ```
  docker run --gpus all -p 8080:80 neurips_submission
  ```
- Example API request and response:

  ```
  curl -X POST -H "Content-Type: application/json" -d '{"prompt": "What is the meaning of life, the universe, and everything?","echo_prompt":0}' http://localhost:8080/process
  ```

  ```
  {"text":"The answer is 42.","tokens":[],"logprob":0.0,"request_time":0.766957417014055}
  ```
Evaluation is performed with the HELM project.
- Install HELM:

  ```
  pip install git+https://github.com/stanford-crfm/helm.git
  ```
- Run an evaluation with a `run_specs.conf` file (a minimal example is sketched after this list), then summarize the results:

  ```
  helm-run --conf-paths run_specs.conf --suite v1 --max-eval-instances 10
  helm-summarize --suite v1
  ```
- View the results:

  ```
  helm-server
  ```
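For reference, a minimal `run_specs.conf` could look like the sketch below. The run-spec description and the `neurips/local` model name are assumptions and should be adapted to the benchmarks being evaluated:

```
entries: [
  {description: "mmlu:subject=anatomy,model=neurips/local", priority: 1}
]
```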