llama.cpp-docker-inference-endpoint: A Dockerfile repository from ngxson

How to build and push

Pre-requirements:

For image tag name: it will be in format of {your_docker_username}/llamacpp-server

In this example, we will use ngxson as {your_docker_username}

To select a specific branch or commit

Modify Dockerfile:

RUN git clone https://github.com/ggerganov/llama.cpp -b master --depth 1 .

Change -b to the branch you want, for example:

RUN git clone https://github.com/ggerganov/llama.cpp -b gg/phi-2 --depth 1 .

Additionally, if you want to select a specific commit, you can add git reset. For example:

RUN git clone https://github.com/ggerganov/llama.cpp -b master --depth 1 . && git reset --hard 911b437

Build and push to docker hub

You need to re-build the image whenever you make a change in Dockerfile (for example, change the git branch)

docker build --platform linux/amd64 -t ngxson/llamacpp-server .
docker push ngxson/llamacpp-server

After a while, the image will appear on your docker hub account. You can then use it to create inference endpoint.