A simple method to provision RHOAI (Red Hat OpenShift AI) on Single Node OpenShift to try out different quantized LLMs, including Meta's Llama 2 and Llama 3 and the IBM/Red Hat Granite models.
We use a g6.4xlarge AWS Spot instance, which comes with a modern NVIDIA L4 (24 GB), 16 vCPUs, and 64 GiB RAM, running OpenShift 4.15 Single Node. We configure NVIDIA time slicing so the single GPU can be shared in parallel between Jupyter notebooks and model serving.
Install OCP using SNO on SPOT.
export AWS_PROFILE=sno-llama
export AWS_DEFAULT_REGION=us-east-2
export AWS_DEFAULT_ZONES=["us-east-2c"]
export CLUSTER_NAME=sno
export BASE_DOMAIN=sandbox.opentlc.com
export PULL_SECRET=$(cat ~/tmp/pull-secret)
export SSH_KEY=$(cat ~/.ssh/id_rsa.pub)
export INSTANCE_TYPE=g6.4xlarge
export ROOT_VOLUME_SIZE=200
export OPENSHIFT_VERSION=4.15.9
mkdir -p ~/tmp/sno-${AWS_PROFILE} && cd ~/tmp/sno-${AWS_PROFILE}
curl -Ls https://raw.githubusercontent.com/eformat/sno-for-100/main/sno-for-100.sh | bash -s -- -d
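The script drives openshift-install for you. If you want to check on progress or re-attach to a running install, something like the following should work (a sketch, assuming the script leaves an install directory named "cluster" in your working directory - adjust if yours differs):

```shell
# Re-attach and wait for the install to finish (assumes a "cluster"
# install dir in the current directory).
openshift-install wait-for install-complete --dir=cluster --log-level=debug

# Once the cluster is up, export the kubeconfig and sanity-check the
# single node.
export KUBECONFIG=$(pwd)/cluster/auth/kubeconfig
oc get nodes -o wide
```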
Bootstrap the ArgoCD operator and everything else using GitOps (cluster performance enhancements, CertManager, GPU setup with time slicing, LVM + Noobaa/S3 storage, RHOAI). Your SNO will reboot for MachineConfig updates.
./gitops/install.sh -d
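To watch the bootstrap converge, you can poll the ArgoCD applications and the MachineConfig rollout (a sketch - the openshift-gitops namespace is the operator default, but check where this repo installs ArgoCD):

```shell
# ArgoCD applications should eventually report Synced / Healthy.
oc get applications.argoproj.io -n openshift-gitops

# Watch the MachineConfigPool - the node reboot happens during this
# rollout, so expect a brief API outage.
oc get mcp -w
```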
Create users using htpasswd. This deletes the kubeadmin user.
./gitops/users.sh
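Since kubeadmin is removed, verify you can still log in with one of the new htpasswd users before closing your session (the user name here is an assumption - check users.sh for the real list):

```shell
# Log in with an htpasswd-backed user ("admin" is assumed - see
# users.sh for the actual accounts created).
oc login -u admin https://api.${CLUSTER_NAME}.${BASE_DOMAIN}:6443
oc whoami
```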
Install Let's Encrypt certificates for api, apps - using CertManager and Route53.
./gitops/certificates.sh
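You can confirm the Let's Encrypt certificate is actually being served (a quick check using openssl, which inspects the issuer and validity dates on the API endpoint):

```shell
# Inspect the certificate presented by the cluster API endpoint - the
# issuer should be Let's Encrypt once CertManager has done its work.
echo | openssl s_client -connect api.${CLUSTER_NAME}.${BASE_DOMAIN}:6443 2>/dev/null \
  | openssl x509 -noout -issuer -dates
```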
Scale the RHOAI platform down a bit to free up some CPU.
./gitops/scale-resources.sh
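To see how much headroom the scale-down bought you, check the node's requested vs. allocatable resources (oc adm top needs cluster metrics to be available):

```shell
# Current usage (requires metrics to be up).
oc adm top nodes

# Requested vs. allocatable CPU/memory on the single node.
oc describe node | grep -A 8 'Allocated resources'
```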
The manual instructions are still here if you want to run them.
Now open RHOAI and log in.
Run the Jupyter notebook image - "PyTorch, CUDA v11.8, Python v3.9, PyTorch v2.0, Small, 1 NVIDIA GPU Accelerator".
Make sure you give your notebook plenty of local storage (50-100 GB).
You can log in as admin or admin2 and work in a separate notebook as each user to see GPU time slicing in action.
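With time slicing enabled, the node should advertise more nvidia.com/gpu replicas than physical GPUs, which is what lets both notebooks schedule. A quick way to check (a sketch - jsonpath against the single node):

```shell
# Allocatable GPU count - with time slicing this should be greater
# than 1, even though the L4 is a single physical GPU.
oc get node -o jsonpath='{.items[0].status.allocatable.nvidia\.com/gpu}'; echo

# Both notebook pods should be Running and bound to the same node.
oc get pods -A -o wide | grep -i notebook
```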
Meta's Llama 2 model.
Open the sno-llama2.ipynb notebook and have a play.
Meta's Llama 3 model.
Open the sno-llama3.ipynb notebook and have a play.
InstructLab's open source Granite model.
Open the sno-granite.ipynb notebook and have a play.
Deploy your own IDE Python coding assistant.
Open the sno-code-llama.ipynb notebook and have a play.
How can we start to remember previous chat contexts using llama.cpp?
Open the sno-prompt-cache.ipynb notebook and have a play.
Use RHOAI to try out InstructLab using a notebook image. See the InstructLab README.md.
Open the sno-instructlab.ipynb notebook and have a play.
Use RHOAI to serve the models with a llama-cpp custom runtime. See the Serving README.md.
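Once a model is deployed behind a route, you can exercise it from the command line. The route and project names below are placeholders, not values from this repo - take the real ones from the RHOAI model serving UI, and check the Serving README.md for the runtime's actual API shape:

```shell
# Placeholders: substitute your own project and route names from the
# RHOAI console - these are not defined by this repo's scripts.
MODEL_ROUTE=$(oc get route my-model -n my-project -o jsonpath='{.spec.host}')

# llama.cpp's server exposes an OpenAI-style completions endpoint.
curl -sk "https://${MODEL_ROUTE}/v1/completions" \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Hello", "max_tokens": 16}'
```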
If you no longer need your instance, remove all related AWS objects by running the following inside your $RUNDIR:
openshift-install destroy cluster --dir=cluster