/gke-llm-serve

serve Llama 2 on NVIDIA L4 GPUs in GKE

run Llama 2 7B on NVIDIA L4 GPUs in GKE

install Knative Serving CRDs

kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.13.1/serving-crds.yaml
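
Before moving on to the core components, it can help to confirm that the API server has registered the CRDs. The following is a minimal sketch, not part of the original steps. It assumes `services.serving.knative.dev` (the CRD behind Knative `Service` objects) is one of the CRDs installed by the manifest above. The helper name `wait_for_knative_crds` and the 60-second default timeout are illustrative choices.

```shell
# Hypothetical helper: block until the Knative Service CRD reports Established.
# The optional first argument overrides the assumed 60s default timeout.
wait_for_knative_crds() {
  kubectl wait --for=condition=Established \
    crd/services.serving.knative.dev \
    --timeout="${1:-60s}"
}
```

Call it as `wait_for_knative_crds` or, on a slow control plane, `wait_for_knative_crds 120s`.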

install Knative Serving core components

kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.13.1/serving-core.yaml
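
The core components take a short while to start. A rough sketch for waiting on them, assuming they run as Deployments in the `knative-serving` namespace (the same namespace the `config-network` patch targets). The helper name `wait_for_knative_serving` and the 300-second default are assumptions, not part of the original instructions.

```shell
# Hypothetical helper: wait until every Deployment in knative-serving is Available.
# The optional first argument overrides the assumed 300s default timeout.
wait_for_knative_serving() {
  kubectl wait --for=condition=Available deployment --all \
    --namespace knative-serving \
    --timeout="${1:-300s}"
}
```

If the wait times out, `kubectl get pods --namespace knative-serving` shows which pods are still pending or crashing.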

use Kourier as the networking layer

kubectl apply -f https://github.com/knative/net-kourier/releases/download/knative-v1.13.0/kourier.yaml
kubectl patch configmap/config-network \
  --namespace knative-serving \
  --type merge \
  --patch '{"data":{"ingress-class":"kourier.ingress.networking.knative.dev"}}'
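
Once Kourier is set as the ingress class, the address that Knative services are reached through can be looked up. A minimal sketch, assuming the Kourier manifest creates a `LoadBalancer` Service named `kourier` in the `kourier-system` namespace and that the load balancer is assigned an IP rather than a hostname. The helper name `kourier_external_ip` is illustrative.

```shell
# Hypothetical helper: print the external IP of the Kourier gateway Service.
# Prints nothing while the load balancer is still being provisioned.
kourier_external_ip() {
  kubectl get service kourier \
    --namespace kourier-system \
    --output jsonpath='{.status.loadBalancer.ingress[0].ip}'
}
```

An empty result usually means the load balancer has not been provisioned yet; `kubectl get service kourier --namespace kourier-system` shows the pending state.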