llama-stack-mirror/docs/source/distributions/k8s/ingress-k8s.yaml.template at 3511af7c33d32c580b318ca2d68bc6e2e91915a2

phoenix-oss/llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-10-05 20:27:35 +00:00

Ashwin Bharambe 7fb4bdabea

docs(kubernetes): add more fleshed-out example of a Demo Kubernetes cluster (#2329)

This Kubernetes cluster has:

- vLLM for serving an inference model
- vLLM for serving a safety model
- Postgres DB (for metadata and other state for the Llama Stack distro)
- Chroma DB for Vector IO (memory)

Perhaps most importantly, this was me trying to learn Kubernetes for the
first time.

## Test Plan

Run `sh apply.sh` against an EKS cluster, then run
`kubectl port-forward service/llama-stack-service 8321:8321`.
After many attempts, we finally have:

<img width="1589" alt="image"
src="https://github.com/user-attachments/assets/c69f242d-6aaa-4def-9f7c-172113b8bfc1"
/>

<img width="1978" alt="image"
src="https://github.com/user-attachments/assets/cf678404-f551-4fa5-9077-bebe3e8e8ae8"
/>
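
As a quick sanity check before opening a client, the port-forward from the Test Plan can be probed with a plain TCP connection. This is a minimal sketch, not part of the original commit: the helper name `port_is_open` is hypothetical, and it assumes the forward is bound to `127.0.0.1` on port 8321, as in the `kubectl port-forward` command above. It only confirms that something accepts connections on that port, not that Llama Stack itself is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumption: the port-forward is running locally on port 8321.
    status = "reachable" if port_is_open("127.0.0.1", 8321) else "not reachable"
    print(f"llama-stack-service port-forward is {status}")
```

If the check reports "not reachable", the port-forward process has probably exited or the service has no ready pods behind it.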

2025-06-02 13:07:08 -07:00



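# Exposes the Llama Stack pods outside the cluster through a LoadBalancer Service.
apiVersion: v1
kind: Service
metadata:
  name: llama-stack-service
spec:
  type: LoadBalancer
  # Route traffic to pods carrying this label.
  selector:
    app.kubernetes.io/name: llama-stack
  ports:
    # Service port 8321 forwards to container port 8321.
    - port: 8321
      targetPort: 8321
      protocol: TCP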