docs: Updated documentation and configuration to make things easier for the unfamiliar

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Francisco Javier Arceo 2025-03-31 13:08:22 -04:00
parent 9b478f3756
commit 2847216efb
10 changed files with 69 additions and 32 deletions

@@ -1,6 +1,9 @@
# Kubernetes Deployment Guide
Instead of starting the Llama Stack and vLLM servers locally, we can deploy them in a Kubernetes cluster. In this guide, we'll use a local [Kind](https://kind.sigs.k8s.io/) cluster and a vLLM inference service in the same cluster for demonstration purposes.
Instead of starting the Llama Stack and vLLM servers locally, we can deploy them in a Kubernetes cluster.
### Prerequisites
In this guide, we'll use a local [Kind](https://kind.sigs.k8s.io/) cluster and a vLLM inference service in the same cluster for demonstration purposes.
First, create a local Kubernetes cluster via Kind:
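A minimal sketch of this step; the cluster name `llama-stack-test` is an assumption for illustration, and Kind's default node image is used:

```bash
# Create a throwaway local cluster (the name is an assumption for illustration)
kind create cluster --name llama-stack-test

# Confirm kubectl is pointed at the new cluster and it is reachable
kubectl cluster-info --context kind-llama-stack-test
```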
@@ -33,6 +36,7 @@ data:
token: $(HF_TOKEN)
```
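The same token secret can also be created imperatively with `kubectl`; a sketch assuming the secret is named `hf-token-secret` (the name is an assumption; the `token` key matches the manifest above):

```bash
# Placeholder value; substitute a real Hugging Face token
export HF_TOKEN="<your-huggingface-token>"

# Create the secret directly, then confirm it exists
kubectl create secret generic hf-token-secret --from-literal=token="$HF_TOKEN"
kubectl get secret hf-token-secret
```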
Next, start the vLLM server as a Kubernetes Deployment and Service:
```bash
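# Rough sketch only, not the manifest from this guide: a minimal vLLM
# Deployment. The image, model, port, and the hf-token-secret reference
# are assumptions for illustration.
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm
  template:
    metadata:
      labels:
        app: vllm
    spec:
      containers:
      - name: vllm
        image: vllm/vllm-openai:latest
        args: ["--model", "meta-llama/Llama-3.2-1B-Instruct"]
        env:
        - name: HUGGING_FACE_HUB_TOKEN
          valueFrom:
            secretKeyRef:
              name: hf-token-secret
              key: token
        ports:
        - containerPort: 8000
EOF
# A matching Service exposing port 8000 would normally accompany this Deployment.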
@@ -127,6 +131,7 @@ EOF
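# Build the llama-stack-run-k8s image from the Containerfile generated above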
podman build -f /tmp/test-vllm-llama-stack/Containerfile.llama-stack-run-k8s -t llama-stack-run-k8s /tmp/test-vllm-llama-stack
```
### Deploying Llama Stack Server in Kubernetes
We can then start the Llama Stack server by deploying a Kubernetes Pod and Service:
@@ -187,6 +192,7 @@ spec:
EOF
```
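For orientation, a rough sketch of what such a Pod and Service could look like; this is not the guide's manifest, and the image reference, port, and labels are assumptions:

```bash
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: llama-stack-pod
  labels:
    app: llama-stack
spec:
  containers:
  - name: llama-stack
    # Assumes the image built with podman above; the registry prefix and tag may differ
    image: localhost/llama-stack-run-k8s:latest
    imagePullPolicy: IfNotPresent
    ports:
    - containerPort: 8321
---
apiVersion: v1
kind: Service
metadata:
  name: llama-stack-service
spec:
  selector:
    app: llama-stack
  ports:
  - port: 8321
    targetPort: 8321
EOF
```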
### Verifying the Deployment
We can check that the Llama Stack server has started:
```bash
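# Rough sketch (the pod name is an assumption, reused from the sketch above):
kubectl get pods
kubectl logs -f llama-stack-pod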