Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-12-31 05:10:01 +00:00)
docs: Updated documentation and configuration to make things easier for the unfamiliar
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
parent 9b478f3756
commit 2847216efb

10 changed files with 69 additions and 32 deletions
@@ -1,6 +1,9 @@
 # Kubernetes Deployment Guide
 
-Instead of starting the Llama Stack and vLLM servers locally. We can deploy them in a Kubernetes cluster. In this guide, we'll use a local [Kind](https://kind.sigs.k8s.io/) cluster and a vLLM inference service in the same cluster for demonstration purposes.
+Instead of starting the Llama Stack and vLLM servers locally, we can deploy them in a Kubernetes cluster.
+
+### Prerequisites
+In this guide, we'll use a local [Kind](https://kind.sigs.k8s.io/) cluster and a vLLM inference service in the same cluster for demonstration purposes.
 
 First, create a local Kubernetes cluster via Kind:
 
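The Kind invocation itself falls outside the hunk. For readers following along, a minimal sketch of this step; the cluster name `llama-stack-test` is illustrative, not taken from this diff:

```bash
# Create a local single-node Kubernetes cluster (cluster name is illustrative).
kind create cluster --name llama-stack-test

# Verify kubectl is pointed at the new cluster.
kubectl cluster-info --context kind-llama-stack-test
```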
@@ -33,6 +36,7 @@ data:
   token: $(HF_TOKEN)
 ```
 
+
 Next, start the vLLM server as a Kubernetes Deployment and Service:
 
 ```bash
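The manifest body is truncated by the hunk. A rough sketch of what such a Deployment and Service pair typically looks like, wired to the `hf-token-secret` implied by the `token: $(HF_TOKEN)` line above; the image tag, model name, and port are assumptions, not values from this diff:

```bash
# Sketch only: a vLLM Deployment plus ClusterIP Service.
# Image tag, model, and port are assumptions, not from this diff.
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm
  template:
    metadata:
      labels:
        app: vllm
    spec:
      containers:
      - name: vllm
        image: vllm/vllm-openai:latest
        args: ["--model", "meta-llama/Llama-3.2-1B-Instruct"]
        ports:
        - containerPort: 8000
        env:
        - name: HUGGING_FACE_HUB_TOKEN
          valueFrom:
            secretKeyRef:
              name: hf-token-secret   # the Secret whose data.token appears above
              key: token
---
apiVersion: v1
kind: Service
metadata:
  name: vllm-server
spec:
  selector:
    app: vllm
  ports:
  - port: 8000
    targetPort: 8000
  type: ClusterIP
EOF
```

A ClusterIP Service keeps vLLM reachable only inside the cluster, which is all the Llama Stack server needs.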
@@ -127,6 +131,7 @@ EOF
 podman build -f /tmp/test-vllm-llama-stack/Containerfile.llama-stack-run-k8s -t llama-stack-run-k8s /tmp/test-vllm-llama-stack
 ```
 
+### Deploying Llama Stack Server in Kubernetes
 
 We can then start the Llama Stack server by deploying a Kubernetes Pod and Service:
 
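The Pod and Service manifest also sits outside the hunk. A minimal sketch, assuming the `llama-stack-run-k8s` image built above is used and that the server listens on port 8321; the names and port are illustrative:

```bash
# Sketch only: run the locally built llama-stack-run-k8s image as a Pod
# and expose it via a Service. Names and port are assumptions.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: llama-stack-pod
  labels:
    app: llama-stack
spec:
  containers:
  - name: llama-stack
    image: llama-stack-run-k8s:latest
    imagePullPolicy: Never   # use the image loaded into the Kind node
    ports:
    - containerPort: 8321
---
apiVersion: v1
kind: Service
metadata:
  name: llama-stack-service
spec:
  selector:
    app: llama-stack
  ports:
  - port: 8321
    targetPort: 8321
  type: ClusterIP
EOF
```

Because the image was built locally with podman, it has to be made visible to the Kind node first, e.g. `podman save llama-stack-run-k8s | kind load image-archive /dev/stdin`; otherwise the Pod cannot pull it.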
@@ -187,6 +192,7 @@ spec:
 EOF
 ```
 
+### Verifying the Deployment
 We can check that the Llama Stack server has started:
 
 ```bash
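The verification commands are cut off at the hunk boundary. A plausible set of checks, reusing the illustrative labels and port from the sketches above:

```bash
# Confirm the Pod reached the Running state and inspect its logs.
kubectl get pods -l app=llama-stack
kubectl logs -l app=llama-stack

# Forward the service port and probe the API from the host
# (the /v1/models path assumes an OpenAI-compatible endpoint).
kubectl port-forward service/llama-stack-service 8321:8321 &
curl -s http://localhost:8321/v1/models
```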