env var and fix logs

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Yuan Tang 2025-01-29 21:39:29 -05:00
parent ddba43fada
commit 3bcc778e16

@@ -8,7 +8,7 @@ First, create a local Kubernetes cluster via Kind:
 kind create cluster --image kindest/node:v1.32.0 --name llama-stack-test
 ```
-Start vLLM server as a Kubernetes Pod and Service (remember to replace `<YOUR-HF-TOKEN>` with your actual token and `<VLLM-IMAGE>` to meet your local system architecture):
+Start vLLM server as a Kubernetes Pod and Service:
 ```bash
 cat <<EOF |kubectl apply -f -
@@ -30,7 +30,7 @@ metadata:
   name: hf-token-secret
 type: Opaque
 data:
-  token: "<YOUR-HF-TOKEN>"
+  token: $(HF_TOKEN)
 ---
 apiVersion: apps/v1
 kind: Deployment
@@ -48,7 +48,7 @@ spec:
     spec:
       containers:
       - name: llama-stack
-        image: <VLLM-IMAGE>
+        image: $(VLLM_IMAGE)
         command:
           - bash
           - -c
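
A note on the two substituted values in the hunks above, assuming the manifest is piped through an unquoted bash heredoc as the `cat <<EOF |kubectl apply -f -` line suggests. In that context the shell reads `$(HF_TOKEN)` as command substitution (it tries to run a command named `HF_TOKEN`), whereas `${HF_TOKEN}` expands the variable. A Secret's `data` field also expects base64-encoded values. The sketch below shows one way the substitution could work; the token and image values are hypothetical placeholders, and `render_manifest` is an illustrative helper, not part of this commit.

```bash
#!/usr/bin/env bash
# Hypothetical placeholder values; substitute your own token and image.
HF_TOKEN_PLAIN="hf_example_token"
VLLM_IMAGE="example.com/vllm:latest"

# Secret `data` values must be base64-encoded; printf avoids a trailing newline.
HF_TOKEN=$(printf '%s' "$HF_TOKEN_PLAIN" | base64)

# Illustrative helper: prints the Secret with the variables expanded.
# ${VAR} is parameter expansion; $(VAR) would run VAR as a command instead.
render_manifest() {
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: hf-token-secret
type: Opaque
data:
  token: ${HF_TOKEN}
# Deployment container image would expand to: ${VLLM_IMAGE}
EOF
}

render_manifest
```

Piping the helper's output into `kubectl apply -f -` would then create the Secret with the encoded token. Using `stringData` instead of `data` is an alternative that accepts the plain-text value.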
@@ -92,7 +92,7 @@ EOF
 We can verify that the vLLM server has started successfully via the logs (this might take a couple of minutes to download the model):
 ```bash
-$ kubectl logs vllm-server
+$ kubectl logs -l app.kubernetes.io/name=vllm
 ...
 INFO: Started server process [1]
 INFO: Waiting for application startup.
@@ -190,7 +190,7 @@ EOF
 We can check that the LlamaStack server has started:
 ```bash
-$ kubectl logs vllm-server
+$ kubectl logs -l app.kubernetes.io/name=llama-stack
 ...
 INFO: Started server process [1]
 INFO: Waiting for application startup.