forked from phoenix-oss/llama-stack-mirror
docs: Simplify vLLM deployment in K8s deployment guide (#1655)
# What does this PR do?

* Removes the use of `huggingface-cli`
* Simplifies HF cache mount path
* Simplifies vLLM server startup command
* Separates PVC/secret creation from deployment/service
* Fixes a typo: "pod" should be "deployment"

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
This commit is contained in:
parent
9e1ddf2b53
commit
9ff82036f7
1 changed file with 20 additions and 20 deletions
@@ -8,7 +8,7 @@ First, create a local Kubernetes cluster via Kind:
 kind create cluster --image kindest/node:v1.32.0 --name llama-stack-test
 ```
 
-Start vLLM server as a Kubernetes Pod and Service:
+First, create a Kubernetes PVC and Secret for downloading and storing Hugging Face model:
 
 ```bash
 cat <<EOF |kubectl apply -f -
@@ -31,7 +31,12 @@ metadata:
 type: Opaque
 data:
   token: $(HF_TOKEN)
----
+```
+
+Next, start the vLLM server as a Kubernetes Deployment and Service:
+
+```bash
+cat <<EOF |kubectl apply -f -
 apiVersion: apps/v1
 kind: Deployment
 metadata:
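A note on the Secret above: values under a Kubernetes Secret's `data:` field must be base64-encoded, so the `$(HF_TOKEN)` substituted into the manifest is assumed to already be the encoded form. A minimal sketch (the function name is illustrative, not from the guide) of producing such a value:

```python
import base64

def encode_secret_value(raw_token: str) -> str:
    # Kubernetes Secret `data:` values must be base64-encoded; this
    # produces the kind of string substituted for $(HF_TOKEN) above.
    return base64.b64encode(raw_token.encode("utf-8")).decode("ascii")

# Placeholder token for illustration only; a real value comes from your
# Hugging Face account settings.
print(encode_secret_value("hf_example_token"))
```

Alternatively, `kubectl create secret generic hf-token-secret --from-literal=token=...` performs the encoding for you.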
@@ -47,28 +52,23 @@ spec:
         app.kubernetes.io/name: vllm
     spec:
       containers:
-      - name: llama-stack
-        image: $(VLLM_IMAGE)
-        command:
-          - bash
-          - -c
-          - |
-            MODEL="meta-llama/Llama-3.2-1B-Instruct"
-            MODEL_PATH=/app/model/$(basename $MODEL)
-            huggingface-cli login --token $HUGGING_FACE_HUB_TOKEN
-            huggingface-cli download $MODEL --local-dir $MODEL_PATH --cache-dir $MODEL_PATH
-            python3 -m vllm.entrypoints.openai.api_server --model $MODEL_PATH --served-model-name $MODEL --port 8000
+      - name: vllm
+        image: vllm/vllm-openai:latest
+        command: ["/bin/sh", "-c"]
+        args: [
+          "vllm serve meta-llama/Llama-3.2-1B-Instruct"
+        ]
+        env:
+        - name: HUGGING_FACE_HUB_TOKEN
+          valueFrom:
+            secretKeyRef:
+              name: hf-token-secret
+              key: token
         ports:
         - containerPort: 8000
         volumeMounts:
         - name: llama-storage
-          mountPath: /app/model
-        env:
-        - name: HUGGING_FACE_HUB_TOKEN
-          valueFrom:
-            secretKeyRef:
-              name: hf-token-secret
-              key: token
+          mountPath: /root/.cache/huggingface
       volumes:
       - name: llama-storage
         persistentVolumeClaim:
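After this change, the container runs `vllm serve`, which exposes an OpenAI-compatible API on port 8000 under the served model name. A hedged sketch of building a chat-completion request against it; the `localhost` URL assumes something like `kubectl port-forward` to the vLLM Service (the Service definition itself is not part of this diff), and the helper name is illustrative:

```python
import json
import urllib.request

def build_chat_request(base_url: str, prompt: str) -> urllib.request.Request:
    # Build an OpenAI-compatible chat-completion request for the vLLM
    # server started by the Deployment above.
    payload = {
        # Served model name matches the `vllm serve` argument in the manifest.
        "model": "meta-llama/Llama-3.2-1B-Instruct",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Assumes a local port-forward to the vLLM Service, e.g.:
#   kubectl port-forward svc/<vllm-service-name> 8000:8000
req = build_chat_request("http://localhost:8000", "Hello!")
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` should return a JSON chat completion once the pod is ready; readiness can take a while on first start because the model weights are downloaded into the `/root/.cache/huggingface` mount.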