docs: Avoid bash script syntax highlighting for dark mode (#1918)
See https://github.com/meta-llama/llama-stack/pull/1913#issuecomment-2790153778
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
parent 36a31fe5dd
commit 712c6758c6

1 changed file with 8 additions and 8 deletions
@@ -7,13 +7,13 @@ In this guide, we'll use a local [Kind](https://kind.sigs.k8s.io/) cluster and a

 First, create a local Kubernetes cluster via Kind:

-```bash
+```
 kind create cluster --image kindest/node:v1.32.0 --name llama-stack-test
 ```

 Next, create a Kubernetes PVC and Secret for downloading and storing the Hugging Face model:

-```bash
+```
 cat <<EOF |kubectl apply -f -
 apiVersion: v1
 kind: PersistentVolumeClaim
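The PVC/Secret manifest is cut off by the hunk boundary above. As a rough sketch only, a combined PVC and Secret for a Hugging Face model cache could look like the following; the resource names, storage size, and token field are assumptions, not values taken from this commit:

```
# Hypothetical PVC + Secret sketch -- names and sizes are assumptions.
cat <<EOF |kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vllm-models
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
---
apiVersion: v1
kind: Secret
metadata:
  name: hf-token-secret
type: Opaque
stringData:
  token: "<your-hf-token>"
EOF
```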
@@ -39,7 +39,7 @@ data:

 Next, start the vLLM server as a Kubernetes Deployment and Service:

-```bash
+```
 cat <<EOF |kubectl apply -f -
 apiVersion: apps/v1
 kind: Deployment
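Once the Deployment is applied, it can help to wait for it to become available before tailing logs. A minimal check, assuming the Deployment is named vllm-server (an assumption; the label selector comes from the kubectl logs command used a bit later in the guide):

```
# Wait for the vLLM Deployment to roll out, then list its pods.
# "vllm-server" is an assumed Deployment name.
kubectl wait --for=condition=Available deployment/vllm-server --timeout=10m
kubectl get pods -l app.kubernetes.io/name=vllm
```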
@@ -95,7 +95,7 @@ EOF

 We can verify that the vLLM server has started successfully via the logs (this might take a couple of minutes to download the model):

-```bash
+```
 $ kubectl logs -l app.kubernetes.io/name=vllm
 ...
 INFO: Started server process [1]
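As an extra sanity check beyond the logs, you can hit vLLM's OpenAI-compatible API directly. This is only a sketch: the Service name vllm-server and port 8000 (vLLM's default) are assumptions, not values from this commit:

```
# Port-forward the (assumed) vLLM Service and list the served models.
kubectl port-forward service/vllm-server 8000:8000 &
curl -s http://localhost:8000/v1/models
```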
@@ -119,7 +119,7 @@ providers:

 Once we have defined the run configuration for Llama Stack, we can build an image with that configuration and the server source code:

-```bash
+```
 cat >/tmp/test-vllm-llama-stack/Containerfile.llama-stack-run-k8s <<EOF
 FROM distribution-myenv:dev

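Because the cluster runs in Kind, the image built with podman (see the truncated podman build command in the next hunk header) has to be made visible to the cluster's nodes. One way to do that, assuming the image was tagged llama-stack-run-k8s:latest (an assumption; the actual tag is cut off in the hunk header):

```
# Export the locally built image and load it into the Kind cluster.
# The tag "llama-stack-run-k8s:latest" is an assumption; use whatever
# tag was passed to podman build -t.
podman save -o /tmp/llama-stack-run-k8s.tar localhost/llama-stack-run-k8s:latest
kind load image-archive /tmp/llama-stack-run-k8s.tar --name llama-stack-test
```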
@@ -135,7 +135,7 @@ podman build -f /tmp/test-vllm-llama-stack/Containerfile.llama-stack-run-k8s -t

 We can then start the Llama Stack server by deploying a Kubernetes Pod and Service:

-```bash
+```
 cat <<EOF |kubectl apply -f -
 apiVersion: v1
 kind: PersistentVolumeClaim
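This manifest is again truncated by the hunk. For orientation, the Service it ends with presumably looks something like the sketch below; the name and port come from the port-forward command later in the guide, while the selector labels are assumptions:

```
# Hypothetical Service sketch for the Llama Stack server.
cat <<EOF |kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: llama-stack-service
spec:
  selector:
    app.kubernetes.io/name: llama-stack
  ports:
    - protocol: TCP
      port: 5000
      targetPort: 5000
EOF
```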
@@ -195,7 +195,7 @@ EOF
 ### Verifying the Deployment
 We can check that the Llama Stack server has started:

-```bash
+```
 $ kubectl logs -l app.kubernetes.io/name=llama-stack
 ...
 INFO: Started server process [1]
@@ -207,7 +207,7 @@ INFO: Uvicorn running on http://['::', '0.0.0.0']:5000 (Press CTRL+C to quit

 Finally, we forward the Kubernetes service to a local port and test some inference requests against it via the Llama Stack Client:

-```bash
+```
 kubectl port-forward service/llama-stack-service 5000:5000
 llama-stack-client --endpoint http://localhost:5000 inference chat-completion --message "hello, what model are you?"
 ```
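If the chat-completion call fails, a lighter-weight check is to ask the server which models it has registered. This assumes the llama-stack-client CLI in your installed version provides the models list subcommand:

```
# Run the port-forward in the background (or a second terminal), then
# list the server's models before sending inference requests.
kubectl port-forward service/llama-stack-service 5000:5000 &
llama-stack-client --endpoint http://localhost:5000 models list
```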