From 09ed0e9c9f9a49c8b96607a3b481e3dac6205809 Mon Sep 17 00:00:00 2001
From: Yuan Tang
Date: Thu, 6 Feb 2025 13:28:02 -0500
Subject: [PATCH] Add Kubernetes deployment guide (#899)

This PR moves some content from [the recent blog post](https://blog.vllm.ai/2025/01/27/intro-to-llama-stack-with-vllm.html) here as a more official guide for users who'd like to deploy Llama Stack on Kubernetes.

---------

Signed-off-by: Yuan Tang
---
 docs/source/distributions/index.md           |   8 +-
 .../distributions/kubernetes_deployment.md   | 207 ++++++++++++++++++
 2 files changed, 214 insertions(+), 1 deletion(-)
 create mode 100644 docs/source/distributions/kubernetes_deployment.md

diff --git a/docs/source/distributions/index.md b/docs/source/distributions/index.md
index ee7f4f23c..1f766e75e 100644
--- a/docs/source/distributions/index.md
+++ b/docs/source/distributions/index.md
@@ -14,7 +14,12 @@ Another simple way to start interacting with Llama Stack is to just spin up a co
 
 **Conda**:
 
-Lastly, if you have a custom or an advanced setup or you are developing on Llama Stack you can also build a custom Llama Stack server. Using `llama stack build` and `llama stack run` you can build/run a custom Llama Stack server containing the exact combination of providers you wish. We have also provided various templates to make getting started easier. See [Building a Custom Distribution](building_distro) for more details.
+If you have a custom or advanced setup, or you are developing on Llama Stack, you can also build a custom Llama Stack server. Using `llama stack build` and `llama stack run`, you can build/run a custom Llama Stack server containing the exact combination of providers you wish. We have also provided various templates to make getting started easier. See [Building a Custom Distribution](building_distro) for more details.
+
+
+**Kubernetes**:
+
+If you have built a container image and want to deploy it in a Kubernetes cluster instead of starting the Llama Stack server locally, see the [Kubernetes Deployment Guide](kubernetes_deployment) for more details.
 
 
 ```{toctree}
@@ -25,4 +30,5 @@ importing_as_library
 building_distro
 configuration
 selection
+kubernetes_deployment
 ```

diff --git a/docs/source/distributions/kubernetes_deployment.md b/docs/source/distributions/kubernetes_deployment.md
new file mode 100644
index 000000000..6cca2bc47
--- /dev/null
+++ b/docs/source/distributions/kubernetes_deployment.md
@@ -0,0 +1,207 @@
# Kubernetes Deployment Guide

Instead of starting the Llama Stack and vLLM servers locally, we can deploy them in a Kubernetes cluster. In this guide, we'll use a local [Kind](https://kind.sigs.k8s.io/) cluster and a vLLM inference service in the same cluster for demonstration purposes.

First, create a local Kubernetes cluster via Kind: