---
title: Kubernetes Deployment Guide
description: Deploy Llama Stack on Kubernetes clusters with vLLM inference service
sidebar_label: Kubernetes
sidebar_position: 2
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Kubernetes Deployment Guide

Deploy Llama Stack and vLLM servers in a Kubernetes cluster instead of running them locally. This guide uses a local Kind cluster and the Kubernetes operator to manage the Llama Stack server; the vLLM inference server is deployed manually.

## Prerequisites

### Local Kubernetes Setup

Create a local Kubernetes cluster via Kind:

```bash
kind create cluster --image kindest/node:v1.32.0 --name llama-stack-test
```

Export your Hugging Face token (base64-encoded so it can be stored in a Kubernetes Secret):

```bash
export HF_TOKEN=$(echo -n "your-hf-token" | base64)
```

## Quick Deployment

### Step 1: Create Storage and Secrets

```yaml
cat <
```

### Test the Deployment

Forward the port of the Service that the operator creates for the distribution (named `<distribution-name>-service`):

```bash
# List services to find the service name
kubectl get services | grep llamastack

# Port forward and test (substitute the actual service name if it differs)
kubectl port-forward service/llamastack-vllm-service 8321:8321
```

In another terminal, test the deployment:

```bash
llama-stack-client --endpoint http://localhost:8321 inference chat-completion --message "hello, what model are you?"
```

## Troubleshooting

### vLLM Server Issues

**Check vLLM pod status:**

```bash
kubectl get pods -l app.kubernetes.io/name=vllm
kubectl logs -l app.kubernetes.io/name=vllm
```

**Test vLLM service connectivity:**

```bash
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- curl http://vllm-server:8000/v1/models
```

### Llama Stack Server Issues

**Check LlamaStackDistribution status:**

```bash
# Get detailed status
kubectl describe llamastackdistribution llamastack-vllm

# Check for events
kubectl get events --sort-by='.lastTimestamp' | grep llamastack-vllm
```

**Check operator-managed pods:**

```bash
# List all pods managed by the operator
kubectl get pods -l app.kubernetes.io/name=llama-stack

# Check pod logs via the label selector (or target a specific pod by name)
kubectl logs -l app.kubernetes.io/name=llama-stack
```

**Check operator status:**

```bash
# Verify the operator is running
kubectl get pods -n llama-stack-operator-system

# Check operator logs if issues persist
kubectl logs -n llama-stack-operator-system -l control-plane=controller-manager
```

**Verify service connectivity:**

```bash
# Get the service endpoint
kubectl get svc llamastack-vllm-service

# Test connectivity from within the cluster
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- curl http://llamastack-vllm-service:8321/health
```

## Related Resources

- **[Deployment Overview](/docs/deploying/)** - Overview of deployment options
- **[Distributions](/docs/distributions)** - Understanding Llama Stack distributions
- **[Configuration](/docs/distributions/configuration)** - Detailed configuration options
- **[LlamaStack Operator](https://github.com/llamastack/llama-stack-k8s-operator)** - Overview of the Llama Stack Kubernetes operator
- **[LlamaStackDistribution](https://github.com/llamastack/llama-stack-k8s-operator/blob/main/docs/api-overview.md)** - API spec of the operator's custom resource
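
As a final check, the deployment can also be verified from Python with the `llama_stack_client` SDK instead of the CLI. The snippet below is a minimal sketch: it assumes the port-forward from the testing step above is still active and that the package is installed (`pip install llama-stack-client`).

```python
# Minimal sanity check against a port-forwarded Llama Stack server.
# Assumes `kubectl port-forward service/llamastack-vllm-service 8321:8321` is running.
from llama_stack_client import LlamaStackClient

# Point the client at the forwarded port on localhost.
client = LlamaStackClient(base_url="http://localhost:8321")

# List the models registered with the stack; the vLLM-served model should appear here.
for model in client.models.list():
    print(model.identifier)
```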