---
title: Kubernetes Deployment Guide
description: Deploy Llama Stack on Kubernetes clusters with vLLM inference service
sidebar_label: Kubernetes
sidebar_position: 2
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Kubernetes Deployment Guide

Deploy Llama Stack and vLLM servers in a Kubernetes cluster instead of running them locally. This guide covers both local development with Kind and production deployment on AWS EKS.

## Prerequisites

### Local Kubernetes Setup

Create a local Kubernetes cluster via Kind:

```bash
kind create cluster --image kindest/node:v1.32.0 --name llama-stack-test
```

Set your Hugging Face token (base64-encoded, ready for use in a Kubernetes Secret):

```bash
export HF_TOKEN=$(echo -n "your-hf-token" | base64)
```

## Quick Deployment

### Step 1: Create Storage and Secrets

```yaml
cat <$tmp_dir/Containerfile.llama-stack-run-k8s <
```
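The manifests in Step 1 were truncated in this copy of the guide (the heredoc bodies are missing). As a hedged sketch of what such a step typically looks like, the following creates a PersistentVolumeClaim for model storage and a Secret carrying the base64-encoded `HF_TOKEN` exported earlier. The resource names (`vllm-models`, `hf-token-secret`), the storage size, and the secret key are illustrative assumptions, not the guide's actual values:

```shell
# Sketch only: create storage and secrets for the vLLM deployment.
# Names and sizes below are assumptions; adjust to your cluster.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vllm-models            # assumed name for the model cache volume
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi            # assumed size; large models need more
---
apiVersion: v1
kind: Secret
metadata:
  name: hf-token-secret        # assumed name for the Hugging Face token secret
type: Opaque
data:
  token: "${HF_TOKEN}"         # already base64-encoded by the earlier export
EOF
```

After applying, `kubectl get pvc,secret` should list both resources; the vLLM pod would then mount the PVC and read the token from the Secret.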