# Kubernetes Deployment Guide

Instead of starting the Llama Stack and vLLM servers locally, we can deploy them in a Kubernetes cluster.

### Prerequisites

In this guide, we'll use a local [Kind](https://kind.sigs.k8s.io/) cluster and a vLLM inference service in the same cluster for demonstration purposes.

First, create a local Kubernetes cluster via Kind:

```
kind create cluster --image kindest/node:v1.32.0 --name llama-stack-test
```

Next, set your Hugging Face token as an environment variable. It is base64-encoded here so that it can be placed directly into the `data` field of a Kubernetes Secret:

```
export HF_TOKEN=$(echo -n "your-hf-token" | base64)
```

Now create a Kubernetes PVC and Secret for downloading and storing the Hugging Face model:

```
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vllm-models
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 50Gi   # adjust to the size of the model you plan to serve
---
apiVersion: v1
kind: Secret
metadata:
  name: hf-token-secret
type: Opaque
data:
  token: $HF_TOKEN   # already base64-encoded by the export above
EOF
```

The remaining steps are sketched below: starting the vLLM server as a Kubernetes Deployment and Service, pointing a Llama Stack run configuration at it, building a container image that bundles the configuration with the Llama Stack server, and deploying that image into the cluster.
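First, run vLLM itself. The following is a minimal sketch of a Deployment and Service, assuming the `vllm-models` PVC and `hf-token-secret` Secret created above; the image tag, model name, and port are illustrative and can be adjusted:

```
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: vllm
  template:
    metadata:
      labels:
        app.kubernetes.io/name: vllm
    spec:
      containers:
      - name: vllm
        image: vllm/vllm-openai:latest   # illustrative image tag
        command: ["/bin/sh", "-c"]
        args: ["vllm serve meta-llama/Llama-3.2-1B-Instruct"]   # illustrative model
        env:
        - name: HUGGING_FACE_HUB_TOKEN
          valueFrom:
            secretKeyRef:
              name: hf-token-secret
              key: token
        ports:
        - containerPort: 8000
        volumeMounts:
        - name: llama-storage
          mountPath: /root/.cache/huggingface   # vLLM's default HF cache location
      volumes:
      - name: llama-storage
        persistentVolumeClaim:
          claimName: vllm-models
---
apiVersion: v1
kind: Service
metadata:
  name: vllm-server
spec:
  selector:
    app.kubernetes.io/name: vllm
  ports:
  - protocol: TCP
    port: 8000
    targetPort: 8000
  type: ClusterIP
EOF
```

Kubernetes decodes the Secret value before injecting it, so the container sees the raw token. Startup can be followed with `kubectl logs -l app.kubernetes.io/name=vllm`; the first run takes a while because the model is downloaded into the PVC.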
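Next, point a Llama Stack run configuration at the in-cluster vLLM Service. A minimal sketch of the inference-provider section, assuming the file is saved as `vllm-llama-stack-run-k8s.yaml` (the name the build step below expects) and that vLLM runs in the `default` namespace:

```
providers:
  inference:
  - provider_id: vllm
    provider_type: remote::vllm
    config:
      # cluster-internal DNS name of the Service sketched above
      url: http://vllm-server.default.svc.cluster.local:8000/v1
```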
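The `$tmp_dir/Containerfile.llama-stack-run-k8s` step then bundles the Llama Stack server and this configuration into an image. A sketch, assuming a `distribution-myenv:dev` base image previously produced by `llama stack build` and Podman as the image builder:

```
tmp_dir=$(mktemp -d)
cat >$tmp_dir/Containerfile.llama-stack-run-k8s <<EOF
# assumed base image, produced earlier by 'llama stack build'
FROM distribution-myenv:dev

RUN apt-get update && apt-get install -y git
RUN git clone https://github.com/meta-llama/llama-stack.git /app/llama-stack-source

# the run configuration written in the previous step
ADD ./vllm-llama-stack-run-k8s.yaml /app/config.yaml
EOF

# ADD resolves against the build context, so copy the config there first
cp vllm-llama-stack-run-k8s.yaml $tmp_dir/
podman build -f $tmp_dir/Containerfile.llama-stack-run-k8s -t llama-stack-run-k8s $tmp_dir
```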
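Finally, load the image into the Kind cluster and deploy the Llama Stack server. This is again a sketch: the server entry point and the default Llama Stack port 8321 are assumptions, so adjust them to match your build:

```
# make the locally built image visible to the Kind node
podman save -o /tmp/llama-stack-run-k8s.tar localhost/llama-stack-run-k8s:latest
kind load image-archive /tmp/llama-stack-run-k8s.tar --name llama-stack-test

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama-stack-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: llama-stack
  template:
    metadata:
      labels:
        app.kubernetes.io/name: llama-stack
    spec:
      containers:
      - name: llama-stack
        image: localhost/llama-stack-run-k8s:latest
        imagePullPolicy: Never   # use the image loaded into the node, never pull
        command: ["python", "-m", "llama_stack.distribution.server.server", "--yaml-config", "/app/config.yaml"]
        ports:
        - containerPort: 8321
---
apiVersion: v1
kind: Service
metadata:
  name: llama-stack-service
spec:
  selector:
    app.kubernetes.io/name: llama-stack
  ports:
  - protocol: TCP
    port: 8321
    targetPort: 8321
  type: ClusterIP
EOF
```

Once the pod is running, port-forward the Service and check that the client can see the vLLM-served model:

```
kubectl port-forward service/llama-stack-service 8321:8321
llama-stack-client configure --endpoint http://localhost:8321
llama-stack-client models list
```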