# Llama Stack Helm Chart

This Helm chart is designed to install the Llama Stack, a comprehensive platform for llama-related tasks.

The chart provides a convenient way to deploy and manage the Llama Stack on Kubernetes or OpenShift clusters. It offers flexibility in customizing the deployment by allowing users to modify values such as image repositories, probe configurations, resource limits, and more.

Optionally, the chart also supports the installation of the llama-stack-playground, which provides a web-based interface for interacting with the Llama Stack.

## Quick Start

Create a `local-values.yaml` file with the following:

> **Note**
>
> The chart currently supports only the vLLM provider directly, but other distributions can be managed by adding the variables they require to `env` in the values file (see the example after the Llama Stack Specific values table below).


```yaml
distribution: distribution-remote-vllm

vllm:
  url: "https://<MY_VLLM_INSTANCE>:443/v1"
  inferenceModel: "meta-llama/Llama-3.1-8B-Instruct"
  apiKey: xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

Log in to your Kubernetes cluster through the CLI and run:

```sh
helm upgrade -i llama-stack . -f local-values.yaml
```
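
Once the release is installed, you can watch the pods come up with `kubectl`. The selector below assumes the standard labels generated by `helm create` scaffolding and a release named `llama-stack`; adjust both to match your install:

```sh
# Watch the llama-stack pods until they reach Running/Ready
kubectl get pods -l app.kubernetes.io/instance=llama-stack -w
```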

## Custom Configuration

By default, llama-stack uses the `run.yaml` config that ships with the specified distribution. For more granular control, `customRunConfig` can be set to `true`, in which case the Helm chart uses the contents of `files/run.yaml` instead.
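
As a minimal sketch, switching to the bundled run configuration only requires flipping that flag in your values file and then editing `files/run.yaml` to match your deployment:

```yaml
# local-values.yaml
customRunConfig: true   # use files/run.yaml from the chart instead of the distribution's default run.yaml
```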

## Values

### Llama Stack Specific

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| customRunConfig | bool | `false` | Indicates whether a custom run configuration is being used. |
| distribution | string | `"distribution-remote-vllm"` | Specifies the distribution or type of deployment being used (in this case, a remote vLLM distribution). |
| telemetry.enabled | bool | `false` | Enables or disables telemetry collection. |
| telemetry.serviceName | string | `"otel-collector.openshift-opentelemetry-operator.svc.cluster.local:4318"` | The service name and address of the telemetry collector. |
| telemetry.sinks | string | `"console,sqlite,otel"` | Specifies the destinations (sinks) where telemetry data will be sent. |
| vllm.inferenceModel | string | `"llama2-7b-chat"` | The specific inference model to be used by vLLM (a high-throughput, memory-efficient inference service for large language models). |
| vllm.url | string | `"http://vllm-server"` | The URL of the vLLM service. |
| env | object | N/A | A set of key/value pairs that can be set in the pod. |
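
As mentioned in the Quick Start note, distributions other than remote vLLM can be wired up by setting the environment variables they expect through `env`. The snippet below is an illustrative sketch only; the distribution name and variable names must match what the chosen distribution's `run.yaml` actually reads:

```yaml
distribution: distribution-ollama   # illustrative distribution image name; check the published llamastack images

env:
  # Variable names are distribution-specific -- these are examples, not a definitive list
  OLLAMA_URL: "http://ollama-server:11434"
  INFERENCE_MODEL: "meta-llama/Llama-3.1-8B-Instruct"
```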

### General

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| autoscaling.enabled | bool | `false` | Enables or disables horizontal pod autoscaling, which automatically adjusts the number of running replicas based on CPU utilization. |
| autoscaling.maxReplicas | int | `100` | The maximum number of pod replicas that the autoscaler can scale up to. |
| autoscaling.minReplicas | int | `1` | The minimum number of pod replicas that will always be running. |
| autoscaling.targetCPUUtilizationPercentage | int | `80` | The target average CPU utilization across all running pods that the autoscaler will aim to maintain. |
| image.pullPolicy | string | `"Always"` | Defines when to pull the container image (e.g., always pull, pull if not present). |
| image.repository | string | `"docker.io/llamastack/{{ $.Values.distribution }}"` | The image repository where the container image is located. The distribution value is used to construct the full image path. |
| image.tag | string | `"0.1.6"` | The specific version tag of the image to use. |
| ingress.annotations | object | `{}` | Kubernetes Ingress annotations, which can be used to configure load balancers and other external access settings. |
| ingress.className | string | `""` | The name of the Ingress class (controller) to use for this Ingress resource. |
| ingress.enabled | bool | `true` | Enables or disables the creation of a Kubernetes Ingress resource, which allows external access to the application. |
| ingress.hosts[0].host | string | `"chart-example.local"` | The hostname that the Ingress will route traffic to (a placeholder/example value). |
| ingress.hosts[0].paths[0].path | string | `"/"` | The path on the specified host that the Ingress will route traffic to (in this case, the root path). |
| ingress.hosts[0].paths[0].pathType | string | `"ImplementationSpecific"` | The type of path matching used by the Ingress controller. |
| ingress.tls | list | `[]` | Configuration for TLS termination at the Ingress, allowing for HTTPS. |
| livenessProbe.httpGet.path | string | `"/v1/health"` | The HTTP endpoint path that the liveness probe checks to determine if the container is running and healthy. |
| livenessProbe.httpGet.port | int | `5001` | The port that the liveness probe connects to for the HTTP health check. |
| podAnnotations | object | `{}` | Kubernetes Pod annotations, which can be used to attach arbitrary non-identifying metadata to the Pod. |
| podLabels | object | `{}` | Kubernetes Pod labels: key/value pairs attached to Pods that can be used for organizing and selecting groups of Pods. |
| podSecurityContext | object | `{}` | Defines the security context for the Pod, such as user and group IDs, security capabilities, etc. |
| readinessProbe.httpGet.path | string | `"/v1/health"` | The HTTP endpoint path that the readiness probe checks to determine if the container is ready to serve traffic. |
| readinessProbe.httpGet.port | int | `5001` | The port that the readiness probe connects to for the HTTP readiness check. |
| replicaCount | int | `1` | The desired number of pod replicas to run. |
| resources.limits.cpu | string | `"100m"` | The maximum amount of CPU that a container can use (in millicores). |
| resources.limits.memory | string | `"500Mi"` | The maximum amount of memory that a container can use (in mebibytes). |
| resources.requests.cpu | string | `"100m"` | The amount of CPU that Kubernetes guarantees to be available for the container. |
| resources.requests.memory | string | `"500Mi"` | The amount of memory that Kubernetes guarantees to be available for the container (in mebibytes). |
| route | object | `{"annotations":{},"enabled":false,"host":"","path":"","tls":{"enabled":true,"insecureEdgeTerminationPolicy":"Redirect","termination":"edge"}}` | Configuration for an OpenShift Route object, which is used for exposing services externally on OpenShift. |
| route.annotations | object | `{}` | Additional custom annotations for the OpenShift Route object. |
| route.host | string | Set by OpenShift | The hostname for the OpenShift Route. This is typically managed by OpenShift. |
| route.path | string | `""` | The path for the OpenShift Route. |
| route.tls.enabled | bool | `true` | Enables or disables TLS for the OpenShift Route, providing secure communication. |
| route.tls.insecureEdgeTerminationPolicy | string | `"Redirect"` | The policy for handling insecure (HTTP) requests when TLS termination happens at the edge (Route). |
| route.tls.termination | string | `"edge"` | Specifies that TLS termination occurs at the OpenShift Route edge. |
| runConfig.enabled | bool | `false` | Indicates whether a specific run configuration is enabled. |
| service.port | int | `5001` | The port on which the Kubernetes Service is exposed internally within the cluster. |
| service.type | string | `"ClusterIP"` | The type of Kubernetes Service. ClusterIP makes the service reachable only from within the cluster. |
| serviceAccount.annotations | object | `{}` | Annotations for the Kubernetes ServiceAccount. |
| serviceAccount.automount | bool | `true` | Indicates whether the ServiceAccount token should be automatically mounted into the Pods. |
| serviceAccount.create | bool | `false` | Determines whether a new Kubernetes ServiceAccount should be created. |
| serviceAccount.name | string | `""` | The name of an existing Kubernetes ServiceAccount to use. If create is true and this is empty, a default name will be generated. |
| startupProbe.failureThreshold | int | `30` | The number of consecutive startup probe failures before Kubernetes considers the container to have failed to start. |
| startupProbe.httpGet.path | string | `"/v1/health"` | The HTTP endpoint path for the startup probe, used to determine whether the application has started successfully. |
| startupProbe.httpGet.port | int | `5001` | The port for the HTTP startup probe. |
| startupProbe.initialDelaySeconds | int | `40` | The number of seconds to wait after the container has started before the startup probe is first initiated. |
| startupProbe.periodSeconds | int | `10` | The interval (in seconds) at which the startup probe is executed. |
| volumeMounts | list | `[]` | A list of volume mounts that define how volumes are mounted into the container's filesystem. |
| volumes | list | `[]` | A list of volume definitions that provide storage for the Pod. |
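
Putting a few of these together, a values file for an OpenShift cluster that prefers a Route over an Ingress and raises the default resource limits could look like the following sketch (the resource figures are placeholders; tune them for your workload):

```yaml
ingress:
  enabled: false          # disable the Ingress in favor of an OpenShift Route

route:
  enabled: true
  host: ""                # leave empty so OpenShift assigns a hostname
  tls:
    enabled: true
    termination: edge
    insecureEdgeTerminationPolicy: Redirect

resources:
  requests:
    cpu: 250m
    memory: 1Gi
  limits:
    cpu: "1"
    memory: 2Gi
```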