Merge branch 'main' into docs-4

2025-12-31 06:19:59 +00:00 · 2025-04-10 14:15:54 -06:00 · 2025-04-10 14:15:54 -06:00 · d7c976c6d2
commit d7c976c6d2
parent 5ffe6cee36 de6ec5803e
38 changed files with 4709 additions and 8876 deletions
--- a/docs/_static/css/my_theme.css
+++ b/docs/_static/css/my_theme.css
@@ -20,3 +20,6 @@
 h3 {
    font-weight: normal;
 }
+html[data-theme="dark"] .rst-content div[class^="highlight"] {
+  background-color: #0b0b0b;
+}
--- a/docs/_static/js/detect_theme.js
+++ b/docs/_static/js/detect_theme.js
@@ -0,0 +1,9 @@
+document.addEventListener("DOMContentLoaded", function () {
+  const prefersDark = window.matchMedia("(prefers-color-scheme: dark)").matches;
+  const htmlElement = document.documentElement;
+  if (prefersDark) {
+    htmlElement.setAttribute("data-theme", "dark");
+  } else {
+    htmlElement.setAttribute("data-theme", "light");
+  }
+});
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -112,6 +112,8 @@ html_theme_options = {
    # "style_nav_header_background": "#c3c9d4",
 }

+default_dark_mode = False
+
 html_static_path = ["../_static"]
 # html_logo = "../_static/llama-stack-logo.png"
 # html_style = "../_static/css/my_theme.css"
@@ -119,6 +121,7 @@ html_static_path = ["../_static"]

 def setup(app):
    app.add_css_file("css/my_theme.css")
+    app.add_js_file("js/detect_theme.js")

    def dockerhub_role(name, rawtext, text, lineno, inliner, options={}, content=[]):
        url = f"https://hub.docker.com/r/llamastack/{text}"
--- a/docs/source/distributions/kubernetes_deployment.md
+++ b/docs/source/distributions/kubernetes_deployment.md
@@ -7,13 +7,13 @@ In this guide, we'll use a local [Kind](https://kind.sigs.k8s.io/) cluster and a

 First, create a local Kubernetes cluster via Kind:

-```bash
+```
 kind create cluster --image kindest/node:v1.32.0 --name llama-stack-test
 ```

 Then, create a Kubernetes PVC and Secret for downloading and storing the Hugging Face model:

-```bash
+```
 cat <<EOF |kubectl apply -f -
 apiVersion: v1
 kind: PersistentVolumeClaim
@@ -39,7 +39,7 @@ data:

 Next, start the vLLM server as a Kubernetes Deployment and Service:

-```bash
+```
 cat <<EOF |kubectl apply -f -
 apiVersion: apps/v1
 kind: Deployment
@@ -95,7 +95,7 @@ EOF

 We can verify that the vLLM server has started successfully via the logs (this might take a couple of minutes to download the model):

-```bash
+```
 $ kubectl logs -l app.kubernetes.io/name=vllm
 ...
 INFO:     Started server process [1]
@@ -119,7 +119,7 @@ providers:

 Once we have defined the run configuration for Llama Stack, we can build an image with that configuration and the server source code:

-```bash
+```
 cat >/tmp/test-vllm-llama-stack/Containerfile.llama-stack-run-k8s <<EOF
 FROM distribution-myenv:dev

@@ -135,7 +135,7 @@ podman build -f /tmp/test-vllm-llama-stack/Containerfile.llama-stack-run-k8s -t

 We can then start the Llama Stack server by deploying a Kubernetes Pod and Service:

-```bash
+```
 cat <<EOF |kubectl apply -f -
 apiVersion: v1
 kind: PersistentVolumeClaim
@@ -195,7 +195,7 @@ EOF
 ### Verifying the Deployment
 We can check that the LlamaStack server has started:

-```bash
+```
 $ kubectl logs -l app.kubernetes.io/name=llama-stack
 ...
 INFO:     Started server process [1]
@@ -207,7 +207,7 @@ INFO:     Uvicorn running on http://['::', '0.0.0.0']:5000 (Press CTRL+C to quit

 Finally, we forward the Kubernetes service to a local port and test some inference requests against it via the Llama Stack Client:

-```bash
+```
 kubectl port-forward service/llama-stack-service 5000:5000
 llama-stack-client --endpoint http://localhost:5000 inference chat-completion --message "hello, what model are you?"
 ```
--- a/docs/source/distributions/self_hosted_distro/remote-vllm.md
+++ b/docs/source/distributions/self_hosted_distro/remote-vllm.md
@@ -25,7 +25,7 @@ The `llamastack/distribution-remote-vllm` distribution consists of the following
 | vector_io | `inline::faiss`, `remote::chromadb`, `remote::pgvector` |


-You can use this distribution if you have GPUs and want to run an independent vLLM server container for running inference.
+You can use this distribution if you want to run an independent vLLM server for inference.

 ### Environment Variables

@@ -41,7 +41,10 @@ The following environment variables can be configured:

 ## Setting up vLLM server

-Both AMD and NVIDIA GPUs can serve as accelerators for the vLLM server, which acts as both the LLM inference provider and the safety provider.
+In the following sections, we'll use either AMD or NVIDIA GPUs to serve as hardware accelerators for the vLLM
+server, which acts as both the LLM inference provider and the safety provider. Note that vLLM also
+[supports many other hardware accelerators](https://docs.vllm.ai/en/latest/getting_started/installation.html) and
+that we only use GPUs here for demonstration purposes.

 ### Setting up vLLM server on AMD GPU

--- a/docs/source/playground/index.md
+++ b/docs/source/playground/index.md
@@ -103,7 +103,5 @@ llama stack run together

 2. Start Streamlit UI
 ```bash
-cd llama_stack/distribution/ui
-pip install -r requirements.txt
-streamlit run app.py
+uv run --with ".[ui]" streamlit run llama_stack/distribution/ui/app.py
 ```