Merge branch 'main' into content-extension

2025-12-19 03:19:40 +00:00 · 2025-08-28 12:58:13 -06:00 · 2025-08-28 12:58:13 -06:00 · 4c1f187c71
commit 4c1f187c71
parent 3e11e1472c 52106d95d3
42 changed files with 2089 additions and 389 deletions
--- a/docs/source/advanced_apis/evaluation_concepts.md
+++ b/docs/source/advanced_apis/evaluation_concepts.md
@ -33,7 +33,7 @@ The list of open-benchmarks we currently support:
 - [MMMU](https://arxiv.org/abs/2311.16502) (A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI)]: Benchmark designed to evaluate multimodal models.


-You can follow this [contributing guide](https://llama-stack.readthedocs.io/en/latest/references/evals_reference/index.html#open-benchmark-contributing-guide) to add more open-benchmarks to Llama Stack
+You can follow this [contributing guide](../references/evals_reference/index.md#open-benchmark-contributing-guide) to add more open-benchmarks to Llama Stack

 #### Run evaluation on open-benchmarks via CLI

--- a/docs/source/advanced_apis/post_training/inline_huggingface.md
+++ b/docs/source/advanced_apis/post_training/inline_huggingface.md
@ -35,3 +35,6 @@ device: cpu

 ```

+[Find more detailed information here!](huggingface.md)
+
+
--- a/docs/source/advanced_apis/post_training/inline_torchtune.md
+++ b/docs/source/advanced_apis/post_training/inline_torchtune.md
@ -22,3 +22,4 @@ checkpoint_format: meta

 ```

+[Find more detailed information here!](torchtune.md)
--- a/docs/source/building_applications/playground/index.md
+++ b/docs/source/building_applications/playground/index.md
@ -88,7 +88,7 @@ Interactive pages for users to play with and explore Llama Stack API capabilitie
 - **API Resources**: Inspect Llama Stack API resources
  - This page allows you to inspect Llama Stack API resources (`models`, `datasets`, `memory_banks`, `benchmarks`, `shields`).
  - Under the hood, it uses Llama Stack's `/<resources>/list` API to get information about each resource.
-  - Please visit [Core Concepts](https://llama-stack.readthedocs.io/en/latest/concepts/index.html) for more details about the resources.
+  - Please visit [Core Concepts](../../concepts/index.md) for more details about the resources.

 ### Starting the Llama Stack Playground

--- a/docs/source/building_applications/responses_vs_agents.md
+++ b/docs/source/building_applications/responses_vs_agents.md
@ -3,7 +3,7 @@
 Llama Stack (LLS) provides two different APIs for building AI applications with tool calling capabilities: the **Agents API** and the **OpenAI Responses API**. While both let AI systems use tools and maintain full conversation history, they serve different use cases and have distinct characteristics.

 ```{note}
-For simple and basic inferencing, you may want to use the [Chat Completions API](https://llama-stack.readthedocs.io/en/latest/providers/index.html#chat-completions) directly, before progressing to Agents or Responses API.
+ **Note:** For simple and basic inferencing, you may want to use the [Chat Completions API](../providers/openai.md#chat-completions) directly before moving on to the Agents or Responses API.
 ```

 ## Overview
@ -173,7 +173,7 @@ Both APIs demonstrate distinct strengths that make them valuable on their own fo

 ## For More Information

- **LLS Agents API**: For detailed information on creating and managing agents, see the [Agents documentation](https://llama-stack.readthedocs.io/en/latest/building_applications/agent.html)
+- **LLS Agents API**: For detailed information on creating and managing agents, see the [Agents documentation](agent.md)
 - **OpenAI Responses API**: For information on using the OpenAI-compatible responses API, see the [OpenAI API documentation](https://platform.openai.com/docs/api-reference/responses)
- **Chat Completions API**: For the default backend API used by Agents, see the [Chat Completions providers documentation](https://llama-stack.readthedocs.io/en/latest/providers/index.html#chat-completions)
- **Agent Execution Loop**: For understanding how agents process turns and steps in their execution, see the [Agent Execution Loop documentation](https://llama-stack.readthedocs.io/en/latest/building_applications/agent_execution_loop.html)
+- **Chat Completions API**: For the default backend API used by Agents, see the [Chat Completions providers documentation](../providers/openai.md#chat-completions)
+- **Agent Execution Loop**: For understanding how agents process turns and steps in their execution, see the [Agent Execution Loop documentation](agent_execution_loop.md)
--- a/docs/source/concepts/distributions.md
+++ b/docs/source/concepts/distributions.md
@ -6,4 +6,4 @@ While there is a lot of flexibility to mix-and-match providers, often users will

 **Locally Hosted Distro**: You may want to run Llama Stack on your own hardware. Typically though, you still need to use Inference via an external service. You can use providers like HuggingFace TGI, Fireworks, Together, etc. for this purpose. Or you may have access to GPUs and can run a [vLLM](https://github.com/vllm-project/vllm) or [NVIDIA NIM](https://build.nvidia.com/nim?filters=nimType%3Anim_type_run_anywhere&q=llama) instance. If you "just" have a regular desktop machine, you can use [Ollama](https://ollama.com/) for inference. To provide convenient quick access to these options, we provide a number of such pre-configured locally-hosted Distros.

-**On-device Distro**: To run Llama Stack directly on an edge device (mobile phone or a tablet), we provide Distros for [iOS](https://llama-stack.readthedocs.io/en/latest/distributions/ondevice_distro/ios_sdk.html) and [Android](https://llama-stack.readthedocs.io/en/latest/distributions/ondevice_distro/android_sdk.html)
+**On-device Distro**: To run Llama Stack directly on an edge device (mobile phone or a tablet), we provide Distros for [iOS](../distributions/ondevice_distro/ios_sdk.md) and [Android](../distributions/ondevice_distro/android_sdk.md).
--- a/docs/source/contributing/new_api_provider.md
+++ b/docs/source/contributing/new_api_provider.md
@ -14,6 +14,13 @@ Here are some example PRs to help you get started:
   - [Nvidia Inference Implementation](https://github.com/meta-llama/llama-stack/pull/355)
   - [Model context protocol Tool Runtime](https://github.com/meta-llama/llama-stack/pull/665)

+## Guidelines for creating Internal or External Providers
+
+| **Type** | Internal (in-tree) | External (out-of-tree) |
+|---------|-------------------|---------------------|
+| **Description** | A provider that lives directly in the Llama Stack codebase. | A provider that lives outside the Llama Stack core codebase but is still accessible and usable by Llama Stack. |
+| **Benefits** | Can be used with minimal additional configuration or installation. | Contributors can make providers available on Llama Stack without adding code to the core codebase, which keeps provider-specific code separate from the core Llama Stack code. |
+
 ## Inference Provider Patterns

 When implementing Inference providers for OpenAI-compatible APIs, Llama Stack provides several mixin classes to simplify development and ensure consistent behavior across providers.
--- a/docs/source/distributions/importing_as_library.md
+++ b/docs/source/distributions/importing_as_library.md
@ -27,7 +27,7 @@ Then, you can access the APIs like `models` and `inference` on the client and ca
 response = client.models.list()
 ```

-If you've created a [custom distribution](https://llama-stack.readthedocs.io/en/latest/distributions/building_distro.html), you can also use the run.yaml configuration file directly:
+If you've created a [custom distribution](building_distro.md), you can also use the run.yaml configuration file directly:

 ```python
 client = LlamaStackAsLibraryClient(config_path)
--- a/docs/source/distributions/k8s/apply.sh
+++ b/docs/source/distributions/k8s/apply.sh
@ -22,17 +22,17 @@ else
 fi

 if [ -z "${GITHUB_CLIENT_ID:-}" ]; then
-  echo "ERROR: GITHUB_CLIENT_ID not set. You need it for Github login to work. Refer to https://llama-stack.readthedocs.io/en/latest/deploying/index.html#kubernetes-deployment-guide"
+  echo "ERROR: GITHUB_CLIENT_ID not set. You need it for GitHub login to work. See the Kubernetes Deployment Guide in the Llama Stack documentation."
  exit 1
 fi

 if [ -z "${GITHUB_CLIENT_SECRET:-}" ]; then
-  echo "ERROR: GITHUB_CLIENT_SECRET not set. You need it for Github login to work. Refer to https://llama-stack.readthedocs.io/en/latest/deploying/index.html#kubernetes-deployment-guide"
+  echo "ERROR: GITHUB_CLIENT_SECRET not set. You need it for GitHub login to work. See the Kubernetes Deployment Guide in the Llama Stack documentation."
  exit 1
 fi

 if [ -z "${LLAMA_STACK_UI_URL:-}" ]; then
-  echo "ERROR: LLAMA_STACK_UI_URL not set. Should be set to the external URL of the UI (excluding port). You need it for Github login to work. Refer to https://llama-stack.readthedocs.io/en/latest/deploying/index.html#kubernetes-deployment-guide"
+  echo "ERROR: LLAMA_STACK_UI_URL not set. It should be set to the external URL of the UI (excluding port). You need it for GitHub login to work. See the Kubernetes Deployment Guide in the Llama Stack documentation."
  exit 1
 fi

--- a/docs/source/distributions/ondevice_distro/android_sdk.md
+++ b/docs/source/distributions/ondevice_distro/android_sdk.md
@ -66,7 +66,7 @@ llama stack run starter --port 5050

 Ensure the Llama Stack server version is the same as the Kotlin SDK Library for maximum compatibility.

-Other inference providers: [Table](https://llama-stack.readthedocs.io/en/latest/index.html#supported-llama-stack-implementations)
+Other inference providers: [Table](../../index.md#supported-llama-stack-implementations)

 How to set remote localhost in Demo App: [Settings](https://github.com/meta-llama/llama-stack-client-kotlin/tree/latest-release/examples/android_app#settings)

--- a/docs/source/distributions/self_hosted_distro/meta-reference-gpu.md
+++ b/docs/source/distributions/self_hosted_distro/meta-reference-gpu.md
@ -2,7 +2,7 @@
 orphan: true
 ---
 <!-- This file was auto-generated by distro_codegen.py, please edit source -->
-# Meta Reference Distribution
+# Meta Reference GPU Distribution

 ```{toctree}
 :maxdepth: 2
@ -41,7 +41,7 @@ The following environment variables can be configured:

 ## Prerequisite: Downloading Models

-Please use `llama model list --downloaded` to check that you have llama model checkpoints downloaded in `~/.llama` before proceeding. See [installation guide](https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/download_models.html) here to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints.
+Please use `llama model list --downloaded` to check that you have Llama model checkpoints downloaded in `~/.llama` before proceeding. See the [installation guide](../../references/llama_cli_reference/download_models.md) to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints.

 ```
 $ llama model list --downloaded
--- a/docs/source/providers/post_training/index.md
+++ b/docs/source/providers/post_training/index.md
@ -9,7 +9,6 @@ This section contains documentation for all available providers for the **post_t
 ```{toctree}
 :maxdepth: 1

-inline_huggingface-cpu
 inline_huggingface-gpu
 inline_torchtune-cpu
 inline_torchtune-gpu
--- a/docs/source/references/evals_reference/index.md
+++ b/docs/source/references/evals_reference/index.md
@ -202,7 +202,7 @@ pprint(response)

 Llama Stack offers a library of scoring functions and the `/scoring` API, allowing you to run evaluations on your pre-annotated AI application datasets.

-In this example, we will work with an example RAG dataset you have built previously, label with an annotation, and use LLM-As-Judge with custom judge prompt for scoring. Please checkout our [Llama Stack Playground](https://llama-stack.readthedocs.io/en/latest/playground/index.html) for an interactive interface to upload datasets and run scorings.
+In this example, we will work with an example RAG dataset you have built previously, label it with annotations, and use LLM-as-Judge with a custom judge prompt for scoring. Please check out our [Llama Stack Playground](../../building_applications/playground/index.md) for an interactive interface to upload datasets and run scoring.

 ```python
 judge_model_id = "meta-llama/Llama-3.1-405B-Instruct-FP8"
--- a/llama_stack/core/build.py
+++ b/llama_stack/core/build.py
@ -80,7 +80,7 @@ def get_provider_dependencies(
    normal_deps = []
    special_deps = []
    for package in deps:
-        if "--no-deps" in package or "--index-url" in package:
+        if any(f in package for f in ["--no-deps", "--index-url", "--extra-index-url"]):
            special_deps.append(package)
        else:
            normal_deps.append(package)
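The widened check sends any requirement string that carries one of these pip flags to `special_deps` instead of `normal_deps`. The sketch below reproduces only that partitioning as a standalone function; `SPECIAL_FLAGS` and `partition_deps` are hypothetical names, and how `get_provider_dependencies` later installs each list is not shown in this hunk.

```python
# Hypothetical standalone version of the dependency split in build.py
SPECIAL_FLAGS = ("--no-deps", "--index-url", "--extra-index-url")


def partition_deps(deps: list[str]) -> tuple[list[str], list[str]]:
    """Split requirement strings into plain ones and ones carrying pip flags."""
    normal_deps: list[str] = []
    special_deps: list[str] = []
    for package in deps:
        # Substring match, mirroring the any(...) check in the diff
        if any(flag in package for flag in SPECIAL_FLAGS):
            special_deps.append(package)
        else:
            normal_deps.append(package)
    return normal_deps, special_deps


normal, special = partition_deps([
    "numpy",
    "sentence-transformers --no-deps",
    "torch torchvision torchao>=0.12.0 --extra-index-url https://download.pytorch.org/whl/cpu",
])
print(normal)
print(special)
```

The explicit `--extra-index-url` entry is needed because the old substring test could not catch it: `--index-url` is not a substring of `--extra-index-url`, since only a single hyphen precedes `index` there.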
--- a/llama_stack/core/stack.py
+++ b/llama_stack/core/stack.py
@ -225,7 +225,10 @@ def replace_env_vars(config: Any, path: str = "") -> Any:

        try:
            result = re.sub(pattern, get_env_var, config)
-            return _convert_string_to_proper_type(result)
+            # Only apply type conversion if substitution actually happened
+            if result != config:
+                return _convert_string_to_proper_type(result)
+            return result
        except EnvVarError as e:
            raise EnvVarError(e.var_name, e.path) from None
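With this guard, a string value that contains no `${env.NAME}` reference is returned unchanged instead of being coerced (for example, a quoted `"123"` stays a string). A simplified sketch of the behaviour follows. The `${env.NAME:=default}` syntax matches the `run.yaml` files in this commit, but the regex, `substitute`, and `convert_string` are hypothetical stand-ins, not Llama Stack's actual `replace_env_vars` or `_convert_string_to_proper_type`.

```python
import os
import re
from typing import Any

# Matches ${env.NAME} and ${env.NAME:=default}; a simplified, hypothetical pattern
ENV_PATTERN = re.compile(r"\$\{env\.([A-Za-z0-9_]+)(?::=([^}]*))?\}")


def convert_string(value: str) -> Any:
    """Hypothetical coercion: booleans, empty string -> None, ints, floats."""
    lowered = value.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    if value == "":
        return None
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def substitute(config: str) -> Any:
    def get_env_var(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        # Simplified: a missing variable with no default becomes "" here;
        # the real code's error handling (EnvVarError) is not reproduced.
        return os.environ.get(name, default if default is not None else "")

    result = ENV_PATTERN.sub(get_env_var, config)
    # Only coerce types when a substitution actually changed the string
    if result != config:
        return convert_string(result)
    return result


os.environ["DEMO_PORT"] = "8321"
print(substitute("${env.DEMO_PORT}"))
print(substitute("123"))
```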

--- a/llama_stack/distributions/ci-tests/build.yaml
+++ b/llama_stack/distributions/ci-tests/build.yaml
@ -34,7 +34,7 @@ distribution_spec:
    telemetry:
    - provider_type: inline::meta-reference
    post_training:
-    - provider_type: inline::huggingface-cpu
+    - provider_type: inline::torchtune-cpu
    eval:
    - provider_type: inline::meta-reference
    datasetio:
--- a/llama_stack/distributions/ci-tests/run.yaml
+++ b/llama_stack/distributions/ci-tests/run.yaml
@ -156,13 +156,10 @@ providers:
      sqlite_db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/ci-tests}/trace_store.db
      otel_exporter_otlp_endpoint: ${env.OTEL_EXPORTER_OTLP_ENDPOINT:=}
  post_training:
-  - provider_id: huggingface-cpu
-    provider_type: inline::huggingface-cpu
+  - provider_id: torchtune-cpu
+    provider_type: inline::torchtune-cpu
    config:
-      checkpoint_format: huggingface
-      distributed_backend: null
-      device: cpu
-      dpo_output_dir: ~/.llama/distributions/ci-tests/dpo_output
+      checkpoint_format: meta
  eval:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
--- a/llama_stack/distributions/meta-reference-gpu/doc_template.md
+++ b/llama_stack/distributions/meta-reference-gpu/doc_template.md
@ -1,7 +1,7 @@
 ---
 orphan: true
 ---
-# Meta Reference Distribution
+# Meta Reference GPU Distribution

 ```{toctree}
 :maxdepth: 2
@ -29,7 +29,7 @@ The following environment variables can be configured:

 ## Prerequisite: Downloading Models

-Please use `llama model list --downloaded` to check that you have llama model checkpoints downloaded in `~/.llama` before proceeding. See [installation guide](https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/download_models.html) here to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints.
+Please use `llama model list --downloaded` to check that you have llama model checkpoints downloaded in `~/.llama` before proceeding. See [installation guide](../../references/llama_cli_reference/download_models.md) here to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints.

 ```
 $ llama model list --downloaded
--- a/llama_stack/distributions/starter-gpu/build.yaml
+++ b/llama_stack/distributions/starter-gpu/build.yaml
@ -35,7 +35,7 @@ distribution_spec:
    telemetry:
    - provider_type: inline::meta-reference
    post_training:
-    - provider_type: inline::torchtune-gpu
+    - provider_type: inline::huggingface-gpu
    eval:
    - provider_type: inline::meta-reference
    datasetio:
--- a/llama_stack/distributions/starter-gpu/run.yaml
+++ b/llama_stack/distributions/starter-gpu/run.yaml
@ -156,10 +156,13 @@ providers:
      sqlite_db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/starter-gpu}/trace_store.db
      otel_exporter_otlp_endpoint: ${env.OTEL_EXPORTER_OTLP_ENDPOINT:=}
  post_training:
-  - provider_id: torchtune-gpu
-    provider_type: inline::torchtune-gpu
+  - provider_id: huggingface-gpu
+    provider_type: inline::huggingface-gpu
    config:
-      checkpoint_format: meta
+      checkpoint_format: huggingface
+      distributed_backend: null
+      device: cpu
+      dpo_output_dir: ~/.llama/distributions/starter-gpu/dpo_output
  eval:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
--- a/llama_stack/distributions/starter-gpu/starter_gpu.py
+++ b/llama_stack/distributions/starter-gpu/starter_gpu.py
@ -17,6 +17,6 @@ def get_distribution_template() -> DistributionTemplate:
    template.description = "Quick start template for running Llama Stack with several popular providers. This distribution is intended for GPU-enabled environments."

    template.providers["post_training"] = [
-        BuildProvider(provider_type="inline::torchtune-gpu"),
+        BuildProvider(provider_type="inline::huggingface-gpu"),
    ]
    return template
--- a/llama_stack/distributions/starter/build.yaml
+++ b/llama_stack/distributions/starter/build.yaml
@ -35,7 +35,7 @@ distribution_spec:
    telemetry:
    - provider_type: inline::meta-reference
    post_training:
-    - provider_type: inline::huggingface-cpu
+    - provider_type: inline::torchtune-cpu
    eval:
    - provider_type: inline::meta-reference
    datasetio:
--- a/llama_stack/distributions/starter/run.yaml
+++ b/llama_stack/distributions/starter/run.yaml
@ -156,13 +156,10 @@ providers:
      sqlite_db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/starter}/trace_store.db
      otel_exporter_otlp_endpoint: ${env.OTEL_EXPORTER_OTLP_ENDPOINT:=}
  post_training:
-  - provider_id: huggingface-cpu
-    provider_type: inline::huggingface-cpu
+  - provider_id: torchtune-cpu
+    provider_type: inline::torchtune-cpu
    config:
-      checkpoint_format: huggingface
-      distributed_backend: null
-      device: cpu
-      dpo_output_dir: ~/.llama/distributions/starter/dpo_output
+      checkpoint_format: meta
  eval:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
--- a/llama_stack/distributions/starter/starter.py
+++ b/llama_stack/distributions/starter/starter.py
@ -120,7 +120,7 @@ def get_distribution_template() -> DistributionTemplate:
        ],
        "agents": [BuildProvider(provider_type="inline::meta-reference")],
        "telemetry": [BuildProvider(provider_type="inline::meta-reference")],
-        "post_training": [BuildProvider(provider_type="inline::huggingface-cpu")],
+        "post_training": [BuildProvider(provider_type="inline::torchtune-cpu")],
        "eval": [BuildProvider(provider_type="inline::meta-reference")],
        "datasetio": [
            BuildProvider(provider_type="remote::huggingface"),
--- a/llama_stack/providers/registry/inference.py
+++ b/llama_stack/providers/registry/inference.py
@ -40,8 +40,9 @@ def available_providers() -> list[ProviderSpec]:
        InlineProviderSpec(
            api=Api.inference,
            provider_type="inline::sentence-transformers",
+            # CrossEncoder depends on torchao.quantization
            pip_packages=[
-                "torch torchvision --index-url https://download.pytorch.org/whl/cpu",
+                "torch torchvision torchao>=0.12.0 --extra-index-url https://download.pytorch.org/whl/cpu",
                "sentence-transformers --no-deps",
            ],
            module="llama_stack.providers.inline.inference.sentence_transformers",
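Switching from `--index-url` to `--extra-index-url` changes how pip resolves the packages on that line: `--index-url` replaces PyPI as the base index, while `--extra-index-url` adds the PyTorch CPU wheel index alongside PyPI, so a requirement such as `torchao>=0.12.0` can still be satisfied from PyPI if the extra index has no matching version. As a sketch, assuming such an entry is eventually passed to `pip install` (the install code is not part of this hunk), it can be tokenized with `shlex.split`; `to_pip_args` is a hypothetical helper.

```python
import shlex


def to_pip_args(requirement_line: str) -> list[str]:
    """Turn one pip_packages entry into an argv list for `pip install`."""
    # shlex.split keeps specifiers like torchao>=0.12.0 as single tokens
    return ["pip", "install", *shlex.split(requirement_line)]


args = to_pip_args(
    "torch torchvision torchao>=0.12.0 --extra-index-url https://download.pytorch.org/whl/cpu"
)
print(args)
```

Passing the list form to `subprocess.run` avoids a shell, so the `>=` in the specifier is not interpreted as output redirection, which is what an unquoted `torchao>=0.12.0` would trigger in a shell command line.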
--- a/llama_stack/providers/registry/post_training.py
+++ b/llama_stack/providers/registry/post_training.py
@ -13,7 +13,7 @@ from llama_stack.providers.datatypes import AdapterSpec, Api, InlineProviderSpec
 # The CPU version is used for distributions that don't have GPU support -- they result in smaller container images.
 torchtune_def = dict(
    api=Api.post_training,
-    pip_packages=["torchtune==0.5.0", "torchao==0.8.0", "numpy"],
+    pip_packages=["numpy"],
    module="llama_stack.providers.inline.post_training.torchtune",
    config_class="llama_stack.providers.inline.post_training.torchtune.TorchtunePostTrainingConfig",
    api_dependencies=[
@ -23,56 +23,39 @@ torchtune_def = dict(
    description="TorchTune-based post-training provider for fine-tuning and optimizing models using Meta's TorchTune framework.",
 )

-huggingface_def = dict(
-    api=Api.post_training,
-    pip_packages=["trl", "transformers", "peft", "datasets"],
-    module="llama_stack.providers.inline.post_training.huggingface",
-    config_class="llama_stack.providers.inline.post_training.huggingface.HuggingFacePostTrainingConfig",
-    api_dependencies=[
-        Api.datasetio,
-        Api.datasets,
-    ],
-    description="HuggingFace-based post-training provider for fine-tuning models using the HuggingFace ecosystem.",
-)
-

 def available_providers() -> list[ProviderSpec]:
    return [
        InlineProviderSpec(
-            **{
+            **{  # type: ignore
                **torchtune_def,
                "provider_type": "inline::torchtune-cpu",
                "pip_packages": (
                    cast(list[str], torchtune_def["pip_packages"])
-                    + ["torch torchtune==0.5.0 torchao==0.8.0 --index-url https://download.pytorch.org/whl/cpu"]
+                    + ["torch torchtune>=0.5.0 torchao>=0.12.0 --extra-index-url https://download.pytorch.org/whl/cpu"]
                ),
            },
        ),
        InlineProviderSpec(
-            **{
-                **huggingface_def,
-                "provider_type": "inline::huggingface-cpu",
-                "pip_packages": (
-                    cast(list[str], huggingface_def["pip_packages"])
-                    + ["torch --index-url https://download.pytorch.org/whl/cpu"]
-                ),
-            },
-        ),
-        InlineProviderSpec(
-            **{
+            **{  # type: ignore
                **torchtune_def,
                "provider_type": "inline::torchtune-gpu",
                "pip_packages": (
-                    cast(list[str], torchtune_def["pip_packages"]) + ["torch torchtune==0.5.0 torchao==0.8.0"]
+                    cast(list[str], torchtune_def["pip_packages"]) + ["torch torchtune>=0.5.0 torchao>=0.12.0"]
                ),
            },
        ),
        InlineProviderSpec(
-            **{
-                **huggingface_def,
-                "provider_type": "inline::huggingface-gpu",
-                "pip_packages": (cast(list[str], huggingface_def["pip_packages"]) + ["torch"]),
-            },
+            api=Api.post_training,
+            provider_type="inline::huggingface-gpu",
+            pip_packages=["trl", "transformers", "peft", "datasets", "torch"],
+            module="llama_stack.providers.inline.post_training.huggingface",
+            config_class="llama_stack.providers.inline.post_training.huggingface.HuggingFacePostTrainingConfig",
+            api_dependencies=[
+                Api.datasetio,
+                Api.datasets,
+            ],
+            description="HuggingFace-based post-training provider for fine-tuning models using the HuggingFace ecosystem.",
        ),
        remote_provider_spec(
            api=Api.post_training,
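The CPU and GPU torchtune entries are built by spreading a shared base dict and overriding `provider_type` and `pip_packages`; since `torchtune_def` now lists only `numpy`, each variant appends its own `torch`/`torchtune`/`torchao` requirement line. A plain-dict sketch of that pattern, with no Llama Stack imports (`base_def` and `make_variant` are hypothetical names):

```python
from typing import Any

# Hypothetical stand-in for torchtune_def, using plain strings instead of Api enums
base_def: dict[str, Any] = {
    "api": "post_training",
    "pip_packages": ["numpy"],
    "module": "llama_stack.providers.inline.post_training.torchtune",
}


def make_variant(base: dict[str, Any], provider_type: str, extra: list[str]) -> dict[str, Any]:
    # Later keys win, so provider_type and pip_packages override the base values
    return {
        **base,
        "provider_type": provider_type,
        # `+` returns a new list, so the shared base list is never mutated
        "pip_packages": base["pip_packages"] + extra,
    }


cpu = make_variant(
    base_def,
    "inline::torchtune-cpu",
    ["torch torchtune>=0.5.0 torchao>=0.12.0 --extra-index-url https://download.pytorch.org/whl/cpu"],
)
gpu = make_variant(base_def, "inline::torchtune-gpu", ["torch torchtune>=0.5.0 torchao>=0.12.0"])
print(cpu["pip_packages"])
print(gpu["pip_packages"])
```

The `inline::huggingface-gpu` entry, by contrast, is now written out with explicit keyword arguments, presumably because with the CPU variant removed there is no second variant left to share a base dict with.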
--- a/llama_stack/testing/inference_recorder.py
+++ b/llama_stack/testing/inference_recorder.py
@ -9,7 +9,6 @@ from __future__ import annotations  # for forward references
 import hashlib
 import json
 import os
-import sqlite3
 from collections.abc import Generator
 from contextlib import contextmanager
 from enum import StrEnum
@ -125,28 +124,13 @@ class ResponseStorage:
    def __init__(self, test_dir: Path):
        self.test_dir = test_dir
        self.responses_dir = self.test_dir / "responses"
-        self.db_path = self.test_dir / "index.sqlite"

        self._ensure_directories()
-        self._init_database()

    def _ensure_directories(self):
        self.test_dir.mkdir(parents=True, exist_ok=True)
        self.responses_dir.mkdir(exist_ok=True)

-    def _init_database(self):
-        with sqlite3.connect(self.db_path) as conn:
-            conn.execute("""
-                CREATE TABLE IF NOT EXISTS recordings (
-                    request_hash TEXT PRIMARY KEY,
-                    response_file TEXT,
-                    endpoint TEXT,
-                    model TEXT,
-                    timestamp TEXT,
-                    is_streaming BOOLEAN
-                )
-            """)
-
    def store_recording(self, request_hash: str, request: dict[str, Any], response: dict[str, Any]):
        """Store a request/response pair."""
        # Generate unique response filename
@ -169,34 +153,9 @@ class ResponseStorage:
            f.write("\n")
            f.flush()

-        # Update SQLite index
-        with sqlite3.connect(self.db_path) as conn:
-            conn.execute(
-                """
-                INSERT OR REPLACE INTO recordings
-                (request_hash, response_file, endpoint, model, timestamp, is_streaming)
-                VALUES (?, ?, ?, ?, datetime('now'), ?)
-            """,
-                (
-                    request_hash,
-                    response_file,
-                    request.get("endpoint", ""),
-                    request.get("model", ""),
-                    response.get("is_streaming", False),
-                ),
-            )
-
    def find_recording(self, request_hash: str) -> dict[str, Any] | None:
        """Find a recorded response by request hash."""
-        with sqlite3.connect(self.db_path) as conn:
-            result = conn.execute(
-                "SELECT response_file FROM recordings WHERE request_hash = ?", (request_hash,)
-            ).fetchone()
-
-        if not result:
-            return None
-
-        response_file = result[0]
+        response_file = f"{request_hash[:12]}.json"
        response_path = self.responses_dir / response_file

        if not response_path.exists():
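With the SQLite index removed, `find_recording` derives the file name directly from the request hash (`f"{request_hash[:12]}.json"`), so a lookup only works if `store_recording` writes files under the same naming convention (that part is not visible in this hunk). A minimal standalone sketch of such a hash-keyed store, assuming SHA-256 over canonical JSON for the hash and the same 12-character prefix; `request_hash`, `store`, and `find` are hypothetical names, not the recorder's API.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Any, Optional


def request_hash(request: dict[str, Any]) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal requests hash identically
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def store(responses_dir: Path, req_hash: str, payload: dict[str, Any]) -> Path:
    path = responses_dir / f"{req_hash[:12]}.json"
    path.write_text(json.dumps(payload, indent=2) + "\n")
    return path


def find(responses_dir: Path, req_hash: str) -> Optional[dict[str, Any]]:
    # No index lookup: the filename is computed from the hash itself
    path = responses_dir / f"{req_hash[:12]}.json"
    if not path.exists():
        return None
    return json.loads(path.read_text())


with tempfile.TemporaryDirectory() as tmp:
    responses_dir = Path(tmp)
    h = request_hash({"endpoint": "/v1/chat/completions", "model": "m", "messages": []})
    store(responses_dir, h, {"request": {}, "response": {"ok": True}})
    found = find(responses_dir, h)
    missing = find(responses_dir, "0" * 64)
print(found, missing)
```

Dropping the index removes a second source of truth that could drift from the files on disk; the trade-off is that writer and reader must agree on the truncation length, and a 12-hex-character prefix (48 bits) makes collisions unlikely but not impossible.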
--- a/llama_stack/ui/app/chat-playground/chunk-processor.test.tsx
+++ b/llama_stack/ui/app/chat-playground/chunk-processor.test.tsx
@ -0,0 +1,610 @@
+import { describe, test, expect } from "@jest/globals";
+
+// Extract the exact processChunk function implementation for testing
+function createProcessChunk() {
+  return (chunk: unknown): { text: string | null; isToolCall: boolean } => {
+    const chunkObj = chunk as Record<string, unknown>;
+
+    // Helper function to check if content contains function call JSON
+    const containsToolCall = (content: string): boolean => {
+      return (
+        content.includes('"type": "function"') ||
+        content.includes('"name": "knowledge_search"') ||
+        content.includes('"parameters":') ||
+        !!content.match(/\{"type":\s*"function".*?\}/)
+      );
+    };
+
+    // Check if this chunk contains a tool call (function call)
+    let isToolCall = false;
+
+    // Check direct chunk content if it's a string
+    if (typeof chunk === "string") {
+      isToolCall = containsToolCall(chunk);
+    }
+
+    // Check delta structures
+    if (
+      chunkObj?.delta &&
+      typeof chunkObj.delta === "object" &&
+      chunkObj.delta !== null
+    ) {
+      const delta = chunkObj.delta as Record<string, unknown>;
+      if ("tool_calls" in delta) {
+        isToolCall = true;
+      }
+      if (typeof delta.text === "string") {
+        if (containsToolCall(delta.text)) {
+          isToolCall = true;
+        }
+      }
+    }
+
+    // Check event structures
+    if (
+      chunkObj?.event &&
+      typeof chunkObj.event === "object" &&
+      chunkObj.event !== null
+    ) {
+      const event = chunkObj.event as Record<string, unknown>;
+
+      // Check event payload
+      if (
+        event?.payload &&
+        typeof event.payload === "object" &&
+        event.payload !== null
+      ) {
+        const payload = event.payload as Record<string, unknown>;
+        if (typeof payload.content === "string") {
+          if (containsToolCall(payload.content)) {
+            isToolCall = true;
+          }
+        }
+
+        // Check payload delta
+        if (
+          payload?.delta &&
+          typeof payload.delta === "object" &&
+          payload.delta !== null
+        ) {
+          const delta = payload.delta as Record<string, unknown>;
+          if (typeof delta.text === "string") {
+            if (containsToolCall(delta.text)) {
+              isToolCall = true;
+            }
+          }
+        }
+      }
+
+      // Check event delta
+      if (
+        event?.delta &&
+        typeof event.delta === "object" &&
+        event.delta !== null
+      ) {
+        const delta = event.delta as Record<string, unknown>;
+        if (typeof delta.text === "string") {
+          if (containsToolCall(delta.text)) {
+            isToolCall = true;
+          }
+        }
+        if (typeof delta.content === "string") {
+          if (containsToolCall(delta.content)) {
+            isToolCall = true;
+          }
+        }
+      }
+    }
+
+    // if it's a tool call, skip it (don't display in chat)
+    if (isToolCall) {
+      return { text: null, isToolCall: true };
+    }
+
+    // Extract text content from various chunk formats
+    let text: string | null = null;
+
+    // Helper function to extract clean text content, filtering out function calls
+    const extractCleanText = (content: string): string | null => {
+      if (containsToolCall(content)) {
+        try {
+          // Try to parse and extract non-function call parts
+          const jsonMatch = content.match(
+            /\{"type":\s*"function"[^}]*\}[^}]*\}/
+          );
+          if (jsonMatch) {
+            const jsonPart = jsonMatch[0];
+            const parsedJson = JSON.parse(jsonPart);
+
+            // If it's a function call, extract text after JSON
+            if (parsedJson.type === "function") {
+              const textAfterJson = content
+                .substring(content.indexOf(jsonPart) + jsonPart.length)
+                .trim();
+              return textAfterJson || null;
+            }
+          }
+          // If we can't parse it properly, skip the whole thing
+          return null;
+        } catch {
+          return null;
+        }
+      }
+      return content;
+    };
+
+    // Try direct delta text
+    if (
+      chunkObj?.delta &&
+      typeof chunkObj.delta === "object" &&
+      chunkObj.delta !== null
+    ) {
+      const delta = chunkObj.delta as Record<string, unknown>;
+      if (typeof delta.text === "string") {
+        text = extractCleanText(delta.text);
+      }
+    }
+
+    // Try event structures
+    if (
+      !text &&
+      chunkObj?.event &&
+      typeof chunkObj.event === "object" &&
+      chunkObj.event !== null
+    ) {
+      const event = chunkObj.event as Record<string, unknown>;
+
+      // Try event payload content
+      if (
+        event?.payload &&
+        typeof event.payload === "object" &&
+        event.payload !== null
+      ) {
+        const payload = event.payload as Record<string, unknown>;
+
+        // Try direct payload content
+        if (typeof payload.content === "string") {
+          text = extractCleanText(payload.content);
+        }
+
+        // Try turn_complete event structure: payload.turn.output_message.content
+        if (
+          !text &&
+          payload?.turn &&
+          typeof payload.turn === "object" &&
+          payload.turn !== null
+        ) {
+          const turn = payload.turn as Record<string, unknown>;
+          if (
+            turn?.output_message &&
+            typeof turn.output_message === "object" &&
+            turn.output_message !== null
+          ) {
+            const outputMessage = turn.output_message as Record<
+              string,
+              unknown
+            >;
+            if (typeof outputMessage.content === "string") {
+              text = extractCleanText(outputMessage.content);
+            }
+          }
+
+          // Fall back to model_response in steps if there is no output_message
+          if (
+            !text &&
+            turn?.steps &&
+            Array.isArray(turn.steps) &&
+            turn.steps.length > 0
+          ) {
+            for (const step of turn.steps) {
+              if (step && typeof step === "object" && step !== null) {
+                const stepObj = step as Record<string, unknown>;
+                if (
+                  stepObj?.model_response &&
+                  typeof stepObj.model_response === "object" &&
+                  stepObj.model_response !== null
+                ) {
+                  const modelResponse = stepObj.model_response as Record<
+                    string,
+                    unknown
+                  >;
+                  if (typeof modelResponse.content === "string") {
+                    text = extractCleanText(modelResponse.content);
+                    break;
+                  }
+                }
+              }
+            }
+          }
+        }
+
+        // Try payload delta
+        if (
+          !text &&
+          payload?.delta &&
+          typeof payload.delta === "object" &&
+          payload.delta !== null
+        ) {
+          const delta = payload.delta as Record<string, unknown>;
+          if (typeof delta.text === "string") {
+            text = extractCleanText(delta.text);
+          }
+        }
+      }
+
+      // Try event delta
+      if (
+        !text &&
+        event?.delta &&
+        typeof event.delta === "object" &&
+        event.delta !== null
+      ) {
+        const delta = event.delta as Record<string, unknown>;
+        if (typeof delta.text === "string") {
+          text = extractCleanText(delta.text);
+        }
+        if (!text && typeof delta.content === "string") {
+          text = extractCleanText(delta.content);
+        }
+      }
+    }
+
+    // Try choices structure (OpenAI-compatible chat completion chunk)
+    if (
+      !text &&
+      chunkObj?.choices &&
+      Array.isArray(chunkObj.choices) &&
+      chunkObj.choices.length > 0
+    ) {
+      const choice = chunkObj.choices[0] as Record<string, unknown>;
+      if (
+        choice?.delta &&
+        typeof choice.delta === "object" &&
+        choice.delta !== null
+      ) {
+        const delta = choice.delta as Record<string, unknown>;
+        if (typeof delta.content === "string") {
+          text = extractCleanText(delta.content);
+        }
+      }
+    }
+
+    // Try direct string content
+    if (!text && typeof chunk === "string") {
+      text = extractCleanText(chunk);
+    }
+
+    return { text, isToolCall: false };
+  };
+}
+
+describe("Chunk Processor", () => {
+  const processChunk = createProcessChunk();
+
+  describe("Real Event Structures", () => {
+    test("handles turn_complete event with cancellation policy response", () => {
+      const chunk = {
+        event: {
+          payload: {
+            event_type: "turn_complete",
+            turn: {
+              turn_id: "50a2d6b7-49ed-4d1e-b1c2-6d68b3f726db",
+              session_id: "e7f62b8e-518c-4450-82df-e65fe49f27a3",
+              input_messages: [
+                {
+                  role: "user",
+                  content: "nice, what's the cancellation policy?",
+                  context: null,
+                },
+              ],
+              steps: [
+                {
+                  turn_id: "50a2d6b7-49ed-4d1e-b1c2-6d68b3f726db",
+                  step_id: "54074310-af42-414c-9ffe-fba5b2ead0ad",
+                  started_at: "2025-08-27T18:15:25.870703Z",
+                  completed_at: "2025-08-27T18:15:51.288993Z",
+                  step_type: "inference",
+                  model_response: {
+                    role: "assistant",
+                    content:
+                      "According to the search results, the cancellation policy for Red Hat Summit is as follows:\n\n* Cancellations must be received by 5 PM EDT on April 18, 2025 for a 50% refund of the registration fee.\n* No refunds will be given for cancellations received after 5 PM EDT on April 18, 2025.\n* Cancellation of travel reservations and hotel reservations are the responsibility of the registrant.",
+                    stop_reason: "end_of_turn",
+                    tool_calls: [],
+                  },
+                },
+              ],
+              output_message: {
+                role: "assistant",
+                content:
+                  "According to the search results, the cancellation policy for Red Hat Summit is as follows:\n\n* Cancellations must be received by 5 PM EDT on April 18, 2025 for a 50% refund of the registration fee.\n* No refunds will be given for cancellations received after 5 PM EDT on April 18, 2025.\n* Cancellation of travel reservations and hotel reservations are the responsibility of the registrant.",
+                stop_reason: "end_of_turn",
+                tool_calls: [],
+              },
+              output_attachments: [],
+              started_at: "2025-08-27T18:15:25.868548Z",
+              completed_at: "2025-08-27T18:15:51.289262Z",
+            },
+          },
+        },
+      };
+
+      const result = processChunk(chunk);
+      expect(result.isToolCall).toBe(false);
+      expect(result.text).toContain(
+        "According to the search results, the cancellation policy for Red Hat Summit is as follows:"
+      );
+      expect(result.text).toContain("5 PM EDT on April 18, 2025");
+    });
+
+    test("handles turn_complete event with address response", () => {
+      const chunk = {
+        event: {
+          payload: {
+            event_type: "turn_complete",
+            turn: {
+              turn_id: "2f4a1520-8ecc-4cb7-bb7b-886939e042b0",
+              session_id: "e7f62b8e-518c-4450-82df-e65fe49f27a3",
+              input_messages: [
+                {
+                  role: "user",
+                  content: "what's francisco's address",
+                  context: null,
+                },
+              ],
+              steps: [
+                {
+                  turn_id: "2f4a1520-8ecc-4cb7-bb7b-886939e042b0",
+                  step_id: "c13dd277-1acb-4419-8fbf-d5e2f45392ea",
+                  started_at: "2025-08-27T18:14:52.558761Z",
+                  completed_at: "2025-08-27T18:15:11.306032Z",
+                  step_type: "inference",
+                  model_response: {
+                    role: "assistant",
+                    content:
+                      "Francisco Arceo's address is:\n\nRed Hat\nUnited States\n17 Primrose Ln \nBasking Ridge New Jersey 07920",
+                    stop_reason: "end_of_turn",
+                    tool_calls: [],
+                  },
+                },
+              ],
+              output_message: {
+                role: "assistant",
+                content:
+                  "Francisco Arceo's address is:\n\nRed Hat\nUnited States\n17 Primrose Ln \nBasking Ridge New Jersey 07920",
+                stop_reason: "end_of_turn",
+                tool_calls: [],
+              },
+              output_attachments: [],
+              started_at: "2025-08-27T18:14:52.553707Z",
+              completed_at: "2025-08-27T18:15:11.306729Z",
+            },
+          },
+        },
+      };
+
+      const result = processChunk(chunk);
+      expect(result.isToolCall).toBe(false);
+      expect(result.text).toContain("Francisco Arceo's address is:");
+      expect(result.text).toContain("17 Primrose Ln");
+      expect(result.text).toContain("Basking Ridge New Jersey 07920");
+    });
+
+    test("handles turn_complete event with ticket cost response", () => {
+      const chunk = {
+        event: {
+          payload: {
+            event_type: "turn_complete",
+            turn: {
+              turn_id: "7ef244a3-efee-42ca-a9c8-942865251002",
+              session_id: "e7f62b8e-518c-4450-82df-e65fe49f27a3",
+              input_messages: [
+                {
+                  role: "user",
+                  content: "what was the ticket cost for summit?",
+                  context: null,
+                },
+              ],
+              steps: [
+                {
+                  turn_id: "7ef244a3-efee-42ca-a9c8-942865251002",
+                  step_id: "7651dda0-315a-472d-b1c1-3c2725f55bc5",
+                  started_at: "2025-08-27T18:14:21.710611Z",
+                  completed_at: "2025-08-27T18:14:39.706452Z",
+                  step_type: "inference",
+                  model_response: {
+                    role: "assistant",
+                    content:
+                      "The ticket cost for the Red Hat Summit was $999.00 for a conference pass.",
+                    stop_reason: "end_of_turn",
+                    tool_calls: [],
+                  },
+                },
+              ],
+              output_message: {
+                role: "assistant",
+                content:
+                  "The ticket cost for the Red Hat Summit was $999.00 for a conference pass.",
+                stop_reason: "end_of_turn",
+                tool_calls: [],
+              },
+              output_attachments: [],
+              started_at: "2025-08-27T18:14:21.705289Z",
+              completed_at: "2025-08-27T18:14:39.706752Z",
+            },
+          },
+        },
+      };
+
+      const result = processChunk(chunk);
+      expect(result.isToolCall).toBe(false);
+      expect(result.text).toBe(
+        "The ticket cost for the Red Hat Summit was $999.00 for a conference pass."
+      );
+    });
+  });
+
+  describe("Function Call Detection", () => {
+    test("detects function calls in direct string chunks", () => {
+      const chunk =
+        '{"type": "function", "name": "knowledge_search", "parameters": {"query": "test"}}';
+      const result = processChunk(chunk);
+      expect(result.isToolCall).toBe(true);
+      expect(result.text).toBe(null);
+    });
+
+    test("detects function calls in event payload content", () => {
+      const chunk = {
+        event: {
+          payload: {
+            content:
+              '{"type": "function", "name": "knowledge_search", "parameters": {"query": "test"}}',
+          },
+        },
+      };
+      const result = processChunk(chunk);
+      expect(result.isToolCall).toBe(true);
+      expect(result.text).toBe(null);
+    });
+
+    test("detects tool_calls in delta structure", () => {
+      const chunk = {
+        delta: {
+          tool_calls: [{ function: { name: "knowledge_search" } }],
+        },
+      };
+      const result = processChunk(chunk);
+      expect(result.isToolCall).toBe(true);
+      expect(result.text).toBe(null);
+    });
+
+    test("detects function call in mixed content but skips it", () => {
+      const chunk =
+        '{"type": "function", "name": "knowledge_search", "parameters": {"query": "test"}} Based on the search results, here is your answer.';
+      const result = processChunk(chunk);
+      // Flagged as a tool call and dropped whole, trailing prose included, so partial tool-call JSON is never shown
+      expect(result.isToolCall).toBe(true);
+      expect(result.text).toBe(null);
+    });
+  });
+
+  describe("Text Extraction", () => {
+    test("extracts text from direct string chunks", () => {
+      const chunk = "Hello, this is a normal response.";
+      const result = processChunk(chunk);
+      expect(result.isToolCall).toBe(false);
+      expect(result.text).toBe("Hello, this is a normal response.");
+    });
+
+    test("extracts text from delta structure", () => {
+      const chunk = {
+        delta: {
+          text: "Hello, this is a normal response.",
+        },
+      };
+      const result = processChunk(chunk);
+      expect(result.isToolCall).toBe(false);
+      expect(result.text).toBe("Hello, this is a normal response.");
+    });
+
+    test("extracts text from choices structure", () => {
+      const chunk = {
+        choices: [
+          {
+            delta: {
+              content: "Hello, this is a normal response.",
+            },
+          },
+        ],
+      };
+      const result = processChunk(chunk);
+      expect(result.isToolCall).toBe(false);
+      expect(result.text).toBe("Hello, this is a normal response.");
+    });
+
+    test("prioritizes output_message over model_response in turn structure", () => {
+      const chunk = {
+        event: {
+          payload: {
+            turn: {
+              steps: [
+                {
+                  model_response: {
+                    content: "Model response content.",
+                  },
+                },
+              ],
+              output_message: {
+                content: "Final output message content.",
+              },
+            },
+          },
+        },
+      };
+      const result = processChunk(chunk);
+      expect(result.isToolCall).toBe(false);
+      expect(result.text).toBe("Final output message content.");
+    });
+
+    test("falls back to model_response when no output_message", () => {
+      const chunk = {
+        event: {
+          payload: {
+            turn: {
+              steps: [
+                {
+                  model_response: {
+                    content: "This is from the model response.",
+                  },
+                },
+              ],
+            },
+          },
+        },
+      };
+      const result = processChunk(chunk);
+      expect(result.isToolCall).toBe(false);
+      expect(result.text).toBe("This is from the model response.");
+    });
+  });
+
+  describe("Edge Cases", () => {
+    test("handles empty chunks", () => {
+      const result = processChunk("");
+      expect(result.isToolCall).toBe(false);
+      expect(result.text).toBe("");
+    });
+
+    test("handles null chunks", () => {
+      const result = processChunk(null);
+      expect(result.isToolCall).toBe(false);
+      expect(result.text).toBe(null);
+    });
+
+    test("handles undefined chunks", () => {
+      const result = processChunk(undefined);
+      expect(result.isToolCall).toBe(false);
+      expect(result.text).toBe(null);
+    });
+
+    test("handles chunks with no text content", () => {
+      const chunk = {
+        event: {
+          metadata: {
+            timestamp: "2024-01-01",
+          },
+        },
+      };
+      const result = processChunk(chunk);
+      expect(result.isToolCall).toBe(false);
+      expect(result.text).toBe(null);
+    });
+
+    test("handles malformed JSON in function calls gracefully", () => {
+      const chunk =
+        '{"type": "function", "name": "knowledge_search"} incomplete json';
+      const result = processChunk(chunk);
+      expect(result.isToolCall).toBe(true);
+      expect(result.text).toBe(null);
+    });
+  });
+});
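The tests above pin down the order in which the chunk processor looks for text: `delta.text`, then `event.payload.content`, then `turn.output_message.content`, then the first step's `model_response.content`, then the payload and event deltas, then `choices[0].delta.content`, and finally a bare string. A minimal table-driven sketch of that lookup order follows. It assumes plain JSON chunks, ignores the empty-string edge case, and omits the tool-call filtering done by `extractCleanText`; `get`, `at` and `extractText` are hypothetical names, not part of the UI code.

```typescript
// Hypothetical helper: follow a path of keys/indices through unknown JSON,
// returning undefined as soon as a non-object is reached.
function get(value: unknown, path: ReadonlyArray<string | number>): unknown {
  let current: unknown = value;
  for (const key of path) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string | number, unknown>)[key];
  }
  return current;
}

type Extractor = (chunk: unknown) => unknown;

// Build an extractor for a fixed path.
const at =
  (...path: Array<string | number>): Extractor =>
  chunk =>
    get(chunk, path);

// First step whose model_response.content is a string (the steps fallback).
const firstStepResponse: Extractor = chunk => {
  const steps = get(chunk, ["event", "payload", "turn", "steps"]);
  if (!Array.isArray(steps)) return undefined;
  for (const step of steps) {
    const content = get(step, ["model_response", "content"]);
    if (typeof content === "string") return content;
  }
  return undefined;
};

// Candidate locations, in the priority order used by createProcessChunk.
const EXTRACTORS: Extractor[] = [
  at("delta", "text"),
  at("event", "payload", "content"),
  at("event", "payload", "turn", "output_message", "content"),
  firstStepResponse,
  at("event", "payload", "delta", "text"),
  at("event", "delta", "text"),
  at("event", "delta", "content"),
  at("choices", 0, "delta", "content"),
  chunk => chunk, // a bare string chunk
];

// Return the first non-empty string found, or null if there is none.
function extractText(chunk: unknown): string | null {
  for (const extract of EXTRACTORS) {
    const candidate = extract(chunk);
    if (typeof candidate === "string" && candidate.length > 0) {
      return candidate;
    }
  }
  return null;
}
```

Keeping the candidates in one array makes the priority order visible in a single place; the trade-off is that each candidate would still need to pass through the tool-call filtering that this sketch leaves out.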
--- a/llama_stack/ui/app/chat-playground/page.test.tsx
+++ b/llama_stack/ui/app/chat-playground/page.test.tsx
@ -31,6 +31,9 @@ const mockClient = {
  toolgroups: {
    list: jest.fn(),
  },
+  vectorDBs: {
+    list: jest.fn(),
+  },
 };

 jest.mock("@/hooks/use-auth-client", () => ({
@ -164,7 +167,7 @@ describe("ChatPlaygroundPage", () => {
      session_name: "Test Session",
      started_at: new Date().toISOString(),
      turns: [],
-    }); // No turns by default
+    });
    mockClient.agents.retrieve.mockResolvedValue({
      agent_id: "test-agent",
      agent_config: {
@ -417,7 +420,6 @@ describe("ChatPlaygroundPage", () => {
      });

      await waitFor(() => {
-        // first agent should be auto-selected
        expect(mockClient.agents.session.create).toHaveBeenCalledWith(
          "agent_123",
          { session_name: "Default Session" }
@ -464,7 +466,7 @@ describe("ChatPlaygroundPage", () => {
      });
    });

-    test("hides delete button when only one agent exists", async () => {
+    test("shows delete button even when only one agent exists", async () => {
      mockClient.agents.list.mockResolvedValue({
        data: [mockAgents[0]],
      });
@ -474,9 +476,7 @@ describe("ChatPlaygroundPage", () => {
      });

      await waitFor(() => {
-        expect(
-          screen.queryByTitle("Delete current agent")
-        ).not.toBeInTheDocument();
+        expect(screen.getByTitle("Delete current agent")).toBeInTheDocument();
      });
    });

@ -505,7 +505,7 @@ describe("ChatPlaygroundPage", () => {
      await waitFor(() => {
        expect(mockClient.agents.delete).toHaveBeenCalledWith("agent_123");
        expect(global.confirm).toHaveBeenCalledWith(
-          "Are you sure you want to delete this agent? This action cannot be undone and will delete all associated sessions."
+          "Are you sure you want to delete this agent? This action cannot be undone and will delete the agent and all its sessions."
        );
      });

@ -584,4 +584,207 @@ describe("ChatPlaygroundPage", () => {
      consoleSpy.mockRestore();
    });
  });
+
+  describe("RAG File Upload", () => {
+    let mockFileReader: {
+      readAsDataURL: jest.Mock;
+      readAsText: jest.Mock;
+      result: string | null;
+      onload: (() => void) | null;
+      onerror: (() => void) | null;
+    };
+    let mockRAGTool: {
+      insert: jest.Mock;
+    };
+
+    beforeEach(() => {
+      mockFileReader = {
+        readAsDataURL: jest.fn(),
+        readAsText: jest.fn(),
+        result: null,
+        onload: null,
+        onerror: null,
+      };
+      global.FileReader = jest.fn(() => mockFileReader);
+
+      mockRAGTool = {
+        insert: jest.fn().mockResolvedValue({}),
+      };
+      mockClient.toolRuntime = {
+        ragTool: mockRAGTool,
+      };
+    });
+
+    afterEach(() => {
+      jest.clearAllMocks();
+    });
+
+    test("handles text file upload", async () => {
+      new File(["Hello, world!"], "test.txt", {
+        type: "text/plain",
+      });
+
+      mockClient.agents.retrieve.mockResolvedValue({
+        agent_id: "test-agent",
+        agent_config: {
+          toolgroups: [
+            {
+              name: "builtin::rag/knowledge_search",
+              args: { vector_db_ids: ["test-vector-db"] },
+            },
+          ],
+        },
+      });
+
+      await act(async () => {
+        render(<ChatPlaygroundPage />);
+      });
+
+      await waitFor(() => {
+        expect(screen.getByTestId("chat-component")).toBeInTheDocument();
+      });
+
+      const chatComponent = screen.getByTestId("chat-component");
+      chatComponent.getAttribute("data-onragfileupload");
+
+      // Simplified check: rendering alone must not trigger a RAG insert
+      expect(mockRAGTool.insert).not.toHaveBeenCalled();
+    });
+
+    test("handles PDF file upload with FileReader", async () => {
+      new File([new ArrayBuffer(1000)], "test.pdf", {
+        type: "application/pdf",
+      });
+
+      const mockDataURL = "data:application/pdf;base64,JVBERi0xLjQK";
+      mockFileReader.result = mockDataURL;
+
+      mockClient.agents.retrieve.mockResolvedValue({
+        agent_id: "test-agent",
+        agent_config: {
+          toolgroups: [
+            {
+              name: "builtin::rag/knowledge_search",
+              args: { vector_db_ids: ["test-vector-db"] },
+            },
+          ],
+        },
+      });
+
+      await act(async () => {
+        render(<ChatPlaygroundPage />);
+      });
+
+      await waitFor(() => {
+        expect(screen.getByTestId("chat-component")).toBeInTheDocument();
+      });
+
+      expect(global.FileReader).toBeDefined();
+    });
+
+    test("handles different file types correctly", () => {
+      const getContentType = (filename: string): string => {
+        const ext = filename.toLowerCase().split(".").pop();
+        switch (ext) {
+          case "pdf":
+            return "application/pdf";
+          case "txt":
+            return "text/plain";
+          case "md":
+            return "text/markdown";
+          case "html":
+            return "text/html";
+          case "csv":
+            return "text/csv";
+          case "json":
+            return "application/json";
+          case "docx":
+            return "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
+          case "doc":
+            return "application/msword";
+          default:
+            return "application/octet-stream";
+        }
+      };
+
+      expect(getContentType("test.pdf")).toBe("application/pdf");
+      expect(getContentType("test.txt")).toBe("text/plain");
+      expect(getContentType("test.md")).toBe("text/markdown");
+      expect(getContentType("test.html")).toBe("text/html");
+      expect(getContentType("test.csv")).toBe("text/csv");
+      expect(getContentType("test.json")).toBe("application/json");
+      expect(getContentType("test.docx")).toBe(
+        "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
+      );
+      expect(getContentType("test.doc")).toBe("application/msword");
+      expect(getContentType("test.unknown")).toBe("application/octet-stream");
+    });
+
+    test("determines text vs binary file types correctly", () => {
+      const isTextFile = (mimeType: string): boolean => {
+        // The "text/" prefix already covers text/plain, text/markdown,
+        // text/html and text/csv, so application/json is the only extra
+        // MIME type that needs an explicit check.
+        return (
+          mimeType.startsWith("text/") ||
+          mimeType === "application/json"
+        );
+      };
+
+      expect(isTextFile("text/plain")).toBe(true);
+      expect(isTextFile("text/markdown")).toBe(true);
+      expect(isTextFile("text/html")).toBe(true);
+      expect(isTextFile("text/csv")).toBe(true);
+      expect(isTextFile("application/json")).toBe(true);
+
+      expect(isTextFile("application/pdf")).toBe(false);
+      expect(isTextFile("application/msword")).toBe(false);
+      expect(
+        isTextFile(
+          "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
+        )
+      ).toBe(false);
+      expect(isTextFile("application/octet-stream")).toBe(false);
+    });
+
+    test("handles FileReader error gracefully", async () => {
+      const pdfFile = new File([new ArrayBuffer(1000)], "test.pdf", {
+        type: "application/pdf",
+      });
+
+      mockFileReader.onerror = jest.fn();
+      const mockError = new Error("FileReader failed");
+
+      const fileReaderPromise = new Promise<string>((resolve, reject) => {
+        const reader = new FileReader();
+        reader.onload = () => resolve(reader.result as string);
+        reader.onerror = () => reject(reader.error || mockError);
+        reader.readAsDataURL(pdfFile);
+
+        setTimeout(() => {
+          reader.onerror?.(new ProgressEvent("error"));
+        }, 0);
+      });
+
+      await expect(fileReaderPromise).rejects.toBeDefined();
+    });
+
+    test("handles large file upload with FileReader approach", () => {
+      // create a large file
+      const largeFile = new File(
+        [new ArrayBuffer(10 * 1024 * 1024)],
+        "large.pdf",
+        {
+          type: "application/pdf",
+        }
+      );
+
+      expect(largeFile.size).toBe(10 * 1024 * 1024); // 10 MiB
+
+      expect(global.FileReader).toBeDefined();
+
+      const reader = new FileReader();
+      expect(reader.readAsDataURL).toBeDefined();
+    });
+  });
 });
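The `getContentType` switch and `isTextFile` helper exercised in these tests can also be written as a lookup table. A minimal sketch, assuming the same extension list as the test; `CONTENT_TYPES`, `contentTypeFor` and `readsAsText` are hypothetical names, and the real helpers in `page.tsx` may differ.

```typescript
// Extension -> MIME type, mirroring the cases covered by the test above.
const CONTENT_TYPES: Readonly<Record<string, string>> = {
  pdf: "application/pdf",
  txt: "text/plain",
  md: "text/markdown",
  html: "text/html",
  csv: "text/csv",
  json: "application/json",
  docx: "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
  doc: "application/msword",
};

function contentTypeFor(filename: string): string {
  const dot = filename.lastIndexOf(".");
  // No extension (or a trailing dot) falls back to a generic binary type.
  const ext = dot >= 0 ? filename.slice(dot + 1).toLowerCase() : "";
  return Object.prototype.hasOwnProperty.call(CONTENT_TYPES, ext)
    ? CONTENT_TYPES[ext]
    : "application/octet-stream";
}

// Text-like types can be read as strings; everything else needs a binary
// read (for example a data URL).
function readsAsText(mimeType: string): boolean {
  return mimeType.startsWith("text/") || mimeType === "application/json";
}
```

The `hasOwnProperty` guard keeps a name such as `file.constructor` from matching an inherited object key, which a plain `CONTENT_TYPES[ext] ?? "application/octet-stream"` lookup would not catch.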
--- a/llama_stack/ui/app/chat-playground/page.tsx
+++ b/llama_stack/ui/app/chat-playground/page.tsx
--- a/llama_stack/ui/components/chat-playground/chat.tsx
+++ b/llama_stack/ui/components/chat-playground/chat.tsx
@ -35,6 +35,7 @@ interface ChatPropsBase {
  ) => void;
  setMessages?: (messages: Message[]) => void;
  transcribeAudio?: (blob: Blob) => Promise<string>;
+  onRAGFileUpload?: (file: File) => Promise<void>;
 }

 interface ChatPropsWithoutSuggestions extends ChatPropsBase {
@ -62,6 +63,7 @@ export function Chat({
  onRateResponse,
  setMessages,
  transcribeAudio,
+  onRAGFileUpload,
 }: ChatProps) {
  const lastMessage = messages.at(-1);
  const isEmpty = messages.length === 0;
@ -226,16 +228,17 @@ export function Chat({
            isPending={isGenerating || isTyping}
            handleSubmit={handleSubmit}
          >
-            {({ files, setFiles }) => (
+            {() => (
              <MessageInput
                value={input}
                onChange={handleInputChange}
-                allowAttachments
-                files={files}
-                setFiles={setFiles}
+                allowAttachments={true}
+                files={null}
+                setFiles={() => {}}
                stop={handleStop}
                isGenerating={isGenerating}
                transcribeAudio={transcribeAudio}
+                onRAGFileUpload={onRAGFileUpload}
              />
            )}
          </ChatForm>
--- a/llama_stack/ui/components/chat-playground/conversations.tsx
+++ b/llama_stack/ui/components/chat-playground/conversations.tsx
@ -14,6 +14,7 @@ import { Card } from "@/components/ui/card";
 import { Trash2 } from "lucide-react";
 import type { Message } from "@/components/chat-playground/chat-message";
 import { useAuthClient } from "@/hooks/use-auth-client";
+import { cleanMessageContent } from "@/lib/message-content-utils";
 import type {
  Session,
  SessionCreateParams,
@ -219,10 +220,7 @@ export function Conversations({
            messages.push({
              id: `${turn.turn_id}-assistant-${messages.length}`,
              role: "assistant",
-              content:
-                typeof turn.output_message.content === "string"
-                  ? turn.output_message.content
-                  : JSON.stringify(turn.output_message.content),
+              content: cleanMessageContent(turn.output_message.content),
              createdAt: new Date(
                turn.completed_at || turn.started_at || Date.now()
              ),
@ -271,7 +269,7 @@ export function Conversations({
  );

  const deleteSession = async (sessionId: string) => {
-    if (sessions.length <= 1 || !selectedAgentId) {
+    if (!selectedAgentId) {
      return;
    }

@ -324,7 +322,6 @@ export function Conversations({
    }
  }, [currentSession]);

-  // Don't render if no agent is selected
  if (!selectedAgentId) {
    return null;
  }
@ -357,7 +354,7 @@ export function Conversations({
          + New
        </Button>

-        {currentSession && sessions.length > 1 && (
+        {currentSession && (
          <Button
            onClick={() => deleteSession(currentSession.id)}
            variant="outline"
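This hunk replaces the inline `typeof ... === "string" ? ... : JSON.stringify(...)` fallback with `cleanMessageContent` from `@/lib/message-content-utils`, whose body is not part of this diff. Assuming the helper normalizes Llama Stack's interleaved content (a plain string, a single `{ type: "text", text }` item, or an array of items) into display text, a minimal sketch might look like this; `TextItem`, `ImageItem`, `InterleavedContent` and `toDisplayText` are hypothetical names.

```typescript
// Hypothetical shapes modelled on Llama Stack's interleaved content items.
type TextItem = { type: "text"; text: string };
type ImageItem = { type: "image"; image: unknown };
type InterleavedItem = TextItem | ImageItem;
type InterleavedContent = string | InterleavedItem | InterleavedItem[];

function toDisplayText(content: InterleavedContent): string {
  if (typeof content === "string") return content;
  const items = Array.isArray(content) ? content : [content];
  // Keep text items only; non-text items (e.g. images) have no text form here.
  return items
    .filter((item): item is TextItem => item.type === "text")
    .map(item => item.text)
    .join("");
}
```

Dropping non-text items, rather than `JSON.stringify`-ing the whole value as the removed code did, keeps raw JSON out of the transcript. Whether the real helper also strips tool-call JSON, as the chunk processor does, cannot be confirmed from this diff.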
--- a/llama_stack/ui/components/chat-playground/message-input.tsx
+++ b/llama_stack/ui/components/chat-playground/message-input.tsx
@ -21,6 +21,7 @@ interface MessageInputBaseProps
  isGenerating: boolean;
  enableInterrupt?: boolean;
  transcribeAudio?: (blob: Blob) => Promise<string>;
+  onRAGFileUpload?: (file: File) => Promise<void>;
 }

 interface MessageInputWithoutAttachmentProps extends MessageInputBaseProps {
@ -213,8 +214,13 @@ export function MessageInput({
              className
            )}
            {...(props.allowAttachments
-              ? omit(props, ["allowAttachments", "files", "setFiles"])
-              : omit(props, ["allowAttachments"]))}
+              ? omit(props, [
+                  "allowAttachments",
+                  "files",
+                  "setFiles",
+                  "onRAGFileUpload",
+                ])
+              : omit(props, ["allowAttachments", "onRAGFileUpload"]))}
          />

          {props.allowAttachments && (
@ -254,11 +260,19 @@ export function MessageInput({
            size="icon"
            variant="outline"
            className="h-8 w-8"
-            aria-label="Attach a file"
-            disabled={true}
+            aria-label="Upload file to RAG"
+            disabled={false}
            onClick={async () => {
-              const files = await showFileUploadDialog();
-              addFiles(files);
+              const input = document.createElement("input");
+              input.type = "file";
+              input.accept = ".pdf,.txt,.md,.html,.csv,.json";
+              input.onchange = async e => {
+                const file = (e.target as HTMLInputElement).files?.[0];
+                if (file && props.onRAGFileUpload) {
+                  await props.onRAGFileUpload(file);
+                }
+              };
+              input.click();
            }}
          >
            <Paperclip className="h-4 w-4" />
@ -337,28 +351,6 @@ function FileUploadOverlay({ isDragging }: FileUploadOverlayProps) {
  );
 }

-function showFileUploadDialog() {
-  const input = document.createElement("input");
-
-  input.type = "file";
-  input.multiple = true;
-  input.accept = "*/*";
-  input.click();
-
-  return new Promise<File[] | null>(resolve => {
-    input.onchange = e => {
-      const files = (e.currentTarget as HTMLInputElement).files;
-
-      if (files) {
-        resolve(Array.from(files));
-        return;
-      }
-
-      resolve(null);
-    };
-  });
-}
-
 function TranscribingOverlay() {
  return (
    <motion.div
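The new attach button narrows the picker with `input.accept = ".pdf,.txt,.md,.html,.csv,.json"`. Browsers treat `accept` only as a hint (users can usually switch the dialog to show all files), so an upload handler may want to re-check the chosen file. A minimal sketch of such a check, parsing the same comma-separated extension list; `parseAccept`, `isAccepted` and `RAG_ACCEPT` are hypothetical and not part of this diff, and MIME-type entries such as `image/*` are deliberately ignored.

```typescript
// Turn an accept string like ".pdf,.txt" into a set of lowercase extensions.
// Only extension entries (leading ".") are kept; MIME entries are ignored.
function parseAccept(accept: string): Set<string> {
  return new Set(
    accept
      .split(",")
      .map(entry => entry.trim().toLowerCase())
      .filter(entry => entry.startsWith(".") && entry.length > 1)
  );
}

// True when the file name ends with one of the accepted extensions.
function isAccepted(fileName: string, accepted: Set<string>): boolean {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return false;
  return accepted.has(fileName.slice(dot).toLowerCase());
}

const RAG_ACCEPT = parseAccept(".pdf,.txt,.md,.html,.csv,.json");
```

Deriving the set from the same string passed to `input.accept` keeps the picker hint and the validation from drifting apart.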
--- a/llama_stack/ui/components/chat-playground/vector-db-creator.tsx
+++ b/llama_stack/ui/components/chat-playground/vector-db-creator.tsx
@ -0,0 +1,243 @@
+"use client";
+
+import { useState, useEffect } from "react";
+import { Button } from "@/components/ui/button";
+import { Input } from "@/components/ui/input";
+import { Card } from "@/components/ui/card";
+import {
+  Select,
+  SelectContent,
+  SelectItem,
+  SelectTrigger,
+  SelectValue,
+} from "@/components/ui/select";
+import { useAuthClient } from "@/hooks/use-auth-client";
+import type { Model } from "llama-stack-client/resources/models";
+
+interface VectorDBCreatorProps {
+  models: Model[];
+  onVectorDBCreated?: (vectorDbId: string) => void;
+  onCancel?: () => void;
+}
+
+interface VectorDBProvider {
+  api: string;
+  provider_id: string;
+  provider_type: string;
+}
+
+export function VectorDBCreator({
+  models,
+  onVectorDBCreated,
+  onCancel,
+}: VectorDBCreatorProps) {
+  const [vectorDbName, setVectorDbName] = useState("");
+  const [selectedEmbeddingModel, setSelectedEmbeddingModel] = useState("");
+  const [selectedProvider, setSelectedProvider] = useState("faiss");
+  const [availableProviders, setAvailableProviders] = useState<
+    VectorDBProvider[]
+  >([]);
+  const [isCreating, setIsCreating] = useState(false);
+  const [isLoadingProviders, setIsLoadingProviders] = useState(false);
+  const [error, setError] = useState<string | null>(null);
+  const client = useAuthClient();
+
+  const embeddingModels = models.filter(
+    model => model.model_type === "embedding"
+  );
+
+  useEffect(() => {
+    const fetchProviders = async () => {
+      setIsLoadingProviders(true);
+      try {
+        const providersResponse = await client.providers.list();
+
+        const vectorIoProviders = providersResponse.filter(
+          (provider: VectorDBProvider) => provider.api === "vector_io"
+        );
+
+        setAvailableProviders(vectorIoProviders);
+
+        if (vectorIoProviders.length > 0) {
+          const faissProvider = vectorIoProviders.find(
+            (p: VectorDBProvider) => p.provider_id === "faiss"
+          );
+          setSelectedProvider(
+            faissProvider?.provider_id || vectorIoProviders[0].provider_id
+          );
+        }
+      } catch (err) {
+        console.error("Error fetching providers:", err);
+        setAvailableProviders([
+          {
+            api: "vector_io",
+            provider_id: "faiss",
+            provider_type: "inline::faiss",
+          },
+        ]);
+      } finally {
+        setIsLoadingProviders(false);
+      }
+    };
+
+    fetchProviders();
+  }, [client]);
+
+  const handleCreate = async () => {
+    if (!vectorDbName.trim() || !selectedEmbeddingModel) {
+      setError("Please provide a name and select an embedding model");
+      return;
+    }
+
+    setIsCreating(true);
+    setError(null);
+
+    try {
+      const embeddingModel = embeddingModels.find(
+        m => m.identifier === selectedEmbeddingModel
+      );
+
+      if (!embeddingModel) {
+        throw new Error("Selected embedding model not found");
+      }
+
+      const embeddingDimension = embeddingModel.metadata
+        ?.embedding_dimension as number;
+
+      if (!embeddingDimension) {
+        throw new Error("Embedding dimension not available for selected model");
+      }
+
+      const vectorDbId = vectorDbName.trim() || `vector_db_${Date.now()}`;
+
+      const response = await client.vectorDBs.register({
+        vector_db_id: vectorDbId,
+        embedding_model: selectedEmbeddingModel,
+        embedding_dimension: embeddingDimension,
+        provider_id: selectedProvider,
+      });
+
+      onVectorDBCreated?.(response.identifier || vectorDbId);
+    } catch (err) {
+      console.error("Error creating vector DB:", err);
+      setError(
+        err instanceof Error ? err.message : "Failed to create vector DB"
+      );
+    } finally {
+      setIsCreating(false);
+    }
+  };
+
+  return (
+    <Card className="p-6 space-y-4">
+      <h3 className="text-lg font-semibold">Create Vector Database</h3>
+
+      <div className="space-y-4">
+        <div>
+          <label className="text-sm font-medium block mb-2">
+            Vector DB Name
+          </label>
+          <Input
+            value={vectorDbName}
+            onChange={e => setVectorDbName(e.target.value)}
+            placeholder="My Vector Database"
+          />
+        </div>
+
+        <div>
+          <label className="text-sm font-medium block mb-2">
+            Embedding Model
+          </label>
+          <Select
+            value={selectedEmbeddingModel}
+            onValueChange={setSelectedEmbeddingModel}
+          >
+            <SelectTrigger>
+              <SelectValue placeholder="Select Embedding Model" />
+            </SelectTrigger>
+            <SelectContent>
+              {embeddingModels.map(model => (
+                <SelectItem key={model.identifier} value={model.identifier}>
+                  {model.identifier}
+                </SelectItem>
+              ))}
+            </SelectContent>
+          </Select>
+          {selectedEmbeddingModel && (
+            <p className="text-xs text-muted-foreground mt-1">
+              Dimension:{" "}
+              {embeddingModels.find(
+                m => m.identifier === selectedEmbeddingModel
+              )?.metadata?.embedding_dimension || "Unknown"}
+            </p>
+          )}
+        </div>
+
+        <div>
+          <label className="text-sm font-medium block mb-2">
+            Vector Database Provider
+          </label>
+          <Select
+            value={selectedProvider}
+            onValueChange={setSelectedProvider}
+            disabled={isLoadingProviders}
+          >
+            <SelectTrigger>
+              <SelectValue
+                placeholder={
+                  isLoadingProviders
+                    ? "Loading providers..."
+                    : "Select Provider"
+                }
+              />
+            </SelectTrigger>
+            <SelectContent>
+              {availableProviders.map(provider => (
+                <SelectItem
+                  key={provider.provider_id}
+                  value={provider.provider_id}
+                >
+                  {provider.provider_id}
+                </SelectItem>
+              ))}
+            </SelectContent>
+          </Select>
+          {selectedProvider && (
+            <p className="text-xs text-muted-foreground mt-1">
+              Selected provider: {selectedProvider}
+            </p>
+          )}
+        </div>
+
+        {error && (
+          <div className="text-destructive text-sm bg-destructive/10 p-2 rounded">
+            {error}
+          </div>
+        )}
+
+        <div className="flex gap-2 pt-2">
+          <Button
+            onClick={handleCreate}
+            disabled={
+              isCreating || !vectorDbName.trim() || !selectedEmbeddingModel
+            }
+            className="flex-1"
+          >
+            {isCreating ? "Creating..." : "Create Vector DB"}
+          </Button>
+          {onCancel && (
+            <Button variant="outline" onClick={onCancel} className="flex-1">
+              Cancel
+            </Button>
+          )}
+        </div>
+      </div>
+
+      <div className="text-xs text-muted-foreground bg-muted/50 p-3 rounded">
+        <strong>Note:</strong> This will create a new vector database that can
+        be used with RAG tools. After creation, you&apos;ll be able to upload
+        documents and use it for knowledge search in your agent conversations.
+      </div>
+    </Card>
+  );
+}
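`VectorDBCreator` folds two pure decisions into its effect and its click handler. The first picks a default `vector_io` provider: it prefers `faiss`, otherwise the first one listed. The second validates the form before calling `client.vectorDBs.register`. Below is a minimal sketch of both as standalone functions, which lets them be unit-tested without rendering the component. The names `pickDefaultVectorIoProvider` and `buildRegisterParams` are hypothetical, and so are the local interfaces. They mirror only the fields the component reads and are not part of this commit or of `llama-stack-client`.

```typescript
// Hypothetical local types mirroring the fields VectorDBCreator reads;
// these are NOT imported from llama-stack-client.
interface ProviderInfo {
  api: string;
  provider_id: string;
  provider_type: string;
}

interface EmbeddingModelInfo {
  identifier: string;
  model_type: string;
  metadata?: Record<string, unknown>;
}

interface RegisterParams {
  vector_db_id: string;
  embedding_model: string;
  embedding_dimension: number;
  provider_id: string;
}

// Keep only vector_io providers, preferring "faiss", else the first one.
// Returns undefined when no vector_io provider exists.
function pickDefaultVectorIoProvider(
  providers: ProviderInfo[]
): string | undefined {
  const vectorIo = providers.filter(p => p.api === "vector_io");
  const faiss = vectorIo.find(p => p.provider_id === "faiss");
  return faiss?.provider_id ?? vectorIo[0]?.provider_id;
}

// Validate the form and build the payload for the register call.
// Error messages match the ones the component shows in its error banner.
function buildRegisterParams(
  name: string,
  embeddingModelId: string,
  models: EmbeddingModelInfo[],
  providerId: string
): RegisterParams {
  const trimmed = name.trim();
  if (!trimmed || !embeddingModelId) {
    throw new Error("Please provide a name and select an embedding model");
  }
  const model = models.find(
    m => m.model_type === "embedding" && m.identifier === embeddingModelId
  );
  if (!model) {
    throw new Error("Selected embedding model not found");
  }
  // A runtime type check instead of an `as number` cast.
  const dimension = model.metadata?.embedding_dimension;
  if (typeof dimension !== "number" || dimension <= 0) {
    throw new Error("Embedding dimension not available for selected model");
  }
  return {
    vector_db_id: trimmed,
    embedding_model: embeddingModelId,
    embedding_dimension: dimension,
    provider_id: providerId,
  };
}
```

One deliberate difference from the component: the sketch checks `typeof ... === "number"` rather than casting with `as number`, so a non-numeric `embedding_dimension` in model metadata is rejected instead of being passed through. When `pickDefaultVectorIoProvider` returns `undefined`, the caller would keep its existing default, as the component does with its initial `"faiss"` state.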
--- a/llama_stack/ui/lib/message-content-utils.ts
+++ b/llama_stack/ui/lib/message-content-utils.ts
@ -0,0 +1,51 @@
+// check if content contains function call JSON
+export const containsToolCall = (content: string): boolean => {
+  return (
+    content.includes('"type": "function"') ||
+    content.includes('"name": "knowledge_search"') ||
+    content.includes('"parameters":') ||
+    !!content.match(/\{"type":\s*"function".*?\}/)
+  );
+};
+
+export const extractCleanText = (content: string): string | null => {
+  if (containsToolCall(content)) {
+    try {
+      // parse and extract non-function call parts
+      const jsonMatch = content.match(/\{"type":\s*"function"[^}]*\}[^}]*\}/);
+      if (jsonMatch) {
+        const jsonPart = jsonMatch[0];
+        const parsedJson = JSON.parse(jsonPart);
+
+        // if function call, extract text after JSON
+        if (parsedJson.type === "function") {
+          const textAfterJson = content
+            .substring(content.indexOf(jsonPart) + jsonPart.length)
+            .trim();
+          return textAfterJson || null;
+        }
+      }
+      return null;
+    } catch {
+      return null;
+    }
+  }
+  return content;
+};
+
+// removes function call JSON, handling string, array, and other content types
+export const cleanMessageContent = (
+  content: string | unknown[] | unknown
+): string => {
+  if (typeof content === "string") {
+    const cleaned = extractCleanText(content);
+    return cleaned || "";
+  } else if (Array.isArray(content)) {
+    return content
+      .filter((item: { type: string }) => item.type === "text")
+      .map((item: { text: string }) => item.text)
+      .join("");
+  } else {
+    return JSON.stringify(content) ?? "";
+  }
+};
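The regex in `extractCleanText` (`/\{"type":\s*"function"[^}]*\}[^}]*\}/`) appears to assume the call object contains exactly one nested object, such as `parameters`. It also assumes no string value contains `}`. Deeper nesting or a brace inside a query string leaves the match unbalanced, `JSON.parse` throws, and the helper returns `null`. A different technique avoids this. It scans forward from the opening `{` and tracks string literals until the braces balance. The following is a minimal sketch of that approach. `findObjectEnd` and `stripToolCallJson` are hypothetical names, and this is an alternative, not what the commit implements.

```typescript
// Hypothetical helper: given `start` pointing at "{", return the index just
// past the matching "}" (ignoring braces inside JSON strings), or -1 if the
// braces never balance (e.g. a truncated streaming chunk).
function findObjectEnd(text: string, start: number): number {
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) {
        escaped = false; // character after a backslash is literal
      } else if (ch === "\\") {
        escaped = true;
      } else if (ch === '"') {
        inString = false;
      }
      continue;
    }
    if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth === 0) return i + 1;
    }
  }
  return -1;
}

// Hypothetical alternative to extractCleanText: remove the first JSON object
// whose "type" is "function" and return the remaining prose, or null if
// nothing meaningful is left or the object is malformed.
function stripToolCallJson(content: string): string | null {
  const start = content.search(/\{\s*"type"\s*:\s*"function"/);
  if (start === -1) return content; // no function-call object at all
  const end = findObjectEnd(content, start);
  if (end === -1) return null; // truncated or unbalanced call
  try {
    const parsed = JSON.parse(content.slice(start, end)) as { type?: unknown };
    if (parsed.type !== "function") return content;
  } catch {
    return null;
  }
  const remaining = (content.slice(0, start) + content.slice(end)).trim();
  return remaining || null;
}
```

Unlike the committed helper, this sketch keeps text on both sides of the call object. It also returns content that has no function-call object unchanged, rather than returning `null` whenever a marker such as `"parameters":` appears.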
--- a/llama_stack/ui/package-lock.json
+++ b/llama_stack/ui/package-lock.json
@ -18,7 +18,7 @@
        "class-variance-authority": "^0.7.1",
        "clsx": "^2.1.1",
        "framer-motion": "^11.18.2",
-        "llama-stack-client": "^0.2.18",
+        "llama-stack-client": "^0.2.19",
        "lucide-react": "^0.510.0",
        "next": "15.3.3",
        "next-auth": "^4.24.11",
@ -36,7 +36,7 @@
        "@eslint/eslintrc": "^3",
        "@tailwindcss/postcss": "^4",
        "@testing-library/dom": "^10.4.1",
-        "@testing-library/jest-dom": "^6.6.3",
+        "@testing-library/jest-dom": "^6.8.0",
        "@testing-library/react": "^16.3.0",
        "@types/jest": "^29.5.14",
        "@types/node": "^20",
@ -3597,18 +3597,17 @@
      }
    },
    "node_modules/@testing-library/jest-dom": {
-      "version": "6.6.3",
-      "resolved": "https://registry.npmjs.org/@testing-library/jest-dom/-/jest-dom-6.6.3.tgz",
-      "integrity": "sha512-IteBhl4XqYNkM54f4ejhLRJiZNqcSCoXUOG2CPK7qbD322KjQozM4kHQOfkG2oln9b9HTYqs+Sae8vBATubxxA==",
+      "version": "6.8.0",
+      "resolved": "https://registry.npmjs.org/@testing-library/jest-dom/-/jest-dom-6.8.0.tgz",
+      "integrity": "sha512-WgXcWzVM6idy5JaftTVC8Vs83NKRmGJz4Hqs4oyOuO2J4r/y79vvKZsb+CaGyCSEbUPI6OsewfPd0G1A0/TUZQ==",
      "dev": true,
      "license": "MIT",
      "dependencies": {
        "@adobe/css-tools": "^4.4.0",
        "aria-query": "^5.0.0",
-        "chalk": "^3.0.0",
        "css.escape": "^1.5.1",
        "dom-accessibility-api": "^0.6.3",
-        "lodash": "^4.17.21",
+        "picocolors": "^1.1.1",
        "redent": "^3.0.0"
      },
      "engines": {
@ -3617,20 +3616,6 @@
        "yarn": ">=1"
      }
    },
-    "node_modules/@testing-library/jest-dom/node_modules/chalk": {
-      "version": "3.0.0",
-      "resolved": "https://registry.npmjs.org/chalk/-/chalk-3.0.0.tgz",
-      "integrity": "sha512-4D3B6Wf41KOYRFdszmDqMCGq5VV/uMAB273JILmO+3jAlh8X4qDtdtgCR3fxtbLEMzSx22QdhnDcJvu2u1fVwg==",
-      "dev": true,
-      "license": "MIT",
-      "dependencies": {
-        "ansi-styles": "^4.1.0",
-        "supports-color": "^7.1.0"
-      },
-      "engines": {
-        "node": ">=8"
-      }
-    },
    "node_modules/@testing-library/jest-dom/node_modules/dom-accessibility-api": {
      "version": "0.6.3",
      "resolved": "https://registry.npmjs.org/dom-accessibility-api/-/dom-accessibility-api-0.6.3.tgz",
@ -10021,9 +10006,9 @@
      "license": "MIT"
    },
    "node_modules/llama-stack-client": {
-      "version": "0.2.18",
-      "resolved": "https://registry.npmjs.org/llama-stack-client/-/llama-stack-client-0.2.18.tgz",
-      "integrity": "sha512-k+xQOz/TIU0cINP4Aih8q6xs7f/6qs0fLDMXTTKQr5C0F1jtCjRiwsas7bTsDfpKfYhg/7Xy/wPw/uZgi6aIVg==",
+      "version": "0.2.19",
+      "resolved": "https://registry.npmjs.org/llama-stack-client/-/llama-stack-client-0.2.19.tgz",
+      "integrity": "sha512-sDuAhUdEGlERZ3jlMUzPXcQTgMv/pGbDrPX0ifbE5S+gr7Q+7ohuQYrIXe+hXgIipFjq+y4b2c5laZ76tmAyEA==",
      "license": "MIT",
      "dependencies": {
        "@types/node": "^18.11.18",
@ -10066,13 +10051,6 @@
        "url": "https://github.com/sponsors/sindresorhus"
      }
    },
-    "node_modules/lodash": {
-      "version": "4.17.21",
-      "resolved": "https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz",
-      "integrity": "sha512-v2kDEe57lecTulaDIuNTPy3Ry4gLGJ6Z1O3vE1krgXZNrsQ+LFTGHVxVjcXPs17LhbZVGedAJv8XZ1tvj5FvSg==",
-      "dev": true,
-      "license": "MIT"
-    },
    "node_modules/lodash.merge": {
      "version": "4.6.2",
      "resolved": "https://registry.npmjs.org/lodash.merge/-/lodash.merge-4.6.2.tgz",
--- a/llama_stack/ui/package.json
+++ b/llama_stack/ui/package.json
@ -23,7 +23,7 @@
    "class-variance-authority": "^0.7.1",
    "clsx": "^2.1.1",
    "framer-motion": "^11.18.2",
-    "llama-stack-client": "^0.2.18",
+    "llama-stack-client": "^0.2.19",
    "lucide-react": "^0.510.0",
    "next": "15.3.3",
    "next-auth": "^4.24.11",
@ -41,7 +41,7 @@
    "@eslint/eslintrc": "^3",
    "@tailwindcss/postcss": "^4",
    "@testing-library/dom": "^10.4.1",
-    "@testing-library/jest-dom": "^6.6.3",
+    "@testing-library/jest-dom": "^6.8.0",
    "@testing-library/react": "^16.3.0",
    "@types/jest": "^29.5.14",
    "@types/node": "^20",
--- a/pyproject.toml
+++ b/pyproject.toml
@ -7,7 +7,7 @@ required-version = ">=0.7.0"

 [project]
 name = "llama_stack"
-version = "0.2.18"
+version = "0.2.19"
 authors = [{ name = "Meta Llama", email = "llama-oss@meta.com" }]
 description = "Llama Stack"
 readme = "README.md"
@ -31,7 +31,7 @@ dependencies = [
    "huggingface-hub>=0.34.0,<1.0",
    "jinja2>=3.1.6",
    "jsonschema",
-    "llama-stack-client>=0.2.18",
+    "llama-stack-client>=0.2.19",
    "llama-api-client>=0.1.2",
    "openai>=1.99.6,<1.100.0",
    "prompt-toolkit",
@ -56,7 +56,7 @@ dependencies = [
 ui = [
    "streamlit",
    "pandas",
-    "llama-stack-client>=0.2.18",
+    "llama-stack-client>=0.2.19",
    "streamlit-option-menu",
 ]

--- a/tests/integration/recordings/index.sqlite
+++ b/tests/integration/recordings/index.sqlite
--- a/tests/unit/distribution/test_inference_recordings.py
+++ b/tests/unit/distribution/test_inference_recordings.py
@ -4,7 +4,6 @@
 # This source code is licensed under the terms described in the LICENSE file in
 # the root directory of this source tree.

-import sqlite3
 import tempfile
 from pathlib import Path
 from unittest.mock import patch
@ -133,7 +132,6 @@ class TestInferenceRecording:
        # Test directory creation
        assert storage.test_dir.exists()
        assert storage.responses_dir.exists()
-        assert storage.db_path.exists()

        # Test storing and retrieving a recording
        request_hash = "test_hash_123"
@ -147,15 +145,6 @@ class TestInferenceRecording:

        storage.store_recording(request_hash, request_data, response_data)

-        # Verify SQLite record
-        with sqlite3.connect(storage.db_path) as conn:
-            result = conn.execute("SELECT * FROM recordings WHERE request_hash = ?", (request_hash,)).fetchone()
-
-        assert result is not None
-        assert result[0] == request_hash  # request_hash
-        assert result[2] == "/v1/chat/completions"  # endpoint
-        assert result[3] == "llama3.2:3b"  # model
-
        # Verify file storage and retrieval
        retrieved = storage.find_recording(request_hash)
        assert retrieved is not None
@ -185,10 +174,7 @@ class TestInferenceRecording:

        # Verify recording was stored
        storage = ResponseStorage(temp_storage_dir)
-        with sqlite3.connect(storage.db_path) as conn:
-            recordings = conn.execute("SELECT COUNT(*) FROM recordings").fetchone()[0]
-
-        assert recordings == 1
+        assert storage.responses_dir.exists()

    async def test_replay_mode(self, temp_storage_dir, real_openai_chat_response):
        """Test that replay mode returns stored responses without making real calls."""
--- a/tests/unit/server/test_replace_env_vars.py
+++ b/tests/unit/server/test_replace_env_vars.py
@ -88,3 +88,10 @@ def test_nested_structures(setup_env_vars):
    }
    expected = {"key1": "test_value", "key2": ["default", "conditional"], "key3": {"nested": None}}
    assert replace_env_vars(data) == expected
+
+
+def test_explicit_strings_preserved(setup_env_vars):
+    # Explicit strings that look like numbers/booleans should remain strings
+    data = {"port": "8080", "enabled": "true", "count": "123", "ratio": "3.14"}
+    expected = {"port": "8080", "enabled": "true", "count": "123", "ratio": "3.14"}
+    assert replace_env_vars(data) == expected
--- a/uv.lock
+++ b/uv.lock
@ -1128,6 +1128,9 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/4f/72/dcbc6dbf838549b7b0c2c18c1365d2580eb7456939e4b608c3ab213fce78/geventhttpclient-2.3.4-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:9ac30c38d86d888b42bb2ab2738ab9881199609e9fa9a153eb0c66fc9188c6cb", size = 71984, upload-time = "2025-06-11T13:17:09.126Z" },
    { url = "https://files.pythonhosted.org/packages/4c/f9/74aa8c556364ad39b238919c954a0da01a6154ad5e85a1d1ab5f9f5ac186/geventhttpclient-2.3.4-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:4b802000a4fad80fa57e895009671d6e8af56777e3adf0d8aee0807e96188fd9", size = 52631, upload-time = "2025-06-11T13:17:10.061Z" },
    { url = "https://files.pythonhosted.org/packages/11/1a/bc4b70cba8b46be8b2c6ca5b8067c4f086f8c90915eb68086ab40ff6243d/geventhttpclient-2.3.4-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:461e4d9f4caee481788ec95ac64e0a4a087c1964ddbfae9b6f2dc51715ba706c", size = 51991, upload-time = "2025-06-11T13:17:11.049Z" },
+    { url = "https://files.pythonhosted.org/packages/03/3f/5ce6e003b3b24f7caf3207285831afd1a4f857ce98ac45e1fb7a6815bd58/geventhttpclient-2.3.4-cp312-cp312-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:b7e41687c74e8fbe6a665458bbaea0c5a75342a95e2583738364a73bcbf1671b", size = 114982, upload-time = "2025-08-24T12:16:50.76Z" },
+    { url = "https://files.pythonhosted.org/packages/60/16/6f9dad141b7c6dd7ee831fbcd72dd02535c57bc1ec3c3282f07e72c31344/geventhttpclient-2.3.4-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c3ea5da20f4023cf40207ce15f5f4028377ffffdba3adfb60b4c8f34925fce79", size = 115654, upload-time = "2025-08-24T12:16:52.072Z" },
+    { url = "https://files.pythonhosted.org/packages/ba/52/9b516a2ff423d8bd64c319e1950a165ceebb552781c5a88c1e94e93e8713/geventhttpclient-2.3.4-cp312-cp312-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:91f19a8a6899c27867dbdace9500f337d3e891a610708e86078915f1d779bf53", size = 121672, upload-time = "2025-08-24T12:16:53.361Z" },
    { url = "https://files.pythonhosted.org/packages/b0/f5/8d0f1e998f6d933c251b51ef92d11f7eb5211e3cd579018973a2b455f7c5/geventhttpclient-2.3.4-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:41f2dcc0805551ea9d49f9392c3b9296505a89b9387417b148655d0d8251b36e", size = 119012, upload-time = "2025-06-11T13:17:11.956Z" },
    { url = "https://files.pythonhosted.org/packages/ea/0e/59e4ab506b3c19fc72e88ca344d150a9028a00c400b1099637100bec26fc/geventhttpclient-2.3.4-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:62f3a29bf242ecca6360d497304900683fd8f42cbf1de8d0546c871819251dad", size = 124565, upload-time = "2025-06-11T13:17:12.896Z" },
    { url = "https://files.pythonhosted.org/packages/39/5d/dcbd34dfcda0c016b4970bd583cb260cc5ebfc35b33d0ec9ccdb2293587a/geventhttpclient-2.3.4-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:8714a3f2c093aeda3ffdb14c03571d349cb3ed1b8b461d9f321890659f4a5dbf", size = 115573, upload-time = "2025-06-11T13:17:13.937Z" },
@ -1141,6 +1144,9 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/ff/ad/132fddde6e2dca46d6a86316962437acd2bfaeb264db4e0fae83c529eb04/geventhttpclient-2.3.4-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:be64c5583884c407fc748dedbcb083475d5b138afb23c6bc0836cbad228402cc", size = 71967, upload-time = "2025-06-11T13:17:22.121Z" },
    { url = "https://files.pythonhosted.org/packages/f4/34/5e77d9a31d93409a8519cf573843288565272ae5a016be9c9293f56c50a1/geventhttpclient-2.3.4-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:15b2567137734183efda18e4d6245b18772e648b6a25adea0eba8b3a8b0d17e8", size = 52632, upload-time = "2025-06-11T13:17:23.016Z" },
    { url = "https://files.pythonhosted.org/packages/47/d2/cf0dbc333304700e68cee9347f654b56e8b0f93a341b8b0d027ee96800d6/geventhttpclient-2.3.4-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:a4bca1151b8cd207eef6d5cb3c720c562b2aa7293cf113a68874e235cfa19c31", size = 51980, upload-time = "2025-06-11T13:17:23.933Z" },
+    { url = "https://files.pythonhosted.org/packages/27/6e/049e685fc43e2e966c83f24b3187f6a6736103f0fc51118140f4ca1793d4/geventhttpclient-2.3.4-cp313-cp313-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:8a681433e2f3d4b326d8b36b3e05b787b2c6dd2a5660a4a12527622278bf02ed", size = 114998, upload-time = "2025-08-24T12:16:54.72Z" },
+    { url = "https://files.pythonhosted.org/packages/24/13/1d08cf0400bf0fe0bb21e70f3f5fab2130aecef962b4362b7a1eba3cd738/geventhttpclient-2.3.4-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:736aa8e9609e4da40aeff0dbc02fea69021a034f4ed1e99bf93fc2ca83027b64", size = 115690, upload-time = "2025-08-24T12:16:56.328Z" },
+    { url = "https://files.pythonhosted.org/packages/fd/bc/15d22882983cac573859d274783c5b0a95881e553fc312e7b646be432668/geventhttpclient-2.3.4-cp313-cp313-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:9d477ae1f5d42e1ee6abbe520a2e9c7f369781c3b8ca111d1f5283c1453bc825", size = 121681, upload-time = "2025-08-24T12:16:58.344Z" },
    { url = "https://files.pythonhosted.org/packages/ec/5b/c0c30ccd9d06c603add3f2d6abd68bd98430ee9730dc5478815759cf07f7/geventhttpclient-2.3.4-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:9b50d9daded5d36193d67e2fc30e59752262fcbbdc86e8222c7df6b93af0346a", size = 118987, upload-time = "2025-06-11T13:17:24.97Z" },
    { url = "https://files.pythonhosted.org/packages/4f/56/095a46af86476372064128162eccbd2ba4a7721503759890d32ea701d5fd/geventhttpclient-2.3.4-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:fe705e7656bc6982a463a4ed7f9b1db8c78c08323f1d45d0d1d77063efa0ce96", size = 124519, upload-time = "2025-06-11T13:17:25.933Z" },
    { url = "https://files.pythonhosted.org/packages/ae/12/7c9ba94b58f7954a83d33183152ce6bf5bda10c08ebe47d79a314cd33e29/geventhttpclient-2.3.4-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:69668589359db4cbb9efa327dda5735d1e74145e6f0a9ffa50236d15cf904053", size = 115574, upload-time = "2025-06-11T13:17:27.331Z" },
@ -1151,6 +1157,24 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/ca/36/9065bb51f261950c42eddf8718e01a9ff344d8082e31317a8b6677be9bd6/geventhttpclient-2.3.4-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:8d1d0db89c1c8f3282eac9a22fda2b4082e1ed62a2107f70e3f1de1872c7919f", size = 112245, upload-time = "2025-06-11T13:17:32.331Z" },
    { url = "https://files.pythonhosted.org/packages/21/7e/08a615bec095c288f997951e42e48b262d43c6081bef33cfbfad96ab9658/geventhttpclient-2.3.4-cp313-cp313-win32.whl", hash = "sha256:4e492b9ab880f98f8a9cc143b96ea72e860946eae8ad5fb2837cede2a8f45154", size = 48360, upload-time = "2025-06-11T13:17:33.349Z" },
    { url = "https://files.pythonhosted.org/packages/ec/19/ef3cb21e7e95b14cfcd21e3ba7fe3d696e171682dfa43ab8c0a727cac601/geventhttpclient-2.3.4-cp313-cp313-win_amd64.whl", hash = "sha256:72575c5b502bf26ececccb905e4e028bb922f542946be701923e726acf305eb6", size = 48956, upload-time = "2025-06-11T13:17:34.956Z" },
+    { url = "https://files.pythonhosted.org/packages/06/45/c41697c7d0cae17075ba535fb901985c2873461a9012e536de679525e28d/geventhttpclient-2.3.4-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:503db5dd0aa94d899c853b37e1853390c48c7035132f39a0bab44cbf95d29101", size = 71999, upload-time = "2025-08-24T12:17:00.419Z" },
+    { url = "https://files.pythonhosted.org/packages/5d/f7/1d953cafecf8f1681691977d9da9b647d2e02996c2431fb9b718cfdd3013/geventhttpclient-2.3.4-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:389d3f83316220cfa2010f41401c140215a58ddba548222e7122b2161e25e391", size = 52656, upload-time = "2025-08-24T12:17:01.337Z" },
+    { url = "https://files.pythonhosted.org/packages/5c/ca/4bd19040905e911dd8771a4ab74630eadc9ee9072b01ab504332dada2619/geventhttpclient-2.3.4-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:20c65d404fa42c95f6682831465467dff317004e53602c01f01fbd5ba1e56628", size = 51978, upload-time = "2025-08-24T12:17:02.282Z" },
+    { url = "https://files.pythonhosted.org/packages/11/01/c457257ee41236347dac027e63289fa3f92f164779458bd244b376122bf6/geventhttpclient-2.3.4-cp314-cp314-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:2574ee47ff6f379e9ef124e2355b23060b81629f1866013aa975ba35df0ed60b", size = 115033, upload-time = "2025-08-24T12:17:03.272Z" },
+    { url = "https://files.pythonhosted.org/packages/cc/c1/ef3ddc24b402eb3caa19dacbcd08d7129302a53d9b9109c84af1ea74e31a/geventhttpclient-2.3.4-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:fecf1b735591fb21ea124a374c207104a491ad0d772709845a10d5faa07fa833", size = 115762, upload-time = "2025-08-24T12:17:04.288Z" },
+    { url = "https://files.pythonhosted.org/packages/a9/97/8dca246262e9a1ebd639120151db00e34b7d10f60bdbca8481878b91801a/geventhttpclient-2.3.4-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:44e9ba810c28f9635e5c4c9cf98fc6470bad5a3620d8045d08693f7489493a3c", size = 121757, upload-time = "2025-08-24T12:17:05.273Z" },
+    { url = "https://files.pythonhosted.org/packages/10/7b/41bff3cbdeff3d06d45df3c61fa39cd25e60fa9d21c709ec6aeb58e9b58f/geventhttpclient-2.3.4-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:501d5c69adecd5eaee3c22302006f6c16aa114139640873b72732aa17dab9ee7", size = 111747, upload-time = "2025-08-24T12:17:06.585Z" },
+    { url = "https://files.pythonhosted.org/packages/64/e6/3732132fda94082ec8793e3ae0d4d7fff6c1cb8e358e9664d1589499f4b1/geventhttpclient-2.3.4-cp314-cp314-musllinux_1_2_ppc64le.whl", hash = "sha256:709f557138fb84ed32703d42da68f786459dab77ff2c23524538f2e26878d154", size = 118487, upload-time = "2025-08-24T12:17:07.816Z" },
+    { url = "https://files.pythonhosted.org/packages/93/29/d48d119dee6c42e066330860186df56a80d4e76d2821a6c706ead49006d7/geventhttpclient-2.3.4-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:b8b86815a30e026c6677b89a5a21ba5fd7b69accf8f0e9b83bac123e4e9f3b31", size = 112198, upload-time = "2025-08-24T12:17:08.867Z" },
+    { url = "https://files.pythonhosted.org/packages/56/48/556adff8de1bd3469b58394f441733bb3c76cb22c2600cf2ee753e73d47f/geventhttpclient-2.3.4-cp314-cp314t-macosx_10_13_universal2.whl", hash = "sha256:4371b1b1afc072ad2b0ff5a8929d73ffd86d582908d3e9e8d7911dc027b1b3a6", size = 72354, upload-time = "2025-08-24T12:17:10.671Z" },
+    { url = "https://files.pythonhosted.org/packages/7c/77/f1b32a91350382978cde0ddfee4089b94e006eb0f3e7297196d9d5451217/geventhttpclient-2.3.4-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:6409fcda1f40d66eab48afc218b4c41e45a95c173738d10c50bc69c7de4261b9", size = 52835, upload-time = "2025-08-24T12:17:12.164Z" },
+    { url = "https://files.pythonhosted.org/packages/d3/06/124f95556e0d5b4c417ec01fc30d91a3e4fe4524a44d2f629a1b1a721984/geventhttpclient-2.3.4-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:142870c2efb6bd0a593dcd75b83defb58aeb72ceaec4c23186785790bd44a311", size = 52165, upload-time = "2025-08-24T12:17:13.465Z" },
+    { url = "https://files.pythonhosted.org/packages/76/9c/0850256e4461b0a90f2cf5c8156ea8f97e93a826aa76d7be70c9c6d4ba0f/geventhttpclient-2.3.4-cp314-cp314t-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:3a74f7b926badb3b1d47ea987779cb83523a406e89203070b58b20cf95d6f535", size = 117929, upload-time = "2025-08-24T12:17:14.477Z" },
+    { url = "https://files.pythonhosted.org/packages/ca/55/3b54d0c0859efac95ba2649aeb9079a3523cdd7e691549ead2862907dc7d/geventhttpclient-2.3.4-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:2a8cde016e5ea6eb289c039b6af8dcef6c3ee77f5d753e57b48fe2555cdeacca", size = 119584, upload-time = "2025-08-24T12:17:15.709Z" },
+    { url = "https://files.pythonhosted.org/packages/84/df/84ce132a0eb2b6d4f86e68a828e3118419cb0411cae101e4bad256c3f321/geventhttpclient-2.3.4-cp314-cp314t-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:5aa16f2939a508667093b18e47919376f7db9a9acbe858343173c5a58e347869", size = 125388, upload-time = "2025-08-24T12:17:16.915Z" },
+    { url = "https://files.pythonhosted.org/packages/e8/4f/8156b9f6e25e4f18a60149bd2925f56f1ed7a1f8d520acb5a803536adadd/geventhttpclient-2.3.4-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:ffe87eb7f1956357c2144a56814b5ffc927cbb8932f143a0351c78b93129ebbc", size = 115214, upload-time = "2025-08-24T12:17:17.945Z" },
+    { url = "https://files.pythonhosted.org/packages/f6/5a/b01657605c16ac4555b70339628a33fc7ca41ace58da167637ef72ad0a8e/geventhttpclient-2.3.4-cp314-cp314t-musllinux_1_2_ppc64le.whl", hash = "sha256:5ee758e37215da9519cea53105b2a078d8bc0a32603eef2a1f9ab551e3767dee", size = 121862, upload-time = "2025-08-24T12:17:18.97Z" },
+    { url = "https://files.pythonhosted.org/packages/84/ca/c4e36a9b1bcce9958d8886aa4f7b262c8e9a7c43a284f2d79abfc9ba715d/geventhttpclient-2.3.4-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:416cc70adb3d34759e782d2e120b4432752399b85ac9758932ecd12274a104c3", size = 114999, upload-time = "2025-08-24T12:17:19.978Z" },
 ]

 [[package]]
@ -1743,7 +1767,7 @@ wheels = [

 [[package]]
 name = "llama-stack"
-version = "0.2.18"
+version = "0.2.19"
 source = { editable = "." }
 dependencies = [
    { name = "aiohttp" },
@ -1881,8 +1905,8 @@ requires-dist = [
    { name = "jinja2", specifier = ">=3.1.6" },
    { name = "jsonschema" },
    { name = "llama-api-client", specifier = ">=0.1.2" },
-    { name = "llama-stack-client", specifier = ">=0.2.18" },
-    { name = "llama-stack-client", marker = "extra == 'ui'", specifier = ">=0.2.18" },
+    { name = "llama-stack-client", specifier = ">=0.2.19" },
+    { name = "llama-stack-client", marker = "extra == 'ui'", specifier = ">=0.2.19" },
    { name = "openai", specifier = ">=1.99.6,<1.100.0" },
    { name = "opentelemetry-exporter-otlp-proto-http", specifier = ">=1.30.0" },
    { name = "opentelemetry-sdk", specifier = ">=1.30.0" },
@ -1989,7 +2013,7 @@ unit = [

 [[package]]
 name = "llama-stack-client"
-version = "0.2.18"
+version = "0.2.19"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "anyio" },
@ -2008,9 +2032,9 @@ dependencies = [
    { name = "tqdm" },
    { name = "typing-extensions" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/69/da/5e5a745495f8a2b8ef24fc4d01fe9031aa2277c36447cb22192ec8c8cc1e/llama_stack_client-0.2.18.tar.gz", hash = "sha256:860c885c9e549445178ac55cc9422e6e2a91215ac7aff5aaccfb42f3ce07e79e", size = 277284, upload-time = "2025-08-19T22:12:09.106Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/14/e4/72683c10188ae93e97551ab6eeac725e46f13ec215618532505a7d91bf2b/llama_stack_client-0.2.19.tar.gz", hash = "sha256:6c857e528b83af7821120002ebe4d3db072fd9f7bf867a152a34c70fe606833f", size = 318325, upload-time = "2025-08-26T21:54:20.592Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/0a/e4/e97f8fdd8a07aa1efc7f7e37b5657d84357b664bf70dd1885a437edc0699/llama_stack_client-0.2.18-py3-none-any.whl", hash = "sha256:90f827d5476f7fc15fd993f1863af6a6e72bd064646bf6a99435eb43a1327f70", size = 367586, upload-time = "2025-08-19T22:12:07.899Z" },
+    { url = "https://files.pythonhosted.org/packages/51/51/c8dde9fae58193a539eac700502876d8edde8be354c2784ff7b707a47432/llama_stack_client-0.2.19-py3-none-any.whl", hash = "sha256:478565a54541ca03ca9f8fe2019f4136f93ab6afe9591bdd44bc6dde6ddddbd9", size = 369905, upload-time = "2025-08-26T21:54:18.929Z" },
 ]

 [[package]]
@ -4713,9 +4737,9 @@ dependencies = [
    { name = "typing-extensions", marker = "sys_platform == 'darwin'" },
 ]
 wheels = [
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0-cp312-none-macosx_11_0_arm64.whl", hash = "sha256:a47b7986bee3f61ad217d8a8ce24605809ab425baf349f97de758815edd2ef54" },
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0-cp313-cp313t-macosx_14_0_arm64.whl", hash = "sha256:fbe2e149c5174ef90d29a5f84a554dfaf28e003cb4f61fa2c8c024c17ec7ca58" },
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0-cp313-none-macosx_11_0_arm64.whl", hash = "sha256:057efd30a6778d2ee5e2374cd63a63f63311aa6f33321e627c655df60abdd390" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0-cp312-none-macosx_11_0_arm64.whl" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0-cp313-cp313t-macosx_14_0_arm64.whl" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0-cp313-none-macosx_11_0_arm64.whl" },
 ]

 [[package]]
@ -4738,19 +4762,19 @@ dependencies = [
    { name = "typing-extensions", marker = "sys_platform != 'darwin'" },
 ]
 wheels = [
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp312-cp312-linux_s390x.whl", hash = "sha256:0e34e276722ab7dd0dffa9e12fe2135a9b34a0e300c456ed7ad6430229404eb5" },
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp312-cp312-manylinux_2_28_aarch64.whl", hash = "sha256:610f600c102386e581327d5efc18c0d6edecb9820b4140d26163354a99cd800d" },
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:cb9a8ba8137ab24e36bf1742cb79a1294bd374db570f09fc15a5e1318160db4e" },
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp312-cp312-win_amd64.whl", hash = "sha256:2be20b2c05a0cce10430cc25f32b689259640d273232b2de357c35729132256d" },
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp312-cp312-win_arm64.whl", hash = "sha256:99fc421a5d234580e45957a7b02effbf3e1c884a5dd077afc85352c77bf41434" },
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-linux_s390x.whl", hash = "sha256:8b5882276633cf91fe3d2d7246c743b94d44a7e660b27f1308007fdb1bb89f7d" },
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-manylinux_2_28_aarch64.whl", hash = "sha256:a5064b5e23772c8d164068cc7c12e01a75faf7b948ecd95a0d4007d7487e5f25" },
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-manylinux_2_28_x86_64.whl", hash = "sha256:8f81dedb4c6076ec325acc3b47525f9c550e5284a18eae1d9061c543f7b6e7de" },
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-win_amd64.whl", hash = "sha256:e1ee1b2346ade3ea90306dfbec7e8ff17bc220d344109d189ae09078333b0856" },
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-win_arm64.whl", hash = "sha256:64c187345509f2b1bb334feed4666e2c781ca381874bde589182f81247e61f88" },
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313t-manylinux_2_28_aarch64.whl", hash = "sha256:af81283ac671f434b1b25c95ba295f270e72db1fad48831eb5e4748ff9840041" },
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313t-manylinux_2_28_x86_64.whl", hash = "sha256:a9dbb6f64f63258bc811e2c0c99640a81e5af93c531ad96e95c5ec777ea46dab" },
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313t-win_amd64.whl", hash = "sha256:6d93a7165419bc4b2b907e859ccab0dea5deeab261448ae9a5ec5431f14c0e64" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp312-cp312-linux_s390x.whl" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp312-cp312-manylinux_2_28_aarch64.whl" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp312-cp312-manylinux_2_28_x86_64.whl" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp312-cp312-win_amd64.whl" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp312-cp312-win_arm64.whl" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-linux_s390x.whl" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-manylinux_2_28_aarch64.whl" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-manylinux_2_28_x86_64.whl" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-win_amd64.whl" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-win_arm64.whl" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313t-manylinux_2_28_aarch64.whl" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313t-manylinux_2_28_x86_64.whl" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313t-win_amd64.whl" },
 ]

 [[package]]