Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-12-17 15:12:35 +00:00)

commit 432ec7d20c (parent cffc4edf47)

    BREAKING CHANGE: Migrate Vector DBs to vector store ID

    Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

49 changed files with 2325 additions and 466 deletions

.github/workflows/pre-commit.yml (vendored, 2 lines changed)

@@ -37,7 +37,7 @@ jobs:
           .pre-commit-config.yaml

       - name: Set up Node.js
-        uses: actions/setup-node@39370e3970a6d050c480ffad4ff0ed4d3fdee5af # v4.1.0
+        uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020 # v4.4.0
         with:
           node-version: '20'
           cache: 'npm'

.github/workflows/python-build-test.yml (vendored, 2 lines changed)

@@ -24,7 +24,7 @@ jobs:
         uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0

       - name: Install uv
-        uses: astral-sh/setup-uv@d9e0f98d3fc6adb07d1e3d37f3043649ddad06a1 # v6.5.0
+        uses: astral-sh/setup-uv@4959332f0f014c5280e7eac8b70c90cb574c9f9b # v6.6.0
         with:
           python-version: ${{ matrix.python-version }}
           activate-environment: true

.github/workflows/semantic-pr.yml (vendored, 2 lines changed)

@@ -22,6 +22,6 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Check PR Title's semantic conformance
-        uses: amannn/action-semantic-pull-request@7f33ba792281b034f64e96f4c0b5496782dd3b37 # v6.1.0
+        uses: amannn/action-semantic-pull-request@48f256284bd46cdaab1048c3721360e808335d50 # v6.1.1
         env:
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

@@ -33,7 +33,7 @@ The list of open-benchmarks we currently support:
 - [MMMU](https://arxiv.org/abs/2311.16502) (A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI)]: Benchmark designed to evaluate multimodal models.


-You can follow this [contributing guide](https://llama-stack.readthedocs.io/en/latest/references/evals_reference/index.html#open-benchmark-contributing-guide) to add more open-benchmarks to Llama Stack
+You can follow this [contributing guide](../references/evals_reference/index.md#open-benchmark-contributing-guide) to add more open-benchmarks to Llama Stack

 #### Run evaluation on open-benchmarks via CLI

@@ -35,3 +35,6 @@ device: cpu

 ```
+
+[Find more detailed information here!](huggingface.md)
+

@@ -22,3 +22,4 @@ checkpoint_format: meta

 ```
+
+[Find more detailed information here!](torchtune.md)

@@ -88,7 +88,7 @@ Interactive pages for users to play with and explore Llama Stack API capabilitie
 - **API Resources**: Inspect Llama Stack API resources
   - This page allows you to inspect Llama Stack API resources (`models`, `datasets`, `memory_banks`, `benchmarks`, `shields`).
   - Under the hood, it uses Llama Stack's `/<resources>/list` API to get information about each resources.
-  - Please visit [Core Concepts](https://llama-stack.readthedocs.io/en/latest/concepts/index.html) for more details about the resources.
+  - Please visit [Core Concepts](../../concepts/index.md) for more details about the resources.

 ### Starting the Llama Stack Playground

@@ -3,7 +3,7 @@
 Llama Stack (LLS) provides two different APIs for building AI applications with tool calling capabilities: the **Agents API** and the **OpenAI Responses API**. While both enable AI systems to use tools, and maintain full conversation history, they serve different use cases and have distinct characteristics.

 ```{note}
-For simple and basic inferencing, you may want to use the [Chat Completions API](https://llama-stack.readthedocs.io/en/latest/providers/index.html#chat-completions) directly, before progressing to Agents or Responses API.
+**Note:** For simple and basic inferencing, you may want to use the [Chat Completions API](../providers/openai.md#chat-completions) directly, before progressing to Agents or Responses API.
 ```

 ## Overview

@@ -173,7 +173,7 @@ Both APIs demonstrate distinct strengths that make them valuable on their own fo

 ## For More Information

-- **LLS Agents API**: For detailed information on creating and managing agents, see the [Agents documentation](https://llama-stack.readthedocs.io/en/latest/building_applications/agent.html)
+- **LLS Agents API**: For detailed information on creating and managing agents, see the [Agents documentation](agent.md)
 - **OpenAI Responses API**: For information on using the OpenAI-compatible responses API, see the [OpenAI API documentation](https://platform.openai.com/docs/api-reference/responses)
-- **Chat Completions API**: For the default backend API used by Agents, see the [Chat Completions providers documentation](https://llama-stack.readthedocs.io/en/latest/providers/index.html#chat-completions)
-- **Agent Execution Loop**: For understanding how agents process turns and steps in their execution, see the [Agent Execution Loop documentation](https://llama-stack.readthedocs.io/en/latest/building_applications/agent_execution_loop.html)
+- **Chat Completions API**: For the default backend API used by Agents, see the [Chat Completions providers documentation](../providers/openai.md#chat-completions)
+- **Agent Execution Loop**: For understanding how agents process turns and steps in their execution, see the [Agent Execution Loop documentation](agent_execution_loop.md)

@@ -6,4 +6,4 @@ While there is a lot of flexibility to mix-and-match providers, often users will

 **Locally Hosted Distro**: You may want to run Llama Stack on your own hardware. Typically though, you still need to use Inference via an external service. You can use providers like HuggingFace TGI, Fireworks, Together, etc. for this purpose. Or you may have access to GPUs and can run a [vLLM](https://github.com/vllm-project/vllm) or [NVIDIA NIM](https://build.nvidia.com/nim?filters=nimType%3Anim_type_run_anywhere&q=llama) instance. If you "just" have a regular desktop machine, you can use [Ollama](https://ollama.com/) for inference. To provide convenient quick access to these options, we provide a number of such pre-configured locally-hosted Distros.

-**On-device Distro**: To run Llama Stack directly on an edge device (mobile phone or a tablet), we provide Distros for [iOS](https://llama-stack.readthedocs.io/en/latest/distributions/ondevice_distro/ios_sdk.html) and [Android](https://llama-stack.readthedocs.io/en/latest/distributions/ondevice_distro/android_sdk.html)
+**On-device Distro**: To run Llama Stack directly on an edge device (mobile phone or a tablet), we provide Distros for [iOS](../distributions/ondevice_distro/ios_sdk.md) and [Android](../distributions/ondevice_distro/android_sdk.md)

@@ -14,6 +14,13 @@ Here are some example PRs to help you get started:
 - [Nvidia Inference Implementation](https://github.com/meta-llama/llama-stack/pull/355)
 - [Model context protocol Tool Runtime](https://github.com/meta-llama/llama-stack/pull/665)

+## Guidelines for creating Internal or External Providers
+
+|**Type** |Internal (In-tree) |External (out-of-tree)|
+|---------|-------------------|----------------------|
+|**Description** |A provider that is directly in the Llama Stack code|A provider that is outside of the Llama stack core codebase but is still accessible and usable by Llama Stack.|
+|**Benefits** |Ability to interact with the provider with minimal additional configurations or installations|Contributors do not have to add directly to the code to create providers accessible on Llama Stack. Keep provider-specific code separate from the core Llama Stack code.|
+
 ## Inference Provider Patterns

 When implementing Inference providers for OpenAI-compatible APIs, Llama Stack provides several mixin classes to simplify development and ensure consistent behavior across providers.

@@ -27,7 +27,7 @@ Then, you can access the APIs like `models` and `inference` on the client and ca
 response = client.models.list()
 ```

-If you've created a [custom distribution](https://llama-stack.readthedocs.io/en/latest/distributions/building_distro.html), you can also use the run.yaml configuration file directly:
+If you've created a [custom distribution](building_distro.md), you can also use the run.yaml configuration file directly:

 ```python
 client = LlamaStackAsLibraryClient(config_path)

@@ -22,17 +22,17 @@ else
 fi

 if [ -z "${GITHUB_CLIENT_ID:-}" ]; then
-  echo "ERROR: GITHUB_CLIENT_ID not set. You need it for Github login to work. Refer to https://llama-stack.readthedocs.io/en/latest/deploying/index.html#kubernetes-deployment-guide"
+  echo "ERROR: GITHUB_CLIENT_ID not set. You need it for Github login to work. See the Kubernetes Deployment Guide in the Llama Stack documentation."
   exit 1
 fi

 if [ -z "${GITHUB_CLIENT_SECRET:-}" ]; then
-  echo "ERROR: GITHUB_CLIENT_SECRET not set. You need it for Github login to work. Refer to https://llama-stack.readthedocs.io/en/latest/deploying/index.html#kubernetes-deployment-guide"
+  echo "ERROR: GITHUB_CLIENT_SECRET not set. You need it for Github login to work. See the Kubernetes Deployment Guide in the Llama Stack documentation."
   exit 1
 fi

 if [ -z "${LLAMA_STACK_UI_URL:-}" ]; then
-  echo "ERROR: LLAMA_STACK_UI_URL not set. Should be set to the external URL of the UI (excluding port). You need it for Github login to work. Refer to https://llama-stack.readthedocs.io/en/latest/deploying/index.html#kubernetes-deployment-guide"
+  echo "ERROR: LLAMA_STACK_UI_URL not set. Should be set to the external URL of the UI (excluding port). You need it for Github login to work. See the Kubernetes Deployment Guide in the Llama Stack documentation."
   exit 1
 fi

@@ -66,7 +66,7 @@ llama stack run starter --port 5050

 Ensure the Llama Stack server version is the same as the Kotlin SDK Library for maximum compatibility.

-Other inference providers: [Table](https://llama-stack.readthedocs.io/en/latest/index.html#supported-llama-stack-implementations)
+Other inference providers: [Table](../../index.md#supported-llama-stack-implementations)

 How to set remote localhost in Demo App: [Settings](https://github.com/meta-llama/llama-stack-client-kotlin/tree/latest-release/examples/android_app#settings)

@@ -2,7 +2,7 @@
 orphan: true
 ---
 <!-- This file was auto-generated by distro_codegen.py, please edit source -->
-# Meta Reference Distribution
+# Meta Reference GPU Distribution

 ```{toctree}
 :maxdepth: 2

@@ -41,7 +41,7 @@ The following environment variables can be configured:

 ## Prerequisite: Downloading Models

-Please use `llama model list --downloaded` to check that you have llama model checkpoints downloaded in `~/.llama` before proceeding. See [installation guide](https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/download_models.html) here to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints.
+Please use `llama model list --downloaded` to check that you have llama model checkpoints downloaded in `~/.llama` before proceeding. See [installation guide](../../references/llama_cli_reference/download_models.md) here to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints.

 ```
 $ llama model list --downloaded

@@ -9,7 +9,6 @@ This section contains documentation for all available providers for the **post_t
 ```{toctree}
 :maxdepth: 1

-inline_huggingface-cpu
 inline_huggingface-gpu
 inline_torchtune-cpu
 inline_torchtune-gpu

@@ -202,7 +202,7 @@ pprint(response)

 Llama Stack offers a library of scoring functions and the `/scoring` API, allowing you to run evaluations on your pre-annotated AI application datasets.

-In this example, we will work with an example RAG dataset you have built previously, label with an annotation, and use LLM-As-Judge with custom judge prompt for scoring. Please checkout our [Llama Stack Playground](https://llama-stack.readthedocs.io/en/latest/playground/index.html) for an interactive interface to upload datasets and run scorings.
+In this example, we will work with an example RAG dataset you have built previously, label with an annotation, and use LLM-As-Judge with custom judge prompt for scoring. Please checkout our [Llama Stack Playground](../../building_applications/playground/index.md) for an interactive interface to upload datasets and run scorings.

 ```python
 judge_model_id = "meta-llama/Llama-3.1-405B-Instruct-FP8"

@@ -80,7 +80,7 @@ def get_provider_dependencies(
     normal_deps = []
     special_deps = []
     for package in deps:
-        if "--no-deps" in package or "--index-url" in package:
+        if any(f in package for f in ["--no-deps", "--index-url", "--extra-index-url"]):
             special_deps.append(package)
         else:
             normal_deps.append(package)
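
As a quick illustration of the new classification rule, here is a minimal standalone sketch (not the actual build code; the sample dependency strings are only examples):

```python
# Sketch of the dependency split: any requirement string carrying a pip flag
# is routed to special_deps so its flags can be passed through to the installer.
SPECIAL_FLAGS = ["--no-deps", "--index-url", "--extra-index-url"]

def split_deps(deps: list[str]) -> tuple[list[str], list[str]]:
    normal: list[str] = []
    special: list[str] = []
    for package in deps:
        (special if any(f in package for f in SPECIAL_FLAGS) else normal).append(package)
    return normal, special

normal, special = split_deps([
    "numpy",
    "torch torchvision torchao>=0.12.0 --extra-index-url https://download.pytorch.org/whl/cpu",
])
assert normal == ["numpy"] and len(special) == 1
```

The third flag, `--extra-index-url`, is the new addition; the provider registry changes further down now emit requirement strings that carry it.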

@@ -52,7 +52,6 @@ class VectorDBsRoutingTable(CommonRoutingTableImpl, VectorDBs):
         provider_vector_db_id: str | None = None,
         vector_db_name: str | None = None,
     ) -> VectorDB:
-        provider_vector_db_id = provider_vector_db_id or vector_db_id
         if provider_id is None:
             if len(self.impls_by_provider_id) > 0:
                 provider_id = list(self.impls_by_provider_id.keys())[0]

@@ -69,14 +68,33 @@ class VectorDBsRoutingTable(CommonRoutingTableImpl, VectorDBs):
             raise ModelTypeError(embedding_model, model.model_type, ModelType.embedding)
         if "embedding_dimension" not in model.metadata:
             raise ValueError(f"Model {embedding_model} does not have an embedding dimension")

+        provider = self.impls_by_provider_id[provider_id]
+        logger.warning(
+            "VectorDB is being deprecated in future releases in favor of VectorStore. Please migrate your usage accordingly."
+        )
+        vector_store = await provider.openai_create_vector_store(
+            name=vector_db_name or vector_db_id,
+            embedding_model=embedding_model,
+            embedding_dimension=model.metadata["embedding_dimension"],
+            provider_id=provider_id,
+            provider_vector_db_id=provider_vector_db_id,
+        )
+
+        vector_store_id = vector_store.id
+        actual_provider_vector_db_id = provider_vector_db_id or vector_store_id
+        logger.warning(
+            f"Ignoring vector_db_id {vector_db_id} and using vector_store_id {vector_store_id} instead. Setting VectorDB {vector_db_id} to VectorDB.vector_db_name"
+        )
+
         vector_db_data = {
-            "identifier": vector_db_id,
+            "identifier": vector_store_id,
             "type": ResourceType.vector_db.value,
             "provider_id": provider_id,
-            "provider_resource_id": provider_vector_db_id,
+            "provider_resource_id": actual_provider_vector_db_id,
             "embedding_model": embedding_model,
             "embedding_dimension": model.metadata["embedding_dimension"],
-            "vector_db_name": vector_db_name,
+            "vector_db_name": vector_store.name,
         }
         vector_db = TypeAdapter(VectorDBWithOwner).validate_python(vector_db_data)
         await self.register_object(vector_db)
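
This hunk is the heart of the breaking change: `register_vector_db` now delegates creation to the provider's OpenAI-compatible vector-store API, and the registered identifier becomes the store-generated ID rather than the caller-supplied `vector_db_id`. A minimal sketch of the caller-visible effect (assuming the standard llama-stack Python client; the exact ID format shown is an assumption):

```python
from llama_stack_client import LlamaStackClient  # assumed client package

client = LlamaStackClient(base_url="http://localhost:8321")

# Before this commit, the registered identifier echoed the requested ID.
# After it, the identifier is the vector store ID generated by the provider,
# and the requested ID survives only as the store's name (vector_db_name).
vector_db = client.vector_dbs.register(
    vector_db_id="my-documents",        # now effectively just the store's name
    embedding_model="all-MiniLM-L6-v2",
    embedding_dimension=384,
)
print(vector_db.identifier)  # e.g. "vs_..." (format is provider-defined)
```

Callers that persisted the old caller-chosen identifier need to start reading the identifier off the registration response instead.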

@@ -225,7 +225,10 @@ def replace_env_vars(config: Any, path: str = "") -> Any:

         try:
             result = re.sub(pattern, get_env_var, config)
-            return _convert_string_to_proper_type(result)
+            # Only apply type conversion if substitution actually happened
+            if result != config:
+                return _convert_string_to_proper_type(result)
+            return result
         except EnvVarError as e:
             raise EnvVarError(e.var_name, e.path) from None
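
The guard matters because the converter coerces strings like "true" or "123" into booleans and integers; before this fix, literal string config values that merely looked typed were converted even when they contained no `${env.*}` reference. A self-contained sketch of the intended behavior (the converter here is a simplified stand-in, and the real pattern also supports `:=` defaults):

```python
import re

def _convert(s: str):
    # Stand-in for _convert_string_to_proper_type.
    if s in ("true", "false"):
        return s == "true"
    try:
        return int(s)
    except ValueError:
        return s

def replace_env_vars(config: str, env: dict[str, str]):
    pattern = r"\$\{env\.(\w+)\}"
    result = re.sub(pattern, lambda m: env[m.group(1)], config)
    # Only convert when a substitution actually happened.
    return _convert(result) if result != config else result

assert replace_env_vars("${env.PORT}", {"PORT": "8321"}) == 8321  # substituted, so converted
assert replace_env_vars("8321", {}) == "8321"                     # literal, left as a string
```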

@@ -34,7 +34,7 @@ distribution_spec:
   telemetry:
   - provider_type: inline::meta-reference
   post_training:
-  - provider_type: inline::huggingface-cpu
+  - provider_type: inline::torchtune-cpu
  eval:
  - provider_type: inline::meta-reference
  datasetio:

@@ -156,13 +156,10 @@ providers:
       sqlite_db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/ci-tests}/trace_store.db
       otel_exporter_otlp_endpoint: ${env.OTEL_EXPORTER_OTLP_ENDPOINT:=}
   post_training:
-  - provider_id: huggingface-cpu
-    provider_type: inline::huggingface-cpu
+  - provider_id: torchtune-cpu
+    provider_type: inline::torchtune-cpu
     config:
-      checkpoint_format: huggingface
-      distributed_backend: null
-      device: cpu
-      dpo_output_dir: ~/.llama/distributions/ci-tests/dpo_output
+      checkpoint_format: meta
   eval:
   - provider_id: meta-reference
     provider_type: inline::meta-reference

@@ -1,7 +1,7 @@
 ---
 orphan: true
 ---
-# Meta Reference Distribution
+# Meta Reference GPU Distribution

 ```{toctree}
 :maxdepth: 2

@@ -29,7 +29,7 @@ The following environment variables can be configured:

 ## Prerequisite: Downloading Models

-Please use `llama model list --downloaded` to check that you have llama model checkpoints downloaded in `~/.llama` before proceeding. See [installation guide](https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/download_models.html) here to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints.
+Please use `llama model list --downloaded` to check that you have llama model checkpoints downloaded in `~/.llama` before proceeding. See [installation guide](../../references/llama_cli_reference/download_models.md) here to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints.

 ```
 $ llama model list --downloaded

@@ -35,7 +35,7 @@ distribution_spec:
   telemetry:
   - provider_type: inline::meta-reference
   post_training:
-  - provider_type: inline::torchtune-gpu
+  - provider_type: inline::huggingface-gpu
  eval:
  - provider_type: inline::meta-reference
  datasetio:

@@ -156,10 +156,13 @@ providers:
       sqlite_db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/starter-gpu}/trace_store.db
       otel_exporter_otlp_endpoint: ${env.OTEL_EXPORTER_OTLP_ENDPOINT:=}
   post_training:
-  - provider_id: torchtune-gpu
-    provider_type: inline::torchtune-gpu
+  - provider_id: huggingface-gpu
+    provider_type: inline::huggingface-gpu
     config:
-      checkpoint_format: meta
+      checkpoint_format: huggingface
+      distributed_backend: null
+      device: cpu
+      dpo_output_dir: ~/.llama/distributions/starter-gpu/dpo_output
   eval:
   - provider_id: meta-reference
     provider_type: inline::meta-reference

@@ -17,6 +17,6 @@ def get_distribution_template() -> DistributionTemplate:
     template.description = "Quick start template for running Llama Stack with several popular providers. This distribution is intended for GPU-enabled environments."

     template.providers["post_training"] = [
-        BuildProvider(provider_type="inline::torchtune-gpu"),
+        BuildProvider(provider_type="inline::huggingface-gpu"),
     ]
     return template

@@ -35,7 +35,7 @@ distribution_spec:
   telemetry:
   - provider_type: inline::meta-reference
   post_training:
-  - provider_type: inline::huggingface-cpu
+  - provider_type: inline::torchtune-cpu
  eval:
  - provider_type: inline::meta-reference
  datasetio:

@@ -156,13 +156,10 @@ providers:
       sqlite_db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/starter}/trace_store.db
       otel_exporter_otlp_endpoint: ${env.OTEL_EXPORTER_OTLP_ENDPOINT:=}
   post_training:
-  - provider_id: huggingface-cpu
-    provider_type: inline::huggingface-cpu
+  - provider_id: torchtune-cpu
+    provider_type: inline::torchtune-cpu
     config:
-      checkpoint_format: huggingface
-      distributed_backend: null
-      device: cpu
-      dpo_output_dir: ~/.llama/distributions/starter/dpo_output
+      checkpoint_format: meta
   eval:
   - provider_id: meta-reference
     provider_type: inline::meta-reference

@@ -120,7 +120,7 @@ def get_distribution_template() -> DistributionTemplate:
         ],
         "agents": [BuildProvider(provider_type="inline::meta-reference")],
         "telemetry": [BuildProvider(provider_type="inline::meta-reference")],
-        "post_training": [BuildProvider(provider_type="inline::huggingface-cpu")],
+        "post_training": [BuildProvider(provider_type="inline::torchtune-cpu")],
         "eval": [BuildProvider(provider_type="inline::meta-reference")],
         "datasetio": [
             BuildProvider(provider_type="remote::huggingface"),

@@ -40,8 +40,9 @@ def available_providers() -> list[ProviderSpec]:
         InlineProviderSpec(
             api=Api.inference,
             provider_type="inline::sentence-transformers",
+            # CrossEncoder depends on torchao.quantization
             pip_packages=[
-                "torch torchvision --index-url https://download.pytorch.org/whl/cpu",
+                "torch torchvision torchao>=0.12.0 --extra-index-url https://download.pytorch.org/whl/cpu",
                 "sentence-transformers --no-deps",
             ],
             module="llama_stack.providers.inline.inference.sentence_transformers",
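
The switch from `--index-url` to `--extra-index-url` changes pip's resolution semantics: `--index-url` replaces PyPI entirely as the package index, while `--extra-index-url` adds the PyTorch CPU index alongside PyPI, so packages that are not hosted on the CPU index (presumably `torchao` at the required version) can still resolve from PyPI. The `get_provider_dependencies` change above is what routes this new flag into the special-dependency path.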

@@ -13,7 +13,7 @@ from llama_stack.providers.datatypes import AdapterSpec, Api, InlineProviderSpec
 # The CPU version is used for distributions that don't have GPU support -- they result in smaller container images.
 torchtune_def = dict(
     api=Api.post_training,
-    pip_packages=["torchtune==0.5.0", "torchao==0.8.0", "numpy"],
+    pip_packages=["numpy"],
     module="llama_stack.providers.inline.post_training.torchtune",
     config_class="llama_stack.providers.inline.post_training.torchtune.TorchtunePostTrainingConfig",
     api_dependencies=[

@@ -23,9 +23,32 @@ torchtune_def = dict(
     description="TorchTune-based post-training provider for fine-tuning and optimizing models using Meta's TorchTune framework.",
 )

-huggingface_def = dict(
-    api=Api.post_training,
-    pip_packages=["trl", "transformers", "peft", "datasets"],
+
+def available_providers() -> list[ProviderSpec]:
+    return [
+        InlineProviderSpec(
+            **{  # type: ignore
+                **torchtune_def,
+                "provider_type": "inline::torchtune-cpu",
+                "pip_packages": (
+                    cast(list[str], torchtune_def["pip_packages"])
+                    + ["torch torchtune>=0.5.0 torchao>=0.12.0 --extra-index-url https://download.pytorch.org/whl/cpu"]
+                ),
+            },
+        ),
+        InlineProviderSpec(
+            **{  # type: ignore
+                **torchtune_def,
+                "provider_type": "inline::torchtune-gpu",
+                "pip_packages": (
+                    cast(list[str], torchtune_def["pip_packages"]) + ["torch torchtune>=0.5.0 torchao>=0.12.0"]
+                ),
+            },
+        ),
+        InlineProviderSpec(
+            api=Api.post_training,
+            provider_type="inline::huggingface-gpu",
+            pip_packages=["trl", "transformers", "peft", "datasets", "torch"],
             module="llama_stack.providers.inline.post_training.huggingface",
             config_class="llama_stack.providers.inline.post_training.huggingface.HuggingFacePostTrainingConfig",
             api_dependencies=[

@@ -33,46 +56,6 @@ huggingface_def = dict(
             Api.datasets,
         ],
         description="HuggingFace-based post-training provider for fine-tuning models using the HuggingFace ecosystem.",
-)
-
-
-def available_providers() -> list[ProviderSpec]:
-    return [
-        InlineProviderSpec(
-            **{
-                **torchtune_def,
-                "provider_type": "inline::torchtune-cpu",
-                "pip_packages": (
-                    cast(list[str], torchtune_def["pip_packages"])
-                    + ["torch torchtune==0.5.0 torchao==0.8.0 --index-url https://download.pytorch.org/whl/cpu"]
-                ),
-            },
-        ),
-        InlineProviderSpec(
-            **{
-                **huggingface_def,
-                "provider_type": "inline::huggingface-cpu",
-                "pip_packages": (
-                    cast(list[str], huggingface_def["pip_packages"])
-                    + ["torch --index-url https://download.pytorch.org/whl/cpu"]
-                ),
-            },
-        ),
-        InlineProviderSpec(
-            **{
-                **torchtune_def,
-                "provider_type": "inline::torchtune-gpu",
-                "pip_packages": (
-                    cast(list[str], torchtune_def["pip_packages"]) + ["torch torchtune==0.5.0 torchao==0.8.0"]
-                ),
-            },
-        ),
-        InlineProviderSpec(
-            **{
-                **huggingface_def,
-                "provider_type": "inline::huggingface-gpu",
-                "pip_packages": (cast(list[str], huggingface_def["pip_packages"]) + ["torch"]),
-            },
-        ),
+        ),
         remote_provider_spec(
             api=Api.post_training,
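
The refactor keeps one base definition per framework and derives CPU/GPU variants by spreading the base dict and overriding only `provider_type` and `pip_packages`. A minimal standalone sketch of that pattern (hypothetical fields, not the real ProviderSpec):

```python
from typing import Any, cast

base_def: dict[str, Any] = {
    "api": "post_training",
    "pip_packages": ["numpy"],
    "module": "example.torchtune",
}

def make_variants(base: dict[str, Any]) -> list[dict[str, Any]]:
    # Each variant spreads the base and overrides the differing keys;
    # pip_packages is extended rather than replaced, so shared deps stay in one place.
    return [
        {**base, "provider_type": "inline::torchtune-cpu",
         "pip_packages": cast(list[str], base["pip_packages"])
         + ["torch torchtune>=0.5.0 --extra-index-url https://download.pytorch.org/whl/cpu"]},
        {**base, "provider_type": "inline::torchtune-gpu",
         "pip_packages": cast(list[str], base["pip_packages"]) + ["torch torchtune>=0.5.0"]},
    ]

for spec in make_variants(base_def):
    print(spec["provider_type"], spec["pip_packages"])
```

Note also what the last hunk drops: the `inline::huggingface-cpu` variant disappears entirely (matching the toctree and starter/ci-tests template changes above), and the remaining `inline::huggingface-gpu` spec is now written out directly instead of being derived from a `huggingface_def` base.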

@@ -9,7 +9,6 @@ from __future__ import annotations  # for forward references
 import hashlib
 import json
 import os
-import sqlite3
 from collections.abc import Generator
 from contextlib import contextmanager
 from enum import StrEnum

@@ -125,28 +124,13 @@ class ResponseStorage:
     def __init__(self, test_dir: Path):
         self.test_dir = test_dir
         self.responses_dir = self.test_dir / "responses"
-        self.db_path = self.test_dir / "index.sqlite"

         self._ensure_directories()
-        self._init_database()

     def _ensure_directories(self):
         self.test_dir.mkdir(parents=True, exist_ok=True)
         self.responses_dir.mkdir(exist_ok=True)

-    def _init_database(self):
-        with sqlite3.connect(self.db_path) as conn:
-            conn.execute("""
-                CREATE TABLE IF NOT EXISTS recordings (
-                    request_hash TEXT PRIMARY KEY,
-                    response_file TEXT,
-                    endpoint TEXT,
-                    model TEXT,
-                    timestamp TEXT,
-                    is_streaming BOOLEAN
-                )
-            """)
-
     def store_recording(self, request_hash: str, request: dict[str, Any], response: dict[str, Any]):
         """Store a request/response pair."""
         # Generate unique response filename

@@ -169,34 +153,9 @@ class ResponseStorage:
             f.write("\n")
             f.flush()

-        # Update SQLite index
-        with sqlite3.connect(self.db_path) as conn:
-            conn.execute(
-                """
-                INSERT OR REPLACE INTO recordings
-                (request_hash, response_file, endpoint, model, timestamp, is_streaming)
-                VALUES (?, ?, ?, ?, datetime('now'), ?)
-                """,
-                (
-                    request_hash,
-                    response_file,
-                    request.get("endpoint", ""),
-                    request.get("model", ""),
-                    response.get("is_streaming", False),
-                ),
-            )
-
     def find_recording(self, request_hash: str) -> dict[str, Any] | None:
         """Find a recorded response by request hash."""
-        with sqlite3.connect(self.db_path) as conn:
-            result = conn.execute(
-                "SELECT response_file FROM recordings WHERE request_hash = ?", (request_hash,)
-            ).fetchone()
-
-        if not result:
-            return None
-
-        response_file = result[0]
+        response_file = f"{request_hash[:12]}.json"
         response_path = self.responses_dir / response_file

         if not response_path.exists():
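
With the SQLite index gone, recordings become content-addressed: the response filename is derived deterministically from the request hash, so lookup is a direct file probe instead of a database query. A compact sketch of the idea (standalone, assuming JSON-serializable requests):

```python
import hashlib
import json
from pathlib import Path

def request_hash(request: dict) -> str:
    # Stable hash over a canonical JSON encoding of the request.
    canonical = json.dumps(request, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def response_path(responses_dir: Path, request: dict) -> Path:
    # The filename is derived from the hash prefix, so no index DB is needed;
    # the existence of the file is itself the index.
    return responses_dir / f"{request_hash(request)[:12]}.json"

p = response_path(Path("responses"), {"endpoint": "/v1/chat/completions", "model": "llama"})
print(p)  # e.g. responses/1a2b3c4d5e6f.json
```

The trade-off is that the metadata columns the index used to carry (endpoint, model, timestamp, streaming flag) must now live inside the JSON files themselves, and hash-prefix collisions are assumed to be negligible at 12 hex characters.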

llama_stack/ui/app/chat-playground/chunk-processor.test.tsx (new file, 610 lines)

@@ -0,0 +1,610 @@
import { describe, test, expect } from "@jest/globals";

// Extract the exact processChunk function implementation for testing
function createProcessChunk() {
  return (chunk: unknown): { text: string | null; isToolCall: boolean } => {
    const chunkObj = chunk as Record<string, unknown>;

    // Helper function to check if content contains function call JSON
    const containsToolCall = (content: string): boolean => {
      return (
        content.includes('"type": "function"') ||
        content.includes('"name": "knowledge_search"') ||
        content.includes('"parameters":') ||
        !!content.match(/\{"type":\s*"function".*?\}/)
      );
    };

    // Check if this chunk contains a tool call (function call)
    let isToolCall = false;

    // Check direct chunk content if it's a string
    if (typeof chunk === "string") {
      isToolCall = containsToolCall(chunk);
    }

    // Check delta structures
    if (
      chunkObj?.delta &&
      typeof chunkObj.delta === "object" &&
      chunkObj.delta !== null
    ) {
      const delta = chunkObj.delta as Record<string, unknown>;
      if ("tool_calls" in delta) {
        isToolCall = true;
      }
      if (typeof delta.text === "string") {
        if (containsToolCall(delta.text)) {
          isToolCall = true;
        }
      }
    }

    // Check event structures
    if (
      chunkObj?.event &&
      typeof chunkObj.event === "object" &&
      chunkObj.event !== null
    ) {
      const event = chunkObj.event as Record<string, unknown>;

      // Check event payload
      if (
        event?.payload &&
        typeof event.payload === "object" &&
        event.payload !== null
      ) {
        const payload = event.payload as Record<string, unknown>;
        if (typeof payload.content === "string") {
          if (containsToolCall(payload.content)) {
            isToolCall = true;
          }
        }

        // Check payload delta
        if (
          payload?.delta &&
          typeof payload.delta === "object" &&
          payload.delta !== null
        ) {
          const delta = payload.delta as Record<string, unknown>;
          if (typeof delta.text === "string") {
            if (containsToolCall(delta.text)) {
              isToolCall = true;
            }
          }
        }
      }

      // Check event delta
      if (
        event?.delta &&
        typeof event.delta === "object" &&
        event.delta !== null
      ) {
        const delta = event.delta as Record<string, unknown>;
        if (typeof delta.text === "string") {
          if (containsToolCall(delta.text)) {
            isToolCall = true;
          }
        }
        if (typeof delta.content === "string") {
          if (containsToolCall(delta.content)) {
            isToolCall = true;
          }
        }
      }
    }

    // if it's a tool call, skip it (don't display in chat)
    if (isToolCall) {
      return { text: null, isToolCall: true };
    }

    // Extract text content from various chunk formats
    let text: string | null = null;

    // Helper function to extract clean text content, filtering out function calls
    const extractCleanText = (content: string): string | null => {
      if (containsToolCall(content)) {
        try {
          // Try to parse and extract non-function call parts
          const jsonMatch = content.match(
            /\{"type":\s*"function"[^}]*\}[^}]*\}/
          );
          if (jsonMatch) {
            const jsonPart = jsonMatch[0];
            const parsedJson = JSON.parse(jsonPart);

            // If it's a function call, extract text after JSON
            if (parsedJson.type === "function") {
              const textAfterJson = content
                .substring(content.indexOf(jsonPart) + jsonPart.length)
                .trim();
              return textAfterJson || null;
            }
          }
          // If we can't parse it properly, skip the whole thing
          return null;
        } catch {
          return null;
        }
      }
      return content;
    };

    // Try direct delta text
    if (
      chunkObj?.delta &&
      typeof chunkObj.delta === "object" &&
      chunkObj.delta !== null
    ) {
      const delta = chunkObj.delta as Record<string, unknown>;
      if (typeof delta.text === "string") {
        text = extractCleanText(delta.text);
      }
    }

    // Try event structures
    if (
      !text &&
      chunkObj?.event &&
      typeof chunkObj.event === "object" &&
      chunkObj.event !== null
    ) {
      const event = chunkObj.event as Record<string, unknown>;

      // Try event payload content
      if (
        event?.payload &&
        typeof event.payload === "object" &&
        event.payload !== null
      ) {
        const payload = event.payload as Record<string, unknown>;

        // Try direct payload content
        if (typeof payload.content === "string") {
          text = extractCleanText(payload.content);
        }

        // Try turn_complete event structure: payload.turn.output_message.content
        if (
          !text &&
          payload?.turn &&
          typeof payload.turn === "object" &&
          payload.turn !== null
        ) {
          const turn = payload.turn as Record<string, unknown>;
          if (
            turn?.output_message &&
            typeof turn.output_message === "object" &&
            turn.output_message !== null
          ) {
            const outputMessage = turn.output_message as Record<
              string,
              unknown
            >;
            if (typeof outputMessage.content === "string") {
              text = extractCleanText(outputMessage.content);
            }
          }

          // Fallback to model_response in steps if no output_message
          if (
            !text &&
            turn?.steps &&
            Array.isArray(turn.steps) &&
            turn.steps.length > 0
          ) {
            for (const step of turn.steps) {
              if (step && typeof step === "object" && step !== null) {
                const stepObj = step as Record<string, unknown>;
                if (
                  stepObj?.model_response &&
                  typeof stepObj.model_response === "object" &&
                  stepObj.model_response !== null
                ) {
                  const modelResponse = stepObj.model_response as Record<
                    string,
                    unknown
                  >;
                  if (typeof modelResponse.content === "string") {
                    text = extractCleanText(modelResponse.content);
                    break;
                  }
                }
              }
            }
          }
        }

        // Try payload delta
        if (
          !text &&
          payload?.delta &&
          typeof payload.delta === "object" &&
          payload.delta !== null
        ) {
          const delta = payload.delta as Record<string, unknown>;
          if (typeof delta.text === "string") {
            text = extractCleanText(delta.text);
          }
        }
      }

      // Try event delta
      if (
        !text &&
        event?.delta &&
        typeof event.delta === "object" &&
        event.delta !== null
      ) {
        const delta = event.delta as Record<string, unknown>;
        if (typeof delta.text === "string") {
          text = extractCleanText(delta.text);
        }
        if (!text && typeof delta.content === "string") {
          text = extractCleanText(delta.content);
        }
      }
    }

    // Try choices structure (ChatML format)
    if (
      !text &&
      chunkObj?.choices &&
      Array.isArray(chunkObj.choices) &&
      chunkObj.choices.length > 0
    ) {
      const choice = chunkObj.choices[0] as Record<string, unknown>;
      if (
        choice?.delta &&
        typeof choice.delta === "object" &&
        choice.delta !== null
      ) {
        const delta = choice.delta as Record<string, unknown>;
        if (typeof delta.content === "string") {
          text = extractCleanText(delta.content);
        }
      }
    }

    // Try direct string content
    if (!text && typeof chunk === "string") {
      text = extractCleanText(chunk);
    }

    return { text, isToolCall: false };
  };
}

describe("Chunk Processor", () => {
  const processChunk = createProcessChunk();

  describe("Real Event Structures", () => {
    test("handles turn_complete event with cancellation policy response", () => {
      const chunk = {
        event: {
          payload: {
            event_type: "turn_complete",
            turn: {
              turn_id: "50a2d6b7-49ed-4d1e-b1c2-6d68b3f726db",
              session_id: "e7f62b8e-518c-4450-82df-e65fe49f27a3",
              input_messages: [
                {
                  role: "user",
                  content: "nice, what's the cancellation policy?",
                  context: null,
                },
              ],
              steps: [
                {
                  turn_id: "50a2d6b7-49ed-4d1e-b1c2-6d68b3f726db",
                  step_id: "54074310-af42-414c-9ffe-fba5b2ead0ad",
                  started_at: "2025-08-27T18:15:25.870703Z",
                  completed_at: "2025-08-27T18:15:51.288993Z",
                  step_type: "inference",
                  model_response: {
                    role: "assistant",
                    content:
                      "According to the search results, the cancellation policy for Red Hat Summit is as follows:\n\n* Cancellations must be received by 5 PM EDT on April 18, 2025 for a 50% refund of the registration fee.\n* No refunds will be given for cancellations received after 5 PM EDT on April 18, 2025.\n* Cancellation of travel reservations and hotel reservations are the responsibility of the registrant.",
                    stop_reason: "end_of_turn",
                    tool_calls: [],
                  },
                },
              ],
              output_message: {
                role: "assistant",
                content:
                  "According to the search results, the cancellation policy for Red Hat Summit is as follows:\n\n* Cancellations must be received by 5 PM EDT on April 18, 2025 for a 50% refund of the registration fee.\n* No refunds will be given for cancellations received after 5 PM EDT on April 18, 2025.\n* Cancellation of travel reservations and hotel reservations are the responsibility of the registrant.",
                stop_reason: "end_of_turn",
                tool_calls: [],
              },
              output_attachments: [],
              started_at: "2025-08-27T18:15:25.868548Z",
              completed_at: "2025-08-27T18:15:51.289262Z",
            },
          },
        },
      };

      const result = processChunk(chunk);
      expect(result.isToolCall).toBe(false);
      expect(result.text).toContain(
        "According to the search results, the cancellation policy for Red Hat Summit is as follows:"
      );
      expect(result.text).toContain("5 PM EDT on April 18, 2025");
    });

    test("handles turn_complete event with address response", () => {
      const chunk = {
        event: {
          payload: {
            event_type: "turn_complete",
            turn: {
              turn_id: "2f4a1520-8ecc-4cb7-bb7b-886939e042b0",
              session_id: "e7f62b8e-518c-4450-82df-e65fe49f27a3",
              input_messages: [
                {
                  role: "user",
                  content: "what's francisco's address",
                  context: null,
                },
              ],
              steps: [
                {
                  turn_id: "2f4a1520-8ecc-4cb7-bb7b-886939e042b0",
                  step_id: "c13dd277-1acb-4419-8fbf-d5e2f45392ea",
                  started_at: "2025-08-27T18:14:52.558761Z",
                  completed_at: "2025-08-27T18:15:11.306032Z",
                  step_type: "inference",
                  model_response: {
                    role: "assistant",
                    content:
                      "Francisco Arceo's address is:\n\nRed Hat\nUnited States\n17 Primrose Ln \nBasking Ridge New Jersey 07920",
                    stop_reason: "end_of_turn",
                    tool_calls: [],
                  },
                },
              ],
              output_message: {
                role: "assistant",
                content:
                  "Francisco Arceo's address is:\n\nRed Hat\nUnited States\n17 Primrose Ln \nBasking Ridge New Jersey 07920",
                stop_reason: "end_of_turn",
                tool_calls: [],
              },
              output_attachments: [],
              started_at: "2025-08-27T18:14:52.553707Z",
              completed_at: "2025-08-27T18:15:11.306729Z",
            },
          },
        },
      };

      const result = processChunk(chunk);
      expect(result.isToolCall).toBe(false);
      expect(result.text).toContain("Francisco Arceo's address is:");
      expect(result.text).toContain("17 Primrose Ln");
      expect(result.text).toContain("Basking Ridge New Jersey 07920");
    });

    test("handles turn_complete event with ticket cost response", () => {
      const chunk = {
        event: {
          payload: {
            event_type: "turn_complete",
            turn: {
              turn_id: "7ef244a3-efee-42ca-a9c8-942865251002",
              session_id: "e7f62b8e-518c-4450-82df-e65fe49f27a3",
              input_messages: [
                {
                  role: "user",
                  content: "what was the ticket cost for summit?",
                  context: null,
                },
              ],
              steps: [
                {
                  turn_id: "7ef244a3-efee-42ca-a9c8-942865251002",
                  step_id: "7651dda0-315a-472d-b1c1-3c2725f55bc5",
                  started_at: "2025-08-27T18:14:21.710611Z",
                  completed_at: "2025-08-27T18:14:39.706452Z",
                  step_type: "inference",
                  model_response: {
                    role: "assistant",
                    content:
                      "The ticket cost for the Red Hat Summit was $999.00 for a conference pass.",
                    stop_reason: "end_of_turn",
                    tool_calls: [],
                  },
                },
              ],
              output_message: {
                role: "assistant",
                content:
                  "The ticket cost for the Red Hat Summit was $999.00 for a conference pass.",
                stop_reason: "end_of_turn",
                tool_calls: [],
              },
              output_attachments: [],
              started_at: "2025-08-27T18:14:21.705289Z",
              completed_at: "2025-08-27T18:14:39.706752Z",
            },
          },
        },
      };

      const result = processChunk(chunk);
      expect(result.isToolCall).toBe(false);
      expect(result.text).toBe(
        "The ticket cost for the Red Hat Summit was $999.00 for a conference pass."
      );
    });
  });

  describe("Function Call Detection", () => {
    test("detects function calls in direct string chunks", () => {
      const chunk =
        '{"type": "function", "name": "knowledge_search", "parameters": {"query": "test"}}';
      const result = processChunk(chunk);
      expect(result.isToolCall).toBe(true);
      expect(result.text).toBe(null);
    });

    test("detects function calls in event payload content", () => {
      const chunk = {
        event: {
          payload: {
            content:
              '{"type": "function", "name": "knowledge_search", "parameters": {"query": "test"}}',
          },
        },
      };
      const result = processChunk(chunk);
      expect(result.isToolCall).toBe(true);
      expect(result.text).toBe(null);
    });

    test("detects tool_calls in delta structure", () => {
      const chunk = {
        delta: {
          tool_calls: [{ function: { name: "knowledge_search" } }],
        },
      };
      const result = processChunk(chunk);
      expect(result.isToolCall).toBe(true);
      expect(result.text).toBe(null);
    });

    test("detects function call in mixed content but skips it", () => {
      const chunk =
        '{"type": "function", "name": "knowledge_search", "parameters": {"query": "test"}} Based on the search results, here is your answer.';
      const result = processChunk(chunk);
      // This is detected as a tool call and skipped entirely - the implementation prioritizes safety
      expect(result.isToolCall).toBe(true);
      expect(result.text).toBe(null);
    });
  });

  describe("Text Extraction", () => {
    test("extracts text from direct string chunks", () => {
      const chunk = "Hello, this is a normal response.";
      const result = processChunk(chunk);
      expect(result.isToolCall).toBe(false);
      expect(result.text).toBe("Hello, this is a normal response.");
    });

    test("extracts text from delta structure", () => {
      const chunk = {
        delta: {
          text: "Hello, this is a normal response.",
        },
      };
      const result = processChunk(chunk);
      expect(result.isToolCall).toBe(false);
      expect(result.text).toBe("Hello, this is a normal response.");
    });

    test("extracts text from choices structure", () => {
      const chunk = {
        choices: [
          {
            delta: {
              content: "Hello, this is a normal response.",
            },
          },
        ],
      };
      const result = processChunk(chunk);
      expect(result.isToolCall).toBe(false);
      expect(result.text).toBe("Hello, this is a normal response.");
    });

    test("prioritizes output_message over model_response in turn structure", () => {
      const chunk = {
        event: {
          payload: {
            turn: {
              steps: [
                {
                  model_response: {
                    content: "Model response content.",
                  },
                },
              ],
              output_message: {
                content: "Final output message content.",
              },
            },
          },
        },
      };
      const result = processChunk(chunk);
      expect(result.isToolCall).toBe(false);
      expect(result.text).toBe("Final output message content.");
    });

    test("falls back to model_response when no output_message", () => {
      const chunk = {
        event: {
          payload: {
            turn: {
              steps: [
                {
                  model_response: {
                    content: "This is from the model response.",
                  },
                },
              ],
            },
          },
        },
      };
      const result = processChunk(chunk);
      expect(result.isToolCall).toBe(false);
      expect(result.text).toBe("This is from the model response.");
    });
  });

  describe("Edge Cases", () => {
    test("handles empty chunks", () => {
      const result = processChunk("");
      expect(result.isToolCall).toBe(false);
      expect(result.text).toBe("");
    });

    test("handles null chunks", () => {
      const result = processChunk(null);
      expect(result.isToolCall).toBe(false);
      expect(result.text).toBe(null);
    });

    test("handles undefined chunks", () => {
      const result = processChunk(undefined);
      expect(result.isToolCall).toBe(false);
      expect(result.text).toBe(null);
    });

    test("handles chunks with no text content", () => {
      const chunk = {
        event: {
          metadata: {
            timestamp: "2024-01-01",
          },
        },
      };
      const result = processChunk(chunk);
      expect(result.isToolCall).toBe(false);
      expect(result.text).toBe(null);
    });

    test("handles malformed JSON in function calls gracefully", () => {
      const chunk =
        '{"type": "function", "name": "knowledge_search"} incomplete json';
      const result = processChunk(chunk);
      expect(result.isToolCall).toBe(true);
      expect(result.text).toBe(null);
    });
  });
});
});
|
||||||
|
|
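The tests above pin down the observable contract of processChunk without showing its body. A minimal sketch that would satisfy them might look like the following; the ChunkResult shape and the helper are assumptions for illustration, not the shipped implementation:

// Hypothetical sketch only; the real processChunk lives in the UI package.
interface ChunkResult {
  isToolCall: boolean;
  text: string | null;
}

function looksLikeToolCall(s: string): boolean {
  // Mirrors the detection heuristic exercised by the tests.
  return s.includes('"type": "function"');
}

function processChunk(chunk: unknown): ChunkResult {
  if (chunk === null || chunk === undefined) {
    return { isToolCall: false, text: null };
  }
  if (typeof chunk === "string") {
    // Function-call JSON, even mixed with prose, is skipped for safety.
    return looksLikeToolCall(chunk)
      ? { isToolCall: true, text: null }
      : { isToolCall: false, text: chunk };
  }
  const c = chunk as Record<string, any>;
  if (c.delta?.tool_calls?.length) {
    return { isToolCall: true, text: null };
  }
  // Extraction order matches the tests: delta, choices, then turn payloads,
  // with output_message taking priority over model_response.
  const text =
    c.delta?.text ??
    c.choices?.[0]?.delta?.content ??
    c.event?.payload?.turn?.output_message?.content ??
    c.event?.payload?.turn?.steps?.at(-1)?.model_response?.content ??
    c.event?.payload?.content ??
    null;
  if (typeof text === "string" && looksLikeToolCall(text)) {
    return { isToolCall: true, text: null };
  }
  return { isToolCall: false, text };
}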
@@ -31,6 +31,9 @@ const mockClient = {
   toolgroups: {
     list: jest.fn(),
   },
+  vectorDBs: {
+    list: jest.fn(),
+  },
 };

 jest.mock("@/hooks/use-auth-client", () => ({
@@ -164,7 +167,7 @@ describe("ChatPlaygroundPage", () => {
       session_name: "Test Session",
       started_at: new Date().toISOString(),
       turns: [],
-    }); // No turns by default
+    });
     mockClient.agents.retrieve.mockResolvedValue({
       agent_id: "test-agent",
       agent_config: {
@@ -417,7 +420,6 @@ describe("ChatPlaygroundPage", () => {
     });

     await waitFor(() => {
-      // first agent should be auto-selected
       expect(mockClient.agents.session.create).toHaveBeenCalledWith(
         "agent_123",
         { session_name: "Default Session" }
@@ -464,7 +466,7 @@ describe("ChatPlaygroundPage", () => {
     });
   });

-  test("hides delete button when only one agent exists", async () => {
+  test("shows delete button even when only one agent exists", async () => {
     mockClient.agents.list.mockResolvedValue({
       data: [mockAgents[0]],
     });
@@ -474,9 +476,7 @@ describe("ChatPlaygroundPage", () => {
     });

     await waitFor(() => {
-      expect(
-        screen.queryByTitle("Delete current agent")
-      ).not.toBeInTheDocument();
+      expect(screen.getByTitle("Delete current agent")).toBeInTheDocument();
     });
   });

@@ -505,7 +505,7 @@ describe("ChatPlaygroundPage", () => {
     await waitFor(() => {
       expect(mockClient.agents.delete).toHaveBeenCalledWith("agent_123");
       expect(global.confirm).toHaveBeenCalledWith(
-        "Are you sure you want to delete this agent? This action cannot be undone and will delete all associated sessions."
+        "Are you sure you want to delete this agent? This action cannot be undone and will delete the agent and all its sessions."
       );
     });

@@ -584,4 +584,207 @@ describe("ChatPlaygroundPage", () => {
       consoleSpy.mockRestore();
     });
   });
+
+  describe("RAG File Upload", () => {
+    let mockFileReader: {
+      readAsDataURL: jest.Mock;
+      readAsText: jest.Mock;
+      result: string | null;
+      onload: (() => void) | null;
+      onerror: (() => void) | null;
+    };
+    let mockRAGTool: {
+      insert: jest.Mock;
+    };
+
+    beforeEach(() => {
+      mockFileReader = {
+        readAsDataURL: jest.fn(),
+        readAsText: jest.fn(),
+        result: null,
+        onload: null,
+        onerror: null,
+      };
+      global.FileReader = jest.fn(() => mockFileReader);
+
+      mockRAGTool = {
+        insert: jest.fn().mockResolvedValue({}),
+      };
+      mockClient.toolRuntime = {
+        ragTool: mockRAGTool,
+      };
+    });
+
+    afterEach(() => {
+      jest.clearAllMocks();
+    });
+
+    test("handles text file upload", async () => {
+      new File(["Hello, world!"], "test.txt", {
+        type: "text/plain",
+      });
+
+      mockClient.agents.retrieve.mockResolvedValue({
+        agent_id: "test-agent",
+        agent_config: {
+          toolgroups: [
+            {
+              name: "builtin::rag/knowledge_search",
+              args: { vector_db_ids: ["test-vector-db"] },
+            },
+          ],
+        },
+      });
+
+      await act(async () => {
+        render(<ChatPlaygroundPage />);
+      });
+
+      await waitFor(() => {
+        expect(screen.getByTestId("chat-component")).toBeInTheDocument();
+      });
+
+      const chatComponent = screen.getByTestId("chat-component");
+      chatComponent.getAttribute("data-onragfileupload");
+
+      // this is a simplified test
+      expect(mockRAGTool.insert).not.toHaveBeenCalled();
+    });
+
+    test("handles PDF file upload with FileReader", async () => {
+      new File([new ArrayBuffer(1000)], "test.pdf", {
+        type: "application/pdf",
+      });
+
+      const mockDataURL = "data:application/pdf;base64,JVBERi0xLjQK";
+      mockFileReader.result = mockDataURL;
+
+      mockClient.agents.retrieve.mockResolvedValue({
+        agent_id: "test-agent",
+        agent_config: {
+          toolgroups: [
+            {
+              name: "builtin::rag/knowledge_search",
+              args: { vector_db_ids: ["test-vector-db"] },
+            },
+          ],
+        },
+      });
+
+      await act(async () => {
+        render(<ChatPlaygroundPage />);
+      });
+
+      await waitFor(() => {
+        expect(screen.getByTestId("chat-component")).toBeInTheDocument();
+      });
+
+      expect(global.FileReader).toBeDefined();
+    });
+
+    test("handles different file types correctly", () => {
+      const getContentType = (filename: string): string => {
+        const ext = filename.toLowerCase().split(".").pop();
+        switch (ext) {
+          case "pdf":
+            return "application/pdf";
+          case "txt":
+            return "text/plain";
+          case "md":
+            return "text/markdown";
+          case "html":
+            return "text/html";
+          case "csv":
+            return "text/csv";
+          case "json":
+            return "application/json";
+          case "docx":
+            return "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
+          case "doc":
+            return "application/msword";
+          default:
+            return "application/octet-stream";
+        }
+      };
+
+      expect(getContentType("test.pdf")).toBe("application/pdf");
+      expect(getContentType("test.txt")).toBe("text/plain");
+      expect(getContentType("test.md")).toBe("text/markdown");
+      expect(getContentType("test.html")).toBe("text/html");
+      expect(getContentType("test.csv")).toBe("text/csv");
+      expect(getContentType("test.json")).toBe("application/json");
+      expect(getContentType("test.docx")).toBe(
+        "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
+      );
+      expect(getContentType("test.doc")).toBe("application/msword");
+      expect(getContentType("test.unknown")).toBe("application/octet-stream");
+    });
+
+    test("determines text vs binary file types correctly", () => {
+      const isTextFile = (mimeType: string): boolean => {
+        return (
+          mimeType.startsWith("text/") ||
+          mimeType === "application/json" ||
+          mimeType === "text/markdown" ||
+          mimeType === "text/html" ||
+          mimeType === "text/csv"
+        );
+      };
+
+      expect(isTextFile("text/plain")).toBe(true);
+      expect(isTextFile("text/markdown")).toBe(true);
+      expect(isTextFile("text/html")).toBe(true);
+      expect(isTextFile("text/csv")).toBe(true);
+      expect(isTextFile("application/json")).toBe(true);
+
+      expect(isTextFile("application/pdf")).toBe(false);
+      expect(isTextFile("application/msword")).toBe(false);
+      expect(
+        isTextFile(
+          "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
+        )
+      ).toBe(false);
+      expect(isTextFile("application/octet-stream")).toBe(false);
+    });
+
+    test("handles FileReader error gracefully", async () => {
+      const pdfFile = new File([new ArrayBuffer(1000)], "test.pdf", {
+        type: "application/pdf",
+      });
+
+      mockFileReader.onerror = jest.fn();
+      const mockError = new Error("FileReader failed");
+
+      const fileReaderPromise = new Promise<string>((resolve, reject) => {
+        const reader = new FileReader();
+        reader.onload = () => resolve(reader.result as string);
+        reader.onerror = () => reject(reader.error || mockError);
+        reader.readAsDataURL(pdfFile);
+
+        setTimeout(() => {
+          reader.onerror?.(new ProgressEvent("error"));
+        }, 0);
+      });
+
+      await expect(fileReaderPromise).rejects.toBeDefined();
+    });
+
+    test("handles large file upload with FileReader approach", () => {
+      // create a large file
+      const largeFile = new File(
+        [new ArrayBuffer(10 * 1024 * 1024)],
+        "large.pdf",
+        {
+          type: "application/pdf",
+        }
+      );
+
+      expect(largeFile.size).toBe(10 * 1024 * 1024); // 10MB
+
+      expect(global.FileReader).toBeDefined();
+
+      const reader = new FileReader();
+      expect(reader.readAsDataURL).toBeDefined();
+    });
+  });
 });
File diff suppressed because it is too large
@@ -35,6 +35,7 @@ interface ChatPropsBase {
   ) => void;
   setMessages?: (messages: Message[]) => void;
   transcribeAudio?: (blob: Blob) => Promise<string>;
+  onRAGFileUpload?: (file: File) => Promise<void>;
 }

 interface ChatPropsWithoutSuggestions extends ChatPropsBase {
@@ -62,6 +63,7 @@ export function Chat({
   onRateResponse,
   setMessages,
   transcribeAudio,
+  onRAGFileUpload,
 }: ChatProps) {
   const lastMessage = messages.at(-1);
   const isEmpty = messages.length === 0;
@@ -226,16 +228,17 @@ export function Chat({
         isPending={isGenerating || isTyping}
         handleSubmit={handleSubmit}
       >
-        {({ files, setFiles }) => (
+        {() => (
           <MessageInput
             value={input}
             onChange={handleInputChange}
-            allowAttachments
-            files={files}
-            setFiles={setFiles}
+            allowAttachments={true}
+            files={null}
+            setFiles={() => {}}
             stop={handleStop}
             isGenerating={isGenerating}
             transcribeAudio={transcribeAudio}
+            onRAGFileUpload={onRAGFileUpload}
           />
         )}
       </ChatForm>
@@ -14,6 +14,7 @@ import { Card } from "@/components/ui/card";
 import { Trash2 } from "lucide-react";
 import type { Message } from "@/components/chat-playground/chat-message";
 import { useAuthClient } from "@/hooks/use-auth-client";
+import { cleanMessageContent } from "@/lib/message-content-utils";
 import type {
   Session,
   SessionCreateParams,
@@ -219,10 +220,7 @@ export function Conversations({
         messages.push({
           id: `${turn.turn_id}-assistant-${messages.length}`,
           role: "assistant",
-          content:
-            typeof turn.output_message.content === "string"
-              ? turn.output_message.content
-              : JSON.stringify(turn.output_message.content),
+          content: cleanMessageContent(turn.output_message.content),
           createdAt: new Date(
             turn.completed_at || turn.started_at || Date.now()
           ),
@@ -271,7 +269,7 @@ export function Conversations({
   );

   const deleteSession = async (sessionId: string) => {
-    if (sessions.length <= 1 || !selectedAgentId) {
+    if (!selectedAgentId) {
       return;
     }

@@ -324,7 +322,6 @@ export function Conversations({
     }
   }, [currentSession]);

-  // Don't render if no agent is selected
   if (!selectedAgentId) {
     return null;
   }

@@ -357,7 +354,7 @@ export function Conversations({
           + New
         </Button>

-        {currentSession && sessions.length > 1 && (
+        {currentSession && (
           <Button
             onClick={() => deleteSession(currentSession.id)}
             variant="outline"
@@ -21,6 +21,7 @@ interface MessageInputBaseProps
   isGenerating: boolean;
   enableInterrupt?: boolean;
   transcribeAudio?: (blob: Blob) => Promise<string>;
+  onRAGFileUpload?: (file: File) => Promise<void>;
 }

 interface MessageInputWithoutAttachmentProps extends MessageInputBaseProps {
@@ -213,8 +214,13 @@ export function MessageInput({
           className
         )}
         {...(props.allowAttachments
-          ? omit(props, ["allowAttachments", "files", "setFiles"])
-          : omit(props, ["allowAttachments"]))}
+          ? omit(props, [
+              "allowAttachments",
+              "files",
+              "setFiles",
+              "onRAGFileUpload",
+            ])
+          : omit(props, ["allowAttachments", "onRAGFileUpload"]))}
       />

       {props.allowAttachments && (
@@ -254,11 +260,19 @@ export function MessageInput({
             size="icon"
             variant="outline"
             className="h-8 w-8"
-            aria-label="Attach a file"
-            disabled={true}
+            aria-label="Upload file to RAG"
+            disabled={false}
             onClick={async () => {
-              const files = await showFileUploadDialog();
-              addFiles(files);
+              const input = document.createElement("input");
+              input.type = "file";
+              input.accept = ".pdf,.txt,.md,.html,.csv,.json";
+              input.onchange = async e => {
+                const file = (e.target as HTMLInputElement).files?.[0];
+                if (file && props.onRAGFileUpload) {
+                  await props.onRAGFileUpload(file);
+                }
+              };
+              input.click();
             }}
           >
             <Paperclip className="h-4 w-4" />
@@ -337,28 +351,6 @@ function FileUploadOverlay({ isDragging }: FileUploadOverlayProps) {
   );
 }

-function showFileUploadDialog() {
-  const input = document.createElement("input");
-
-  input.type = "file";
-  input.multiple = true;
-  input.accept = "*/*";
-  input.click();
-
-  return new Promise<File[] | null>(resolve => {
-    input.onchange = e => {
-      const files = (e.currentTarget as HTMLInputElement).files;
-
-      if (files) {
-        resolve(Array.from(files));
-        return;
-      }
-
-      resolve(null);
-    };
-  });
-}
-
 function TranscribingOverlay() {
   return (
     <motion.div
243
llama_stack/ui/components/chat-playground/vector-db-creator.tsx
Normal file

@@ -0,0 +1,243 @@
"use client";

import { useState, useEffect } from "react";
import { Button } from "@/components/ui/button";
import { Input } from "@/components/ui/input";
import { Card } from "@/components/ui/card";
import {
  Select,
  SelectContent,
  SelectItem,
  SelectTrigger,
  SelectValue,
} from "@/components/ui/select";
import { useAuthClient } from "@/hooks/use-auth-client";
import type { Model } from "llama-stack-client/resources/models";

interface VectorDBCreatorProps {
  models: Model[];
  onVectorDBCreated?: (vectorDbId: string) => void;
  onCancel?: () => void;
}

interface VectorDBProvider {
  api: string;
  provider_id: string;
  provider_type: string;
}

export function VectorDBCreator({
  models,
  onVectorDBCreated,
  onCancel,
}: VectorDBCreatorProps) {
  const [vectorDbName, setVectorDbName] = useState("");
  const [selectedEmbeddingModel, setSelectedEmbeddingModel] = useState("");
  const [selectedProvider, setSelectedProvider] = useState("faiss");
  const [availableProviders, setAvailableProviders] = useState<
    VectorDBProvider[]
  >([]);
  const [isCreating, setIsCreating] = useState(false);
  const [isLoadingProviders, setIsLoadingProviders] = useState(false);
  const [error, setError] = useState<string | null>(null);
  const client = useAuthClient();

  const embeddingModels = models.filter(
    model => model.model_type === "embedding"
  );

  useEffect(() => {
    const fetchProviders = async () => {
      setIsLoadingProviders(true);
      try {
        const providersResponse = await client.providers.list();

        const vectorIoProviders = providersResponse.filter(
          (provider: VectorDBProvider) => provider.api === "vector_io"
        );

        setAvailableProviders(vectorIoProviders);

        if (vectorIoProviders.length > 0) {
          const faissProvider = vectorIoProviders.find(
            (p: VectorDBProvider) => p.provider_id === "faiss"
          );
          setSelectedProvider(
            faissProvider?.provider_id || vectorIoProviders[0].provider_id
          );
        }
      } catch (err) {
        console.error("Error fetching providers:", err);
        setAvailableProviders([
          {
            api: "vector_io",
            provider_id: "faiss",
            provider_type: "inline::faiss",
          },
        ]);
      } finally {
        setIsLoadingProviders(false);
      }
    };

    fetchProviders();
  }, [client]);

  const handleCreate = async () => {
    if (!vectorDbName.trim() || !selectedEmbeddingModel) {
      setError("Please provide a name and select an embedding model");
      return;
    }

    setIsCreating(true);
    setError(null);

    try {
      const embeddingModel = embeddingModels.find(
        m => m.identifier === selectedEmbeddingModel
      );

      if (!embeddingModel) {
        throw new Error("Selected embedding model not found");
      }

      const embeddingDimension = embeddingModel.metadata
        ?.embedding_dimension as number;

      if (!embeddingDimension) {
        throw new Error("Embedding dimension not available for selected model");
      }

      const vectorDbId = vectorDbName.trim() || `vector_db_${Date.now()}`;

      const response = await client.vectorDBs.register({
        vector_db_id: vectorDbId,
        embedding_model: selectedEmbeddingModel,
        embedding_dimension: embeddingDimension,
        provider_id: selectedProvider,
      });

      onVectorDBCreated?.(response.identifier || vectorDbId);
    } catch (err) {
      console.error("Error creating vector DB:", err);
      setError(
        err instanceof Error ? err.message : "Failed to create vector DB"
      );
    } finally {
      setIsCreating(false);
    }
  };

  return (
    <Card className="p-6 space-y-4">
      <h3 className="text-lg font-semibold">Create Vector Database</h3>

      <div className="space-y-4">
        <div>
          <label className="text-sm font-medium block mb-2">
            Vector DB Name
          </label>
          <Input
            value={vectorDbName}
            onChange={e => setVectorDbName(e.target.value)}
            placeholder="My Vector Database"
          />
        </div>

        <div>
          <label className="text-sm font-medium block mb-2">
            Embedding Model
          </label>
          <Select
            value={selectedEmbeddingModel}
            onValueChange={setSelectedEmbeddingModel}
          >
            <SelectTrigger>
              <SelectValue placeholder="Select Embedding Model" />
            </SelectTrigger>
            <SelectContent>
              {embeddingModels.map(model => (
                <SelectItem key={model.identifier} value={model.identifier}>
                  {model.identifier}
                </SelectItem>
              ))}
            </SelectContent>
          </Select>
          {selectedEmbeddingModel && (
            <p className="text-xs text-muted-foreground mt-1">
              Dimension:{" "}
              {embeddingModels.find(
                m => m.identifier === selectedEmbeddingModel
              )?.metadata?.embedding_dimension || "Unknown"}
            </p>
          )}
        </div>

        <div>
          <label className="text-sm font-medium block mb-2">
            Vector Database Provider
          </label>
          <Select
            value={selectedProvider}
            onValueChange={setSelectedProvider}
            disabled={isLoadingProviders}
          >
            <SelectTrigger>
              <SelectValue
                placeholder={
                  isLoadingProviders
                    ? "Loading providers..."
                    : "Select Provider"
                }
              />
            </SelectTrigger>
            <SelectContent>
              {availableProviders.map(provider => (
                <SelectItem
                  key={provider.provider_id}
                  value={provider.provider_id}
                >
                  {provider.provider_id}
                </SelectItem>
              ))}
            </SelectContent>
          </Select>
          {selectedProvider && (
            <p className="text-xs text-muted-foreground mt-1">
              Selected provider: {selectedProvider}
            </p>
          )}
        </div>

        {error && (
          <div className="text-destructive text-sm bg-destructive/10 p-2 rounded">
            {error}
          </div>
        )}

        <div className="flex gap-2 pt-2">
          <Button
            onClick={handleCreate}
            disabled={
              isCreating || !vectorDbName.trim() || !selectedEmbeddingModel
            }
            className="flex-1"
          >
            {isCreating ? "Creating..." : "Create Vector DB"}
          </Button>
          {onCancel && (
            <Button variant="outline" onClick={onCancel} className="flex-1">
              Cancel
            </Button>
          )}
        </div>
      </div>

      <div className="text-xs text-muted-foreground bg-muted/50 p-3 rounded">
        <strong>Note:</strong> This will create a new vector database that can
        be used with RAG tools. After creation, you'll be able to upload
        documents and use it for knowledge search in your agent conversations.
      </div>
    </Card>
  );
}
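A sketch of how the new component might be mounted by a parent in the playground; the handler and state setters here are assumed names, since the actual integration lives in the page whose diff is suppressed above:

// Hypothetical parent usage of VectorDBCreator (names are assumptions).
<VectorDBCreator
  models={models}
  onVectorDBCreated={vectorDbId => {
    // The callback receives the server-assigned vector store ID ("vs_..."),
    // not the display name the user typed.
    setSelectedVectorDbId(vectorDbId);
    setShowCreator(false);
  }}
  onCancel={() => setShowCreator(false)}
/>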
51
llama_stack/ui/lib/message-content-utils.ts
Normal file

@@ -0,0 +1,51 @@
// check if content contains function call JSON
export const containsToolCall = (content: string): boolean => {
  return (
    content.includes('"type": "function"') ||
    content.includes('"name": "knowledge_search"') ||
    content.includes('"parameters":') ||
    !!content.match(/\{"type":\s*"function".*?\}/)
  );
};

export const extractCleanText = (content: string): string | null => {
  if (containsToolCall(content)) {
    try {
      // parse and extract non-function call parts
      const jsonMatch = content.match(/\{"type":\s*"function"[^}]*\}[^}]*\}/);
      if (jsonMatch) {
        const jsonPart = jsonMatch[0];
        const parsedJson = JSON.parse(jsonPart);

        // if function call, extract text after JSON
        if (parsedJson.type === "function") {
          const textAfterJson = content
            .substring(content.indexOf(jsonPart) + jsonPart.length)
            .trim();
          return textAfterJson || null;
        }
      }
      return null;
    } catch {
      return null;
    }
  }
  return content;
};

// removes function call JSON handling different content types
export const cleanMessageContent = (
  content: string | unknown[] | unknown
): string => {
  if (typeof content === "string") {
    const cleaned = extractCleanText(content);
    return cleaned || "";
  } else if (Array.isArray(content)) {
    return content
      .filter((item: { type: string }) => item.type === "text")
      .map((item: { text: string }) => item.text)
      .join("");
  } else {
    return JSON.stringify(content);
  }
};
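To make the intended behavior concrete, here is how these helpers would act on two assumed inputs (illustrative, not taken from the test suite):

// cleanMessageContent strips a leading function-call JSON and keeps the prose:
cleanMessageContent(
  '{"type": "function", "name": "knowledge_search", "parameters": {"query": "x"}} The answer is 42.'
);
// => "The answer is 42."

// For array content it keeps only the text items:
cleanMessageContent([
  { type: "text", text: "Hello" },
  { type: "image", url: "https://example.com/a.png" },
]);
// => "Hello"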
83
llama_stack/ui/package-lock.json
generated

@@ -18,7 +18,7 @@
     "class-variance-authority": "^0.7.1",
     "clsx": "^2.1.1",
     "framer-motion": "^11.18.2",
-    "llama-stack-client": "^0.2.18",
+    "llama-stack-client": "^0.2.19",
     "lucide-react": "^0.510.0",
     "next": "15.3.3",
     "next-auth": "^4.24.11",
@@ -27,7 +27,7 @@
     "react-dom": "^19.0.0",
     "react-markdown": "^10.1.0",
     "remark-gfm": "^4.0.1",
-    "remeda": "^2.26.1",
+    "remeda": "^2.30.0",
     "shiki": "^1.29.2",
     "sonner": "^2.0.6",
     "tailwind-merge": "^3.3.1"
@@ -35,8 +35,8 @@
   "devDependencies": {
     "@eslint/eslintrc": "^3",
     "@tailwindcss/postcss": "^4",
-    "@testing-library/dom": "^10.4.0",
-    "@testing-library/jest-dom": "^6.6.3",
+    "@testing-library/dom": "^10.4.1",
+    "@testing-library/jest-dom": "^6.8.0",
     "@testing-library/react": "^16.3.0",
     "@types/jest": "^29.5.14",
     "@types/node": "^20",
@@ -45,7 +45,7 @@
     "eslint": "^9",
     "eslint-config-next": "15.3.2",
     "eslint-config-prettier": "^10.1.8",
-    "eslint-plugin-prettier": "^5.4.0",
+    "eslint-plugin-prettier": "^5.5.4",
     "jest": "^29.7.0",
     "jest-environment-jsdom": "^29.7.0",
     "prettier": "3.5.3",
@@ -2041,9 +2041,9 @@
       }
     },
     "node_modules/@pkgr/core": {
-      "version": "0.2.4",
-      "resolved": "https://registry.npmjs.org/@pkgr/core/-/core-0.2.4.tgz",
-      "integrity": "sha512-ROFF39F6ZrnzSUEmQQZUar0Jt4xVoP9WnDRdWwF4NNcXs3xBTLgBUDoOwW141y1jP+S8nahIbdxbFC7IShw9Iw==",
+      "version": "0.2.9",
+      "resolved": "https://registry.npmjs.org/@pkgr/core/-/core-0.2.9.tgz",
+      "integrity": "sha512-QNqXyfVS2wm9hweSYD2O7F0G06uurj9kZ96TRQE5Y9hU7+tgdZwIkbAKc5Ocy1HxEY2kuDQa6cQ1WRs/O5LFKA==",
       "dev": true,
       "license": "MIT",
       "engines": {
@@ -3567,9 +3567,9 @@
       }
     },
     "node_modules/@testing-library/dom": {
-      "version": "10.4.0",
-      "resolved": "https://registry.npmjs.org/@testing-library/dom/-/dom-10.4.0.tgz",
-      "integrity": "sha512-pemlzrSESWbdAloYml3bAJMEfNh1Z7EduzqPKprCH5S341frlpYnUEW0H72dLxa6IsYr+mPno20GiSm+h9dEdQ==",
+      "version": "10.4.1",
+      "resolved": "https://registry.npmjs.org/@testing-library/dom/-/dom-10.4.1.tgz",
+      "integrity": "sha512-o4PXJQidqJl82ckFaXUeoAW+XysPLauYI43Abki5hABd853iMhitooc6znOnczgbTYmEP6U6/y1ZyKAIsvMKGg==",
       "dev": true,
       "license": "MIT",
       "dependencies": {
@@ -3577,9 +3577,9 @@
         "@babel/runtime": "^7.12.5",
         "@types/aria-query": "^5.0.1",
         "aria-query": "5.3.0",
-        "chalk": "^4.1.0",
         "dom-accessibility-api": "^0.5.9",
         "lz-string": "^1.5.0",
+        "picocolors": "1.1.1",
         "pretty-format": "^27.0.2"
       },
       "engines": {
@@ -3597,18 +3597,17 @@
       }
     },
     "node_modules/@testing-library/jest-dom": {
-      "version": "6.6.3",
-      "resolved": "https://registry.npmjs.org/@testing-library/jest-dom/-/jest-dom-6.6.3.tgz",
-      "integrity": "sha512-IteBhl4XqYNkM54f4ejhLRJiZNqcSCoXUOG2CPK7qbD322KjQozM4kHQOfkG2oln9b9HTYqs+Sae8vBATubxxA==",
+      "version": "6.8.0",
+      "resolved": "https://registry.npmjs.org/@testing-library/jest-dom/-/jest-dom-6.8.0.tgz",
+      "integrity": "sha512-WgXcWzVM6idy5JaftTVC8Vs83NKRmGJz4Hqs4oyOuO2J4r/y79vvKZsb+CaGyCSEbUPI6OsewfPd0G1A0/TUZQ==",
       "dev": true,
       "license": "MIT",
       "dependencies": {
         "@adobe/css-tools": "^4.4.0",
         "aria-query": "^5.0.0",
-        "chalk": "^3.0.0",
         "css.escape": "^1.5.1",
         "dom-accessibility-api": "^0.6.3",
-        "lodash": "^4.17.21",
+        "picocolors": "^1.1.1",
         "redent": "^3.0.0"
       },
       "engines": {
@@ -3617,20 +3616,6 @@
         "yarn": ">=1"
       }
     },
-    "node_modules/@testing-library/jest-dom/node_modules/chalk": {
-      "version": "3.0.0",
-      "resolved": "https://registry.npmjs.org/chalk/-/chalk-3.0.0.tgz",
-      "integrity": "sha512-4D3B6Wf41KOYRFdszmDqMCGq5VV/uMAB273JILmO+3jAlh8X4qDtdtgCR3fxtbLEMzSx22QdhnDcJvu2u1fVwg==",
-      "dev": true,
-      "license": "MIT",
-      "dependencies": {
-        "ansi-styles": "^4.1.0",
-        "supports-color": "^7.1.0"
-      },
-      "engines": {
-        "node": ">=8"
-      }
-    },
     "node_modules/@testing-library/jest-dom/node_modules/dom-accessibility-api": {
       "version": "0.6.3",
       "resolved": "https://registry.npmjs.org/dom-accessibility-api/-/dom-accessibility-api-0.6.3.tgz",
@@ -6661,14 +6646,14 @@
       }
     },
     "node_modules/eslint-plugin-prettier": {
-      "version": "5.4.0",
-      "resolved": "https://registry.npmjs.org/eslint-plugin-prettier/-/eslint-plugin-prettier-5.4.0.tgz",
-      "integrity": "sha512-BvQOvUhkVQM1i63iMETK9Hjud9QhqBnbtT1Zc642p9ynzBuCe5pybkOnvqZIBypXmMlsGcnU4HZ8sCTPfpAexA==",
+      "version": "5.5.4",
+      "resolved": "https://registry.npmjs.org/eslint-plugin-prettier/-/eslint-plugin-prettier-5.5.4.tgz",
+      "integrity": "sha512-swNtI95SToIz05YINMA6Ox5R057IMAmWZ26GqPxusAp1TZzj+IdY9tXNWWD3vkF/wEqydCONcwjTFpxybBqZsg==",
       "dev": true,
       "license": "MIT",
       "dependencies": {
         "prettier-linter-helpers": "^1.0.0",
-        "synckit": "^0.11.0"
+        "synckit": "^0.11.7"
       },
       "engines": {
         "node": "^14.18.0 || >=16.0.0"
@@ -10021,9 +10006,9 @@
       "license": "MIT"
     },
     "node_modules/llama-stack-client": {
-      "version": "0.2.18",
-      "resolved": "https://registry.npmjs.org/llama-stack-client/-/llama-stack-client-0.2.18.tgz",
-      "integrity": "sha512-k+xQOz/TIU0cINP4Aih8q6xs7f/6qs0fLDMXTTKQr5C0F1jtCjRiwsas7bTsDfpKfYhg/7Xy/wPw/uZgi6aIVg==",
+      "version": "0.2.19",
+      "resolved": "https://registry.npmjs.org/llama-stack-client/-/llama-stack-client-0.2.19.tgz",
+      "integrity": "sha512-sDuAhUdEGlERZ3jlMUzPXcQTgMv/pGbDrPX0ifbE5S+gr7Q+7ohuQYrIXe+hXgIipFjq+y4b2c5laZ76tmAyEA==",
       "license": "MIT",
       "dependencies": {
         "@types/node": "^18.11.18",
@@ -10066,13 +10051,6 @@
         "url": "https://github.com/sponsors/sindresorhus"
       }
     },
-    "node_modules/lodash": {
-      "version": "4.17.21",
-      "resolved": "https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz",
-      "integrity": "sha512-v2kDEe57lecTulaDIuNTPy3Ry4gLGJ6Z1O3vE1krgXZNrsQ+LFTGHVxVjcXPs17LhbZVGedAJv8XZ1tvj5FvSg==",
-      "dev": true,
-      "license": "MIT"
-    },
     "node_modules/lodash.merge": {
       "version": "4.6.2",
       "resolved": "https://registry.npmjs.org/lodash.merge/-/lodash.merge-4.6.2.tgz",
@@ -12602,9 +12580,9 @@
       }
     },
     "node_modules/remeda": {
-      "version": "2.26.1",
-      "resolved": "https://registry.npmjs.org/remeda/-/remeda-2.26.1.tgz",
-      "integrity": "sha512-hpiLfhUwkJhiMS3Z7dRrygcRdkMRZASw5qUdNdi33x1/Y9y/J5q5TyLyf8btDoVLIcsg/4fzPdaGXDTbnl+ixw==",
+      "version": "2.30.0",
+      "resolved": "https://registry.npmjs.org/remeda/-/remeda-2.30.0.tgz",
+      "integrity": "sha512-TcRpI1ecqnMer3jHhFtMerGvHFCDlCHljUp0/9A4HxHOh5bSY3kP1l8nQDFMnWYJKl3MSarDNY1tb0Bs/bCmvw==",
       "license": "MIT",
       "dependencies": {
         "type-fest": "^4.41.0"
@@ -13567,14 +13545,13 @@
       "license": "MIT"
     },
     "node_modules/synckit": {
-      "version": "0.11.5",
-      "resolved": "https://registry.npmjs.org/synckit/-/synckit-0.11.5.tgz",
-      "integrity": "sha512-frqvfWyDA5VPVdrWfH24uM6SI/O8NLpVbIIJxb8t/a3YGsp4AW9CYgSKC0OaSEfexnp7Y1pVh2Y6IHO8ggGDmA==",
+      "version": "0.11.11",
+      "resolved": "https://registry.npmjs.org/synckit/-/synckit-0.11.11.tgz",
+      "integrity": "sha512-MeQTA1r0litLUf0Rp/iisCaL8761lKAZHaimlbGK4j0HysC4PLfqygQj9srcs0m2RdtDYnF8UuYyKpbjHYp7Jw==",
       "dev": true,
       "license": "MIT",
       "dependencies": {
-        "@pkgr/core": "^0.2.4",
-        "tslib": "^2.8.1"
+        "@pkgr/core": "^0.2.9"
       },
       "engines": {
         "node": "^14.18.0 || >=16.0.0"
@@ -23,7 +23,7 @@
     "class-variance-authority": "^0.7.1",
     "clsx": "^2.1.1",
     "framer-motion": "^11.18.2",
-    "llama-stack-client": "^0.2.18",
+    "llama-stack-client": "^0.2.19",
     "lucide-react": "^0.510.0",
     "next": "15.3.3",
    "next-auth": "^4.24.11",
@@ -32,7 +32,7 @@
     "react-dom": "^19.0.0",
     "react-markdown": "^10.1.0",
     "remark-gfm": "^4.0.1",
-    "remeda": "^2.26.1",
+    "remeda": "^2.30.0",
     "shiki": "^1.29.2",
     "sonner": "^2.0.6",
     "tailwind-merge": "^3.3.1"
@@ -40,8 +40,8 @@
   "devDependencies": {
     "@eslint/eslintrc": "^3",
     "@tailwindcss/postcss": "^4",
-    "@testing-library/dom": "^10.4.0",
-    "@testing-library/jest-dom": "^6.6.3",
+    "@testing-library/dom": "^10.4.1",
+    "@testing-library/jest-dom": "^6.8.0",
     "@testing-library/react": "^16.3.0",
     "@types/jest": "^29.5.14",
     "@types/node": "^20",
@@ -50,7 +50,7 @@
     "eslint": "^9",
     "eslint-config-next": "15.3.2",
     "eslint-config-prettier": "^10.1.8",
-    "eslint-plugin-prettier": "^5.4.0",
+    "eslint-plugin-prettier": "^5.5.4",
     "jest": "^29.7.0",
     "jest-environment-jsdom": "^29.7.0",
     "prettier": "3.5.3",
@@ -7,7 +7,7 @@ required-version = ">=0.7.0"

 [project]
 name = "llama_stack"
-version = "0.2.18"
+version = "0.2.19"
 authors = [{ name = "Meta Llama", email = "llama-oss@meta.com" }]
 description = "Llama Stack"
 readme = "README.md"
@@ -31,7 +31,7 @@ dependencies = [
     "huggingface-hub>=0.34.0,<1.0",
     "jinja2>=3.1.6",
     "jsonschema",
-    "llama-stack-client>=0.2.18",
+    "llama-stack-client>=0.2.19",
     "llama-api-client>=0.1.2",
     "openai>=1.99.6,<1.100.0",
     "prompt-toolkit",
@@ -56,7 +56,7 @@ dependencies = [
 ui = [
     "streamlit",
     "pandas",
-    "llama-stack-client>=0.2.18",
+    "llama-stack-client>=0.2.19",
     "streamlit-option-menu",
 ]
Binary file not shown.
@@ -47,34 +47,45 @@ def client_with_empty_registry(client_with_models):


 def test_vector_db_retrieve(client_with_empty_registry, embedding_model_id, embedding_dimension):
-    # Register a memory bank first
-    vector_db_id = "test_vector_db"
-    client_with_empty_registry.vector_dbs.register(
-        vector_db_id=vector_db_id,
+    vector_db_name = "test_vector_db"
+    register_response = client_with_empty_registry.vector_dbs.register(
+        vector_db_id=vector_db_name,
         embedding_model=embedding_model_id,
         embedding_dimension=embedding_dimension,
     )

+    actual_vector_db_id = register_response.identifier
+
     # Retrieve the memory bank and validate its properties
-    response = client_with_empty_registry.vector_dbs.retrieve(vector_db_id=vector_db_id)
+    response = client_with_empty_registry.vector_dbs.retrieve(vector_db_id=actual_vector_db_id)
     assert response is not None
-    assert response.identifier == vector_db_id
+    assert response.identifier == actual_vector_db_id
     assert response.embedding_model == embedding_model_id
-    assert response.provider_resource_id == vector_db_id
+    assert response.identifier.startswith("vs_")


 def test_vector_db_register(client_with_empty_registry, embedding_model_id, embedding_dimension):
-    vector_db_id = "test_vector_db"
-    client_with_empty_registry.vector_dbs.register(
-        vector_db_id=vector_db_id,
+    vector_db_name = "test_vector_db"
+    response = client_with_empty_registry.vector_dbs.register(
+        vector_db_id=vector_db_name,
         embedding_model=embedding_model_id,
         embedding_dimension=embedding_dimension,
     )

-    vector_dbs_after_register = [vector_db.identifier for vector_db in client_with_empty_registry.vector_dbs.list()]
-    assert vector_dbs_after_register == [vector_db_id]
+    actual_vector_db_id = response.identifier
+    assert actual_vector_db_id.startswith("vs_")
+    assert actual_vector_db_id != vector_db_name

-    client_with_empty_registry.vector_dbs.unregister(vector_db_id=vector_db_id)
+    vector_dbs_after_register = [vector_db.identifier for vector_db in client_with_empty_registry.vector_dbs.list()]
+    assert vector_dbs_after_register == [actual_vector_db_id]
+
+    vector_stores = client_with_empty_registry.vector_stores.list()
+    assert len(vector_stores.data) == 1
+    vector_store = vector_stores.data[0]
+    assert vector_store.id == actual_vector_db_id
+    assert vector_store.name == vector_db_name
+
+    client_with_empty_registry.vector_dbs.unregister(vector_db_id=actual_vector_db_id)

     vector_dbs = [vector_db.identifier for vector_db in client_with_empty_registry.vector_dbs.list()]
     assert len(vector_dbs) == 0
@@ -91,20 +102,22 @@ def test_vector_db_register(client_with_empty_registry, embedding_model_id, embe
     ],
 )
 def test_insert_chunks(client_with_empty_registry, embedding_model_id, embedding_dimension, sample_chunks, test_case):
-    vector_db_id = "test_vector_db"
-    client_with_empty_registry.vector_dbs.register(
-        vector_db_id=vector_db_id,
+    vector_db_name = "test_vector_db"
+    register_response = client_with_empty_registry.vector_dbs.register(
+        vector_db_id=vector_db_name,
         embedding_model=embedding_model_id,
         embedding_dimension=embedding_dimension,
     )

+    actual_vector_db_id = register_response.identifier
+
     client_with_empty_registry.vector_io.insert(
-        vector_db_id=vector_db_id,
+        vector_db_id=actual_vector_db_id,
         chunks=sample_chunks,
     )

     response = client_with_empty_registry.vector_io.query(
-        vector_db_id=vector_db_id,
+        vector_db_id=actual_vector_db_id,
         query="What is the capital of France?",
     )
     assert response is not None
@@ -113,7 +126,7 @@ def test_insert_chunks(client_with_empty_registry, embedding_model_id, embedding

     query, expected_doc_id = test_case
     response = client_with_empty_registry.vector_io.query(
-        vector_db_id=vector_db_id,
+        vector_db_id=actual_vector_db_id,
         query=query,
     )
     assert response is not None
@@ -128,13 +141,15 @@ def test_insert_chunks_with_precomputed_embeddings(client_with_empty_registry, e
         "remote::qdrant": {"score_threshold": -1.0},
         "inline::qdrant": {"score_threshold": -1.0},
     }
-    vector_db_id = "test_precomputed_embeddings_db"
-    client_with_empty_registry.vector_dbs.register(
-        vector_db_id=vector_db_id,
+    vector_db_name = "test_precomputed_embeddings_db"
+    register_response = client_with_empty_registry.vector_dbs.register(
+        vector_db_id=vector_db_name,
         embedding_model=embedding_model_id,
         embedding_dimension=embedding_dimension,
     )

+    actual_vector_db_id = register_response.identifier
+
     chunks_with_embeddings = [
         Chunk(
             content="This is a test chunk with precomputed embedding.",
@@ -144,13 +159,13 @@ def test_insert_chunks_with_precomputed_embeddings(client_with_empty_registry, e
     ]

     client_with_empty_registry.vector_io.insert(
-        vector_db_id=vector_db_id,
+        vector_db_id=actual_vector_db_id,
         chunks=chunks_with_embeddings,
     )

     provider = [p.provider_id for p in client_with_empty_registry.providers.list() if p.api == "vector_io"][0]
     response = client_with_empty_registry.vector_io.query(
-        vector_db_id=vector_db_id,
+        vector_db_id=actual_vector_db_id,
         query="precomputed embedding test",
         params=vector_io_provider_params_dict.get(provider, None),
     )
@@ -173,13 +188,15 @@ def test_query_returns_valid_object_when_identical_to_embedding_in_vdb(
         "remote::qdrant": {"score_threshold": 0.0},
         "inline::qdrant": {"score_threshold": 0.0},
     }
-    vector_db_id = "test_precomputed_embeddings_db"
-    client_with_empty_registry.vector_dbs.register(
-        vector_db_id=vector_db_id,
+    vector_db_name = "test_precomputed_embeddings_db"
+    register_response = client_with_empty_registry.vector_dbs.register(
+        vector_db_id=vector_db_name,
         embedding_model=embedding_model_id,
         embedding_dimension=embedding_dimension,
     )

+    actual_vector_db_id = register_response.identifier
+
     chunks_with_embeddings = [
         Chunk(
             content="duplicate",
@@ -189,13 +206,13 @@ def test_query_returns_valid_object_when_identical_to_embedding_in_vdb(
     ]

     client_with_empty_registry.vector_io.insert(
-        vector_db_id=vector_db_id,
+        vector_db_id=actual_vector_db_id,
         chunks=chunks_with_embeddings,
     )

     provider = [p.provider_id for p in client_with_empty_registry.providers.list() if p.api == "vector_io"][0]
     response = client_with_empty_registry.vector_io.query(
-        vector_db_id=vector_db_id,
+        vector_db_id=actual_vector_db_id,
         query="duplicate",
         params=vector_io_provider_params_dict.get(provider, None),
     )
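For client code, the practical consequence these tests pin down is that the registered identifier can no longer be assumed to equal the requested name. A minimal TypeScript sketch of the new flow; the model ID, dimension, documents variable, and the insert call shape (inferred from the mocked toolRuntime.ragTool in the UI tests above) are assumptions:

// Keep the server-assigned identifier; do not reuse the name you passed in.
const registered = await client.vectorDBs.register({
  vector_db_id: "my_docs", // now treated as a display name
  embedding_model: "all-MiniLM-L6-v2", // placeholder model ID
  embedding_dimension: 384, // placeholder dimension
});
const vectorDbId = registered.identifier; // e.g. "vs_1234..."

// Subsequent calls must use the assigned ID:
await client.toolRuntime.ragTool.insert({
  documents, // assumed to be prepared elsewhere
  vector_db_id: vectorDbId,
  chunk_size_in_tokens: 512,
});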
@ -146,6 +146,20 @@ class VectorDBImpl(Impl):
|
||||||
async def unregister_vector_db(self, vector_db_id: str):
|
async def unregister_vector_db(self, vector_db_id: str):
|
||||||
return vector_db_id
|
return vector_db_id
|
||||||
|
|
||||||
|
async def openai_create_vector_store(self, **kwargs):
|
||||||
|
import time
|
||||||
|
import uuid
|
||||||
|
|
||||||
|
from llama_stack.apis.vector_io.vector_io import VectorStoreFileCounts, VectorStoreObject
|
||||||
|
|
||||||
|
vector_store_id = kwargs.get("provider_vector_db_id") or f"vs_{uuid.uuid4()}"
|
||||||
|
return VectorStoreObject(
|
||||||
|
id=vector_store_id,
|
||||||
|
name=kwargs.get("name", vector_store_id),
|
||||||
|
created_at=int(time.time()),
|
||||||
|
file_counts=VectorStoreFileCounts(completed=0, cancelled=0, failed=0, in_progress=0, total=0),
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
async def test_models_routing_table(cached_disk_dist_registry):
|
async def test_models_routing_table(cached_disk_dist_registry):
|
||||||
table = ModelsRoutingTable({"test_provider": InferenceImpl()}, cached_disk_dist_registry, {})
|
table = ModelsRoutingTable({"test_provider": InferenceImpl()}, cached_disk_dist_registry, {})
|
||||||
|
|
@@ -247,17 +261,21 @@ async def test_vectordbs_routing_table(cached_disk_dist_registry):
     )

     # Register multiple vector databases and verify listing
-    await table.register_vector_db(vector_db_id="test-vectordb", embedding_model="test_provider/test-model")
-    await table.register_vector_db(vector_db_id="test-vectordb-2", embedding_model="test_provider/test-model")
+    vdb1 = await table.register_vector_db(vector_db_id="test-vectordb", embedding_model="test_provider/test-model")
+    vdb2 = await table.register_vector_db(vector_db_id="test-vectordb-2", embedding_model="test_provider/test-model")
     vector_dbs = await table.list_vector_dbs()

     assert len(vector_dbs.data) == 2
     vector_db_ids = {v.identifier for v in vector_dbs.data}
-    assert "test-vectordb" in vector_db_ids
-    assert "test-vectordb-2" in vector_db_ids
+    assert vdb1.identifier in vector_db_ids
+    assert vdb2.identifier in vector_db_ids

-    await table.unregister_vector_db(vector_db_id="test-vectordb")
-    await table.unregister_vector_db(vector_db_id="test-vectordb-2")
+    # Verify they have UUID-based identifiers
+    assert vdb1.identifier.startswith("vs_")
+    assert vdb2.identifier.startswith("vs_")
+
+    await table.unregister_vector_db(vector_db_id=vdb1.identifier)
+    await table.unregister_vector_db(vector_db_id=vdb2.identifier)

     vector_dbs = await table.list_vector_dbs()
     assert len(vector_dbs.data) == 0
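The `startswith("vs_")` assertions pin down the new identifier format. Both mock implementations in this diff generate IDs the same way; a minimal sketch of that rule (the helper name is ours, not the diff's):

```python
import uuid


def new_vector_store_id(provider_vector_db_id: str | None = None) -> str:
    # An explicitly provided provider ID wins; otherwise mint an OpenAI-style
    # "vs_"-prefixed UUID4, which is what the startswith("vs_") asserts rely on.
    return provider_vector_db_id or f"vs_{uuid.uuid4()}"


assert new_vector_store_id().startswith("vs_")
assert new_vector_store_id("vs_fixed") == "vs_fixed"
```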
@@ -7,6 +7,7 @@
 # Unit tests for the routing tables vector_dbs

 import time
+import uuid
 from unittest.mock import AsyncMock

 import pytest
@@ -34,6 +35,7 @@ from tests.unit.distribution.routers.test_routing_tables import Impl, InferenceImpl
 class VectorDBImpl(Impl):
     def __init__(self):
         super().__init__(Api.vector_io)
+        self.vector_stores = {}

     async def register_vector_db(self, vector_db: VectorDB):
         return vector_db
@@ -114,8 +116,35 @@ class VectorDBImpl(Impl):
     async def openai_delete_vector_store_file(self, vector_store_id, file_id):
         return VectorStoreFileDeleteResponse(id=file_id, deleted=True)

+    async def openai_create_vector_store(
+        self,
+        name=None,
+        embedding_model=None,
+        embedding_dimension=None,
+        provider_id=None,
+        provider_vector_db_id=None,
+        **kwargs,
+    ):
+        vector_store_id = provider_vector_db_id or f"vs_{uuid.uuid4()}"
+        vector_store = VectorStoreObject(
+            id=vector_store_id,
+            name=name or vector_store_id,
+            created_at=int(time.time()),
+            file_counts=VectorStoreFileCounts(completed=0, cancelled=0, failed=0, in_progress=0, total=0),
+        )
+        self.vector_stores[vector_store_id] = vector_store
+        return vector_store
+
+    async def openai_list_vector_stores(self, **kwargs):
+        from llama_stack.apis.vector_io.vector_io import VectorStoreListResponse
+
+        return VectorStoreListResponse(
+            data=list(self.vector_stores.values()), has_more=False, first_id=None, last_id=None
+        )
+

 async def test_vectordbs_routing_table(cached_disk_dist_registry):
+    n = 10
     table = VectorDBsRoutingTable({"test_provider": VectorDBImpl()}, cached_disk_dist_registry, {})
     await table.initialize()
@@ -129,22 +158,98 @@ async def test_vectordbs_routing_table(cached_disk_dist_registry):
     )

     # Register multiple vector databases and verify listing
-    await table.register_vector_db(vector_db_id="test-vectordb", embedding_model="test-model")
-    await table.register_vector_db(vector_db_id="test-vectordb-2", embedding_model="test-model")
+    vdb_dict = {}
+    for i in range(n):
+        vdb_dict[i] = await table.register_vector_db(vector_db_id=f"test-vectordb-{i}", embedding_model="test-model")

     vector_dbs = await table.list_vector_dbs()

-    assert len(vector_dbs.data) == 2
+    assert len(vector_dbs.data) == len(vdb_dict)
     vector_db_ids = {v.identifier for v in vector_dbs.data}
-    assert "test-vectordb" in vector_db_ids
-    assert "test-vectordb-2" in vector_db_ids
-
-    await table.unregister_vector_db(vector_db_id="test-vectordb")
-    await table.unregister_vector_db(vector_db_id="test-vectordb-2")
+    for k in vdb_dict:
+        assert vdb_dict[k].identifier in vector_db_ids
+    for k in vdb_dict:
+        await table.unregister_vector_db(vector_db_id=vdb_dict[k].identifier)

     vector_dbs = await table.list_vector_dbs()
     assert len(vector_dbs.data) == 0


+async def test_vector_db_and_vector_store_id_mapping(cached_disk_dist_registry):
+    n = 10
+    impl = VectorDBImpl()
+    table = VectorDBsRoutingTable({"test_provider": impl}, cached_disk_dist_registry, {})
+    await table.initialize()
+
+    m_table = ModelsRoutingTable({"test_provider": InferenceImpl()}, cached_disk_dist_registry, {})
+    await m_table.initialize()
+    await m_table.register_model(
+        model_id="test-model",
+        provider_id="test_provider",
+        metadata={"embedding_dimension": 128},
+        model_type=ModelType.embedding,
+    )
+
+    vdb_dict = {}
+    for i in range(n):
+        vdb_dict[i] = await table.register_vector_db(vector_db_id=f"test-vectordb-{i}", embedding_model="test-model")
+
+    vector_dbs = await table.list_vector_dbs()
+    vector_db_ids = {v.identifier for v in vector_dbs.data}
+
+    vector_stores = await impl.openai_list_vector_stores()
+    vector_store_ids = {v.id for v in vector_stores.data}
+
+    assert vector_db_ids == vector_store_ids, (
+        f"Vector DB IDs {vector_db_ids} don't match vector store IDs {vector_store_ids}"
+    )
+
+    for vector_store in vector_stores.data:
+        vector_db = await table.get_vector_db(vector_store.id)
+        assert vector_store.name == vector_db.vector_db_name, (
+            f"Vector store name {vector_store.name} doesn't match vector store ID {vector_store.id}"
+        )
+
+    for vector_db_id in vector_db_ids:
+        await table.unregister_vector_db(vector_db_id)
+
+    assert len((await table.list_vector_dbs()).data) == 0
+
+
+async def test_vector_db_id_becomes_vector_store_name(cached_disk_dist_registry):
+    impl = VectorDBImpl()
+    table = VectorDBsRoutingTable({"test_provider": impl}, cached_disk_dist_registry, {})
+    await table.initialize()
+
+    m_table = ModelsRoutingTable({"test_provider": InferenceImpl()}, cached_disk_dist_registry, {})
+    await m_table.initialize()
+    await m_table.register_model(
+        model_id="test-model",
+        provider_id="test_provider",
+        metadata={"embedding_dimension": 128},
+        model_type=ModelType.embedding,
+    )
+
+    user_provided_id = "my-custom-vector-db"
+    await table.register_vector_db(vector_db_id=user_provided_id, embedding_model="test-model")
+
+    vector_stores = await impl.openai_list_vector_stores()
+    assert len(vector_stores.data) == 1
+
+    vector_store = vector_stores.data[0]
+
+    assert vector_store.name == user_provided_id
+
+    assert vector_store.id.startswith("vs_")
+    assert vector_store.id != user_provided_id
+
+    vector_dbs = await table.list_vector_dbs()
+    assert len(vector_dbs.data) == 1
+    assert vector_dbs.data[0].identifier == vector_store.id
+
+    await table.unregister_vector_db(vector_store.id)
+
+
 async def test_openai_vector_stores_routing_table_roles(cached_disk_dist_registry):
     impl = VectorDBImpl()
     impl.openai_retrieve_vector_store = AsyncMock(return_value="OK")
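Taken together, the two new tests encode the naming contract of this breaking change: the caller-supplied `vector_db_id` is demoted to the vector store's display name, while the routing identifier becomes a generated `vs_` ID. A self-contained model of that contract (plain Python, illustrative only; `IdMapping` is not the llama-stack implementation):

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class IdMapping:
    # identifier -> user-provided name, mirroring what the tests assert
    stores: dict = field(default_factory=dict)

    def register(self, vector_db_id: str) -> str:
        identifier = f"vs_{uuid.uuid4()}"  # user-supplied ID becomes the name
        self.stores[identifier] = vector_db_id
        return identifier


mapping = IdMapping()
vs_id = mapping.register("my-custom-vector-db")
assert vs_id.startswith("vs_") and vs_id != "my-custom-vector-db"
assert mapping.stores[vs_id] == "my-custom-vector-db"
```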
@@ -164,7 +269,8 @@ async def test_openai_vector_stores_routing_table_roles(cached_disk_dist_registry):

     authorized_user = User(principal="alice", attributes={"roles": [authorized_team]})
     with request_provider_data_context({}, authorized_user):
-        _ = await table.register_vector_db(vector_db_id="vs1", embedding_model="test-model")
+        registered_vdb = await table.register_vector_db(vector_db_id="vs1", embedding_model="test-model")
+        authorized_table = registered_vdb.identifier  # Use the actual generated ID

     # Authorized reader
     with request_provider_data_context({}, authorized_user):
@@ -227,7 +333,8 @@ async def test_openai_vector_stores_routing_table_actions(cached_disk_dist_registry):
     )

     with request_provider_data_context({}, admin_user):
-        await table.register_vector_db(vector_db_id=vector_db_id, embedding_model="test-model")
+        registered_vdb = await table.register_vector_db(vector_db_id=vector_db_id, embedding_model="test-model")
+        vector_db_id = registered_vdb.identifier  # Use the actual generated ID

     read_methods = [
         (table.openai_retrieve_vector_store, (vector_db_id,), {}),
@@ -4,7 +4,6 @@
 # This source code is licensed under the terms described in the LICENSE file in
 # the root directory of this source tree.

-import sqlite3
 import tempfile
 from pathlib import Path
 from unittest.mock import patch
@@ -133,7 +132,6 @@ class TestInferenceRecording:
         # Test directory creation
         assert storage.test_dir.exists()
         assert storage.responses_dir.exists()
-        assert storage.db_path.exists()

         # Test storing and retrieving a recording
         request_hash = "test_hash_123"
@@ -147,15 +145,6 @@ class TestInferenceRecording:

         storage.store_recording(request_hash, request_data, response_data)

-        # Verify SQLite record
-        with sqlite3.connect(storage.db_path) as conn:
-            result = conn.execute("SELECT * FROM recordings WHERE request_hash = ?", (request_hash,)).fetchone()
-
-        assert result is not None
-        assert result[0] == request_hash  # request_hash
-        assert result[2] == "/v1/chat/completions"  # endpoint
-        assert result[3] == "llama3.2:3b"  # model
-
         # Verify file storage and retrieval
         retrieved = storage.find_recording(request_hash)
         assert retrieved is not None
@@ -185,10 +174,7 @@ class TestInferenceRecording:

         # Verify recording was stored
         storage = ResponseStorage(temp_storage_dir)
-        with sqlite3.connect(storage.db_path) as conn:
-            recordings = conn.execute("SELECT COUNT(*) FROM recordings").fetchone()[0]
-
-            assert recordings == 1
+        assert storage.responses_dir.exists()

     async def test_replay_mode(self, temp_storage_dir, real_openai_chat_response):
         """Test that replay mode returns stored responses without making real calls."""
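These hunks remove the SQLite index from the recording path: the JSON documents under `responses_dir` become the sole source of truth, so the tests stop asserting on `db_path` and on rows in a `recordings` table. For illustration, the shape of a purely file-backed store (our sketch; `FileRecordingStore` is not a class from this diff):

```python
import json
from pathlib import Path


class FileRecordingStore:
    """Illustrative file-backed recording store: one JSON file per request hash."""

    def __init__(self, base_dir: Path):
        self.responses_dir = Path(base_dir) / "responses"
        self.responses_dir.mkdir(parents=True, exist_ok=True)

    def store_recording(self, request_hash: str, record: dict) -> None:
        # The file on disk is the record; no side index to keep in sync.
        (self.responses_dir / f"{request_hash}.json").write_text(json.dumps(record))

    def find_recording(self, request_hash: str) -> dict | None:
        path = self.responses_dir / f"{request_hash}.json"
        return json.loads(path.read_text()) if path.exists() else None
```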
@@ -88,3 +88,10 @@ def test_nested_structures(setup_env_vars):
     }
     expected = {"key1": "test_value", "key2": ["default", "conditional"], "key3": {"nested": None}}
     assert replace_env_vars(data) == expected
+
+
+def test_explicit_strings_preserved(setup_env_vars):
+    # Explicit strings that look like numbers/booleans should remain strings
+    data = {"port": "8080", "enabled": "true", "count": "123", "ratio": "3.14"}
+    expected = {"port": "8080", "enabled": "true", "count": "123", "ratio": "3.14"}
+    assert replace_env_vars(data) == expected
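The new test pins a subtle substitution rule: values that merely look like numbers or booleans must come back from `replace_env_vars` unchanged and still typed `str`. Roughly (the import path is our assumption; the diff only shows the test body):

```python
# Assumed import path for illustration; adjust to where replace_env_vars lives.
from llama_stack.core.stack import replace_env_vars

data = {"port": "8080", "enabled": "true", "count": "123", "ratio": "3.14"}
assert replace_env_vars(data) == data  # explicit strings preserved verbatim
```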
68 uv.lock generated
@@ -1128,6 +1128,9 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/4f/72/dcbc6dbf838549b7b0c2c18c1365d2580eb7456939e4b608c3ab213fce78/geventhttpclient-2.3.4-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:9ac30c38d86d888b42bb2ab2738ab9881199609e9fa9a153eb0c66fc9188c6cb", size = 71984, upload-time = "2025-06-11T13:17:09.126Z" },
     { url = "https://files.pythonhosted.org/packages/4c/f9/74aa8c556364ad39b238919c954a0da01a6154ad5e85a1d1ab5f9f5ac186/geventhttpclient-2.3.4-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:4b802000a4fad80fa57e895009671d6e8af56777e3adf0d8aee0807e96188fd9", size = 52631, upload-time = "2025-06-11T13:17:10.061Z" },
     { url = "https://files.pythonhosted.org/packages/11/1a/bc4b70cba8b46be8b2c6ca5b8067c4f086f8c90915eb68086ab40ff6243d/geventhttpclient-2.3.4-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:461e4d9f4caee481788ec95ac64e0a4a087c1964ddbfae9b6f2dc51715ba706c", size = 51991, upload-time = "2025-06-11T13:17:11.049Z" },
+    { url = "https://files.pythonhosted.org/packages/03/3f/5ce6e003b3b24f7caf3207285831afd1a4f857ce98ac45e1fb7a6815bd58/geventhttpclient-2.3.4-cp312-cp312-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:b7e41687c74e8fbe6a665458bbaea0c5a75342a95e2583738364a73bcbf1671b", size = 114982, upload-time = "2025-08-24T12:16:50.76Z" },
+    { url = "https://files.pythonhosted.org/packages/60/16/6f9dad141b7c6dd7ee831fbcd72dd02535c57bc1ec3c3282f07e72c31344/geventhttpclient-2.3.4-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c3ea5da20f4023cf40207ce15f5f4028377ffffdba3adfb60b4c8f34925fce79", size = 115654, upload-time = "2025-08-24T12:16:52.072Z" },
+    { url = "https://files.pythonhosted.org/packages/ba/52/9b516a2ff423d8bd64c319e1950a165ceebb552781c5a88c1e94e93e8713/geventhttpclient-2.3.4-cp312-cp312-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:91f19a8a6899c27867dbdace9500f337d3e891a610708e86078915f1d779bf53", size = 121672, upload-time = "2025-08-24T12:16:53.361Z" },
     { url = "https://files.pythonhosted.org/packages/b0/f5/8d0f1e998f6d933c251b51ef92d11f7eb5211e3cd579018973a2b455f7c5/geventhttpclient-2.3.4-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:41f2dcc0805551ea9d49f9392c3b9296505a89b9387417b148655d0d8251b36e", size = 119012, upload-time = "2025-06-11T13:17:11.956Z" },
     { url = "https://files.pythonhosted.org/packages/ea/0e/59e4ab506b3c19fc72e88ca344d150a9028a00c400b1099637100bec26fc/geventhttpclient-2.3.4-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:62f3a29bf242ecca6360d497304900683fd8f42cbf1de8d0546c871819251dad", size = 124565, upload-time = "2025-06-11T13:17:12.896Z" },
     { url = "https://files.pythonhosted.org/packages/39/5d/dcbd34dfcda0c016b4970bd583cb260cc5ebfc35b33d0ec9ccdb2293587a/geventhttpclient-2.3.4-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:8714a3f2c093aeda3ffdb14c03571d349cb3ed1b8b461d9f321890659f4a5dbf", size = 115573, upload-time = "2025-06-11T13:17:13.937Z" },
@@ -1141,6 +1144,9 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/ff/ad/132fddde6e2dca46d6a86316962437acd2bfaeb264db4e0fae83c529eb04/geventhttpclient-2.3.4-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:be64c5583884c407fc748dedbcb083475d5b138afb23c6bc0836cbad228402cc", size = 71967, upload-time = "2025-06-11T13:17:22.121Z" },
     { url = "https://files.pythonhosted.org/packages/f4/34/5e77d9a31d93409a8519cf573843288565272ae5a016be9c9293f56c50a1/geventhttpclient-2.3.4-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:15b2567137734183efda18e4d6245b18772e648b6a25adea0eba8b3a8b0d17e8", size = 52632, upload-time = "2025-06-11T13:17:23.016Z" },
     { url = "https://files.pythonhosted.org/packages/47/d2/cf0dbc333304700e68cee9347f654b56e8b0f93a341b8b0d027ee96800d6/geventhttpclient-2.3.4-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:a4bca1151b8cd207eef6d5cb3c720c562b2aa7293cf113a68874e235cfa19c31", size = 51980, upload-time = "2025-06-11T13:17:23.933Z" },
+    { url = "https://files.pythonhosted.org/packages/27/6e/049e685fc43e2e966c83f24b3187f6a6736103f0fc51118140f4ca1793d4/geventhttpclient-2.3.4-cp313-cp313-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:8a681433e2f3d4b326d8b36b3e05b787b2c6dd2a5660a4a12527622278bf02ed", size = 114998, upload-time = "2025-08-24T12:16:54.72Z" },
+    { url = "https://files.pythonhosted.org/packages/24/13/1d08cf0400bf0fe0bb21e70f3f5fab2130aecef962b4362b7a1eba3cd738/geventhttpclient-2.3.4-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:736aa8e9609e4da40aeff0dbc02fea69021a034f4ed1e99bf93fc2ca83027b64", size = 115690, upload-time = "2025-08-24T12:16:56.328Z" },
+    { url = "https://files.pythonhosted.org/packages/fd/bc/15d22882983cac573859d274783c5b0a95881e553fc312e7b646be432668/geventhttpclient-2.3.4-cp313-cp313-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:9d477ae1f5d42e1ee6abbe520a2e9c7f369781c3b8ca111d1f5283c1453bc825", size = 121681, upload-time = "2025-08-24T12:16:58.344Z" },
     { url = "https://files.pythonhosted.org/packages/ec/5b/c0c30ccd9d06c603add3f2d6abd68bd98430ee9730dc5478815759cf07f7/geventhttpclient-2.3.4-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:9b50d9daded5d36193d67e2fc30e59752262fcbbdc86e8222c7df6b93af0346a", size = 118987, upload-time = "2025-06-11T13:17:24.97Z" },
     { url = "https://files.pythonhosted.org/packages/4f/56/095a46af86476372064128162eccbd2ba4a7721503759890d32ea701d5fd/geventhttpclient-2.3.4-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:fe705e7656bc6982a463a4ed7f9b1db8c78c08323f1d45d0d1d77063efa0ce96", size = 124519, upload-time = "2025-06-11T13:17:25.933Z" },
     { url = "https://files.pythonhosted.org/packages/ae/12/7c9ba94b58f7954a83d33183152ce6bf5bda10c08ebe47d79a314cd33e29/geventhttpclient-2.3.4-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:69668589359db4cbb9efa327dda5735d1e74145e6f0a9ffa50236d15cf904053", size = 115574, upload-time = "2025-06-11T13:17:27.331Z" },
@@ -1151,6 +1157,24 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/ca/36/9065bb51f261950c42eddf8718e01a9ff344d8082e31317a8b6677be9bd6/geventhttpclient-2.3.4-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:8d1d0db89c1c8f3282eac9a22fda2b4082e1ed62a2107f70e3f1de1872c7919f", size = 112245, upload-time = "2025-06-11T13:17:32.331Z" },
     { url = "https://files.pythonhosted.org/packages/21/7e/08a615bec095c288f997951e42e48b262d43c6081bef33cfbfad96ab9658/geventhttpclient-2.3.4-cp313-cp313-win32.whl", hash = "sha256:4e492b9ab880f98f8a9cc143b96ea72e860946eae8ad5fb2837cede2a8f45154", size = 48360, upload-time = "2025-06-11T13:17:33.349Z" },
     { url = "https://files.pythonhosted.org/packages/ec/19/ef3cb21e7e95b14cfcd21e3ba7fe3d696e171682dfa43ab8c0a727cac601/geventhttpclient-2.3.4-cp313-cp313-win_amd64.whl", hash = "sha256:72575c5b502bf26ececccb905e4e028bb922f542946be701923e726acf305eb6", size = 48956, upload-time = "2025-06-11T13:17:34.956Z" },
+    { url = "https://files.pythonhosted.org/packages/06/45/c41697c7d0cae17075ba535fb901985c2873461a9012e536de679525e28d/geventhttpclient-2.3.4-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:503db5dd0aa94d899c853b37e1853390c48c7035132f39a0bab44cbf95d29101", size = 71999, upload-time = "2025-08-24T12:17:00.419Z" },
+    { url = "https://files.pythonhosted.org/packages/5d/f7/1d953cafecf8f1681691977d9da9b647d2e02996c2431fb9b718cfdd3013/geventhttpclient-2.3.4-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:389d3f83316220cfa2010f41401c140215a58ddba548222e7122b2161e25e391", size = 52656, upload-time = "2025-08-24T12:17:01.337Z" },
+    { url = "https://files.pythonhosted.org/packages/5c/ca/4bd19040905e911dd8771a4ab74630eadc9ee9072b01ab504332dada2619/geventhttpclient-2.3.4-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:20c65d404fa42c95f6682831465467dff317004e53602c01f01fbd5ba1e56628", size = 51978, upload-time = "2025-08-24T12:17:02.282Z" },
+    { url = "https://files.pythonhosted.org/packages/11/01/c457257ee41236347dac027e63289fa3f92f164779458bd244b376122bf6/geventhttpclient-2.3.4-cp314-cp314-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:2574ee47ff6f379e9ef124e2355b23060b81629f1866013aa975ba35df0ed60b", size = 115033, upload-time = "2025-08-24T12:17:03.272Z" },
+    { url = "https://files.pythonhosted.org/packages/cc/c1/ef3ddc24b402eb3caa19dacbcd08d7129302a53d9b9109c84af1ea74e31a/geventhttpclient-2.3.4-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:fecf1b735591fb21ea124a374c207104a491ad0d772709845a10d5faa07fa833", size = 115762, upload-time = "2025-08-24T12:17:04.288Z" },
+    { url = "https://files.pythonhosted.org/packages/a9/97/8dca246262e9a1ebd639120151db00e34b7d10f60bdbca8481878b91801a/geventhttpclient-2.3.4-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:44e9ba810c28f9635e5c4c9cf98fc6470bad5a3620d8045d08693f7489493a3c", size = 121757, upload-time = "2025-08-24T12:17:05.273Z" },
+    { url = "https://files.pythonhosted.org/packages/10/7b/41bff3cbdeff3d06d45df3c61fa39cd25e60fa9d21c709ec6aeb58e9b58f/geventhttpclient-2.3.4-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:501d5c69adecd5eaee3c22302006f6c16aa114139640873b72732aa17dab9ee7", size = 111747, upload-time = "2025-08-24T12:17:06.585Z" },
+    { url = "https://files.pythonhosted.org/packages/64/e6/3732132fda94082ec8793e3ae0d4d7fff6c1cb8e358e9664d1589499f4b1/geventhttpclient-2.3.4-cp314-cp314-musllinux_1_2_ppc64le.whl", hash = "sha256:709f557138fb84ed32703d42da68f786459dab77ff2c23524538f2e26878d154", size = 118487, upload-time = "2025-08-24T12:17:07.816Z" },
+    { url = "https://files.pythonhosted.org/packages/93/29/d48d119dee6c42e066330860186df56a80d4e76d2821a6c706ead49006d7/geventhttpclient-2.3.4-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:b8b86815a30e026c6677b89a5a21ba5fd7b69accf8f0e9b83bac123e4e9f3b31", size = 112198, upload-time = "2025-08-24T12:17:08.867Z" },
+    { url = "https://files.pythonhosted.org/packages/56/48/556adff8de1bd3469b58394f441733bb3c76cb22c2600cf2ee753e73d47f/geventhttpclient-2.3.4-cp314-cp314t-macosx_10_13_universal2.whl", hash = "sha256:4371b1b1afc072ad2b0ff5a8929d73ffd86d582908d3e9e8d7911dc027b1b3a6", size = 72354, upload-time = "2025-08-24T12:17:10.671Z" },
+    { url = "https://files.pythonhosted.org/packages/7c/77/f1b32a91350382978cde0ddfee4089b94e006eb0f3e7297196d9d5451217/geventhttpclient-2.3.4-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:6409fcda1f40d66eab48afc218b4c41e45a95c173738d10c50bc69c7de4261b9", size = 52835, upload-time = "2025-08-24T12:17:12.164Z" },
+    { url = "https://files.pythonhosted.org/packages/d3/06/124f95556e0d5b4c417ec01fc30d91a3e4fe4524a44d2f629a1b1a721984/geventhttpclient-2.3.4-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:142870c2efb6bd0a593dcd75b83defb58aeb72ceaec4c23186785790bd44a311", size = 52165, upload-time = "2025-08-24T12:17:13.465Z" },
+    { url = "https://files.pythonhosted.org/packages/76/9c/0850256e4461b0a90f2cf5c8156ea8f97e93a826aa76d7be70c9c6d4ba0f/geventhttpclient-2.3.4-cp314-cp314t-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:3a74f7b926badb3b1d47ea987779cb83523a406e89203070b58b20cf95d6f535", size = 117929, upload-time = "2025-08-24T12:17:14.477Z" },
+    { url = "https://files.pythonhosted.org/packages/ca/55/3b54d0c0859efac95ba2649aeb9079a3523cdd7e691549ead2862907dc7d/geventhttpclient-2.3.4-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:2a8cde016e5ea6eb289c039b6af8dcef6c3ee77f5d753e57b48fe2555cdeacca", size = 119584, upload-time = "2025-08-24T12:17:15.709Z" },
+    { url = "https://files.pythonhosted.org/packages/84/df/84ce132a0eb2b6d4f86e68a828e3118419cb0411cae101e4bad256c3f321/geventhttpclient-2.3.4-cp314-cp314t-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:5aa16f2939a508667093b18e47919376f7db9a9acbe858343173c5a58e347869", size = 125388, upload-time = "2025-08-24T12:17:16.915Z" },
+    { url = "https://files.pythonhosted.org/packages/e8/4f/8156b9f6e25e4f18a60149bd2925f56f1ed7a1f8d520acb5a803536adadd/geventhttpclient-2.3.4-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:ffe87eb7f1956357c2144a56814b5ffc927cbb8932f143a0351c78b93129ebbc", size = 115214, upload-time = "2025-08-24T12:17:17.945Z" },
+    { url = "https://files.pythonhosted.org/packages/f6/5a/b01657605c16ac4555b70339628a33fc7ca41ace58da167637ef72ad0a8e/geventhttpclient-2.3.4-cp314-cp314t-musllinux_1_2_ppc64le.whl", hash = "sha256:5ee758e37215da9519cea53105b2a078d8bc0a32603eef2a1f9ab551e3767dee", size = 121862, upload-time = "2025-08-24T12:17:18.97Z" },
+    { url = "https://files.pythonhosted.org/packages/84/ca/c4e36a9b1bcce9958d8886aa4f7b262c8e9a7c43a284f2d79abfc9ba715d/geventhttpclient-2.3.4-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:416cc70adb3d34759e782d2e120b4432752399b85ac9758932ecd12274a104c3", size = 114999, upload-time = "2025-08-24T12:17:19.978Z" },
 ]

 [[package]]
@@ -1743,7 +1767,7 @@ wheels = [

 [[package]]
 name = "llama-stack"
-version = "0.2.18"
+version = "0.2.19"
 source = { editable = "." }
 dependencies = [
     { name = "aiohttp" },
@@ -1881,8 +1905,8 @@ requires-dist = [
     { name = "jinja2", specifier = ">=3.1.6" },
     { name = "jsonschema" },
     { name = "llama-api-client", specifier = ">=0.1.2" },
-    { name = "llama-stack-client", specifier = ">=0.2.18" },
-    { name = "llama-stack-client", marker = "extra == 'ui'", specifier = ">=0.2.18" },
+    { name = "llama-stack-client", specifier = ">=0.2.19" },
+    { name = "llama-stack-client", marker = "extra == 'ui'", specifier = ">=0.2.19" },
     { name = "openai", specifier = ">=1.99.6,<1.100.0" },
     { name = "opentelemetry-exporter-otlp-proto-http", specifier = ">=1.30.0" },
     { name = "opentelemetry-sdk", specifier = ">=1.30.0" },
@@ -1989,7 +2013,7 @@ unit = [

 [[package]]
 name = "llama-stack-client"
-version = "0.2.18"
+version = "0.2.19"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "anyio" },
@@ -2008,9 +2032,9 @@ dependencies = [
     { name = "tqdm" },
     { name = "typing-extensions" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/69/da/5e5a745495f8a2b8ef24fc4d01fe9031aa2277c36447cb22192ec8c8cc1e/llama_stack_client-0.2.18.tar.gz", hash = "sha256:860c885c9e549445178ac55cc9422e6e2a91215ac7aff5aaccfb42f3ce07e79e", size = 277284, upload-time = "2025-08-19T22:12:09.106Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/14/e4/72683c10188ae93e97551ab6eeac725e46f13ec215618532505a7d91bf2b/llama_stack_client-0.2.19.tar.gz", hash = "sha256:6c857e528b83af7821120002ebe4d3db072fd9f7bf867a152a34c70fe606833f", size = 318325, upload-time = "2025-08-26T21:54:20.592Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/0a/e4/e97f8fdd8a07aa1efc7f7e37b5657d84357b664bf70dd1885a437edc0699/llama_stack_client-0.2.18-py3-none-any.whl", hash = "sha256:90f827d5476f7fc15fd993f1863af6a6e72bd064646bf6a99435eb43a1327f70", size = 367586, upload-time = "2025-08-19T22:12:07.899Z" },
+    { url = "https://files.pythonhosted.org/packages/51/51/c8dde9fae58193a539eac700502876d8edde8be354c2784ff7b707a47432/llama_stack_client-0.2.19-py3-none-any.whl", hash = "sha256:478565a54541ca03ca9f8fe2019f4136f93ab6afe9591bdd44bc6dde6ddddbd9", size = 369905, upload-time = "2025-08-26T21:54:18.929Z" },
 ]

 [[package]]
@@ -4713,9 +4737,9 @@ dependencies = [
     { name = "typing-extensions", marker = "sys_platform == 'darwin'" },
 ]
 wheels = [
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0-cp312-none-macosx_11_0_arm64.whl", hash = "sha256:a47b7986bee3f61ad217d8a8ce24605809ab425baf349f97de758815edd2ef54" },
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0-cp313-cp313t-macosx_14_0_arm64.whl", hash = "sha256:fbe2e149c5174ef90d29a5f84a554dfaf28e003cb4f61fa2c8c024c17ec7ca58" },
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0-cp313-none-macosx_11_0_arm64.whl", hash = "sha256:057efd30a6778d2ee5e2374cd63a63f63311aa6f33321e627c655df60abdd390" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0-cp312-none-macosx_11_0_arm64.whl" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0-cp313-cp313t-macosx_14_0_arm64.whl" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0-cp313-none-macosx_11_0_arm64.whl" },
 ]

 [[package]]
@@ -4738,19 +4762,19 @@ dependencies = [
     { name = "typing-extensions", marker = "sys_platform != 'darwin'" },
 ]
 wheels = [
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp312-cp312-linux_s390x.whl", hash = "sha256:0e34e276722ab7dd0dffa9e12fe2135a9b34a0e300c456ed7ad6430229404eb5" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp312-cp312-linux_s390x.whl" },
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp312-cp312-manylinux_2_28_aarch64.whl", hash = "sha256:610f600c102386e581327d5efc18c0d6edecb9820b4140d26163354a99cd800d" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp312-cp312-manylinux_2_28_aarch64.whl" },
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:cb9a8ba8137ab24e36bf1742cb79a1294bd374db570f09fc15a5e1318160db4e" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp312-cp312-manylinux_2_28_x86_64.whl" },
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp312-cp312-win_amd64.whl", hash = "sha256:2be20b2c05a0cce10430cc25f32b689259640d273232b2de357c35729132256d" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp312-cp312-win_amd64.whl" },
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp312-cp312-win_arm64.whl", hash = "sha256:99fc421a5d234580e45957a7b02effbf3e1c884a5dd077afc85352c77bf41434" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp312-cp312-win_arm64.whl" },
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-linux_s390x.whl", hash = "sha256:8b5882276633cf91fe3d2d7246c743b94d44a7e660b27f1308007fdb1bb89f7d" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-linux_s390x.whl" },
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-manylinux_2_28_aarch64.whl", hash = "sha256:a5064b5e23772c8d164068cc7c12e01a75faf7b948ecd95a0d4007d7487e5f25" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-manylinux_2_28_aarch64.whl" },
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-manylinux_2_28_x86_64.whl", hash = "sha256:8f81dedb4c6076ec325acc3b47525f9c550e5284a18eae1d9061c543f7b6e7de" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-manylinux_2_28_x86_64.whl" },
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-win_amd64.whl", hash = "sha256:e1ee1b2346ade3ea90306dfbec7e8ff17bc220d344109d189ae09078333b0856" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-win_amd64.whl" },
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-win_arm64.whl", hash = "sha256:64c187345509f2b1bb334feed4666e2c781ca381874bde589182f81247e61f88" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-win_arm64.whl" },
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313t-manylinux_2_28_aarch64.whl", hash = "sha256:af81283ac671f434b1b25c95ba295f270e72db1fad48831eb5e4748ff9840041" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313t-manylinux_2_28_aarch64.whl" },
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313t-manylinux_2_28_x86_64.whl", hash = "sha256:a9dbb6f64f63258bc811e2c0c99640a81e5af93c531ad96e95c5ec777ea46dab" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313t-manylinux_2_28_x86_64.whl" },
-    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313t-win_amd64.whl", hash = "sha256:6d93a7165419bc4b2b907e859ccab0dea5deeab261448ae9a5ec5431f14c0e64" },
+    { url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313t-win_amd64.whl" },
 ]

 [[package]]