chore(misc): make tests and starter faster (#3042)
Some checks failed
Integration Tests (Replay) / discover-tests (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 9s
Python Package Build Test / build (3.12) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 12s
Test Llama Stack Build / generate-matrix (push) Successful in 11s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 14s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 22s
Test External API and Providers / test-external (venv) (push) Failing after 14s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 15s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 22s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 14s
Unit Tests / unit-tests (3.13) (push) Failing after 14s
Test Llama Stack Build / build-single-provider (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 18s
Unit Tests / unit-tests (3.12) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 18s
Test Llama Stack Build / build (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 16s
Python Package Build Test / build (3.13) (push) Failing after 53s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 59s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 1m1s
Update ReadTheDocs / update-readthedocs (push) Failing after 1m6s
Pre-commit / pre-commit (push) Successful in 1m53s

A bunch of miscellaneous cleanup focused on tests, which ended up
speeding up the starter distro substantially.

- Pulled llama stack client initialization for tests into `pytest_sessionstart` so
it does not clobber test output
- Profiling that initialization showed where we were doing lots of heavy imports
for starter, so those imports are now lazy (see the sketch below)
- starter now starts 20+ seconds faster on my Mac
- A few other smallish refactors for `compat_client`
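A minimal sketch of the lazy-import pattern behind the speedup, with illustrative names rather than the actual llama-stack modules: importing a heavy dependency at module scope makes every process that merely imports the file pay the cost, whereas moving the import into the function defers it until that code path actually runs. Python caches modules in `sys.modules`, so after the first call the deferred import is only a dict lookup.

```python
# Illustrative sketch only; `pandas` stands in for any heavy dependency.

# Before: module-level import, paid on every startup.
#
# import pandas
#
# def load_rows(rows: list[dict]) -> "pandas.DataFrame":
#     return pandas.DataFrame(rows)


# After: the import happens only when the function is first called.
def load_rows(rows: list[dict]):
    import pandas  # heavy import deferred; cached by Python after the first call

    return pandas.DataFrame(rows)


if __name__ == "__main__":
    print(load_rows([{"a": 1}, {"a": 2}]))
```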
Ashwin Bharambe 2025-08-05 14:55:05 -07:00 committed by GitHub
parent e12524af85
commit 7f834339ba
45 changed files with 2897 additions and 1688 deletions

View file

@ -8,7 +8,7 @@ Local filesystem-based file storage provider for managing files and documents lo
| Field | Type | Required | Default | Description | | Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------| |-------|------|----------|---------|-------------|
| `storage_dir` | `<class 'str'>` | No | PydanticUndefined | Directory to store uploaded files | | `storage_dir` | `<class 'str'>` | No | | Directory to store uploaded files |
| `metadata_store` | `utils.sqlstore.sqlstore.SqliteSqlStoreConfig \| utils.sqlstore.sqlstore.PostgresSqlStoreConfig` | No | sqlite | SQL store configuration for file metadata | | `metadata_store` | `utils.sqlstore.sqlstore.SqliteSqlStoreConfig \| utils.sqlstore.sqlstore.PostgresSqlStoreConfig` | No | sqlite | SQL store configuration for file metadata |
| `ttl_secs` | `<class 'int'>` | No | 31536000 | | | `ttl_secs` | `<class 'int'>` | No | 31536000 | |

View file

@ -8,7 +8,7 @@ HuggingFace Inference Endpoints provider for dedicated model serving.
| Field | Type | Required | Default | Description | | Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------| |-------|------|----------|---------|-------------|
| `endpoint_name` | `<class 'str'>` | No | PydanticUndefined | The name of the Hugging Face Inference Endpoint in the format of '{namespace}/{endpoint_name}' (e.g. 'my-cool-org/meta-llama-3-1-8b-instruct-rce'). Namespace is optional and will default to the user account if not provided. | | `endpoint_name` | `<class 'str'>` | No | | The name of the Hugging Face Inference Endpoint in the format of '{namespace}/{endpoint_name}' (e.g. 'my-cool-org/meta-llama-3-1-8b-instruct-rce'). Namespace is optional and will default to the user account if not provided. |
| `api_token` | `pydantic.types.SecretStr \| None` | No | | Your Hugging Face user access token (will default to locally saved token if not provided) | | `api_token` | `pydantic.types.SecretStr \| None` | No | | Your Hugging Face user access token (will default to locally saved token if not provided) |
## Sample Configuration ## Sample Configuration

View file

@ -8,7 +8,7 @@ HuggingFace Inference API serverless provider for on-demand model inference.
| Field | Type | Required | Default | Description | | Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------| |-------|------|----------|---------|-------------|
| `huggingface_repo` | `<class 'str'>` | No | PydanticUndefined | The model ID of the model on the Hugging Face Hub (e.g. 'meta-llama/Meta-Llama-3.1-70B-Instruct') | | `huggingface_repo` | `<class 'str'>` | No | | The model ID of the model on the Hugging Face Hub (e.g. 'meta-llama/Meta-Llama-3.1-70B-Instruct') |
| `api_token` | `pydantic.types.SecretStr \| None` | No | | Your Hugging Face user access token (will default to locally saved token if not provided) | | `api_token` | `pydantic.types.SecretStr \| None` | No | | Your Hugging Face user access token (will default to locally saved token if not provided) |
## Sample Configuration ## Sample Configuration

View file

@ -8,7 +8,7 @@ Text Generation Inference (TGI) provider for HuggingFace model serving.
| Field | Type | Required | Default | Description | | Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------| |-------|------|----------|---------|-------------|
| `url` | `<class 'str'>` | No | PydanticUndefined | The URL for the TGI serving endpoint | | `url` | `<class 'str'>` | No | | The URL for the TGI serving endpoint |
## Sample Configuration ## Sample Configuration

View file

@ -27,7 +27,7 @@ HuggingFace-based post-training provider for fine-tuning models using the Huggin
| `dpo_beta` | `<class 'float'>` | No | 0.1 | | | `dpo_beta` | `<class 'float'>` | No | 0.1 | |
| `use_reference_model` | `<class 'bool'>` | No | True | | | `use_reference_model` | `<class 'bool'>` | No | True | |
| `dpo_loss_type` | `Literal['sigmoid', 'hinge', 'ipo', 'kto_pair'` | No | sigmoid | | | `dpo_loss_type` | `Literal['sigmoid', 'hinge', 'ipo', 'kto_pair'` | No | sigmoid | |
| `dpo_output_dir` | `<class 'str'>` | No | ./checkpoints/dpo | | | `dpo_output_dir` | `<class 'str'>` | No | | |
## Sample Configuration ## Sample Configuration
@ -35,6 +35,7 @@ HuggingFace-based post-training provider for fine-tuning models using the Huggin
checkpoint_format: huggingface checkpoint_format: huggingface
distributed_backend: null distributed_backend: null
device: cpu device: cpu
dpo_output_dir: ~/.llama/dummy/dpo_output
``` ```

View file

@ -41,7 +41,7 @@ See [Chroma's documentation](https://docs.trychroma.com/docs/overview/introducti
| Field | Type | Required | Default | Description | | Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------| |-------|------|----------|---------|-------------|
| `db_path` | `<class 'str'>` | No | PydanticUndefined | | | `db_path` | `<class 'str'>` | No | | |
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | Config for KV store backend | | `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | Config for KV store backend |
## Sample Configuration ## Sample Configuration

View file

@ -10,7 +10,7 @@ Please refer to the remote provider documentation.
| Field | Type | Required | Default | Description | | Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------| |-------|------|----------|---------|-------------|
| `db_path` | `<class 'str'>` | No | PydanticUndefined | | | `db_path` | `<class 'str'>` | No | | |
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | Config for KV store backend (SQLite only for now) | | `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | Config for KV store backend (SQLite only for now) |
| `consistency_level` | `<class 'str'>` | No | Strong | The consistency level of the Milvus server | | `consistency_level` | `<class 'str'>` | No | Strong | The consistency level of the Milvus server |

View file

@ -50,7 +50,7 @@ See the [Qdrant documentation](https://qdrant.tech/documentation/) for more deta
| Field | Type | Required | Default | Description | | Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------| |-------|------|----------|---------|-------------|
| `path` | `<class 'str'>` | No | PydanticUndefined | | | `path` | `<class 'str'>` | No | | |
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | | | `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | |
## Sample Configuration ## Sample Configuration

View file

@ -205,7 +205,7 @@ See [sqlite-vec's GitHub repo](https://github.com/asg017/sqlite-vec/tree/main) f
| Field | Type | Required | Default | Description | | Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------| |-------|------|----------|---------|-------------|
| `db_path` | `<class 'str'>` | No | PydanticUndefined | Path to the SQLite database file | | `db_path` | `<class 'str'>` | No | | Path to the SQLite database file |
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | Config for KV store backend (SQLite only for now) | | `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | Config for KV store backend (SQLite only for now) |
## Sample Configuration ## Sample Configuration

View file

@ -10,7 +10,7 @@ Please refer to the sqlite-vec provider documentation.
| Field | Type | Required | Default | Description | | Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------| |-------|------|----------|---------|-------------|
| `db_path` | `<class 'str'>` | No | PydanticUndefined | Path to the SQLite database file | | `db_path` | `<class 'str'>` | No | | Path to the SQLite database file |
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | Config for KV store backend (SQLite only for now) | | `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | Config for KV store backend (SQLite only for now) |
## Sample Configuration ## Sample Configuration

View file

@ -40,7 +40,7 @@ See [Chroma's documentation](https://docs.trychroma.com/docs/overview/introducti
| Field | Type | Required | Default | Description | | Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------| |-------|------|----------|---------|-------------|
| `url` | `str \| None` | No | PydanticUndefined | | | `url` | `str \| None` | No | | |
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | Config for KV store backend | | `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | Config for KV store backend |
## Sample Configuration ## Sample Configuration

View file

@ -111,8 +111,8 @@ For more details on TLS configuration, refer to the [TLS setup guide](https://mi
| Field | Type | Required | Default | Description | | Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------| |-------|------|----------|---------|-------------|
| `uri` | `<class 'str'>` | No | PydanticUndefined | The URI of the Milvus server | | `uri` | `<class 'str'>` | No | | The URI of the Milvus server |
| `token` | `str \| None` | No | PydanticUndefined | The token of the Milvus server | | `token` | `str \| None` | No | | The token of the Milvus server |
| `consistency_level` | `<class 'str'>` | No | Strong | The consistency level of the Milvus server | | `consistency_level` | `<class 'str'>` | No | Strong | The consistency level of the Milvus server |
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | Config for KV store backend | | `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | Config for KV store backend |
| `config` | `dict` | No | {} | This configuration allows additional fields to be passed through to the underlying Milvus client. See the [Milvus](https://milvus.io/docs/install-overview.md) documentation for more details about Milvus in general. | | `config` | `dict` | No | {} | This configuration allows additional fields to be passed through to the underlying Milvus client. See the [Milvus](https://milvus.io/docs/install-overview.md) documentation for more details about Milvus in general. |

View file

@ -154,6 +154,7 @@ providers:
checkpoint_format: huggingface checkpoint_format: huggingface
distributed_backend: null distributed_backend: null
device: cpu device: cpu
dpo_output_dir: ~/.llama/distributions/ci-tests/dpo_output
eval: eval:
- provider_id: meta-reference - provider_id: meta-reference
provider_type: inline::meta-reference provider_type: inline::meta-reference

View file

@ -154,6 +154,7 @@ providers:
checkpoint_format: huggingface checkpoint_format: huggingface
distributed_backend: null distributed_backend: null
device: cpu device: cpu
dpo_output_dir: ~/.llama/distributions/starter/dpo_output
eval: eval:
- provider_id: meta-reference - provider_id: meta-reference
provider_type: inline::meta-reference provider_type: inline::meta-reference

View file

@ -5,8 +5,6 @@
# the root directory of this source tree. # the root directory of this source tree.
from typing import Any from typing import Any
import pandas
from llama_stack.apis.common.responses import PaginatedResponse from llama_stack.apis.common.responses import PaginatedResponse
from llama_stack.apis.datasetio import DatasetIO from llama_stack.apis.datasetio import DatasetIO
from llama_stack.apis.datasets import Dataset from llama_stack.apis.datasets import Dataset
@ -44,6 +42,8 @@ class PandasDataframeDataset:
if self.dataset_def.source.type == "uri": if self.dataset_def.source.type == "uri":
self.df = await get_dataframe_from_uri(self.dataset_def.source.uri) self.df = await get_dataframe_from_uri(self.dataset_def.source.uri)
elif self.dataset_def.source.type == "rows": elif self.dataset_def.source.type == "rows":
import pandas
self.df = pandas.DataFrame(self.dataset_def.source.rows) self.df = pandas.DataFrame(self.dataset_def.source.rows)
else: else:
raise ValueError(f"Unsupported dataset source type: {self.dataset_def.source.type}") raise ValueError(f"Unsupported dataset source type: {self.dataset_def.source.type}")
@ -103,6 +103,8 @@ class LocalFSDatasetIOImpl(DatasetIO, DatasetsProtocolPrivate):
return paginate_records(records, start_index, limit) return paginate_records(records, start_index, limit)
async def append_rows(self, dataset_id: str, rows: list[dict[str, Any]]) -> None: async def append_rows(self, dataset_id: str, rows: list[dict[str, Any]]) -> None:
import pandas
dataset_def = self.dataset_infos[dataset_id] dataset_def = self.dataset_infos[dataset_id]
dataset_impl = PandasDataframeDataset(dataset_def) dataset_impl = PandasDataframeDataset(dataset_def)
await dataset_impl.load() await dataset_impl.load()

View file

@ -71,8 +71,13 @@ class HuggingFacePostTrainingConfig(BaseModel):
dpo_beta: float = 0.1 dpo_beta: float = 0.1
use_reference_model: bool = True use_reference_model: bool = True
dpo_loss_type: Literal["sigmoid", "hinge", "ipo", "kto_pair"] = "sigmoid" dpo_loss_type: Literal["sigmoid", "hinge", "ipo", "kto_pair"] = "sigmoid"
dpo_output_dir: str = "./checkpoints/dpo" dpo_output_dir: str
@classmethod @classmethod
def sample_run_config(cls, __distro_dir__: str, **kwargs: Any) -> dict[str, Any]: def sample_run_config(cls, __distro_dir__: str, **kwargs: Any) -> dict[str, Any]:
return {"checkpoint_format": "huggingface", "distributed_backend": None, "device": "cpu"} return {
"checkpoint_format": "huggingface",
"distributed_backend": None,
"device": "cpu",
"dpo_output_dir": __distro_dir__ + "/dpo_output",
}

View file

@ -22,15 +22,8 @@ from llama_stack.apis.post_training import (
from llama_stack.providers.inline.post_training.huggingface.config import ( from llama_stack.providers.inline.post_training.huggingface.config import (
HuggingFacePostTrainingConfig, HuggingFacePostTrainingConfig,
) )
from llama_stack.providers.inline.post_training.huggingface.recipes.finetune_single_device import (
HFFinetuningSingleDevice,
)
from llama_stack.providers.inline.post_training.huggingface.recipes.finetune_single_device_dpo import (
HFDPOAlignmentSingleDevice,
)
from llama_stack.providers.utils.scheduler import JobArtifact, Scheduler from llama_stack.providers.utils.scheduler import JobArtifact, Scheduler
from llama_stack.providers.utils.scheduler import JobStatus as SchedulerJobStatus from llama_stack.providers.utils.scheduler import JobStatus as SchedulerJobStatus
from llama_stack.schema_utils import webmethod
class TrainingArtifactType(Enum): class TrainingArtifactType(Enum):
@ -85,6 +78,10 @@ class HuggingFacePostTrainingImpl:
algorithm_config: AlgorithmConfig | None = None, algorithm_config: AlgorithmConfig | None = None,
) -> PostTrainingJob: ) -> PostTrainingJob:
async def handler(on_log_message_cb, on_status_change_cb, on_artifact_collected_cb): async def handler(on_log_message_cb, on_status_change_cb, on_artifact_collected_cb):
from llama_stack.providers.inline.post_training.huggingface.recipes.finetune_single_device import (
HFFinetuningSingleDevice,
)
on_log_message_cb("Starting HF finetuning") on_log_message_cb("Starting HF finetuning")
recipe = HFFinetuningSingleDevice( recipe = HFFinetuningSingleDevice(
@ -124,6 +121,10 @@ class HuggingFacePostTrainingImpl:
logger_config: dict[str, Any], logger_config: dict[str, Any],
) -> PostTrainingJob: ) -> PostTrainingJob:
async def handler(on_log_message_cb, on_status_change_cb, on_artifact_collected_cb): async def handler(on_log_message_cb, on_status_change_cb, on_artifact_collected_cb):
from llama_stack.providers.inline.post_training.huggingface.recipes.finetune_single_device_dpo import (
HFDPOAlignmentSingleDevice,
)
on_log_message_cb("Starting HF DPO alignment") on_log_message_cb("Starting HF DPO alignment")
recipe = HFDPOAlignmentSingleDevice( recipe = HFDPOAlignmentSingleDevice(
@ -168,7 +169,6 @@ class HuggingFacePostTrainingImpl:
data = cls._get_artifacts_metadata_by_type(job, TrainingArtifactType.RESOURCES_STATS.value) data = cls._get_artifacts_metadata_by_type(job, TrainingArtifactType.RESOURCES_STATS.value)
return data[0] if data else None return data[0] if data else None
@webmethod(route="/post-training/job/status")
async def get_training_job_status(self, job_uuid: str) -> PostTrainingJobStatusResponse | None: async def get_training_job_status(self, job_uuid: str) -> PostTrainingJobStatusResponse | None:
job = self._scheduler.get_job(job_uuid) job = self._scheduler.get_job(job_uuid)
@ -195,16 +195,13 @@ class HuggingFacePostTrainingImpl:
resources_allocated=self._get_resources_allocated(job), resources_allocated=self._get_resources_allocated(job),
) )
@webmethod(route="/post-training/job/cancel")
async def cancel_training_job(self, job_uuid: str) -> None: async def cancel_training_job(self, job_uuid: str) -> None:
self._scheduler.cancel(job_uuid) self._scheduler.cancel(job_uuid)
@webmethod(route="/post-training/job/artifacts")
async def get_training_job_artifacts(self, job_uuid: str) -> PostTrainingJobArtifactsResponse | None: async def get_training_job_artifacts(self, job_uuid: str) -> PostTrainingJobArtifactsResponse | None:
job = self._scheduler.get_job(job_uuid) job = self._scheduler.get_job(job_uuid)
return PostTrainingJobArtifactsResponse(job_uuid=job_uuid, checkpoints=self._get_checkpoints(job)) return PostTrainingJobArtifactsResponse(job_uuid=job_uuid, checkpoints=self._get_checkpoints(job))
@webmethod(route="/post-training/jobs", method="GET")
async def get_training_jobs(self) -> ListPostTrainingJobsResponse: async def get_training_jobs(self) -> ListPostTrainingJobsResponse:
return ListPostTrainingJobsResponse( return ListPostTrainingJobsResponse(
data=[PostTrainingJob(job_uuid=job.id) for job in self._scheduler.get_jobs()] data=[PostTrainingJob(job_uuid=job.id) for job in self._scheduler.get_jobs()]

View file

@ -23,12 +23,8 @@ from llama_stack.apis.post_training import (
from llama_stack.providers.inline.post_training.torchtune.config import ( from llama_stack.providers.inline.post_training.torchtune.config import (
TorchtunePostTrainingConfig, TorchtunePostTrainingConfig,
) )
from llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device import (
LoraFinetuningSingleDevice,
)
from llama_stack.providers.utils.scheduler import JobArtifact, Scheduler from llama_stack.providers.utils.scheduler import JobArtifact, Scheduler
from llama_stack.providers.utils.scheduler import JobStatus as SchedulerJobStatus from llama_stack.providers.utils.scheduler import JobStatus as SchedulerJobStatus
from llama_stack.schema_utils import webmethod
class TrainingArtifactType(Enum): class TrainingArtifactType(Enum):
@ -84,6 +80,10 @@ class TorchtunePostTrainingImpl:
if isinstance(algorithm_config, LoraFinetuningConfig): if isinstance(algorithm_config, LoraFinetuningConfig):
async def handler(on_log_message_cb, on_status_change_cb, on_artifact_collected_cb): async def handler(on_log_message_cb, on_status_change_cb, on_artifact_collected_cb):
from llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device import (
LoraFinetuningSingleDevice,
)
on_log_message_cb("Starting Lora finetuning") on_log_message_cb("Starting Lora finetuning")
recipe = LoraFinetuningSingleDevice( recipe = LoraFinetuningSingleDevice(
@ -144,7 +144,6 @@ class TorchtunePostTrainingImpl:
data = cls._get_artifacts_metadata_by_type(job, TrainingArtifactType.RESOURCES_STATS.value) data = cls._get_artifacts_metadata_by_type(job, TrainingArtifactType.RESOURCES_STATS.value)
return data[0] if data else None return data[0] if data else None
@webmethod(route="/post-training/job/status")
async def get_training_job_status(self, job_uuid: str) -> PostTrainingJobStatusResponse | None: async def get_training_job_status(self, job_uuid: str) -> PostTrainingJobStatusResponse | None:
job = self._scheduler.get_job(job_uuid) job = self._scheduler.get_job(job_uuid)
@ -171,11 +170,9 @@ class TorchtunePostTrainingImpl:
resources_allocated=self._get_resources_allocated(job), resources_allocated=self._get_resources_allocated(job),
) )
@webmethod(route="/post-training/job/cancel")
async def cancel_training_job(self, job_uuid: str) -> None: async def cancel_training_job(self, job_uuid: str) -> None:
self._scheduler.cancel(job_uuid) self._scheduler.cancel(job_uuid)
@webmethod(route="/post-training/job/artifacts")
async def get_training_job_artifacts(self, job_uuid: str) -> PostTrainingJobArtifactsResponse | None: async def get_training_job_artifacts(self, job_uuid: str) -> PostTrainingJobArtifactsResponse | None:
job = self._scheduler.get_job(job_uuid) job = self._scheduler.get_job(job_uuid)
return PostTrainingJobArtifactsResponse(job_uuid=job_uuid, checkpoints=self._get_checkpoints(job)) return PostTrainingJobArtifactsResponse(job_uuid=job_uuid, checkpoints=self._get_checkpoints(job))

View file

@ -6,8 +6,6 @@
from typing import Any from typing import Any
from urllib.parse import parse_qs, urlparse from urllib.parse import parse_qs, urlparse
import datasets as hf_datasets
from llama_stack.apis.common.responses import PaginatedResponse from llama_stack.apis.common.responses import PaginatedResponse
from llama_stack.apis.datasetio import DatasetIO from llama_stack.apis.datasetio import DatasetIO
from llama_stack.apis.datasets import Dataset from llama_stack.apis.datasets import Dataset
@ -73,6 +71,8 @@ class HuggingfaceDatasetIOImpl(DatasetIO, DatasetsProtocolPrivate):
start_index: int | None = None, start_index: int | None = None,
limit: int | None = None, limit: int | None = None,
) -> PaginatedResponse: ) -> PaginatedResponse:
import datasets as hf_datasets
dataset_def = self.dataset_infos[dataset_id] dataset_def = self.dataset_infos[dataset_id]
path, params = parse_hf_params(dataset_def) path, params = parse_hf_params(dataset_def)
loaded_dataset = hf_datasets.load_dataset(path, **params) loaded_dataset = hf_datasets.load_dataset(path, **params)
@ -81,6 +81,8 @@ class HuggingfaceDatasetIOImpl(DatasetIO, DatasetsProtocolPrivate):
return paginate_records(records, start_index, limit) return paginate_records(records, start_index, limit)
async def append_rows(self, dataset_id: str, rows: list[dict[str, Any]]) -> None: async def append_rows(self, dataset_id: str, rows: list[dict[str, Any]]) -> None:
import datasets as hf_datasets
dataset_def = self.dataset_infos[dataset_id] dataset_def = self.dataset_infos[dataset_id]
path, params = parse_hf_params(dataset_def) path, params = parse_hf_params(dataset_def)
loaded_dataset = hf_datasets.load_dataset(path, **params) loaded_dataset = hf_datasets.load_dataset(path, **params)

View file

@ -112,7 +112,8 @@ class OllamaInferenceAdapter(
@property @property
def openai_client(self) -> AsyncOpenAI: def openai_client(self) -> AsyncOpenAI:
if self._openai_client is None: if self._openai_client is None:
self._openai_client = AsyncOpenAI(base_url=f"{self.config.url}/v1", api_key="ollama") url = self.config.url.rstrip("/")
self._openai_client = AsyncOpenAI(base_url=f"{url}/v1", api_key="ollama")
return self._openai_client return self._openai_client
async def initialize(self) -> None: async def initialize(self) -> None:

View file

@ -9,12 +9,12 @@ import base64
import io import io
from urllib.parse import unquote from urllib.parse import unquote
import pandas
from llama_stack.providers.utils.memory.vector_store import parse_data_url from llama_stack.providers.utils.memory.vector_store import parse_data_url
async def get_dataframe_from_uri(uri: str): async def get_dataframe_from_uri(uri: str):
import pandas
df = None df = None
if uri.endswith(".csv"): if uri.endswith(".csv"):
# Moving to its own thread to avoid io from blocking the eventloop # Moving to its own thread to avoid io from blocking the eventloop

View file

@ -10,6 +10,7 @@ import sys
from pathlib import Path from pathlib import Path
from typing import Any from typing import Any
from pydantic_core import PydanticUndefined
from rich.progress import Progress, SpinnerColumn, TextColumn from rich.progress import Progress, SpinnerColumn, TextColumn
from llama_stack.core.distribution import get_provider_registry from llama_stack.core.distribution import get_provider_registry
@ -59,6 +60,8 @@ def get_config_class_info(config_class_path: str) -> dict[str, Any]:
if hasattr(config_class, "model_fields"): if hasattr(config_class, "model_fields"):
for field_name, field in config_class.model_fields.items(): for field_name, field in config_class.model_fields.items():
field_type = str(field.annotation) if field.annotation else "Any" field_type = str(field.annotation) if field.annotation else "Any"
# this string replace is ridiculous
field_type = field_type.replace("typing.", "").replace("Optional[", "").replace("]", "") field_type = field_type.replace("typing.", "").replace("Optional[", "").replace("]", "")
field_type = field_type.replace("Annotated[", "").replace("FieldInfo(", "").replace(")", "") field_type = field_type.replace("Annotated[", "").replace("FieldInfo(", "").replace(")", "")
field_type = field_type.replace("llama_stack.apis.inference.inference.", "") field_type = field_type.replace("llama_stack.apis.inference.inference.", "")
@ -77,7 +80,7 @@ def get_config_class_info(config_class_path: str) -> dict[str, Any]:
default_value = f"~/.llama/{path_part}" default_value = f"~/.llama/{path_part}"
except Exception: except Exception:
default_value = "" default_value = ""
elif field.default is None: elif field.default is None or field.default is PydanticUndefined:
default_value = "" default_value = ""
field_info = { field_info = {

View file

@ -9,12 +9,6 @@ from openai import BadRequestError, OpenAI
from llama_stack.core.library_client import LlamaStackAsLibraryClient from llama_stack.core.library_client import LlamaStackAsLibraryClient
@pytest.fixture
def openai_client(client_with_models):
base_url = f"{client_with_models.base_url}/v1/openai/v1"
return OpenAI(base_url=base_url, api_key="bar")
@pytest.mark.parametrize( @pytest.mark.parametrize(
"stream", "stream",
[ [
@ -41,15 +35,14 @@ def openai_client(client_with_models):
], ],
], ],
) )
def test_responses_store(openai_client, client_with_models, text_model_id, stream, tools): def test_responses_store(compat_client, text_model_id, stream, tools):
if isinstance(client_with_models, LlamaStackAsLibraryClient): if not isinstance(compat_client, OpenAI):
pytest.skip("OpenAI responses are not supported when testing with library client yet.") pytest.skip("OpenAI client is required until responses.delete() exists in llama-stack-client")
client = openai_client
message = "What's the weather in Tokyo?" + ( message = "What's the weather in Tokyo?" + (
" YOU MUST USE THE get_weather function to get the weather." if tools else "" " YOU MUST USE THE get_weather function to get the weather." if tools else ""
) )
response = client.responses.create( response = compat_client.responses.create(
model=text_model_id, model=text_model_id,
input=[ input=[
{ {
@ -78,14 +71,8 @@ def test_responses_store(openai_client, client_with_models, text_model_id, strea
if output_type == "message": if output_type == "message":
content = response.output[0].content[0].text content = response.output[0].content[0].text
# list responses - use the underlying HTTP client for endpoints not in SDK
list_response = client._client.get("/responses")
assert list_response.status_code == 200
data = list_response.json()["data"]
assert response_id in [r["id"] for r in data]
# test retrieve response # test retrieve response
retrieved_response = client.responses.retrieve(response_id) retrieved_response = compat_client.responses.retrieve(response_id)
assert retrieved_response.id == response_id assert retrieved_response.id == response_id
assert retrieved_response.model == text_model_id assert retrieved_response.model == text_model_id
assert retrieved_response.output[0].type == output_type, retrieved_response assert retrieved_response.output[0].type == output_type, retrieved_response
@ -93,23 +80,19 @@ def test_responses_store(openai_client, client_with_models, text_model_id, strea
assert retrieved_response.output[0].content[0].text == content assert retrieved_response.output[0].content[0].text == content
# Delete the response # Delete the response
delete_response = client.responses.delete(response_id) delete_response = compat_client.responses.delete(response_id)
assert delete_response is None assert delete_response is None
with pytest.raises(BadRequestError): with pytest.raises(BadRequestError):
client.responses.retrieve(response_id) compat_client.responses.retrieve(response_id)
def test_list_response_input_items(openai_client, client_with_models, text_model_id): def test_list_response_input_items(compat_client, text_model_id):
"""Test the new list_openai_response_input_items endpoint.""" """Test the new list_openai_response_input_items endpoint."""
if isinstance(client_with_models, LlamaStackAsLibraryClient):
pytest.skip("OpenAI responses are not supported when testing with library client yet.")
client = openai_client
message = "What is the capital of France?" message = "What is the capital of France?"
# Create a response first # Create a response first
response = client.responses.create( response = compat_client.responses.create(
model=text_model_id, model=text_model_id,
input=[ input=[
{ {
@ -123,7 +106,7 @@ def test_list_response_input_items(openai_client, client_with_models, text_model
response_id = response.id response_id = response.id
# Test the new list input items endpoint # Test the new list input items endpoint
input_items_response = client.responses.input_items.list(response_id=response_id) input_items_response = compat_client.responses.input_items.list(response_id=response_id)
# Verify the structure follows OpenAI API spec # Verify the structure follows OpenAI API spec
assert input_items_response.object == "list" assert input_items_response.object == "list"

View file

@ -9,12 +9,15 @@ import os
import platform import platform
import textwrap import textwrap
import time import time
import warnings
import pytest import pytest
from dotenv import load_dotenv from dotenv import load_dotenv
from llama_stack.log import get_logger from llama_stack.log import get_logger
from .fixtures.common import instantiate_llama_stack_client
logger = get_logger(__name__, category="tests") logger = get_logger(__name__, category="tests")
@ -27,6 +30,20 @@ def pytest_runtest_makereport(item, call):
item.was_xfail = getattr(report, "wasxfail", False) item.was_xfail = getattr(report, "wasxfail", False)
def pytest_sessionstart(session):
# stop macOS from complaining about duplicate OpenMP libraries
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
# pull client instantiation to session start so all the complex logs during initialization
# don't clobber the test one-liner outputs
print("instantiating llama_stack_client")
start_time = time.time()
with warnings.catch_warnings():
warnings.filterwarnings("ignore", category=DeprecationWarning)
session._llama_stack_client = instantiate_llama_stack_client(session)
print(f"llama_stack_client instantiated in {time.time() - start_time:.3f}s")
def pytest_runtest_teardown(item): def pytest_runtest_teardown(item):
# Check if the test actually ran and passed or failed, but was not skipped or an expected failure (xfail) # Check if the test actually ran and passed or failed, but was not skipped or an expected failure (xfail)
outcome = getattr(item, "execution_outcome", None) outcome = getattr(item, "execution_outcome", None)

View file

@ -82,8 +82,7 @@ def wait_for_server_ready(base_url: str, timeout: int = 30, process: subprocess.
return False return False
@pytest.fixture(scope="session") def get_provider_data():
def provider_data():
# TODO: this needs to be generalized so each provider can have a sample provider data just # TODO: this needs to be generalized so each provider can have a sample provider data just
# like sample run config on which we can do replace_env_vars() # like sample run config on which we can do replace_env_vars()
keymap = { keymap = {
@ -178,8 +177,14 @@ def skip_if_no_model(request):
@pytest.fixture(scope="session") @pytest.fixture(scope="session")
def llama_stack_client(request, provider_data): def llama_stack_client(request):
config = request.config.getoption("--stack-config") client = request.session._llama_stack_client
assert client is not None, "llama_stack_client not found in session cache"
return client
def instantiate_llama_stack_client(session):
config = session.config.getoption("--stack-config")
if not config: if not config:
config = get_env_or_fail("LLAMA_STACK_CONFIG") config = get_env_or_fail("LLAMA_STACK_CONFIG")
@ -212,13 +217,13 @@ def llama_stack_client(request, provider_data):
print(f"Server is ready at {base_url}") print(f"Server is ready at {base_url}")
# Store process for potential cleanup (pytest will handle termination at session end) # Store process for potential cleanup (pytest will handle termination at session end)
request.session._llama_stack_server_process = server_process session._llama_stack_server_process = server_process
else: else:
print(f"Port {port} is already in use, assuming server is already running...") print(f"Port {port} is already in use, assuming server is already running...")
return LlamaStackClient( return LlamaStackClient(
base_url=base_url, base_url=base_url,
provider_data=provider_data, provider_data=get_provider_data(),
timeout=int(os.environ.get("LLAMA_STACK_CLIENT_TIMEOUT", "30")), timeout=int(os.environ.get("LLAMA_STACK_CLIENT_TIMEOUT", "30")),
) )
@ -228,7 +233,7 @@ def llama_stack_client(request, provider_data):
if parsed_url.scheme and parsed_url.netloc: if parsed_url.scheme and parsed_url.netloc:
return LlamaStackClient( return LlamaStackClient(
base_url=config, base_url=config,
provider_data=provider_data, provider_data=get_provider_data(),
) )
except Exception: except Exception:
# If URL parsing fails, treat as non-URL config # If URL parsing fails, treat as non-URL config
@ -243,7 +248,7 @@ def llama_stack_client(request, provider_data):
client = LlamaStackAsLibraryClient( client = LlamaStackAsLibraryClient(
config, config,
provider_data=provider_data, provider_data=get_provider_data(),
skip_logger_removal=True, skip_logger_removal=True,
) )
if not client.initialize(): if not client.initialize():
@ -258,8 +263,17 @@ def openai_client(client_with_models):
return OpenAI(base_url=base_url, api_key="fake") return OpenAI(base_url=base_url, api_key="fake")
@pytest.fixture(params=["openai_client", "llama_stack_client"]) @pytest.fixture(params=["openai_client", "client_with_models"])
def compat_client(request): def compat_client(request, client_with_models):
if isinstance(client_with_models, LlamaStackAsLibraryClient):
# OpenAI client expects a server, so unless we also rewrite OpenAI client's requests
# to go via the Stack library client (which itself rewrites requests to be served inline),
# we cannot do this.
#
# This means when we are using Stack as a library, we will test only via the Llama Stack client.
# When we are using a server setup, we can exercise both OpenAI and Llama Stack clients.
pytest.skip("(OpenAI) Compat client cannot be used with Stack library client")
return request.getfixturevalue(request.param) return request.getfixturevalue(request.param)

View file

@ -6,9 +6,6 @@
import pytest import pytest
from openai import OpenAI
from llama_stack.core.library_client import LlamaStackAsLibraryClient
from ..test_cases.test_case import TestCase from ..test_cases.test_case import TestCase
@ -59,9 +56,6 @@ def skip_if_model_doesnt_support_suffix(client_with_models, model_id):
def skip_if_model_doesnt_support_openai_chat_completion(client_with_models, model_id): def skip_if_model_doesnt_support_openai_chat_completion(client_with_models, model_id):
if isinstance(client_with_models, LlamaStackAsLibraryClient):
pytest.skip("OpenAI chat completions are not supported when testing with library client yet.")
provider = provider_from_model(client_with_models, model_id) provider = provider_from_model(client_with_models, model_id)
if provider.provider_type in ( if provider.provider_type in (
"inline::meta-reference", "inline::meta-reference",
@ -90,17 +84,6 @@ def skip_if_provider_isnt_openai(client_with_models, model_id):
) )
@pytest.fixture
def openai_client(client_with_models):
base_url = f"{client_with_models.base_url}/v1/openai/v1"
return OpenAI(base_url=base_url, api_key="bar")
@pytest.fixture(params=["openai_client", "llama_stack_client"])
def compat_client(request):
return request.getfixturevalue(request.param)
@pytest.mark.parametrize( @pytest.mark.parametrize(
"test_case", "test_case",
[ [

View file

@ -14,7 +14,7 @@
"models": [ "models": [
{ {
"model": "nomic-embed-text:latest", "model": "nomic-embed-text:latest",
"modified_at": "2025-08-04T15:54:50.584357-07:00", "modified_at": "2025-08-05T14:04:07.946926-07:00",
"digest": "0a109f422b47e3a30ba2b10eca18548e944e8a23073ee3f3e947efcf3c45e59f", "digest": "0a109f422b47e3a30ba2b10eca18548e944e8a23073ee3f3e947efcf3c45e59f",
"size": 274302450, "size": 274302450,
"details": { "details": {
@ -28,9 +28,41 @@
"quantization_level": "F16" "quantization_level": "F16"
} }
}, },
{
"model": "llama3.2-vision:11b",
"modified_at": "2025-07-30T18:45:02.517873-07:00",
"digest": "6f2f9757ae97e8a3f8ea33d6adb2b11d93d9a35bef277cd2c0b1b5af8e8d0b1e",
"size": 7816589186,
"details": {
"parent_model": "",
"format": "gguf",
"family": "mllama",
"families": [
"mllama"
],
"parameter_size": "10.7B",
"quantization_level": "Q4_K_M"
}
},
{
"model": "llama3.2-vision:latest",
"modified_at": "2025-07-29T20:18:47.920468-07:00",
"digest": "6f2f9757ae97e8a3f8ea33d6adb2b11d93d9a35bef277cd2c0b1b5af8e8d0b1e",
"size": 7816589186,
"details": {
"parent_model": "",
"format": "gguf",
"family": "mllama",
"families": [
"mllama"
],
"parameter_size": "10.7B",
"quantization_level": "Q4_K_M"
}
},
{ {
"model": "llama-guard3:1b", "model": "llama-guard3:1b",
"modified_at": "2025-08-01T15:46:28.963517-07:00", "modified_at": "2025-07-25T14:39:44.978630-07:00",
"digest": "494147e06bf99e10dbe67b63a07ac81c162f18ef3341aa3390007ac828571b3b", "digest": "494147e06bf99e10dbe67b63a07ac81c162f18ef3341aa3390007ac828571b3b",
"size": 1600181919, "size": 1600181919,
"details": { "details": {
@ -46,7 +78,7 @@
}, },
{ {
"model": "all-minilm:l6-v2", "model": "all-minilm:l6-v2",
"modified_at": "2025-07-29T15:07:06.295748-07:00", "modified_at": "2025-07-24T15:15:11.129290-07:00",
"digest": "1b226e2802dbb772b5fc32a58f103ca1804ef7501331012de126ab22f67475ef", "digest": "1b226e2802dbb772b5fc32a58f103ca1804ef7501331012de126ab22f67475ef",
"size": 45960996, "size": 45960996,
"details": { "details": {
@ -61,26 +93,10 @@
} }
}, },
{ {
"model": "all-minilm:latest", "model": "llama3.2:1b",
"modified_at": "2025-06-04T12:06:43.990073-07:00", "modified_at": "2025-07-17T22:02:24.953208-07:00",
"digest": "1b226e2802dbb772b5fc32a58f103ca1804ef7501331012de126ab22f67475ef", "digest": "baf6a787fdffd633537aa2eb51cfd54cb93ff08e28040095462bb63daf552878",
"size": 45960996, "size": 1321098329,
"details": {
"parent_model": "",
"format": "gguf",
"family": "bert",
"families": [
"bert"
],
"parameter_size": "23M",
"quantization_level": "F16"
}
},
{
"model": "llama3.1:8b-instruct-fp16",
"modified_at": "2025-02-14T15:23:24.865395-08:00",
"digest": "4aacac4194543ff7f70dab3f2ebc169c132d5319bb36f7a7e99c4ff525ebcc09",
"size": 16068910253,
"details": { "details": {
"parent_model": "", "parent_model": "",
"format": "gguf", "format": "gguf",
@ -88,13 +104,45 @@
"families": [ "families": [
"llama" "llama"
], ],
"parameter_size": "8.0B", "parameter_size": "1.2B",
"quantization_level": "Q8_0"
}
},
{
"model": "all-minilm:latest",
"modified_at": "2025-06-03T16:50:10.946583-07:00",
"digest": "1b226e2802dbb772b5fc32a58f103ca1804ef7501331012de126ab22f67475ef",
"size": 45960996,
"details": {
"parent_model": "",
"format": "gguf",
"family": "bert",
"families": [
"bert"
],
"parameter_size": "23M",
"quantization_level": "F16" "quantization_level": "F16"
} }
}, },
{
"model": "llama3.2:3b",
"modified_at": "2025-05-01T11:15:23.797447-07:00",
"digest": "a80c4f17acd55265feec403c7aef86be0c25983ab279d83f3bcd3abbcb5b8b72",
"size": 2019393189,
"details": {
"parent_model": "",
"format": "gguf",
"family": "llama",
"families": [
"llama"
],
"parameter_size": "3.2B",
"quantization_level": "Q4_K_M"
}
},
{ {
"model": "llama3.2:3b-instruct-fp16", "model": "llama3.2:3b-instruct-fp16",
"modified_at": "2025-01-21T13:46:43.514008-08:00", "modified_at": "2025-04-30T15:33:48.939665-07:00",
"digest": "195a8c01d91ec3cb1e0aad4624a51f2602c51fa7d96110f8ab5a20c84081804d", "digest": "195a8c01d91ec3cb1e0aad4624a51f2602c51fa7d96110f8ab5a20c84081804d",
"size": 6433703586, "size": 6433703586,
"details": { "details": {

File diff suppressed because it is too large Load diff

View file

@ -16,9 +16,9 @@
"model": "llama3.2:3b-instruct-fp16", "model": "llama3.2:3b-instruct-fp16",
"name": "llama3.2:3b-instruct-fp16", "name": "llama3.2:3b-instruct-fp16",
"digest": "195a8c01d91ec3cb1e0aad4624a51f2602c51fa7d96110f8ab5a20c84081804d", "digest": "195a8c01d91ec3cb1e0aad4624a51f2602c51fa7d96110f8ab5a20c84081804d",
"expires_at": "2025-08-04T16:00:57.955349-07:00", "expires_at": "2025-08-05T14:12:18.480323-07:00",
"size": 8581748736, "size": 7919570944,
"size_vram": 8581748736, "size_vram": 7919570944,
"details": { "details": {
"parent_model": "", "parent_model": "",
"format": "gguf", "format": "gguf",
@ -29,6 +29,24 @@
"parameter_size": "3.2B", "parameter_size": "3.2B",
"quantization_level": "F16" "quantization_level": "F16"
} }
},
{
"model": "all-minilm:l6-v2",
"name": "all-minilm:l6-v2",
"digest": "1b226e2802dbb772b5fc32a58f103ca1804ef7501331012de126ab22f67475ef",
"expires_at": "2025-08-05T14:10:20.883978-07:00",
"size": 590204928,
"size_vram": 590204928,
"details": {
"parent_model": "",
"format": "gguf",
"family": "bert",
"families": [
"bert"
],
"parameter_size": "23M",
"quantization_level": "F16"
}
} }
] ]
} }

View file

@ -0,0 +1,421 @@
{
"request": {
"method": "POST",
"url": "http://localhost:11434/api/embeddings",
"headers": {},
"body": {
"model": "all-minilm:l6-v2",
"input": [
"Python programming language"
]
},
"endpoint": "/api/embeddings",
"model": "all-minilm:l6-v2"
},
"response": {
"body": {
"__type__": "ollama._types.EmbedResponse",
"__data__": {
"model": "all-minilm:l6-v2",
"created_at": null,
"done": null,
"done_reason": null,
"total_duration": 105895041,
"load_duration": 91634666,
"prompt_eval_count": 3,
"prompt_eval_duration": null,
"eval_count": null,
"eval_duration": null,
"embeddings": [
[
-0.063880146,
0.013411989,
-0.054502595,
0.01193493,
-0.074262686,
-0.13344447,
0.04294062,
0.045387108,
-0.06949706,
-0.035939943,
0.01200873,
0.0068830596,
0.08886977,
0.0026030506,
0.032482542,
-0.007821568,
-0.05044649,
0.006662123,
0.027794942,
-0.12791364,
0.00062353734,
0.045270294,
-0.03605076,
0.044243146,
0.0129354475,
-0.0092799105,
0.011904844,
0.026060482,
0.020055141,
-0.03368774,
-0.028043076,
0.087557025,
0.059002083,
0.053893365,
0.02027196,
0.06840361,
-0.03180594,
-0.087597735,
-0.11277839,
0.022651086,
-0.09037903,
-0.0033202847,
-0.040132593,
-0.034084503,
-0.032953303,
0.02925268,
-0.03903928,
0.04551951,
-0.0331016,
-0.006518362,
-0.09629851,
-0.011739161,
-0.052575007,
-0.064773224,
0.031043475,
-0.012586444,
0.09737276,
0.005224713,
-0.035071153,
-0.1404299,
-0.06678175,
0.03654573,
-0.039277818,
0.07014256,
-0.0010227569,
-0.026846789,
-0.0175696,
0.03044068,
0.06403526,
-0.031643596,
-0.14598879,
-0.045400888,
-0.018469285,
0.06689445,
0.030553635,
-0.12255281,
0.061046645,
-0.05678168,
-0.005118667,
-0.0087622,
0.006514719,
-0.016424034,
-0.033650044,
0.08491301,
-0.00029260007,
-0.07339515,
0.038627055,
0.15695965,
0.010035773,
0.025318887,
-0.0021428047,
-0.04613549,
0.06244243,
-0.019905778,
-0.05471386,
0.09796629,
0.0384793,
-0.072424814,
-0.038704097,
0.07158691,
0.007360897,
-0.05120446,
0.0313513,
-0.032230332,
0.039326303,
-0.009643992,
0.069905065,
-0.052026685,
0.049440835,
-0.04272916,
-0.0037707465,
-0.04155246,
-0.0561972,
-0.03340213,
0.05105359,
0.038616214,
-0.0029470131,
0.08188407,
-0.0035886324,
0.04530431,
0.0068888925,
0.016499842,
0.016347302,
0.007283021,
-0.021663606,
-0.0046215886,
-0.007931065,
-4.1536508e-33,
-0.045777988,
-0.050903402,
-0.038634304,
0.0100991195,
0.070007294,
-0.025182785,
0.1050647,
-0.0049731904,
-0.064141616,
-0.047639705,
0.012718577,
0.05198462,
-0.016051587,
0.08170543,
0.024008816,
-0.020879291,
0.045706064,
0.091577366,
0.02512945,
0.019055998,
0.048144504,
0.097951256,
0.034154113,
0.03543114,
0.011410896,
-0.043446988,
-0.0041784984,
-0.05564714,
0.01147717,
0.0071039577,
-0.06426582,
-0.020623188,
-0.0045247558,
-0.012943628,
0.02658834,
-0.012385487,
0.008399212,
-0.06824828,
0.04683057,
-0.04165085,
-0.025662417,
-0.0038799767,
0.05007075,
-0.008117481,
-0.023308154,
0.023914568,
0.0015741173,
0.046142872,
-0.06898886,
0.041611847,
0.0045286645,
-0.047628563,
0.054236773,
0.06972688,
-0.016889753,
0.04806098,
0.012714234,
0.0022186628,
-0.006355918,
-0.031550523,
0.023726372,
0.06859327,
0.077228814,
-0.01227583,
0.03901903,
0.034360897,
0.03032876,
0.058690928,
0.08030179,
0.06976231,
-0.09047136,
0.02376998,
-0.008751518,
0.038334776,
-0.02751323,
0.023137644,
0.027101006,
-0.08135271,
-0.010334998,
0.04730408,
-0.02033998,
-0.026008504,
-0.017415512,
-0.0035714875,
-0.018727385,
-0.037389226,
0.041064497,
0.05317889,
-0.0055602547,
-0.058561854,
-0.072036326,
-0.075019896,
0.04825644,
0.011348427,
-0.02259257,
1.3515749e-33,
0.006240622,
0.031606406,
-0.036119435,
-0.0016494404,
-0.08255665,
-0.06069396,
0.059934463,
0.014492232,
0.059514895,
0.027053975,
-0.011601325,
-0.057609312,
0.10365583,
-0.002784741,
0.07693759,
0.019432511,
-0.052210074,
0.015158053,
-0.0012768542,
0.027789148,
-0.115292676,
0.047323048,
-0.07599195,
-0.074344486,
-0.029194841,
-0.020079462,
-0.034749795,
-0.05769437,
-0.0301632,
0.04749987,
0.012206333,
0.011497502,
-0.051970575,
0.05972769,
0.03281016,
0.0013676677,
0.057720944,
-0.041179247,
-0.02150875,
-0.0067487382,
0.1419711,
0.05795878,
0.010094941,
0.09603845,
0.014521089,
0.02133803,
-0.07551916,
0.07887724,
-0.04273237,
-0.06601746,
-0.038729392,
-0.008161129,
0.015012324,
-0.049418066,
-0.037083283,
-0.02378242,
0.03743137,
0.008194503,
-0.086978436,
-0.05960285,
-0.07732487,
-0.056507926,
0.029065313,
0.0073954053,
-0.077878684,
0.0026059505,
-0.10405392,
-0.04738624,
-0.015872862,
-0.11591199,
0.09724705,
0.0049243565,
-0.010273523,
0.0066429917,
-0.060295314,
0.02550513,
-0.052950058,
-0.0038489713,
-0.050250847,
0.07679287,
0.046089787,
0.007386997,
0.0046740095,
0.07385862,
-0.07792065,
0.0013675193,
0.013730894,
0.05658653,
0.021934126,
0.007195913,
0.0076705213,
0.10221154,
0.060060997,
0.036779005,
-0.037765697,
-1.187368e-08,
-0.00885571,
0.01760442,
0.062224448,
0.032051455,
-0.011581793,
0.051908698,
-0.011685676,
-0.06391574,
-0.029866237,
0.03258576,
0.0055078953,
-0.012040446,
-0.054406017,
-0.056690563,
-0.030638037,
0.14276367,
0.028526368,
-0.028743364,
0.019917691,
0.025652615,
0.073813364,
-0.0066998666,
0.0061508445,
0.09610696,
-0.08799916,
-0.0089272335,
0.03823298,
0.04832936,
0.018829934,
-0.10534708,
0.048226915,
-0.02225069,
0.020491786,
0.014641141,
0.030794447,
-0.029119467,
0.008283775,
-0.04506887,
0.0025344177,
0.021756247,
-0.008108281,
0.00904927,
-0.013340866,
-0.014037631,
0.06845187,
0.045173325,
-0.034587316,
-0.07275669,
-0.004159724,
-0.058231864,
-0.033032075,
0.0040235794,
-0.019985583,
-0.020122562,
0.055365406,
0.10250875,
-0.10799118,
-0.013780294,
-0.009652406,
0.015592658,
-0.031221472,
0.1329332,
0.15243866,
-0.022426173
]
]
}
},
"is_streaming": false
}
}

View file

@ -0,0 +1,674 @@
{
"request": {
"method": "POST",
"url": "http://localhost:11434/v1/chat/completions",
"headers": {},
"body": {
"model": "llama3.2:3b-instruct-fp16",
"messages": [
{
"role": "user",
"content": "Hello, world!"
}
],
"stream": true
},
"endpoint": "/chat/completions",
"model": "llama3.2:3b-instruct-fp16"
},
"response": {
"body": [
{
"__type__": "openai.types.chat.chat_completion_chunk.ChatCompletionChunk",
"__data__": {
"id": "chatcmpl-698",
"choices": [
{
"delta": {
"content": "Hello",
"function_call": null,
"refusal": null,
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1754422171,
"model": "llama3.2:3b-instruct-fp16",
"object": "chat.completion.chunk",
"service_tier": null,
"system_fingerprint": "fp_ollama",
"usage": null
}
},
{
"__type__": "openai.types.chat.chat_completion_chunk.ChatCompletionChunk",
"__data__": {
"id": "chatcmpl-698",
"choices": [
{
"delta": {
"content": "!",
"function_call": null,
"refusal": null,
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1754422171,
"model": "llama3.2:3b-instruct-fp16",
"object": "chat.completion.chunk",
"service_tier": null,
"system_fingerprint": "fp_ollama",
"usage": null
}
},
{
"__type__": "openai.types.chat.chat_completion_chunk.ChatCompletionChunk",
"__data__": {
"id": "chatcmpl-698",
"choices": [
{
"delta": {
"content": " It",
"function_call": null,
"refusal": null,
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1754422171,
"model": "llama3.2:3b-instruct-fp16",
"object": "chat.completion.chunk",
"service_tier": null,
"system_fingerprint": "fp_ollama",
"usage": null
}
},
{
"__type__": "openai.types.chat.chat_completion_chunk.ChatCompletionChunk",
"__data__": {
"id": "chatcmpl-698",
"choices": [
{
"delta": {
"content": "'s",
"function_call": null,
"refusal": null,
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1754422171,
"model": "llama3.2:3b-instruct-fp16",
"object": "chat.completion.chunk",
"service_tier": null,
"system_fingerprint": "fp_ollama",
"usage": null
}
},
{
"__type__": "openai.types.chat.chat_completion_chunk.ChatCompletionChunk",
"__data__": {
"id": "chatcmpl-698",
"choices": [
{
"delta": {
"content": " nice",
"function_call": null,
"refusal": null,
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1754422171,
"model": "llama3.2:3b-instruct-fp16",
"object": "chat.completion.chunk",
"service_tier": null,
"system_fingerprint": "fp_ollama",
"usage": null
}
},
{
"__type__": "openai.types.chat.chat_completion_chunk.ChatCompletionChunk",
"__data__": {
"id": "chatcmpl-698",
"choices": [
{
"delta": {
"content": " to",
"function_call": null,
"refusal": null,
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1754422171,
"model": "llama3.2:3b-instruct-fp16",
"object": "chat.completion.chunk",
"service_tier": null,
"system_fingerprint": "fp_ollama",
"usage": null
}
},
{
"__type__": "openai.types.chat.chat_completion_chunk.ChatCompletionChunk",
"__data__": {
"id": "chatcmpl-698",
"choices": [
{
"delta": {
"content": " meet",
"function_call": null,
"refusal": null,
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1754422171,
"model": "llama3.2:3b-instruct-fp16",
"object": "chat.completion.chunk",
"service_tier": null,
"system_fingerprint": "fp_ollama",
"usage": null
}
},
{
"__type__": "openai.types.chat.chat_completion_chunk.ChatCompletionChunk",
"__data__": {
"id": "chatcmpl-698",
"choices": [
{
"delta": {
"content": " you",
"function_call": null,
"refusal": null,
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1754422171,
"model": "llama3.2:3b-instruct-fp16",
"object": "chat.completion.chunk",
"service_tier": null,
"system_fingerprint": "fp_ollama",
"usage": null
}
},
{
"__type__": "openai.types.chat.chat_completion_chunk.ChatCompletionChunk",
"__data__": {
"id": "chatcmpl-698",
"choices": [
{
"delta": {
"content": ".",
"function_call": null,
"refusal": null,
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1754422171,
"model": "llama3.2:3b-instruct-fp16",
"object": "chat.completion.chunk",
"service_tier": null,
"system_fingerprint": "fp_ollama",
"usage": null
}
},
{
"__type__": "openai.types.chat.chat_completion_chunk.ChatCompletionChunk",
"__data__": {
"id": "chatcmpl-698",
"choices": [
{
"delta": {
"content": " Is",
"function_call": null,
"refusal": null,
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1754422171,
"model": "llama3.2:3b-instruct-fp16",
"object": "chat.completion.chunk",
"service_tier": null,
"system_fingerprint": "fp_ollama",
"usage": null
}
},
{
"__type__": "openai.types.chat.chat_completion_chunk.ChatCompletionChunk",
"__data__": {
"id": "chatcmpl-698",
"choices": [
{
"delta": {
"content": " there",
"function_call": null,
"refusal": null,
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1754422171,
"model": "llama3.2:3b-instruct-fp16",
"object": "chat.completion.chunk",
"service_tier": null,
"system_fingerprint": "fp_ollama",
"usage": null
}
},
{
"__type__": "openai.types.chat.chat_completion_chunk.ChatCompletionChunk",
"__data__": {
"id": "chatcmpl-698",
"choices": [
{
"delta": {
"content": " something",
"function_call": null,
"refusal": null,
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1754422171,
"model": "llama3.2:3b-instruct-fp16",
"object": "chat.completion.chunk",
"service_tier": null,
"system_fingerprint": "fp_ollama",
"usage": null
}
},
{
"__type__": "openai.types.chat.chat_completion_chunk.ChatCompletionChunk",
"__data__": {
"id": "chatcmpl-698",
"choices": [
{
"delta": {
"content": " I",
"function_call": null,
"refusal": null,
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1754422171,
"model": "llama3.2:3b-instruct-fp16",
"object": "chat.completion.chunk",
"service_tier": null,
"system_fingerprint": "fp_ollama",
"usage": null
}
},
{
"__type__": "openai.types.chat.chat_completion_chunk.ChatCompletionChunk",
"__data__": {
"id": "chatcmpl-698",
"choices": [
{
"delta": {
"content": " can",
"function_call": null,
"refusal": null,
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1754422171,
"model": "llama3.2:3b-instruct-fp16",
"object": "chat.completion.chunk",
"service_tier": null,
"system_fingerprint": "fp_ollama",
"usage": null
}
},
{
"__type__": "openai.types.chat.chat_completion_chunk.ChatCompletionChunk",
"__data__": {
"id": "chatcmpl-698",
"choices": [
{
"delta": {
"content": " help",
"function_call": null,
"refusal": null,
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1754422172,
"model": "llama3.2:3b-instruct-fp16",
"object": "chat.completion.chunk",
"service_tier": null,
"system_fingerprint": "fp_ollama",
"usage": null
}
},
{
"__type__": "openai.types.chat.chat_completion_chunk.ChatCompletionChunk",
"__data__": {
"id": "chatcmpl-698",
"choices": [
{
"delta": {
"content": " you",
"function_call": null,
"refusal": null,
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1754422172,
"model": "llama3.2:3b-instruct-fp16",
"object": "chat.completion.chunk",
"service_tier": null,
"system_fingerprint": "fp_ollama",
"usage": null
}
},
{
"__type__": "openai.types.chat.chat_completion_chunk.ChatCompletionChunk",
"__data__": {
"id": "chatcmpl-698",
"choices": [
{
"delta": {
"content": " with",
"function_call": null,
"refusal": null,
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1754422172,
"model": "llama3.2:3b-instruct-fp16",
"object": "chat.completion.chunk",
"service_tier": null,
"system_fingerprint": "fp_ollama",
"usage": null
}
},
{
"__type__": "openai.types.chat.chat_completion_chunk.ChatCompletionChunk",
"__data__": {
"id": "chatcmpl-698",
"choices": [
{
"delta": {
"content": " or",
"function_call": null,
"refusal": null,
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1754422172,
"model": "llama3.2:3b-instruct-fp16",
"object": "chat.completion.chunk",
"service_tier": null,
"system_fingerprint": "fp_ollama",
"usage": null
}
},
{
"__type__": "openai.types.chat.chat_completion_chunk.ChatCompletionChunk",
"__data__": {
"id": "chatcmpl-698",
"choices": [
{
"delta": {
"content": " would",
"function_call": null,
"refusal": null,
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1754422172,
"model": "llama3.2:3b-instruct-fp16",
"object": "chat.completion.chunk",
"service_tier": null,
"system_fingerprint": "fp_ollama",
"usage": null
}
},
{
"__type__": "openai.types.chat.chat_completion_chunk.ChatCompletionChunk",
"__data__": {
"id": "chatcmpl-698",
"choices": [
{
"delta": {
"content": " you",
"function_call": null,
"refusal": null,
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1754422172,
"model": "llama3.2:3b-instruct-fp16",
"object": "chat.completion.chunk",
"service_tier": null,
"system_fingerprint": "fp_ollama",
"usage": null
}
},
{
"__type__": "openai.types.chat.chat_completion_chunk.ChatCompletionChunk",
"__data__": {
"id": "chatcmpl-698",
"choices": [
{
"delta": {
"content": " like",
"function_call": null,
"refusal": null,
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1754422172,
"model": "llama3.2:3b-instruct-fp16",
"object": "chat.completion.chunk",
"service_tier": null,
"system_fingerprint": "fp_ollama",
"usage": null
}
},
{
"__type__": "openai.types.chat.chat_completion_chunk.ChatCompletionChunk",
"__data__": {
"id": "chatcmpl-698",
"choices": [
{
"delta": {
"content": " to",
"function_call": null,
"refusal": null,
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1754422172,
"model": "llama3.2:3b-instruct-fp16",
"object": "chat.completion.chunk",
"service_tier": null,
"system_fingerprint": "fp_ollama",
"usage": null
}
},
{
"__type__": "openai.types.chat.chat_completion_chunk.ChatCompletionChunk",
"__data__": {
"id": "chatcmpl-698",
"choices": [
{
"delta": {
"content": " chat",
"function_call": null,
"refusal": null,
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1754422172,
"model": "llama3.2:3b-instruct-fp16",
"object": "chat.completion.chunk",
"service_tier": null,
"system_fingerprint": "fp_ollama",
"usage": null
}
},
{
"__type__": "openai.types.chat.chat_completion_chunk.ChatCompletionChunk",
"__data__": {
"id": "chatcmpl-698",
"choices": [
{
"delta": {
"content": "?",
"function_call": null,
"refusal": null,
"role": "assistant",
"tool_calls": null
},
"finish_reason": null,
"index": 0,
"logprobs": null
}
],
"created": 1754422172,
"model": "llama3.2:3b-instruct-fp16",
"object": "chat.completion.chunk",
"service_tier": null,
"system_fingerprint": "fp_ollama",
"usage": null
}
},
{
"__type__": "openai.types.chat.chat_completion_chunk.ChatCompletionChunk",
"__data__": {
"id": "chatcmpl-698",
"choices": [
{
"delta": {
"content": "",
"function_call": null,
"refusal": null,
"role": "assistant",
"tool_calls": null
},
"finish_reason": "stop",
"index": 0,
"logprobs": null
}
],
"created": 1754422172,
"model": "llama3.2:3b-instruct-fp16",
"object": "chat.completion.chunk",
"service_tier": null,
"system_fingerprint": "fp_ollama",
"usage": null
}
}
],
"is_streaming": true
}
}
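
The streamed recording above stores every ChatCompletionChunk verbatim; the assistant's full reply is simply the concatenation of each chunk's delta.content, with a final empty chunk carrying finish_reason "stop". A minimal sketch of reassembling it, assuming the recording has been saved locally as streaming_recording.json (the file name is illustrative and not part of this change):

import json

# Path is assumed for illustration; the file holds the streaming recording shown above.
with open("streaming_recording.json") as f:
    recording = json.load(f)

# Each list element is a serialized ChatCompletionChunk; the full reply is the
# concatenation of every chunk's delta.content (None is treated as empty).
chunks = recording["response"]["body"]
text = "".join(
    choice["delta"]["content"] or ""
    for chunk in chunks
    for choice in chunk["__data__"]["choices"]
)

print(text)  # "Hello! It's nice to meet you. ..."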


@@ -0,0 +1,56 @@
{
"request": {
"method": "POST",
"url": "http://localhost:11434/v1/chat/completions",
"headers": {},
"body": {
"model": "llama3.2:3b-instruct-fp16",
"messages": [
{
"role": "user",
"content": "Hello, world!"
}
],
"stream": false
},
"endpoint": "/chat/completions",
"model": "llama3.2:3b-instruct-fp16"
},
"response": {
"body": {
"__type__": "openai.types.chat.chat_completion.ChatCompletion",
"__data__": {
"id": "chatcmpl-796",
"choices": [
{
"finish_reason": "stop",
"index": 0,
"logprobs": null,
"message": {
"content": "Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?",
"refusal": null,
"role": "assistant",
"annotations": null,
"audio": null,
"function_call": null,
"tool_calls": null
}
}
],
"created": 1754422173,
"model": "llama3.2:3b-instruct-fp16",
"object": "chat.completion",
"service_tier": null,
"system_fingerprint": "fp_ollama",
"usage": {
"completion_tokens": 26,
"prompt_tokens": 29,
"total_tokens": 55,
"completion_tokens_details": null,
"prompt_tokens_details": null
}
}
},
"is_streaming": false
}
}


@@ -11,10 +11,8 @@ from io import BytesIO
 import pytest
 from llama_stack_client import BadRequestError, LlamaStackClient
 from openai import BadRequestError as OpenAIBadRequestError
-from openai import OpenAI

 from llama_stack.apis.vector_io import Chunk
-from llama_stack.core.library_client import LlamaStackAsLibraryClient

 logger = logging.getLogger(__name__)

@@ -69,19 +67,6 @@ def skip_if_provider_doesnt_support_openai_vector_stores_search(client_with_mode
     )


-@pytest.fixture
-def openai_client(client_with_models):
-    base_url = f"{client_with_models.base_url}/v1/openai/v1"
-    return OpenAI(base_url=base_url, api_key="fake")
-
-
-@pytest.fixture(params=["openai_client", "llama_stack_client"])
-def compat_client(request, client_with_models):
-    if request.param == "openai_client" and isinstance(client_with_models, LlamaStackAsLibraryClient):
-        pytest.skip("OpenAI client tests not supported with library client")
-    return request.getfixturevalue(request.param)
-
-
 @pytest.fixture(scope="session")
 def sample_chunks():
     return [