# What does this PR do?
## Test Plan
# What does this PR do?
## Test Plan
# What does this PR do?
## Test Plan
Completes the refactoring started in previous commit by:
1. **Fix library client** (critical): Add logic to detect Pydantic model parameters
and construct them properly from request bodies. The key fix is to NOT exclude
any params when converting the body for Pydantic models - we need all fields
to pass to the Pydantic constructor.
Before: _convert_body excluded all params, leaving body empty for Pydantic construction
After: Check for Pydantic params first, skip exclusion, construct model with full body
2. **Update remaining providers** to use new Pydantic-based signatures:
- litellm_openai_mixin: Extract extra fields via __pydantic_extra__
- databricks: Use TYPE_CHECKING import for params type
- llama_openai_compat: Use TYPE_CHECKING import for params type
- sentence_transformers: Update method signatures to use params
3. **Update unit tests** to use new Pydantic signature:
- test_openai_mixin.py: Use OpenAIChatCompletionRequestParams
This fixes test failures where the library client was trying to construct
Pydantic models with empty dictionaries.
The previous fix had a bug: it called _convert_body() which only keeps fields
that match function parameter names. For Pydantic methods with signature:
openai_chat_completion(params: OpenAIChatCompletionRequestParams)
The signature only has 'params', but the body has 'model', 'messages', etc.
So _convert_body() returned an empty dict.
Fix: Skip _convert_body() entirely for Pydantic params. Use the raw body
directly to construct the Pydantic model (after stripping NOT_GIVENs).
This properly fixes the ValidationError where required fields were missing.
The streaming code path (_call_streaming) had the same issue as non-streaming:
it called _convert_body() which returned empty dict for Pydantic params.
Applied the same fix as commit 7476c0ae:
- Detect Pydantic model parameters before body conversion
- Skip _convert_body() for Pydantic params
- Construct Pydantic model directly from raw body (after stripping NOT_GIVENs)
This fixes streaming endpoints like openai_chat_completion with stream=True.
The streaming code path (_call_streaming) had the same issue as non-streaming:
it called _convert_body() which returned empty dict for Pydantic params.
Applied the same fix as commit 7476c0ae:
- Detect Pydantic model parameters before body conversion
- Skip _convert_body() for Pydantic params
- Construct Pydantic model directly from raw body (after stripping NOT_GIVENs)
This fixes streaming endpoints like openai_chat_completion with stream=True.
Propagate test IDs from client to server via HTTP headers to maintain
proper test isolation when running with server-based stack configs.
Without
this, recorded/replayed inference requests in server mode would leak
across
tests.
Changes:
- Patch client _prepare_request to inject test ID into provider data
header
- Sync test context from provider data on server side before storage
operations
- Set LLAMA_STACK_TEST_STACK_CONFIG_TYPE env var based on stack config
- Configure console width for cleaner log output in CI
- Add SQLITE_STORE_DIR temp directory for test data isolation
Uses test_id in request hashes and test-scoped subdirectories to prevent
cross-test contamination. Model list endpoints exclude test_id to enable
merging recordings from different servers.
Additionally, this PR adds a `record-if-missing` mode (which we will use
instead of `record` which records everything) which is very useful.
🤖 Co-authored with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude <noreply@anthropic.com>
# What does this PR do?
the nvidia datastore tests were running when the datastore was not
configured. they would always fail.
this introduces a skip when the nvidia datastore is not configured.
## Test Plan
ci
# What does this PR do?
This PR is generated with AI and reviewed by me.
Refactors the AuthorizedSqlStore class to store the access policy as an
instance variable rather than passing it as a parameter to each method
call. This simplifies the API.
# Test Plan
existing tests
# What does this PR do?
previously, developers who ran `./scripts/unit-tests.sh` would get
`asyncio-mode=auto`, which meant `@pytest.mark.asyncio` and
`@pytest_asyncio.fixture` were redundent. developers who ran `pytest`
directly would get pytest's default (strict mode), would run into errors
leading them to add `@pytest.mark.asyncio` / `@pytest_asyncio.fixture`
to their code.
with this change -
- `asyncio_mode=auto` is included in `pyproject.toml` making behavior
consistent for all invocations of pytest
- removes all redundant `@pytest_asyncio.fixture` and
`@pytest.mark.asyncio`
- for good measure, requires `pytest>=8.4` and `pytest-asyncio>=1.0`
## Test Plan
- `./scripts/unit-tests.sh`
- `uv run pytest tests/unit`
# What does this PR do?
The current authorized sql store implementation does not respect
user.principal (only checks attributes). This PR addresses that.
## Test Plan
Added test cases to integration tests.
# What does this PR do?
Implemetation of NeMO Datastore register, unregister API.
Open Issues:
- provider_id gets set to `localfs` in client.datasets.register() as it
is specified in routing_tables.py: DatasetsRoutingTable
see: #1860
Currently I have passed `"provider_id":"nvidia"` in metadata and have
parsed that in `DatasetsRoutingTable`
(Not the best approach, but just a quick workaround to make it work for
now.)
## Test Plan
- Unit test cases: `pytest
tests/unit/providers/nvidia/test_datastore.py`
```bash
========================================================== test session starts ===========================================================
platform linux -- Python 3.10.0, pytest-8.3.5, pluggy-1.5.0
rootdir: /home/ubuntu/llama-stack
configfile: pyproject.toml
plugins: anyio-4.9.0, asyncio-0.26.0, nbval-0.11.0, metadata-3.1.1, html-4.1.1, cov-6.1.0
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 2 items
tests/unit/providers/nvidia/test_datastore.py .. [100%]
============================================================ warnings summary ============================================================
====================================================== 2 passed, 1 warning in 0.84s ======================================================
```
cc: @dglogo, @mattf, @yanxi0830
# What does this PR do?
with the new /v1/providers API, /v1/inspect/providers is duplicative,
deprecate it by removing the route, and add a test for the full
/v1/providers API
resolves#1623
## Test Plan
`uv run pytest -v tests/integration/providers --stack-config=ollama
--text-model="meta-llama/Llama-3.2-3B-Instruct"
--embedding-model=all-MiniLM-L6-v2`
<img width="1512" alt="Screenshot 2025-03-18 at 9 18 38 AM"
src="https://github.com/user-attachments/assets/2db30f25-3ff6-4374-b39d-0047f093fe36"
/>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
currently the `inspect` API for providers is really a `list` API. Create
a new `providers` API which has a GET `providers/{provider_id}` inspect
API
which returns "user friendly" configuration to the end user. Also add a
GET `/providers` endpoint which returns the list of providers as
`inspect/providers` does today.
This API follows CRUD and is more intuitive/RESTful.
This work is part of the RFC at
https://github.com/meta-llama/llama-stack/pull/1359
sensitive fields are redacted using `redact_sensetive_fields` on the
server side before returning a response:
<img width="456" alt="Screenshot 2025-03-13 at 4 40 21 PM"
src="https://github.com/user-attachments/assets/9465c221-2a26-42f8-a08a-6ac4a9fecce8"
/>
## Test Plan
using https://github.com/meta-llama/llama-stack-client-python/pull/181 a
user is able to to run the following:
`llama stack build --template ollama --image-type venv`
`llama stack run --image-type venv
~/.llama/distributions/ollama/ollama-run.yaml`
`llama-stack-client providers inspect ollama`
<img width="378" alt="Screenshot 2025-03-13 at 4 39 35 PM"
src="https://github.com/user-attachments/assets/8273d05d-8bc3-44c6-9e4b-ef95e48d5466"
/>
also, was able to run the new test_list integration test locally with
ollama:
<img width="1509" alt="Screenshot 2025-03-13 at 11 03 40 AM"
src="https://github.com/user-attachments/assets/9b9db166-f02f-45b0-86a4-306d85149bc8"
/>
Signed-off-by: Charlie Doern <cdoern@redhat.com>