llama-stack

forked from phoenix-oss/llama-stack-mirror

Author	SHA1	Message	Date
Derek Higgins	2e807b38cc	chore: Add fixtures to conftest.py (#2067 ) Add fixtures for SqliteKVStore, DiskDistributionRegistry and CachedDiskDistributionRegistry. And use them in tests that had all been duplicating similar setups. ## Test Plan unit tests continue to run Signed-off-by: Derek Higgins <derekh@redhat.com>	2025-05-06 13:57:48 +02:00
ehhuang	4597145011	chore: remove recordable mock (#2088 ) # What does this PR do? We've disabled it for a while given that this hasn't worked as well as expected given the frequent changes of llama_stack_client and how this requires both repos to be in sync. ## Test Plan Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-05-05 10:08:55 -07:00
Ashwin Bharambe	d27a0f276c	fix: pytest.mark.skip, not pytest.skip	2025-05-04 13:22:06 -07:00
Ashwin Bharambe	c69f14bfaa	fix: disable rag_and_code_agent test because no code interpreter anymore	2025-05-03 14:29:06 -07:00
Ben Browning	f1b103e6c8	fix: openai_compat messages system/assistant non-str content (#2095 ) # What does this PR do? When converting OpenAI message content for the "system" and "assistant" roles to Llama Stack inference APIs (used for some providers when dealing with Llama models via OpenAI API requests to get proper prompt / tool handling), we were not properly converting any non-string content. I discovered this while running the new Responses AI verification suite against the Fireworks provider, but instead of fixing it as part of some ongoing work there split this out into a separate PR. This fixes that, by using the `openai_content_to_content` helper we used elsewhere to ensure content parts were mapped properly. ## Test Plan I added a couple of new tests to `test_openai_compat` to reproduce this issue and validate its fix. I ran those as below: ``` python -m pytest -s -v tests/unit/providers/utils/inference/test_openai_compat.py ``` Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-05-02 13:09:27 -07:00
Ashwin Bharambe	272d3359ee	fix: remove code interpreter implementation (#2087 ) # What does this PR do? The builtin implementation of code interpreter is not robust and has a really weak sandboxing shell (the `bubblewrap` container). Given the availability of better MCP code interpreter servers coming up, we should use them instead of baking an implementation into the Stack and expanding the vulnerability surface to the rest of the Stack. This PR only does the removal. We will add examples of how to integrate with MCPs in subsequent ones. ## Test Plan Existing tests.	2025-05-01 14:35:08 -07:00
Ihar Hrachyshka	9e6561a1ec	chore: enable pyupgrade fixes (#1806 ) # What does this PR do? The goal of this PR is code base modernization. Schema reflection code needed a minor adjustment to handle UnionTypes and collections.abc.AsyncIterator. (Both are preferred in recent Python releases.) Note to reviewers: almost all changes here are automatically generated by pyupgrade. Some additional unused imports were cleaned up. The only change worth noting is under `docs/openapi_generator` and `llama_stack/strong_typing/schema.py`, where reflection code was updated to deal with "newer" types. Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-05-01 14:23:50 -07:00
Matthew Farrellee	88a796ca5a	fix: allow use of models registered at runtime (#1980 ) # What does this PR do? fix a bug where models registered at runtime could not be used. ``` $ llama-stack-client models register test-model --provider-id nvidia --provider-model-id meta/llama-3.1-70b-instruct $ curl http://localhost:8321/v1/openai/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{ "model": "test-model", "messages": [{"role": "user", "content": "What is the weather like in Boston today?"}] }' =(client)=> {"detail":"Internal server error: An unexpected error occurred."} =(server)=> TypeError: Missing required arguments; Expected either ('messages' and 'model') or ('messages', 'model' and 'stream') arguments to be given ``` root cause: test-model is not added to ModelRegistryHelper's alias_to_provider_id_map. as part of the fix, this adds tests for ModelRegistryHelper and defines its expected behavior. user visible behavior changes - \| action \| existing behavior \| new behavior \| \| -- \| -- \| -- \| \| double register \| success (but no change) \| error \| \| register unknown \| success (fail when used) \| error \| existing behavior for register unknown model and double register - ``` $ llama-stack-client models register test-model --provider-id nvidia --provider-model-id meta/llama-3.1-70b-instruct-unknown Successfully registered model test-model $ llama-stack-client models list \| grep test-model │ llm │ test-model │ meta/llama-3.1-70b-instruct-unknown │ │ nv… │ $ llama-stack-client models register test-model --provider-id nvidia --provider-model-id meta/llama-3.1-70b-instruct Successfully registered model test-model $ llama-stack-client models list \| grep test-model │ llm │ test-model │ meta/llama-3.1-70b-instruct-unknown │ │ nv… │ ``` new behavior for register unknown - ``` $ llama-stack-client models register test-model --provider-id nvidia --provider-model-id meta/llama-3.1-70b-instruct-unknown 
╭──────────────────────────────────────────────────────────────────────────────────────────────────╮ │ Failed to register model │ │ │ │ Error Type: BadRequestError │ │ Details: Error code: 400 - {'detail': "Invalid value: Model id │ │ 'meta/llama-3.1-70b-instruct-unknown' is not supported. Supported ids are: │ │ meta/llama-3.1-70b-instruct, snowflake/arctic-embed-l, meta/llama-3.2-1b-instruct, │ │ nvidia/nv-embedqa-mistral-7b-v2, meta/llama-3.2-90b-vision-instruct, meta/llama-3.2-3b-instruct, │ │ meta/llama-3.2-11b-vision-instruct, meta/llama-3.1-405b-instruct, meta/llama3-8b-instruct, │ │ meta/llama3-70b-instruct, nvidia/llama-3.2-nv-embedqa-1b-v2, meta/llama-3.1-8b-instruct, │ │ nvidia/nv-embedqa-e5-v5"} │ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯ ``` new behavior for double register - ``` $ llama-stack-client models register test-model --provider-id nvidia --provider-model-id meta/llama-3.1-70b-instruct Successfully registered model test-model $ llama-stack-client models register test-model --provider-id nvidia --provider-model-id meta/llama-3.2-1b-instruct ╭──────────────────────────────────────────────────────────────────────────────────────────────────╮ │ Failed to register model │ │ │ │ Error Type: BadRequestError │ │ Details: Error code: 400 - {'detail': "Invalid value: Model id 'test-model' is already │ │ registered. Please use a different id or unregister it first."} │ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯ ``` ## Test Plan ``` uv run pytest -v tests/unit/providers/utils/test_model_registry.py ```	2025-05-01 12:00:58 -07:00
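The behavior table in 88a796ca5a (both "double register" and "register unknown" now return an error) can be illustrated with a small sketch. This is not the actual `ModelRegistryHelper`: the class name `SketchModelRegistry`, its method names, and the use of `ValueError` are assumptions made for illustration only.

```python
class SketchModelRegistry:
    """Hypothetical stand-in for a provider-side model registry."""

    def __init__(self, supported_ids):
        # Provider model ids the backend is known to serve.
        self._supported = set(supported_ids)
        # Maps a user-chosen alias (e.g. "test-model") to a provider model id.
        self._alias_to_provider_id = {}

    def register(self, model_id, provider_model_id):
        # "register unknown": reject ids the provider does not support.
        if provider_model_id not in self._supported:
            raise ValueError(f"Model id '{provider_model_id}' is not supported.")
        # "double register": reject an alias that is already taken.
        if model_id in self._alias_to_provider_id:
            raise ValueError(f"Model id '{model_id}' is already registered.")
        self._alias_to_provider_id[model_id] = provider_model_id

    def resolve(self, model_id):
        # Runtime-registered aliases must be resolvable at inference time.
        return self._alias_to_provider_id[model_id]
```

Rejecting both cases at registration time surfaces the problem immediately instead of at the first inference request.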
Derek Higgins	64829947d0	feat: Add temperature support to responses API (#2065 ) # What does this PR do? Add support for the temperature to the responses API ## Test Plan Manually tested simple case unit tests added for simple case and tool calls Signed-off-by: Derek Higgins <derekh@redhat.com>	2025-05-01 11:47:58 -07:00
Ben Browning	6378c2a2f3	fix: resolve BuiltinTools to strings for vllm tool_call messages (#2071 ) # What does this PR do? When the result of a ToolCall gets passed back into vLLM for the model to handle the tool call result (as is often the case in agentic tool-calling workflows), we forgot to handle the case where BuiltinTool calls are not string values but instead instances of the BuiltinTool enum. This fixes that, properly converting those enums to string values before trying to serialize them into an OpenAI chat completion request to vLLM. PR #1931 fixed a bug where we weren't passing these tool calling results back into vLLM, but as a side-effect it created this serialization bug when using BuiltinTools. Closes #2070 ## Test Plan I added a new unit test to the openai_compat unit tests to cover this scenario, ensured the new test failed before this fix, and all the existing tests there plus the new one passed with this fix. ``` python -m pytest -s -v tests/unit/providers/utils/inference/test_openai_compat.py ``` Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-05-01 08:47:29 -04:00
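The serialization problem described in 6378c2a2f3 can be reproduced in a few lines. The sketch below is an assumption-laden illustration, not the project's code: `SketchBuiltinTool`, `tool_name_to_str`, and `build_tool_call` are hypothetical names, and the enum members are placeholders.

```python
import json
from enum import Enum


class SketchBuiltinTool(Enum):
    # Hypothetical stand-in for the BuiltinTool enum mentioned in the commit.
    brave_search = "brave_search"
    wolfram_alpha = "wolfram_alpha"


def tool_name_to_str(tool_name):
    """Return a plain string whether the name is an enum member or already a str."""
    return tool_name.value if isinstance(tool_name, Enum) else tool_name


def build_tool_call(call_id, tool_name, arguments):
    """Build an OpenAI-style tool call dict that json.dumps can serialize."""
    return {
        "id": call_id,
        "type": "function",
        "function": {
            "name": tool_name_to_str(tool_name),
            "arguments": json.dumps(arguments),
        },
    }
```

Passing the enum member straight into the dict would make `json.dumps` raise a `TypeError`, which is why the name is normalized first.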
Sébastien Han	dc94433072	feat(pre-commit): enhance pre-commit hooks with additional checks (#2014 ) # What does this PR do? Add several new pre-commit hooks to improve code quality and security: - no-commit-to-branch: prevent direct commits to protected branches like `main` - check-yaml: validate YAML files - detect-private-key: prevent accidental commit of private keys - requirements-txt-fixer: maintain consistent requirements.txt format and sorting - mixed-line-ending: enforce LF line endings to avoid mixed line endings - check-executables-have-shebangs: ensure executable scripts have shebangs - check-json: validate JSON files - check-shebang-scripts-are-executable: verify shebang scripts are executable - check-symlinks: validate symlinks and report broken ones - check-toml: validate TOML files mainly for pyproject.toml The respective fixes have been included. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-04-30 11:35:49 -07:00
Jash Gulabrai	eab550f7d2	fix: Fix messages format in NVIDIA safety check request body (#2063 ) # What does this PR do? When running a Llama Stack server and invoking the `/v1/safety/run-shield` endpoint, the NVIDIA Guardrails endpoint in some cases errors with a `422: Unprocessable Entity` due to malformed input. For example, given a request body like: ``` { "model": "test", "messages": [ { "role": "user", "content": "You are stupid." } ] } ``` `convert_pydantic_to_json_value` converts the message to: ``` { "role": "user", "content": "You are stupid.", "context": null } ``` Which causes NVIDIA Guardrails to return an error `HTTPError: 422 Client Error: Unprocessable Entity for url: http://nemo.test/v1/guardrail/checks`, because `context` shouldn't be included in the body. ## Test Plan I ran the Llama Stack server locally and manually verified that the endpoint now succeeds. ``` message = {"role": "user", "content": "You are stupid."} response = client.safety.run_shield(messages=[message], shield_id=shield_id, params={}) ``` Server logs: ``` 14:29:09.656 [START] /v1/safety/run-shield INFO: 127.0.0.1:54616 - "POST /v1/safety/run-shield HTTP/1.1" 200 OK 14:29:09.918 [END] /v1/safety/run-shield [StatusCode.OK] (262.26ms ``` Co-authored-by: Jash Gulabrai <jgulabrai@nvidia.com>	2025-04-30 18:01:28 +02:00
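One way to keep a `null` field such as `context` out of a request body is to drop `None` values before sending. The commit message for eab550f7d2 does not say how the fix was implemented, so the helper below (`drop_none_fields`, a hypothetical name) is only a sketch of the general idea.

```python
def drop_none_fields(message: dict) -> dict:
    """Return a copy of the message without keys whose value is None."""
    return {key: value for key, value in message.items() if value is not None}


def build_request_body(model: str, messages: list) -> dict:
    """Assemble a request body in which no message carries null fields."""
    return {"model": model, "messages": [drop_none_fields(m) for m in messages]}
```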
Derek Higgins	78ef6a6099	chore: Increase unit test coverage of routing_tables.py (#2057 ) # What does this PR do? Adds some unit tests for the routing logic ## Test Plan Overall unit test coverage goes from TOTAL 12434 8030 35% to TOTAL 12434 7871 37% Better coverage on router.py, before: ``` llama_stack/distribution/routers/routers.py \| 342 \| 219 \| 0 \| 36% llama_stack/distribution/routers/routing_tables.py \| 346 \| 236 \| 0 \| 32% ``` After: ``` llama_stack/distribution/routers/routers.py \| 342 \| 219 \| 0 \| 36% llama_stack/distribution/routers/routing_tables.py \| 349 \| 89 \| 0 \| 74% ``` Signed-off-by: Derek Higgins <derekh@redhat.com>	2025-04-30 16:00:43 +02:00
Roland Huß	5a2bfd6ad5	refactor: Replace SQLITE_DB_PATH by SQLITE_STORE_DIR env in templates (#2055 ) # What does this PR do? The telemetry provider config is the only one that uses the env var `SQLITE_DB_PATH` to point to persistent data in the respective templates, whereas `SQLITE_STORE_DIR` is used everywhere else. This PR modifies the `sqlite_db_path` in various telemetry configuration files to use the environment variable `SQLITE_STORE_DIR` instead of `SQLITE_DB_PATH`. This change ensures that _only_ the SQLITE_STORE_DIR needs to be set to point to a different persistence location for providers. All references to `SQLITE_DB_PATH` have been removed. Another improvement could be to rename `sqlite_db_path` to `db_path` in the telemetry provider config, to align with the other provider configurations. That could be done in another PR (if wanted).	2025-04-29 15:28:10 -07:00
Ben Browning	8dfce2f596	feat: OpenAI Responses API (#1989 ) # What does this PR do? This provides an initial [OpenAI Responses API](https://platform.openai.com/docs/api-reference/responses) implementation. The API is not yet complete, and this is more a proof-of-concept to show how we can store responses in our key-value stores and use them to support the Responses API concepts like `previous_response_id`. ## Test Plan I've added a new `tests/integration/openai_responses/test_openai_responses.py` as part of a test-driven development for this new API. I'm only testing this locally with the remote-vllm provider for now, but it should work with any of our inference providers since the only API it requires out of the inference provider is the `openai_chat_completion` endpoint. ``` VLLM_URL="http://localhost:8000/v1" \ INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \ llama stack build --template remote-vllm --image-type venv --run ``` ``` LLAMA_STACK_CONFIG="http://localhost:8321" \ python -m pytest -v \ tests/integration/openai_responses/test_openai_responses.py \ --text-model "meta-llama/Llama-3.2-3B-Instruct" ``` --------- Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-04-28 14:06:00 -07:00
Sébastien Han	79851d93aa	feat: Add Kubernetes authentication (#1778 ) # What does this PR do? This commit adds a new authentication system to the Llama Stack server with support for Kubernetes and custom authentication providers. Key changes include: - Implemented KubernetesAuthProvider for validating Kubernetes service account tokens - Implemented CustomAuthProvider for validating tokens against external endpoints - this is the same code that was already present. - Added test for Kubernetes - Updated server configuration to support authentication settings - Added documentation for authentication configuration and usage The authentication system supports: - Bearer token validation - Kubernetes service account token validation - Custom authentication endpoints ## Test Plan Setup a Kube cluster using Kind or Minikube. Run a server with: ``` server: port: 8321 auth: provider_type: kubernetes config: api_server_url: http://url ca_cert_path: path/to/cert (optional) ``` Run: ``` curl -s -L -H "Authorization: Bearer $(kubectl create token my-user)" http://127.0.0.1:8321/v1/providers ``` Or replace "my-user" with your service account. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-04-28 22:24:58 +02:00
Rashmi Pawar	e6bbf8d20b	feat: Add NVIDIA NeMo datastore (#1852 ) # What does this PR do? Implementation of the NeMo Datastore register and unregister APIs. Open Issues: - provider_id gets set to `localfs` in client.datasets.register() as it is specified in routing_tables.py: DatasetsRoutingTable see: #1860 Currently I have passed `"provider_id":"nvidia"` in metadata and have parsed that in `DatasetsRoutingTable` (Not the best approach, but just a quick workaround to make it work for now.) ## Test Plan - Unit test cases: `pytest tests/unit/providers/nvidia/test_datastore.py` ```bash ========================================================== test session starts =========================================================== platform linux -- Python 3.10.0, pytest-8.3.5, pluggy-1.5.0 rootdir: /home/ubuntu/llama-stack configfile: pyproject.toml plugins: anyio-4.9.0, asyncio-0.26.0, nbval-0.11.0, metadata-3.1.1, html-4.1.1, cov-6.1.0 asyncio: mode=strict, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function collected 2 items tests/unit/providers/nvidia/test_datastore.py .. [100%] ============================================================ warnings summary ============================================================ ====================================================== 2 passed, 1 warning in 0.84s ====================================================== ``` cc: @dglogo, @mattf, @yanxi0830	2025-04-28 09:41:59 -07:00
Ashwin Bharambe	bb1a85c9a0	fix: make sure test works equally well against llama stack as a server	2025-04-25 15:24:11 -07:00
Jash Gulabrai	8713d67ce3	fix: Correctly parse algorithm_config when launching NVIDIA customization job; fix internal request handler (#2025 ) # What does this PR do? This addresses 2 bugs I ran into when launching a fine-tuning job with the NVIDIA Adapter: 1. Session handling in `_make_request` helper function returns an error. ``` INFO: 127.0.0.1:55831 - "POST /v1/post-training/supervised-fine-tune HTTP/1.1" 500 Internal Server Error 16:11:45.643 [END] /v1/post-training/supervised-fine-tune [StatusCode.OK] (270.44ms) 16:11:45.643 [ERROR] Error executing endpoint route='/v1/post-training/supervised-fine-tune' method='post' Traceback (most recent call last): File "/Users/jgulabrai/Projects/forks/llama-stack/llama_stack/distribution/server/server.py", line 201, in endpoint return await maybe_await(value) File "/Users/jgulabrai/Projects/forks/llama-stack/llama_stack/distribution/server/server.py", line 161, in maybe_await return await value File "/Users/jgulabrai/Projects/forks/llama-stack/llama_stack/providers/remote/post_training/nvidia/post_training.py", line 408, in supervised_fine_tune response = await self._make_request( File "/Users/jgulabrai/Projects/forks/llama-stack/llama_stack/providers/remote/post_training/nvidia/post_training.py", line 98, in _make_request async with self.session.request(method, url, params=params, json=json, **kwargs) as response: File "/Users/jgulabrai/Projects/forks/llama-stack/.venv/lib/python3.10/site-packages/aiohttp/client.py", line 1425, in __aenter__ self._resp: _RetType = await self._coro File "/Users/jgulabrai/Projects/forks/llama-stack/.venv/lib/python3.10/site-packages/aiohttp/client.py", line 579, in _request handle = tm.start() File "/Users/jgulabrai/Projects/forks/llama-stack/.venv/lib/python3.10/site-packages/aiohttp/helpers.py", line 587, in start return self._loop.call_at(when, self.__call__) File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/base_events.py", line 724, in call_at 
self._check_closed() File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/base_events.py", line 510, in _check_closed raise RuntimeError('Event loop is closed') RuntimeError: Event loop is closed ``` Note: This only occurred when initializing the client like so: ``` client = LlamaStackClient( base_url="http://0.0.0.0:8321" ) response = client.post_training.supervised_fine_tune(...) # Returns error ``` I didn't run into this issue when using the library client: ``` client = LlamaStackAsLibraryClient("nvidia") client.initialize() response = client.post_training.supervised_fine_tune(...) # Works fine ``` 2. The `algorithm_config` param in `supervised_fine_tune` is parsed as a `dict` when run from unit tests, but a Pydantic model when invoked using the Llama Stack client. So, the call fails outside of unit tests: ``` INFO: 127.0.0.1:54024 - "POST /v1/post-training/supervised-fine-tune HTTP/1.1" 500 Internal Server Error 21:14:02.315 [END] /v1/post-training/supervised-fine-tune [StatusCode.OK] (71.18ms) 21:14:02.314 [ERROR] Error executing endpoint route='/v1/post-training/supervised-fine-tune' method='post' Traceback (most recent call last): File "/Users/jgulabrai/Projects/forks/llama-stack/llama_stack/distribution/server/server.py", line 205, in endpoint return await maybe_await(value) File "/Users/jgulabrai/Projects/forks/llama-stack/llama_stack/distribution/server/server.py", line 164, in maybe_await return await value File "/Users/jgulabrai/Projects/forks/llama-stack/llama_stack/providers/remote/post_training/nvidia/post_training.py", line 407, in supervised_fine_tune "adapter_dim": algorithm_config.get("adapter_dim"), File "/Users/jgulabrai/Projects/forks/llama-stack/.venv/lib/python3.10/site-packages/pydantic/main.py", line 891, in __getattr__ raise AttributeError(f'{type(self).__name__!r} object has no attribute {item!r}') AttributeError: 'LoraFinetuningConfig' object has no attribute 'get' ``` The code assumes `algorithm_config` 
should be `dict`, so I just handle both cases. ## Test Plan 1. I ran a local Llama Stack server with the necessary env vars: ``` llama stack run llama_stack/templates/nvidia/run.yaml --port 8321 --env ... ``` And invoked `supervised_fine_tune` to confirm neither of the errors above occurs. ``` client = LlamaStackClient( base_url="http://0.0.0.0:8321" ) response = client.post_training.supervised_fine_tune(...) ``` 2. I confirmed the unit tests still pass: `./scripts/unit-tests.sh tests/unit/providers/nvidia/test_supervised_fine_tuning.py` --------- Co-authored-by: Jash Gulabrai <jgulabrai@nvidia.com>	2025-04-25 13:21:50 -07:00
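The second bug in 8713d67ce3 (`algorithm_config` arriving as a `dict` in unit tests but as a Pydantic model via the client) comes down to reading a field from either shape. A minimal sketch, assuming a hypothetical helper `get_config_field` and using a dataclass as a stand-in for the Pydantic model:

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class SketchLoraConfig:
    # Hypothetical stand-in for a model object such as LoraFinetuningConfig.
    adapter_dim: int = 16


def get_config_field(config: Any, name: str, default: Any = None) -> Any:
    """Read a field from a dict or from an attribute-style object."""
    if isinstance(config, dict):
        return config.get(name, default)
    # Model objects expose fields as attributes and have no .get().
    return getattr(config, name, default)
```

Calling `.get()` directly on the model object is what produced the `AttributeError` in the traceback quoted in the commit message.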
Ashwin Bharambe	b5d8e44e81	fix: only sleep for tests when they pass or fail	2025-04-25 13:16:22 -07:00
Ashwin Bharambe	4fb583b407	fix: check that llama stack client plain can be used as a subst for OpenAI client (#2032 ) With https://github.com/meta-llama/llama-stack-client-python/pull/226, llama-stack-client can now be used as a (duck-typed) substitute for the OpenAI client, so you don't need to change downstream library code.	2025-04-25 12:23:33 -07:00
Rashmi Pawar	ace82836c1	feat: NVIDIA allow non-llama model registration (#1859 ) # What does this PR do? Adds custom model registration functionality to NVIDIAInferenceAdapter, which lets inference run on: - post-training models - non-llama models in the API Catalogue (behind https://integrate.api.nvidia.com and endpoints compatible with AsyncOpenAI) ## Example Usage: ```python from llama_stack.apis.models import Model, ModelType from llama_stack.distribution.library_client import LlamaStackAsLibraryClient client = LlamaStackAsLibraryClient("nvidia") _ = client.initialize() client.models.register( model_id=model_name, model_type=ModelType.llm, provider_id="nvidia" ) response = client.inference.chat_completion( model_id=model_name, messages=[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"Write a limerick about the wonders of GPU computing."}], ) ``` ## Test Plan ```bash pytest tests/unit/providers/nvidia/test_supervised_fine_tuning.py ========================================================== test session starts =========================================================== platform linux -- Python 3.10.0, pytest-8.3.5, pluggy-1.5.0 rootdir: /home/ubuntu/llama-stack configfile: pyproject.toml plugins: anyio-4.9.0 collected 6 items tests/unit/providers/nvidia/test_supervised_fine_tuning.py ...... [100%] ============================================================ warnings summary ============================================================ ../miniconda/envs/nvidia-1/lib/python3.10/site-packages/pydantic/fields.py:1076 /home/ubuntu/miniconda/envs/nvidia-1/lib/python3.10/site-packages/pydantic/fields.py:1076: PydanticDeprecatedSince20: Using extra keyword arguments on `Field` is deprecated and will be removed. Use `json_schema_extra` instead. (Extra keys: 'contentEncoding'). Deprecated in Pydantic V2.0 to be removed in V3.0. 
See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/ warn( -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html ====================================================== 6 passed, 1 warning in 1.51s ====================================================== ``` [//]: # (## Documentation) Updated Readme.md cc: @dglogo, @sumitb, @mattf	2025-04-24 17:13:33 -07:00
Jash Gulabrai	cc77f79f55	feat: Add NVIDIA Eval integration (#1890 ) # What does this PR do? This PR adds support for NVIDIA's NeMo Evaluator API to the Llama Stack eval module. The integration enables users to evaluate models via the Llama Stack interface. ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] 1. Added unit tests and successfully ran from root of project: `./scripts/unit-tests.sh tests/unit/providers/nvidia/test_eval.py` ``` tests/unit/providers/nvidia/test_eval.py::TestNVIDIAEvalImpl::test_job_cancel PASSED tests/unit/providers/nvidia/test_eval.py::TestNVIDIAEvalImpl::test_job_result PASSED tests/unit/providers/nvidia/test_eval.py::TestNVIDIAEvalImpl::test_job_status PASSED tests/unit/providers/nvidia/test_eval.py::TestNVIDIAEvalImpl::test_register_benchmark PASSED tests/unit/providers/nvidia/test_eval.py::TestNVIDIAEvalImpl::test_run_eval PASSED ``` 2. Verified I could build the Llama Stack image: `LLAMA_STACK_DIR=$(pwd) llama stack build --template nvidia --image-type venv` Documentation added to `llama_stack/providers/remote/eval/nvidia/README.md` --------- Co-authored-by: Jash Gulabrai <jgulabrai@nvidia.com>	2025-04-24 17:12:42 -07:00
Ben Browning	0b6cd45950	fix: Additional streaming error handling (#2007 ) # What does this PR do? This expands the `test_sse` test suite and fixes some edge cases with bugs in our SSE error handling to ensure streaming clients always get a proper error response. First, we handle the case where a client disconnects before we actually start streaming the response back. Previously we only handled the case where a client disconnected as we were streaming the response, but there was an edge case where a client disconnecting before we streamed any response back did not trigger our logic to cleanly handle that disconnect. Second, we handle the case where an error is thrown from the server before the actual async generator gets created from the provider. This happens in scenarios like the newly merged OpenAI API input validation, where we eagerly raise validation errors before returning the async generator object that streams the responses back. ## Test Plan Tested via: ``` python -m pytest -s -v tests/unit/server/test_sse.py ``` Both test cases failed before, and passed afterwards. The test cases were written based on me experimenting with actual clients that would do bad things like randomly disconnect or send invalid input in streaming mode and I hit these two cases, where things were misbehaving in our error handling. Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-04-24 17:01:45 -07:00
Derek Higgins	c8797f1125	fix: Including tool call in chat (#1931 ) Include the tool call details with the chat when doing RAG with remote vLLM. Fixes: #1929 With this PR the tool call is included in the chat returned to vLLM, and the model (meta-llama/Llama-3.1-8B-Instruct) then returns the answer as expected. Signed-off-by: Derek Higgins <derekh@redhat.com>	2025-04-24 16:59:10 -07:00
Sébastien Han	14e60e3c02	feat: include run.yaml in the container image (#2005 ) As part of the build process, we now include the generated run.yaml (based on the provided build configuration file) into the container. We updated the entrypoint to use this run configuration as well. Given this simple distribution configuration: ``` # build.yaml version: '2' distribution_spec: description: Use (an external) Ollama server for running LLM inference providers: inference: - remote::ollama vector_io: - inline::faiss safety: - inline::llama-guard agents: - inline::meta-reference telemetry: - inline::meta-reference eval: - inline::meta-reference datasetio: - remote::huggingface - inline::localfs scoring: - inline::basic - inline::llm-as-judge - inline::braintrust tool_runtime: - remote::brave-search - remote::tavily-search - inline::code-interpreter - inline::rag-runtime - remote::model-context-protocol - remote::wolfram-alpha container_image: "registry.access.redhat.com/ubi9" image_type: container image_name: test ``` Build it: ``` llama stack build --config build.yaml ``` Run it: ``` podman run --rm \ -p 8321:8321 \ -e OLLAMA_URL=http://host.containers.internal:11434 \ --name llama-stack-server \ localhost/leseb-test:0.2.2 ``` Signed-off-by: Sébastien Han <seb@redhat.com>	2025-04-24 11:29:53 +02:00
Ben Browning	fa5dfee07b	fix: Return HTTP 400 for OpenAI API validation errors (#2002 ) # What does this PR do? When clients called the OpenAI API with invalid input that wasn't caught by our own Pydantic API validation but instead only caught by the backend inference provider, that backend inference provider was returning an HTTP 400 error. However, we were wrapping that into an HTTP 500 error, obfuscating the actual issue from calling clients and triggering OpenAI client retry logic. This change adjusts our existing `translate_exception` method in `server.py` to wrap `openai.BadRequestError` as HTTP 400 errors, passing through the string representation of the error message to the calling user so they can see the actual input validation error and correct it. I tried changing this in a few other places, but ultimately `translate_exception` was the only real place to handle this for both streaming and non-streaming requests across all inference providers that use the OpenAI server APIs. This also tightens up our validation a bit for the OpenAI chat completions API, to catch empty `messages` parameters, invalid `tool_choice` parameters, invalid `tools` items, or passing `tool_choice` when `tools` isn't given. Lastly, this extends our OpenAI API chat completions verifications to also check for consistent input validation across providers. Providers behind Llama Stack should automatically pass all the new tests due to the input validation added here, but some of the providers fail this test when not run behind Llama Stack due to differences in how they handle input validation and errors. 
(Closes #1951) ## Test Plan To test this, start an OpenAI API verification stack: ``` llama stack run --image-type venv tests/verifications/openai-api-verification-run.yaml ``` Then, run the new verification tests with your provider(s) of choice: ``` python -m pytest -s -v \ tests/verifications/openai_api/test_chat_completion.py \ --provider openai-llama-stack python -m pytest -s -v \ tests/verifications/openai_api/test_chat_completion.py \ --provider together-llama-stack ``` Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-04-23 17:48:32 +02:00
Ben Browning	dc46725f56	fix: properly handle streaming client disconnects (#2000 ) # What does this PR do? Previously, when a streaming client would disconnect before we were finished streaming the entire response, an error like the below would get raised from the `sse_generator` function in `llama_stack/distribution/server/server.py`: ``` AttributeError: 'coroutine' object has no attribute 'aclose'. Did you mean: 'close'? ``` This was because we were calling `aclose` on a coroutine instead of the awaited value from that coroutine. This change fixes that, so that we save off the awaited value and then can call `aclose` on it if we encounter an `asyncio.CancelledError`, like we see when a client disconnects before we're finished streaming. The other changes in here are to add a simple set of tests for the happy path of our SSE streaming and this client disconnect path. That unfortunately requires adding one more dependency into our unit test section of pyproject.toml since `server.py` requires loading some of the telemetry code for me to test this functionality. ## Test Plan I wrote the tests in `tests/unit/server/test_sse.py` first, verified the client disconnected test failed before my change, and that it passed afterwards. ``` python -m pytest -s -v tests/unit/server/test_sse.py ``` Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-04-23 15:44:28 +02:00
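The bug fixed in dc46725f56 (calling `aclose` on a coroutine rather than on the value it resolves to) can be shown with the standard library alone. The names `open_stream`, `sse_sketch`, and `main` below are hypothetical and the code is a simplified sketch, not the server's `sse_generator`.

```python
import asyncio


async def open_stream(events, log):
    """Coroutine that returns an async generator (stand-in for a provider call)."""

    async def gen():
        try:
            for event in events:
                yield event
        finally:
            # Runs when the generator is exhausted or closed via aclose().
            log.append("closed")

    return gen()


async def sse_sketch(events, log, sent):
    # Await first and keep the result: the coroutine itself has close(),
    # not aclose(); only the async generator it returns has aclose().
    stream = await open_stream(events, log)
    try:
        async for event in stream:
            sent.append(f"data: {event}\n\n")
            await asyncio.sleep(1)  # stands in for a slow client
    except asyncio.CancelledError:
        # The client went away: close the generator cleanly, then re-raise.
        await stream.aclose()
        raise


async def main():
    log, sent = [], []
    task = asyncio.create_task(sse_sketch(["a", "b", "c"], log, sent))
    await asyncio.sleep(0)  # let the task send its first event
    task.cancel()           # simulate the client disconnecting
    try:
        await task
    except asyncio.CancelledError:
        pass
    return log, sent
```

Running `asyncio.run(main())` cancels the consumer while it waits on the client, and the `finally` block inside the generator runs because `aclose()` was called on the awaited value.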
Sébastien Han	94f83382eb	feat: allow building distro with external providers (#1967 ) # What does this PR do? We can now build a distribution that includes external providers. Closes: https://github.com/meta-llama/llama-stack/issues/1948 ## Test Plan Build a distro with an external provider following the doc instructions. [//]: # (## Documentation) Added. Rendered: ![Screenshot 2025-04-18 at 11 26 39](https://github.com/user-attachments/assets/afcf3d50-8d30-48c3-8d24-06a4b3662881) Signed-off-by: Sébastien Han <seb@redhat.com>	2025-04-18 17:18:28 +02:00
ehhuang	0ed41aafbf	test: add multi_image test (#1972 ) # What does this PR do? ## Test Plan pytest tests/verifications/openai_api/test_chat_completion.py --provider openai -k 'test_chat_multiple_images'	2025-04-17 12:51:42 -07:00
ehhuang	2976b5d992	fix: OAI compat endpoint for meta reference inference provider (#1962 ) Test plan: python tests/verifications/generate_report.py --providers fireworks,together,llama_meta_ref,openai Co-authored-by: Eric Huang <erichuang@fb.com>	2025-04-17 11:16:04 -07:00
ehhuang	8bd6665775	chore(verification): update README and reorganize generate_report.py (#1978 ) # What does this PR do? ## Test Plan uv run --with-editable ".[dev]" python tests/verifications/generate_report.py --run-tests	2025-04-17 10:41:22 -07:00
Alexey Rybak	326cbba579	feat(agents): add agent naming functionality (#1922 ) # What does this PR do? Allow users to name an agent and use the name in telemetry instead of relying on randomly generated agent_ids. This improves the developer experience by making it easier to find specific agents in telemetry logs. Closes #1832 ## Test Plan - Added tests to verify the agent name is properly stored and retrieved - Ran `uv run -- pytest -v tests/integration/telemetry/test_telemetry.py::test_agent_name_filtering` from the root of the project and made sure the tests pass - Ran `uv run -- pytest -v tests/integration/telemetry/test_telemetry.py::test_agent_query_spans` to verify existing code without agent names still works correctly ## Use Example ``` agent = Agent( llama_stack_client, model=text_model_id, name="CustomerSupportAgent", # New parameter instructions="You are a helpful customer support assistant" ) session_id = agent.create_session(f"test-session-{uuid4()}") ``` ## Implementation Notes - Agent names are optional string parameters with no additional validation - Names are not required to be unique - multiple agents can have the same name - The agent_id remains the unique identifier for an agent --------- Co-authored-by: raghotham <raghotham@gmail.com>	2025-04-17 07:02:47 -07:00
Jash Gulabrai	45e08ff417	fix: Handle case when Customizer Job status is unknown (#1965 ) # What does this PR do? This PR handles the case where a Customization Job's status is `unknown`. Since we don't map `unknown` to a valid `JobStatus`, the PostTraining provider throws an exception when fetching/listing a job. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] `./scripts/unit-tests.sh tests/unit/providers/nvidia/test_supervised_fine_tuning.py` succeeds [//]: # (## Documentation) Co-authored-by: Jash Gulabrai <jgulabrai@nvidia.com>	2025-04-17 10:27:07 +02:00
Alexey Rybak	8f57b08f2c	fix(build): always pass path when no template/config provided (#1982 ) # What does this PR do? Fixes a crash that occurred when building a stack as a container image via the interactive wizard without supplying --template or --config. - Root cause: template_or_config was None; only the container path relies on that parameter, which later reaches subprocess.run() and triggers `TypeError: expected str, bytes or os.PathLike object, not NoneType.` - Change: in `_run_stack_build_command_from_build_config` we now fall back to the freshly‑written build‑spec file whenever both optional sources are missing. Also adds a spy‑based unit test that asserts a valid string path is passed to build_image() for container builds. ### Closes #1976 ## Test Plan - New unit test: test_build_path.py. Monkey‑patches build_image, captures the fourth argument, and verifies it is a real path - Manual smoke test: ``` llama stack build --image-type container # answer wizard prompts ``` Build proceeds into Docker without raising the previous TypeError. ## Future Work Harmonise `build_image` arguments so every image type receives the same inputs, eliminating this asymmetric special‑case.	2025-04-17 10:20:43 +02:00
ehhuang	b44f84ce18	test: disable flaky dataset (#1979 ) # What does this PR do? ## Test Plan	2025-04-16 15:33:37 -07:00
Daniel Alvarez Sanchez	b5a9ef4c6d	fix: Do not send an empty 'tools' list to remote vllm (#1957 ) Fixes: #1955 Since 0.2.0, vLLM gets an empty list (vs. ``None`` in 0.1.9 and before) when no tools are configured, which causes the issue described in #1955. This patch avoids sending the 'tools' param to vLLM altogether instead of sending an empty list. It also adds a small unit test to avoid regressions. The OpenAI [specification](https://platform.openai.com/docs/api-reference/chat/create) does not explicitly state that the list cannot be empty, but I found this out through experimentation and it might depend on the actual remote vllm. In any case, as this parameter is Optional, it is best to skip it altogether if no tools are configured. Signed-off-by: Daniel Alvarez <dalvarez@redhat.com>	2025-04-15 20:31:12 -04:00
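As a rough sketch of the behaviour described in #1957, the request parameters can be built so that `tools` is omitted when nothing is configured. `build_chat_params` is a hypothetical helper for illustration only and is not the function changed in the patch.

```python
from typing import Any, Dict, List, Optional


def build_chat_params(
    model: str,
    messages: List[Dict[str, Any]],
    tools: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Build chat-completion parameters, leaving out 'tools' when it is None or empty."""
    params: Dict[str, Any] = {"model": model, "messages": messages}
    if tools:  # both None and [] are skipped
        params["tools"] = tools
    return params


no_tools = build_chat_params("some-model", [{"role": "user", "content": "hi"}], tools=[])
with_tools = build_chat_params(
    "some-model",
    [{"role": "user", "content": "hi"}],
    tools=[{"type": "function", "function": {"name": "get_weather"}}],
)
```

Here `no_tools` carries no `tools` key at all, while `with_tools` passes the list through unchanged.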
ehhuang	32e3da7392	test(verification): more tests, multiturn tool use tests (#1954 ) # What does this PR do? ## Test Plan (myenv) ➜ llama-stack python tests/verifications/generate_report.py --providers fireworks,together,openai --run-tests `f27f617629/tests/verifications/REPORT.md`	2025-04-14 18:45:22 -07:00
Ihar Hrachyshka	3ed4316ed5	feat: Implement async job execution for torchtune training (#1437 ) # What does this PR do? Now a separate thread is started to execute training jobs. Training requests now return job ID before the job completes. (Which fixes API timeouts for any jobs that take longer than a minute.) Note: the scheduler code is meant to be spun out in the future into a common provider service that can be reused for different APIs and providers. It is also expected to back the /jobs API proposed here: https://github.com/meta-llama/llama-stack/discussions/1238 Hence its somewhat generalized form which is expected to simplify its adoption elsewhere in the future. Note: this patch doesn't attempt to implement missing APIs (e.g. cancel or job removal). This work will belong to follow-up PRs. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] Added unit tests for the scheduler module. For the API coverage, did manual testing and was able to run a training cycle on GPU. The initial call returned job ID before the training completed, as (now) expected. Artifacts are returned as expected. ``` JobArtifactsResponse(checkpoints=[{'identifier': 'meta-llama/Llama-3.2-3B-Instruct-sft-0', 'created_at': '2025-03-07T22:45:19.892714', 'epoch': 0, 'post_training_job_id': 'test-job2ee77104-2fd3-4a4e-84cf-f83f8b8f1f50', 'path': '/home/ec2-user/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0', 'training_metrics': None}], job_uuid='test-job2ee77104-2fd3-4a4e-84cf-f83f8b8f1f50') ``` The integration test is currently disabled for the provider. I will look into how it can be enabled in a different PR / issue context. [//]: # (## Documentation) Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-04-14 08:59:11 -07:00
Ben Browning	7641a5cd0b	fix: 100% OpenAI API verification for together and fireworks (#1946 ) # What does this PR do? TLDR: Changes needed to get 100% passing tests for OpenAI API verification tests when run against Llama Stack with the `together`, `fireworks`, and `openai` providers. And `groq` is better than before, at 88% passing. This cleans up the OpenAI API support for image message types (specifically `image_url` types) and handling of the `response_format` chat completion parameter. Both of these required a few more Pydantic model definitions in our Inference API, just to move from the not-quite-right stubs I had in place to something fleshed out to match the actual OpenAI API specs. As part of testing this, I also found and fixed a bug in the litellm implementation of openai_completion and openai_chat_completion, so the providers based on those should actually be working now. The method `prepare_openai_completion_params` in `llama_stack/providers/utils/inference/openai_compat.py` was improved to actually recursively clean up input parameters, including handling of lists, dicts, and dumping of Pydantic models to dicts. These changes were required to get to 100% passing tests on the OpenAI API verification against the `openai` provider. With the above, the together.ai provider was passing as well as it is without Llama Stack. But, since we have Llama Stack in the middle, I took the opportunity to clean up the together.ai provider so that it now also passes the OpenAI API spec tests we have at 100%. That means together.ai is now passing our verification test better when using an OpenAI client talking to Llama Stack than it is when hitting together.ai directly, without Llama Stack in the middle. And, another round of work for Fireworks to improve translation of incoming OpenAI chat completion requests to Llama Stack chat completion requests gets the fireworks provider passing at 100%. 
The server-side fireworks.ai tool calling support with OpenAI chat completions and Llama 4 models isn't great yet, but by pointing the OpenAI clients at Llama Stack's API we can clean things up and get everything working as expected for Llama 4 models. ## Test Plan ### OpenAI API Verification Tests I ran the OpenAI API verification tests as below and 100% of the tests passed. First, start a Llama Stack server that runs the `openai` provider with the `gpt-4o` and `gpt-4o-mini` models deployed. There's not a template setup to do this out of the box, so I added a `tests/verifications/openai-api-verification-run.yaml` to do this. First, ensure you have the necessary API key environment variables set: ``` export TOGETHER_API_KEY="..." export FIREWORKS_API_KEY="..." export OPENAI_API_KEY="..." ``` Then, run a Llama Stack server that serves up all these providers: ``` llama stack run \ --image-type venv \ tests/verifications/openai-api-verification-run.yaml ``` Finally, generate a new verification report against all these providers, both with and without the Llama Stack server in the middle. ``` python tests/verifications/generate_report.py \ --run-tests \ --provider \ together \ fireworks \ groq \ openai \ together-llama-stack \ fireworks-llama-stack \ groq-llama-stack \ openai-llama-stack ``` You'll see that most of the configurations with Llama Stack in the middle now pass at 100%, even though some of them do not pass at 100% when hitting the backend provider's API directly with an OpenAI client. ### OpenAI Completion Integration Tests with vLLM: I also ran the smaller `test_openai_completion.py` test suite (that's not yet merged with the verification tests) on multiple of the providers, since I had to adjust the method signature of openai_chat_completion a bit and thus had to touch lots of these providers to match. 
Here are the tests I ran there, all passing: ``` VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" llama stack build --template remote-vllm --image-type venv --run ``` in another terminal ``` LLAMA_STACK_CONFIG=http://localhost:8321 INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "meta-llama/Llama-3.2-3B-Instruct" ``` ### OpenAI Completion Integration Tests with ollama ``` INFERENCE_MODEL="llama3.2:3b-instruct-q8_0" llama stack build --template ollama --image-type venv --run ``` in another terminal ``` LLAMA_STACK_CONFIG=http://localhost:8321 INFERENCE_MODEL="llama3.2:3b-instruct-q8_0" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "llama3.2:3b-instruct-q8_0" ``` ### OpenAI Completion Integration Tests with together.ai ``` INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct-Turbo" llama stack build --template together --image-type venv --run ``` in another terminal ``` LLAMA_STACK_CONFIG=http://localhost:8321 INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct-Turbo" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "meta-llama/Llama-3.2-3B-Instruct-Turbo" ``` ### OpenAI Completion Integration Tests with fireworks.ai ``` INFERENCE_MODEL="meta-llama/Llama-3.1-8B-Instruct" llama stack build --template fireworks --image-type venv --run ``` in another terminal ``` LLAMA_STACK_CONFIG=http://localhost:8321 INFERENCE_MODEL="meta-llama/Llama-3.1-8B-Instruct" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "meta-llama/Llama-3.1-8B-Instruct" ``` --------- Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-04-14 08:56:29 -07:00
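The recursive parameter clean-up mentioned for `prepare_openai_completion_params` in #1946 might look something like the sketch below. The commit message does not say exactly what "clean up" covers, so this assumes it means dropping `None` values; `clean_params` and `FakeModel` are hypothetical, and Pydantic models are detected by duck-typing on `model_dump` so the example runs without Pydantic installed.

```python
from typing import Any


def clean_params(value: Any) -> Any:
    """Recursively dump model-like objects to dicts and drop None-valued dict entries."""
    if hasattr(value, "model_dump"):  # assumption: Pydantic v2 style models
        value = value.model_dump()
    if isinstance(value, dict):
        return {k: clean_params(v) for k, v in value.items() if v is not None}
    if isinstance(value, (list, tuple)):
        return [clean_params(v) for v in value]
    return value


class FakeModel:
    """Stand-in for a Pydantic model; only model_dump() is needed here."""

    def model_dump(self) -> dict:
        return {"type": "json_object", "schema": None}


cleaned = clean_params(
    {
        "response_format": FakeModel(),
        "tools": None,
        "messages": [{"role": "user", "name": None}],
    }
)
```

In this example the nested `None` values inside the dumped model and inside the list of messages are removed along with the top-level `tools` entry.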
Ashwin Bharambe	429f6de7d7	fix: misc fixes for tests kill horrible warnings	2025-04-12 17:12:11 -07:00
ehhuang	ad86a68a32	feat: support '-' in tool names (#1807 ) # What does this PR do? titled ## Test Plan added new unit tests pytest -s -v tests/unit/models/llama/llama3/test_tool_utils.py	2025-04-12 14:23:03 -07:00
Ashwin Bharambe	ef3dc143ec	fix: test_registration was borked somehow	2025-04-12 12:04:01 -07:00
Ashwin Bharambe	f34f22f8c7	feat: add batch inference API to llama stack inference (#1945 ) # What does this PR do? This PR adds two methods to the Inference API: - `batch_completion` - `batch_chat_completion` The motivation is for evaluations targeting a local inference engine (like meta-reference or vllm) where batch APIs provide for a substantial amount of acceleration. Why did I not add this to `Api.batch_inference` though? That just resulted in a _lot_ more book-keeping given the structure of Llama Stack. Had I done that, I would have needed to create a notion of a "batch model" resource, setup routing based on that, etc. This does not sound ideal. So what's the future of the batch inference API? I am not sure. Maybe we can keep it for true _asynchronous_ execution. So you can submit requests, and it can return a Job instance, etc. ## Test Plan Run meta-reference-gpu using: ```bash export INFERENCE_MODEL=meta-llama/Llama-4-Scout-17B-16E-Instruct export INFERENCE_CHECKPOINT_DIR=../checkpoints/Llama-4-Scout-17B-16E-Instruct-20250331210000 export MODEL_PARALLEL_SIZE=4 export MAX_BATCH_SIZE=32 export MAX_SEQ_LEN=6144 LLAMA_MODELS_DEBUG=1 llama stack run meta-reference-gpu ``` Then run the batch inference test case.	2025-04-12 11:41:12 -07:00
Ben Browning	2b2db5fbda	feat: OpenAI-Compatible models, completions, chat/completions (#1894 ) # What does this PR do? This stubs in some OpenAI server-side compatibility with three new endpoints: /v1/openai/v1/models /v1/openai/v1/completions /v1/openai/v1/chat/completions This gives common inference apps using OpenAI clients the ability to talk to Llama Stack using an endpoint like http://localhost:8321/v1/openai/v1 . The two "v1" instances in there aren't awesome, but the thinking is that Llama Stack's API is v1 and then our OpenAI compatibility layer is compatible with OpenAI V1. And, some OpenAI clients implicitly assume the URL ends with "v1", so this gives maximum compatibility. The openai models endpoint is implemented in the routing layer, and just returns all the models Llama Stack knows about. The following providers should be working with the new OpenAI completions and chat/completions API: * remote::anthropic (untested) * remote::cerebras-openai-compat (untested) * remote::fireworks (tested) * remote::fireworks-openai-compat (untested) * remote::gemini (untested) * remote::groq-openai-compat (untested) * remote::nvidia (tested) * remote::ollama (tested) * remote::openai (untested) * remote::passthrough (untested) * remote::sambanova-openai-compat (untested) * remote::together (tested) * remote::together-openai-compat (untested) * remote::vllm (tested) The goal is to support this for every inference provider - proxying directly to the provider's OpenAI endpoint for OpenAI-compatible providers. For providers that don't have an OpenAI-compatible API, we'll add a mixin to translate incoming OpenAI requests to Llama Stack inference requests and translate the Llama Stack inference responses to OpenAI responses. This is related to #1817 but is a bit larger in scope than just chat completions, as I have real use-cases that need the older completions API as well. 
## Test Plan ### vLLM ``` VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" llama stack build --template remote-vllm --image-type venv --run LLAMA_STACK_CONFIG=http://localhost:8321 INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "meta-llama/Llama-3.2-3B-Instruct" ``` ### ollama ``` INFERENCE_MODEL="llama3.2:3b-instruct-q8_0" llama stack build --template ollama --image-type venv --run LLAMA_STACK_CONFIG=http://localhost:8321 INFERENCE_MODEL="llama3.2:3b-instruct-q8_0" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "llama3.2:3b-instruct-q8_0" ``` ## Documentation Run a Llama Stack distribution that uses one of the providers mentioned in the list above. Then, use your favorite OpenAI client to send completion or chat completion requests with the base_url set to http://localhost:8321/v1/openai/v1 . Replace "localhost:8321" with the host and port of your Llama Stack server, if different. --------- Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-04-11 13:14:17 -07:00
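To make the endpoint layout from #1894 concrete, here is a small standard-library sketch that builds (but does not send) a chat completion request against the base URL given in the commit message. `build_chat_request` is a hypothetical helper, the model name is borrowed from the test plan, and the sketch assumes the server needs no API key header; an OpenAI client configured with the same `base_url` would be the more usual route.

```python
import json
import urllib.request

# Base URL from the commit message; replace host and port if your server differs.
BASE_URL = "http://localhost:8321/v1/openai/v1"


def build_chat_request(model: str, prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat/completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("meta-llama/Llama-3.2-3B-Instruct", "Hello")

# Sending requires a running Llama Stack server, so it is left commented out:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

The doubled "v1" in the path is visible in `req.full_url`: the first belongs to the Llama Stack API and the second to the OpenAI compatibility layer.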
Jash Gulabrai	c1cb6aad11	feat: Add unit tests for NVIDIA safety (#1897 ) # What does this PR do? This PR adds unit tests for the NVIDIA Safety provider implementation. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] 1. Ran `./scripts/unit-tests.sh tests/unit/providers/nvidia/test_safety.py` from the root of the project. Verified tests pass. ``` tests/unit/providers/nvidia/test_safety.py::TestNVIDIASafetyAdapter::test_init_nemo_guardrails Initializing NVIDIASafetyAdapter(http://nemo.test)... PASSED tests/unit/providers/nvidia/test_safety.py::TestNVIDIASafetyAdapter::test_init_nemo_guardrails_invalid_temperature Initializing NVIDIASafetyAdapter(http://nemo.test)... PASSED tests/unit/providers/nvidia/test_safety.py::TestNVIDIASafetyAdapter::test_register_shield_with_valid_id Initializing NVIDIASafetyAdapter(http://nemo.test)... PASSED tests/unit/providers/nvidia/test_safety.py::TestNVIDIASafetyAdapter::test_register_shield_without_id Initializing NVIDIASafetyAdapter(http://nemo.test)... PASSED tests/unit/providers/nvidia/test_safety.py::TestNVIDIASafetyAdapter::test_run_shield_allowed Initializing NVIDIASafetyAdapter(http://nemo.test)... PASSED tests/unit/providers/nvidia/test_safety.py::TestNVIDIASafetyAdapter::test_run_shield_blocked Initializing NVIDIASafetyAdapter(http://nemo.test)... PASSED tests/unit/providers/nvidia/test_safety.py::TestNVIDIASafetyAdapter::test_run_shield_http_error Initializing NVIDIASafetyAdapter(http://nemo.test)... PASSED tests/unit/providers/nvidia/test_safety.py::TestNVIDIASafetyAdapter::test_run_shield_not_found Initializing NVIDIASafetyAdapter(http://nemo.test)... PASSED ``` [//]: # (## Documentation) --------- Co-authored-by: Jash Gulabrai <jgulabrai@nvidia.com>	2025-04-11 11:49:55 -07:00
ehhuang	2fcb70b789	test(verification): overwrite test result instead of creating new ones (#1934 ) # What does this PR do? ## Test Plan (myenv) ➜ llama-stack python tests/verifications/generate_report.py --providers fireworks,together,openai --run-tests	2025-04-10 16:59:28 -07:00
ehhuang	a4cc4b7e31	test(verification): add streaming tool calling test (#1933 ) # What does this PR do? ## Test Plan --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/1933). * #1934 * __->__ #1933	2025-04-10 16:58:06 -07:00
Francisco Arceo	de6ec5803e	fix: Fix linter failures from #1921 (#1932 ) # What does this PR do? fix: Fix linter failures from #1921 Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-04-10 10:37:31 -07:00
ehhuang	14146e4b3f	feat(verification): various improvements (#1921 ) # What does this PR do? - provider and their models now live in config.yaml - better distinguish different cases within a test - add model key to surface provider's model_id - include example command to rerun single test case ## Test Plan <img width="1173" alt="image" src="https://github.com/user-attachments/assets/b414baf0-c768-451f-8c3b-c2905cf36fac" />	2025-04-10 10:26:19 -07:00

323 commits