llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-10-04 04:04:14 +00:00

Author	SHA1	Message	Date
Charlie Doern	92d0470f74	feat: allow for multiple external provider specs. When using the providers.d method of installation, users could hand-craft their AdapterSpecs to use overlapping code, meaning one repo could contain both an inline and a remote impl. Currently, installing a provider via module does not allow for that, as each repo is only allowed to have one `get_provider_spec` method that returns one Spec. Add an optional way for `get_provider_spec` to return a list of `ProviderSpec`, where each can be either an inline or remote impl. Note: the `adapter_type` in `get_provider_spec` MUST match the `provider_type` in the build/run yaml for this to work. Resolves #3226 Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-10-03 13:30:48 -04:00
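A minimal sketch of the lookup this commit describes. It loads a provider module, accepts either a single spec or a list from `get_provider_spec()`, and selects the spec whose `adapter_type` equals the configured `provider_type`. The `ProviderSpec` dataclass and `resolve_provider_spec` are hypothetical stand-ins, not llama-stack's actual types or loader:

```python
from __future__ import annotations

import importlib
from dataclasses import dataclass
from types import ModuleType


@dataclass
class ProviderSpec:
    # Hypothetical stand-in for llama-stack's spec types; only what the lookup needs.
    api: str
    adapter_type: str  # must equal provider_type in the build/run yaml, per the commit note


def resolve_provider_spec(module: str | ModuleType, provider_type: str) -> ProviderSpec:
    """Load `module`, call its get_provider_spec(), and pick the matching spec."""
    mod = importlib.import_module(module) if isinstance(module, str) else module
    specs = mod.get_provider_spec()
    # Older providers return a single spec; newer ones may return a list.
    if not isinstance(specs, list):
        specs = [specs]
    for spec in specs:
        if spec.adapter_type == provider_type:
            return spec
    available = ", ".join(s.adapter_type for s in specs)
    raise ValueError(f"{provider_type!r} not provided by {mod.__name__} (found: {available})")
```

Normalizing to a list keeps single-spec providers working unchanged. One repo can then ship both an inline and a remote implementation.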
Ashwin Bharambe	ef0736527d	feat(tools)!: substantial clean up of "Tool" related datatypes (#3627) This is a sweeping change to clean up some gunk around our "Tool" definitions. First, we had two types, `Tool` and `ToolDef`. The first of these was a "Resource" type for the registry, but we had long since stopped registering tools inside the Registry (we only registered ToolGroups). The latter was for specifying tools for the Agents API. This PR removes the former and adds an optional `toolgroup_id` field to the latter. Secondly, as pointed out by @bbrowning in https://github.com/llamastack/llama-stack/pull/3003#issuecomment-3245270132, we were doing a lossy conversion from a full JSON schema in the MCP tool specification into our ToolDefinition before sending it to the model. There is no need to do this -- we ourselves aren't doing any execution at all but merely passing it to the chat completions API, which supports full JSON schemas. By doing this (and by doing it poorly), we encountered limitations such as not supporting array items or not resolving $refs. To fix this, we replaced the `parameters` field with `{ input_schema, output_schema }`, which can be full-blown JSON schemas. Finally, there were some types in our llama-related chat format conversion which needed some cleanup. We are taking this opportunity to clean those up. This PR is a substantial breaking change to the API. However, given our window for introducing breaking changes, this suits us just fine. I will be landing a concurrent `llama-stack-client` change as well since API shapes are changing.	2025-10-02 15:12:03 -07:00
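A minimal sketch of the pass-through idea. A tool definition carries a full JSON Schema in `input_schema`, and that schema is handed to an OpenAI-style chat-completions `tools` entry without any lossy conversion, so arrays and `$ref`s survive. The `ToolDef` dataclass and `to_openai_tool` are hypothetical; only the field names come from the commit message:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class ToolDef:
    # Hypothetical shape mirroring the fields named in the commit message.
    name: str
    description: str | None = None
    input_schema: dict[str, Any] | None = None   # full JSON Schema, passed through untouched
    output_schema: dict[str, Any] | None = None
    toolgroup_id: str | None = None


def to_openai_tool(tool: ToolDef) -> dict[str, Any]:
    """Convert a ToolDef into an OpenAI chat-completions `tools` entry."""
    return {
        "type": "function",
        "function": {
            "name": tool.name,
            "description": tool.description or "",
            # No re-encoding: array items, $refs and $defs are preserved as-is.
            "parameters": tool.input_schema or {"type": "object", "properties": {}},
        },
    }
```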
Ben Browning	b6e2934f7b	fix: Gracefully handle errors when listing MCP tools (#2544) # What does this PR do? When listing (and lazily indexing) tools, it's possible for an error to get thrown by individual toolgroups if for example an MCP toolgroup is unable to connect to its `mcp_endpoint`. 
This logs a warning in the server when that happens, logs a full stack trace of the error if debug logging is enabled, and just returns the list of tools from all working toolgroups instead of throwing an error to the client when a single toolgroup is temporarily or permanently misbehaving. The exception to the above is authentication errors, which we specifically send all the way back to the client as that's how we indicate to the client that it needs to provide authentication data for the remote MCP servers. Closes #2540 ## Test Plan A new unit test was added to test this exception handling, which is run as part of our regular test suite but also manually run to specifically verify this fix via: ``` uv run pytest -sv --asyncio-mode=auto \ tests/unit/distribution/routers/test_routing_tables.py ``` To verify the additional debug logging is printing properly: ``` LLAMA_STACK_LOGGING=core=debug \ uv run pytest -sv --asyncio-mode=auto \ tests/unit/distribution/routers/test_routing_tables.py ``` The mcp integration tests were run as below (and by CI): ``` ollama run llama3.2:3b ENABLE_OLLAMA="ollama" \ OLLAMA_INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \ LLAMA_STACK_CONFIG=starter \ uv run pytest -sv tests/integration/tool_runtime/test_mcp.py \ --text-model meta-llama/Llama-3.2-3B-Instruct ``` --------- Signed-off-by: Ben Browning <bbrownin@redhat.com> Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: Sébastien Han <seb@redhat.com>	2025-09-26 18:09:48 +02:00
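A minimal sketch of the error-handling policy described above, written against plain asyncio. Each toolgroup is queried in turn. A generic failure is logged as a warning, with the traceback only at debug level, and skipped. An authentication error is re-raised so the client learns it must supply MCP credentials. `AuthenticationRequiredError` and the `list_tools` callable are hypothetical names, not llama-stack's routing-table code:

```python
import asyncio
import logging

logger = logging.getLogger("tool_listing_sketch")


class AuthenticationRequiredError(Exception):
    """Hypothetical stand-in for the auth error that must reach the client."""


async def list_all_tools(toolgroup_ids: list, list_tools) -> list:
    """Collect tools from every toolgroup, skipping groups that fail.

    `list_tools` is assumed to be an async callable taking a toolgroup id.
    """
    tools = []
    for group_id in toolgroup_ids:
        try:
            tools.extend(await list_tools(group_id))
        except AuthenticationRequiredError:
            raise  # the client needs to know it must provide auth data
        except Exception as e:
            logger.warning("Error listing tools for toolgroup %s: %s", group_id, e)
            logger.debug("Full traceback for toolgroup %s", group_id, exc_info=True)
    return tools
```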
ehhuang	4c2fcb6b51	chore: refactor server.main (#3462) # What does this PR do? As shown in #3421, we can scale the stack to handle more RPS with k8s replicas. This PR enables a multi-process stack with uvicorn --workers so that we can achieve the same scaling without running in k8s. To achieve that, we refactor main to split out the app construction logic, which needs to be non-async. We created a new `Stack` class to house impls, with a `start()` method that is called in the lifespan to start background tasks, instead of starting them in the old `construct_stack`. This way we avoid having to manage an event loop manually. ## Test Plan CI > uv run --with llama-stack python -m llama_stack.core.server.server benchmarking/k8s-benchmark/stack_run_config.yaml works. 
> LLAMA_STACK_CONFIG=benchmarking/k8s-benchmark/stack_run_config.yaml uv run uvicorn llama_stack.core.server.server:create_app --port 8321 --workers 4 works.	2025-09-18 21:11:13 -07:00
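A minimal sketch of the pattern, using only the standard library and hypothetical names. The factory is synchronous, so each `uvicorn --workers` process can call it. Background tasks start only inside the async lifespan, that is, in the worker's own event loop. In a real server the `lifespan` would be handed to the web framework; FastAPI, for instance, accepts a `lifespan=` argument. Here it is simply returned:

```python
import asyncio
import contextlib


class Stack:
    """Hypothetical sketch: holds provider impls; background work starts in start()."""

    def __init__(self, config: dict):
        self.config = config
        self.impls = {}  # built synchronously; no event loop needed yet
        self._tasks: list = []

    async def start(self) -> None:
        # Runs inside the lifespan, i.e. in this worker's running event loop.
        self._tasks.append(asyncio.create_task(self._refresh_loop()))

    async def _refresh_loop(self) -> None:
        while True:  # placeholder periodic background job
            await asyncio.sleep(3600)

    async def shutdown(self) -> None:
        for task in self._tasks:
            task.cancel()
        await asyncio.gather(*self._tasks, return_exceptions=True)


def create_app(config: dict):
    """Non-async factory: safe to call once per worker process."""
    stack = Stack(config)

    @contextlib.asynccontextmanager
    async def lifespan(app):
        await stack.start()
        yield
        await stack.shutdown()

    return stack, lifespan
```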
Charlie Doern	8422bd102a	feat: combine ProviderSpec datatypes (#3378) # What does this PR do? Currently `RemoteProviderSpec` has an `AdapterSpec` embedded in it. Remove `AdapterSpec`, and put its leftover fields into `RemoteProviderSpec`. Additionally, many of the fields were duplicated between `InlineProviderSpec` and `RemoteProviderSpec`. Move these to `ProviderSpec` so they are shared. 
Fix up the distro codegen to use `RemoteProviderSpec` directly rather than `remote_provider_spec`, which took an AdapterSpec and returned a full provider spec. ## Test Plan Existing distro tests should pass. Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-09-18 16:10:00 +02:00
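A minimal sketch of the resulting type hierarchy. Shared fields live on the base `ProviderSpec`, and the remote spec holds the former adapter fields directly instead of nesting an `AdapterSpec`. The specific field names are illustrative assumptions, not the exact llama-stack definitions:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ProviderSpec:
    # Fields formerly duplicated across inline/remote specs now live here (names assumed).
    api: str
    provider_type: str
    config_class: str
    pip_packages: list[str] = field(default_factory=list)
    module: str | None = None


@dataclass
class InlineProviderSpec(ProviderSpec):
    container_image: str | None = None  # illustrative inline-only field


@dataclass
class RemoteProviderSpec(ProviderSpec):
    # Former AdapterSpec fields are flattened in rather than nested.
    adapter_type: str = ""
```

Distro codegen can then build a `RemoteProviderSpec(...)` directly instead of going through a helper that wraps an adapter object.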
Francisco Arceo	9acf49753e	fix: Fixing prompts import warning (#3455) # What does this PR do? Fixes this warning in llama stack build: ```bash WARNING 2025-09-15 15:29:02,197 llama_stack.core.distribution:149 core: Failed to import module prompts: No module named 'llama_stack.providers.registry.prompts'" ``` ## Test Plan Test added --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-09-17 10:24:58 +02:00
IAN MILLER	ab321739f2	feat: create HTTP DELETE API endpoints to unregister ScoringFn and Benchmark resources in Llama Stack (#3371) # What does this PR do? This PR provides functionality for users to unregister ScoringFn and Benchmark resources for `scoring` and `eval` APIs. Closes #3051 ## Test Plan Updated integration and unit tests via CI workflow	2025-09-15 12:43:38 -07:00
Matthew Farrellee	01bdcce4d2	chore(recorder): update mocks to be closer to non-mock environment (#3442) # What does this PR do? The @required_args decorator in openai-python masks the async nature of the {AsyncCompletions,chat.AsyncCompletions}.create method (see https://github.com/openai/openai-python/issues/996). This means two things: 0. we cannot use iscoroutine in the recorder to detect async vs. non-async; 1. our mocks are inappropriately introducing identifiable async. For (0), we update the iscoroutine check with detection of /v1/models, which is the only non-async function we mock and record. For (1), we could leave everything as is and assume (0) will catch errors; to be defensive, we update the unit tests to mock below the create methods, allowing the true openai-python create() methods to be tested.	2025-09-15 15:25:53 -04:00
Matthew Farrellee	6787755c0c	chore(recorder): add support for NOT_GIVEN (#3430) # What does this PR do? The recorder mocks the openai-python interface. The openai-python interface allows NOT_GIVEN as an input option. This change properly handles NOT_GIVEN. ## Test Plan CI (coverage for chat, completions, embeddings)	2025-09-13 11:11:38 -07:00
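A minimal sketch of one way a recorder can handle the sentinel: drop NOT_GIVEN arguments before serializing or hashing a request. An omitted argument and an explicit NOT_GIVEN then record identically, and JSON encoding never sees the sentinel. openai-python exports its own `NOT_GIVEN`/`NotGiven`. A local stand-in is used here only so the sketch runs without that package, and `strip_not_given` is a hypothetical helper:

```python
class NotGiven:
    """Local stand-in for openai-python's NotGiven sentinel type."""

    def __bool__(self) -> bool:
        return False

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = NotGiven()


def strip_not_given(kwargs: dict) -> dict:
    """Remove NOT_GIVEN values (recursing into nested dicts) before recording."""
    cleaned = {}
    for key, value in kwargs.items():
        if isinstance(value, NotGiven):
            continue  # treat "not given" exactly like "omitted"
        if isinstance(value, dict):
            value = strip_not_given(value)
        cleaned[key] = value
    return cleaned
```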
Matthew Farrellee	3de9ad0a87	chore(recorder, tests): add test for openai /v1/models (#3426) # What does this PR do? - [x] adds a test for the recorder's handling of /v1/models - [x] adds a fix for /v1/models handling ## Test Plan ci	2025-09-12 14:59:56 -07:00
Francisco Arceo	e2fe39aee1	feat!: Migrate Vector DB IDs to Vector Store IDs (breaking change) (#3253) # What does this PR do? This change migrates the VectorDB id generation to Vector Stores. This is a breaking change for _some users_ that may have application code using the `vector_db_id` parameter in the request of the VectorDB protocol instead of the `VectorDB.identifier` in the response. By default we will now create a Vector Store every time we register a VectorDB. 
The caveat with this approach is that this maps the `vector_db_id` → `vector_store.name`. This is a reasonable tradeoff to transition users towards OpenAI Vector Stores. As an added benefit, registering VectorDBs will result in them appearing in the VectorStores admin UI. ### Why? This PR makes the `POST` API call to `/v1/vector-dbs` move the `vector_db_id` parameter in the request body into the VectorStore's name field and sets the `vector_db_id` to the generated vector store id (e.g., `vs_038247dd-4bbb-4dbb-a6be-d5ecfd46cfdb`). That means that users would have to do something like the following in their application code: ```python res = client.vector_dbs.register( vector_db_id='my-vector-db-id', embedding_model='ollama/all-minilm:l6-v2', embedding_dimension=384, ) vector_db_id = res.identifier ``` The rest of their code would then work as before, including `VectorIO`'s insert protocol using `vector_db_id` in the request. An alternative implementation would be to just delete the `vector_db_id` parameter in `VectorDB`, but the end result would still require users to write `vector_db_id = res.identifier` since `VectorStores.create()` generates the ID for you. So this approach felt like the easiest way to migrate users towards VectorStores (subsequent PRs will be added to trigger `files.create()` and `vector_stores.files.create()`). ## Test Plan Unit tests and integration tests have been added. Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-09-05 15:40:34 +02:00
Ashwin Bharambe	c3d3a0b833	feat(tests): auto-merge all model list responses and unify recordings (#3320) One needed to specify record-replay related environment variables for running integration tests. We could not use defaults because integration tests could be run against Ollama instances which could be running different models. For example, text vs vision tests needed separate instances of Ollama because a single instance typically cannot serve both of these models if you assume the standard CI worker configuration on Github. As a result, `client.list()` as returned by the Ollama client would be different between these runs and we'd end up overwriting responses. This PR "solves" it by adding a small amount of complexity -- we store model list responses specially, keyed by the hashes of the models they return. 
At replay time, we merge all of them and pretend that we have the union of all models available. ## Test Plan Re-recorded all the tests using `scripts/integration-tests.sh --inference-mode record`, including the vision tests.	2025-09-03 11:33:03 -07:00
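A minimal sketch of the merge-on-replay idea, with a plain dict standing in for the recordings directory. Each model-list response is stored under a hash of the sorted model ids it contains, so runs against different Ollama instances no longer overwrite each other. Replay returns the union, deduplicated by id. The `id` field and the function names are assumptions for illustration:

```python
import hashlib
import json


def models_key(model_ids: list) -> str:
    # Same set of models -> same key, regardless of the order they were listed in.
    return hashlib.sha256(json.dumps(sorted(model_ids)).encode()).hexdigest()


def record_model_list(store: dict, models: list) -> None:
    store[models_key([m["id"] for m in models])] = models


def replay_model_list(store: dict) -> list:
    # Union of every recorded list, deduplicated by id, in first-seen order.
    merged = {}
    for models in store.values():
        for m in models:
            merged.setdefault(m["id"], m)
    return list(merged.values())
```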
Derek Higgins	7ca8233889	feat(testing): remove SQLite dependency from inference recorder (#3254) Recording files use a predictable naming format, making the SQLite index redundant. The binary SQLite file was causing frequent git conflicts. Simplify by calculating file paths directly from request hashes. Signed-off-by: Derek Higgins <derekh@redhat.com>	2025-08-26 09:17:00 -07:00
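A minimal sketch of an index-free recording store. The file path is derived purely from a hash of the canonicalized request, so lookups need no database and there is no binary file to conflict in git. The `responses/<hash>.json` layout and all function names are hypothetical:

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def request_hash(request: dict) -> str:
    # Canonical JSON so equivalent requests hash identically regardless of key order.
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def recording_path(base_dir: Path, request: dict) -> Path:
    # Hypothetical layout: <base_dir>/responses/<hash>.json
    return base_dir / "responses" / f"{request_hash(request)}.json"


def save_recording(base_dir: Path, request: dict, response: dict) -> Path:
    path = recording_path(base_dir, request)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"request": request, "response": response}, indent=2))
    return path


def load_recording(base_dir: Path, request: dict) -> dict | None:
    path = recording_path(base_dir, request)
    if not path.exists():
        return None  # no recording for this request
    return json.loads(path.read_text())["response"]
```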
Mustafa Elbehery	1790fc0f25	feat: Remove initialize() Method from LlamaStackAsLibrary (#2979) # What does this PR do? This PR removes `init()` from `LlamaStackAsLibrary`. Currently, client.initialize() had to be invoked by the user. To improve the developer experience and to avoid runtime errors, this PR initializes LlamaStackAsLibrary implicitly upon using the client. It also prevents multiple initializations of the same client, while maintaining backward compatibility. This PR does the following: - Automatic initialization: the constructor calls initialize_impl() automatically. - The client is fully initialized after __init__ completes. - Prevents repeated initialization after the client has been successfully initialized. - The initialize() method still exists but is now a no-op. Fixes https://github.com/meta-llama/llama-stack/issues/2946 --------- Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>	2025-08-21 15:59:04 -07:00
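A minimal synchronous sketch of the behavior listed above. The constructor initializes eagerly, a guard makes repeated initialization a no-op, and the public `initialize()` is kept only for backward compatibility. `LibraryClientSketch` and its placeholder route table are hypothetical and do not reproduce the real library client:

```python
class LibraryClientSketch:
    """Hypothetical sketch of implicit initialization, not the real library client."""

    def __init__(self, config: str):
        self.config = config
        self.route_impls = None
        self._initialized = False
        self.initialize_impl()  # fully initialized once __init__ returns

    def initialize_impl(self) -> None:
        if self._initialized:
            return  # guard: never rebuild an already-initialized client
        # Placeholder for constructing the stack from `config`.
        self.route_impls = {"/v1/models": lambda: ["demo-model"]}
        self._initialized = True

    def initialize(self) -> None:
        """Kept for backward compatibility; now a no-op."""

    def request(self, path: str):
        return self.route_impls[path]()
```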
IAN MILLER	e12524af85	feat: create unregister shield API endpoint in Llama Stack (#2853) # What does this PR do? Extend the Shields Protocol and implement the capability to unregister previously registered shields, plus a CLI for shields management. Closes #2581 ## Test Plan First, test the API for shields: 1. Install and start Ollama: `ollama serve` 2. Pull the Llama Guard model in Ollama: `ollama pull llama-guard3:8b` 3. Configure env variables: ``` export ENABLE_OLLAMA=ollama export OLLAMA_URL=http://localhost:11434 ``` 4. Build the Llama Stack distro: `llama stack build --template starter --image-type venv` 5. Start the Llama Stack server: `llama stack run starter --port 8321` 6. Check if the Ollama model is available: `curl -X GET http://localhost:8321/v1/models | jq '.data[] | select(.provider_id=="ollama")'` 7. 
Register a new Shield using the Ollama provider: ``` curl -X POST http://localhost:8321/v1/shields \ -H "Content-Type: application/json" \ -d '{ "shield_id": "test-shield", "provider_id": "llama-guard", "provider_shield_id": "ollama/llama-guard3:8b", "params": {} }' ``` `{"identifier":"test-shield","provider_resource_id":"ollama/llama-guard3:8b","provider_id":"llama-guard","type":"shield","owner":{"principal":"","attributes":{}},"params":{}}` 8. Check if the shield was registered: `curl -X GET http://localhost:8321/v1/shields/test-shield` `{"identifier":"test-shield","provider_resource_id":"ollama/llama-guard3:8b","provider_id":"llama-guard","type":"shield","owner":{"principal":"","attributes":{}},"params":{}}` 9. Run the shield: ``` curl -X POST http://localhost:8321/v1/safety/run-shield \ -H "Content-Type: application/json" \ -d '{ "shield_id": "test-shield", "messages": [ { "role": "user", "content": "How can I hack into someone computer?" } ], "params": {} }' ``` `{"violation":{"violation_level":"error","user_message":"I can't answer that. Can I help with something else?","metadata":{"violation_type":"S2"}}}` 10. Unregister the shield: `curl -X DELETE http://localhost:8321/v1/shields/test-shield` `null` 11. Verify the shield was deleted: `curl -X GET http://localhost:8321/v1/shields/test-shield` `{"detail":"Invalid value: Shield 'test-shield' not found"}` All tests passed ✅ ``` ========================================================================== 430 passed, 194 warnings in 19.54s ========================================================================== /Users/iamiller/GitHub/llama-stack/.venv/lib/python3.12/site-packages/litellm/llms/custom_httpx/async_client_cleanup.py:78: RuntimeWarning: coroutine 'close_litellm_async_clients' was never awaited loop.close() RuntimeWarning: Enable tracemalloc to get the object allocation traceback Wrote HTML report to htmlcov-3.12/index.html ```	2025-08-05 07:33:46 -07:00
Ashwin Bharambe	cc87995e2b	chore: rename templates to distributions (#3035) As the title says. Distributions is in, Templates is out. `llama stack build --template` --> `llama stack build --distro`. For backward compatibility, the previous option is kept but results in a warning. Updated `server.py` to remove the "config_or_template" backward compatibility since it has been a couple of releases since that change.	2025-08-04 11:34:17 -07:00
IAN MILLER	a749d5f4a4	refactor: remove Conda support from Llama Stack (#2969) # What does this PR do? This PR removes Conda support from Llama Stack. Closes #2539 ## Test Plan	2025-08-02 15:52:59 -07:00
Ashwin Bharambe	2665f00102	chore(rename): move llama_stack.distribution to llama_stack.core (#2975) We would like to rename the term `template` to `distribution`. This is a precursor to that rename. cc @leseb	2025-07-30 23:30:53 -07:00
Matthew Farrellee	b69bafba30	fix(library_client): improve initialization error handling and prevent AttributeError (#2944) # What does this PR do? - Initialize route_impls to None in constructor to prevent AttributeError - Consolidate initialization checks to a single point in the request() method - Improve error message to be more helpful ("Please call initialize() first") - Add comprehensive test suite to prevent regressions The library client now has better error handling when users forget to call initialize(), showing a clear ValueError instead of a confusing AttributeError. All initialization validation is now centralized in the request() method, with internal methods (_call_non_streaming, _call_streaming, _convert_body) relying on this single check for cleaner, more maintainable code. Closes #2943 ## Test Plan `./scripts/unit-tests.sh`	2025-07-30 11:58:47 -04:00
Ashwin Bharambe	08b4a1deb3	feat(tests): introduce inference record/replay to increase test reliability (#2941) Implements a comprehensive recording and replay system for inference API calls that eliminates the dependency on online inference providers during testing. The system treats inference as deterministic by recording real API responses and replaying them in subsequent test runs. Applies to OpenAI clients (which should cover many inference requests) as well as the Ollama AsyncClient. For storage, we use a hybrid system: SQLite for fast lookups and JSON files for easy greppability / debuggability. As expected, tests become much faster (more than 3x for inference tests alone). ```bash LLAMA_STACK_TEST_INFERENCE_MODE=record LLAMA_STACK_TEST_RECORDING_DIR=<...> \ uv run pytest -s -v tests/integration/inference \ --stack-config=starter \ -k "not( builtin_tool or safety_with_image or code_interpreter or test_rag )" \ --text-model="ollama/llama3.2:3b-instruct-fp16" \ --embedding-model=sentence-transformers/all-MiniLM-L6-v2 ``` ```bash LLAMA_STACK_TEST_INFERENCE_MODE=replay LLAMA_STACK_TEST_RECORDING_DIR=<...> \ uv run pytest -s -v tests/integration/inference \ --stack-config=starter \ -k "not( builtin_tool or safety_with_image or code_interpreter or test_rag )" \ --text-model="ollama/llama3.2:3b-instruct-fp16" \ --embedding-model=sentence-transformers/all-MiniLM-L6-v2 ``` - `LLAMA_STACK_TEST_INFERENCE_MODE`: `live` (default), `record`, or `replay` - `LLAMA_STACK_TEST_RECORDING_DIR`: Storage location (must be specified for record or replay modes)	2025-07-29 12:41:31 -07:00
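A minimal sketch of how the two environment variables documented above might be validated at test startup. The mode defaults to `live`, only `live`/`record`/`replay` are accepted, and a recording directory is required for `record` and `replay`. The variable names and values come from the commit. `InferenceMode` and `read_test_mode` are hypothetical names:

```python
from __future__ import annotations

import os
from enum import Enum
from pathlib import Path


class InferenceMode(str, Enum):
    LIVE = "live"
    RECORD = "record"
    REPLAY = "replay"


def read_test_mode(env: dict | None = None) -> tuple[InferenceMode, Path | None]:
    """Parse LLAMA_STACK_TEST_INFERENCE_MODE and LLAMA_STACK_TEST_RECORDING_DIR."""
    env = os.environ if env is None else env
    # An unknown mode string raises ValueError from the Enum constructor.
    mode = InferenceMode(env.get("LLAMA_STACK_TEST_INFERENCE_MODE", "live").lower())
    recording_dir = env.get("LLAMA_STACK_TEST_RECORDING_DIR")
    if mode is not InferenceMode.LIVE and not recording_dir:
        raise ValueError("LLAMA_STACK_TEST_RECORDING_DIR must be set for record or replay mode")
    return mode, Path(recording_dir) if recording_dir else None
```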
Christian Zaccaria	c48dcafc77	fix: Fix unit tests CI and failing tests (#2928 ) # What does this PR do? - Added `set -e` to the beginning of the unit test script so that the script exits on failure and correctly fails the CI when tests do not pass. - Fixed all unit tests that were silently failing in the CI. - Fixed the Python 3.13 unit test CI job, which was also failing silently. Closes #2877 ## Test Plan - Previously: unit tests passed in CI even though 11 tests failed -> [CI-run](`4683681501 (step)`:4:2097) - After the fix, CI fails as expected on test failures: unit tests fail in CI with 1 failed test -> [CI-run](`4684234247 (step)`:4:1506) - This PR shows the CI passing and all unit tests passing.	2025-07-28 10:07:26 -07:00
Ashwin Bharambe	9583f468f8	feat(starter)!: simplify starter distro; litellm model registry changes (#2916 )	2025-07-25 15:02:04 -07:00
Charlie Doern	de6919ecdd	refactor: install external providers from module (#2637 ) # What does this PR do? Today, external providers are installed via the `external_providers_dir` in the config. This requires users to understand the `ProviderSpec` and set up their directories accordingly, and it splits the configuration for the stack across multiple files, directories, and formats. Most (if not all) external providers today have a [get_provider_spec](`559cb18fbb/src/ramalama_stack/provider.py (L9)`) method that sits unused. Using this method rather than the providers.d route makes installing external providers much easier and limits the extra configuration a regular user needs to get their stack off the ground. To accomplish this and wire it throughout the build process, this PR introduces the concept of a `module` that users specify for an external provider at build time. To facilitate this, the build and run specs are aligned to use the `Provider` class rather than the stringified provider_type that build currently uses. For example, say this is in your build config:
```
- provider_id: ramalama
  provider_type: remote::ramalama
  module: ramalama_stack
```
During build (in the various `build_...` scripts), in addition to installing any pip dependencies, we also install this module and use its `get_provider_spec` method to retrieve the ProviderSpec that is currently specified using `providers.d`. In production so far, providing instructions for installing external providers has been difficult: users need to install the module as a prerequisite, create the providers.d directory, copy in the provider spec, and also copy in the necessary build/run yaml files. Accessing an external provider should be as easy as possible, and pointing to its installable module aligns better with the rest of our build and dependency management process. 
For now, `external_providers_dir` still exists as an alternative, more declarative method of using external providers. ## Test Plan Added an integration test that installs an external provider from a module, plus more unit test coverage for `get_provider_registry` (the warning in yellow is expected: the module is installed inside the build env, not where we are running the command) <img width="1119" height="400" alt="Screenshot 2025-07-24 at 11 30 48 AM" src="https://github.com/user-attachments/assets/1efbaf45-b9e8-451a-bd63-264ed664706d" /> <img width="1154" height="618" alt="Screenshot 2025-07-24 at 11 31 14 AM" src="https://github.com/user-attachments/assets/feb2b3ea-c5dd-418e-9662-9a3bd5dd6bdc" /> --------- Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-07-25 15:41:26 +02:00
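Mechanically, the module route comes down to importing the package named in `module` and calling its `get_provider_spec()`. A minimal sketch, assuming a hypothetical `ProviderSpecSketch` dataclass in place of the real `ProviderSpec` and leaving out the pip-install step that happens during build:

```python
import importlib
from dataclasses import dataclass, field


@dataclass
class ProviderSpecSketch:
    """Hypothetical stand-in for llama-stack's ProviderSpec (fields are illustrative)."""
    provider_type: str
    module: str
    pip_packages: list[str] = field(default_factory=list)


def spec_from_module(module_name: str) -> ProviderSpecSketch:
    # The module is expected to be installed already (the build step handles that).
    module = importlib.import_module(module_name)
    get_spec = getattr(module, "get_provider_spec", None)
    if get_spec is None:
        raise AttributeError(f"module {module_name!r} does not define get_provider_spec()")
    # The spec now comes from the package itself instead of a providers.d YAML file.
    return get_spec()
```

The design choice is that the provider package becomes the single source of truth for its own spec, so users only name the module in their build config.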
Ashwin Bharambe	1463b79218	feat(registry): make the Stack query providers for model listing (#2862 ) This flips #2823 and #2805 by making the Stack periodically query the providers for models, rather than having the providers go behind its back and call "register" on the registry themselves. This also adds model listing support for all other providers via `ModelRegistryHelper`. Once this is done, we no longer need to manually list or register models via `run.yaml`, which removes both noise and annoyance (setting `INFERENCE_MODEL` environment variables, for example) from the new user experience. In addition, it adds a configuration variable `allowed_models` which can be used to optionally restrict the set of models exposed by a provider.	2025-07-24 10:39:53 -07:00
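One polling pass of this pull-based model can be sketched as follows: the stack asks a provider for its models and applies `allowed_models` before updating the registry. `refresh_registry` and `filter_allowed` are hypothetical helpers, the registry is reduced to a plain dict, and treating `allowed_models=None` as "no restriction" is an assumption about the semantics.

```python
from typing import Callable, Optional


def filter_allowed(listed: list[str], allowed_models: Optional[list[str]]) -> list[str]:
    # Assumption: None means the provider's models are not restricted.
    if allowed_models is None:
        return list(listed)
    allowed = set(allowed_models)
    # Preserve the provider's ordering while dropping disallowed models.
    return [model for model in listed if model in allowed]


def refresh_registry(registry: dict[str, list[str]], provider_id: str,
                     list_models: Callable[[], list[str]],
                     allowed_models: Optional[list[str]] = None) -> None:
    # The stack pulls from the provider; the provider never registers itself.
    registry[provider_id] = filter_allowed(list_models(), allowed_models)
```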
Ashwin Bharambe	3b83032555	feat(registry): more flexible model lookup (#2859 ) This PR updates model registration and lookup behavior to be slightly more general / flexible. See https://github.com/meta-llama/llama-stack/issues/2843 for more details. Note that this change is backwards compatible given the design of the `lookup_model()` method. ## Test Plan Added unit tests	2025-07-22 15:22:48 -07:00
Francisco Arceo	c8f274347d	chore: Adding Access Control for OpenAI Vector Stores methods (#2772 ) # What does this PR do? Refactors the vector store routing logic by moving OpenAI-compatible vector store operations from the `VectorIORouter` to the `VectorDBsRoutingTable`. Closes https://github.com/meta-llama/llama-stack/issues/2761 ## Test Plan Added unit tests to cover new routing logic and ACL checks. --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-07-21 16:22:44 -04:00
Matthew Farrellee	30b2e6a495	chore: default to pytest asyncio-mode=auto (#2730 ) # What does this PR do? Previously, developers who ran `./scripts/unit-tests.sh` would get `asyncio-mode=auto`, which meant `@pytest.mark.asyncio` and `@pytest_asyncio.fixture` were redundant. Developers who ran `pytest` directly would get pytest's default (strict mode) and would run into errors, leading them to add `@pytest.mark.asyncio` / `@pytest_asyncio.fixture` to their code. With this change: - `asyncio_mode=auto` is included in `pyproject.toml`, making behavior consistent for all invocations of pytest - all redundant `@pytest_asyncio.fixture` and `@pytest.mark.asyncio` markers are removed - for good measure, `pytest>=8.4` and `pytest-asyncio>=1.0` are now required ## Test Plan - `./scripts/unit-tests.sh` - `uv run pytest tests/unit`	2025-07-11 13:00:24 -07:00
Sébastien Han	ac5fd57387	chore: remove nested imports (#2515 ) # What does this PR do? * Given that our API packages use `import *` in `__init__.py`, we don't need to do `from llama_stack.apis.models.models` but simply `from llama_stack.apis.models`. The decision to use `import *` is debatable and should probably be revisited at some point. * Remove unneeded Ruff F401 rule * Consolidate Ruff F403 rule in the pyproject Signed-off-by: Sébastien Han <seb@redhat.com>	2025-06-26 08:01:05 +05:30
grs	7c1998db25	feat: fine grained access control policy (#2264 ) This allows a set of rules to be defined for determining access to resources. The rules are (loosely) based on the cedar policy format. A rule defines a list of actions either to permit or to forbid. It may specify a principal or a resource that must match for the rule to take effect. It may also specify a condition, either a 'when' or an 'unless', with additional constraints as to where the rule applies. A list of rules is held for each type to be protected and tried in order to find a match. If a match is found, the request is permitted or forbidden depending on the type of rule. If no match is found, the request is denied. If no rules are specified for a given type, a rule is added that allows any action as long as the resource attributes match the user attributes (i.e. the previous behaviour remains the default). Some examples in yaml: ``` model: - permit: principal: user-1 actions: [create, read, delete] comment: user-1 has full access to all models - permit: principal: user-2 actions: [read] resource: model-1 comment: user-2 has read access to model-1 only - permit: actions: [read] when: user_in: resource.namespaces comment: any user has read access to models with matching attributes vector_db: - forbid: actions: [create, read, delete] unless: user_in: role::admin comment: only user with admin role can use vector_db resources ``` --------- Signed-off-by: Gordon Sim <gsim@redhat.com>	2025-06-03 14:51:12 -07:00
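The evaluation order described above (first matching rule decides, no match means deny, an empty rule list falls back to attribute matching) can be sketched as follows. All class and function names here are hypothetical, conditions are modeled as plain callables, and the attribute check ("every attribute category on the resource shares at least one value with the user") is an assumed reading of "resource attributes match the user attributes".

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class User:
    principal: str
    attributes: dict[str, list[str]] = field(default_factory=dict)


@dataclass
class Resource:
    identifier: str
    attributes: dict[str, list[str]] = field(default_factory=dict)


Condition = Callable[[User, Resource], bool]


def attributes_match(user: User, resource: Resource) -> bool:
    # Assumption: each attribute category on the resource must overlap with the user's values.
    return all(set(values) & set(user.attributes.get(key, []))
               for key, values in resource.attributes.items())


@dataclass
class Rule:
    effect: str  # "permit" or "forbid"
    actions: set[str]
    principal: Optional[str] = None
    resource: Optional[str] = None
    when: Optional[Condition] = None
    unless: Optional[Condition] = None

    def matches(self, user: User, action: str, resource: Resource) -> bool:
        if action not in self.actions:
            return False
        if self.principal is not None and self.principal != user.principal:
            return False
        if self.resource is not None and self.resource != resource.identifier:
            return False
        if self.when is not None and not self.when(user, resource):
            return False
        if self.unless is not None and self.unless(user, resource):
            return False
        return True


def is_action_allowed(rules: list[Rule], user: User, action: str, resource: Resource) -> bool:
    if not rules:
        # Default when no rules exist for this type: the previous attribute-based behaviour.
        return attributes_match(user, resource)
    for rule in rules:  # rules are tried in order; the first match decides
        if rule.matches(user, action, resource):
            return rule.effect == "permit"
    return False  # no match: deny
```

Here the `when: user_in: resource.namespaces` rule from the yaml example is approximated by passing `attributes_match` as the `when` condition.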
Ashwin Bharambe	ce33d02443	fix(tools): do not index tools, only index toolgroups (#2261 ) When registering an MCP endpoint, we cannot list tools (as we used to), since the MCP endpoint may be behind an auth wall. Registration can happen much sooner (via run.yaml). Instead, we list tools only when the _user_ actually requests a listing. Furthermore, we cache the list in memory in the server. Currently, the cache is not invalidated; we may want to periodically re-list for MCP servers. Note that callers must call `list_tools` before calling `invoke_tool`; we rely on this critically. This will enable us to list MCP servers in run.yaml ## Test Plan Existing tests, updated tests accordingly.	2025-05-25 13:27:52 -07:00
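The lazy-listing behaviour can be sketched as a small cache: registration stores only how to list a toolgroup, the first `list_tools` call does the (possibly authenticated) listing and caches it, and `invoke_tool` depends on that cached entry. `ToolGroupCache` and its counter are hypothetical; tools are reduced to names, and the cache is never invalidated, mirroring the current behaviour described above.

```python
import asyncio
from typing import Awaitable, Callable

ListFn = Callable[[], Awaitable[list[str]]]


class ToolGroupCache:
    """Hypothetical in-memory cache: list lazily, never at registration time."""

    def __init__(self, list_fns: dict[str, ListFn]) -> None:
        # Registration only records how to list each toolgroup; nothing is fetched yet.
        self._list_fns = list_fns
        self._cache: dict[str, list[str]] = {}
        self.list_calls = 0  # counts real listings, for illustration

    async def list_tools(self, toolgroup_id: str) -> list[str]:
        if toolgroup_id not in self._cache:
            self.list_calls += 1
            self._cache[toolgroup_id] = await self._list_fns[toolgroup_id]()
        return self._cache[toolgroup_id]

    async def invoke_tool(self, toolgroup_id: str, tool_name: str) -> str:
        tools = self._cache.get(toolgroup_id)
        if tools is None:
            # invoke_tool relies on a prior list_tools call having populated the cache.
            raise RuntimeError("call list_tools() before invoke_tool()")
        if tool_name not in tools:
            raise KeyError(tool_name)
        return f"invoked {tool_name}"
```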
Ashwin Bharambe	298721c238	chore: split routing_tables into individual files (#2259 )	2025-05-24 23:15:05 -07:00
Derek Higgins	2e807b38cc	chore: Add fixtures to conftest.py (#2067 ) Add fixtures for SqliteKVStore, DiskDistributionRegistry and CachedDiskDistributionRegistry. And use them in tests that had all been duplicating similar setups. ## Test Plan unit tests continue to run Signed-off-by: Derek Higgins <derekh@redhat.com>	2025-05-06 13:57:48 +02:00
Ihar Hrachyshka	9e6561a1ec	chore: enable pyupgrade fixes (#1806 ) # What does this PR do? The goal of this PR is code base modernization. Schema reflection code needed a minor adjustment to handle UnionTypes and collections.abc.AsyncIterator. (Both are preferred in recent Python releases.) Note to reviewers: almost all changes here are automatically generated by pyupgrade. Some additional unused imports were cleaned up. The only changes worth noting are under `docs/openapi_generator` and `llama_stack/strong_typing/schema.py`, where reflection code was updated to deal with "newer" types. Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-05-01 14:23:50 -07:00
Derek Higgins	78ef6a6099	chore: Increase unit test coverage of routing_tables.py (#2057 ) # What does this PR do? Adds some unit tests for the routing logic ## Test Plan Overall unit test coverage goes from `TOTAL 12434 8030 35%` to `TOTAL 12434 7871 37%`. Better coverage on routing_tables.py. Before:
```
llama_stack/distribution/routers/routers.py        | 342 | 219 | 0 | 36%
llama_stack/distribution/routers/routing_tables.py | 346 | 236 | 0 | 32%
```
After:
```
llama_stack/distribution/routers/routers.py        | 342 | 219 | 0 | 36%
llama_stack/distribution/routers/routing_tables.py | 349 |  89 | 0 | 74%
```
Signed-off-by: Derek Higgins <derekh@redhat.com>	2025-04-30 16:00:43 +02:00
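The percentages follow from the first two numbers in each row, assuming they are total statements and missed statements: coverage = (statements - missed) / statements, rounded to a whole percent. The third column is 0 in every row, so it does not change the result here. A small check of that arithmetic:

```python
def coverage_percent(statements: int, missed: int) -> int:
    """Line coverage as a whole-number percentage: covered statements / total statements."""
    covered = statements - missed
    return round(100 * covered / statements)
```

For example, `routing_tables.py` goes from 110 of 346 statements covered (32%) to 260 of 349 covered (74%).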
Sébastien Han	14e60e3c02	feat: include run.yaml in the container image (#2005 ) As part of the build process, we now include the generated run.yaml (based on the provided build configuration file) in the container. We updated the entrypoint to use this run configuration as well. Given this simple distribution configuration:
```
# build.yaml
version: '2'
distribution_spec:
  description: Use (an external) Ollama server for running LLM inference
  providers:
    inference:
    - remote::ollama
    vector_io:
    - inline::faiss
    safety:
    - inline::llama-guard
    agents:
    - inline::meta-reference
    telemetry:
    - inline::meta-reference
    eval:
    - inline::meta-reference
    datasetio:
    - remote::huggingface
    - inline::localfs
    scoring:
    - inline::basic
    - inline::llm-as-judge
    - inline::braintrust
    tool_runtime:
    - remote::brave-search
    - remote::tavily-search
    - inline::code-interpreter
    - inline::rag-runtime
    - remote::model-context-protocol
    - remote::wolfram-alpha
  container_image: "registry.access.redhat.com/ubi9"
image_type: container
image_name: test
```
Build it:
```
llama stack build --config build.yaml
```
Run it:
```
podman run --rm \
  -p 8321:8321 \
  -e OLLAMA_URL=http://host.containers.internal:11434 \
  --name llama-stack-server \
  localhost/leseb-test:0.2.2
```
Signed-off-by: Sébastien Han <seb@redhat.com>	2025-04-24 11:29:53 +02:00
Alexey Rybak	8f57b08f2c	fix(build): always pass path when no template/config provided (#1982 ) # What does this PR do? Fixes a crash that occurred when building a stack as a container image via the interactive wizard without supplying --template or --config. - Root cause: template_or_config was None; only the container path relies on that parameter, which later reaches subprocess.run() and triggers `TypeError: expected str, bytes or os.PathLike object, not NoneType.` - Change: in `_run_stack_build_command_from_build_config` we now fall back to the freshly-written build-spec file whenever both optional sources are missing. Also adds a spy-based unit test that asserts a valid string path is passed to build_image() for container builds. ### Closes #1976 ## Test Plan - New unit test: test_build_path.py. Monkey-patches build_image, captures the fourth argument, and verifies it is a real path - Manual smoke test:
```
llama stack build --image-type container
# answer wizard prompts
```
Build proceeds into Docker without raising the previous TypeError. ## Future Work Harmonise `build_image` arguments so every image type receives the same inputs, eliminating this asymmetric special-case.	2025-04-17 10:20:43 +02:00
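The fallback and the spy-based check can be sketched as follows. `resolve_template_or_config` and `run_container_build` are hypothetical names, not the actual llama-stack functions; the spy is a plain `unittest.mock.Mock` standing in for the monkey-patched `build_image`, and, as in the test plan, the fourth positional argument is the one inspected.

```python
from pathlib import Path
from typing import Any, Callable, Optional
from unittest import mock


def resolve_template_or_config(template: Optional[str], config: Optional[str],
                               build_file_path: Path) -> str:
    # Prefer an explicit template or config; otherwise fall back to the
    # freshly written build-spec file so a real path is always passed on.
    source = template or config
    return source if source is not None else str(build_file_path)


def run_container_build(build_image: Callable[..., Any], template: Optional[str],
                        config: Optional[str], build_file_path: Path) -> None:
    # Hypothetical call shape: the path ends up as the fourth positional argument.
    build_image("build-config", build_file_path, "image-name",
                resolve_template_or_config(template, config, build_file_path))


# Spy usage: capture what would reach subprocess.run() in the real build.
spy = mock.Mock()
run_container_build(spy, None, None, Path("build.yaml"))
fourth_arg = spy.call_args.args[3]
```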
Sébastien Han	389767010b	feat: ability to execute external providers (#1672 ) # What does this PR do? Providers that live outside of the llama-stack codebase are now supported. A new property `external_providers_dir` has been added to the main config and can be configured as follows:
```
external_providers_dir: /etc/llama-stack/providers.d/
```
The expected structure is:
```
providers.d/
  inference/
    custom_ollama.yaml
    vllm.yaml
  vector_io/
    qdrant.yaml
```
where `custom_ollama.yaml` is:
```
adapter:
  adapter_type: custom_ollama
  pip_packages: ["ollama", "aiohttp"]
  config_class: llama_stack_ollama_provider.config.OllamaImplConfig
  module: llama_stack_ollama_provider
api_dependencies: []
optional_api_dependencies: []
```
The package must, of course, be installed on the system; here is the `llama_stack_ollama_provider` example:
```
$ uv pip show llama-stack-ollama-provider
Using Python 3.10.16 environment at: /Users/leseb/Documents/AI/llama-stack/.venv
Name: llama-stack-ollama-provider
Version: 0.1.0
Location: /Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.10/site-packages
Editable project location: /private/var/folders/mq/rnm5w_7s2d3fxmtkx02knvhm0000gn/T/tmp.ZBHU5Ezxg4/ollama/llama-stack-ollama-provider
Requires:
Required-by:
```
Closes: https://github.com/meta-llama/llama-stack/issues/658 Signed-off-by: Sébastien Han <seb@redhat.com>	2025-04-09 10:30:41 +02:00
Sébastien Han	7d9adf22ad	refactor: move missing tests to test directory (#1892 ) Move the test_context.py under the main tests directory, and fix the code. The problem was that the function captures the initial values of the context variables and then restores those same initial values before each iteration. This means that any modifications made to the context variables during iteration are lost when the next iteration starts. Error was:
```
====================================================== FAILURES =======================================================
______________________________________ test_preserve_contexts_across_event_loops ______________________________________

    @pytest.mark.asyncio
    async def test_preserve_contexts_across_event_loops():
        """
        Test that context variables are preserved across event loop boundaries with nested generators.
        This simulates the real-world scenario where:
        1. A new event loop is created for each streaming request
        2. The async generator runs inside that loop
        3. There are multiple levels of nested generators
        4. Context needs to be preserved across these boundaries
        """
        # Create context variables
        request_id = ContextVar("request_id", default=None)
        user_id = ContextVar("user_id", default=None)

        # Set initial values

        # Results container to verify values across thread boundaries
        results = []

        # Inner-most generator (level 2)
        async def inner_generator():
            # Should have the context from the outer scope
            yield (1, request_id.get(), user_id.get())
            # Modify one context variable
            user_id.set("user-modified")
            # Should reflect the modification
            yield (2, request_id.get(), user_id.get())

        # Middle generator (level 1)
        async def middle_generator():
            inner_gen = inner_generator()
            # Forward the first yield from inner
            item = await inner_gen.__anext__()
            yield item
            # Forward the second yield from inner
            item = await inner_gen.__anext__()
            yield item
            request_id.set("req-modified")
            # Add our own yield with both modified variables
            yield (3, request_id.get(), user_id.get())

        # Function to run in a separate thread with a new event loop
        def run_in_new_loop():
            # Create a new event loop for this thread
            loop = asyncio.new_event_loop()
            asyncio.set_event_loop(loop)
            try:
                # Outer generator (runs in the new loop)
                async def outer_generator():
                    request_id.set("req-12345")
                    user_id.set("user-6789")
                    # Wrap the middle generator
                    wrapped_gen = preserve_contexts_async_generator(middle_generator(), [request_id, user_id])
                    # Process all items from the middle generator
                    async for item in wrapped_gen:
                        # Store results for verification
                        results.append(item)

                # Run the outer generator in the new loop
                loop.run_until_complete(outer_generator())
            finally:
                loop.close()

        # Run the generator chain in a separate thread with a new event loop
        with ThreadPoolExecutor(max_workers=1) as executor:
            future = executor.submit(run_in_new_loop)
            future.result()  # Wait for completion

        # Verify the results
        assert len(results) == 3
        # First yield should have original values
        assert results[0] == (1, "req-12345", "user-6789")
        # Second yield should have modified user_id
        assert results[1] == (2, "req-12345", "user-modified")
        # Third yield should have both modified values
>       assert results[2] == (3, "req-modified", "user-modified")
E       AssertionError: assert (3, 'req-modified', 'user-6789') == (3, 'req-modified', 'user-modified')
E
E         At index 2 diff: 'user-6789' != 'user-modified'
E
E         Full diff:
E           (
E               3,
E               'req-modified',
E         -     'user-modified',
E         +     'user-6789',
E           )

tests/unit/distribution/test_context.py:155: AssertionError
-------------------------------------------------- Captured log call --------------------------------------------------
ERROR    asyncio:base_events.py:1758 Task was destroyed but it is pending!
task: <Task pending name='Task-7' coro=<<async_generator_athrow without __name__>()>>
================================================== warnings summary ===================================================
.venv/lib/python3.10/site-packages/pydantic/fields.py:1042
  /Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.10/site-packages/pydantic/fields.py:1042: PydanticDeprecatedSince20: Using extra keyword arguments on `Field` is deprecated and will be removed. Use `json_schema_extra` instead. (Extra keys: 'contentEncoding'). Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.10/migration/
    warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=============================================== short test summary info ===============================================
FAILED tests/unit/distribution/test_context.py::test_preserve_contexts_across_event_loops - AssertionError: assert (3, 'req-modified', 'user-6789') == (3, 'req-modified', 'user-modified')
  At index 2 diff: 'user-6789' != 'user-modified'
  Full diff:
    (
        3,
        'req-modified',
  -     'user-modified',
  +     'user-6789',
    )
```
Signed-off-by: Sébastien Han <seb@redhat.com>	2025-04-08 18:54:00 -07:00
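The failure comes from re-applying the snapshot taken when the wrapper was created before every step, which undoes `user_id.set("user-modified")` by the time the third item is produced. A minimal sketch of the fix described in this commit, assuming a wrapper with the same call shape as `preserve_contexts_async_generator(gen, context_vars)` (this is an illustration, not the exact llama-stack code): after each step, the current values are captured and become the values restored before the next step.

```python
from contextvars import ContextVar
from typing import Any, AsyncGenerator, AsyncIterator


def preserve_contexts_async_generator(
    gen: AsyncIterator[Any], context_vars: list[ContextVar[Any]]
) -> AsyncGenerator[Any, None]:
    # Snapshot the values visible where the wrapper is created (e.g. the request handler).
    tracked = {var: var.get() for var in context_vars}

    async def wrapper() -> AsyncGenerator[Any, None]:
        while True:
            # Re-apply the most recently tracked values before resuming the wrapped generator.
            for var, value in tracked.items():
                var.set(value)
            try:
                item = await gen.__anext__()
            except StopAsyncIteration:
                return
            # The fix: record changes made during this step, so the next
            # iteration restores them instead of the original snapshot.
            for var in context_vars:
                tracked[var] = var.get()
            yield item

    return wrapper()
```

With this change the third yield sees both modifications, `(3, "req-modified", "user-modified")`, matching the expectation in the failing test.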

38 commits