llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-03 09:53:45 +00:00

Author	SHA1	Message	Date
Emilio Garcia	5984ae6a76	Merge branch 'main' into auto_instrument_1	2025-11-18 15:06:15 -05:00
Anastas Stoyanovsky	a3580e6bc0	feat!: Wire through parallel_tool_calls to Responses API (#4124 ) # What does this PR do? Initial PR against #4123 Adds `parallel_tool_calls` spec to Responses API and basic initial implementation where no more than one function call is generated when set to `False`. ## Test Plan * Unit tests have been added to verify no more than one function call is generated. * A followup PR will verify passing through `parallel_tool_calls` to providers. * A followup PR will address verification and/or implementation of incremental function calling across multiple conversational turns. --------- Signed-off-by: Anastas Stoyanovsky <astoyano@redhat.com>	2025-11-18 11:25:08 -08:00
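The `parallel_tool_calls` behavior described in a3580e6bc0 can be pictured with a minimal sketch. It assumes function calls arrive as a list of dicts; the helper name `limit_function_calls` and the call shape are hypothetical and are not taken from the PR.

```python
from typing import Any


def limit_function_calls(
    calls: list[dict[str, Any]], parallel_tool_calls: bool = True
) -> list[dict[str, Any]]:
    """Return the function calls to emit for one model turn.

    Hypothetical helper: when parallel_tool_calls is False, keep at most
    the first call; otherwise pass every call through unchanged.
    """
    if parallel_tool_calls:
        return list(calls)
    return list(calls[:1])


calls = [{"name": "get_weather"}, {"name": "get_time"}]
print(limit_function_calls(calls, parallel_tool_calls=False))
```

Keeping the first call rather than the last is an arbitrary choice made for this sketch.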
Emilio Garcia	cb357dd7a2	fix(docs): clean up branding and wording in telemetry docs	2025-11-18 10:58:52 -05:00
Emilio Garcia	7d8cef6c71	Merge branch 'main' into auto_instrument_1	2025-11-17 20:12:37 -05:00
Theofanis Petkos	5fe6098350	docs: Improvements on `provider_codegen` for type hints and multi-line yaml descriptions (#4033 ) # What does this PR do? This PR improves type hint cleanup in auto-generated provider documentation by adding regex logic. Issues Fixed: - Type hints with missing closing brackets (e.g., `list[str` instead of `list[str]`) - Types showing as `<class 'bool'>`, `<class 'str'>` instead of `bool`, `str` - The multi-line YAML frontmatter in index documentation files wasn't ideal, so we now add the proper `|` character. Changes: 1. Replaced string replacement (`.replace`) with regex-based type cleaning to preserve the trailing bracket in case of `list` and `dict`. 2. Adds the `|` character for multi-line YAML descriptions. 3. I have regenerated the docs. However, let me know if that's not needed. ## Test Plan 1. Ran uv run python scripts/provider_codegen.py - successfully regenerated all docs 2. We can see that the updated docs correctly handle type hint cleanup and that multi-line YAML descriptions now have the `|` character. ### Note to the reviewer(s) This is my first contribution to your lovely repo! Initially I was going through docs (wanted to use `remote::gemini` as provider) and realized the issue. I've read the [CONTRIBUTING.md](https://github.com/llamastack/llama-stack/blob/main/CONTRIBUTING.md) and decided to open the PR. Let me know if there's anything I did wrong and I'll update my PR! --------- Signed-off-by: thepetk <thepetk@gmail.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-11-17 12:35:28 -08:00
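A minimal sketch of the regex-based type cleaning that 5fe6098350 describes, assuming the raw strings look like the examples in the message. The pattern and the helper name `clean_type_hint` are illustrative and are not the code in `scripts/provider_codegen.py`.

```python
import re

# Matches reprs such as "<class 'bool'>" and captures the bare type name.
_CLASS_REPR = re.compile(r"<class '([\w.]+)'>")


def clean_type_hint(raw: str) -> str:
    """Replace "<class 'X'>" with "X" wherever it appears.

    Only the matched repr is rewritten, so surrounding brackets in
    generics such as list[...] and dict[...] are left untouched.
    """
    return _CLASS_REPR.sub(r"\1", raw)


print(clean_type_hint("<class 'bool'>"))
print(clean_type_hint("dict[str, <class 'int'>]"))
```

Because the substitution is scoped to the matched repr, a closing `]` can never be consumed by accident, which is the failure mode the message attributes to plain string replacement.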
Omar Abdelwahab	fe91d331ef	fix: Remove authorization from provider data (#4161 ) # What does this PR do? - Remove backward compatibility for authorization in mcp_headers - Enforce authorization must use dedicated parameter - Add validation error if Authorization found in provider_data headers - Update test_mcp.py to use authorization parameter - Update test_mcp_json_schema.py to use authorization parameter - Update test_tools_with_schemas.py to use authorization parameter - Update documentation to show the change in the authorization approach Breaking Change: - Authorization can no longer be passed via mcp_headers in provider_data - Users must use the dedicated 'authorization' parameter instead - Clear error message guides users to the new approach ## Test Plan CI --------- Co-authored-by: Omar Abdelwahab <omara@fb.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-11-17 12:16:35 -08:00
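The validation rule in fe91d331ef could look roughly like the sketch below. It assumes headers are a plain `dict[str, str]`; the function name `validate_mcp_headers` and the error text are hypothetical.

```python
def validate_mcp_headers(headers: dict[str, str]) -> None:
    """Reject an Authorization header passed through provider data.

    Hypothetical check: header names are compared case-insensitively,
    since HTTP header names are not case-sensitive.
    """
    for name in headers:
        if name.lower() == "authorization":
            raise ValueError(
                "Authorization must not be passed in mcp_headers; "
                "use the dedicated 'authorization' parameter instead."
            )


validate_mcp_headers({"X-Request-Id": "abc123"})  # accepted, returns None
```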
Emilio Garcia	8eb0f78b40	Merge branch 'main' into auto_instrument_1	2025-11-17 14:37:40 -05:00
Ashwin Bharambe	f648cacdad	fix(openapi): restore embedded request wrappers (#4176 ) FastAPI generator now only unwraps body params explicitly marked with Body(embed=False) so the /eval run_eval schema once again exposes RunEvalRequest, matching our integration tests and the server's request parsing. Regenerated the OpenAPI specs to capture the restored wrapper. CI on the Stainless preview builds should be green.	2025-11-17 11:36:23 -08:00
Emilio Garcia	f0324d4222	fix(rebase): pre-commit formatting fixes caused by rebase	2025-11-17 12:40:26 -05:00
Emilio Garcia	4ef8982209	docs(telemetry): update docs to reflect the telemetry re-architecture	2025-11-17 12:28:16 -05:00
Sébastien Han	97f535c4f1	feat(openapi): switch to fastapi-based generator (#3944 ) # What does this PR do? This replaces the legacy "pyopenapi + strong_typing" pipeline with a FastAPI-backed generator that has an explicit schema registry inside `llama_stack_api`. The key changes: 1. New generator architecture. FastAPI now builds the OpenAPI schema directly from the real routes, while helper modules (`schema_collection`, `endpoints`, `schema_transforms`, etc.) post-process the result. The old pyopenapi stack and its strong_typing helpers are removed entirely, so we no longer rely on fragile AST analysis or top-level import side effects. 2. Schema registry in `llama_stack_api`. `schema_utils.py` keeps a `SchemaInfo` record for every `@json_schema_type`, `register_schema`, and dynamically created request model. The OpenAPI generator and other tooling query this registry instead of scanning the package tree, producing deterministic names (e.g., `{MethodName}Request`), capturing all optional/nullable fields, and making schema discovery testable. A new unit test covers the registry behavior. 3. Regenerated specs + CI alignment. All docs/Stainless specs are regenerated from the new pipeline, so optional/nullable fields now match reality (expect the API Conformance workflow to report breaking changes—this PR establishes the new baseline). The workflow itself is back to the stock oasdiff invocation so future regressions surface normally. Conformance will be RED on this PR; we choose to accept the deviations. ## Test Plan - `uv run pytest tests/unit/server/test_schema_registry.py` - `uv run python -m scripts.openapi_generator.main docs/static` --------- Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-11-14 15:53:53 -08:00
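To make the `{MethodName}Request` naming in 97f535c4f1 concrete, here is a small sketch of a registry keyed by deterministic names. Everything in it is an assumption made for illustration: the snake_case to PascalCase conversion, the `SchemaRegistry` class, and its methods are not the `SchemaInfo` registry in `schema_utils.py`.

```python
def request_model_name(method_name: str) -> str:
    """Derive a deterministic request-model name, e.g. run_eval -> RunEvalRequest.

    Assumption: method names are snake_case and map to PascalCase.
    """
    pascal = "".join(part.capitalize() for part in method_name.split("_") if part)
    return f"{pascal}Request"


class SchemaRegistry:
    """Hypothetical in-memory registry mapping schema names to their definitions."""

    def __init__(self) -> None:
        self._schemas: dict[str, dict] = {}

    def register(self, name: str, schema: dict) -> None:
        # Refuse duplicates so a name always resolves to exactly one schema.
        if name in self._schemas:
            raise ValueError(f"schema {name!r} is already registered")
        self._schemas[name] = schema

    def names(self) -> list[str]:
        # Sorted output keeps generated specs stable between runs.
        return sorted(self._schemas)


registry = SchemaRegistry()
registry.register(request_model_name("run_eval"), {"type": "object"})
print(registry.names())
```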
Omar Abdelwahab	eb545034ab	fix: MCP authorization parameter implementation (#4052 ) # What does this PR do? Adds a user-facing `authorization` parameter to MCP tool definitions that allows users to explicitly configure credentials per MCP server, addressing GitHub Issue #4034 in a secure manner. ## Test Plan tests/integration/responses/test_mcp_authentication.py --------- Co-authored-by: Omar Abdelwahab <omara@fb.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-11-14 08:54:42 -08:00
Ashwin Bharambe	2441ca9389	fix(api): ensure openapi spec has deprecated routes (#4156 ) Deprecated doesn't mean it's "gone", it just means it is "going away" in the next major version of the package.	2025-11-13 13:16:02 -08:00
Charlie Doern	840ad75fe9	feat: split API and provider specs into separate llama-stack-api pkg (#3895 ) # What does this PR do? Extract API definitions and provider specifications into a standalone llama-stack-api package that can be published to PyPI independently of the main llama-stack server. see: https://github.com/llamastack/llama-stack/pull/2978 and https://github.com/llamastack/llama-stack/pull/2978#issuecomment-3145115942 Motivation External providers currently import from llama-stack, which overrides the installed version and causes dependency conflicts. This separation allows external providers to: - Install only the type definitions they need without server dependencies - Avoid version conflicts with the installed llama-stack package - Be versioned and released independently This enables us to re-enable external provider module tests that were previously blocked by these import conflicts. Changes - Created llama-stack-api package with minimal dependencies (pydantic, jsonschema) - Moved APIs, providers datatypes, strong_typing, and schema_utils - Updated all imports from llama_stack.* to llama_stack_api.* - Configured local editable install for development workflow - Updated linting and type-checking configuration for both packages Next Steps - Publish llama-stack-api to PyPI - Update external provider dependencies - Re-enable external provider module tests Pre-cursor PRs to this one: - #4093 - #3954 - #4064 These PRs moved key pieces _out_ of the Api pkg, limiting the scope of change here. relates to #3237 ## Test Plan Package builds successfully and can be imported independently. All pre-commit hooks pass with expected exclusions maintained. --------- Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-11-13 11:51:17 -08:00
Francisco Arceo	4442b24de7	chore: Fix docs so can be deployed (#4149 ) # What does this PR do? Building/Deploying docs is failing here: `5530320962 (step)`:8:49 Needs the playground file. Updated it to reflect current admin status. Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-11-13 09:15:32 -08:00
Akram Ben Aissi	9eb81439d2	docs: Add comprehensive Files API and Vector Store integration doc (#3279 ) docs: Add comprehensive Files API and Vector Store integration documentation - Add Files API documentation with OpenAI-compatible endpoints - Create comprehensive guide for OpenAI-compatible file operations - Reorganize documentation structure: move file operations to files/ directory - Add vector store provider documentation for Milvus, SQLite-vec, FAISS - Clean up redundant files and improve navigation - Update cross-references and eliminate documentation duplication - Support for release 0.2.14 FileResponse and Vector Store API features	2025-11-13 08:50:06 -05:00
Derek Higgins	356f37b1ba	docs: clarify model identification uses provider_model_id not model_id (#4128 ) Updated documentation to accurately reflect current behavior where models are identified as provider_id/provider_model_id in the system. Changes: o Clarify that model_id is for configuration purposes only o Explain models are accessed as provider_id/provider_model_id o Remove outdated aliasing example that suggested model_id could be used as a custom identifier This corrects the documentation which previously suggested model_id could be used to create friendly aliases, which is not how the code actually works. Signed-off-by: Derek Higgins <derekh@redhat.com>	2025-11-12 10:13:26 -08:00
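The identifier scheme clarified in 356f37b1ba can be illustrated with a short sketch. The helper name `qualified_model_id` is hypothetical; only the `provider_id/provider_model_id` format comes from the commit message, and the example values are placeholders.

```python
def qualified_model_id(provider_id: str, provider_model_id: str) -> str:
    """Build the identifier under which a model is accessed.

    Per the commit message, models are addressed as
    provider_id/provider_model_id; model_id is configuration-only.
    """
    if not provider_id or not provider_model_id:
        raise ValueError("both provider_id and provider_model_id are required")
    return f"{provider_id}/{provider_model_id}"


print(qualified_model_id("example-provider", "example-model"))
```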
Francisco Arceo	eb3f9ac278	feat: allow returning embeddings and metadata from `/vector_stores/` methods; disallow changing Provider ID (#4046 ) # What does this PR do? - Updates `/vector_stores/{vector_store_id}/files/{file_id}/content` to allow returning `embeddings` and `metadata` using the `extra_query` - Updates the UI accordingly to display them. - Update UI to support CRUD operations in the Vector Stores section and adds a new modal exposing the functionality. - Updates Vector Store update to fail if a user tries to update Provider ID (which doesn't make sense to allow) ```python In [1]: client.vector_stores.files.content( vector_store_id=vector_store.id, file_id=file.id, extra_query={"include_embeddings": True, "include_metadata": True} ) Out [1]: FileContentResponse(attributes={}, content=[Content(text='This is a test document to check if embeddings are generated properly.\n', type='text', embedding=[0.33760684728622437, ...,], chunk_metadata={'chunk_id': '62a63ae0-c202-f060-1b86-0a688995b8d3', 'document_id': 'file-27291dbc679642ac94ffac6d2810c339', 'source': None, 'created_timestamp': 1762053437, 'updated_timestamp': 1762053437, 'chunk_window': '0-13', 'chunk_tokenizer': 'DEFAULT_TIKTOKEN_TOKENIZER', 'chunk_embedding_model': 'sentence-transformers/nomic-ai/nomic-embed-text-v1.5', 'chunk_embedding_dimension': 768, 'content_token_count': 13, 'metadata_token_count': 9}, metadata={'filename': 'test-embedding.txt', 'chunk_id': '62a63ae0-c202-f060-1b86-0a688995b8d3', 'document_id': 'file-27291dbc679642ac94ffac6d2810c339', 'token_count': 13, 'metadata_token_count': 9})], file_id='file-27291dbc679642ac94ffac6d2810c339', filename='test-embedding.txt') ``` Screenshots of UI (images omitted): List Vector Store with Added "Create New Vector Store"; Create New Vector Store; Edit Vector Store; Vector Store Files Contents page (with Embeddings); Vector Store Files Contents Details page (with Embeddings). ## Test Plan Tests added for Middleware extension and Provider failures. --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-11-12 09:59:48 -08:00
ehhuang	71b328fc4b	chore(ui): add npm package and dockerfile (#4100 ) # What does this PR do? - sets up package.json for npm `llama-stack-ui` package (will update llama-stack-ops) - adds dockerfile for UI docker image ## Test Plan npx: npm build && npm pack LLAMA_STACK_UI_PORT=8322 npx /Users/erichuang/projects/ui/src/llama_stack_ui/llama-stack-ui-0.4.0-alpha.2.tgz docker: cd src/llama_stack_ui docker build . -f Dockerfile --tag test_ui --no-cache ❯ docker run -p 8322:8322 \ -e LLAMA_STACK_UI_PORT=8322 \ test_ui:latest	2025-11-11 10:40:31 -08:00
paulengineer	e5a55f3677	docs: use 'uv pip' to avoid pitfalls of using 'pip' in virtual environment (#4122 ) # What does this PR do? In the Detailed Tutorial, at Step 3, the Install with venv option creates a new virtual environment `client`, activates it, then attempts to install the llama-stack-client using pip. ``` uv venv client --python 3.12 source client/bin/activate pip install llama-stack-client <- this is the problematic line ``` However, the pip command will likely fail because the `uv venv` command doesn't, by default, add the pip command to the virtual environment that is created. The pip command will error either because pip doesn't exist at all, or, if the pip command does exist outside of the virtual environment, return a different error message. In the latter case, it may be unclear to the user why it is failing. This PR changes 'pip' to 'uv pip', allowing the install action to function in the virtual environment as intended, and without the need for pip to be installed. ## Test Plan 1. Use linux or WSL (virtual environments on Windows use `Scripts` folder instead of `bin` [virtualenv #993ba13](`993ba1316a`) which doesn't align with the tutorial) 2. Clone the `llama-stack` repo 3. Run the following and verify success: ``` uv venv client --python 3.12 source client/bin/activate ``` 4. Run the updated command: ``` uv pip install llama-stack-client ``` 5. Observe that the console output confirms that the virtual environment `client` was used: > Using Python 3.12.3 environment at: client	2025-11-11 07:49:03 -05:00
Nathan Weinberg	97ccfb5e62	refactor: inspect routes now shows all non-deprecated APIs (#4116 ) # What does this PR do? The inspect API lacked any mechanism to get all non-deprecated APIs (v1, v1alpha, v1beta). Change the default to this behavior; the 'v1' filter can be used by users wanting a list of stable APIs. ## Test Plan 1. pull the PR 2. launch a LLS server 3. run `curl http://beanlab3.bss.redhat.com:8321/v1/inspect/routes` 4. note there are APIs for `v1`, `v1alpha`, and `v1beta` but no deprecated APIs Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-11-10 15:57:17 -08:00
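The filtering behavior described in 97ccfb5e62 can be sketched as follows. The `Route` dataclass, the `list_routes` function, and the example paths are hypothetical; the sketch also assumes that an explicit level filter still hides deprecated routes, which the commit message does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    path: str
    api_level: str  # "v1", "v1alpha", or "v1beta"
    deprecated: bool = False


def list_routes(routes: list[Route], api_filter: Optional[str] = None) -> list[Route]:
    """Return non-deprecated routes, optionally restricted to one API level.

    With no filter, every non-deprecated route is returned regardless of
    level; passing "v1" narrows the result to stable routes only.
    """
    visible = [r for r in routes if not r.deprecated]
    if api_filter is None:
        return visible
    return [r for r in visible if r.api_level == api_filter]


routes = [
    Route("/v1/example-stable", "v1"),
    Route("/v1alpha/example-alpha", "v1alpha"),
    Route("/v1/example-old", "v1", deprecated=True),
]
print([r.path for r in list_routes(routes)])
print([r.path for r in list_routes(routes, api_filter="v1")])
```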
Shabana Baig	433438cfc0	feat: Implement the 'max_tool_calls' parameter for the Responses API (#4062 ) # Problem Responses API uses max_tool_calls parameter to limit the number of tool calls that can be generated in a response. Currently, LLS implementation of the Responses API does not support this parameter. # What does this PR do? This pull request adds the max_tool_calls field to the response object definition and updates the inline provider. It also ensures that: - the total number of calls to built-in and mcp tools does not exceed max_tool_calls - an error is thrown if max_tool_calls < 1 (behavior seen with the OpenAI Responses API, but we can change this if needed) Closes #[3563](https://github.com/llamastack/llama-stack/issues/3563) ## Test Plan - Tested manually for change in model response w.r.t supplied max_tool_calls field. - Added integration tests to test invalid max_tool_calls parameter. - Added integration tests to check max_tool_calls parameter with built-in and function tools. - Added integration tests to check max_tool_calls parameter in the returned response object. - Recorded OpenAI Responses API behavior using a sample script: https://github.com/s-akhtar-baig/llama-stack-examples/blob/main/responses/src/max_tool_calls.py Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-11-10 13:21:27 -08:00
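The two rules listed in 433438cfc0 (reject `max_tool_calls < 1`, and never exceed the limit) might be enforced along these lines. Both function names are hypothetical, and treating `None` as "no limit" is an assumption of this sketch.

```python
from typing import Optional


def validate_max_tool_calls(max_tool_calls: Optional[int]) -> None:
    """Raise if a limit is supplied but is smaller than 1."""
    if max_tool_calls is not None and max_tool_calls < 1:
        raise ValueError(f"max_tool_calls must be >= 1, got {max_tool_calls}")


def can_call_tool(calls_made: int, max_tool_calls: Optional[int]) -> bool:
    """Return True while another built-in or MCP tool call fits under the limit."""
    return max_tool_calls is None or calls_made < max_tool_calls


validate_max_tool_calls(2)
print(can_call_tool(calls_made=1, max_tool_calls=2))
print(can_call_tool(calls_made=2, max_tool_calls=2))
```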
Dennis Kennetz	209a78b618	feat: add oci genai service as chat inference provider (#3876 ) # What does this PR do? Adds OCI GenAI PaaS models for openai chat completion endpoints. ## Test Plan In an OCI tenancy with access to GenAI PaaS, perform the following steps: 1. Ensure you have IAM policies in place to use service (check docs included in this PR) 2. For local development, [setup OCI cli](https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliinstall.htm) and configure the CLI with your region, tenancy, and auth [here](https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliconfigure.htm) 3. Once configured, go through llama-stack setup and run llama-stack (uses config based auth) like: ```bash OCI_AUTH_TYPE=config_file \ OCI_CLI_PROFILE=CHICAGO \ OCI_REGION=us-chicago-1 \ OCI_COMPARTMENT_OCID=ocid1.compartment.oc1..aaaaaaaa5...5a \ llama stack run oci ``` 4. Hit the `models` endpoint to list models after server is running: ```bash curl http://localhost:8321/v1/models | jq ... { "identifier": "meta.llama-4-scout-17b-16e-instruct", "provider_resource_id": "ocid1.generativeaimodel.oc1.us-chicago-1.am...q", "provider_id": "oci", "type": "model", "metadata": { "display_name": "meta.llama-4-scout-17b-16e-instruct", "capabilities": [ "CHAT" ], "oci_model_id": "ocid1.generativeaimodel.oc1.us-chicago-1.a...q" }, "model_type": "llm" }, ... ``` 5. Use the "display_name" field to use the model in a `/chat/completions` request: ```bash # Streaming result curl -X POST http://localhost:8321/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "meta.llama-4-scout-17b-16e-instruct", "stream": true, "temperature": 0.9, "messages": [ { "role": "system", "content": "You are a funny comedian. You can be crass." }, { "role": "user", "content": "Tell me a funny joke about programming." 
} ] }' # Non-streaming result curl -X POST http://localhost:8321/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "meta.llama-4-scout-17b-16e-instruct", "stream": false, "temperature": 0.9, "messages": [ { "role": "system", "content": "You are a funny comedian. You can be crass." }, { "role": "user", "content": "Tell me a funny joke about programming." } ] }' ``` 6. Try out other models from the `/models` endpoint.	2025-11-10 16:16:24 -05:00
Ashwin Bharambe	fadf17daf3	feat(api)!: deprecate register/unregister resource APIs (#4099 ) Mark all register_* / unregister_* APIs as deprecated across models, shields, tool groups, datasets, benchmarks, and scoring functions. This is the first step toward moving resource mutations to an `/admin` namespace as outlined in https://github.com/llamastack/llama-stack/issues/3809#issuecomment-3492931585. The deprecation flag will be reflected in the OpenAPI schema to warn API users that these endpoints are being phased out. Next step will be implementing the `/admin` route namespace for these resource management operations. - `register_model` / `unregister_model` - `register_shield` / `unregister_shield` - `register_tool_group` / `unregister_toolgroup` - `register_dataset` / `unregister_dataset` - `register_benchmark` / `unregister_benchmark` - `register_scoring_function` / `unregister_scoring_function`	2025-11-10 10:36:33 -08:00
ehhuang	d4ecbfd092	fix(vector store)!: fix file content API (#4105 ) # What does this PR do? - changed to match https://app.stainless.com/api/spec/documented/openai/openapi.documented.yml ## Test Plan updated test CI	2025-11-10 10:16:35 -08:00
Vaishnavi Hire	4341c4c2ac	docs: Add Llama Stack Operator docs (#3983 ) # What does this PR do? Add documentation for llama-stack-k8s-operator under kubernetes deployment guide. Signed-off-by: Vaishnavi Hire <vhire@redhat.com>	2025-11-10 15:29:15 +01:00
Aakanksha Duggal	b83184f7ef	feat(responses)!: Add web_search_2025_08_26 to the WebSearchToolTypes (#4103 ) # What does this PR do? Resolves #4102 1. Added `web_search_2025_08_26` to the `WebSearchToolTypes` list and the `OpenAIResponseInputToolWebSearch.type` Literal union 2. No changes needed to tool execution logic - all `web_search` types map to the same underlying tool 3. Backward compatibility is maintained - existing `web_search`, `web_search_preview`, and `web_search_preview_2025_03_11` types continue to work 4. Added an integration test case using {"type": "web_search_2025_08_26"} to verify it works correctly 5. Updated `docs/docs/providers/openai_responses_limitations.mdx` to reflect that `web_search_2025_08_26` is now supported. 6. Removed incorrect references to `MOD1/MOD2/MOD3` (which don't exist in the codebase) --------- Signed-off-by: Aakanksha Duggal <aduggal@redhat.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-11-07 10:01:12 -08:00
Ashwin Bharambe	f49cb0b717	chore: Stack server no longer depends on llama-stack-client (#4094 ) This dependency has been bothering folks for a long time (cc @leseb). We really needed it due to "library client" which is primarily used for our tests and is not a part of the Stack server. Anyone who needs to use the library client can certainly install `llama-stack-client` in their environment to make that work. Updated the notebook references to install `llama-stack-client` additionally when setting things up.	2025-11-07 09:54:09 -08:00
Sumanth Kamenani	e894e36eea	feat: add OpenAI-compatible Bedrock provider (#3748 ) Implements 
AWS Bedrock inference provider using OpenAI-compatible endpoint for Llama models available through Bedrock. Closes: #3410 ## What does this PR do? Adds AWS Bedrock as an inference provider using the OpenAI-compatible endpoint. This lets us use Bedrock models (GPT-OSS, Llama) through the standard llama-stack inference API. The implementation uses LiteLLM's OpenAI client under the hood, so it gets all the OpenAI compatibility features. The provider handles per-request API key overrides via headers. ## Test Plan Tested the following scenarios: - Non-streaming completion - basic request/response flow - Streaming completion - SSE streaming with chunked responses - Multi-turn conversations - context retention across turns - Tool calling - function calling with proper tool_calls format # Bedrock OpenAI-Compatible Provider - Test Results Model: `bedrock-inference/openai.gpt-oss-20b-1:0` --- ## Test 1: Model Listing Request: ```http GET /v1/models HTTP/1.1 ``` Response: ```http HTTP/1.1 200 OK Content-Type: application/json { "data": [ {"identifier": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}, {"identifier": "bedrock-inference/openai.gpt-oss-40b-1:0", ...} ] } ``` --- ## Test 2: Non-Streaming Completion Request: ```http POST /v1/chat/completions HTTP/1.1 Content-Type: application/json { "model": "bedrock-inference/openai.gpt-oss-20b-1:0", "messages": [{"role": "user", "content": "Say 'Hello from Bedrock' and nothing else"}], "stream": false } ``` Response: ```http HTTP/1.1 200 OK Content-Type: application/json { "choices": [{ "finish_reason": "stop", "message": {"content": "...Hello from Bedrock"} }], "usage": {"prompt_tokens": 79, "completion_tokens": 50, "total_tokens": 129} } ``` --- ## Test 3: Streaming Completion Request: ```http POST /v1/chat/completions HTTP/1.1 Content-Type: application/json { "model": "bedrock-inference/openai.gpt-oss-20b-1:0", "messages": [{"role": "user", "content": "Count from 1 to 5"}], "stream": true } ``` Response: ```http HTTP/1.1 200 
OK Content-Type: text/event-stream [6 SSE chunks received] Final content: "1, 2, 3, 4, 5" ``` --- ## Test 4: Error Handling - Invalid Model Request: ```http POST /v1/chat/completions HTTP/1.1 Content-Type: application/json { "model": "invalid-model-id", "messages": [{"role": "user", "content": "Hello"}], "stream": false } ``` Response: ```http HTTP/1.1 404 Not Found Content-Type: application/json { "detail": "Model 'invalid-model-id' not found. Use 'client.models.list()' to list available Models." } ``` --- ## Test 5: Multi-Turn Conversation Request 1: ```http POST /v1/chat/completions HTTP/1.1 { "messages": [{"role": "user", "content": "My name is Alice"}] } ``` Response 1: ```http HTTP/1.1 200 OK { "choices": [{ "message": {"content": "...Nice to meet you, Alice! How can I help you today?"} }] } ``` Request 2 (with history): ```http POST /v1/chat/completions HTTP/1.1 { "messages": [ {"role": "user", "content": "My name is Alice"}, {"role": "assistant", "content": "...Nice to meet you, Alice!..."}, {"role": "user", "content": "What is my name?"} ] } ``` Response 2: ```http HTTP/1.1 200 OK { "choices": [{ "message": {"content": "...Your name is Alice."} }], "usage": {"prompt_tokens": 183, "completion_tokens": 42} } ``` Context retained across turns --- ## Test 6: System Messages Request: ```http POST /v1/chat/completions HTTP/1.1 { "messages": [ {"role": "system", "content": "You are Shakespeare. Respond only in Shakespearean English."}, {"role": "user", "content": "Tell me about the weather"} ] } ``` Response: ```http HTTP/1.1 200 OK { "choices": [{ "message": {"content": "Lo! 
I heed thy request..."} }], "usage": {"completion_tokens": 813} } ``` --- ## Test 7: Tool Calling Request: ```http POST /v1/chat/completions HTTP/1.1 { "messages": [{"role": "user", "content": "What's the weather in San Francisco?"}], "tools": [{ "type": "function", "function": { "name": "get_weather", "parameters": {"type": "object", "properties": {"location": {"type": "string"}}} } }] } ``` Response: ```http HTTP/1.1 200 OK { "choices": [{ "finish_reason": "tool_calls", "message": { "tool_calls": [{ "function": {"name": "get_weather", "arguments": "{\"location\":\"San Francisco\"}"} }] } }] } ``` --- ## Test 8: Sampling Parameters Request: ```http POST /v1/chat/completions HTTP/1.1 { "messages": [{"role": "user", "content": "Say hello"}], "temperature": 0.7, "top_p": 0.9 } ``` Response: ```http HTTP/1.1 200 OK { "choices": [{ "message": {"content": "...Hello! 👋 How can I help you today?"} }] } ``` --- ## Test 9: Authentication Error Handling ### Subtest A: Invalid API Key Request: ```http POST /v1/chat/completions HTTP/1.1 x-llamastack-provider-data: {"aws_bedrock_api_key": "invalid-fake-key-12345"} {"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...} ``` Response: ```http HTTP/1.1 400 Bad Request { "detail": "Invalid value: Authentication failed: Error code: 401 - {'error': {'message': 'Invalid API Key format: Must start with pre-defined prefix', ...}}" } ``` --- ### Subtest B: Empty API Key (Fallback to Config) Request: ```http POST /v1/chat/completions HTTP/1.1 x-llamastack-provider-data: {"aws_bedrock_api_key": ""} {"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...} ``` Response: ```http HTTP/1.1 200 OK { "choices": [{ "message": {"content": "...Hello! 
How can I assist you today?"} }] } ``` Fell back to config key --- ### Subtest C: Malformed Token Request: ```http POST /v1/chat/completions HTTP/1.1 x-llamastack-provider-data: {"aws_bedrock_api_key": "not-a-valid-bedrock-token-format"} {"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...} ``` Response: ```http HTTP/1.1 400 Bad Request { "detail": "Invalid value: Authentication failed: Error code: 401 - {'error': {'message': 'Invalid API Key format: Must start with pre-defined prefix', ...}}" } ```	2025-11-06 17:18:18 -08:00
Ashwin Bharambe	a2c4c12384	chore(ui): remove the Streamlit UI (#4097 )	2025-11-06 15:51:57 -08:00
Ashwin Bharambe	bef1b044bd	refactor(passthrough): use AsyncOpenAI instead of AsyncLlamaStackClient (#4085 ) We'd like to remove the dependence of `llama-stack` on `llama-stack-client`. This is a necessary step. 
A few small cleanups - Enables `embeddings` now also - Remove ModelRegistryHelper dependency (unused) - Consolidate to auth_credential field via RemoteInferenceProviderConfig - Implement list_models() to fetch from downstream /v1/models ## Test Plan Tested using this script https://gist.github.com/ashwinb/6356463d10f989c0682ab3bff8589581 Output: ``` Listing models from downstream server... Available models: ['passthrough/ollama/nomic-embed-text:latest', 'passthrough/ollama/all-minilm:l6-v2', 'passthrough/ollama/llama3.2-vision:11b', 'passthrough/ollama/llama3.2-vision:latest', 'passthrough/ollama/llama-guard3:1b', 'passthrough/ollama/llama3.2:1b', 'passthrough/ollama/all-minilm:latest', 'passthrough/ollama/llama3.2:3b', 'passthrough/ollama/llama3.2:3b-instruct-fp16', 'passthrough/bedrock/meta.llama3-1-8b-instruct-v1:0', 'passthrough/bedrock/meta.llama3-1-70b-instruct-v1:0', 'passthrough/bedrock/meta.llama3-1-405b-instruct-v1:0', 'passthrough/sentence-transformers/nomic-ai/nomic-embed-text-v1.5'] Using LLM model: passthrough/ollama/llama3.2-vision:11b Making inference request... Response: 4. --- Testing streaming --- Streamed response: ChatCompletionChunk(id='chatcmpl-64', choices=[Choice(delta=ChoiceDelta(content='1', reasoning_content=None, refusal=None, role='assistant', tool_calls=None), finish_reason='', index=0, logprobs=None)], created=1762381674, model='passthrough/ollama/llama3.2-vision:11b', object='chat.completion.chunk', usage=None) ... 5ChatCompletionChunk(id='chatcmpl-64', choices=[Choice(delta=ChoiceDelta(content='', reasoning_content=None, refusal=None, role='assistant', tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1762381674, model='passthrough/ollama/llama3.2-vision:11b', object='chat.completion.chunk', usage=None) ```	2025-11-05 18:15:11 -08:00
Roy Belio	c672a5d792	feat: ability to use postgres as store for starter distro (#4076 ) ## What does this PR do? The starter distribution now comes with all the required packages to support persistent stores—like the agent store, metadata, and inference—using PostgreSQL. Users can enable PostgreSQL support by setting the `ENABLE_POSTGRES_STORE=1` environment variable. This PR consolidates the functionality from the removed `postgres-demo` distribution into the starter distribution, reducing maintenance overhead. Closes: #2619 Supersedes: #2851 (rebased and updated) ## Changes Made 1. Added PostgreSQL support to starter distribution - New `run-with-postgres-store.yaml` configuration - Automatic config switching via `ENABLE_POSTGRES_STORE` environment variable - Removed separate `postgres-demo` distribution 2. Updated to new build system - Integrated postgres switching logic into Containerfile entrypoint - Uses new `storage_backends` and `storage_stores` API - Properly configured both PostgreSQL KV store and SQL store 3. 
Updated dependencies - Added `psycopg2-binary` and `asyncpg` to starter distribution - All postgres-related dependencies automatically included ## How to Use ### With Docker (PostgreSQL): ```bash docker run \ -e ENABLE_POSTGRES_STORE=1 \ -e POSTGRES_HOST=your_postgres_host \ -e POSTGRES_PORT=5432 \ -e POSTGRES_DB=llamastack \ -e POSTGRES_USER=llamastack \ -e POSTGRES_PASSWORD=llamastack \ -e OPENAI_API_KEY=your_key \ llamastack/distribution-starter ``` ### PostgreSQL environment variables: - `POSTGRES_HOST`: Postgres host (default: `localhost`) - `POSTGRES_PORT`: Postgres port (default: `5432`) - `POSTGRES_DB`: Postgres database name (default: `llamastack`) - `POSTGRES_USER`: Postgres username (default: `llamastack`) - `POSTGRES_PASSWORD`: Postgres password (default: `llamastack`) ## Test Plan All pre-commit hooks pass (mypy, ruff, distro-codegen) `llama stack list-deps starter` confirms psycopg2-binary is included Storage configuration correctly uses PostgreSQL backends Container builds successfully with postgres support ## Credits Original work by @leseb in #2851. Rebased and updated by @r-bit-rry to work with latest main. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Sébastien Han @leseb --------- Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: Sébastien Han <seb@redhat.com>	2025-11-05 15:37:06 -08:00
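The environment variables and defaults listed above can be sketched as a small resolver. The variable names, their defaults, the `ENABLE_POSTGRES_STORE=1` switch and the `run-with-postgres-store.yaml` filename come from the commit message. The function names, the exact-match check on `"1"` and the fallback filename `run.yaml` are assumptions, since the commit does not show the entrypoint logic.

```python
import os
from collections.abc import Mapping

# Names and defaults as listed in the commit message.
POSTGRES_DEFAULTS: dict[str, str] = {
    "POSTGRES_HOST": "localhost",
    "POSTGRES_PORT": "5432",
    "POSTGRES_DB": "llamastack",
    "POSTGRES_USER": "llamastack",
    "POSTGRES_PASSWORD": "llamastack",
}


def postgres_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return each Postgres setting, falling back to the documented default."""
    return {name: env.get(name, default) for name, default in POSTGRES_DEFAULTS.items()}


def select_run_config(env: Mapping[str, str] = os.environ) -> str:
    """Pick the config file the way the entrypoint is described as doing.

    Assumption: the switch is on only when the variable is exactly "1",
    and "run.yaml" is a placeholder name for the non-Postgres config.
    """
    if env.get("ENABLE_POSTGRES_STORE") == "1":
        return "run-with-postgres-store.yaml"
    return "run.yaml"
```

Calling `postgres_settings({"POSTGRES_HOST": "db.internal"})` overrides only the host and leaves the other four values at their defaults.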
ehhuang	9d5c34af27	fix!: BREAKING CHANGE: vector_store: search API response fix (#4080 ) # What does this PR do? - search_query in the vector store search API should be a list, according to https://github.com/openai/openai-openapi ## Test Plan modified tests --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/4080). * #4086 * __->__ #4080	2025-11-05 15:01:48 -08:00
ehhuang	95b0493fae	chore: move src/llama_stack/ui to src/llama_stack_ui (#4068 ) # What does this PR do? This better separates UI from backend code, which was a point of confusion often for our beloved AI friends. ## Test Plan CI	2025-11-04 15:21:49 -08:00
Ashwin Bharambe	5850e3473f	fix: remove straggler openapi HTML file	2025-11-04 14:54:33 -08:00
Ashwin Bharambe	0c49a53c97	chore(api)!: remove tool_runtime.rag_tool from the API surface (#4067 ) RAG aka file search is implemented via the Responses API by specifying the file-search tool. The backend implementation remains unchanged. This PR merely removes the directly exposed API surface which allowed users to directly perform searches from the client. This facility is now available via the `client.vector_store.search()` OpenAI compatible API.	2025-11-04 14:50:54 -08:00
Ashwin Bharambe	a8a8aa56c0	chore!: remove the agents (sessions and turns) API (#4055 ) - Removes the deprecated agents (sessions and turns) API that was marked alpha in 0.3.0 - Cleans up unused imports and orphaned types after the API removal - Removes `SessionNotFoundError` and `AgentTurnInputType` which are no longer needed The agents API is completely superseded by the Responses + Conversations APIs, and the client SDK Agent class already uses those implementations. Corresponding client-side PR: https://github.com/llamastack/llama-stack-client-python/pull/295	2025-11-04 09:38:39 -08:00
Ashwin Bharambe	053fc0ac39	chore!: remove all deprecated routes (including /openai/v1/ ones) (#4054 ) This PR removes all routes which we had marked deprecated for the 0.3.0 release. This includes: - all the `/v1/openai/v1/` routes (the corresponding /v1 routes still exist of course) - the /agents API (which is superseded completely by Responses + Conversations) - several alpha routes which had a "v1" route to aid transitioning to "v1alpha" This is the corresponding client-python change: https://github.com/llamastack/llama-stack-client-python/pull/294	2025-11-03 19:00:59 -08:00
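For callers that still hard-code the removed `/v1/openai/v1/` prefix, the migration described above amounts to a path rewrite. A minimal sketch follows, assuming every deprecated route maps one-to-one onto the `/v1` route with the same suffix. The commit says the corresponding `/v1` routes exist but does not list them, and `migrate_path` is a hypothetical helper name.

```python
DEPRECATED_PREFIX = "/v1/openai/v1/"
CURRENT_PREFIX = "/v1/"


def migrate_path(path: str) -> str:
    """Rewrite a deprecated /v1/openai/v1/... path to its /v1/... equivalent.

    Paths that do not use the deprecated prefix are returned unchanged.
    """
    if path.startswith(DEPRECATED_PREFIX):
        return CURRENT_PREFIX + path[len(DEPRECATED_PREFIX):]
    return path
```

This does not cover the removed `/agents` API, which the commit says is replaced by Responses + Conversations and so has no route to rewrite to.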
Ashwin Bharambe	cb40da210f	fix: update tests for OpenAI-style models endpoint (#4053 ) The llama-stack-client now uses `/v1/openai/v1/models`, which returns OpenAI-compatible model objects with 'id' and 'custom_metadata' fields instead of the Resource-style 'identifier' field. Updated api_recorder to handle the new endpoint and modified tests to access model metadata appropriately. Deleted stale model recordings for re-recording. NOTE: CI will be red on this one since it is dependent on https://github.com/llamastack/llama-stack-client-python/pull/291/files landing. I verified locally that it is green.	2025-11-03 17:30:08 -08:00
Sébastien Han	4a5ef65286	chore!: remove SDG API (#4035 ) # What does this PR do? This API hasn't received any traction and close to zero interest from the community. Let's revisit in the future if things change. Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-11-03 16:12:06 -08:00
Ashwin Bharambe	44096512b5	feat: add custom_metadata to OpenAIModel to unify /v1/models with /v1/openai/v1/models (#4051 ) We need to remove `/v1/openai/v1` paths shortly. There is one problem -- our current `/v1/openai/v1/models` endpoint provides different data than `/v1/models`. Unfortunately our tests target the latter (llama-stack customized) behavior. We need to get to true OpenAI compatibility. This is step 1: adding a `custom_metadata` field to `OpenAIModel` that includes all the extra stuff we add in the native `/v1/models` response. This can be extracted on the consumer end by looking at `__pydantic_extra__` or other similar fields. This PR: - Adds `custom_metadata` field to `OpenAIModel` class in `src/llama_stack/apis/models/models.py` - Modifies `openai_list_models()` in `src/llama_stack/core/routing_tables/models.py` to populate custom_metadata Next Steps 1. Update stainless client to use `/v1/openai/v1/models` instead of `/v1/models` 2. Migrate tests to read from `custom_metadata` 3. Remove `/v1/openai/v1/` prefix entirely and consolidate to single `/v1/models` endpoint	2025-11-03 15:56:07 -08:00
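During the transition described above, a consumer may see either the Resource-style object (with `identifier`) or the OpenAI-style object (with `id` and `custom_metadata`). A minimal sketch of reading both shapes from parsed JSON follows. The field names `id`, `identifier` and `custom_metadata` come from the commit messages. The helper names and the keys inside `custom_metadata` in the usage note are assumptions.

```python
from typing import Any


def model_id(model: dict[str, Any]) -> str:
    """Return the model id from either response shape."""
    if "id" in model:
        return model["id"]  # OpenAI-style object
    return model["identifier"]  # Resource-style object


def model_metadata(model: dict[str, Any]) -> dict[str, Any]:
    """Return the llama-stack extras, or an empty dict when absent."""
    return model.get("custom_metadata") or {}


def list_model_ids(response: dict[str, Any]) -> list[str]:
    """Extract ids from a list-models response of the form {"data": [...]}."""
    return [model_id(m) for m in response.get("data", [])]
```

For example, `model_metadata({"id": "m", "custom_metadata": {"model_type": "llm"}})` returns `{"model_type": "llm"}`, where `model_type` is only an illustrative key.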
Sébastien Han	d4aa348b60	chore: remove HTML generation for openapi spec (#4039 ) # What does this PR do? This seems to be an ancient artifact from when we were using readthedocs. Now Docusaurus reads the specs directly. --------- Signed-off-by: Sébastien Han <seb@redhat.com>	2025-11-03 18:03:40 +01:00
raghotham	62603d25c2	chore(api)!: /v1/inspect only lists v1 apis by default (#3948 ) # What does this PR do? Allow filtering for v1alpha, v1beta, deprecated and v1. Backward incompatible change since by default it only returns v1 apis now. ## Test Plan added unit test	2025-10-31 11:55:46 -07:00
Jiayi Ni	fa7699d2c3	feat: Add rerank API for NVIDIA Inference Provider (#3329 ) # What does this PR do? Add rerank API for NVIDIA Inference Provider. Closes #3278 ## Test Plan Unit test: ``` pytest tests/unit/providers/nvidia/test_rerank_inference.py ``` Integration test: ``` pytest -s -v tests/integration/inference/test_rerank.py --stack-config="inference=nvidia" --rerank-model=nvidia/nvidia/nv-rerankqa-mistral-4b-v3 --env NVIDIA_API_KEY="" --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com" ```	2025-10-30 21:42:09 -07:00
Sébastien Han	b4ea05ada9	chore: add batches to openapi schema (#3980 ) # What does this PR do? While working on https://github.com/llamastack/llama-stack/pull/3944 I realized that the batches API wasn't generated. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-10-30 07:08:35 -07:00
Charlie Doern	e8ecc99524	fix!: remove chunk_id property from Chunk class (#3954 ) # What does this PR do? chunk_id in the Chunk class executes actual logic to compute a chunk ID. This sort of logic should not live in the API spec. Instead, the providers should be in charge of calling generate_chunk_id and passing the result to `Chunk`. This removes the incorrect dependency between Provider impl and API impl. Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-10-29 18:59:59 -07:00
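The split described above, where the API type only carries a `chunk_id` and the provider computes it, can be sketched as follows. This is an illustration rather than the llama-stack implementation. The `Chunk` fields, the signature of `generate_chunk_id` and the SHA-256 hashing scheme are all assumptions, and only the names `Chunk`, `chunk_id` and `generate_chunk_id` appear in the commit message.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """API-side type: plain data, no ID computation."""

    chunk_id: str
    content: str


def generate_chunk_id(document_id: str, content: str) -> str:
    """Provider-side helper (hypothetical signature and hashing scheme)."""
    digest = hashlib.sha256(f"{document_id}:{content}".encode("utf-8")).hexdigest()
    return digest[:32]


def make_chunks(document_id: str, pieces: list[str]) -> list[Chunk]:
    """A provider builds chunks and supplies the ID explicitly."""
    return [
        Chunk(chunk_id=generate_chunk_id(document_id, piece), content=piece)
        for piece in pieces
    ]
```

Keeping `Chunk` as plain data means the API spec no longer has to import provider-side hashing code, which is the dependency the commit removes.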
Omar Abdelwahab	e6b27db30a	docs: A getting started notebook featuring simple agent examples. (#3955 ) # What does this PR do? Getting started notebook featuring simple agent examples. --------- Co-authored-by: Omar Abdelwahab <omara@fb.com>	2025-10-29 14:13:34 -04:00
Nathan Weinberg	b90c6a2c8b	fix(docs): remove leftover telemetry sidebar section (#3961 ) Leftover telemetry section was preventing `npm run build` from completing successfully Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-10-29 11:20:13 -04:00
ehhuang	1f9d48cd54	feat: openai files provider (#3946 ) # What does this PR do? - Adds OpenAI files provider - Note that file content retrieval is pretty limited by `purpose` https://community.openai.com/t/file-uploads-error-why-can-t-i-download-files-with-purpose-user-data/1357013?utm_source=chatgpt.com ## Test Plan Modify run yaml to use openai files provider: ``` files: - provider_id: openai provider_type: remote::openai config: api_key: ${env.OPENAI_API_KEY:=} metadata_store: backend: sql_default table_name: openai_files_metadata # Then run files tests ❯ uv run --no-sync ./scripts/integration-tests.sh --stack-config server:ci-tests --inference-mode replay --setup ollama --suite base --pattern test_files ```	2025-10-28 16:25:03 -07:00
raghotham	feabcdd67b	docs: add documentation on how to use custom run yaml in docker (#3949 ) as title test plan: ```yaml # custom-ollama-run.yaml version: 2 image_name: starter external_providers_dir: /.llama/providers.d apis: - inference - vector_io - files - safety - tool_runtime - agents providers: inference: # Single Ollama provider for all models - provider_id: ollama provider_type: remote::ollama config: url: ${env.OLLAMA_URL:=http://localhost:11434} vector_io: - provider_id: faiss provider_type: inline::faiss config: persistence: namespace: vector_io::faiss backend: kv_default files: - provider_id: meta-reference-files provider_type: inline::localfs config: storage_dir: /.llama/files metadata_store: table_name: files_metadata backend: sql_default safety: - provider_id: llama-guard provider_type: inline::llama-guard config: excluded_categories: [] tool_runtime: - provider_id: rag-runtime provider_type: inline::rag-runtime agents: - provider_id: meta-reference provider_type: inline::meta-reference config: persistence: agent_state: namespace: agents backend: kv_default responses: table_name: responses backend: sql_default max_write_queue_size: 10000 num_writers: 4 storage: backends: kv_default: type: kv_sqlite db_path: /.llama/kvstore.db sql_default: type: sql_sqlite db_path: /.llama/sql_store.db stores: metadata: namespace: registry backend: kv_default inference: table_name: inference_store backend: sql_default max_write_queue_size: 10000 num_writers: 4 conversations: table_name: openai_conversations backend: sql_default registered_resources: models: # All models use the same 'ollama' provider - model_id: llama3.2-vision:latest provider_id: ollama provider_model_id: llama3.2-vision:latest model_type: llm - model_id: llama3.2:3b provider_id: ollama provider_model_id: llama3.2:3b model_type: llm # Embedding models - model_id: nomic-embed-text-v2-moe provider_id: ollama provider_model_id: toshk0/nomic-embed-text-v2-moe:Q6_K model_type: embedding metadata: 
embedding_dimension: 768 shields: [] vector_dbs: [] datasets: [] scoring_fns: [] benchmarks: [] tool_groups: [] server: port: 8321 telemetry: enabled: true vector_stores: default_provider_id: faiss default_embedding_model: provider_id: ollama model_id: toshk0/nomic-embed-text-v2-moe:Q6_K ``` ```bash docker run -it --pull always -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT -v ~/.llama:/root/.llama -v $CUSTOM_RUN_CONFIG:/app/custom-run.yaml -e RUN_CONFIG_PATH=/app/custom-run.yaml -e OLLAMA_URL=http://host.docker.internal:11434/ llamastack/distribution-starter:0.3.0 --port $LLAMA_STACK_PORT ```	2025-10-28 16:05:44 -07:00
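The run config above uses placeholders such as `${env.OLLAMA_URL:=http://localhost:11434}`. A minimal sketch of how such a placeholder could be expanded follows, assuming `:=` supplies a default when the variable is not set. This is not the llama-stack resolver. The function name `expand_env` is hypothetical, and treating an empty-but-set variable as present is a choice made for this sketch.

```python
import re
from collections.abc import Mapping

# Matches ${env.NAME} and ${env.NAME:=default}; the default may be empty.
_ENV_REF = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*)(?::=([^}]*))?\}")


def expand_env(text: str, env: Mapping[str, str]) -> str:
    """Replace each ${env.NAME:=default} reference in text."""

    def replace(match: re.Match[str]) -> str:
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise KeyError(f"environment variable {name} is not set and has no default")

    return _ENV_REF.sub(replace, text)
```

With an empty environment, `expand_env("url: ${env.OLLAMA_URL:=http://localhost:11434}", {})` yields `url: http://localhost:11434`, and the same call with `OLLAMA_URL` set substitutes that value instead.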


949 commits