Removing the debug logging that was added to diagnose signature mismatch errors.
The logging served its purpose - it helped us identify that the error was coming
from api_recorder.py patched methods, not the actual provider implementations.
With the root cause now fixed in api_recorder.py, this debug logging is no longer
needed and can be safely removed to keep the code clean.
The ACTUAL root cause of the signature mismatch errors was found!
The api_recorder.py module patches tool runtime invoke_tool methods for test
recording/replay, but the patched methods were missing the new 'authorization'
parameter. The debug logging revealed:
Object method: patched_tavily_invoke_tool (from api_recorder module)
Object method's module: llama_stack.testing.api_recorder
Changes made:
1. Updated _patched_tool_invoke_method() to accept authorization parameter
2. Updated patched_tavily_invoke_tool() signature to include authorization
3. Added debug logging to resolver to help identify similar issues in the future
This fix ensures that when tests run in record/replay mode, the patched methods
preserve the full signature including the authorization parameter, allowing the
protocol compliance checks to pass.
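A minimal sketch of the fix pattern for the patched methods; the wrapper and recorder names are illustrative, and only `invoke_tool` and the `authorization` parameter come from the change described above:

```python
# Sketch only: shows how a record/replay wrapper can preserve the full protocol
# signature. The recorder object and its record() context manager are hypothetical.
import functools


def patch_invoke_tool(runtime_cls, recorder):
    original = runtime_cls.invoke_tool

    @functools.wraps(original)  # copies metadata; the explicit parameter list below is what matters
    async def patched_invoke_tool(self, tool_name, kwargs, authorization=None):
        # The wrapper must accept every parameter the protocol declares,
        # otherwise the signature-compliance check fails.
        with recorder.record(tool_name, kwargs):
            return await original(self, tool_name, kwargs, authorization=authorization)

    runtime_cls.invoke_tool = patched_invoke_tool
```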
Adding comprehensive debug logging to understand what's causing the persistent
signature mismatch errors in CI. The logging will show:
- Provider class name and module
- Both protocol and object signatures
- The actual method object
- The method's source module
This will help us identify if the issue is:
1. A cached module being loaded
2. A parent class overriding the method
3. Some other source of the wrong signature
Once we see the debug output, we can pinpoint the exact root cause.
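A generic sketch of the kind of debug logging described (not the exact code added), comparing the protocol signature with the concrete object's method:

```python
import inspect
import logging

logger = logging.getLogger(__name__)


def log_signature_mismatch(protocol_cls, impl_obj, method_name: str) -> None:
    """Log enough context to tell where a mismatching method really comes from."""
    proto_method = getattr(protocol_cls, method_name)
    obj_method = getattr(impl_obj, method_name)

    logger.debug("Provider class: %s (module %s)", type(impl_obj).__name__, type(impl_obj).__module__)
    logger.debug("Protocol signature: %s", inspect.signature(proto_method))
    logger.debug("Object signature:   %s", inspect.signature(obj_method))
    logger.debug("Object method: %r", obj_method)
    logger.debug("Object method's module: %s", getattr(obj_method, "__module__", "<unknown>"))
```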
The auto-routing layer was missing the authorization parameter:
- ToolRuntimeRouter.invoke_tool() now accepts and passes authorization
- ToolRuntimeRouter.list_runtime_tools() now accepts and passes authorization
- ToolGroupsRoutingTable.list_tools() now accepts and forwards authorization
- ToolGroupsRoutingTable._index_tools() now accepts and uses authorization
This fixes the '__autorouted__' provider signature mismatch error in CI.
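A minimal sketch of the routing change; the routing-table lookup helper and any argument names other than `authorization` are illustrative:

```python
class ToolRuntimeRouter:
    """Routes tool runtime calls to the provider that owns the tool (sketch)."""

    def __init__(self, routing_table):
        self.routing_table = routing_table

    async def invoke_tool(self, tool_name: str, kwargs: dict, authorization: str | None = None):
        provider = await self.routing_table.get_provider_impl(tool_name)
        # Forward authorization instead of silently dropping it, so the routed
        # call matches the protocol signature.
        return await provider.invoke_tool(tool_name=tool_name, kwargs=kwargs, authorization=authorization)

    async def list_runtime_tools(self, tool_group_id: str | None = None, authorization: str | None = None):
        provider = await self.routing_table.get_provider_impl(tool_group_id)
        return await provider.list_runtime_tools(tool_group_id=tool_group_id, authorization=authorization)
```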
Updated all ToolRuntime provider implementations to match the protocol signature:
- BraveSearchToolRuntimeImpl
- TavilySearchToolRuntimeImpl
- BingSearchToolRuntimeImpl
- WolframAlphaToolRuntimeImpl
- MemoryToolRuntimeImpl
This fixes the signature mismatch error in CI where protocol had 'authorization' parameter but implementations didn't.
- Add authorization parameter to Tool Runtime API signatures (list_runtime_tools, invoke_tool)
- Update MCP provider implementation to use authorization from request body instead of provider-data
- Deprecate mcp_authorization and mcp_headers from provider-data (MCPProviderDataValidator now empty)
- Update all Tool Runtime tests to pass authorization as request body parameter
- Responses API already uses request body authorization (no changes needed)
This provides a single, consistent way to pass MCP authentication tokens across both APIs, addressing reviewer feedback about avoiding multiple configuration paths.
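An illustrative client call showing the request-body parameter, assuming the client surface mirrors the API (the exact method shape and tool name are placeholders):

```python
# Illustrative only: pass the MCP token as the request-body `authorization`
# parameter instead of via provider-data headers.
result = client.tool_runtime.invoke_tool(
    tool_name="list_issues",            # hypothetical MCP tool name
    kwargs={"repo": "llamastack/llama-stack"},
    authorization="mcp_token_xyz789",   # request-body parameter added by this change
)
```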
A few changes to the storage layer to reduce unnecessary contention
arising from our design choices (and to let the database layer do the
right thing):
- SQL stores now share a single `SqlAlchemySqlStoreImpl` per backend,
and `kvstore_impl` caches instances per `(backend, namespace)`. This
avoids spawning multiple SQLite connections for the same file, reducing
lock contention and aligning the cache story for all backends.
- Added an async upsert API (with SQLite/Postgres dialect inserts) and
routed it through `AuthorizedSqlStore`, then switched conversations and
responses to call it. Using native `ON CONFLICT DO UPDATE` eliminates
the insert-then-update retry window that previously caused long WAL lock
retries.
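A minimal sketch of the dialect-aware upsert described in the second bullet, using SQLAlchemy's native `ON CONFLICT DO UPDATE`; the table and column handling here is illustrative, not the actual store implementation:

```python
from sqlalchemy import Table
from sqlalchemy.dialects import postgresql, sqlite
from sqlalchemy.ext.asyncio import AsyncEngine


async def upsert(engine: AsyncEngine, table: Table, row: dict, key_columns: list[str]) -> None:
    """Insert the row, or update it in place if the key already exists."""
    dialect = engine.dialect.name
    insert_fn = postgresql.insert if dialect == "postgresql" else sqlite.insert
    stmt = insert_fn(table).values(**row)
    # Update every non-key column with the incoming value on conflict.
    update_cols = {c: stmt.excluded[c] for c in row if c not in key_columns}
    stmt = stmt.on_conflict_do_update(index_elements=key_columns, set_=update_cols)

    async with engine.begin() as conn:
        await conn.execute(stmt)
```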
### Test Plan
Existing tests, added a unit test for `upsert()`
Fixes issues in the storage system by guaranteeing immediate durability
for responses and ensuring background writers stay alive. Three related
fixes:
* Responses to the OpenAI-compatible API now write directly to
Postgres/SQLite inside the request instead of detouring through an async
queue that might never drain; this restores the expected
read-after-write behavior and removes the "response not found" races
reported by users.
* The access-control shim was stamping owner_principal/access_attributes
as SQL NULL, which Postgres interprets as non-public rows; fixing it to
use the empty-string/JSON-null pattern means conversations and responses
stored without an authenticated user stay queryable (matching SQLite).
* The inference-store queue remains for batching, but its worker tasks
now start lazily on the live event loop so server startup doesn't cancel
them—writes keep flowing even when the stack is launched via llama stack
run.
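A sketch of the lazy worker-start pattern from the last bullet, assuming a simple asyncio queue-based writer (class and method names are illustrative):

```python
import asyncio


class InferenceWriteQueue:
    """Batches writes; workers are created on the loop that is actually serving requests."""

    def __init__(self, num_workers: int = 2):
        self.queue: asyncio.Queue = asyncio.Queue()
        self.num_workers = num_workers
        self._workers: list[asyncio.Task] = []

    async def enqueue(self, item) -> None:
        self._ensure_workers()  # start lazily, on the live event loop
        await self.queue.put(item)

    def _ensure_workers(self) -> None:
        # Creating tasks at import/startup time can bind them to a loop that is
        # torn down before serving begins; creating them on first use avoids that.
        if not self._workers:
            self._workers = [asyncio.create_task(self._worker()) for _ in range(self.num_workers)]

    async def _worker(self) -> None:
        while True:
            item = await self.queue.get()
            await self._write(item)
            self.queue.task_done()

    async def _write(self, item) -> None:
        ...  # actual DB write
```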
Closes #4115
### Test Plan
Added a matrix entry to test our "base" suite against Postgres as the
store.
# What does this PR do?
- Updates `/vector_stores/{vector_store_id}/files/{file_id}/content` to
allow returning `embeddings` and `metadata` via the `extra_query` parameter.
- Updates the UI accordingly to display them.
- Updates the UI to support CRUD operations in the Vector Stores section and
adds a new modal exposing the functionality.
- Updates the Vector Store update operation to fail if a user tries to change
the Provider ID (which doesn't make sense to allow).
```python
In [1]: client.vector_stores.files.content(
vector_store_id=vector_store.id,
file_id=file.id,
extra_query={"include_embeddings": True, "include_metadata": True}
)
Out[1]: FileContentResponse(attributes={}, content=[Content(text='This is a test document to check if embeddings are generated properly.\n', type='text',
  embedding=[0.33760684728622437, ...,],
  chunk_metadata={'chunk_id': '62a63ae0-c202-f060-1b86-0a688995b8d3', 'document_id': 'file-27291dbc679642ac94ffac6d2810c339', 'source': None,
    'created_timestamp': 1762053437, 'updated_timestamp': 1762053437, 'chunk_window': '0-13', 'chunk_tokenizer': 'DEFAULT_TIKTOKEN_TOKENIZER',
    'chunk_embedding_model': 'sentence-transformers/nomic-ai/nomic-embed-text-v1.5', 'chunk_embedding_dimension': 768, 'content_token_count': 13,
    'metadata_token_count': 9},
  metadata={'filename': 'test-embedding.txt', 'chunk_id': '62a63ae0-c202-f060-1b86-0a688995b8d3', 'document_id': 'file-27291dbc679642ac94ffac6d2810c339',
    'token_count': 13, 'metadata_token_count': 9})],
  file_id='file-27291dbc679642ac94ffac6d2810c339', filename='test-embedding.txt')
```
Screenshots of UI are displayed below:
### List Vector Store with Added "Create New Vector Store"
<img width="1912" height="491" alt="Screenshot 2025-11-06 at 10 47
25 PM"
src="https://github.com/user-attachments/assets/a3a3ddd9-758d-4005-ac9c-5047f03916f3"
/>
### Create New Vector Store
<img width="1918" height="1048" alt="Screenshot 2025-11-06 at 10 47
49 PM"
src="https://github.com/user-attachments/assets/b4dc0d31-696f-4e68-b109-27915090f158"
/>
### Edit Vector Store
<img width="1916" height="1355" alt="Screenshot 2025-11-06 at 10 48
32 PM"
src="https://github.com/user-attachments/assets/ec879c63-4cf7-489f-bb1e-57ccc7931414"
/>
### Vector Store Files Contents page (with Embeddings)
<img width="1914" height="849" alt="Screenshot 2025-11-06 at 11 54
32 PM"
src="https://github.com/user-attachments/assets/3095520d-0e90-41f7-83bd-652f6c3fbf27"
/>
### Vector Store Files Contents Details page (with Embeddings)
<img width="1916" height="1221" alt="Screenshot 2025-11-06 at 11 55
00 PM"
src="https://github.com/user-attachments/assets/e71dbdc5-5b49-472b-a43a-5785f58d196c"
/>
## Test Plan
Tests added for Middleware extension and Provider failures.
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
The inspect API lacked any mechanism to get all
non-deprecated APIs (v1, v1alpha, v1beta).
This change makes that the default behavior.
The 'v1' filter can be used by users wanting a list
of only stable APIs.
## Test Plan
1. pull the PR
2. launch a LLS server
3. run `curl http://beanlab3.bss.redhat.com:8321/v1/inspect/routes`
4. note there are APIs for `v1`, `v1alpha`, and `v1beta` but no
deprecated APIs
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
Delete ~2,000 lines of dead code from the old bespoke inference API that
was replaced by the OpenAI-only API. This includes removing unused type
conversion functions, dead provider methods, and event_logger.py.
Clean up imports across the codebase to remove references to deleted
types. This eliminates unnecessary
code and dependencies, helping isolate the API package as a
self-contained module.
This is the last interdependency between the .api package and "exterior"
packages, meaning that now every other package in llama stack imports
the API, not the other way around.
## Test Plan
this is a structural change, no tests needed.
---------
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# Problem
The Responses API uses the max_tool_calls parameter to limit the number of
tool calls that can be generated in a response. Currently, the LLS
implementation of the Responses API does not support this parameter.
# What does this PR do?
This pull request adds the max_tool_calls field to the response object
definition and updates the inline provider. It also ensures that:
- the total number of calls to built-in and MCP tools does not exceed
max_tool_calls
- an error is thrown if max_tool_calls < 1 (behavior seen with the
OpenAI Responses API, but we can change this if needed)
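An illustrative request exercising the new field through the OpenAI-compatible client (model name and base URL are placeholders):

```python
from openai import OpenAI

# Points at a running llama stack server's OpenAI-compatible endpoint (placeholder URL).
client = OpenAI(base_url="http://localhost:8321/v1", api_key="not-needed")

response = client.responses.create(
    model="ollama/llama3.2:3b",   # placeholder model
    input="What's the weather in Tokyo and in Paris?",
    tools=[{"type": "web_search"}],
    max_tool_calls=1,             # caps built-in/MCP tool calls; values < 1 are rejected
)
print(response.output)
```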
Closes #[3563](https://github.com/llamastack/llama-stack/issues/3563)
## Test Plan
- Tested manually for changes in the model response w.r.t. the supplied
max_tool_calls field.
- Added integration tests to test invalid max_tool_calls parameter.
- Added integration tests to check max_tool_calls parameter with
built-in and function tools.
- Added integration tests to check max_tool_calls parameter in the
returned response object.
- Recorded OpenAI Responses API behavior using a sample script:
https://github.com/s-akhtar-baig/llama-stack-examples/blob/main/responses/src/max_tool_calls.py
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
# What does this PR do?
Adds OCI GenAI PaaS models for the OpenAI chat completion endpoints.
## Test Plan
In an OCI tenancy with access to GenAI PaaS, perform the following
steps:
1. Ensure you have IAM policies in place to use the service (check the docs
included in this PR)
2. For local development, [setup OCI
cli](https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliinstall.htm)
and configure the CLI with your region, tenancy, and auth
[here](https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliconfigure.htm)
3. Once configured, go through the llama-stack setup and run llama-stack
(uses config-based auth) like:
```bash
OCI_AUTH_TYPE=config_file \
OCI_CLI_PROFILE=CHICAGO \
OCI_REGION=us-chicago-1 \
OCI_COMPARTMENT_OCID=ocid1.compartment.oc1..aaaaaaaa5...5a \
llama stack run oci
```
4. Hit the `models` endpoint to list models after the server is running:
```bash
curl http://localhost:8321/v1/models | jq
...
{
"identifier": "meta.llama-4-scout-17b-16e-instruct",
"provider_resource_id": "ocid1.generativeaimodel.oc1.us-chicago-1.am...q",
"provider_id": "oci",
"type": "model",
"metadata": {
"display_name": "meta.llama-4-scout-17b-16e-instruct",
"capabilities": [
"CHAT"
],
"oci_model_id": "ocid1.generativeaimodel.oc1.us-chicago-1.a...q"
},
"model_type": "llm"
},
...
```
5. Use the "display_name" field as the model name in a
`/chat/completions` request:
```bash
# Streaming result
curl -X POST http://localhost:8321/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "meta.llama-4-scout-17b-16e-instruct",
"stream": true,
"temperature": 0.9,
"messages": [
{
"role": "system",
"content": "You are a funny comedian. You can be crass."
},
{
"role": "user",
"content": "Tell me a funny joke about programming."
}
]
}'
# Non-streaming result
curl -X POST http://localhost:8321/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "meta.llama-4-scout-17b-16e-instruct",
"stream": false,
"temperature": 0.9,
"messages": [
{
"role": "system",
"content": "You are a funny comedian. You can be crass."
},
{
"role": "user",
"content": "Tell me a funny joke about programming."
}
]
}'
```
6. Try out other models from the `/models` endpoint.
Mark all register_* / unregister_* APIs as deprecated across models,
shields, tool groups, datasets, benchmarks, and scoring functions. This
is the first step toward moving resource mutations to an `/admin`
namespace as outlined in
https://github.com/llamastack/llama-stack/issues/3809#issuecomment-3492931585.
The deprecation flag will be reflected in the OpenAPI schema to warn API
users that these endpoints are being phased out. Next step will be
implementing the `/admin` route namespace for these resource management
operations.
- `register_model` / `unregister_model`
- `register_shield` / `unregister_shield`
- `register_tool_group` / `unregister_toolgroup`
- `register_dataset` / `unregister_dataset`
- `register_benchmark` / `unregister_benchmark`
- `register_scoring_function` / `unregister_scoring_function`
Added Field(exclude=True) to mcp_authorization field to ensure tokens
are NEVER exposed in:
- API responses (model_dump())
- JSON serialization (model_dump_json())
- Logs
- Any Pydantic serialization
This prevents accidental token leakage through:
- Error messages
- Debug logs
- API response payloads
- Monitoring/telemetry systems
The field is still accessible within the application code but will be
automatically excluded from all Pydantic serialization operations.
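A small self-contained demo of the Pydantic behavior relied on here; the model below is illustrative, not the actual llama-stack class:

```python
from pydantic import BaseModel, Field


class MCPProviderData(BaseModel):
    server_url: str
    # Excluded from every serialization path: model_dump(), model_dump_json(),
    # and anything built on them (API responses, logs, telemetry payloads).
    mcp_authorization: dict[str, str] | None = Field(default=None, exclude=True)


data = MCPProviderData(
    server_url="http://mcp-server.com",
    mcp_authorization={"http://mcp-server.com": "mcp_token_xyz789"},
)
print(data.mcp_authorization)  # still usable inside application code
print(data.model_dump())       # {'server_url': 'http://mcp-server.com'} -- token never serialized
```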
# What does this PR do?
This PR fixes a bug in LlamaStack 0.3.0 where vector stores created via
the OpenAI-compatible API (`POST /v1/vector_stores`) would fail with
`VectorStoreNotFoundError` after server restart when attempting
operations like `vector_io.insert()` or `vector_io.query()`.
The bug affected **6 vector IO providers**: `pgvector`, `sqlite_vec`,
`chroma`, `milvus`, `qdrant`, and `weaviate`.
Created with the assistance of: claude-4.5-sonnet
## Root Cause
All affected providers had a broken
`_get_and_cache_vector_store_index()` method that:
1. Did not load existing vector stores from persistent storage during
initialization
2. Attempted to use `vector_store_table` (which was either `None` or a
`KVStore` without the required `get_vector_store()` method)
3. Could not reload vector stores after server restart or cache miss
## Solution
This PR implements a consistent pattern across all 6 providers:
1. **Load vector stores during initialization** - Pre-populate the cache
from KV store on startup
2. **Fix lazy loading** - Modified `_get_and_cache_vector_store_index()`
to load directly from KV store instead of relying on
`vector_store_table`
3. **Remove broken dependency** - Eliminated reliance on the
`vector_store_table` pattern
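A hedged sketch of the pattern applied across the providers; the key prefix, range-scan call, and helper names are assumptions based on the description above, not the exact provider code:

```python
# Illustrative only: key prefix, kvstore range scan, and helpers are assumptions.
OPENAI_VECTOR_STORES_PREFIX = "openai_vector_stores:"


class VectorIOAdapter:
    async def initialize(self) -> None:
        # 1. Pre-populate the cache from the KV store on startup.
        stored = await self.kvstore.values_in_range(
            OPENAI_VECTOR_STORES_PREFIX, OPENAI_VECTOR_STORES_PREFIX + "\xff"
        )
        for raw in stored:
            index = self._index_from_json(raw)
            self.cache[index.vector_store_id] = index

    async def _get_and_cache_vector_store_index(self, vector_store_id: str):
        # 2. On a cache miss (e.g. after restart), reload directly from the KV
        #    store instead of relying on the broken vector_store_table path.
        if vector_store_id in self.cache:
            return self.cache[vector_store_id]
        raw = await self.kvstore.get(OPENAI_VECTOR_STORES_PREFIX + vector_store_id)
        if raw is None:
            raise VectorStoreNotFoundError(vector_store_id)
        index = self._index_from_json(raw)
        self.cache[vector_store_id] = index
        return index
```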
## Testing steps
### 1.1 Configure the stack
Create or use an existing configuration with a vector IO provider.
**Example `run.yaml`:**
```yaml
vector_io_store:
- provider_id: pgvector
provider_type: remote::pgvector
config:
host: localhost
port: 5432
db: llamastack
user: llamastack
password: llamastack
inference:
- provider_id: sentence-transformers
provider_type: inline::sentence-transformers
config:
model: sentence-transformers/all-MiniLM-L6-v2
```
### 1.2 Start the server
```bash
llama stack run run.yaml --port 5000
```
Wait for the server to fully start. You should see:
```
INFO: Started server process
INFO: Application startup complete
```
---
## Step 2: Create a Vector Store
### 2.1 Create via API
```bash
curl -X POST http://localhost:5000/v1/vector_stores \
-H "Content-Type: application/json" \
-d '{
"name": "test-persistence-store",
"extra_body": {
"embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
"embedding_dimension": 384,
"provider_id": "pgvector"
}
}' | jq
```
### 2.2 Expected Response
```json
{
"id": "vs_a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d",
"object": "vector_store",
"name": "test-persistence-store",
"status": "completed",
"created_at": 1730304000,
"file_counts": {
"total": 0,
"completed": 0,
"in_progress": 0,
"failed": 0,
"cancelled": 0
},
"usage_bytes": 0
}
```
**Save the `id` field** (e.g.,
`vs_a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d`) — you’ll need it for the next
steps.
---
## Step 3: Insert Data (Before Restart)
### 3.1 Insert chunks into the vector store
```bash
export VS_ID="vs_a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d"
curl -X POST http://localhost:5000/vector-io/insert \
-H "Content-Type: application/json" \
-d "{
\"vector_store_id\": \"$VS_ID\",
\"chunks\": [
{
\"content\": \"Python is a high-level programming language known for its readability.\",
\"metadata\": {\"source\": \"doc1\", \"page\": 1}
},
{
\"content\": \"Machine learning enables computers to learn from data without explicit programming.\",
\"metadata\": {\"source\": \"doc2\", \"page\": 1}
},
{
\"content\": \"Neural networks are inspired by biological neurons in the brain.\",
\"metadata\": {\"source\": \"doc3\", \"page\": 1}
}
]
}"
```
### 3.2 Expected Response
Status: **200 OK**
Response: *Empty or success confirmation*
---
## Step 4: Query Data (Before Restart – Baseline)
### 4.1 Query the vector store
```bash
curl -X POST http://localhost:5000/vector-io/query \
-H "Content-Type: application/json" \
-d "{
\"vector_store_id\": \"$VS_ID\",
\"query\": \"What is machine learning?\"
}" | jq
```
### 4.2 Expected Response
```json
{
"chunks": [
{
"content": "Machine learning enables computers to learn from data without explicit programming.",
"metadata": {"source": "doc2", "page": 1}
},
{
"content": "Neural networks are inspired by biological neurons in the brain.",
"metadata": {"source": "doc3", "page": 1}
}
],
"scores": [0.85, 0.72]
}
```
**Checkpoint:** Works correctly before restart.
---
## Step 5: Restart the Server (Critical Test)
### 5.1 Stop the server
In the terminal where it’s running:
```
Ctrl + C
```
Wait for:
```
Shutting down...
```
### 5.2 Restart the server
```bash
llama stack run run.yaml --port 5000
```
Wait for:
```
INFO: Started server process
INFO: Application startup complete
```
The vector store cache is now empty, but data should persist.
---
## Step 6: Verify Vector Store Exists (After Restart)
### 6.1 List vector stores
```bash
curl http://localhost:5000/v1/vector_stores | jq
```
### 6.2 Expected Response
```json
{
"object": "list",
"data": [
{
"id": "vs_a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d",
"name": "test-persistence-store",
"status": "completed"
}
]
}
```
**Checkpoint:** Vector store should be listed.
---
## Step 7: Insert Data (After Restart – THE BUG TEST)
### 7.1 Insert new chunks
```bash
curl -X POST http://localhost:5000/vector-io/insert \
-H "Content-Type: application/json" \
-d "{
\"vector_store_id\": \"$VS_ID\",
\"chunks\": [
{
\"content\": \"This chunk was inserted AFTER the server restart.\",
\"metadata\": {\"source\": \"post-restart\", \"test\": true}
}
]
}"
```
### 7.2 Expected Results
**With Fix (Correct):**
```
Status: 200 OK
Response: Success
```
**Without Fix (Bug):**
```json
{
"detail": "VectorStoreNotFoundError: Vector Store 'vs_a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d' not found."
}
```
**Critical Test:** If insertion succeeds, the fix works.
---
## Step 8: Query Data (After Restart – Verification)
### 8.1 Query all data
```bash
curl -X POST http://localhost:5000/vector-io/query \
-H "Content-Type: application/json" \
-d "{
\"vector_store_id\": \"$VS_ID\",
\"query\": \"restart\"
}" | jq
```
### 8.2 Expected Response
```json
{
"chunks": [
{
"content": "This chunk was inserted AFTER the server restart.",
"metadata": {"source": "post-restart", "test": true}
}
],
"scores": [0.95]
}
```
**Checkpoint:** Both old and new data are queryable.
---
## Step 9: Multiple Restart Test (Extra Verification)
### 9.1 Restart again
```bash
Ctrl + C
llama stack run run.yaml --port 5000
```
### 9.2 Query after restart
```bash
curl -X POST http://localhost:5000/vector-io/query \
-H "Content-Type: application/json" \
-d "{
\"vector_store_id\": \"$VS_ID\",
\"query\": \"programming\"
}" | jq
```
**Expected:** Works correctly across multiple restarts.
---------
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
Based on user feedback, improved comments to distinguish between
the two security layers:
1. PRIMARY: Line 89 - Architectural prevention
- get_request_provider_data() only reads from request body
- Never accesses HTTP Authorization header
- This is what actually prevents inference token leakage
2. SECONDARY: Lines 97-104 - Validation prevention
- Rejects Authorization in mcp_headers dict
- Enforces using dedicated mcp_authorization field
- Prevents users from misusing the API
The previous comment was misleading: it suggested the validation prevented
inference token leakage, when the architecture already ensures that
isolation.
Adds inline documentation to help users understand:
- How to structure provider_data in HTTP requests
- Where to place mcp_headers vs mcp_authorization
- Security requirements (no Authorization in headers)
- Token format requirements (without Bearer prefix)
- Example usage with multiple MCP endpoints
Completes the TODO for extracting authorization from a dedicated field.
What changed:
- Added mcp_authorization field to MCPProviderDataValidator
- Updated get_headers_from_request() to extract from mcp_authorization
- Authorization is now properly isolated per MCP endpoint
API usage example:
{
"provider_data": {
"mcp_headers": {
"http://mcp-server.com": {
"X-Trace-ID": "trace-123"
}
},
"mcp_authorization": {
"http://mcp-server.com": "mcp_token_xyz789"
}
}
}
Security guarantees:
- Authorization cannot be in mcp_headers (validation rejects it)
- Each MCP endpoint gets its own dedicated token
- No cross-service token leakage possible
Addresses reviewer concern about token isolation between services.
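A simplified sketch of the per-endpoint extraction described above (the helper shape and request-context handling are simplified):

```python
def get_headers_from_request(provider_data: dict, mcp_server_url: str) -> dict[str, str]:
    """Build headers for one MCP endpoint from validated provider data (simplified)."""
    headers: dict[str, str] = {}

    # Non-auth headers (tracing, routing, ...) keyed by endpoint URL.
    for url, extra in (provider_data.get("mcp_headers") or {}).items():
        if url == mcp_server_url:
            headers.update(extra)

    # Authorization comes only from the dedicated per-endpoint field, so a
    # token for one MCP server can never leak to another.
    token = (provider_data.get("mcp_authorization") or {}).get(mcp_server_url)
    if token:
        headers["Authorization"] = f"Bearer {token}"  # tokens are supplied without the Bearer prefix

    return headers
```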
The remote provider now rejects Authorization headers in mcp_headers
to prevent accidentally passing inference tokens to MCP servers.
This makes the remote provider consistent with the inline provider:
- Both reject Authorization in headers dict
- Both require dedicated authorization parameter
- Prevents token leakage across service boundaries
Related changes:
- Added validation in get_headers_from_request()
- Throws ValueError if Authorization found in mcp_headers
- Added TODO for dedicated authorization field in provider_data
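A minimal sketch of the validation described above (simplified; in the provider this check lives inside get_headers_from_request()):

```python
def reject_authorization_in_headers(mcp_headers: dict[str, dict[str, str]]) -> None:
    """Refuse tokens smuggled through the generic headers dict."""
    for url, headers in (mcp_headers or {}).items():
        for key in headers:
            if key.lower() == "authorization":
                raise ValueError(
                    f"Authorization header found in mcp_headers for {url}; "
                    "use the dedicated authorization parameter instead."
                )
```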
Per reviewer feedback, validation should be in the openai_responses.py handler,
not the streaming.py file. Moved the validation logic to the
create_openai_response() method, which is the main entry point for response
creation.
- Added validation in create_openai_response() before processing
- Removed duplicate validation from _process_mcp_tool() in streaming.py
- Validation runs early and rejects malformed requests immediately
- Maintains same security check: rejects Authorization in headers dict
Per reviewer feedback, API models should be pure data structures without
business logic. Moved the Authorization header validation from the Pydantic
@model_validator in openai_responses.py to the handler in streaming.py.
- Removed @model_validator from OpenAIResponseInputToolMCP
- Added validation at handler level in _process_mcp_tool()
- Maintains same security check: rejects Authorization in headers dict
- Follows separation of concerns: models are data, handlers have logic
- Add Field(exclude=True) to authorization parameter to prevent token leakage in responses
- Add model validator to reject Authorization header in headers dict
- Users must use dedicated 'authorization' parameter instead of headers
- Headers field is preserved for legitimate non-auth headers (tracing, routing, etc.)
This implements the security requirement that authorization params are never
returned in responses, unlike generic headers which may be echoed back.
# What does this PR do?
Resolves #4102
1. Added `web_search_2025_08_26` to the `WebSearchToolTypes` list and
the `OpenAIResponseInputToolWebSearch.type` Literal union
2. No changes needed to tool execution logic - all `web_search` types
map to the same underlying tool
3. Backward compatibility is maintained - existing `web_search`,
`web_search_preview`, and `web_search_preview_2025_03_11` types continue
to work
4. Added an integration test case using {"type":
"web_search_2025_08_26"} to verify it works correctly
5. Updated `docs/docs/providers/openai_responses_limitations.mdx` to
reflect that `web_search_2025_08_26` is now supported.
6. Removed incorrect references to `MOD1/MOD2/MOD3` (which don't exist
in the codebase)
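For reference, the change boils down to extending a Literal union and the corresponding alias list, roughly as in this simplified sketch (the field layout is not the exact class definition):

```python
from typing import Literal

from pydantic import BaseModel

# All aliases map to the same underlying web-search tool.
WebSearchToolTypes = [
    "web_search",
    "web_search_preview",
    "web_search_preview_2025_03_11",
    "web_search_2025_08_26",  # newly accepted alias
]


class OpenAIResponseInputToolWebSearch(BaseModel):
    type: Literal[
        "web_search",
        "web_search_preview",
        "web_search_preview_2025_03_11",
        "web_search_2025_08_26",
    ] = "web_search"
```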
## Test Plan
---------
Signed-off-by: Aakanksha Duggal <aduggal@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
This dependency has been bothering folks for a long time (cc @leseb). We
really only needed it for the "library client", which is primarily used for
our tests and is not part of the Stack server. Anyone who needs to use the
library client can certainly install `llama-stack-client` in their
environment to make that work.
Updated the notebook references to additionally install `llama-stack-client`
when setting things up.
# What does this PR do?
Remove a circular dependency by moving tracing from the API protocol
definitions to the router implementation layer.
This gets us closer to having a self-contained API package with no
cross-cutting dependencies on other parts of the llama stack codebase.
To the best of our ability, llama_stack.api should contain only type and
protocol definitions.
Changes:
- Create apis/common/tracing.py with marker decorator (zero core
dependencies)
- Add the _new_ `@telemetry_traceable` marker decorator to 11 protocol
classes
- Apply actual tracing in core/resolver.py in `instantiate_provider`
based on protocol marker
- Move MetricResponseMixin from core to apis (it's an API response type)
- APIs package is now self-contained with zero core dependencies
The tracing functionality remains identical: the actual trace_protocol from
core is applied to router implementations at runtime when both telemetry is
enabled and the protocol has the `__marked_for_tracing__` marker.
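A minimal sketch of the marker-decorator pattern described above; the decorator only tags the protocol class, and the resolver applies real tracing later (import paths and helper names are illustrative):

```python
# apis/common/tracing.py -- zero core dependencies: the decorator only marks the class.
def telemetry_traceable(cls):
    cls.__marked_for_tracing__ = True
    return cls


# core/resolver.py (simplified): apply real tracing only when the protocol opted in.
def maybe_apply_tracing(protocol_cls, impl, telemetry_enabled: bool):
    if telemetry_enabled and getattr(protocol_cls, "__marked_for_tracing__", False):
        from llama_stack.core.telemetry import trace_protocol  # illustrative import path
        return trace_protocol(impl)
    return impl
```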
## Test Plan
Manual integration test confirms identical behavior to main branch:
```bash
llama stack list-deps --format uv starter | sh
export OLLAMA_URL=http://localhost:11434
llama stack run starter
curl -X POST http://localhost:8321/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "ollama/gpt-oss:20b",
"messages": [{"role": "user", "content": "Say hello"}],
"max_tokens": 10}'
```
Verified identical between main and this branch:
- trace_id present in response
- metrics array with prompt_tokens, completion_tokens, total_tokens
- Server logs show trace_protocol applied to all routers
Existing telemetry integration tests (tests/integration/telemetry/) validate
trace context propagation and span attributes.
relates to #3895
---------
Signed-off-by: Charlie Doern <cdoern@redhat.com>
We'd like to remove the dependence of `llama-stack` on
`llama-stack-client`. This is a necessary step.
A few small cleanups:
- Enable `embeddings` as well
- Remove the ModelRegistryHelper dependency (unused)
- Consolidate to the auth_credential field via RemoteInferenceProviderConfig
- Implement list_models() to fetch from the downstream /v1/models
## Test Plan
Tested using this script
https://gist.github.com/ashwinb/6356463d10f989c0682ab3bff8589581
Output:
```
Listing models from downstream server...
Available models: ['passthrough/ollama/nomic-embed-text:latest', 'passthrough/ollama/all-minilm:l6-v2', 'passthrough/ollama/llama3.2-vision:11b',
  'passthrough/ollama/llama3.2-vision:latest', 'passthrough/ollama/llama-guard3:1b', 'passthrough/ollama/llama3.2:1b', 'passthrough/ollama/all-minilm:latest',
  'passthrough/ollama/llama3.2:3b', 'passthrough/ollama/llama3.2:3b-instruct-fp16', 'passthrough/bedrock/meta.llama3-1-8b-instruct-v1:0',
  'passthrough/bedrock/meta.llama3-1-70b-instruct-v1:0', 'passthrough/bedrock/meta.llama3-1-405b-instruct-v1:0',
  'passthrough/sentence-transformers/nomic-ai/nomic-embed-text-v1.5']
Using LLM model: passthrough/ollama/llama3.2-vision:11b
Making inference request...
Response: 4.
--- Testing streaming ---
Streamed response: ChatCompletionChunk(id='chatcmpl-64', choices=[Choice(delta=ChoiceDelta(content='1', reasoning_content=None, refusal=None, role='assistant', tool_calls=None), finish_reason='', index=0, logprobs=None)],
  created=1762381674, model='passthrough/ollama/llama3.2-vision:11b', object='chat.completion.chunk', usage=None)
...
5ChatCompletionChunk(id='chatcmpl-64', choices=[Choice(delta=ChoiceDelta(content='', reasoning_content=None, refusal=None, role='assistant', tool_calls=None), finish_reason='stop', index=0, logprobs=None)],
  created=1762381674, model='passthrough/ollama/llama3.2-vision:11b', object='chat.completion.chunk', usage=None)
```
# What does this PR do?
- when create vector store is called without a chunking strategy, we now
record the strategy actually used so that the value is persisted instead of
strategy='None'
## Test Plan
updated tests
## What does this PR do?
The starter distribution now comes with all the required packages to
support persistent stores—like the agent store, metadata, and
inference—using PostgreSQL. Users can enable PostgreSQL support by
setting the `ENABLE_POSTGRES_STORE=1` environment variable.
This PR consolidates the functionality from the removed `postgres-demo`
distribution into the starter distribution, reducing maintenance
overhead.
**Closes: #2619**
**Supersedes: #2851** (rebased and updated)
## Changes Made
1. **Added PostgreSQL support to starter distribution**
- New `run-with-postgres-store.yaml` configuration
- Automatic config switching via `ENABLE_POSTGRES_STORE` environment
variable
- Removed separate `postgres-demo` distribution
2. **Updated to new build system**
- Integrated postgres switching logic into Containerfile entrypoint
- Uses new `storage_backends` and `storage_stores` API
- Properly configured both PostgreSQL KV store and SQL store
3. **Updated dependencies**
- Added `psycopg2-binary` and `asyncpg` to starter distribution
- All postgres-related dependencies automatically included
## How to Use
### With Docker (PostgreSQL):
```bash
docker run \
-e ENABLE_POSTGRES_STORE=1 \
-e POSTGRES_HOST=your_postgres_host \
-e POSTGRES_PORT=5432 \
-e POSTGRES_DB=llamastack \
-e POSTGRES_USER=llamastack \
-e POSTGRES_PASSWORD=llamastack \
-e OPENAI_API_KEY=your_key \
llamastack/distribution-starter
```
### PostgreSQL environment variables:
- `POSTGRES_HOST`: Postgres host (default: `localhost`)
- `POSTGRES_PORT`: Postgres port (default: `5432`)
- `POSTGRES_DB`: Postgres database name (default: `llamastack`)
- `POSTGRES_USER`: Postgres username (default: `llamastack`)
- `POSTGRES_PASSWORD`: Postgres password (default: `llamastack`)
## Test Plan
- All pre-commit hooks pass (mypy, ruff, distro-codegen)
- `llama stack list-deps starter` confirms psycopg2-binary is included
- Storage configuration correctly uses PostgreSQL backends
- Container builds successfully with postgres support
## Credits
Original work by @leseb in #2851. Rebased and updated by @r-bit-rry to
work with latest main.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Sébastien Han @leseb
---------
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>