llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-03 18:00:36 +00:00

Author	SHA1	Message	Date
Omar Abdelwahab	607e3cc05c	Merge branch 'main' into add-mcp-authentication-param	2025-11-12 14:55:23 -08:00
Omar Abdelwahab	7a823bc280	fix: remove syntax errors from test files caused by sed Fixed syntax errors in test files that were introduced by batch sed replacement: - test_tools_with_schemas.py: Removed leftover broken comments and closing brace - test_mcp_json_schema.py: Removed all instances of broken comment blocks The sed command left remnants that broke Python syntax.	2025-11-12 14:54:38 -08:00
Omar Abdelwahab	84baa5c406	feat: unify MCP authentication across Responses and Tool Runtime APIs - Add authorization parameter to Tool Runtime API signatures (list_runtime_tools, invoke_tool) - Update MCP provider implementation to use authorization from request body instead of provider-data - Deprecate mcp_authorization and mcp_headers from provider-data (MCPProviderDataValidator now empty) - Update all Tool Runtime tests to pass authorization as request body parameter - Responses API already uses request body authorization (no changes needed) This provides a single, consistent way to pass MCP authentication tokens across both APIs, addressing reviewer feedback about avoiding multiple configuration paths.	2025-11-12 14:41:00 -08:00
Ashwin Bharambe	492f79ca9b	fix: harden storage semantics (#4118 ) Fixes issues in the storage system by guaranteeing immediate durability for responses and ensuring background writers stay alive. Three related fixes: * Responses to the OpenAI-compatible API now write directly to Postgres/SQLite inside the request instead of detouring through an async queue that might never drain; this restores the expected read-after-write behavior and removes the "response not found" races reported by users. * The access-control shim was stamping owner_principal/access_attributes as SQL NULL, which Postgres interprets as non-public rows; fixing it to use the empty-string/JSON-null pattern means conversations and responses stored without an authenticated user stay queryable (matching SQLite). * The inference-store queue remains for batching, but its worker tasks now start lazily on the live event loop so server startup doesn't cancel them—writes keep flowing even when the stack is launched via llama stack run. Closes #4115 ### Test Plan Added a matrix entry to test our "base" suite against Postgres as the store.	2025-11-12 10:35:39 -08:00
Ken Dreyer	94e977c257	fix(docs): link to test replay-record docs for discoverability (#4134 ) Help users find the comprehensive integration testing docs by linking to the record-replay documentation. This clarifies that the technical README complements the main docs.	2025-11-12 10:04:56 -08:00
Francisco Arceo	eb3f9ac278	feat: allow returning embeddings and metadata from `/vector_stores/` methods; disallow changing Provider ID (#4046 ) # What does this PR do? - Updates `/vector_stores/{vector_store_id}/files/{file_id}/content` to allow returning `embeddings` and `metadata` using the `extra_query` - Updates the UI accordingly to display them. - Update UI to support CRUD operations in the Vector Stores section and adds a new modal exposing the functionality. - Updates Vector Store update to fail if a user tries to update Provider ID (which doesn't make sense to allow) ```python In [1]: client.vector_stores.files.content( vector_store_id=vector_store.id, file_id=file.id, extra_query={"include_embeddings": True, "include_metadata": True} ) Out [1]: FileContentResponse(attributes={}, content=[Content(text='This is a test document to check if embeddings are generated properly.\n', type='text', embedding=[0.33760684728622437, ...,], chunk_metadata={'chunk_id': '62a63ae0-c202-f060-1b86-0a688995b8d3', 'document_id': 'file-27291dbc679642ac94ffac6d2810c339', 'source': None, 'created_timestamp': 1762053437, 'updated_timestamp': 1762053437, 'chunk_window': '0-13', 'chunk_tokenizer': 'DEFAULT_TIKTOKEN_TOKENIZER', 'chunk_embedding_model': 'sentence-transformers/nomic-ai/nomic-embed-text-v1.5', 'chunk_embedding_dimension': 768, 'content_token_count': 13, 'metadata_token_count': 9}, metadata={'filename': 'test-embedding.txt', 'chunk_id': '62a63ae0-c202-f060-1b86-0a688995b8d3', 'document_id': 'file-27291dbc679642ac94ffac6d2810c339', 'token_count': 13, 'metadata_token_count': 9})], file_id='file-27291dbc679642ac94ffac6d2810c339', filename='test-embedding.txt') ``` UI screenshots (images omitted): List Vector Store with Added "Create New Vector Store"; Create New Vector Store; Edit Vector Store; Vector Store Files Contents page (with Embeddings); Vector Store Files Contents Details page (with Embeddings). ## Test Plan Tests added for Middleware extension and Provider failures. --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-11-12 09:59:48 -08:00
Charlie Doern	37853ca558	fix(tests): add OpenAI client connection cleanup to prevent CI hangs (#4119 ) # What does this PR do? Add explicit connection cleanup and shorter timeouts to OpenAI client fixtures. Fixes CI deadlock after 25+ tests due to connection pool exhaustion. Also adds 60s timeout to test_conversation_context_loading as safety net. ## Test Plan tests pass Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-11-12 12:17:13 -05:00
Omar Abdelwahab	5c6f713354	Merge branch 'main' into add-mcp-authentication-param	2025-11-10 15:13:45 -08:00
Shabana Baig	433438cfc0	feat: Implement the 'max_tool_calls' parameter for the Responses API (#4062 ) # Problem Responses API uses max_tool_calls parameter to limit the number of tool calls that can be generated in a response. Currently, LLS implementation of the Responses API does not support this parameter. # What does this PR do? This pull request adds the max_tool_calls field to the response object definition and updates the inline provider. It also ensures that: - the total number of calls to built-in and mcp tools does not exceed max_tool_calls - an error is thrown if max_tool_calls < 1 (behavior seen with the OpenAI Responses API, but we can change this if needed) Closes #[3563](https://github.com/llamastack/llama-stack/issues/3563) ## Test Plan - Tested manually for change in model response w.r.t supplied max_tool_calls field. - Added integration tests to test invalid max_tool_calls parameter. - Added integration tests to check max_tool_calls parameter with built-in and function tools. - Added integration tests to check max_tool_calls parameter in the returned response object. - Recorded OpenAI Responses API behavior using a sample script: https://github.com/s-akhtar-baig/llama-stack-examples/blob/main/responses/src/max_tool_calls.py Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-11-10 13:21:27 -08:00
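The two guarantees described in 433438cfc0 (a cap on the total number of tool calls, and an error when `max_tool_calls < 1`) can be sketched as below. This is a minimal illustration under stated assumptions, not the llama-stack implementation: `ToolCallBudget` and `try_consume` are hypothetical names invented for the example.

```python
from typing import Optional


class ToolCallBudget:
    """Tracks how many tool calls one response may still make (hypothetical helper)."""

    def __init__(self, max_tool_calls: Optional[int] = None) -> None:
        # Mirror the validation described in the commit: values below 1 are rejected.
        if max_tool_calls is not None and max_tool_calls < 1:
            raise ValueError("max_tool_calls must be >= 1")
        self.max_tool_calls = max_tool_calls
        self.used = 0

    def try_consume(self) -> bool:
        """Count one tool call if budget remains; return False once the cap is reached."""
        if self.max_tool_calls is not None and self.used >= self.max_tool_calls:
            return False
        self.used += 1
        return True


# Built-in and MCP tool calls would share one budget per response.
budget = ToolCallBudget(max_tool_calls=2)
outcomes = [budget.try_consume() for _ in range(3)]
```

With a cap of 2, the first two calls are admitted and the third is refused; passing `None` leaves the number of calls unbounded in this sketch.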
Omar Abdelwahab	114ab693a5	Merge branch 'main' into add-mcp-authentication-param	2025-11-10 13:19:12 -08:00
Dennis Kennetz	209a78b618	feat: add oci genai service as chat inference provider (#3876 ) # What does this PR do? Adds OCI GenAI PaaS models for openai chat completion endpoints. ## Test Plan In an OCI tenancy with access to GenAI PaaS, perform the following steps: 1. Ensure you have IAM policies in place to use service (check docs included in this PR) 2. For local development, [setup OCI cli](https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliinstall.htm) and configure the CLI with your region, tenancy, and auth [here](https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliconfigure.htm) 3. Once configured, go through llama-stack setup and run llama-stack (uses config based auth) like: ```bash OCI_AUTH_TYPE=config_file \ OCI_CLI_PROFILE=CHICAGO \ OCI_REGION=us-chicago-1 \ OCI_COMPARTMENT_OCID=ocid1.compartment.oc1..aaaaaaaa5...5a \ llama stack run oci ``` 4. Hit the `models` endpoint to list models after server is running: ```bash curl http://localhost:8321/v1/models | jq ... { "identifier": "meta.llama-4-scout-17b-16e-instruct", "provider_resource_id": "ocid1.generativeaimodel.oc1.us-chicago-1.am...q", "provider_id": "oci", "type": "model", "metadata": { "display_name": "meta.llama-4-scout-17b-16e-instruct", "capabilities": [ "CHAT" ], "oci_model_id": "ocid1.generativeaimodel.oc1.us-chicago-1.a...q" }, "model_type": "llm" }, ... ``` 5. Use the "display_name" field to use the model in a `/chat/completions` request: ```bash # Streaming result curl -X POST http://localhost:8321/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "meta.llama-4-scout-17b-16e-instruct", "stream": true, "temperature": 0.9, "messages": [ { "role": "system", "content": "You are a funny comedian. You can be crass." }, { "role": "user", "content": "Tell me a funny joke about programming." } ] }' # Non-streaming result curl -X POST http://localhost:8321/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "meta.llama-4-scout-17b-16e-instruct", "stream": false, "temperature": 0.9, "messages": [ { "role": "system", "content": "You are a funny comedian. You can be crass." }, { "role": "user", "content": "Tell me a funny joke about programming." } ] }' ``` 6. Try out other models from the `/models` endpoint.	2025-11-10 16:16:24 -05:00
ehhuang	d4ecbfd092	fix(vector store)!: fix file content API (#4105 ) # What does this PR do? - changed to match https://app.stainless.com/api/spec/documented/openai/openapi.documented.yml ## Test Plan updated test CI	2025-11-10 10:16:35 -08:00
Omar Abdelwahab	c353873774	precommit run	2025-11-07 14:54:33 -08:00
Omar Abdelwahab	0f0aa6a6c5	fix: correct import path for LlamaStackAsLibraryClient in test Fixed incorrect import in test_mcp_authentication.py: - Changed: from llama_stack import LlamaStackAsLibraryClient - To: from llama_stack.core.library_client import LlamaStackAsLibraryClient This aligns with the correct import pattern used in other test files.	2025-11-07 14:49:27 -08:00
Omar Abdelwahab	735831206d	fix: update tests to use new mcp_authorization field Updates integration tests to use the new mcp_authorization field instead of the old method of passing Authorization in mcp_headers. Changes: - tests/integration/tool_runtime/test_mcp.py - tests/integration/inference/test_tools_with_schemas.py - tests/integration/tool_runtime/test_mcp_json_schema.py (6 occurrences) All tests now use: provider_data = {"mcp_authorization": {uri: AUTH_TOKEN}} Instead of the old rejected format: provider_data = {"mcp_headers": {uri: {"Authorization": f"Bearer {AUTH_TOKEN}"}}} This aligns with the security architecture that prevents accidentally leaking inference tokens to MCP servers.	2025-11-07 14:46:30 -08:00
Omar Abdelwahab	1a7ba683e3	Merge branch 'main' into add-mcp-authentication-param	2025-11-07 14:26:06 -08:00
Omar Abdelwahab	ccb870c8fb	precommit	2025-11-07 12:14:42 -08:00
Omar Abdelwahab	8ce30b71f4	test: update error message match for authorization validation Updated test_mcp_authorization_error_when_header_provided to match the new validation error message from the Pydantic validator.	2025-11-07 10:52:40 -08:00
Ashwin Bharambe	aa2bd82b1d	fix(ci): add recordings for responses suite due to web search type changing (#4104 ) #4103 broke (even though the PR itself was green) trunk	2025-11-07 10:42:07 -08:00
Aakanksha Duggal	b83184f7ef	feat(responses)!: Add web_search_2025_08_26 to the WebSearchToolTypes (#4103 ) # What does this PR do? Resolves #4102 1. Added `web_search_2025_08_26` to the `WebSearchToolTypes` list and the `OpenAIResponseInputToolWebSearch.type` Literal union 2. No changes needed to tool execution logic - all `web_search` types map to the same underlying tool 3. Backward compatibility is maintained - existing `web_search`, `web_search_preview`, and `web_search_preview_2025_03_11` types continue to work 4. Added an integration test case using {"type": "web_search_2025_08_26"} to verify it works correctly 5. Updated `docs/docs/providers/openai_responses_limitations.mdx` to reflect that `web_search_2025_08_26` is now supported. 6. Removed incorrect references to `MOD1/MOD2/MOD3` (which don't exist in the codebase) --------- Signed-off-by: Aakanksha Duggal <aduggal@redhat.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-11-07 10:01:12 -08:00
Ashwin Bharambe	f49cb0b717	chore: Stack server no longer depends on llama-stack-client (#4094 ) This dependency has been bothering folks for a long time (cc @leseb). We really needed it due to "library client" which is primarily used for our tests and is not a part of the Stack server. Anyone who needs to use the library client can certainly install `llama-stack-client` in their environment to make that work. Updated the notebook references to install `llama-stack-client` additionally when setting things up.	2025-11-07 09:54:09 -08:00
Ashwin Bharambe	b68a25d377	fix(tests): bring back some responses tests (#4098 ) https://github.com/llamastack/llama-stack/pull/4055 cleaned the agents implementation but while doing so it removed some tests which actually corresponded to the responses implementation. This PR brings those tests and associated recordings back. (We should likely combine all responses tests into one suite, but that is beyond the scope of this PR.)	2025-11-07 07:49:38 +01:00
Omar Abdelwahab	d08c529ac0	formatting issues	2025-11-06 12:43:24 -08:00
Omar Abdelwahab	5ce48d2c6a	precommit	2025-11-06 12:02:45 -08:00
Omar Abdelwahab	ac9442eb92	fix: update test_mcp to use authorization parameter instead of headers Changed tool_defs in test_mcp_invocation to use 'authorization' parameter instead of passing Authorization via headers dict for security compliance.	2025-11-06 11:46:45 -08:00
Omar Abdelwahab	dbe41d9510	Updated a single test case to not include authorization field in the header	2025-11-06 11:08:27 -08:00
Omar Abdelwahab	d58da03e40	fix: update test to use authorization parameter instead of headers For security reasons, reject Authorization header in headers dict and require use of the dedicated authorization parameter instead.	2025-11-06 11:07:21 -08:00
Omar Abdelwahab	18aff1abaa	rejecting headers that include Authorization in the header and pointing them to the authorization param.	2025-11-06 10:59:45 -08:00
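The validation described in the commits above (18aff1abaa through ac9442eb92) rejects an `Authorization` entry inside the headers dict and directs callers to a dedicated authorization parameter. A minimal sketch follows; `build_mcp_headers` is a hypothetical helper, and adding the `Bearer ` prefix to a bare token is an assumption made for the example, not confirmed behavior of the MCP provider.

```python
from typing import Optional


def build_mcp_headers(
    headers: Optional[dict[str, str]] = None,
    authorization: Optional[str] = None,
) -> dict[str, str]:
    """Merge caller headers with a dedicated token parameter (hypothetical helper)."""
    merged = dict(headers or {})
    # HTTP header names are case-insensitive, so compare in lowercase.
    for name in merged:
        if name.lower() == "authorization":
            raise ValueError(
                "Authorization must not be passed in headers; "
                "use the 'authorization' parameter instead"
            )
    if authorization:
        # Assumption: the parameter carries just the token and the prefix is added here.
        merged["Authorization"] = f"Bearer {authorization}"
    return merged


ok = build_mcp_headers({"X-Trace-Id": "abc"}, authorization="my-token")
```

Keeping the token in its own parameter means a generic headers dict can be forwarded without accidentally carrying credentials meant for another service.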
Derek Higgins	dc9497a3b2	ci: Temporarily disable Telemetry during tests (#4090 ) Closes: #4089 Signed-off-by: Derek Higgins <derekh@redhat.com>	2025-11-06 17:53:02 +01:00
Derek Higgins	03d23db910	ci: vllm ci job update (#4088 ) Add missing recording for vllm in library mode Add Docker env (missed during rebase) Signed-off-by: Derek Higgins <derekh@redhat.com>	2025-11-06 16:59:55 +01:00
Derek Higgins	c62a09ab76	ci: Add vLLM support to integration testing infrastructure (with qwen) (#3545 ) Introduces vLLM provider support to the record/replay testing framework, enabling both recording and replay of vLLM API interactions alongside existing Ollama support. The changes enable testing of vLLM functionality. vLLM tests focus on inference capabilities, while Ollama continues to exercise the full API surface including vision features. -- This is an alternative to #3128 ; qwen3 appears to be more capable than llama 3.2 1B at structured output and tool calls. --------- Signed-off-by: Derek Higgins <derekh@redhat.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-11-06 10:36:40 +01:00
ehhuang	b335419faa	fix: actualize chunking strategy in vector store create API (#4086 ) # What does this PR do? - when create vector store is called without a chunking strategy, we actualize the strategy used so that the value is persisted instead of strategy='None' ## Test Plan updated tests	2025-11-05 15:47:54 -08:00
ehhuang	9d5c34af27	fix!: BREAKING CHANGE: vector_store: search API response fix (#4080 ) # What does this PR do? - search_query in the vector store search API should be a list, according to https://github.com/openai/openai-openapi ## Test Plan modified tests --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/4080). * #4086 * __->__ #4080	2025-11-05 15:01:48 -08:00
Omar Abdelwahab	411b18a90f	Merge branch 'main' into add-mcp-authentication-param	2025-11-05 14:12:32 -08:00
ehhuang	84a84ee85c	fix: last_id when listing files in vector store (#4079 ) # What does this PR do? the last_id should be the id of the last item in the returned list, not the unfiltered list. ## Test Plan fixed test	2025-11-05 14:10:10 -08:00
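The cursor rule fixed in 84a84ee85c can be sketched as follows: `last_id` is taken from the page that is actually returned, after filtering. This is an illustration only, not the llama-stack code; `list_page`, the dict-shaped file records, and the `status` filter are hypothetical names chosen for the example.

```python
from typing import Any, Optional


def list_page(
    files: list[dict[str, Any]],
    limit: int,
    status: Optional[str] = None,
) -> dict[str, Any]:
    """Return one page of files plus cursor fields (hypothetical helper)."""
    # Filter first, so the cursors describe what the caller actually receives.
    matching = [f for f in files if status is None or f["status"] == status]
    page = matching[:limit]
    return {
        "data": page,
        "first_id": page[0]["id"] if page else None,
        # last_id comes from the returned page, not from the unfiltered input list.
        "last_id": page[-1]["id"] if page else None,
        "has_more": len(matching) > limit,
    }


files = [
    {"id": "file-1", "status": "completed"},
    {"id": "file-2", "status": "failed"},
    {"id": "file-3", "status": "completed"},
    {"id": "file-4", "status": "failed"},
]
result = list_page(files, limit=2, status="completed")
```

Here the unfiltered list ends at `file-4`, but the returned page ends at `file-3`, which is the value a client would need as its next cursor.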
Omar Abdelwahab	dcb3dc4211	raising an error when the authentication field is present in the authorization field and in the header	2025-11-05 11:41:02 -08:00
Omar Abdelwahab	09ef0b38c1	Updated the authentication field to take just the token	2025-11-05 10:49:35 -08:00
Emilio Garcia	ba50790a28	feat(tests): metrics tests (#3966 ) # What does this PR do? 1. Make telemetry tests as easy as possible for users by expanding the `SpanStub` data class and creating the `MetricStub` dataclass as a way to consistently marshal telemetry data in test fixtures and unmarshal and handle it in tests. 2. Structure server and client tests to always follow the same standards for consistent testing experience by using the `SpanStub` and `MetricStub` data class objects. 3. Enable Metrics Testing for completions endpoint 4. Correct token metrics to use histograms instead of counts to capture tokens per request rather than a cumulative count of tokens over the lifecycle of the server. ## Test Plan These are tests	2025-11-05 10:26:15 -08:00
Ashwin Bharambe	4d3069bfa5	chore(ci): remove unused recordings (#4074 ) Added a script to cleanup recordings. While doing this, moved the CI matrix generation to a separate script so there is a single source of truth for the matrix. Ran the cleanup script as: ``` PYTHONPATH=. python scripts/cleanup_recordings.py ``` Also added this as part of the pre-commit workflow to ensure that the recordings are always up to date and that no stale recordings are left in the repo.	2025-11-05 09:21:58 -08:00
Omar Abdelwahab	8632c705aa	Merge branch 'main' into add-mcp-authentication-param	2025-11-04 16:20:38 -08:00
Omar Abdelwahab	5c5f6f7e65	updated the test script	2025-11-04 15:36:09 -08:00
Ashwin Bharambe	a8a8aa56c0	chore!: remove the agents (sessions and turns) API (#4055 ) - Removes the deprecated agents (sessions and turns) API that was marked alpha in 0.3.0 - Cleans up unused imports and orphaned types after the API removal - Removes `SessionNotFoundError` and `AgentTurnInputType` which are no longer needed The agents API is completely superseded by the Responses + Conversations APIs, and the client SDK Agent class already uses those implementations. Corresponding client-side PR: https://github.com/llamastack/llama-stack-client-python/pull/295	2025-11-04 09:38:39 -08:00
Ashwin Bharambe	cb40da210f	fix: update tests for OpenAI-style models endpoint (#4053 ) The llama-stack-client now uses `/v1/openai/v1/models` which returns OpenAI-compatible model objects with 'id' and 'custom_metadata' fields instead of the Resource-style 'identifier' field. Updated api_recorder to handle the new endpoint and modified tests to access model metadata appropriately. Deleted stale model recordings for re-recording. NOTE: CI will be red on this one since it is dependent on https://github.com/llamastack/llama-stack-client-python/pull/291/files landing. I verified locally that it is green.	2025-11-03 17:30:08 -08:00
Omar Abdelwahab	1143db0f64	added a fix	2025-11-03 16:55:13 -08:00
Omar Abdelwahab	c49fef8087	precommit	2025-11-03 16:12:38 -08:00
Omar Abdelwahab	57eb575ea1	Added minor changes	2025-11-03 15:57:45 -08:00
Omar Abdelwahab	d0a8878337	MCP authentication parameter implementation	2025-11-03 15:48:56 -08:00
Derek Higgins	1562277cfd	ci: test adjustments for Qwen3-0.6B (#3978 ) Without this hint Qwen3-0.6B tends to reply with the full name and sometimes doesn't reply with the correct drafted year. --------- Signed-off-by: Derek Higgins <derekh@redhat.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-11-03 12:19:35 -08:00
raghotham	62603d25c2	chore(api)!: /v1/inspect only lists v1 apis by default (#3948 ) # What does this PR do? Allow filtering for v1alpha, v1beta, deprecated and v1. Backward incompatible change since by default it only returns v1 apis now. ## Test Plan added unit test	2025-10-31 11:55:46 -07:00
Jiayi Ni	fa7699d2c3	feat: Add rerank API for NVIDIA Inference Provider (#3329 ) # What does this PR do? Add rerank API for NVIDIA Inference Provider. Closes #3278 ## Test Plan Unit test: ``` pytest tests/unit/providers/nvidia/test_rerank_inference.py ``` Integration test: ``` pytest -s -v tests/integration/inference/test_rerank.py --stack-config="inference=nvidia" --rerank-model=nvidia/nvidia/nv-rerankqa-mistral-4b-v3 --env NVIDIA_API_KEY="" --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com" ```	2025-10-30 21:42:09 -07:00


361 commits