Commit graph

3166 commits

Author SHA1 Message Date
ehhuang
b335419faa
fix: actualize chunking strategy in vector store create API (#4086)
# What does this PR do?

- when create vector store is called without a chunking strategy, we now persist the strategy actually used instead of `strategy='None'` (sketched below)
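
A minimal sketch of the idea (the default and the helper names are hypothetical):

```python
# Resolve the effective chunking strategy before persisting, so a
# read-back never reports strategy=None.
DEFAULT_CHUNKING_STRATEGY = {"type": "auto"}  # assumed default

def create_vector_store(params: dict, store: dict[str, dict]) -> dict:
    strategy = params.get("chunking_strategy") or DEFAULT_CHUNKING_STRATEGY
    record = {**params, "chunking_strategy": strategy}
    store[record.get("id", "vs_1")] = record  # stand-in for real persistence
    return record
```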

## Test Plan
updated tests
2025-11-05 15:47:54 -08:00
Roy Belio
c672a5d792
feat: ability to use postgres as store for starter distro (#4076)
## What does this PR do?

The starter distribution now comes with all the required packages to
support persistent stores—like the agent store, metadata, and
inference—using PostgreSQL. Users can enable PostgreSQL support by
setting the `ENABLE_POSTGRES_STORE=1` environment variable.

This PR consolidates the functionality from the removed `postgres-demo`
distribution into the starter distribution, reducing maintenance
overhead.

**Closes: #2619**  
**Supersedes: #2851** (rebased and updated)

## Changes Made

1. **Added PostgreSQL support to starter distribution**
   - New `run-with-postgres-store.yaml` configuration
   - Automatic config switching via `ENABLE_POSTGRES_STORE` environment variable
   - Removed separate `postgres-demo` distribution

2. **Updated to new build system**
   - Integrated postgres switching logic into Containerfile entrypoint
   - Uses new `storage_backends` and `storage_stores` API
   - Properly configured both PostgreSQL KV store and SQL store

3. **Updated dependencies**
   - Added `psycopg2-binary` and `asyncpg` to starter distribution
   - All postgres-related dependencies automatically included

## How to Use

### With Docker (PostgreSQL):
```bash
docker run \
  -e ENABLE_POSTGRES_STORE=1 \
  -e POSTGRES_HOST=your_postgres_host \
  -e POSTGRES_PORT=5432 \
  -e POSTGRES_DB=llamastack \
  -e POSTGRES_USER=llamastack \
  -e POSTGRES_PASSWORD=llamastack \
  -e OPENAI_API_KEY=your_key \
  llamastack/distribution-starter
```

### PostgreSQL environment variables:
- `POSTGRES_HOST`: Postgres host (default: `localhost`)
- `POSTGRES_PORT`: Postgres port (default: `5432`)
- `POSTGRES_DB`: Postgres database name (default: `llamastack`)
- `POSTGRES_USER`: Postgres username (default: `llamastack`)
- `POSTGRES_PASSWORD`: Postgres password (default: `llamastack`)
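
For local testing, a matching PostgreSQL instance can be started with the official image (the variables below are the standard `postgres` image settings):

```bash
docker run -d --name llamastack-pg \
  -e POSTGRES_DB=llamastack \
  -e POSTGRES_USER=llamastack \
  -e POSTGRES_PASSWORD=llamastack \
  -p 5432:5432 \
  postgres:16
```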

## Test Plan

- All pre-commit hooks pass (mypy, ruff, distro-codegen)
- `llama stack list-deps starter` confirms psycopg2-binary is included
- Storage configuration correctly uses PostgreSQL backends
- Container builds successfully with postgres support

## Credits

Original work by @leseb in #2851. Rebased and updated by @r-bit-rry to
work with latest main.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Sébastien Han @leseb

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
2025-11-05 15:37:06 -08:00
ehhuang
9d5c34af27
fix!: BREAKING CHANGE: vector_store: search API response fix (#4080)
# What does this PR do?
- search_query in the vector store search API should be a list,
according to https://github.com/openai/openai-openapi
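
In sketch form, assuming a Pydantic response model (the model name is illustrative):

```python
from pydantic import BaseModel

class VectorStoreSearchResponsePage(BaseModel):
    search_query: list[str]  # previously a plain str
    data: list[dict] = []
```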


## Test Plan
modified tests


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/4080).
* #4086
* __->__ #4080
2025-11-05 15:01:48 -08:00
Omar Abdelwahab
411b18a90f
Merge branch 'main' into add-mcp-authentication-param 2025-11-05 14:12:32 -08:00
ehhuang
84a84ee85c
fix: last_id when listing files in vector store (#4079)
# What does this PR do?
the `last_id` should be the id of the last item in the returned list, not of the unfiltered list.
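
A sketch of the distinction, with hypothetical helper names; the pagination markers must be computed from the page that is actually returned:

```python
def list_files_page(all_items: list[dict], after: str | None, limit: int) -> dict:
    # Apply the cursor and limit first; first_id/last_id come from this
    # filtered page, not from the unfiltered list.
    start = 0
    if after is not None:
        start = [item["id"] for item in all_items].index(after) + 1
    page = all_items[start : start + limit]
    return {
        "data": page,
        "first_id": page[0]["id"] if page else None,
        "last_id": page[-1]["id"] if page else None,  # the fix
        "has_more": start + limit < len(all_items),
    }
```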

## Test Plan
fixed test
2025-11-05 14:10:10 -08:00
Omar Abdelwahab
7db4ed7bbb fix: update MCP tool runtime provider to use new function signatures
Updated list_mcp_tools and invoke_mcp_tool calls to use named parameters
instead of positional arguments to match the refactored API signatures.
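
For illustration only (the real signatures live in the refactored provider), the call-site change looks like:

```python
# Hypothetical signatures; keyword-only to mirror the refactored API.
async def list_mcp_tools(*, endpoint: str, headers: dict[str, str]) -> list[str]:
    return []  # stand-in for the real MCP call

async def example(endpoint: str, headers: dict[str, str]) -> None:
    # Before: list_mcp_tools(endpoint, headers) depended on argument order.
    # After: named parameters survive signature refactors.
    tools = await list_mcp_tools(endpoint=endpoint, headers=headers)
    assert tools == []
```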
2025-11-05 13:21:12 -08:00
Omar Abdelwahab
76fdff4a85 created a single helper function and updated list_mcp_tools and invoke_mcp_tool. Removed the comments in openai_responses.py 2025-11-05 13:12:28 -08:00
Ashwin Bharambe
d9cf5cd480
fix(ci): use --no-cache instead of --no-cache-dir (#4081)
This is necessary to make sure GPU dockers can be built on CI without
running out of space.
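
Assuming the install step in the Containerfile uses `uv`, the change amounts to this (sketch):

```bash
# Before: pip's flag name, e.g. `uv pip install --no-cache-dir ...`
# After: uv's own flag; skipping the cache keeps large GPU images from
# exhausting CI disk space.
uv pip install --no-cache torch
```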
2025-11-05 12:14:02 -08:00
Omar Abdelwahab
a605cc2e14 formatting 2025-11-05 11:45:01 -08:00
Omar Abdelwahab
dcb3dc4211 raising an error when the authentication token is supplied both in the authorization field and in the header 2025-11-05 11:41:02 -08:00
Charlie Doern
c899b50723
fix: print help for list-deps if no args (#4078)
# What does this PR do?

list-deps takes positional args OR flags like `--providers`.

The issue with this is that these args need to be optional, since by nature one or the other can be specified.

Add a check to list-deps that tests `if not args.providers and not args.config`. If this is true, help is printed and we exit.

resolves #4075
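
A sketch of the guard, with the argparse wiring reduced to essentials:

```python
import argparse
import sys

parser = argparse.ArgumentParser(prog="llama stack list-deps")
parser.add_argument("config", nargs="?", default=None)
parser.add_argument("--providers", default=None)
args = parser.parse_args()

# Neither a config/distro nor --providers was given: print help and exit
# instead of falling through to the UnboundLocalError shown below.
if not args.providers and not args.config:
    parser.print_help()
    sys.exit(2)
```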

## Test Plan
before:

```
╰─ llama stack list-deps
Traceback (most recent call last):
  File "/Users/charliedoern/projects/Documents/llama-stack/venv/bin/llama", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/llama.py", line 52, in main
    parser.run(args)
  File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/llama.py", line 43, in run
    args.func(args)
  File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/stack/list_deps.py", line 51, in _run_stack_list_deps_command
    return run_stack_list_deps_command(args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/stack/_list_deps.py", line 135, in run_stack_list_deps_command
    normal_deps, special_deps, external_provider_dependencies = get_provider_dependencies(build_config)
                                                                                          ^^^^^^^^^^^^
UnboundLocalError: cannot access local variable 'build_config' where it is not associated with a value

```

after:

```
╰─ llama stack list-deps
usage: llama stack list-deps [-h] [--providers PROVIDERS] [--format {uv,deps-only}] [config | distro]

list the dependencies for a llama stack distribution

positional arguments:
  config | distro       Path to config file to use or name of known distro (llama stack list for a list). (default: None)

options:
  -h, --help            show this help message and exit
  --providers PROVIDERS
                        sync dependencies for a list of providers and only those providers. This list is formatted like: api1=provider1,api2=provider2. Where there can be multiple
                        providers per API. (default: None)
  --format {uv,deps-only}
                        Output format: 'uv' shows shell commands, 'deps-only' shows just the list of dependencies without `uv` (default) (default: deps-only)
```

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-11-05 11:34:08 -08:00
Omar Abdelwahab
b8c24198eb precommit 2025-11-05 11:16:11 -08:00
Omar Abdelwahab
09ef0b38c1 Updated the authentication field to take just the token 2025-11-05 10:49:35 -08:00
Wojciech-Rebisz
07c28cd519
fix: Avoid model_limits KeyError (#4060)
# What does this PR do?
It avoids a `model_limits` KeyError while trying to get embedding models for Watsonx.

Closes https://github.com/llamastack/llama-stack/issues/4059
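
A guarded lookup along these lines avoids the KeyError (field names are hypothetical):

```python
def is_embedding_model(model_spec: dict) -> bool:
    # Indexing model_spec["model_limits"] raised KeyError for watsonx specs
    # without that entry; .get() with a default keeps model discovery going.
    limits = model_spec.get("model_limits") or {}
    return "embedding_dimension" in limits  # illustrative criterion
```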

## Test Plan
Start server with watsonx distro:
```bash
llama stack list-deps watsonx | xargs -L1 uv pip install
uv run llama stack run watsonx
```
Run 
```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url=base_url)
client.models.list()
```
Check if there is any embedding model available (currently there is not
a single one)
2025-11-05 10:34:40 -08:00
Emilio Garcia
ba50790a28
feat(tests): metrics tests (#3966)
# What does this PR do?
1. Make telemetry tests as easy as possible for users by expanding the
`SpanStub` data class and creating the `MetricStub` dataclass as a way
to consistently marshal telemetry data in test fixtures and unmarshal
and handle it in tests.
2. Structure server and client tests to always follow the same standards for a consistent testing experience by using the `SpanStub` and `MetricStub` data class objects.
3. Enable Metrics Testing for completions endpoint
4. Correct token metrics to use histograms instead of counters, capturing tokens per request rather than a cumulative count of tokens over the lifecycle of the server (see the sketch below).
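
The counter-vs-histogram distinction, sketched against the OpenTelemetry Python API:

```python
from opentelemetry import metrics

meter = metrics.get_meter("llama_stack.inference")

# A histogram records one value per request, preserving the per-request
# distribution (p50/p99 tokens and so on).
tokens_per_request = meter.create_histogram(
    name="completion_tokens",
    unit="{token}",
    description="Tokens generated per completion request",
)
tokens_per_request.record(128, attributes={"model": "example-model"})

# A counter, by contrast, only yields a cumulative total over the
# lifetime of the server:
#   meter.create_counter("completion_tokens_total").add(128)
```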

## Test Plan
These are tests
2025-11-05 10:26:15 -08:00
Roy Belio
2619f3552e
fix: show built-in distributions in llama stack list (#4040)
# What does this PR do?
Fixes issue #3922 where `llama stack list` only showed distributions
after they were run. This PR makes the command show all available
distributions immediately on a fresh install.

Closes #3922

## Changes
- **Updated `_get_distribution_dirs()`** to discover both built-in and built distributions (see the sketch after this list):
  - Built-in distributions from `src/llama_stack/distributions/` (e.g., starter, nvidia, dell)
  - Built distributions from `~/.llama/distributions`
- **Added a "Source" column** to distinguish between "built-in" and "built" distributions
- **Built distributions override built-in ones** with the same name (expected behavior)
- **Updated config file detection logic** to handle both naming conventions:
  - Built-in: `build.yaml` and `run.yaml`
  - Built: `{name}-build.yaml` and `{name}-run.yaml`
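
A hypothetical sketch of the discovery logic described above; the override, prefix-stripping, and hidden-directory behaviors match the tests listed in the Test Plan:

```python
from pathlib import Path

def _get_distribution_dirs() -> dict[str, Path]:
    dirs: dict[str, Path] = {}
    builtin_root = Path(__file__).parent / "distributions"  # ships with the package
    built_root = Path.home() / ".llama" / "distributions"   # created by builds/runs
    for root in (builtin_root, built_root):  # later entries override earlier ones
        if not root.exists():
            continue
        for d in sorted(root.iterdir()):
            if d.is_dir() and not d.name.startswith("."):
                dirs[d.name.removeprefix("llamastack-")] = d
    return dirs
```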

## Test Plan
### Unit Tests
Added comprehensive unit tests in
`tests/unit/distribution/test_stack_list.py`:
```bash
uv run pytest tests/unit/distribution/test_stack_list.py -v
```
**Result**: All 8 tests pass
- `test_builtin_distros_shown_without_running` - Verifies the core fix
for issue #3922
- `test_builtin_and_built_distros_shown_together` - Ensures both types
are shown
- `test_built_distribution_overrides_builtin` - Tests override behavior
- `test_empty_distributions` - Edge case handling
- `test_config_files_detection_builtin` - Config file detection for
built-in distros
- `test_config_files_detection_built` - Config file detection for built
distros
- `test_llamastack_prefix_stripped` - Name normalization
- `test_hidden_directories_ignored` - Filters hidden directories

### Manual Testing
**Before the fix** (simulated with empty `~/.llama/distributions`):
```bash
$ llama stack list
No stacks found in ~/.llama/distributions
```

**After the fix**:
```bash
$ llama stack list
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Stack Name        ┃ Source   ┃ Path              ┃ Build Config ┃ Run Config ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ ci-tests          │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ dell              │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ meta-reference-g… │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ nvidia            │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ open-benchmark    │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ postgres-demo     │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ starter           │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ starter-gpu       │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
│ watsonx           │ built-in │ /path/to/src/...  │ Yes          │ Yes        │
└───────────────────┴──────────┴───────────────────┴──────────────┴────────────┘
```

**After running a distribution**:
```bash
$ llama stack run starter  # Creates ~/.llama/distributions/starter
$ llama stack list
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Stack Name        ┃ Source   ┃ Path              ┃ Build Config ┃ Run Config ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ ...               │ built-in │ ...               │ Yes          │ Yes        │
│ starter           │ built    │ ~/.llama/distri…  │ No           │ No         │
│ ...               │ built-in │ ...               │ Yes          │ Yes        │
└───────────────────┴──────────┴───────────────────┴──────────────┴────────────┘
```
Note how `starter` now shows as "built" and points to
`~/.llama/distributions`, overriding the built-in version.

## Breaking Changes
**No breaking changes** - This is a bug fix that improves user
experience with minimal risk:
- No programmatic parsing of output found in the codebase
- Table format is clearly for human consumption
- The new "Source" column helps users understand where distributions
come from
- The behavior change is exactly what users expect (seeing all available
distributions)

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-11-05 10:16:28 -08:00
Ashwin Bharambe
4d3069bfa5
chore(ci): remove unused recordings (#4074)
Added a script to clean up recordings. While doing this, moved the CI matrix generation to a separate script so there is a single source of truth for the matrix.

Ran the cleanup script as:
```
PYTHONPATH=. python scripts/cleanup_recordings.py
```

Also added this as part of the pre-commit workflow to ensure that the
recordings are always up to date and that no stale recordings are left
in the repo.
2025-11-05 09:21:58 -08:00
Sébastien Han
fd1603beef
chore: remove unused classes (#4077)
# What does this PR do?

These were maybe included in the webmethod at some point? The unit test was pointless too, since the request was never used anywhere.

This shouldn't be in the API definition if we never consume it.

## Test Plan

CI with pre-commit on OpenAPI spec generation.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-11-05 16:45:23 +01:00
Omar Abdelwahab
8632c705aa
Merge branch 'main' into add-mcp-authentication-param 2025-11-04 16:20:38 -08:00
Ashwin Bharambe
392e01dc79 chore: add stainless config
Name it to indicate it is not yet the source of truth, to avoid confusion.
2025-11-04 15:44:07 -08:00
Omar Abdelwahab
5c5f6f7e65 updated the test script 2025-11-04 15:36:09 -08:00
ehhuang
95b0493fae
chore: move src/llama_stack/ui to src/llama_stack_ui (#4068)
# What does this PR do?
This better separates UI from backend code, which was often a point of confusion for our beloved AI friends.


## Test Plan
CI
2025-11-04 15:21:49 -08:00
Ashwin Bharambe
5850e3473f fix: remove straggler openapi HTML file 2025-11-04 14:54:33 -08:00
Ashwin Bharambe
0c49a53c97
chore(api)!: remove tool_runtime.rag_tool from the API surface (#4067)
RAG aka file search is implemented via the Responses API by specifying
the file-search tool. The backend implementation remains unchanged. This
PR merely removes the directly exposed API surface which allowed users
to directly perform searches from the client.

This facility is now available via the OpenAI-compatible `client.vector_store.search()` API.
2025-11-04 14:50:54 -08:00
Omar Abdelwahab
c911e9a3c1 minor formatting change 2025-11-04 13:19:39 -08:00
Omar Abdelwahab
6bd0d644d1 reverting some formatting 2025-11-04 13:18:28 -08:00
Omar Abdelwahab
a23ee35b24 reverting some formatting changes 2025-11-04 13:10:46 -08:00
Omar Abdelwahab
59793ac63b minor linting change 2025-11-04 12:51:19 -08:00
Omar Abdelwahab
1db14ca4a3 removed _convert_authorization_to_headers 2025-11-04 12:46:52 -08:00
Omar Abdelwahab
abc717ed1d reverted some formatting changes 2025-11-04 12:39:48 -08:00
Omar Abdelwahab
fec6f20792 reverted some formatting changes 2025-11-04 11:56:32 -08:00
Omar Abdelwahab
0487496ce1 precommit 2025-11-04 11:54:25 -08:00
Omar Abdelwahab
d2103eb868 precommit 2025-11-04 11:29:40 -08:00
Ashwin Bharambe
a8a8aa56c0
chore!: remove the agents (sessions and turns) API (#4055)
- Removes the deprecated agents (sessions and turns) API that was marked
alpha in 0.3.0
- Cleans up unused imports and orphaned types after the API removal
- Removes `SessionNotFoundError` and `AgentTurnInputType` which are no
longer needed

The agents API is completely superseded by the Responses + Conversations
APIs, and the client SDK Agent class already uses those implementations.

Corresponding client-side PR:
https://github.com/llamastack/llama-stack-client-python/pull/295
2025-11-04 09:38:39 -08:00
Mustafa Elbehery
a6ddbae0ed
chore(test): migrate unit tests from unittest to pytest nvidia test eval (#3249)
# What does this PR do?
This PR migrates `unittest` to `pytest` in
`tests/unit/providers/nvidia/test_eval.py`.

Part of https://github.com/llamastack/llama-stack/issues/2680

Supersedes https://github.com/llamastack/llama-stack/pull/2791
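
The migration pattern, in sketch form (`make_client` is a hypothetical stand-in for the NVIDIA eval setup):

```python
import unittest
import pytest

def make_client():
    return object()  # hypothetical stand-in

# Before: unittest style (class, setUp, self.assert*)
class TestEvalUnittest(unittest.TestCase):
    def setUp(self):
        self.client = make_client()

    def test_client_exists(self):
        self.assertIsNotNone(self.client)

# After: pytest style (fixture injection and plain asserts)
@pytest.fixture
def client():
    return make_client()

def test_client_exists(client):
    assert client is not None
```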

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-11-04 10:29:07 +01:00
Omar Abdelwahab
9dbeeaca97 Removed the MCPAuthorization class relying on bearer token 2025-11-03 19:57:58 -08:00
Ashwin Bharambe
053fc0ac39
chore!: remove all deprecated routes (including /openai/v1/ ones) (#4054)
This PR removes all routes which we had marked deprecated for the 0.3.0
release.

This includes:
- all the `/v1/openai/v1/` routes (the corresponding /v1 routes still
exist of course)
- the /agents API (which is superseded completely by Responses +
Conversations)
- several alpha routes which had a "v1" route to aid transitioning to "v1alpha"

This is the corresponding client-python change:
https://github.com/llamastack/llama-stack-client-python/pull/294
2025-11-03 19:00:59 -08:00
Nathan Weinberg
62b3ad349a
fix: return to hardcoded model IDs for Vertex AI (#4041)
# What does this PR do?
partial revert of b67aef2

Vertex AI doesn't offer an endpoint for listing models from Google's
Model Garden

Return to hardcoded values until such an endpoint is available
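
A sketch of the reverted approach; the IDs match the models visible in the test output below:

```python
# Vertex AI offers no Model Garden listing endpoint, so fall back to a
# static list (sketch; the constant name is assumed).
HARDCODED_VERTEX_AI_MODELS = [
    "vertex_ai/gemini-2.0-flash",
    "vertex_ai/gemini-2.5-flash",
    "vertex_ai/gemini-2.5-pro",
]

async def list_models() -> list[str]:
    return HARDCODED_VERTEX_AI_MODELS
```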

Closes #3988 

## Test Plan
Server side, set up your Vertex AI env vars (`VERTEX_AI_PROJECT`,
`VERTEX_AI_LOCATION`, and `GOOGLE_APPLICATION_CREDENTIALS`) and run the
starter distribution
```bash
$ llama stack list-deps starter | xargs -L1 uv pip install
$ llama stack run starter
```

Client side, formerly broken cURL requests now working
```bash
$ curl http://127.0.0.1:8321/v1/models | jq '.data | map(select(.provider_id == "vertexai"))'
[
  {
    "identifier": "vertexai/vertex_ai/gemini-2.0-flash",
    "provider_resource_id": "vertex_ai/gemini-2.0-flash",
    "provider_id": "vertexai",
    "type": "model",
    "metadata": {},
    "model_type": "llm"
  },
  {
    "identifier": "vertexai/vertex_ai/gemini-2.5-flash",
    "provider_resource_id": "vertex_ai/gemini-2.5-flash",
    "provider_id": "vertexai",
    "type": "model",
    "metadata": {},
    "model_type": "llm"
  },
  {
    "identifier": "vertexai/vertex_ai/gemini-2.5-pro",
    "provider_resource_id": "vertex_ai/gemini-2.5-pro",
    "provider_id": "vertexai",
    "type": "model",
    "metadata": {},
    "model_type": "llm"
  }
]
$ curl -fsS http://127.0.0.1:8321/v1/openai/v1/chat/completions -H "Content-Type: application/json" -d "{\"model\": \"vertexai/vertex_ai/gemini-2.5-flash\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}], \"max_tokens\": 128, \"temperature\": 0.0}" | jq
{
  "id": "p8oIaYiQF8_PptQPo-GH8QQ",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "Hello there! How can I help you today?",
        "refusal": null,
        "role": "assistant",
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
...
```

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-11-03 17:38:16 -08:00
Ashwin Bharambe
cb40da210f
fix: update tests for OpenAI-style models endpoint (#4053)
The llama-stack-client now uses `/v1/openai/v1/models` which returns
OpenAI-compatible model objects with 'id' and 'custom_metadata' fields
instead of the Resource-style 'identifier' field. Updated api_recorder
to handle the new endpoint and modified tests to access model metadata
appropriately. Deleted stale model recordings for re-recording.

**NOTE: CI will be red on this one since it is dependent on
https://github.com/llamastack/llama-stack-client-python/pull/291/files
landing. I verified locally that it is green.**
2025-11-03 17:30:08 -08:00
Omar Abdelwahab
376f0fcd23 minor fix 2025-11-03 17:02:30 -08:00
Omar Abdelwahab
1143db0f64 added a fix 2025-11-03 16:55:13 -08:00
Omar Abdelwahab
c49fef8087 precommit 2025-11-03 16:12:38 -08:00
Sébastien Han
4a5ef65286
chore!: remove SDG API (#4035)
# What does this PR do?

This API hasn't received any traction and has seen close to zero interest from the community. Let's revisit in the future if things change.

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-03 16:12:06 -08:00
Omar Abdelwahab
57eb575ea1 Added minor changes 2025-11-03 15:57:45 -08:00
Ashwin Bharambe
44096512b5
feat: add custom_metadata to OpenAIModel to unify /v1/models with /v1/openai/v1/models (#4051)
We need to remove `/v1/openai/v1` paths shortly. There is one trouble --
our current `/v1/openai/v1/models` endpoint provides different data than
`/v1/models`. Unfortunately our tests target the latter (llama-stack
customized) behavior. We need to get to true OpenAI compatibility.

This is step 1: adding `custom_metadata` field to `OpenAIModel` that
includes all the extra stuff we add in the native `/v1/models` response.
This can be extracted on the consumer end by looking at `__pydantic_extra__` or other similar fields.
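
In sketch form, with the base fields assumed from the OpenAI models schema:

```python
from pydantic import BaseModel

class OpenAIModel(BaseModel):
    id: str
    object: str = "model"
    created: int
    owned_by: str
    # Step 1 of the consolidation: carry the llama-stack extras
    # (provider_id, model_type, ...) that /v1/models returns natively.
    custom_metadata: dict | None = None
```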

This PR:
- Adds `custom_metadata` field to `OpenAIModel` class in
`src/llama_stack/apis/models/models.py`
- Modified `openai_list_models()` in
`src/llama_stack/core/routing_tables/models.py` to populate
custom_metadata

Next Steps
1. Update stainless client to use `/v1/openai/v1/models` instead of
`/v1/models`
2. Migrate tests to read from `custom_metadata`
3. Remove `/v1/openai/v1/` prefix entirely and consolidate to single
`/v1/models` endpoint
2025-11-03 15:56:07 -08:00
Omar Abdelwahab
d0a8878337 MCP authentication parameter implementation 2025-11-03 15:48:56 -08:00
Ashwin Bharambe
2381714904
fix: enable SQLite WAL mode to prevent database locking errors (#4048)
Fixes a race condition causing "database is locked" errors during
concurrent writes to SQLite, particularly in streaming responses with
guardrails where multiple inference calls write simultaneously.

Enable Write-Ahead Logging (WAL) mode for SQLite which allows multiple
concurrent readers and one writer without blocking. Set busy_timeout to
5s so SQLite retries instead of failing immediately. Remove the logic
that disabled write queues for SQLite since WAL mode eliminates the
locking issues that prompted disabling them.
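
In raw `sqlite3` terms, the change amounts to:

```python
import sqlite3

conn = sqlite3.connect("store.db")
# WAL lets concurrent readers proceed alongside a single writer.
conn.execute("PRAGMA journal_mode=WAL")
# Retry for up to 5 seconds on a locked database instead of failing.
conn.execute("PRAGMA busy_timeout=5000")
```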

Fixes: test_output_safety_guardrails_safe_content[stream=True] flake
2025-11-03 15:27:41 -08:00
ehhuang
628e38b3d5
test: always start a new server in integration-tests.sh (#4050)
# What does this PR do?
This prevents interference from already running servers, and allows
multiple concurrent integration test runs. Unleash the AIs!

## Test Plan
start a LS server at port 8321

Then observe test uses port 8322:

```
❯ uv run --no-sync ./scripts/integration-tests.sh --stack-config server:ci-tests --inference-mode replay --setup ollama --suite base --pattern '(telemetry or safety)'
=== Llama Stack Integration Test Runner ===
Stack Config: server:ci-tests
Setup: ollama
Inference Mode: replay
Test Suite: base
Test Subdirs:
Test Pattern: (telemetry or safety)

Checking llama packages
llama-stack         0.4.0.dev0   /Users/erichuang/projects/new_test_server
llama-stack-client  0.3.0
ollama              0.6.0
=== Applying Setup Environment Variables ===
Setting SQLITE_STORE_DIR: /var/folders/cz/vyh7y1d11xg881lsxsshnc5c0000gn/T/tmp.bKLsaVAxyU
Setting stack config type: server
Setting up environment variables:
export OLLAMA_URL='http://0.0.0.0:11434'
export SAFETY_MODEL='ollama/llama-guard3:1b'

Will use port: 8322
=== Starting Llama Stack Server ===
Waiting for Llama Stack Server to start on port 8322...
Llama Stack Server started successfully
```
2025-11-03 15:23:10 -08:00
Sébastien Han
da57b51fb6
ci: introduce Mergify bot to notify on PR conflicts (#4043)
This commit introduces Mergify, a powerful bot designed to assist with
automated merging and other CI-related tasks. As an initial step, we
enable a basic feature: automatically notifying users when a pull
request has merge conflicts.

When a conflict is detected, Mergify will add a label to the PR. This
label will be removed once the conflict is resolved.
This is the foundation PR to activate the bot and to start using it for backports too.

In the future, we plan to expand Mergify’s role to include auto-merging,
as discussed in #1667, once the project is ready.
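
A minimal `.mergify.yml` along these lines enables the conflict notification (the label name is hypothetical):

```yaml
pull_request_rules:
  - name: label PRs with merge conflicts
    conditions:
      - conflict
    actions:
      label:
        add:
          - needs-rebase
  - name: remove the label once the conflict is resolved
    conditions:
      - -conflict
    actions:
      label:
        remove:
          - needs-rebase
```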

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-11-03 12:21:19 -08:00
Derek Higgins
1562277cfd
ci: test adjustments for Qwen3-0.6B (#3978)
Without this hint Qwen3-0.6B tends to reply with the full name
and sometimes doesn't reply with the correct drafted year.

---------

Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-03 12:19:35 -08:00