llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-03 09:53:45 +00:00

Author	SHA1	Message	Date
Charlie Doern	9df073450f	feat: remove core.telemetry as a dependency of llama_stack.apis (#4064 ) Some checks failed Integration Tests (Replay) / generate-matrix (push) Successful in 3s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details UI Tests / ui-tests (22) (push) Successful in 55s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Python Package Build Test / build (3.12) (push) Failing after 1s Details Pre-commit / pre-commit (push) Failing after 2s Details Python Package Build Test / build (3.13) (push) Failing after 1s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 5s Details Vector IO Integration Tests / test-matrix (push) Failing after 5s Details API Conformance Tests / check-schema-compatibility (push) Successful in 11s Details Unit Tests / unit-tests (3.12) (push) Failing after 4s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s Details Unit Tests / unit-tests (3.13) (push) Failing after 5s Details # What does this PR do? Remove circular dependency by moving tracing from API protocol definitions to router implementation layer. This gets us closer to having a self contained API package with no other cross-cutting dependencies to other parts of the llama stack codebase. To the best of our ability, the llama_stack.api should only be type and protocol definitions. Changes: - Create apis/common/tracing.py with marker decorator (zero core dependencies) - Add the _new_ `@telemetry_traceable` marker decorator to 11 protocol classes - Apply actual tracing in core/resolver.py in `instantiate_provider` based on protocol marker - Move MetricResponseMixin from core to apis (it's an API response type) - APIs package is now self-contained with zero core dependencies The tracing functionality remains identical - actual trace_protocol from core is applied to router implementations at runtime when both telemetry is enabled and the protocol has the `__marked_for_tracing__` marker. ## Test Plan Manual integration test confirms identical behavior to main branch: ```bash llama stack list-deps --format uv starter \| sh export OLLAMA_URL=http://localhost:11434 llama stack run starter curl -X POST http://localhost:8321/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{"model": "ollama/gpt-oss:20b", "messages": [{"role": "user", "content": "Say hello"}], "max_tokens": 10}' ``` Verified identical between main and this branch: - trace_id present in response - metrics array with prompt_tokens, completion_tokens, total_tokens - Server logs show trace_protocol applied to all routers Existing telemetry integration tests (tests/integration/telemetry/) validate trace context propagation and span attributes. relates to #3895 --------- Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-11-06 10:58:30 -08:00
Derek Higgins	dc9497a3b2	ci: Temperarily disable Telemetry during tests (#4090 ) Closes: #4089 Signed-off-by: Derek Higgins <derekh@redhat.com>	2025-11-06 17:53:02 +01:00
Derek Higgins	03d23db910	ci: vllm ci job update (#4088 ) Add missing recording for vllm in library mode Add Docker env (missed during rebase) Signed-off-by: Derek Higgins <derekh@redhat.com>	2025-11-06 16:59:55 +01:00
Derek Higgins	c62a09ab76	ci: Add vLLM support to integration testing infrastructure (with qwen) (#3545 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s Details Integration Tests (Replay) / generate-matrix (push) Successful in 4s Details Python Package Build Test / build (3.13) (push) Failing after 2s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Vector IO Integration Tests / test-matrix (push) Failing after 6s Details Pre-commit / pre-commit (push) Failing after 6s Details Test External API and Providers / test-external (venv) (push) Failing after 5s Details API Conformance Tests / check-schema-compatibility (push) Successful in 14s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s Details Python Package Build Test / build (3.12) (push) Failing after 22s Details UI Tests / ui-tests (22) (push) Successful in 57s Details o Introduces vLLM provider support to the record/replay testing framework o Enabling both recording and replay of vLLM API interactions alongside existing Ollama support. The changes enable testing of vLLM functionality. vLLM tests focus on inference capabilities, while Ollama continues to exercise the full API surface including vision features. -- This is an alternative to #3128 , using qwen3 instead of llama 3.2 1B appears to be more capable at structure output and tool calls. --------- Signed-off-by: Derek Higgins <derekh@redhat.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-11-06 10:36:40 +01:00
Ashwin Bharambe	bef1b044bd	refactor(passthrough): use AsyncOpenAI instead of AsyncLlamaStackClient (#4085 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s Details Integration Tests (Replay) / generate-matrix (push) Successful in 3s Details Test Llama Stack Build / generate-matrix (push) Successful in 3s Details Python Package Build Test / build (3.12) (push) Failing after 1s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Pre-commit / pre-commit (push) Failing after 4s Details Python Package Build Test / build (3.13) (push) Failing after 1s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 2s Details Vector IO Integration Tests / test-matrix (push) Failing after 6s Details Test Llama Stack Build / build-single-provider (push) Failing after 4s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 5s Details Test External API and Providers / test-external (venv) (push) Failing after 5s Details Unit Tests / unit-tests (3.12) (push) Failing after 4s Details Unit Tests / unit-tests (3.13) (push) Failing after 4s Details API Conformance Tests / check-schema-compatibility (push) Successful in 12s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s Details Test Llama Stack Build / build (push) Failing after 4s Details UI Tests / ui-tests (22) (push) Successful in 48s Details We'd like to remove the dependence of `llama-stack` on `llama-stack-client`. This is a necessary step. A few small cleanups - Enables `embeddings` now also - Remove ModelRegistryHelper dependency (unused) - Consolidate to auth_credential field via RemoteInferenceProviderConfig - Implement list_models() to fetch from downstream /v1/models ## Test Plan Tested using this script https://gist.github.com/ashwinb/6356463d10f989c0682ab3bff8589581 Output: ``` Listing models from downstream server... Available models: ['passthrough/ollama/nomic-embed-text:latest', 'passthrough/ollama/all-minilm:l6-v2', 'passthrough/ollama/llama3.2-vision:11b', 'passthrough/ollama/llama3.2-vision:latest', 'passthrough/ollama/llama-guard3:1b', 'passthrough/o llama/llama3.2:1b', 'passthrough/ollama/all-minilm:latest', 'passthrough/ollama/llama3.2:3b', 'passthrough/ollama/llama3.2:3b-instruct-fp16', 'passthrough/bedrock/meta.llama3-1-8b-instruct-v1:0', 'passthrough/bedrock/meta.llama3-1-70b-instruct -v1:0', 'passthrough/bedrock/meta.llama3-1-405b-instruct-v1:0', 'passthrough/sentence-transformers/nomic-ai/nomic-embed-text-v1.5'] Using LLM model: passthrough/ollama/llama3.2-vision:11b Making inference request... Response: 4. --- Testing streaming --- Streamed response: ChatCompletionChunk(id='chatcmpl-64', choices=[Choice(delta=ChoiceDelta(content='1', reasoning_content=None, refusal=None, role='assistant', tool_calls=None), finish_reason='', index=0, logprobs=None)], created=1762381674, m odel='passthrough/ollama/llama3.2-vision:11b', object='chat.completion.chunk', usage=None) ... 5ChatCompletionChunk(id='chatcmpl-64', choices=[Choice(delta=ChoiceDelta(content='', reasoning_content=None, refusal=None, role='assistant', tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1762381674, model='passthrou gh/ollama/llama3.2-vision:11b', object='chat.completion.chunk', usage=None) ```	2025-11-05 18:15:11 -08:00
ehhuang	b335419faa	fix: actualize chunking strategy in vector store create API (#4086 ) # What does this PR do? - when create vector store is called without chunk strategy, we actually the strategy used so that the value is persisted instead of strategy='None' ## Test Plan updated tests	2025-11-05 15:47:54 -08:00
Roy Belio	c672a5d792	feat: ability to use postgres as store for starter distro (#4076 ) ## What does this PR do? The starter distribution now comes with all the required packages to support persistent stores—like the agent store, metadata, and inference—using PostgreSQL. Users can enable PostgreSQL support by setting the `ENABLE_POSTGRES_STORE=1` environment variable. This PR consolidates the functionality from the removed `postgres-demo` distribution into the starter distribution, reducing maintenance overhead. Closes: #2619 Supersedes: #2851 (rebased and updated) ## Changes Made 1. Added PostgreSQL support to starter distribution - New `run-with-postgres-store.yaml` configuration - Automatic config switching via `ENABLE_POSTGRES_STORE` environment variable - Removed separate `postgres-demo` distribution 2. Updated to new build system - Integrated postgres switching logic into Containerfile entrypoint - Uses new `storage_backends` and `storage_stores` API - Properly configured both PostgreSQL KV store and SQL store 3. Updated dependencies - Added `psycopg2-binary` and `asyncpg` to starter distribution - All postgres-related dependencies automatically included ## How to Use ### With Docker (PostgreSQL): ```bash docker run \ -e ENABLE_POSTGRES_STORE=1 \ -e POSTGRES_HOST=your_postgres_host \ -e POSTGRES_PORT=5432 \ -e POSTGRES_DB=llamastack \ -e POSTGRES_USER=llamastack \ -e POSTGRES_PASSWORD=llamastack \ -e OPENAI_API_KEY=your_key \ llamastack/distribution-starter ``` ### PostgreSQL environment variables: - `POSTGRES_HOST`: Postgres host (default: `localhost`) - `POSTGRES_PORT`: Postgres port (default: `5432`) - `POSTGRES_DB`: Postgres database name (default: `llamastack`) - `POSTGRES_USER`: Postgres username (default: `llamastack`) - `POSTGRES_PASSWORD`: Postgres password (default: `llamastack`) ## Test Plan All pre-commit hooks pass (mypy, ruff, distro-codegen) `llama stack list-deps starter` confirms psycopg2-binary is included Storage configuration correctly uses PostgreSQL backends Container builds successfully with postgres support ## Credits Original work by @leseb in #2851. Rebased and updated by @r-bit-rry to work with latest main. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Sébastien Han @leseb --------- Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: Sébastien Han <seb@redhat.com>	2025-11-05 15:37:06 -08:00
ehhuang	9d5c34af27	fix!: BREAKING CHANGE: vector_store: search API response fix (#4080 ) # What does this PR do? - search_query in the vector store search API should be a list, according to https://github.com/openai/openai-openapi ## Test Plan modified tests --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/4080). * #4086 * __->__ #4080	2025-11-05 15:01:48 -08:00
ehhuang	84a84ee85c	fix: last_id when listing files in vector store (#4079 ) # What does this PR do? the last_id should be the id of the last item in the returned list, not the unfiltered list. ## Test Plan fixed test	2025-11-05 14:10:10 -08:00
Ashwin Bharambe	d9cf5cd480	fix(ci): use --no-cache instead of --no-cache-dir (#4081 ) This is necessary to make sure GPU dockers can be built on CI without running out of space.	2025-11-05 12:14:02 -08:00
Charlie Doern	c899b50723	fix: print help for list-deps if no args (#4078 ) Some checks failed SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Integration Tests (Replay) / generate-matrix (push) Successful in 4s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 6s Details Python Package Build Test / build (3.13) (push) Failing after 1s Details Vector IO Integration Tests / test-matrix (push) Failing after 5s Details Test llama stack list-deps / generate-matrix (push) Successful in 5s Details Test llama stack list-deps / list-deps-from-config (push) Failing after 4s Details Test llama stack list-deps / show-single-provider (push) Failing after 5s Details Python Package Build Test / build (3.12) (push) Failing after 5s Details Pre-commit / pre-commit (push) Failing after 6s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s Details Unit Tests / unit-tests (3.12) (push) Failing after 5s Details Test llama stack list-deps / list-deps (push) Failing after 4s Details Unit Tests / unit-tests (3.13) (push) Failing after 6s Details API Conformance Tests / check-schema-compatibility (push) Successful in 16s Details UI Tests / ui-tests (22) (push) Successful in 57s Details # What does this PR do? list-deps takes positional args OR things like --providers the issue with this, is that these args need to be optional since by nature, one or the other can be specified. add a check to list-deps that checks `if not args.providers and not args.config`. If this is true, help is printed and we exit. resolves #4075 ## Test Plan before: ``` ╰─ llama stack list-deps Traceback (most recent call last): File "/Users/charliedoern/projects/Documents/llama-stack/venv/bin/llama", line 10, in <module> sys.exit(main()) ^^^^^^ File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/llama.py", line 52, in main parser.run(args) File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/llama.py", line 43, in run args.func(args) File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/stack/list_deps.py", line 51, in _run_stack_list_deps_command return run_stack_list_deps_command(args) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/stack/_list_deps.py", line 135, in run_stack_list_deps_command normal_deps, special_deps, external_provider_dependencies = get_provider_dependencies(build_config) ^^^^^^^^^^^^ UnboundLocalError: cannot access local variable 'build_config' where it is not associated with a value ``` after: ``` ╰─ llama stack list-deps usage: llama stack list-deps [-h] [--providers PROVIDERS] [--format {uv,deps-only}] [config \| distro] list the dependencies for a llama stack distribution positional arguments: config \| distro Path to config file to use or name of known distro (llama stack list for a list). (default: None) options: -h, --help show this help message and exit --providers PROVIDERS sync dependencies for a list of providers and only those providers. This list is formatted like: api1=provider1,api2=provider2. Where there can be multiple providers per API. (default: None) --format {uv,deps-only} Output format: 'uv' shows shell commands, 'deps-only' shows just the list of dependencies without `uv` (default) (default: deps-only) ``` Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-11-05 11:34:08 -08:00
Wojciech-Rebisz	07c28cd519	fix: Avoid model_limits KeyError (#4060 ) # What does this PR do? It avoids model_limit KeyError while trying to get embedding models for Watsonx <!-- If resolving an issue, uncomment and update the line below --> Closes https://github.com/llamastack/llama-stack/issues/4059 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> Start server with watsonx distro: ```bash llama stack list-deps watsonx \| xargs -L1 uv pip install uv run llama stack run watsonx ``` Run ```python client = LlamaStackClient(base_url=base_url) client.models.list() ``` Check if there is any embedding model available (currently there is not a single one)	2025-11-05 10:34:40 -08:00
Emilio Garcia	ba50790a28	feat(tests): metrics tests (#3966 ) # What does this PR do? 1. Make telemetry tests as easy as possible for users by expanding the `SpanStub` data class and creating the `MetricStub` dataclass as a way to consistently marshal telemetry data in test fixtures and unmarshal and handle it in tests. 2. Structure server and client tests to always follow the same standards for consistent testing experience by using the `SpanStub` and `MetricStub` data class objects. 3. Enable Metrics Testing for completions endpoint 4. Correct token metrics to use histograms instead of counts to capture tokens per request rather than a cumulative count of tokens over the lifecycle of the server. ## Test Plan These are tests	2025-11-05 10:26:15 -08:00
Roy Belio	2619f3552e	fix: show built-in distributions in llama stack list (#4040 ) # What does this PR do? Fixes issue #3922 where `llama stack list` only showed distributions after they were run. This PR makes the command show all available distributions immediately on a fresh install. Closes #3922 ## Changes - Updated `_get_distribution_dirs()` to discover both built-in and built distributions: - Built-in distributions from `src/llama_stack/distributions/` (e.g., starter, nvidia, dell) - Built distributions from `~/.llama/distributions` - Added a "Source" column to distinguish between "built-in" and "built" distributions - Built distributions override built-in ones with the same name (expected behavior) - Updated config file detection logic to handle both naming conventions: - Built-in: `build.yaml` and `run.yaml` - Built: `{name}-build.yaml` and `{name}-run.yaml` ## Test Plan ### Unit Tests Added comprehensive unit tests in `tests/unit/distribution/test_stack_list.py`: ```bash uv run pytest tests/unit/distribution/test_stack_list.py -v ``` Result: ✅ All 8 tests pass - `test_builtin_distros_shown_without_running` - Verifies the core fix for issue #3922 - `test_builtin_and_built_distros_shown_together` - Ensures both types are shown - `test_built_distribution_overrides_builtin` - Tests override behavior - `test_empty_distributions` - Edge case handling - `test_config_files_detection_builtin` - Config file detection for built-in distros - `test_config_files_detection_built` - Config file detection for built distros - `test_llamastack_prefix_stripped` - Name normalization - `test_hidden_directories_ignored` - Filters hidden directories ### Manual Testing Before the fix (simulated with empty `~/.llama/distributions`): ```bash $ llama stack list No stacks found in ~/.llama/distributions ``` After the fix: ```bash $ llama stack list ┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓ ┃ Stack Name ┃ Source ┃ Path ┃ Build Config ┃ Run Config ┃ ┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩ │ ci-tests │ built-in │ /path/to/src/... │ Yes │ Yes │ │ dell │ built-in │ /path/to/src/... │ Yes │ Yes │ │ meta-reference-g… │ built-in │ /path/to/src/... │ Yes │ Yes │ │ nvidia │ built-in │ /path/to/src/... │ Yes │ Yes │ │ open-benchmark │ built-in │ /path/to/src/... │ Yes │ Yes │ │ postgres-demo │ built-in │ /path/to/src/... │ Yes │ Yes │ │ starter │ built-in │ /path/to/src/... │ Yes │ Yes │ │ starter-gpu │ built-in │ /path/to/src/... │ Yes │ Yes │ │ watsonx │ built-in │ /path/to/src/... │ Yes │ Yes │ └───────────────────┴──────────┴───────────────────┴──────────────┴────────────┘ ``` After running a distribution: ```bash $ llama stack run starter # Creates ~/.llama/distributions/starter $ llama stack list ┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓ ┃ Stack Name ┃ Source ┃ Path ┃ Build Config ┃ Run Config ┃ ┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩ │ ... │ built-in │ ... │ Yes │ Yes │ │ starter │ built │ ~/.llama/distri… │ No │ No │ │ ... │ built-in │ ... │ Yes │ Yes │ └───────────────────┴──────────┴───────────────────┴──────────────┴────────────┘ ``` Note how `starter` now shows as "built" and points to `~/.llama/distributions`, overriding the built-in version. ## Breaking Changes No breaking changes - This is a bug fix that improves user experience with minimal risk: - No programmatic parsing of output found in the codebase - Table format is clearly for human consumption - The new "Source" column helps users understand where distributions come from - The behavior change is exactly what users expect (seeing all available distributions) --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-11-05 10:16:28 -08:00
Ashwin Bharambe	4d3069bfa5	chore(ci): remove unused recordings (#4074 ) Added a script to cleanup recordings. While doing this, moved the CI matrix generation to a separate script so there is a single source of truth for the matrix. Ran the cleanup script as: ``` PYTHONPATH=. python scripts/cleanup_recordings.py ``` Also added this as part of the pre-commit workflow to ensure that the recordings are always up to date and that no stale recordings are left in the repo.	2025-11-05 09:21:58 -08:00
Sébastien Han	fd1603beef	chore: remove unused classes (#4077 ) # What does this PR do? These were maybe be included in the webmethod? The unit test was pointless too since the request was never used anywhere? This shouldn't be in the API definition, if we never consume it. ## Test Plan CI with pre-commit on OpenAPI spec generation. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-11-05 16:45:23 +01:00
Ashwin Bharambe	392e01dc79	chore: add stainless config Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Python Package Build Test / build (3.12) (push) Failing after 2s Details Pre-commit / pre-commit (push) Failing after 2s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Python Package Build Test / build (3.13) (push) Failing after 2s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s Details Vector IO Integration Tests / test-matrix (push) Failing after 6s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details Unit Tests / unit-tests (3.12) (push) Failing after 5s Details API Conformance Tests / check-schema-compatibility (push) Successful in 13s Details Unit Tests / unit-tests (3.13) (push) Failing after 7s Details UI Tests / ui-tests (22) (push) Successful in 1m13s Details name it to indicate it is not yet source of truth to avoid confusion	2025-11-04 15:44:07 -08:00
ehhuang	95b0493fae	chore: move src/llama_stack/ui to src/llama_stack_ui (#4068 ) # What does this PR do? This better separates UI from backend code, which was a point of confusion often for our beloved AI friends. ## Test Plan CI	2025-11-04 15:21:49 -08:00
Ashwin Bharambe	5850e3473f	fix: remove straggler openapi HTML file	2025-11-04 14:54:33 -08:00
Ashwin Bharambe	0c49a53c97	chore(api)!: remove tool_runtime.rag_tool from the API surface (#4067 ) RAG aka file search is implemented via the Responses API by specifying the file-search tool. The backend implementation remains unchanged. This PR merely removes the directly exposed API surface which allowed users to directly perform searches from the client. This facility is now available via the `client.vector_store.search()` OpenAI compatible API.	2025-11-04 14:50:54 -08:00
Ashwin Bharambe	a8a8aa56c0	chore!: remove the agents (sessions and turns) API (#4055 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Pre-commit / pre-commit (push) Failing after 3s Details Python Package Build Test / build (3.12) (push) Failing after 2s Details Python Package Build Test / build (3.13) (push) Failing after 2s Details Vector IO Integration Tests / test-matrix (push) Failing after 4s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s Details Test External API and Providers / test-external (venv) (push) Failing after 5s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 9s Details Unit Tests / unit-tests (3.13) (push) Failing after 5s Details Unit Tests / unit-tests (3.12) (push) Failing after 6s Details API Conformance Tests / check-schema-compatibility (push) Successful in 13s Details UI Tests / ui-tests (22) (push) Successful in 1m10s Details - Removes the deprecated agents (sessions and turns) API that was marked alpha in 0.3.0 - Cleans up unused imports and orphaned types after the API removal - Removes `SessionNotFoundError` and `AgentTurnInputType` which are no longer needed The agents API is completely superseded by the Responses + Conversations APIs, and the client SDK Agent class already uses those implementations. Corresponding client-side PR: https://github.com/llamastack/llama-stack-client-python/pull/295	2025-11-04 09:38:39 -08:00
Mustafa Elbehery	a6ddbae0ed	chore(test): migrate unit tests from `unittest` to `pytest` nvidia test eval (#3249 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s Details Python Package Build Test / build (3.12) (push) Failing after 2s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Pre-commit / pre-commit (push) Failing after 2s Details Python Package Build Test / build (3.13) (push) Failing after 2s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s Details Vector IO Integration Tests / test-matrix (push) Failing after 6s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details Unit Tests / unit-tests (3.12) (push) Failing after 6s Details API Conformance Tests / check-schema-compatibility (push) Successful in 14s Details Unit Tests / unit-tests (3.13) (push) Failing after 6s Details UI Tests / ui-tests (22) (push) Successful in 1m16s Details # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> This PR migrates `unittest` to `pytest` in `tests/unit/providers/nvidia/test_eval.py`. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> Part of https://github.com/llamastack/llama-stack/issues/2680 Supersedes https://github.com/llamastack/llama-stack/pull/2791 Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>	2025-11-04 10:29:07 +01:00
Ashwin Bharambe	053fc0ac39	chore!: remove all deprecated routes (including /openai/v1/ ones) (#4054 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Python Package Build Test / build (3.12) (push) Failing after 2s Details Python Package Build Test / build (3.13) (push) Failing after 2s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Pre-commit / pre-commit (push) Failing after 2s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s Details Vector IO Integration Tests / test-matrix (push) Failing after 6s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details Unit Tests / unit-tests (3.12) (push) Failing after 5s Details Unit Tests / unit-tests (3.13) (push) Failing after 5s Details API Conformance Tests / check-schema-compatibility (push) Successful in 13s Details UI Tests / ui-tests (22) (push) Successful in 1m13s Details This PR removes all routes which we had marked deprecated for the 0.3.0 release. This includes: - all the `/v1/openai/v1/` routes (the corresponding /v1 routes still exist of course) - the /agents API (which is superseded completely by Responses + Conversations) - several alpha routes which had a "v1" route to aide transitioning to "v1alpha" This is the corresponding client-python change: https://github.com/llamastack/llama-stack-client-python/pull/294	2025-11-03 19:00:59 -08:00
Nathan Weinberg	62b3ad349a	fix: return to hardcoded model IDs for Vertex AI (#4041 ) # What does this PR do? partial revert of `b67aef2` Vertex AI doesn't offer an endpoint for listing models from Google's Model Garden Return to hardcoded values until such an endpoint is available Closes #3988 ## Test Plan Server side, set up your Vertex AI env vars (`VERTEX_AI_PROJECT`, `VERTEX_AI_LOCATION`, and `GOOGLE_APPLICATION_CREDENTIALS`) and run the starter distribution ```bash $ llama stack list-deps starter \| xargs -L1 uv pip install $ llama stack run starter ``` Client side, formerly broken cURL requests now working ```bash $ curl http://127.0.0.1:8321/v1/models \| jq '.data \| map(select(.provider_id == "vertexai"))' [ { "identifier": "vertexai/vertex_ai/gemini-2.0-flash", "provider_resource_id": "vertex_ai/gemini-2.0-flash", "provider_id": "vertexai", "type": "model", "metadata": {}, "model_type": "llm" }, { "identifier": "vertexai/vertex_ai/gemini-2.5-flash", "provider_resource_id": "vertex_ai/gemini-2.5-flash", "provider_id": "vertexai", "type": "model", "metadata": {}, "model_type": "llm" }, { "identifier": "vertexai/vertex_ai/gemini-2.5-pro", "provider_resource_id": "vertex_ai/gemini-2.5-pro", "provider_id": "vertexai", "type": "model", "metadata": {}, "model_type": "llm" } ] $ curl -fsS http://127.0.0.1:8321/v1/openai/v1/chat/completions -H "Content-Type: application/json" -d "{\"model\": \"vertexai/vertex_a i/gemini-2.5-flash\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}], \"max_tokens\": 128, \"temperature\": 0.0}" \| jq { "id": "p8oIaYiQF8_PptQPo-GH8QQ", "choices": [ { "finish_reason": "stop", "index": 0, "logprobs": null, "message": { "content": "Hello there! How can I help you today?", "refusal": null, "role": "assistant", "annotations": null, "audio": null, "function_call": null, "tool_calls": null } } ], ... ``` Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-11-03 17:38:16 -08:00
Ashwin Bharambe	cb40da210f	fix: update tests for OpenAI-style models endpoint (#4053 ) The llama-stack-client now uses /`v1/openai/v1/models` which returns OpenAI-compatible model objects with 'id' and 'custom_metadata' fields instead of the Resource-style 'identifier' field. Updated api_recorder to handle the new endpoint and modified tests to access model metadata appropriately. Deleted stale model recordings for re-recording. NOTE: CI will be red on this one since it is dependent on https://github.com/llamastack/llama-stack-client-python/pull/291/files landing. I verified locally that it is green.	2025-11-03 17:30:08 -08:00
Sébastien Han	4a5ef65286	chore!: remove SDG API (#4035 ) # What does this PR do? This API hasn't received any traction and close to zero interest from the community. Let's revisit in the future if things change. Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-11-03 16:12:06 -08:00
Ashwin Bharambe	44096512b5	feat: add custom_metadata to OpenAIModel to unify /v1/models with /v1/openai/v1/models (#4051 ) We need to remove `/v1/openai/v1` paths shortly. There is one trouble -- our current `/v1/openai/v1/models` endpoint provides different data than `/v1/models`. Unfortunately our tests target the latter (llama-stack customized) behavior. We need to get to true OpenAI compatibility. This is step 1: adding `custom_metadata` field to `OpenAIModel` that includes all the extra stuff we add in the native `/v1/models` response. This can be extracted on the consumer end by look at `__pydantic_extra__` or other similar fields. This PR: - Adds `custom_metadata` field to `OpenAIModel` class in `src/llama_stack/apis/models/models.py` - Modified `openai_list_models()` in `src/llama_stack/core/routing_tables/models.py` to populate custom_metadata Next Steps 1. Update stainless client to use `/v1/openai/v1/models` instead of `/v1/models` 2. Migrate tests to read from `custom_metadata` 3. Remove `/v1/openai/v1/` prefix entirely and consolidate to single `/v1/models` endpoint	2025-11-03 15:56:07 -08:00
Ashwin Bharambe	2381714904	fix: enable SQLite WAL mode to prevent database locking errors (#4048 ) Fixes race condition causing "database is locked" errors during concurrent writes to SQLite, particularly in streaming responses with guardrails where multiple inference calls write simultaneously. Enable Write-Ahead Logging (WAL) mode for SQLite which allows multiple concurrent readers and one writer without blocking. Set busy_timeout to 5s so SQLite retries instead of failing immediately. Remove the logic that disabled write queues for SQLite since WAL mode eliminates the locking issues that prompted disabling them. Fixes: test_output_safety_guardrails_safe_content[stream=True] flake	2025-11-03 15:27:41 -08:00
ehhuang	628e38b3d5	test: always start a new server in integration-tests.sh (#4050 ) # What does this PR do? This prevents interference from already running servers, and allows multiple concurrent integration test runs. Unleash the AIs! ## Test Plan start a LS server at port 8321 Then observe test uses port 8322: ❯ uv run --no-sync ./scripts/integration-tests.sh --stack-config server:ci-tests --inference-mode replay --setup ollama --suite base --pattern '(telemetry or safety)' === Llama Stack Integration Test Runner === Stack Config: server:ci-tests Setup: ollama Inference Mode: replay Test Suite: base Test Subdirs: Test Pattern: (telemetry or safety) Checking llama packages llama-stack 0.4.0.dev0 /Users/erichuang/projects/new_test_server llama-stack-client 0.3.0 ollama 0.6.0 === Applying Setup Environment Variables === Setting SQLITE_STORE_DIR: /var/folders/cz/vyh7y1d11xg881lsxsshnc5c0000gn/T/tmp.bKLsaVAxyU Setting stack config type: server Setting up environment variables: export OLLAMA_URL='http://0.0.0.0:11434' export SAFETY_MODEL='ollama/llama-guard3:1b' Will use port: 8322 === Starting Llama Stack Server === Waiting for Llama Stack Server to start on port 8322... ✅ Llama Stack Server started successfully	2025-11-03 15:23:10 -08:00
Sébastien Han	da57b51fb6	ci: introduce Mergify bot to notify on PR conflicts (#4043 ) This commit introduces Mergify, a powerful bot designed to assist with automated merging and other CI-related tasks. As an initial step, we enable a basic feature: automatically notifying users when a pull request has merge conflicts. When a conflict is detected, Mergify will add a label to the PR. This label will be removed once the conflict is resolved. This is foundation PR to activate the bot and start using it for backports too. In the future, we plan to expand Mergify’s role to include auto-merging, as discussed in #1667, once the project is ready. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-11-03 12:21:19 -08:00
Derek Higgins	1562277cfd	ci: test adjustments for Qwen3-0.6B (#3978 ) Without this hint Qwen3-0.6B tends to reply with the full name and sometimes doesn't reply with the correct drafted year. --------- Signed-off-by: Derek Higgins <derekh@redhat.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-11-03 12:19:35 -08:00
Matthew Farrellee	1263448de2	fix: allowed_models config did not filter models (#4030 ) # What does this PR do? closes #4022 ## Test Plan ci w/ new tests Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-11-03 11:43:39 -08:00
Charlie Doern	30f8921240	fix: generate provider config when using --providers (#4044 ) # What does this PR do? call the sample_run_config method for providers that have it when generating a run config using `llama stack run --providers`. This will propagate API keys resolves #4032 ## Test Plan new unit test checks the output of using `--providers` to ensure `api_key` is in the config. manual testing: ``` ╰─ llama stack list-deps --providers=inference=remote::openai --format uv \| sh Using Python 3.12.11 environment at: venv Audited 7 packages in 8ms ╰─ llama stack run --providers=inference=remote::openai INFO 2025-11-03 14:33:02,094 llama_stack.cli.stack.run:161 cli: Writing generated config to: /Users/charliedoern/.llama/distributions/providers-run/run.yaml INFO 2025-11-03 14:33:02,096 llama_stack.cli.stack.run:169 cli: Using run configuration: /Users/charliedoern/.llama/distributions/providers-run/run.yaml INFO 2025-11-03 14:33:02,099 llama_stack.cli.stack.run:228 cli: HTTPS enabled with certificates: Key: None Cert: None INFO 2025-11-03 14:33:02,099 llama_stack.cli.stack.run:230 cli: Listening on 0.0.0.0:8321 INFO 2025-11-03 14:33:02,145 llama_stack.core.server.server:513 core::server: Run configuration: INFO 2025-11-03 14:33:02,146 llama_stack.core.server.server:516 core::server: apis: - inference image_name: providers-run providers: inference: - config: api_key: '********' base_url: https://api.openai.com/v1 provider_id: openai provider_type: remote::openai registered_resources: benchmarks: [] datasets: [] models: [] scoring_fns: [] shields: [] tool_groups: [] vector_stores: [] server: port: 8321 workers: 1 storage: backends: kv_default: db_path: /Users/charliedoern/.llama/distributions/providers-run/kvstore.db type: kv_sqlite sql_default: db_path: /Users/charliedoern/.llama/distributions/providers-run/sql_store.db type: sql_sqlite stores: conversations: backend: sql_default table_name: openai_conversations inference: backend: sql_default max_write_queue_size: 10000 num_writers: 4 table_name: inference_store metadata: backend: kv_default namespace: registry prompts: backend: kv_default namespace: prompts telemetry: enabled: false version: 2 INFO 2025-11-03 14:33:02,299 llama_stack.providers.utils.inference.inference_store:74 inference: Write queue disabled for SQLite to avoid concurrency issues INFO 2025-11-03 14:33:05,272 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils: OpenAIInferenceAdapter.list_provider_model_ids() returned 105 models INFO 2025-11-03 14:33:05,368 uvicorn.error:84 uncategorized: Started server process [69109] INFO 2025-11-03 14:33:05,369 uvicorn.error:48 uncategorized: Waiting for application startup. INFO 2025-11-03 14:33:05,370 llama_stack.core.server.server:172 core::server: Starting up Llama Stack server (version: 0.3.0) INFO 2025-11-03 14:33:05,370 llama_stack.core.stack:495 core: starting registry refresh task INFO 2025-11-03 14:33:05,370 uvicorn.error:62 uncategorized: Application startup complete. INFO 2025-11-03 14:33:05,371 uvicorn.error:216 uncategorized: Uvicorn running on http://0.0.0.0:8321 (Press CTRL+C to quit) INFO 2025-11-03 14:34:19,242 uvicorn.access:473 uncategorized: 127.0.0.1:63102 - "POST /v1/chat/completions HTTP/1.1" 200 ``` client: ``` curl http://localhost:8321/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{ "model": "openai/gpt-5", "messages": [ {"role": "user", "content": "What is 1 + 2"} ] }' {"id":"...","choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"3","refusal":null,"role":"assistant","annotations":[],"audio":null,"function_call":null,"tool_calls":null}}],"created":1762198455,"model":"openai/gpt-5","object":"chat.completion","service_tier":"default","system_fingerprint":null,"usage":{"completion_tokens":10,"prompt_tokens":13,"total_tokens":23,"completion_tokens_details":{"accepted_prediction_tokens":0,"audio_tokens":0,"reasoning_tokens":0,"rejected_prediction_tokens":0},"prompt_tokens_details":{"audio_tokens":0,"cached_tokens":0}}}% ``` --------- Signed-off-by: Charlie Doern <cdoern@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-11-03 11:37:58 -08:00
Ashwin Bharambe	415fd9e36b	chore: bump version to 0.4.0.dev0 (#4018 ) Some checks failed Test llama stack list-deps / generate-matrix (push) Successful in 4s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 5s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s Details Test Llama Stack Build / generate-matrix (push) Successful in 3s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Test Llama Stack Build / build-single-provider (push) Failing after 5s Details Vector IO Integration Tests / test-matrix (push) Failing after 6s Details Python Package Build Test / build (3.13) (push) Failing after 2s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 5s Details Test llama stack list-deps / show-single-provider (push) Failing after 5s Details Test llama stack list-deps / list-deps-from-config (push) Failing after 5s Details API Conformance Tests / check-schema-compatibility (push) Successful in 13s Details Unit Tests / unit-tests (3.13) (push) Failing after 3s Details Unit Tests / unit-tests (3.12) (push) Failing after 5s Details Test External API and Providers / test-external (venv) (push) Failing after 6s Details Test llama stack list-deps / list-deps (push) Failing after 4s Details Python Package Build Test / build (3.12) (push) Failing after 16s Details Pre-commit / pre-commit (push) Failing after 21s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 20s Details Test Llama Stack Build / build (push) Failing after 15s Details UI Tests / ui-tests (22) (push) Successful in 1m12s Details Automated version bump after releasing 0.3.1 --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-11-03 09:36:04 -08:00
Sébastien Han	d4aa348b60	chore: remove HTML generation for openapi spec (#4039 ) # What does this PR do? This seems to be an ancient artifact when we were using readthedocs? Now docusaurus read the specs directly. --------- Signed-off-by: Sébastien Han <seb@redhat.com>	2025-11-03 18:03:40 +01:00
dependabot[bot]	7e294d33d9	chore(github-deps): bump astral-sh/setup-uv from 6.0.1 to 7.1.2 (#4023 ) Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from 6.0.1 to 7.1.2. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/astral-sh/setup-uv/releases">astral-sh/setup-uv's releases</a>.</em></p> <blockquote> <h2>v7.1.2 🌈 Speed up extraction on Windows</h2> <h2>Changes</h2> <p><a href="https://github.com/lazka"><code>@lazka</code></a> fixed a bug that caused extracting uv to take up to 30s. Thank you!</p> <h2>🐛 Bug fixes</h2> <ul> <li>Use tar for extracting the uv zip file on Windows too <a href="https://github.com/lazka"><code>@lazka</code></a> (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/660">#660</a>)</li> </ul> <h2>🧰 Maintenance</h2> <ul> <li>chore: update known checksums for 0.9.5 @<a href="https://github.com/apps/github-actions">github-actions[bot]</a> (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/663">#663</a>)</li> </ul> <h2>⬆️ Dependency updates</h2> <ul> <li>Bump dependencies <a href="https://github.com/eifinger"><code>@eifinger</code></a> (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/664">#664</a>)</li> <li>Bump github/codeql-action from 4.30.8 to 4.30.9 @<a href="https://github.com/apps/dependabot">dependabot[bot]</a> (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/652">#652</a>)</li> </ul> <h2>v7.1.1 🌈 Fix empty workdir detection and lowest resolution strategy</h2> <h2>Changes</h2> <p>This release fixes a bug where the <code>working-directory</code> input was not used to detect an empty work dir. It also fixes the <code>lowest</code> resolution strategy resolving to latest when only a lower bound was specified.</p> <p>Special thanks to <a href="https://github.com/tpgillam"><code>@tpgillam</code></a> for the first contribution!</p> <h2>🐛 Bug fixes</h2> <ul> <li>Fix "lowest" resolution strategy with lower-bound only <a href="https://github.com/tpgillam"><code>@tpgillam</code></a> (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/649">#649</a>)</li> <li>Use working-directory to detect empty workdir <a href="https://github.com/eifinger"><code>@eifinger</code></a> (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/645">#645</a>)</li> </ul> <h2>🧰 Maintenance</h2> <ul> <li>chore: update known checksums for 0.9.4 @<a href="https://github.com/apps/github-actions">github-actions[bot]</a> (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/651">#651</a>)</li> <li>chore: update known checksums for 0.9.3 @<a href="https://github.com/apps/github-actions">github-actions[bot]</a> (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/644">#644</a>)</li> </ul> <h2>📚 Documentation</h2> <ul> <li>Change version in docs to v7 <a href="https://github.com/eifinger"><code>@eifinger</code></a> (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/647">#647</a>)</li> </ul> <h2>⬆️ Dependency updates</h2> <ul> <li>Bump github/codeql-action from 4.30.7 to 4.30.8 @<a href="https://github.com/apps/dependabot">dependabot[bot]</a> (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/639">#639</a>)</li> <li>Bump actions/setup-node from 5.0.0 to 6.0.0 @<a href="https://github.com/apps/dependabot">dependabot[bot]</a> (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/641">#641</a>)</li> <li>Bump eifinger/actionlint-action from 1.9.1 to 1.9.2 @<a href="https://github.com/apps/dependabot">dependabot[bot]</a> (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/634">#634</a>)</li> <li>Update lockfile with latest npm <a href="https://github.com/eifinger"><code>@eifinger</code></a> (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/636">#636</a>)</li> </ul> <h2>v7.1.0 🌈 Support all the use cases</h2> <h2>Changes</h2> <p><strong>Support all the use cases!!!</strong></p> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="`85856786d1`"><code>8585678</code></a> Bump dependencies (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/664">#664</a>)</li> <li><a href="`22d500a65c`"><code>22d500a</code></a> Bump github/codeql-action from 4.30.8 to 4.30.9 (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/652">#652</a>)</li> <li><a href="`14d557131d`"><code>14d5571</code></a> chore: update known checksums for 0.9.5 (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/663">#663</a>)</li> <li><a href="`29cd2350cd`"><code>29cd235</code></a> Use tar for extracting the uv zip file on Windows too (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/660">#660</a>)</li> <li><a href="`2ddd2b9cb3`"><code>2ddd2b9</code></a> chore: update known checksums for 0.9.4 (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/651">#651</a>)</li> <li><a href="`b7bf78939d`"><code>b7bf789</code></a> Fix "lowest" resolution strategy with lower-bound only (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/649">#649</a>)</li> <li><a href="`cb6c0a53d9`"><code>cb6c0a5</code></a> Change version in docs to v7 (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/647">#647</a>)</li> <li><a href="`dffc6292f2`"><code>dffc629</code></a> Use working-directory to detect empty workdir (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/645">#645</a>)</li> <li><a href="`6e346e1653`"><code>6e346e1</code></a> chore: update known checksums for 0.9.3 (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/644">#644</a>)</li> <li><a href="`3ccd0fd498`"><code>3ccd0fd</code></a> Bump github/codeql-action from 4.30.7 to 4.30.8 (<a href="https://redirect.github.com/astral-sh/setup-uv/issues/639">#639</a>)</li> <li>Additional commits viewable in <a href="https://github.com/astral-sh/setup-uv/compare/v6.0.1...85856786d1ce8acfbcc2f13a5f3fbd6b938f9f41">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=astral-sh/setup-uv&package-manager=github_actions&previous-version=6.0.1&new-version=7.1.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-11-03 13:43:04 +01:00
Sébastien Han	3dbff6bf3f	fix: help mypy & fix precommit on main (#4037 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Python Package Build Test / build (3.12) (push) Failing after 2s Details Pre-commit / pre-commit (push) Failing after 3s Details Vector IO Integration Tests / test-matrix (push) Failing after 5s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 7s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 7s Details Python Package Build Test / build (3.13) (push) Failing after 5s Details Test External API and Providers / test-external (venv) (push) Failing after 6s Details Unit Tests / unit-tests (3.13) (push) Failing after 6s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 10s Details Unit Tests / unit-tests (3.12) (push) Failing after 8s Details API Conformance Tests / check-schema-compatibility (push) Successful in 21s Details UI Tests / ui-tests (22) (push) Successful in 1m15s Details # What does this PR do? Add type to help mypy figure out. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-11-03 05:39:50 -05:00
Ashwin Bharambe	d45137a399	fix(ci): export UV_INDEX_STRATEGY to current shell before running uv sync (#4020 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Pre-commit / pre-commit (push) Failing after 2s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s Details Python Package Build Test / build (3.12) (push) Failing after 1s Details Python Package Build Test / build (3.13) (push) Failing after 2s Details Vector IO Integration Tests / test-matrix (push) Failing after 4s Details Test External API and Providers / test-external (venv) (push) Failing after 5s Details Unit Tests / unit-tests (3.13) (push) Failing after 5s Details Unit Tests / unit-tests (3.12) (push) Failing after 5s Details API Conformance Tests / check-schema-compatibility (push) Successful in 16s Details UI Tests / ui-tests (22) (push) Successful in 1m6s Details Fixes latent bug where UV_INDEX_STRATEGY was only exported to GITHUB_ENV but not to the current shell. While this bug doesn't currently affect main (since UV_EXTRA_INDEX_URL is only set on release branches), it's a latent bug that could cause issues if the logic changes in the future or if someone tests with UV_EXTRA_INDEX_URL set. The setup-runner action only exported UV_INDEX_STRATEGY to GITHUB_ENV (for subsequent steps), not to the current shell environment. Since uv sync runs in the same step, it would never see the variable if it were set. This fix adds `export UV_INDEX_STRATEGY=unsafe-best-match` to make the variable available in the current shell before running uv commands. Related: #4019 (same fix for release-0.3.x where the bug is actively triggered)	2025-11-01 12:57:24 -07:00
Charlie Doern	93401836b7	feat: llama stack run --providers (#3989 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Python Package Build Test / build (3.13) (push) Failing after 1s Details Test Llama Stack Build / generate-matrix (push) Successful in 3s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s Details Python Package Build Test / build (3.12) (push) Failing after 3s Details Pre-commit / pre-commit (push) Failing after 5s Details Vector IO Integration Tests / test-matrix (push) Failing after 5s Details Test Llama Stack Build / build-single-provider (push) Failing after 5s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 4s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 5s Details API Conformance Tests / check-schema-compatibility (push) Successful in 10s Details Unit Tests / unit-tests (3.13) (push) Failing after 4s Details Unit Tests / unit-tests (3.12) (push) Failing after 5s Details Test External API and Providers / test-external (venv) (push) Failing after 6s Details Test Llama Stack Build / build (push) Failing after 4s Details UI Tests / ui-tests (22) (push) Successful in 56s Details # What does this PR do? llama stack run --providers takes a list of providers in the format of api1=provider1,api2=provider2 this allows users to run with a simple list of providers. given the architecture of `create_app`, this run config needs to be written to disk. use ~/.llama/distribution/providers-run/run.yaml each time for consistency resolves #3956 ## Test Plan new unit tests to ensure --providers. Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-10-31 16:21:32 -07:00
Ashwin Bharambe	b2a5428a14	fix(ci): unset empty UV index env vars to prevent uv errors (#4012 ) Fixes container builds failing with UV index strategy errors when build args are passed with empty values. Docker ARGs declared with empty defaults (ARG UV_INDEX_STRATEGY="") become environment variables with empty string values in RUN commands. UV interprets these as if --index-strategy "" was passed on the command line, causing build failures with "error: a value is required for '--index-strategy <UV_INDEX_STRATEGY>'". This is a footgun because empty string ≠ unset variable, and ARGs silently propagate to all RUN commands, only failing when declared with empty defaults. The fix unsets UV_EXTRA_INDEX_URL and UV_INDEX_STRATEGY at the start of RUN blocks, saves the values early, and only restores them for editable installs with RC dependencies. All other install modes (PyPI, test-pypi, client) now run with a clean environment.	2025-10-31 13:29:14 -07:00
Ashwin Bharambe	f8fe3018af	fix(ci): use test.pypi as extra index for RC dependencies (#4009 ) Backports UV index configuration fixes from `release-0.3.x` (PR #4002). The main issue: when we created the release branch infrastructure, we configured UV to use `test.pypi` as the PRIMARY index to resolve RC dependencies. This caused UV to look for ALL packages there first, which led to problems - some packages don't have binary wheels on `test.pypi`, so UV tried building from source and failed (like the `psycopg2-binary` issue we hit). The fix is simple: use PyPI as primary (default) and `test.pypi` as an EXTRA index. UV will check PyPI first for everything, and only fall back to `test.pypi` for packages not found there (like our RC client versions). This PR includes: - Fixed `install-llama-stack-client` action to output `UV_EXTRA_INDEX_URL` instead of `UV_INDEX_URL` - New `uv-run-with-index.sh` wrapper that auto-detects release branches and sets UV env vars - Updated pre-commit hooks (`uv-lock`, codegen, etc.) to use the wrapper - Pass UV env vars as Docker build args in all locations - Scope UV env vars properly in Containerfile (inline for llama-stack install, explicitly unset before distribution deps) - Export UV env vars to `GITHUB_ENV` in setup-runner for cross-step persistence The wrapper detects release branches automatically in both CI and local environments, so this "just works" without manual configuration. On main (non-release branch), the wrapper becomes a no-op. Tested and validated on `release-0.3.x` where all CI checks pass.	2025-10-31 12:55:43 -07:00
raghotham	62603d25c2	chore(api)!: /v1/inspect only lists v1 apis by default (#3948 ) # What does this PR do? Allow filtering for v1alpha, v1beta, deprecated and v1. Backward incompatible change since by default it only returns v1 apis now. ## Test Plan added unit test	2025-10-31 11:55:46 -07:00
Ashwin Bharambe	61aab1889b	fix(ci): remove precommit trigger workflow (#4008 ) Not safe!	2025-10-31 11:41:26 -07:00
Francisco Arceo	7b79cd05d5	feat: Adding Prompts to admin UI (#3987 ) # What does this PR do? 1. Updates Llama Stack Typescript client to include `prompts`api in playground client. 2. Updates the UI to display prompts and execute basic CRUD operations for prompts. (2) adds an explicit "Preview" section when creating the prompt to show users how the Prompts API behaves as you dynamically edit the prompt content. See example here: <p align="center"><img width="468.5" height="333" alt="Screenshot 2025-10-31 at 12 22 34 PM" src="https://github.com/user-attachments/assets/3542ce7f-56fe-4fb4-b0a3-5cfba5917f6d" /></p> Some screen shots: <details><Summary>Click me to expand!</Summary> ### Prompts List with Prompts <img width="1906" height="1108" alt="Screenshot 2025-10-31 at 12 20 05 PM" src="https://github.com/user-attachments/assets/494a4748-ea6a-4527-8cfe-8959cb741c0f" /> ### Empty Prompts List <img width="1889" height="1123" alt="Screenshot 2025-10-31 at 12 08 44 PM" src="https://github.com/user-attachments/assets/ac95b807-d311-4725-86da-0258b3cce81a" /> ### Create Prompt <img width="1918" height="1167" alt="Screenshot 2025-10-31 at 11 03 29 AM" src="https://github.com/user-attachments/assets/b3100a78-f4f3-410f-af89-f7e7fe4a89e7" /> ### Submit Prompt with error <img width="1901" height="1213" alt="Screenshot 2025-10-31 at 12 09 28 PM" src="https://github.com/user-attachments/assets/dca71354-a602-449d-a0d8-0ed3d009a275" /> </details> ## Closes https://github.com/llamastack/llama-stack/issues/3322 ## Test Plan Added tests and manual testing. Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-10-31 11:37:25 -07:00
Ashwin Bharambe	c2fd17474e	fix: stop printing server log, it is confusing Some checks failed Pre-commit / pre-commit (push) Failing after 2s Details Python Package Build Test / build (3.13) (push) Failing after 1s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s Details API Conformance Tests / check-schema-compatibility (push) Successful in 13s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Vector IO Integration Tests / test-matrix (push) Failing after 6s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Python Package Build Test / build (3.12) (push) Failing after 1s Details Unit Tests / unit-tests (3.12) (push) Failing after 4s Details Test External API and Providers / test-external (venv) (push) Failing after 5s Details Unit Tests / unit-tests (3.13) (push) Failing after 5s Details UI Tests / ui-tests (22) (push) Successful in 54s Details	2025-10-31 11:22:08 -07:00
Ashwin Bharambe	5f95c1f8cc	fix(ci): install client from release branch before uv sync (#4001 ) Fixes CI failures on release branches where uv sync can't resolve RC dependencies. The problem: on release branches like `release-0.3.x`, pyproject.toml requires `llama-stack-client>=0.3.1rc1`. But RC versions only exist on test.pypi, not PyPI. So uv sync fails before we even get a chance to install the client from git. The fix is simple - on release branches, pre-install the client from the matching git branch first, then run uv sync. This satisfies the RC requirement and lets dependency resolution succeed. Modified setup-runner and pre-commit workflows to do this. Also cleaned up some duplicate logic in setup-test-environment that's now handled centrally. Example failure: `5415478835`	2025-10-31 06:16:20 -07:00
Ashwin Bharambe	6d80ca4bf7	fix(ci): replace unused LLAMA_STACK_CLIENT_DIR with direct install (#4000 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Python Package Build Test / build (3.12) (push) Failing after 1s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Pre-commit / pre-commit (push) Failing after 2s Details Python Package Build Test / build (3.13) (push) Failing after 2s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 6s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s Details Vector IO Integration Tests / test-matrix (push) Failing after 5s Details Unit Tests / unit-tests (3.12) (push) Failing after 4s Details Test External API and Providers / test-external (venv) (push) Failing after 5s Details API Conformance Tests / check-schema-compatibility (push) Successful in 13s Details Unit Tests / unit-tests (3.13) (push) Failing after 11s Details UI Tests / ui-tests (22) (push) Successful in 27s Details Replace unused `LLAMA_STACK_CLIENT_DIR` env var (from old `llama stack build`) with direct `uv pip install` for release branch client installation. cc @ehhuang	2025-10-30 22:09:25 -07:00
Jiayi Ni	fa7699d2c3	feat: Add rerank API for NVIDIA Inference Provider (#3329 ) # What does this PR do? Add rerank API for NVIDIA Inference Provider. <!-- If resolving an issue, uncomment and update the line below --> Closes #3278 ## Test Plan Unit test: ``` pytest tests/unit/providers/nvidia/test_rerank_inference.py ``` Integration test: ``` pytest -s -v tests/integration/inference/test_rerank.py --stack-config="inference=nvidia" --rerank-model=nvidia/nvidia/nv-rerankqa-mistral-4b-v3 --env NVIDIA_API_KEY="" --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com" ```	2025-10-30 21:42:09 -07:00
Ashwin Bharambe	c396de57a4	ci: standardize release branch pattern to release-X.Y.x (#3999 ) Standardize CI workflows to use `release-X.Y.x` branch pattern instead of multiple numeric variants. That's the pattern we are settling on. See https://github.com/llamastack/llama-stack-ops/pull/20 for reference.	2025-10-30 21:33:32 -07:00
Doug Edgar	e8cd8508b5	fix: handle missing external_providers_dir (#3974 ) Some checks failed SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 3s Details Python Package Build Test / build (3.13) (push) Failing after 1s Details Python Package Build Test / build (3.12) (push) Failing after 1s Details Pre-commit / pre-commit (push) Failing after 2s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s Details Vector IO Integration Tests / test-matrix (push) Failing after 6s Details Unit Tests / unit-tests (3.12) (push) Failing after 4s Details Unit Tests / unit-tests (3.13) (push) Failing after 5s Details Test External API and Providers / test-external (venv) (push) Failing after 5s Details API Conformance Tests / check-schema-compatibility (push) Successful in 13s Details UI Tests / ui-tests (22) (push) Successful in 50s Details # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> This PR fixes the handling of the external_providers_dir configuration field to align with its ongoing deprecation, in favor of the provider `module` specification approach. It addresses the issue in #3950, where using the default provided run.yaml config resulted in the `external_providers_dir` parameter being set to the literal string `None`, and crashing the llama-stack server when starting. <!-- If resolving an issue, uncomment and update the line below --> Closes #3950 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> - Built a new container image from `podman build . -f containers/Containerfile --build-arg DISTRO_NAME=starter --tag llama-stack:starter` - Tested it locally with `podman run -it localhost/llama-stack:starter` - Tested it on an OpenShift 4.19 cluster, deployed via the llama-stack-k8s-operator. Signed-off-by: Doug Edgar <dedgar@redhat.com>	2025-10-30 17:01:31 -07:00

1 2 3 4 5 ...

3197 commits