llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-04 02:03:44 +00:00

Author	SHA1	Message	Date
ehhuang	b335419faa	fix: actualize chunking strategy in vector store create API (#4086 ) # What does this PR do? - when create vector store is called without chunk strategy, we actually the strategy used so that the value is persisted instead of strategy='None' ## Test Plan updated tests	2025-11-05 15:47:54 -08:00
Roy Belio	c672a5d792	feat: ability to use postgres as store for starter distro (#4076 ) ## What does this PR do? The starter distribution now comes with all the required packages to support persistent stores—like the agent store, metadata, and inference—using PostgreSQL. Users can enable PostgreSQL support by setting the `ENABLE_POSTGRES_STORE=1` environment variable. This PR consolidates the functionality from the removed `postgres-demo` distribution into the starter distribution, reducing maintenance overhead. Closes: #2619 Supersedes: #2851 (rebased and updated) ## Changes Made 1. Added PostgreSQL support to starter distribution - New `run-with-postgres-store.yaml` configuration - Automatic config switching via `ENABLE_POSTGRES_STORE` environment variable - Removed separate `postgres-demo` distribution 2. Updated to new build system - Integrated postgres switching logic into Containerfile entrypoint - Uses new `storage_backends` and `storage_stores` API - Properly configured both PostgreSQL KV store and SQL store 3. Updated dependencies - Added `psycopg2-binary` and `asyncpg` to starter distribution - All postgres-related dependencies automatically included ## How to Use ### With Docker (PostgreSQL): ```bash docker run \ -e ENABLE_POSTGRES_STORE=1 \ -e POSTGRES_HOST=your_postgres_host \ -e POSTGRES_PORT=5432 \ -e POSTGRES_DB=llamastack \ -e POSTGRES_USER=llamastack \ -e POSTGRES_PASSWORD=llamastack \ -e OPENAI_API_KEY=your_key \ llamastack/distribution-starter ``` ### PostgreSQL environment variables: - `POSTGRES_HOST`: Postgres host (default: `localhost`) - `POSTGRES_PORT`: Postgres port (default: `5432`) - `POSTGRES_DB`: Postgres database name (default: `llamastack`) - `POSTGRES_USER`: Postgres username (default: `llamastack`) - `POSTGRES_PASSWORD`: Postgres password (default: `llamastack`) ## Test Plan All pre-commit hooks pass (mypy, ruff, distro-codegen) `llama stack list-deps starter` confirms psycopg2-binary is included Storage configuration correctly uses PostgreSQL backends Container builds successfully with postgres support ## Credits Original work by @leseb in #2851. Rebased and updated by @r-bit-rry to work with latest main. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Sébastien Han @leseb --------- Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: Sébastien Han <seb@redhat.com>	2025-11-05 15:37:06 -08:00
Emilio Garcia	ba50790a28	feat(tests): metrics tests (#3966 ) # What does this PR do? 1. Make telemetry tests as easy as possible for users by expanding the `SpanStub` data class and creating the `MetricStub` dataclass as a way to consistently marshal telemetry data in test fixtures and unmarshal and handle it in tests. 2. Structure server and client tests to always follow the same standards for consistent testing experience by using the `SpanStub` and `MetricStub` data class objects. 3. Enable Metrics Testing for completions endpoint 4. Correct token metrics to use histograms instead of counts to capture tokens per request rather than a cumulative count of tokens over the lifecycle of the server. ## Test Plan These are tests	2025-11-05 10:26:15 -08:00
Ashwin Bharambe	0c49a53c97	chore(api)!: remove tool_runtime.rag_tool from the API surface (#4067 ) RAG aka file search is implemented via the Responses API by specifying the file-search tool. The backend implementation remains unchanged. This PR merely removes the directly exposed API surface which allowed users to directly perform searches from the client. This facility is now available via the `client.vector_store.search()` OpenAI compatible API.	2025-11-04 14:50:54 -08:00
Ashwin Bharambe	cb40da210f	fix: update tests for OpenAI-style models endpoint (#4053 ) The llama-stack-client now uses /`v1/openai/v1/models` which returns OpenAI-compatible model objects with 'id' and 'custom_metadata' fields instead of the Resource-style 'identifier' field. Updated api_recorder to handle the new endpoint and modified tests to access model metadata appropriately. Deleted stale model recordings for re-recording. NOTE: CI will be red on this one since it is dependent on https://github.com/llamastack/llama-stack-client-python/pull/291/files landing. I verified locally that it is green.	2025-11-03 17:30:08 -08:00
Sébastien Han	4a5ef65286	chore!: remove SDG API (#4035 ) # What does this PR do? This API hasn't received any traction and close to zero interest from the community. Let's revisit in the future if things change. Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-11-03 16:12:06 -08:00
Ashwin Bharambe	44096512b5	feat: add custom_metadata to OpenAIModel to unify /v1/models with /v1/openai/v1/models (#4051 ) We need to remove `/v1/openai/v1` paths shortly. There is one trouble -- our current `/v1/openai/v1/models` endpoint provides different data than `/v1/models`. Unfortunately our tests target the latter (llama-stack customized) behavior. We need to get to true OpenAI compatibility. This is step 1: adding `custom_metadata` field to `OpenAIModel` that includes all the extra stuff we add in the native `/v1/models` response. This can be extracted on the consumer end by look at `__pydantic_extra__` or other similar fields. This PR: - Adds `custom_metadata` field to `OpenAIModel` class in `src/llama_stack/apis/models/models.py` - Modified `openai_list_models()` in `src/llama_stack/core/routing_tables/models.py` to populate custom_metadata Next Steps 1. Update stainless client to use `/v1/openai/v1/models` instead of `/v1/models` 2. Migrate tests to read from `custom_metadata` 3. Remove `/v1/openai/v1/` prefix entirely and consolidate to single `/v1/models` endpoint	2025-11-03 15:56:07 -08:00
raghotham	62603d25c2	chore(api)!: /v1/inspect only lists v1 apis by default (#3948 ) # What does this PR do? Allow filtering for v1alpha, v1beta, deprecated and v1. Backward incompatible change since by default it only returns v1 apis now. ## Test Plan added unit test	2025-10-31 11:55:46 -07:00
Doug Edgar	e8cd8508b5	fix: handle missing external_providers_dir (#3974 ) Some checks failed SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 3s Details Python Package Build Test / build (3.13) (push) Failing after 1s Details Python Package Build Test / build (3.12) (push) Failing after 1s Details Pre-commit / pre-commit (push) Failing after 2s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s Details Vector IO Integration Tests / test-matrix (push) Failing after 6s Details Unit Tests / unit-tests (3.12) (push) Failing after 4s Details Unit Tests / unit-tests (3.13) (push) Failing after 5s Details Test External API and Providers / test-external (venv) (push) Failing after 5s Details API Conformance Tests / check-schema-compatibility (push) Successful in 13s Details UI Tests / ui-tests (22) (push) Successful in 50s Details # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> This PR fixes the handling of the external_providers_dir configuration field to align with its ongoing deprecation, in favor of the provider `module` specification approach. It addresses the issue in #3950, where using the default provided run.yaml config resulted in the `external_providers_dir` parameter being set to the literal string `None`, and crashing the llama-stack server when starting. <!-- If resolving an issue, uncomment and update the line below --> Closes #3950 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> - Built a new container image from `podman build . -f containers/Containerfile --build-arg DISTRO_NAME=starter --tag llama-stack:starter` - Tested it locally with `podman run -it localhost/llama-stack:starter` - Tested it on an OpenShift 4.19 cluster, deployed via the llama-stack-k8s-operator. Signed-off-by: Doug Edgar <dedgar@redhat.com>	2025-10-30 17:01:31 -07:00
ehhuang	0e384a55a1	feat: support `workers` in run config (#3992 ) # What does this PR do? ## Test Plan Set workers: 4 in run.yaml. Start server and observe logs multiple times.	2025-10-30 16:34:12 -07:00
Sébastien Han	b4ea05ada9	chore: add batches to openapi schema (#3980 ) # What does this PR do? While working on https://github.com/llamastack/llama-stack/pull/3944 I realized that the batches API wasn't generated. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-10-30 07:08:35 -07:00
Ashwin Bharambe	da8f014b96	feat(models): list models available via provider_data header (#3968 ) ## Summary When users provide API keys via `X-LlamaStack-Provider-Data` header, `models.list()` now returns models they can access from those providers, not just pre-registered models from the registry. This complements the routing fix from `f88416ef8` which enabled inference calls with `provider_id/model_id` format for unregistered models. Users can now discover which models are available to them before making inference requests. The implementation reuses `NeedsRequestProviderData.get_request_provider_data()` to validate credentials, then dynamically fetches models from providers without caching them since they're user-specific. Registry models take precedence to respect any pre-configured aliases. ## Test Script ```python #!/usr/bin/env python3 import json import os from openai import OpenAI # Test 1: Without provider_data header client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="dummy") models = client.models.list() anthropic_without = [m.id for m in models.data if m.id and "anthropic" in m.id] print(f"Without header: {len(models.data)} models, {len(anthropic_without)} anthropic") # Test 2: With provider_data header containing Anthropic API key anthropic_api_key = os.environ["ANTHROPIC_API_KEY"] client_with_key = OpenAI( base_url="http://localhost:8321/v1/openai/v1", api_key="dummy", default_headers={ "X-LlamaStack-Provider-Data": json.dumps({"anthropic_api_key": anthropic_api_key}) } ) models_with_key = client_with_key.models.list() anthropic_with = [m.id for m in models_with_key.data if m.id and "anthropic" in m.id] print(f"With header: {len(models_with_key.data)} models, {len(anthropic_with)} anthropic") print(f"Anthropic models: {anthropic_with}") assert len(anthropic_with) > len(anthropic_without), "Should have more anthropic models with API key" print("\n✓ Test passed!") ``` Run with a stack that has Anthropic provider configured (but without API key in config): ```bash ANTHROPIC_API_KEY=sk-ant-... python test_provider_data_models.py ```	2025-10-29 14:03:03 -07:00
Derek Higgins	c678682cdd	chore: remove unused methods from InferenceRouter (#3953 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Test Llama Stack Build / generate-matrix (push) Successful in 3s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 6s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Test Llama Stack Build / build-single-provider (push) Failing after 4s Details Python Package Build Test / build (3.12) (push) Failing after 2s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s Details Test llama stack list-deps / show-single-provider (push) Failing after 3s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details Unit Tests / unit-tests (3.12) (push) Failing after 5s Details Test llama stack list-deps / list-deps-from-config (push) Failing after 24s Details Test llama stack list-deps / generate-matrix (push) Successful in 25s Details Python Package Build Test / build (3.13) (push) Failing after 25s Details Unit Tests / unit-tests (3.13) (push) Failing after 25s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 29s Details Vector IO Integration Tests / test-matrix (push) Failing after 32s Details Test llama stack list-deps / list-deps (push) Failing after 4s Details API Conformance Tests / check-schema-compatibility (push) Successful in 40s Details UI Tests / ui-tests (22) (push) Successful in 59s Details Test Llama Stack Build / build (push) Failing after 1m1s Details Pre-commit / pre-commit (push) Successful in 5m23s Details Remove unused methods that became obsolete after `d266c59c`: o _compute_and_log_token_usage o _count_tokens o stream_tokens_and_compute_metrics o count_tokens_and_compute_metrics These methods are no longer referenced anywhere in the codebase following the removal of deprecated inference.chat_completion implementations. --------- Signed-off-by: Derek Higgins <derekh@redhat.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-10-28 17:12:41 -07:00
Ashwin Bharambe	f88416ef87	fix(inference): enable routing of models with provider_data alone (#3928 ) This PR enables routing of fully qualified model IDs of the form `provider_id/model_id` even when the models are not registered with the Stack. Here's the situation: assume a remote inference provider which works only when users provide their own API keys via `X-LlamaStack-Provider-Data` header. By definition, we cannot list models and hence update our routing registry. But because we _require_ a provider ID in the models now, we can identify which provider to route to and let that provider decide. Note that we still try to look up our registry since it may have a pre-registered alias. Just that we don't outright fail when we are not able to look it up. Also, updated inference router so that the responses have the _exact_ model that the request had. ## Test Plan Added an integration test Closes #3929 --------- Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>	2025-10-28 11:16:37 -07:00
Ashwin Bharambe	d009dc29f7	fix(mypy): resolve provider utility and testing type issues (#3935 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s Details Test Llama Stack Build / generate-matrix (push) Successful in 3s Details Vector IO Integration Tests / test-matrix (push) Failing after 5s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Python Package Build Test / build (3.12) (push) Failing after 2s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s Details Test Llama Stack Build / build-single-provider (push) Failing after 4s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 4s Details Python Package Build Test / build (3.13) (push) Failing after 3s Details Test llama stack list-deps / generate-matrix (push) Successful in 4s Details Test llama stack list-deps / show-single-provider (push) Failing after 3s Details API Conformance Tests / check-schema-compatibility (push) Successful in 11s Details Test llama stack list-deps / list-deps-from-config (push) Failing after 4s Details Test External API and Providers / test-external (venv) (push) Failing after 3s Details Unit Tests / unit-tests (3.12) (push) Failing after 4s Details Unit Tests / unit-tests (3.13) (push) Failing after 4s Details Test llama stack list-deps / list-deps (push) Failing after 4s Details Test Llama Stack Build / build (push) Failing after 7s Details UI Tests / ui-tests (22) (push) Successful in 51s Details Pre-commit / pre-commit (push) Successful in 2m0s Details Fixes mypy type errors in provider utilities and testing infrastructure: - `mcp.py`: Cast incompatible client types, wrap image data properly - `batches.py`: Rename walrus variable to avoid shadowing - `api_recorder.py`: Use cast for Pydantic field annotation No functional changes. --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-10-28 10:37:27 -07:00
Ashwin Bharambe	6ce59b5df8	fix(mypy): resolve type issues in MongoDB, batches, and auth providers (#3933 ) Fixes mypy type errors in provider utilities: - MongoDB: Fix AsyncMongoClient parameters, use async iteration for cursor - Batches: Handle memoryview\|bytes union for file decoding - Auth: Add missing imports, validate JWKS URI, conditionally pass parameters Fixes 11 type errors. No functional changes.	2025-10-28 10:23:39 -07:00
Ashwin Bharambe	4a2ea278c5	fix(mypy): resolve OpenTelemetry typing issues in telemetry.py (#3943 ) Fixes mypy type errors in OpenTelemetry integration: - Add type aliases for AttributeValue and Attributes - Add helper to filter None values from attributes (OpenTelemetry doesn't accept None) - Cast metric and tracer objects to proper types - Update imports after refactoring No functional changes.	2025-10-28 10:10:18 -07:00
Ashwin Bharambe	85887d724f	Revert "fix(mypy): resolve OpenTelemetry typing issues in telemetry.py (#3931 )" This reverts commit `9afc52a36a`.	2025-10-28 09:48:46 -07:00
Ashwin Bharambe	9afc52a36a	fix(mypy): resolve OpenTelemetry typing issues in telemetry.py (#3931 ) ## Summary Fix all 11 mypy type checking errors in `telemetry.py` without using any type suppressions. Changes: - Add type aliases for OpenTelemetry attribute types (`AttributeValue`, `Attributes`) - Create `_clean_attributes()` helper to filter None values from attribute dicts - Use `cast()` for TracerProvider methods (`add_span_processor`, `force_flush`) - Use `cast()` for metric creation methods returning from global storage - Fix variable reuse by renaming `span` to `end_span` in SpanEndPayload branch - Add None check for `parent_span` before `set_span_in_context` Errors Fixed: - TracerProvider attribute access: 2 errors - Counter/UpDownCounter/ObservableGauge return types: 3 errors - Attribute dict type mismatches: 4 errors - Span assignment type conflicts: 2 errors Testing: ```bash uv run mypy src/llama_stack/core/telemetry/telemetry.py # Success: no issues found ``` Part of: Mypy suppression removal plan (Phase 2a/4) Stack: - [Phase 1] Add type stubs (#3930) - [Phase 2a] Fix OpenTelemetry types (this PR) - [Phase 2b+] Fix remaining errors (upcoming) - [Phase 3] Remove inline suppressions (upcoming) - [Phase 4] Un-exclude files from mypy (upcoming)	2025-10-28 09:47:20 -07:00
Ashwin Bharambe	4e6c769cc4	fix(context): prevent provider data leak between streaming requests (#3924 ) ## Summary - `preserve_contexts_async_generator` left `PROVIDER_DATA_VAR` (and other context vars) populated after a streaming generator completed on HEAD~1, so the asyncio context for request N+1 started with request N's provider payload. - FastAPI dependencies and middleware execute before `request_provider_data_context` rebinds the header data, meaning auth/logging hooks could observe a prior tenant's credentials or treat them as authenticated. Traces and any background work that inspects the context outside the `with` block leak as well—this is a real security regression, not just a CLI artifact. - The wrapper now restores each tracked `ContextVar` to the value it held before the iteration (falling back to clearing when necessary) after every yield and when the generator terminates, so provider data is wiped while callers that set their own defaults keep them. ## Test Plan - `uv run pytest tests/unit/core/test_provider_data_context.py -q` - `uv run pytest tests/unit/distribution/test_context.py -q` Both suites fail on HEAD~1 and pass with this change.	2025-10-27 23:01:12 -07:00
ehhuang	c077d01ddf	chore(telemetry): more cleanup: remove apis.telemetry (#3919 ) # What does this PR do? ## Test Plan CI	2025-10-27 22:20:15 -07:00
ehhuang	b7dd3f5c56	chore!: BREAKING CHANGE: vector_db_id -> vector_store_id (#3923 ) # What does this PR do? ## Test Plan CI vector_io tests will fail until next client sync passed with https://github.com/llamastack/llama-stack-client-python/pull/286 checked out locally	2025-10-27 14:26:06 -07:00
Nathan Weinberg	b6954c9882	fix: add missing shutdown methods to PromptServiceImpl and ConversationServiceImpl (#3925 ) Change is visible in server shutdown logs, changes `WARNING` loglines to `INFO` Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-10-27 13:41:38 -07:00
Ashwin Bharambe	471b1b248b	chore(package): migrate to src/ layout (#3920 ) Migrates package structure to src/ layout following Python packaging best practices. All code moved from `llama_stack/` to `src/llama_stack/`. Public API unchanged - imports remain `import llama_stack.`. Updated build configs, pre-commit hooks, scripts, and GitHub workflows accordingly. All hooks pass, package builds cleanly. Developer note*: Reinstall after pulling: `pip install -e .`	2025-10-27 12:02:21 -07:00

24 commits