llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-03 09:53:45 +00:00

Author	SHA1	Message	Date
Ashwin Bharambe	da8f014b96	feat(models): list models available via provider_data header (#3968)	2025-10-29 14:03:03 -07:00

## Summary

When users provide API keys via `X-LlamaStack-Provider-Data` header, `models.list()` now returns models they can access from those providers, not just pre-registered models from the registry.

This complements the routing fix from `f88416ef8` which enabled inference calls with `provider_id/model_id` format for unregistered models. Users can now discover which models are available to them before making inference requests.

The implementation reuses `NeedsRequestProviderData.get_request_provider_data()` to validate credentials, then dynamically fetches models from providers without caching them since they're user-specific. Registry models take precedence to respect any pre-configured aliases.

## Test Script

```python
#!/usr/bin/env python3
import json
import os
from openai import OpenAI

# Test 1: Without provider_data header
client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="dummy")
models = client.models.list()
anthropic_without = [m.id for m in models.data if m.id and "anthropic" in m.id]
print(f"Without header: {len(models.data)} models, {len(anthropic_without)} anthropic")

# Test 2: With provider_data header containing Anthropic API key
anthropic_api_key = os.environ["ANTHROPIC_API_KEY"]
client_with_key = OpenAI(
    base_url="http://localhost:8321/v1/openai/v1",
    api_key="dummy",
    default_headers={
        "X-LlamaStack-Provider-Data": json.dumps({"anthropic_api_key": anthropic_api_key})
    }
)
models_with_key = client_with_key.models.list()
anthropic_with = [m.id for m in models_with_key.data if m.id and "anthropic" in m.id]
print(f"With header: {len(models_with_key.data)} models, {len(anthropic_with)} anthropic")
print(f"Anthropic models: {anthropic_with}")

assert len(anthropic_with) > len(anthropic_without), "Should have more anthropic models with API key"
print("\n✓ Test passed!")
```

Run with a stack that has Anthropic provider configured (but without API key in config):

```bash
ANTHROPIC_API_KEY=sk-ant-... python test_provider_data_models.py
```
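The precedence rule described in da8f014b96 (registry models win over dynamically fetched ones) could look roughly like the sketch below. This is an illustration only, not the llama-stack implementation: `ModelEntry`, `merge_models`, and the model identifiers in the usage note are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ModelEntry:
    """Hypothetical stand-in for a model record; not a llama-stack type."""

    identifier: str
    source: str  # "registry" or "provider"


def merge_models(
    registry: Iterable[ModelEntry], dynamic: Iterable[ModelEntry]
) -> List[ModelEntry]:
    """Combine both lists; on an identifier clash the registry entry wins."""
    # Start with the per-request provider models (never cached).
    merged = {m.identifier: m for m in dynamic}
    # Overlay registry entries so pre-configured aliases take precedence.
    merged.update({m.identifier: m for m in registry})
    return sorted(merged.values(), key=lambda m: m.identifier)
```

For example, if both lists contain `anthropic/some-alias`, the merged result keeps the entry whose `source` is `"registry"`.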
Ashwin Bharambe	c9d4b6c54f	chore(mypy): part-04 resolve mypy errors in meta_reference agents (#3969 ) ## Summary Fixes all mypy type errors in `providers/inline/agents/meta_reference/` and removes exclusions from pyproject.toml. ## Changes - Fix type annotations for Safety API message parameters (OpenAIMessageParam) - Add Action enum usage in access control checks - Correct method signatures to match API supertype (parameter ordering) - Handle optional return types with proper None checks - Remove 3 meta_reference exclusions from mypy config Files fixed: 25 errors across 3 files (safety.py, persistence.py, agents.py)	2025-10-29 13:37:28 -07:00
Ashwin Bharambe	a4f97559d1	fix(mypy): part-03 completely resolve meta reference responses impl typing issues (#3951 ) ## Summary Resolves all mypy errors in meta reference agent OpenAI responses implementation by adding proper type narrowing, None checks, and Sequence type support. ## Changes - Fixed streaming.py, openai_responses.py, utils.py, tool_executor.py, agent_instance.py - Added Sequence type support to schema generator (ensures correct JSON schema generation) - Applied union type narrowing and None checks throughout ## Test plan - All modified files pass mypy type checking (0 errors) - Schema generator produces correct `type: array` for Sequence types --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-10-29 08:07:15 -07:00
Ashwin Bharambe	e5c27dbcbf	fix(mypy): part-02 resolve OpenAI compatibility layer type issues (#3947 ) ## Summary Fixes 111 mypy type errors in OpenAI compatibility layer (PR3 in mypy remediation series). Changes: - `litellm_openai_mixin.py`: Added type annotations, None checks for tool_config/model_store access - `openai_compat.py`: Added None checks throughout, fixed TypedDict expansions, proper type conversions for messages/tool_calls Result: 23 → 1 errors in litellm file, 88 → 0 errors in openai_compat file --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-10-29 08:06:40 -07:00
Ashwin Bharambe	ce31aa1704	fix(mypy-cleanup): part-01 resolve meta reference agent type issues (126 errors) (#3945) Error fixes in Agents implementation (`meta-reference` provider) -- adding proper type annotations and using type narrowing for optional attributes. Essentially a bunch of `if x and (x_foo := getattr(x, "foo"))` instead of `x.foo` directly. Part of ongoing mypy remediation effort. --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-10-29 07:54:30 -07:00
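A minimal sketch of the narrowing pattern mentioned in ce31aa1704, assuming a hypothetical `ToolConfig` class (not a llama-stack type). Note that the walrus expression needs its own parentheses when combined with `and`, otherwise Python rejects it as a syntax error.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolConfig:
    """Hypothetical object with an optional attribute."""

    foo: Optional[str] = None


def describe(x: Optional[ToolConfig]) -> str:
    # Both `x` and `x.foo` may be None. Binding the attribute to a local
    # name lets the following lines use a value that is known to be truthy.
    if x and (x_foo := getattr(x, "foo", None)):
        return x_foo.upper()
    return "<unset>"
```

Accessing `x.foo.upper()` directly would fail at runtime when either value is `None`, which is the class of error mypy reports for optional attributes.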
Derek Higgins	c678682cdd	chore: remove unused methods from InferenceRouter (#3953)	2025-10-28 17:12:41 -07:00

Remove unused methods that became obsolete after `d266c59c`:

- _compute_and_log_token_usage
- _count_tokens
- stream_tokens_and_compute_metrics
- count_tokens_and_compute_metrics

These methods are no longer referenced anywhere in the codebase following the removal of deprecated inference.chat_completion implementations.

---------

Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
ehhuang	1f9d48cd54	feat: openai files provider (#3946)	2025-10-28 16:25:03 -07:00

# What does this PR do?

- Adds OpenAI files provider
- Note that file content retrieval is pretty limited by `purpose` https://community.openai.com/t/file-uploads-error-why-can-t-i-download-files-with-purpose-user-data/1357013?utm_source=chatgpt.com

## Test Plan

Modify run yaml to use openai files provider:

```
files:
- provider_id: openai
  provider_type: remote::openai
  config:
    api_key: ${env.OPENAI_API_KEY:=}
    metadata_store:
      backend: sql_default
      table_name: openai_files_metadata

# Then run files tests
❯ uv run --no-sync ./scripts/integration-tests.sh --stack-config server:ci-tests --inference-mode replay --setup ollama --suite base --pattern test_files
```
raghotham	feabcdd67b	docs: add documentation on how to use custom run yaml in docker (#3949)	2025-10-28 16:05:44 -07:00

as title

test plan:

```yaml
# custom-ollama-run.yaml
version: 2
image_name: starter
external_providers_dir: /.llama/providers.d
apis:
- inference
- vector_io
- files
- safety
- tool_runtime
- agents

providers:
  inference:
  # Single Ollama provider for all models
  - provider_id: ollama
    provider_type: remote::ollama
    config:
      url: ${env.OLLAMA_URL:=http://localhost:11434}

  vector_io:
  - provider_id: faiss
    provider_type: inline::faiss
    config:
      persistence:
        namespace: vector_io::faiss
        backend: kv_default

  files:
  - provider_id: meta-reference-files
    provider_type: inline::localfs
    config:
      storage_dir: /.llama/files
      metadata_store:
        table_name: files_metadata
        backend: sql_default

  safety:
  - provider_id: llama-guard
    provider_type: inline::llama-guard
    config:
      excluded_categories: []

  tool_runtime:
  - provider_id: rag-runtime
    provider_type: inline::rag-runtime

  agents:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      persistence:
        agent_state:
          namespace: agents
          backend: kv_default
        responses:
          table_name: responses
          backend: sql_default
          max_write_queue_size: 10000
          num_writers: 4

storage:
  backends:
    kv_default:
      type: kv_sqlite
      db_path: /.llama/kvstore.db
    sql_default:
      type: sql_sqlite
      db_path: /.llama/sql_store.db
  stores:
    metadata:
      namespace: registry
      backend: kv_default
    inference:
      table_name: inference_store
      backend: sql_default
      max_write_queue_size: 10000
      num_writers: 4
    conversations:
      table_name: openai_conversations
      backend: sql_default

registered_resources:
  models:
  # All models use the same 'ollama' provider
  - model_id: llama3.2-vision:latest
    provider_id: ollama
    provider_model_id: llama3.2-vision:latest
    model_type: llm
  - model_id: llama3.2:3b
    provider_id: ollama
    provider_model_id: llama3.2:3b
    model_type: llm
  # Embedding models
  - model_id: nomic-embed-text-v2-moe
    provider_id: ollama
    provider_model_id: toshk0/nomic-embed-text-v2-moe:Q6_K
    model_type: embedding
    metadata:
      embedding_dimension: 768
  shields: []
  vector_dbs: []
  datasets: []
  scoring_fns: []
  benchmarks: []
  tool_groups: []

server:
  port: 8321

telemetry:
  enabled: true

vector_stores:
  default_provider_id: faiss
  default_embedding_model:
    provider_id: ollama
    model_id: toshk0/nomic-embed-text-v2-moe:Q6_K
```

```bash
docker run -it --pull always -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT -v ~/.llama:/root/.llama -v $CUSTOM_RUN_CONFIG:/app/custom-run.yaml -e RUN_CONFIG_PATH=/app/custom-run.yaml -e OLLAMA_URL=http://host.docker.internal:11434/ llamastack/distribution-starter:0.3.0 --port $LLAMA_STACK_PORT
```
Ashwin Bharambe	f88416ef87	fix(inference): enable routing of models with provider_data alone (#3928 ) This PR enables routing of fully qualified model IDs of the form `provider_id/model_id` even when the models are not registered with the Stack. Here's the situation: assume a remote inference provider which works only when users provide their own API keys via `X-LlamaStack-Provider-Data` header. By definition, we cannot list models and hence update our routing registry. But because we _require_ a provider ID in the models now, we can identify which provider to route to and let that provider decide. Note that we still try to look up our registry since it may have a pre-registered alias. Just that we don't outright fail when we are not able to look it up. Also, updated inference router so that the responses have the _exact_ model that the request had. ## Test Plan Added an integration test Closes #3929 --------- Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>	2025-10-28 11:16:37 -07:00
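The routing behaviour described in f88416ef87 (try the registry first, then fall back to the `provider_id/model_id` prefix) might be sketched as below. This is an assumption-laden illustration, not the InferenceRouter code: `resolve_route`, the dict-based registry, and the set of provider IDs are all hypothetical.

```python
from typing import Dict, Set, Tuple


def resolve_route(
    model: str,
    registry: Dict[str, Tuple[str, str]],
    provider_ids: Set[str],
) -> Tuple[str, str]:
    """Return (provider_id, provider_model_id) for a requested model."""
    # A registered model or alias is used as-is when one exists.
    if model in registry:
        return registry[model]
    # Otherwise split on the first "/" only, since the provider's own
    # model ID may itself contain slashes.
    provider_id, sep, provider_model_id = model.partition("/")
    if sep and provider_model_id and provider_id in provider_ids:
        # Unregistered model: let the named provider decide if it exists.
        return provider_id, provider_model_id
    raise ValueError(f"cannot route model {model!r}")
```

A registry miss is therefore no longer fatal as long as the prefix names a known provider.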
Ashwin Bharambe	94b0592240	fix(mypy): add type stubs and fix typing issues (#3938 ) Adds type stubs and fixes mypy errors for better type coverage. Changes: - Added type_checking dependency group with type stubs (torchtune, trl, etc.) - Added lm-format-enforcer to pre-commit hook - Created HFAutoModel Protocol for type-safe HuggingFace model handling - Added mypy.overrides for untyped libraries (torchtune, fairscale, etc.) - Fixed type issues in post-training providers, databricks, and api_recorder Note: ~1,200 errors remain in excluded files (see pyproject.toml exclude list). --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-10-28 11:00:09 -07:00
Ashwin Bharambe	1d385b5b75	fix(mypy): resolve OpenAI SDK and provider type issues (#3936 ) ## Summary - Fix OpenAI SDK NotGiven/Omit type mismatches in embeddings calls - Fix incorrect OpenAIChatCompletionChunk import in vllm provider - Refactor to avoid type:ignore comments by using conditional kwargs ## Changes openai_mixin.py (9 errors fixed): - Build kwargs conditionally for embeddings.create() to avoid NotGiven/Omit mismatch - Only include parameters when they have actual values (not None) gemini.py (9 errors fixed): - Apply same conditional kwargs pattern - Add missing Any import vllm.py (2 errors fixed): - Use correct OpenAIChatCompletionChunk from llama_stack.apis.inference - Remove incorrect alias from openai package ## Technical Notes The OpenAI SDK has a type system quirk where `NOT_GIVEN` has type `NotGiven` but parameter signatures expect `Omit`. By only passing parameters with actual values, we avoid this mismatch entirely without needing `# type: ignore` comments. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-10-28 10:54:29 -07:00
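A rough sketch of the conditional kwargs pattern from 1d385b5b75: optional parameters are added to the call only when they carry a value, so `NOT_GIVEN` never has to be passed. `build_embedding_kwargs` is a hypothetical helper name; the keys mirror keyword parameters of the OpenAI SDK's `embeddings.create()`.

```python
from typing import Any, Dict, List, Optional


def build_embedding_kwargs(
    model: str,
    input: List[str],
    dimensions: Optional[int] = None,
    encoding_format: Optional[str] = None,
    user: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect only the arguments that were actually supplied."""
    kwargs: Dict[str, Any] = {"model": model, "input": input}
    # Leave a key out entirely instead of passing None or a sentinel.
    if dimensions is not None:
        kwargs["dimensions"] = dimensions
    if encoding_format is not None:
        kwargs["encoding_format"] = encoding_format
    if user is not None:
        kwargs["user"] = user
    return kwargs
```

The result would then be passed as `client.embeddings.create(**kwargs)`, which type-checks without a `# type: ignore` comment because no sentinel value appears in the call.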
Ashwin Bharambe	d009dc29f7	fix(mypy): resolve provider utility and testing type issues (#3935)	2025-10-28 10:37:27 -07:00

Fixes mypy type errors in provider utilities and testing infrastructure:

- `mcp.py`: Cast incompatible client types, wrap image data properly
- `batches.py`: Rename walrus variable to avoid shadowing
- `api_recorder.py`: Use cast for Pydantic field annotation

No functional changes.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Ashwin Bharambe	fcf07790c8	fix(mypy): resolve model implementation typing issues (#3934 ) ## Summary Fixes mypy type errors across 4 model implementation files (Phase 2d of mypy suppression removal plan): - `src/llama_stack/models/llama/llama3/multimodal/image_transform.py` (10 errors fixed) - `src/llama_stack/models/llama/checkpoint.py` (2 errors fixed) - `src/llama_stack/models/llama/hadamard_utils.py` (1 error fixed) - `src/llama_stack/models/llama/llama3/multimodal/encoder_utils.py` (1 error fixed) ## Changes ### image_transform.py - Fixed return type annotation for `find_supported_resolutions` from `Tensor` to `list[tuple[int, int]]` - Fixed parameter and return type annotations for `resize_without_distortion` from `Tensor` to `Image.Image` - Resolved variable shadowing by using separate names: `possible_resolutions_list` for the list and `possible_resolutions_tensor` for the tensor ### checkpoint.py - Replaced deprecated `torch.BFloat16Tensor` and `torch.cuda.BFloat16Tensor` with `torch.set_default_dtype(torch.bfloat16)` - Fixed variable shadowing by renaming numpy array to `ckpt_paths_array` to distinguish from the parameter `ckpt_paths: list[Path]` ### hadamard_utils.py - Added `isinstance` assertion to narrow type from `nn.Module` to `nn.Linear` before accessing `in_features` attribute ### encoder_utils.py - Fixed variable shadowing by using `masks_list` for list accumulation and `masks` for the final Tensor result ## Test plan - Verified all files pass mypy type checking (only optional dependency import warnings remain) - No functional changes - only type annotations and variable naming improvements Stacks on PR #3933 Co-authored-by: Claude <noreply@anthropic.com>	2025-10-28 10:28:29 -07:00
Ashwin Bharambe	6ce59b5df8	fix(mypy): resolve type issues in MongoDB, batches, and auth providers (#3933) Fixes mypy type errors in provider utilities: - MongoDB: Fix AsyncMongoClient parameters, use async iteration for cursor - Batches: Handle memoryview | bytes union for file decoding - Auth: Add missing imports, validate JWKS URI, conditionally pass parameters Fixes 11 type errors. No functional changes.	2025-10-28 10:23:39 -07:00
Ashwin Bharambe	4a2ea278c5	fix(mypy): resolve OpenTelemetry typing issues in telemetry.py (#3943 ) Fixes mypy type errors in OpenTelemetry integration: - Add type aliases for AttributeValue and Attributes - Add helper to filter None values from attributes (OpenTelemetry doesn't accept None) - Cast metric and tracer objects to proper types - Update imports after refactoring No functional changes.	2025-10-28 10:10:18 -07:00
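The None-filtering helper mentioned in 4a2ea278c5 could be as small as the sketch below. `clean_attributes` is an illustrative name and the value type is simplified to `Any`; the actual helper in `telemetry.py` may differ.

```python
from typing import Any, Dict, Mapping, Optional


def clean_attributes(attrs: Optional[Mapping[str, Any]]) -> Dict[str, Any]:
    """Drop keys whose value is None before handing attributes to a span."""
    if not attrs:
        return {}
    # Falsy values such as 0, "" and False are legitimate and are kept.
    return {key: value for key, value in attrs.items() if value is not None}
```

Filtering on `is not None` rather than on truthiness matters here, since a counter value of `0` or a flag set to `False` should still be recorded.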
Ashwin Bharambe	85887d724f	Revert "fix(mypy): resolve OpenTelemetry typing issues in telemetry.py (#3931)" This reverts commit `9afc52a36a`.	2025-10-28 09:48:46 -07:00
Ashwin Bharambe	9afc52a36a	fix(mypy): resolve OpenTelemetry typing issues in telemetry.py (#3931 ) ## Summary Fix all 11 mypy type checking errors in `telemetry.py` without using any type suppressions. Changes: - Add type aliases for OpenTelemetry attribute types (`AttributeValue`, `Attributes`) - Create `_clean_attributes()` helper to filter None values from attribute dicts - Use `cast()` for TracerProvider methods (`add_span_processor`, `force_flush`) - Use `cast()` for metric creation methods returning from global storage - Fix variable reuse by renaming `span` to `end_span` in SpanEndPayload branch - Add None check for `parent_span` before `set_span_in_context` Errors Fixed: - TracerProvider attribute access: 2 errors - Counter/UpDownCounter/ObservableGauge return types: 3 errors - Attribute dict type mismatches: 4 errors - Span assignment type conflicts: 2 errors Testing: ```bash uv run mypy src/llama_stack/core/telemetry/telemetry.py # Success: no issues found ``` Part of: Mypy suppression removal plan (Phase 2a/4) Stack: - [Phase 1] Add type stubs (#3930) - [Phase 2a] Fix OpenTelemetry types (this PR) - [Phase 2b+] Fix remaining errors (upcoming) - [Phase 3] Remove inline suppressions (upcoming) - [Phase 4] Un-exclude files from mypy (upcoming)	2025-10-28 09:47:20 -07:00
Ian Miller	5598f61e12	feat(responses)!: introduce OpenAI compatible prompts to Responses API (#3942) # What does this PR do? This PR changes the Responses API schema to introduce OpenAI compatible prompts. It changes the API only, so there is no implementation yet; a follow-up PR with the actual implementation will be submitted after this PR lands. The need for this functionality was raised in #3514. > Note: #3514 is split into three separate PRs. This PR is the second of the three. ## Test Plan CI	2025-10-28 09:31:27 -07:00
Sébastien Han	d10bfb5121	chore: remove leftover llama_stack directory (#3940) # What does this PR do? Follow-up to https://github.com/llamastack/llama-stack/pull/3920, where the llama_stack directory was moved under src. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-10-28 05:09:08 -07:00
Ashwin Bharambe	4e6c769cc4	fix(context): prevent provider data leak between streaming requests (#3924 ) ## Summary - `preserve_contexts_async_generator` left `PROVIDER_DATA_VAR` (and other context vars) populated after a streaming generator completed on HEAD~1, so the asyncio context for request N+1 started with request N's provider payload. - FastAPI dependencies and middleware execute before `request_provider_data_context` rebinds the header data, meaning auth/logging hooks could observe a prior tenant's credentials or treat them as authenticated. Traces and any background work that inspects the context outside the `with` block leak as well—this is a real security regression, not just a CLI artifact. - The wrapper now restores each tracked `ContextVar` to the value it held before the iteration (falling back to clearing when necessary) after every yield and when the generator terminates, so provider data is wiped while callers that set their own defaults keep them. ## Test Plan - `uv run pytest tests/unit/core/test_provider_data_context.py -q` - `uv run pytest tests/unit/distribution/test_context.py -q` Both suites fail on HEAD~1 and pass with this change.	2025-10-27 23:01:12 -07:00
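The fix described in 4e6c769cc4 can be illustrated with a simplified wrapper that binds the captured values only while the inner generator runs and restores the previous state after every step. This is a sketch under assumptions, not the actual `preserve_contexts_async_generator`: the names `preserve_contexts`, `PROVIDER_DATA`, and `demo` are hypothetical, and it uses `ContextVar.reset()` tokens where the real code may restore values differently.

```python
import asyncio
from contextvars import ContextVar
from typing import Any, AsyncGenerator, Dict, Sequence

_MISSING = object()
PROVIDER_DATA: ContextVar = ContextVar("provider_data", default=None)


def preserve_contexts(
    gen: AsyncGenerator[Any, None], context_vars: Sequence[ContextVar]
) -> AsyncGenerator[Any, None]:
    # Snapshot the values visible when the stream is created.
    captured: Dict[ContextVar, Any] = {v: v.get(_MISSING) for v in context_vars}

    async def wrapper() -> AsyncGenerator[Any, None]:
        while True:
            # Bind the captured values only for the duration of one step.
            tokens = [
                (var, var.set(value))
                for var, value in captured.items()
                if value is not _MISSING
            ]
            try:
                item = await gen.__anext__()
            except StopAsyncIteration:
                return
            finally:
                # Restore whatever each variable held before this step, so
                # nothing leaks into the caller's context after the yield.
                for var, token in reversed(tokens):
                    var.reset(token)
            yield item

    return wrapper()


async def demo():
    async def stream():
        for i in range(2):
            yield i, PROVIDER_DATA.get()

    token = PROVIDER_DATA.set({"api_key": "tenant-a"})
    wrapped = preserve_contexts(stream(), [PROVIDER_DATA])
    PROVIDER_DATA.reset(token)  # the request scope has ended
    seen = [item async for item in wrapped]
    return seen, PROVIDER_DATA.get()
```

Running `asyncio.run(demo())` shows the inner generator still seeing the captured provider data on each step, while the surrounding context reads `None` again once the stream is exhausted.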
ehhuang	c077d01ddf	chore(telemetry): more cleanup: remove apis.telemetry (#3919) # What does this PR do? ## Test Plan CI	2025-10-27 22:20:15 -07:00
ehhuang	b7dd3f5c56	chore!: BREAKING CHANGE: vector_db_id -> vector_store_id (#3923) # What does this PR do? ## Test Plan CI. The vector_io tests will fail until the next client sync; they passed with https://github.com/llamastack/llama-stack-client-python/pull/286 checked out locally.	2025-10-27 14:26:06 -07:00
Nathan Weinberg	b6954c9882	fix: add missing shutdown methods to PromptServiceImpl and ConversationServiceImpl (#3925) The change is visible in the server shutdown logs: `WARNING` log lines become `INFO`. Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-10-27 13:41:38 -07:00
Matthew Farrellee	a9b00db421	feat: add provider data keys for Cerebras, Databricks, NVIDIA, and RunPod (#3734) # What does this PR do? Adds provider-data key passing support to Cerebras, Databricks, NVIDIA, and RunPod. Also adds missing tests for Fireworks, Anthropic, Gemini, SambaNova, and vLLM. Addresses #3517. ## Test Plan CI with new tests --------- Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-10-27 13:09:35 -07:00
Ashwin Bharambe	471b1b248b	chore(package): migrate to src/ layout (#3920) Migrates package structure to src/ layout following Python packaging best practices. All code moved from `llama_stack/` to `src/llama_stack/`. Public API unchanged - imports remain `import llama_stack.`. Updated build configs, pre-commit hooks, scripts, and GitHub workflows accordingly. All hooks pass, package builds cleanly. Developer note: Reinstall after pulling: `pip install -e .`	2025-10-27 12:02:21 -07:00

25 commits