llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-03 18:00:36 +00:00

Author	SHA1	Message	Date
Eric Huang	889ce058b9	feat: openai files provider # What does this PR do? ## Test Plan	2025-10-28 12:46:45 -07:00
ehhuang	0745d308a8	Merge `a7b506718d` into sapling-pr-archive-ehhuang	2025-10-28 12:43:55 -07:00
Eric Huang	a7b506718d	feat: openai files provider # What does this PR do? ## Test Plan	2025-10-28 12:38:18 -07:00
Eric Huang	6b386db8fc	merge commit for archive created by Sapling	2025-10-28 12:20:02 -07:00
Eric Huang	177958657f	tele tests # What does this PR do? ## Test Plan	2025-10-28 12:19:50 -07:00
Eric Huang	2e468e0d1f	merge commit for archive created by Sapling	2025-10-28 12:08:33 -07:00
Eric Huang	b3c9d5a15f	tele tests # What does this PR do? ## Test Plan	2025-10-28 12:08:27 -07:00
Ashwin Bharambe	f88416ef87	fix(inference): enable routing of models with provider_data alone (#3928 ) This PR enables routing of fully qualified model IDs of the form `provider_id/model_id` even when the models are not registered with the Stack. Here's the situation: assume a remote inference provider which works only when users provide their own API keys via `X-LlamaStack-Provider-Data` header. By definition, we cannot list models and hence update our routing registry. But because we _require_ a provider ID in the models now, we can identify which provider to route to and let that provider decide. Note that we still try to look up our registry since it may have a pre-registered alias. Just that we don't outright fail when we are not able to look it up. Also, updated inference router so that the responses have the _exact_ model that the request had. ## Test Plan Added an integration test Closes #3929 --------- Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>	2025-10-28 11:16:37 -07:00
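A minimal sketch of the fallback described in f88416ef87, not the router's actual code: try the registry first (it may hold a pre-registered alias), and if the lookup fails, split a `provider_id/model_id` string and hand the request to the named provider. `SketchRouter`, `registry`, and `providers` are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SketchRouter:
    # Hypothetical registry: model ID or alias -> (provider_id, provider_model_id).
    registry: dict[str, tuple[str, str]] = field(default_factory=dict)
    # Hypothetical set of provider IDs configured in the stack.
    providers: set[str] = field(default_factory=set)

    def resolve(self, model: str) -> tuple[str, str]:
        """Return (provider_id, model_id) for a request's model string."""
        # 1. Registry lookup: a pre-registered alias takes precedence.
        if model in self.registry:
            return self.registry[model]
        # 2. Not registered: fall back to parsing "provider_id/model_id".
        #    Split on the first "/" only, so model IDs may contain slashes.
        provider_id, sep, model_id = model.partition("/")
        if sep and model_id and provider_id in self.providers:
            # The provider decides whether it can actually serve model_id.
            return provider_id, model_id
        raise ValueError(f"cannot route model {model!r}")
```

With `providers={"remote"}` and an empty registry, `resolve("remote/org/model")` returns `("remote", "org/model")` instead of failing on the missing registry entry.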
Eric Huang	ad1ca3b2c2	merge commit for archive created by Sapling	2025-10-28 11:04:10 -07:00
Eric Huang	1407edbf52	tele tests # What does this PR do? ## Test Plan	2025-10-28 11:04:02 -07:00
Ashwin Bharambe	94b0592240	fix(mypy): add type stubs and fix typing issues (#3938 ) Adds type stubs and fixes mypy errors for better type coverage. Changes: - Added type_checking dependency group with type stubs (torchtune, trl, etc.) - Added lm-format-enforcer to pre-commit hook - Created HFAutoModel Protocol for type-safe HuggingFace model handling - Added mypy.overrides for untyped libraries (torchtune, fairscale, etc.) - Fixed type issues in post-training providers, databricks, and api_recorder Note: ~1,200 errors remain in excluded files (see pyproject.toml exclude list). --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-10-28 11:00:09 -07:00
Ashwin Bharambe	1d385b5b75	fix(mypy): resolve OpenAI SDK and provider type issues (#3936 ) ## Summary - Fix OpenAI SDK NotGiven/Omit type mismatches in embeddings calls - Fix incorrect OpenAIChatCompletionChunk import in vllm provider - Refactor to avoid type:ignore comments by using conditional kwargs ## Changes openai_mixin.py (9 errors fixed): - Build kwargs conditionally for embeddings.create() to avoid NotGiven/Omit mismatch - Only include parameters when they have actual values (not None) gemini.py (9 errors fixed): - Apply same conditional kwargs pattern - Add missing Any import vllm.py (2 errors fixed): - Use correct OpenAIChatCompletionChunk from llama_stack.apis.inference - Remove incorrect alias from openai package ## Technical Notes The OpenAI SDK has a type system quirk where `NOT_GIVEN` has type `NotGiven` but parameter signatures expect `Omit`. By only passing parameters with actual values, we avoid this mismatch entirely without needing `# type: ignore` comments. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-10-28 10:54:29 -07:00
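The conditional-kwargs pattern from 1d385b5b75 might look roughly like the following. `build_embedding_kwargs` is a hypothetical helper, not a function from `openai_mixin.py`; its parameter names follow those accepted by the OpenAI SDK's `embeddings.create()`, and the SDK call itself is left out so the sketch runs without the package installed.

```python
from typing import Any, Optional


def build_embedding_kwargs(
    model: str,
    input: list[str],
    dimensions: Optional[int] = None,
    encoding_format: Optional[str] = None,
    user: Optional[str] = None,
) -> dict[str, Any]:
    """Collect only the parameters that carry a real value."""
    # Required parameters are always present.
    kwargs: dict[str, Any] = {"model": model, "input": input}
    # Optional parameters are added only when set, so no NOT_GIVEN sentinel
    # (and therefore no NotGiven/Omit mismatch) ever reaches the call site.
    if dimensions is not None:
        kwargs["dimensions"] = dimensions
    if encoding_format is not None:
        kwargs["encoding_format"] = encoding_format
    if user is not None:
        kwargs["user"] = user
    return kwargs
```

A caller would then pass the result with `**kwargs`, leaving every unset parameter to the SDK's own default.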
Ashwin Bharambe	d009dc29f7	fix(mypy): resolve provider utility and testing type issues (#3935 ) Fixes mypy type errors in provider utilities and testing infrastructure: - `mcp.py`: Cast incompatible client types, wrap image data properly - `batches.py`: Rename walrus variable to avoid shadowing - `api_recorder.py`: Use cast for Pydantic field annotation No functional changes. --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-10-28 10:37:27 -07:00
Ashwin Bharambe	fcf07790c8	fix(mypy): resolve model implementation typing issues (#3934 ) ## Summary Fixes mypy type errors across 4 model implementation files (Phase 2d of mypy suppression removal plan): - `src/llama_stack/models/llama/llama3/multimodal/image_transform.py` (10 errors fixed) - `src/llama_stack/models/llama/checkpoint.py` (2 errors fixed) - `src/llama_stack/models/llama/hadamard_utils.py` (1 error fixed) - `src/llama_stack/models/llama/llama3/multimodal/encoder_utils.py` (1 error fixed) ## Changes ### image_transform.py - Fixed return type annotation for `find_supported_resolutions` from `Tensor` to `list[tuple[int, int]]` - Fixed parameter and return type annotations for `resize_without_distortion` from `Tensor` to `Image.Image` - Resolved variable shadowing by using separate names: `possible_resolutions_list` for the list and `possible_resolutions_tensor` for the tensor ### checkpoint.py - Replaced deprecated `torch.BFloat16Tensor` and `torch.cuda.BFloat16Tensor` with `torch.set_default_dtype(torch.bfloat16)` - Fixed variable shadowing by renaming numpy array to `ckpt_paths_array` to distinguish from the parameter `ckpt_paths: list[Path]` ### hadamard_utils.py - Added `isinstance` assertion to narrow type from `nn.Module` to `nn.Linear` before accessing `in_features` attribute ### encoder_utils.py - Fixed variable shadowing by using `masks_list` for list accumulation and `masks` for the final Tensor result ## Test plan - Verified all files pass mypy type checking (only optional dependency import warnings remain) - No functional changes - only type annotations and variable naming improvements Stacks on PR #3933 Co-authored-by: Claude <noreply@anthropic.com>	2025-10-28 10:28:29 -07:00
Ashwin Bharambe	6ce59b5df8	fix(mypy): resolve type issues in MongoDB, batches, and auth providers (#3933 ) Fixes mypy type errors in provider utilities: - MongoDB: Fix AsyncMongoClient parameters, use async iteration for cursor - Batches: Handle memoryview\|bytes union for file decoding - Auth: Add missing imports, validate JWKS URI, conditionally pass parameters Fixes 11 type errors. No functional changes.	2025-10-28 10:23:39 -07:00
Ashwin Bharambe	4a2ea278c5	fix(mypy): resolve OpenTelemetry typing issues in telemetry.py (#3943 ) Fixes mypy type errors in OpenTelemetry integration: - Add type aliases for AttributeValue and Attributes - Add helper to filter None values from attributes (OpenTelemetry doesn't accept None) - Cast metric and tracer objects to proper types - Update imports after refactoring No functional changes.	2025-10-28 10:10:18 -07:00
Ashwin Bharambe	85887d724f	Revert "fix(mypy): resolve OpenTelemetry typing issues in telemetry.py (#3931 )" This reverts commit `9afc52a36a`.	2025-10-28 09:48:46 -07:00
Ashwin Bharambe	9afc52a36a	fix(mypy): resolve OpenTelemetry typing issues in telemetry.py (#3931 ) ## Summary Fix all 11 mypy type checking errors in `telemetry.py` without using any type suppressions. Changes: - Add type aliases for OpenTelemetry attribute types (`AttributeValue`, `Attributes`) - Create `_clean_attributes()` helper to filter None values from attribute dicts - Use `cast()` for TracerProvider methods (`add_span_processor`, `force_flush`) - Use `cast()` for metric creation methods returning from global storage - Fix variable reuse by renaming `span` to `end_span` in SpanEndPayload branch - Add None check for `parent_span` before `set_span_in_context` Errors Fixed: - TracerProvider attribute access: 2 errors - Counter/UpDownCounter/ObservableGauge return types: 3 errors - Attribute dict type mismatches: 4 errors - Span assignment type conflicts: 2 errors Testing: ```bash uv run mypy src/llama_stack/core/telemetry/telemetry.py # Success: no issues found ``` Part of: Mypy suppression removal plan (Phase 2a/4) Stack: - [Phase 1] Add type stubs (#3930) - [Phase 2a] Fix OpenTelemetry types (this PR) - [Phase 2b+] Fix remaining errors (upcoming) - [Phase 3] Remove inline suppressions (upcoming) - [Phase 4] Un-exclude files from mypy (upcoming)	2025-10-28 09:47:20 -07:00
Ian Miller	5598f61e12	feat(responses)!: introduce OpenAI compatible prompts to Responses API (#3942 ) # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> This PR is responsible for making changes to Responses API scheme to introduce OpenAI compatible prompts there. Change to the API only, therefore currently no implementation at all. However, the follow up PR with actual implementation will be submitted after current PR lands. The need of this functionality was initiated in #3514. > Note, #3514 is divided on three separate PRs. Current PR is the second of three. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> CI	2025-10-28 09:31:27 -07:00
Ashwin Bharambe	e5ca7e6450	chore(mypy): add mypy and type stub packages to dev deps (#3930 ) ## Summary This PR adds mypy and essential type stub packages to dev dependencies as Phase 1 of the mypy suppression removal plan. Changes: - Add `mypy` to dev dependencies - Add type stubs: `types-jsonschema`, `pandas-stubs`, `types-psutil`, `types-tqdm`, `boto3-stubs` Impact: - Enables static type checking across the codebase - Eliminates ~30 type checking errors related to missing type information for third-party packages - Provides foundation for subsequent PRs to remove type suppressions Part of: Mypy suppression removal plan (Phase 1/4) Testing: ```bash uv sync --group dev uv run mypy ```	2025-10-28 06:02:38 -07:00
Sébastien Han	d10bfb5121	chore: remove leftover llama_stack directory (#3940 ) # What does this PR do? Followup on https://github.com/llamastack/llama-stack/pull/3920 where the llama_stack directory was moved under src. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-10-28 05:09:08 -07:00
Sébastien Han	b47afac7c2	chore: bump openai package version (#3918 ) # What does this PR do? To match https://github.com/llamastack/llama-stack/pull/3847 We must not update the lock manually, but always reflect the update in the pyproject.toml. The lock is a state at build time. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-10-28 09:18:48 +01:00
Ashwin Bharambe	4e6c769cc4	fix(context): prevent provider data leak between streaming requests (#3924 ) ## Summary - `preserve_contexts_async_generator` left `PROVIDER_DATA_VAR` (and other context vars) populated after a streaming generator completed on HEAD~1, so the asyncio context for request N+1 started with request N's provider payload. - FastAPI dependencies and middleware execute before `request_provider_data_context` rebinds the header data, meaning auth/logging hooks could observe a prior tenant's credentials or treat them as authenticated. Traces and any background work that inspects the context outside the `with` block leak as well—this is a real security regression, not just a CLI artifact. - The wrapper now restores each tracked `ContextVar` to the value it held before the iteration (falling back to clearing when necessary) after every yield and when the generator terminates, so provider data is wiped while callers that set their own defaults keep them. ## Test Plan - `uv run pytest tests/unit/core/test_provider_data_context.py -q` - `uv run pytest tests/unit/distribution/test_context.py -q` Both suites fail on HEAD~1 and pass with this change.	2025-10-27 23:01:12 -07:00
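The restore-after-every-step behaviour described in 4e6c769cc4 can be sketched as below. This is an illustration under assumptions, not the code of `preserve_contexts_async_generator`: `preserve_contexts_sketch`, `PROVIDER_DATA`, `stream`, and `demo` are hypothetical names, and the sketch uses `ContextVar.set()` tokens with `reset()` to put each variable back to its earlier state.

```python
import asyncio
import contextvars
from collections.abc import AsyncGenerator, AsyncIterator
from typing import Any, TypeVar

T = TypeVar("T")

# Stand-in for the provider-data context variable.
PROVIDER_DATA: contextvars.ContextVar[Any] = contextvars.ContextVar(
    "provider_data", default=None
)


def preserve_contexts_sketch(
    gen: AsyncIterator[T], tracked: list[contextvars.ContextVar[Any]]
) -> AsyncGenerator[T, None]:
    # Capture the values the generator body should see on every step.
    captured = [(var, var.get(None)) for var in tracked]

    async def wrapper() -> AsyncGenerator[T, None]:
        while True:
            # Bind the captured values only for the duration of one step.
            tokens = [(var, var.set(value)) for var, value in captured]
            try:
                item = await gen.__anext__()
            except StopAsyncIteration:
                return
            finally:
                # Restore the prior state before control returns to the
                # consumer, both after each item and on termination.
                for var, token in reversed(tokens):
                    var.reset(token)
            yield item

    return wrapper()


async def stream() -> AsyncGenerator[tuple[int, Any], None]:
    for i in range(2):
        yield i, PROVIDER_DATA.get()


async def demo() -> tuple[list[tuple[int, Any]], Any]:
    PROVIDER_DATA.set({"api_key": "request-N"})
    wrapped = preserve_contexts_sketch(stream(), [PROVIDER_DATA])
    PROVIDER_DATA.set(None)  # simulate the state a later request starts with
    seen = [item async for item in wrapped]
    return seen, PROVIDER_DATA.get()
```

In `demo()`, the generator body still sees request N's payload on each step, while the consumer's view of `PROVIDER_DATA` is back to `None` once iteration finishes.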
ehhuang	c077d01ddf	chore(telemetry): more cleanup: remove apis.telemetry (#3919 ) # What does this PR do? ## Test Plan CI	2025-10-27 22:20:15 -07:00
Eric Huang	d93ca96bf5	merge commit for archive created by Sapling	2025-10-27 15:58:09 -07:00
Eric Huang	7eed264070	tele tests # What does this PR do? ## Test Plan	2025-10-27 15:58:03 -07:00
ehhuang	50cab73504	Merge `0c488b6b63` into sapling-pr-archive-ehhuang	2025-10-27 15:57:14 -07:00
Eric Huang	0c488b6b63	chore(telemetry): more cleanup: remove apis.telemetry # What does this PR do? ## Test Plan	2025-10-27 15:57:09 -07:00
ehhuang	045ea154fd	Merge `11b076810f` into sapling-pr-archive-ehhuang	2025-10-27 15:56:18 -07:00
Eric Huang	11b076810f	chore(telemetry): more cleanup: remove apis.telemetry # What does this PR do? ## Test Plan	2025-10-27 15:56:14 -07:00
ehhuang	f0ac4e7ca9	Merge `c7b3b10ef2` into sapling-pr-archive-ehhuang	2025-10-27 15:33:36 -07:00
Eric Huang	c7b3b10ef2	chore(telemetry): more cleanup: remove apis.telemetry # What does this PR do? ## Test Plan	2025-10-27 15:33:32 -07:00
ehhuang	e9a8967ed5	Merge `9aef325934` into sapling-pr-archive-ehhuang	2025-10-27 15:32:50 -07:00
Eric Huang	9aef325934	chore(telemetry): more cleanup: remove apis.telemetry # What does this PR do? ## Test Plan	2025-10-27 15:32:41 -07:00
Eric Huang	a0b6c424de	merge commit for archive created by Sapling	2025-10-27 15:31:46 -07:00
Eric Huang	9a04f7b646	tele tests # What does this PR do? ## Test Plan	2025-10-27 15:31:41 -07:00
Eric Huang	d8ce23df30	merge commit for archive created by Sapling	2025-10-27 15:07:59 -07:00
Eric Huang	2c0aad4dba	tele tests # What does this PR do? ## Test Plan	2025-10-27 15:07:52 -07:00
Eric Huang	f4f8c7e8fe	merge commit for archive created by Sapling	2025-10-27 15:02:33 -07:00
Eric Huang	e5370ffa74	tele tests # What does this PR do? ## Test Plan	2025-10-27 15:02:25 -07:00
ehhuang	1c9a31d8bd	chore(telemetry): add grafana dashboards (#3921 ) # What does this PR do? - add a dashboard in grafana (vibe-coded) ## Test Plan <img width="2416" height="1114" alt="image" src="https://github.com/user-attachments/assets/8927aad2-cc14-4a1d-847e-350522cac02f" />	2025-10-27 14:58:27 -07:00
ehhuang	4670264b89	Merge `d9a4f016d0` into sapling-pr-archive-ehhuang	2025-10-27 14:58:20 -07:00
Eric Huang	d9a4f016d0	tele tests # What does this PR do? ## Test Plan	2025-10-27 14:58:14 -07:00
ehhuang	b7dd3f5c56	chore!: BREAKING CHANGE: vector_db_id -> vector_store_id (#3923 ) # What does this PR do? ## Test Plan CI vector_io tests will fail until next client sync passed with https://github.com/llamastack/llama-stack-client-python/pull/286 checked out locally	2025-10-27 14:26:06 -07:00
Nathan Weinberg	b6954c9882	fix: add missing shutdown methods to PromptServiceImpl and ConversationServiceImpl (#3925 ) Change is visible in server shutdown logs, changes `WARNING` loglines to `INFO` Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-10-27 13:41:38 -07:00
ehhuang	bf8f6b6914	Merge `960f5e4cd4` into sapling-pr-archive-ehhuang	2025-10-27 13:23:09 -07:00
Eric Huang	960f5e4cd4	chore!: BREAKING CHANGE: vector_db_id -> vector_store_id # What does this PR do? ## Test Plan	2025-10-27 13:22:58 -07:00
Matthew Farrellee	a9b00db421	feat: add provider data keys for Cerebras, Databricks, NVIDIA, and RunPod (#3734 ) # What does this PR do? add provider-data key passing support to Cerebras, Databricks, NVIDIA and RunPod also, added missing tests for Fireworks, Anthropic, Gemini, SambaNova, and vLLM addresses #3517 ## Test Plan ci w/ new tests --------- Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-10-27 13:09:35 -07:00
ehhuang	075990d5e5	Merge `4db31e1c59` into sapling-pr-archive-ehhuang	2025-10-27 12:04:10 -07:00
Eric Huang	4db31e1c59	chore(telemetry): more cleanup: remove apis.telemetry # What does this PR do? ## Test Plan	2025-10-27 12:04:00 -07:00

3537 commits