Commit graph

3537 commits

Author SHA1 Message Date
Eric Huang
889ce058b9 feat: openai files provider
# What does this PR do?


## Test Plan
2025-10-28 12:46:45 -07:00
ehhuang
0745d308a8
Merge a7b506718d into sapling-pr-archive-ehhuang 2025-10-28 12:43:55 -07:00
Eric Huang
a7b506718d feat: openai files provider
# What does this PR do?


## Test Plan
2025-10-28 12:38:18 -07:00
Eric Huang
6b386db8fc merge commit for archive created by Sapling 2025-10-28 12:20:02 -07:00
Eric Huang
177958657f tele tests
# What does this PR do?


## Test Plan
2025-10-28 12:19:50 -07:00
Eric Huang
2e468e0d1f merge commit for archive created by Sapling 2025-10-28 12:08:33 -07:00
Eric Huang
b3c9d5a15f tele tests
# What does this PR do?


## Test Plan
2025-10-28 12:08:27 -07:00
Ashwin Bharambe
f88416ef87
fix(inference): enable routing of models with provider_data alone (#3928)
This PR enables routing of fully qualified model IDs of the form
`provider_id/model_id` even when the models are not registered with the
Stack.

Here's the situation: assume a remote inference provider which works
only when users provide their own API keys via
`X-LlamaStack-Provider-Data` header. By definition, we cannot list
models and hence update our routing registry. But because we _require_ a
provider ID in the models now, we can identify which provider to route
to and let that provider decide.

Note that we still try to look up our registry since it may have a
pre-registered alias. Just that we don't outright fail when we are not
able to look it up.

Also, updated inference router so that the responses have the _exact_
model that the request had.

## Test Plan

Added an integration test

Closes #3929

---------

Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>
2025-10-28 11:16:37 -07:00
Eric Huang
ad1ca3b2c2 merge commit for archive created by Sapling 2025-10-28 11:04:10 -07:00
Eric Huang
1407edbf52 tele tests
# What does this PR do?


## Test Plan
2025-10-28 11:04:02 -07:00
Ashwin Bharambe
94b0592240
fix(mypy): add type stubs and fix typing issues (#3938)
Adds type stubs and fixes mypy errors for better type coverage.

Changes:
- Added type_checking dependency group with type stubs (torchtune, trl,
etc.)
- Added lm-format-enforcer to pre-commit hook
- Created HFAutoModel Protocol for type-safe HuggingFace model handling
- Added mypy.overrides for untyped libraries (torchtune, fairscale,
etc.)
- Fixed type issues in post-training providers, databricks, and
api_recorder

Note: ~1,200 errors remain in excluded files (see pyproject.toml exclude
list).

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-28 11:00:09 -07:00
Ashwin Bharambe
1d385b5b75
fix(mypy): resolve OpenAI SDK and provider type issues (#3936)
## Summary
- Fix OpenAI SDK NotGiven/Omit type mismatches in embeddings calls
- Fix incorrect OpenAIChatCompletionChunk import in vllm provider
- Refactor to avoid type:ignore comments by using conditional kwargs

## Changes
**openai_mixin.py (9 errors fixed):**
- Build kwargs conditionally for embeddings.create() to avoid
NotGiven/Omit mismatch
- Only include parameters when they have actual values (not None)

**gemini.py (9 errors fixed):**
- Apply same conditional kwargs pattern
- Add missing Any import

**vllm.py (2 errors fixed):**
- Use correct OpenAIChatCompletionChunk from llama_stack.apis.inference
- Remove incorrect alias from openai package

## Technical Notes
The OpenAI SDK has a type system quirk where `NOT_GIVEN` has type
`NotGiven` but parameter signatures expect `Omit`. By only passing
parameters with actual values, we avoid this mismatch entirely without
needing `# type: ignore` comments.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-28 10:54:29 -07:00
Ashwin Bharambe
d009dc29f7
fix(mypy): resolve provider utility and testing type issues (#3935)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 2s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 3s
Test llama stack list-deps / generate-matrix (push) Successful in 4s
Test llama stack list-deps / show-single-provider (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
Test llama stack list-deps / list-deps-from-config (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Test llama stack list-deps / list-deps (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 7s
UI Tests / ui-tests (22) (push) Successful in 51s
Pre-commit / pre-commit (push) Successful in 2m0s
Fixes mypy type errors in provider utilities and testing infrastructure:
- `mcp.py`: Cast incompatible client types, wrap image data properly
- `batches.py`: Rename walrus variable to avoid shadowing
- `api_recorder.py`: Use cast for Pydantic field annotation

No functional changes.

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-28 10:37:27 -07:00
Ashwin Bharambe
fcf07790c8
fix(mypy): resolve model implementation typing issues (#3934)
## Summary

Fixes mypy type errors across 4 model implementation files (Phase 2d of
mypy suppression removal plan):
- `src/llama_stack/models/llama/llama3/multimodal/image_transform.py`
(10 errors fixed)
- `src/llama_stack/models/llama/checkpoint.py` (2 errors fixed)
- `src/llama_stack/models/llama/hadamard_utils.py` (1 error fixed)
- `src/llama_stack/models/llama/llama3/multimodal/encoder_utils.py` (1
error fixed)

## Changes

### image_transform.py
- Fixed return type annotation for `find_supported_resolutions` from
`Tensor` to `list[tuple[int, int]]`
- Fixed parameter and return type annotations for
`resize_without_distortion` from `Tensor` to `Image.Image`
- Resolved variable shadowing by using separate names:
`possible_resolutions_list` for the list and
`possible_resolutions_tensor` for the tensor

### checkpoint.py
- Replaced deprecated `torch.BFloat16Tensor` and
`torch.cuda.BFloat16Tensor` with
`torch.set_default_dtype(torch.bfloat16)`
- Fixed variable shadowing by renaming numpy array to `ckpt_paths_array`
to distinguish from the parameter `ckpt_paths: list[Path]`

### hadamard_utils.py
- Added `isinstance` assertion to narrow type from `nn.Module` to
`nn.Linear` before accessing `in_features` attribute

### encoder_utils.py
- Fixed variable shadowing by using `masks_list` for list accumulation
and `masks` for the final Tensor result

## Test plan

- Verified all files pass mypy type checking (only optional dependency
import warnings remain)
- No functional changes - only type annotations and variable naming
improvements

Stacks on PR #3933

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-28 10:28:29 -07:00
Ashwin Bharambe
6ce59b5df8
fix(mypy): resolve type issues in MongoDB, batches, and auth providers (#3933)
Fixes mypy type errors in provider utilities:
- MongoDB: Fix AsyncMongoClient parameters, use async iteration for
cursor
- Batches: Handle memoryview|bytes union for file decoding
- Auth: Add missing imports, validate JWKS URI, conditionally pass
parameters

Fixes 11 type errors. No functional changes.
2025-10-28 10:23:39 -07:00
Ashwin Bharambe
4a2ea278c5
fix(mypy): resolve OpenTelemetry typing issues in telemetry.py (#3943)
Fixes mypy type errors in OpenTelemetry integration:
- Add type aliases for AttributeValue and Attributes
- Add helper to filter None values from attributes (OpenTelemetry
doesn't accept None)
- Cast metric and tracer objects to proper types
- Update imports after refactoring

No functional changes.
2025-10-28 10:10:18 -07:00
Ashwin Bharambe
85887d724f Revert "fix(mypy): resolve OpenTelemetry typing issues in telemetry.py (#3931)"
This reverts commit 9afc52a36a.
2025-10-28 09:48:46 -07:00
Ashwin Bharambe
9afc52a36a
fix(mypy): resolve OpenTelemetry typing issues in telemetry.py (#3931)
## Summary

Fix all 11 mypy type checking errors in `telemetry.py` without using any
type suppressions.

**Changes:**
- Add type aliases for OpenTelemetry attribute types (`AttributeValue`,
`Attributes`)
- Create `_clean_attributes()` helper to filter None values from
attribute dicts
- Use `cast()` for TracerProvider methods (`add_span_processor`,
`force_flush`)
- Use `cast()` for metric creation methods returning from global storage
- Fix variable reuse by renaming `span` to `end_span` in SpanEndPayload
branch
- Add None check for `parent_span` before `set_span_in_context`

**Errors Fixed:**
- TracerProvider attribute access: 2 errors
- Counter/UpDownCounter/ObservableGauge return types: 3 errors
- Attribute dict type mismatches: 4 errors
- Span assignment type conflicts: 2 errors

**Testing:**
```bash
uv run mypy src/llama_stack/core/telemetry/telemetry.py
# Success: no issues found
```

**Part of:** Mypy suppression removal plan (Phase 2a/4)

**Stack:**
- [Phase 1] Add type stubs (#3930)
- [Phase 2a] Fix OpenTelemetry types (this PR)
- [Phase 2b+] Fix remaining errors (upcoming)
- [Phase 3] Remove inline suppressions (upcoming)
- [Phase 4] Un-exclude files from mypy (upcoming)
2025-10-28 09:47:20 -07:00
Ian Miller
5598f61e12
feat(responses)!: introduce OpenAI compatible prompts to Responses API (#3942)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This PR is responsible for making changes to Responses API scheme to
introduce OpenAI compatible prompts there. Change to the API only,
therefore currently no implementation at all. However, the follow up PR
with actual implementation will be submitted after current PR lands.

The need of this functionality was initiated in #3514. 

> Note, #3514 is divided on three separate PRs. Current PR is the second
of three.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
CI
2025-10-28 09:31:27 -07:00
Ashwin Bharambe
e5ca7e6450
chore(mypy): add mypy and type stub packages to dev deps (#3930)
## Summary

This PR adds mypy and essential type stub packages to dev dependencies
as Phase 1 of the mypy suppression removal plan.

**Changes:**
- Add `mypy` to dev dependencies
- Add type stubs: `types-jsonschema`, `pandas-stubs`, `types-psutil`,
`types-tqdm`, `boto3-stubs`

**Impact:**
- Enables static type checking across the codebase
- Eliminates ~30 type checking errors related to missing type
information for third-party packages
- Provides foundation for subsequent PRs to remove type suppressions

**Part of:** Mypy suppression removal plan (Phase 1/4)

**Testing:**
```bash
uv sync --group dev
uv run mypy
```
2025-10-28 06:02:38 -07:00
Sébastien Han
d10bfb5121
chore: remove leftover llama_stack directory (#3940)
# What does this PR do?

Followup on https://github.com/llamastack/llama-stack/pull/3920 where
the llama_stack directory was moved under src.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-10-28 05:09:08 -07:00
Sébastien Han
b47afac7c2
chore: bump openai package version (#3918)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 5s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 2s
Test Llama Stack Build / generate-matrix (push) Successful in 4s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Test llama stack list-deps / list-deps-from-config (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Test llama stack list-deps / show-single-provider (push) Failing after 10s
Test External API and Providers / test-external (venv) (push) Failing after 10s
Python Package Build Test / build (3.13) (push) Failing after 24s
Test llama stack list-deps / generate-matrix (push) Successful in 26s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 28s
Unit Tests / unit-tests (3.13) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (push) Failing after 32s
Test llama stack list-deps / list-deps (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 39s
Test Llama Stack Build / build (push) Failing after 33s
UI Tests / ui-tests (22) (push) Successful in 1m25s
Pre-commit / pre-commit (push) Successful in 3m49s
# What does this PR do?

To match https://github.com/llamastack/llama-stack/pull/3847 We must not
update the lock manually, but always reflect the update in the
pyproject.toml. The lock is a state at build time.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-10-28 09:18:48 +01:00
Ashwin Bharambe
4e6c769cc4
fix(context): prevent provider data leak between streaming requests (#3924)
## Summary

- `preserve_contexts_async_generator` left `PROVIDER_DATA_VAR` (and
other context vars) populated after a streaming generator completed on
HEAD~1, so the asyncio context for request N+1 started with request N's
provider payload.
- FastAPI dependencies and middleware execute before
`request_provider_data_context` rebinds the header data, meaning
auth/logging hooks could observe a prior tenant's credentials or treat
them as authenticated. Traces and any background work that inspects the
context outside the `with` block leak as well—this is a real security
regression, not just a CLI artifact.
- The wrapper now restores each tracked `ContextVar` to the value it
held before the iteration (falling back to clearing when necessary)
after every yield and when the generator terminates, so provider data is
wiped while callers that set their own defaults keep them.

## Test Plan

- `uv run pytest tests/unit/core/test_provider_data_context.py -q`
- `uv run pytest tests/unit/distribution/test_context.py -q`

Both suites fail on HEAD~1 and pass with this change.
2025-10-27 23:01:12 -07:00
ehhuang
c077d01ddf
chore(telemetry): more cleanup: remove apis.telemetry (#3919)
# What does this PR do?


## Test Plan
CI
2025-10-27 22:20:15 -07:00
Eric Huang
d93ca96bf5 merge commit for archive created by Sapling
Some checks failed
Installer CI / lint (push) Failing after 2s
Installer CI / smoke-test-on-dev (push) Failing after 5s
2025-10-27 15:58:09 -07:00
Eric Huang
7eed264070 tele tests
# What does this PR do?


## Test Plan
2025-10-27 15:58:03 -07:00
ehhuang
50cab73504
Merge 0c488b6b63 into sapling-pr-archive-ehhuang 2025-10-27 15:57:14 -07:00
Eric Huang
0c488b6b63 chore(telemetry): more cleanup: remove apis.telemetry
# What does this PR do?


## Test Plan
2025-10-27 15:57:09 -07:00
ehhuang
045ea154fd
Merge 11b076810f into sapling-pr-archive-ehhuang 2025-10-27 15:56:18 -07:00
Eric Huang
11b076810f chore(telemetry): more cleanup: remove apis.telemetry
# What does this PR do?


## Test Plan
2025-10-27 15:56:14 -07:00
ehhuang
f0ac4e7ca9
Merge c7b3b10ef2 into sapling-pr-archive-ehhuang 2025-10-27 15:33:36 -07:00
Eric Huang
c7b3b10ef2 chore(telemetry): more cleanup: remove apis.telemetry
# What does this PR do?


## Test Plan
2025-10-27 15:33:32 -07:00
ehhuang
e9a8967ed5
Merge 9aef325934 into sapling-pr-archive-ehhuang 2025-10-27 15:32:50 -07:00
Eric Huang
9aef325934 chore(telemetry): more cleanup: remove apis.telemetry
# What does this PR do?


## Test Plan
2025-10-27 15:32:41 -07:00
Eric Huang
a0b6c424de merge commit for archive created by Sapling 2025-10-27 15:31:46 -07:00
Eric Huang
9a04f7b646 tele tests
# What does this PR do?


## Test Plan
2025-10-27 15:31:41 -07:00
Eric Huang
d8ce23df30 merge commit for archive created by Sapling 2025-10-27 15:07:59 -07:00
Eric Huang
2c0aad4dba tele tests
# What does this PR do?


## Test Plan
2025-10-27 15:07:52 -07:00
Eric Huang
f4f8c7e8fe merge commit for archive created by Sapling 2025-10-27 15:02:33 -07:00
Eric Huang
e5370ffa74 tele tests
# What does this PR do?


## Test Plan
2025-10-27 15:02:25 -07:00
ehhuang
1c9a31d8bd
chore(telemetry): add grafana dashboards (#3921)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Installer CI / lint (push) Failing after 3s
Installer CI / smoke-test-on-dev (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Test llama stack list-deps / generate-matrix (push) Successful in 2s
Python Package Build Test / build (3.13) (push) Failing after 2s
Vector IO Integration Tests / test-matrix (push) Failing after 7s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 5s
Test llama stack list-deps / show-single-provider (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
Test llama stack list-deps / list-deps (push) Failing after 8s
Unit Tests / unit-tests (3.12) (push) Failing after 10s
API Conformance Tests / check-schema-compatibility (push) Successful in 47s
Test Llama Stack Build / build (push) Failing after 41s
Test Llama Stack Build / build-single-provider (push) Failing after 48s
Test llama stack list-deps / list-deps-from-config (push) Failing after 45s
UI Tests / ui-tests (22) (push) Successful in 1m18s
Pre-commit / pre-commit (push) Successful in 1m48s
# What does this PR do?
- add a dashboard in grafana (vibe-coded)

## Test Plan
<img width="2416" height="1114" alt="image"
src="https://github.com/user-attachments/assets/8927aad2-cc14-4a1d-847e-350522cac02f"
/>
2025-10-27 14:58:27 -07:00
ehhuang
4670264b89
Merge d9a4f016d0 into sapling-pr-archive-ehhuang 2025-10-27 14:58:20 -07:00
Eric Huang
d9a4f016d0 tele tests
# What does this PR do?


## Test Plan
2025-10-27 14:58:14 -07:00
ehhuang
b7dd3f5c56
chore!: BREAKING CHANGE: vector_db_id -> vector_store_id (#3923)
# What does this PR do?


## Test Plan
CI
vector_io tests will fail until next client sync

passed with
https://github.com/llamastack/llama-stack-client-python/pull/286 checked
out locally
2025-10-27 14:26:06 -07:00
Nathan Weinberg
b6954c9882
fix: add missing shutdown methods to PromptServiceImpl and ConversationServiceImpl (#3925)
Change is visible in server shutdown logs, changes `WARNING` loglines to
`INFO`

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-10-27 13:41:38 -07:00
ehhuang
bf8f6b6914
Merge 960f5e4cd4 into sapling-pr-archive-ehhuang 2025-10-27 13:23:09 -07:00
Eric Huang
960f5e4cd4 chore!: BREAKING CHANGE: vector_db_id -> vector_store_id
# What does this PR do?


## Test Plan
2025-10-27 13:22:58 -07:00
Matthew Farrellee
a9b00db421
feat: add provider data keys for Cerebras, Databricks, NVIDIA, and RunPod (#3734)
# What does this PR do?

add provider-data key passing support to Cerebras, Databricks, NVIDIA
and RunPod

also, added missing tests for Fireworks, Anthropic, Gemini, SambaNova,
and vLLM

addresses #3517 

## Test Plan

ci w/ new tests

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-10-27 13:09:35 -07:00
ehhuang
075990d5e5
Merge 4db31e1c59 into sapling-pr-archive-ehhuang 2025-10-27 12:04:10 -07:00
Eric Huang
4db31e1c59 chore(telemetry): more cleanup: remove apis.telemetry
# What does this PR do?


## Test Plan
2025-10-27 12:04:00 -07:00