Commit graph

34 commits

Author SHA1 Message Date
Ashwin Bharambe
ec1bae78e6 fix 2025-10-28 16:22:25 -07:00
Ashwin Bharambe
1f5adff5a7 Simplify type signature using only Sequence covariance 2025-10-28 15:53:32 -07:00
Ashwin Bharambe
53c6f846d4 Address PR feedback: improve code clarity and fix AllowedToolsFilter bug
- streaming.py: Extract has_tool_calls boolean for readability
- streaming.py: Replace nested function checks with assertions
- streaming.py: Fix AllowedToolsFilter to use tool_names instead of allowed/disallowed
- streaming.py: Add comment explaining tool_context can be None
- streaming.py, utils.py: Clarify Pydantic/dict compatibility comments
- utils.py: Document list invariance vs Sequence covariance in type signature
- utils.py: Clarify list_shields runtime availability comment
2025-10-28 15:47:31 -07:00
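The "list invariance vs Sequence covariance" note in the commit above boils down to the following minimal sketch; `Message` and `UserMessage` are hypothetical stand-ins, not the llama_stack types.

```python
# Minimal sketch: why Sequence is accepted where list is rejected.
from collections.abc import Sequence


class Message: ...


class UserMessage(Message): ...


def broadcast_list(messages: list[Message]) -> None: ...


def broadcast_seq(messages: Sequence[Message]) -> None: ...


user_messages: list[UserMessage] = [UserMessage()]
broadcast_seq(user_messages)   # ok: Sequence is covariant (read-only view)
broadcast_list(user_messages)  # mypy error: list is invariant, list[UserMessage] is not list[Message]
```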
Ashwin Bharambe
84d78ff48a fix(mypy): complete streaming.py type fixes (24→0 errors)
Resolved all remaining mypy errors in streaming.py by adding proper None checks, union type narrowing, and type annotations:

- Fixed ToolContext None checks before accessing attributes
- Added isinstance checks for OpenAIAssistantMessageParam before accessing tool_calls
- Added None checks for response_tool_call and tool_call.function
- Fixed AllowedToolsFilter attribute access (allowed/disallowed)
- Added explicit type annotations to ensure str types (not str | None)
- Added type ignore comments for dict/TypedDict compatibility issues

All meta reference agent files now pass mypy with 0 errors (down from 280 errors).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-28 15:24:41 -07:00
Ashwin Bharambe
693e99c4ba fix(mypy): resolve OpenAI responses type issues (280→30 errors)
- Fixed openai_responses.py: proper type narrowing with match statements,
  assertions for None checks, explicit list typing
- Fixed utils.py: added Sequence support, union type narrowing, None handling
- Fixed streaming.py signature: accept optional instructions parameter
- tool_executor.py and agent_instance.py: automatically fixed by API changes

Remaining: 30 errors in streaming.py and one other file

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-28 15:24:41 -07:00
Ashwin Bharambe
35e251090b fix(schema): add Sequence type support to schema generator
Added support for collections.abc.Sequence types in the schema generator to fix OpenAPI spec generation after changing API types from list to Sequence.

Changes:
- Added is_generic_sequence() and unwrap_generic_sequence() helper functions in inspection.py
- Updated python_type_to_name() in name.py to handle Sequence types (treats them as List for schema purposes)
- Fixed force parameter propagation in recursive python_type_to_name() calls to ensure unions within generic types are handled correctly
- Updated docstring_to_schema() to call python_type_to_name() with force=True
- Regenerated OpenAPI specs with updated type handling

This enables using Sequence instead of list in API definitions while maintaining schema generation compatibility.
2025-10-28 15:24:41 -07:00
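A rough sketch of what the Sequence detection described above can look like; the helper names mirror the commit, but the bodies here are illustrative, not the actual inspection.py implementation.

```python
import collections.abc
import typing


def is_generic_sequence(tp: object) -> bool:
    # Sequence[int] reports collections.abc.Sequence as its origin.
    return typing.get_origin(tp) is collections.abc.Sequence


def unwrap_generic_sequence(tp: object) -> object:
    # Sequence[int] -> int, so the schema generator can treat it like List[int].
    (element_type,) = typing.get_args(tp)
    return element_type


assert is_generic_sequence(collections.abc.Sequence[int])
assert unwrap_generic_sequence(collections.abc.Sequence[int]) is int
assert not is_generic_sequence(list[int])
```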
Ashwin Bharambe
aba98f49db fix(mypy): complete openai_responses.py type fixes (76 errors resolved)
Fixed all 76 type errors in openai_responses.py through proper type-safe refactoring:

Changes made:
- Fixed return type signature for _process_input_with_previous_response to include ToolContext
- Replaced .copy() with list() for Sequence types (line 94)
- Added assertions for text and max_infer_iters to narrow types from None
- Properly typed input_items_data as list[OpenAIResponseInput] to avoid list invariance
- Properly typed conversation_items as list[ConversationItem] to avoid list invariance
- Properly typed output_items as list[ConversationItem] to avoid list invariance
- Fixed StreamingResponseOrchestrator signature to accept str | None for instructions

All fixes use proper type narrowing and union typing without any type: ignore comments.
2025-10-28 15:24:40 -07:00
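A small sketch of the `.copy()` → `list()` change mentioned above; the `snapshot` function is hypothetical, but the point is general: `Sequence` has no `.copy()`.

```python
from collections.abc import Sequence


def snapshot(items: Sequence[str]) -> list[str]:
    # items.copy() fails type checking (and at runtime for e.g. tuples);
    # list(items) works for any Sequence and yields an independent, mutable list.
    return list(items)


print(snapshot(("a", "b")))  # ['a', 'b']
```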
Ashwin Bharambe
d4d55bc0fe Improve OpenAI responses type safety with Sequence and match statements
- Change list to Sequence in OpenAI response API types to fix list invariance issues
- Use match statements for proper union type narrowing in stream chunk handling
- Reduces errors in openai_responses.py from 76 to 12 (84% reduction)
2025-10-28 15:24:40 -07:00
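A hedged sketch of the match-statement narrowing mentioned above; `TextDelta` and `ToolCallDelta` are stand-ins, not the real OpenAI chunk types.

```python
from dataclasses import dataclass


@dataclass
class TextDelta:
    text: str


@dataclass
class ToolCallDelta:
    tool_name: str


def render(chunk: TextDelta | ToolCallDelta) -> str:
    match chunk:
        case TextDelta(text=text):
            # mypy narrows chunk to TextDelta in this branch
            return text
        case ToolCallDelta(tool_name=name):
            return f"[tool call: {name}]"
        case _:
            raise AssertionError(f"unexpected chunk type: {type(chunk)}")


print(render(ToolCallDelta(tool_name="web_search")))  # [tool call: web_search]
```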
Ashwin Bharambe
c3f817f344 Update type: ignore comments with clearer explanation
Changed comment from "message.content uses list[AliasType] but mypy expects Iterable[BaseType]" to "OpenAI SDK uses aliased types internally that mypy sees as incompatible with base types".

This is more accurate - the OpenAI SDK's message parameter types use aliased names (like OpenAIChatCompletionContentPartTextParam) internally in their type annotations, and mypy cannot match these with base type names (ChatCompletionContentPartTextParam) even though they're the same types at runtime.

Verified that importing and using base types directly doesn't resolve the issue because the SDK's internal type annotations still use the aliased names.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-28 13:53:02 -07:00
Ashwin Bharambe
dbd036e7b4 Address PR review feedback
- Simplify provider_resource_id assignment with assertion (review comment 1)
- Fix comment placement order (review comment 2)
- Refactor tool_calls list building to avoid union-attr suppression (review comment 3)
- Rename response_format to response_format_dict to avoid shadowing (review comment 4)
- Update type: ignore comments for message.content with accurate explanation of OpenAI SDK type alias resolution issue (review comment 5)
- Add assertions in litellm_openai_mixin to validate provider_resource_id is not None

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-28 13:49:34 -07:00
Ashwin Bharambe
9032ba9097 add comments explaining the suppressions 2025-10-28 13:32:21 -07:00
Ashwin Bharambe
a8d51a1a8b fix(mypy): resolve OpenAI compatibility layer type issues
Resolves 111 mypy errors across OpenAI inference compatibility utilities:

litellm_openai_mixin.py (23 errors → 1 unavoidable):
- Add type annotation for input_dict parameter
- Fix JsonSchemaResponseFormat dict conversion and manipulation
- Add None checks for tool_config access with walrus operator
- Fix get_api_key() to properly handle None key_field
- Add model_store None checks in all three OpenAI methods
- Add type annotations for provider_resource_id variable
- Add type: ignore for litellm external library returns

openai_compat.py (88 errors → 0):
- Add None checks for TopPSamplingStrategy temperature and top_p
- Add type: ignore for no-any-return in text_from_choice
- Fix union-attr errors in logprobs iteration with None checks
- Add None checks for choice.text and finish_reason
- Fix OpenAICompatCompletionChoice.message attribute access
- Filter tool_calls to only valid ToolCall objects
- Add type annotations for converted_messages and choices lists
- Fix TypedDict ** expansion issues in message conversions
- Add type: ignore for function dict index operations
- Fix tool_choice and response_format type conversions
- Add type annotations for lls_tools variable
- Fix sampling strategy assignment with proper ordering
- Add None checks for buffer string concatenations
- Rename shadowed tool_call variable to parsed_tool_call
- Fix message content and tool_calls type conversions
- Add isinstance checks before attribute access on content
- Fix OpenAI finish_reason type literal conversions

Code remains clean and readable with strategic use of type: ignore
pragmas only where necessary for external library compatibility.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-28 13:12:43 -07:00
Ashwin Bharambe
dd74b2033c fix(mypy): resolve litellm_openai_mixin type issues (23 errors fixed) 2025-10-28 13:12:43 -07:00
Ashwin Bharambe
2882ae39b9 small stylistic fixes 2025-10-28 13:12:29 -07:00
Ashwin Bharambe
ce1392b3a8 fix(mypy): resolve agent_instance.py type issues (81 errors)
- Add None checks for optional shield and client_tools lists
- Convert StepType.X.value to StepType.X enum values
- Convert ISO timestamp strings to datetime objects
- Add type annotations (output_attachments, tool_name_to_def)
- Fix union type discrimination with isinstance checks
- Fix max_infer_iters optional comparison
- Filter tool_calls to exclude strings, keep only ToolCall objects
- Fix identifier handling for BuiltinTool enum conversion
- Fix Attachment API parameter (url → content)
- Add type: ignore for OpenAI response format compatibility

Fixes all 81 mypy errors in agent_instance.py.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-28 13:12:29 -07:00
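A sketch of the "keep only ToolCall objects" filtering listed above; `ToolCall` here is a stand-in dataclass, not the llama_stack type.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    tool_name: str
    arguments: str


def complete_tool_calls(items: list[ToolCall | str]) -> list[ToolCall]:
    # Streamed fragments arrive as plain strings; drop them so mypy (and the
    # caller) sees a clean list[ToolCall].
    return [item for item in items if isinstance(item, ToolCall)]


print(complete_tool_calls(["partial js", ToolCall("get_weather", '{"city": "SF"}')]))
```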
Ashwin Bharambe
3cf36e665b fix(mypy): resolve agent_instance type issues (part 1 of 2)
Fixed 35 errors, 46 remaining:
- Add isinstance() checks for union type discrimination
- Fix list type annotations for Message types
- Convert strings to datetime/StepType where needed
- Use assert to narrow AgentTurnCreateRequest vs AgentTurnResumeRequest
- Add explicit type annotations to avoid inference issues

Still to fix:
- Remaining str to datetime/StepType conversions
- Optional list handling for shields
- Type annotations for tool maps
- List variance issues for input_messages
- Fix turn_id variable redefinition
2025-10-28 13:12:29 -07:00
Ashwin Bharambe
3a437d80af fix(mypy): resolve tool_executor type issues (45 errors fixed)
- Add proper type annotations using Any where needed
- Fix union-attr errors with getattr and walrus operator
- Fix arg-type errors for datetime/enum conversions
- Add type: ignore for list invariance issues
- Remove event variable reuse to satisfy type checker
- Use proper type narrowing for tool execution paths

Patterns established:
- Use getattr() with walrus operator for optional attributes
- Use type: ignore for runtime-correct but mypy-incompatible cases
- Separate event variables by type to avoid union conflicts
2025-10-28 13:12:29 -07:00
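A sketch of the getattr()-plus-walrus pattern named above; `ToolResult` and its `metadata` attribute are hypothetical.

```python
class ToolResult:
    def __init__(self, metadata: dict[str, str] | None = None) -> None:
        self.metadata = metadata


def describe(result: ToolResult) -> str:
    # One expression both fetches the optional attribute and guards against None.
    if (metadata := getattr(result, "metadata", None)) is not None:
        return f"{len(metadata)} metadata entries"
    return "no metadata"


print(describe(ToolResult({"source": "rag"})))  # 1 metadata entries
print(describe(ToolResult()))                   # no metadata
```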
Ashwin Bharambe
f88416ef87
fix(inference): enable routing of models with provider_data alone (#3928)
This PR enables routing of fully qualified model IDs of the form
`provider_id/model_id` even when the models are not registered with the
Stack.

Here's the situation: assume a remote inference provider that works only when
users provide their own API keys via the `X-LlamaStack-Provider-Data` header.
By definition, we cannot list its models and hence cannot update our routing
registry. But because we now _require_ a provider ID in model IDs, we can
identify which provider to route to and let that provider decide.

Note that we still try to look up the registry, since it may have a
pre-registered alias; we just no longer fail outright when the lookup misses.

Also updated the inference router so that responses carry the _exact_ model ID
that the request used.

## Test Plan

Added an integration test

Closes #3929

---------

Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>
2025-10-28 11:16:37 -07:00
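A conceptual sketch of the routing rule described in the PR above; the registry dict and function shape are hypothetical, and the real router carries much more state and error handling.

```python
def resolve_provider(model_id: str, registry: dict[str, str]) -> str:
    # Prefer a pre-registered alias when the registry knows this model.
    if model_id in registry:
        return registry[model_id]
    # A fully qualified "provider_id/model_id" can still be routed: the prefix
    # names the provider, and that provider decides whether the model is valid.
    if "/" in model_id:
        provider_id, _ = model_id.split("/", maxsplit=1)
        return provider_id
    raise ValueError(f"cannot route unregistered model {model_id!r}")


print(resolve_provider("together/meta-llama/Llama-3.3-70B-Instruct", registry={}))  # together
```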
Ashwin Bharambe
94b0592240
fix(mypy): add type stubs and fix typing issues (#3938)
Adds type stubs and fixes mypy errors for better type coverage.

Changes:
- Added type_checking dependency group with type stubs (torchtune, trl,
etc.)
- Added lm-format-enforcer to pre-commit hook
- Created HFAutoModel Protocol for type-safe HuggingFace model handling
- Added mypy.overrides for untyped libraries (torchtune, fairscale,
etc.)
- Fixed type issues in post-training providers, databricks, and
api_recorder

Note: ~1,200 errors remain in excluded files (see pyproject.toml exclude
list).

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-28 11:00:09 -07:00
Ashwin Bharambe
1d385b5b75
fix(mypy): resolve OpenAI SDK and provider type issues (#3936)
## Summary
- Fix OpenAI SDK NotGiven/Omit type mismatches in embeddings calls
- Fix incorrect OpenAIChatCompletionChunk import in vllm provider
- Refactor to avoid type:ignore comments by using conditional kwargs

## Changes
**openai_mixin.py (9 errors fixed):**
- Build kwargs conditionally for embeddings.create() to avoid
NotGiven/Omit mismatch
- Only include parameters when they have actual values (not None)

**gemini.py (9 errors fixed):**
- Apply same conditional kwargs pattern
- Add missing Any import

**vllm.py (2 errors fixed):**
- Use correct OpenAIChatCompletionChunk from llama_stack.apis.inference
- Remove incorrect alias from openai package

## Technical Notes
The OpenAI SDK has a type system quirk where `NOT_GIVEN` has type
`NotGiven` but parameter signatures expect `Omit`. By only passing
parameters with actual values, we avoid this mismatch entirely without
needing `# type: ignore` comments.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-28 10:54:29 -07:00
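A sketch of the conditional-kwargs pattern described above: optional parameters are added only when they carry real values, so the NOT_GIVEN/Omit sentinels never enter the call. Parameter names follow the public embeddings API, but the wrapper function itself is illustrative.

```python
from typing import Any


async def create_embeddings(
    client: Any,
    model: str,
    text: str,
    dimensions: int | None = None,
    user: str | None = None,
) -> Any:
    # Required arguments always go in; optional ones only when set.
    kwargs: dict[str, Any] = {"model": model, "input": text}
    if dimensions is not None:
        kwargs["dimensions"] = dimensions
    if user is not None:
        kwargs["user"] = user
    return await client.embeddings.create(**kwargs)
```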
Ashwin Bharambe
d009dc29f7
fix(mypy): resolve provider utility and testing type issues (#3935)
Fixes mypy type errors in provider utilities and testing infrastructure:
- `mcp.py`: Cast incompatible client types, wrap image data properly
- `batches.py`: Rename walrus variable to avoid shadowing
- `api_recorder.py`: Use cast for Pydantic field annotation

No functional changes.

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-28 10:37:27 -07:00
Ashwin Bharambe
fcf07790c8
fix(mypy): resolve model implementation typing issues (#3934)
## Summary

Fixes mypy type errors across 4 model implementation files (Phase 2d of
mypy suppression removal plan):
- `src/llama_stack/models/llama/llama3/multimodal/image_transform.py`
(10 errors fixed)
- `src/llama_stack/models/llama/checkpoint.py` (2 errors fixed)
- `src/llama_stack/models/llama/hadamard_utils.py` (1 error fixed)
- `src/llama_stack/models/llama/llama3/multimodal/encoder_utils.py` (1
error fixed)

## Changes

### image_transform.py
- Fixed return type annotation for `find_supported_resolutions` from
`Tensor` to `list[tuple[int, int]]`
- Fixed parameter and return type annotations for
`resize_without_distortion` from `Tensor` to `Image.Image`
- Resolved variable shadowing by using separate names:
`possible_resolutions_list` for the list and
`possible_resolutions_tensor` for the tensor

### checkpoint.py
- Replaced deprecated `torch.BFloat16Tensor` and
`torch.cuda.BFloat16Tensor` with
`torch.set_default_dtype(torch.bfloat16)`
- Fixed variable shadowing by renaming numpy array to `ckpt_paths_array`
to distinguish from the parameter `ckpt_paths: list[Path]`

### hadamard_utils.py
- Added `isinstance` assertion to narrow type from `nn.Module` to
`nn.Linear` before accessing `in_features` attribute

### encoder_utils.py
- Fixed variable shadowing by using `masks_list` for list accumulation
and `masks` for the final Tensor result

## Test plan

- Verified all files pass mypy type checking (only optional dependency
import warnings remain)
- No functional changes - only type annotations and variable naming
improvements

Stacks on PR #3933

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-28 10:28:29 -07:00
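A standalone sketch of the isinstance-assertion narrowing described for hadamard_utils.py above, assuming torch is installed; the real code operates on the model's own modules.

```python
import torch.nn as nn


def linear_input_dim(module: nn.Module) -> int:
    # Narrow nn.Module -> nn.Linear so mypy allows the in_features access.
    assert isinstance(module, nn.Linear), f"expected nn.Linear, got {type(module)}"
    return module.in_features


print(linear_input_dim(nn.Linear(16, 4)))  # 16
```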
Ashwin Bharambe
6ce59b5df8
fix(mypy): resolve type issues in MongoDB, batches, and auth providers (#3933)
Fixes mypy type errors in provider utilities:
- MongoDB: Fix AsyncMongoClient parameters, use async iteration for
cursor
- Batches: Handle memoryview|bytes union for file decoding
- Auth: Add missing imports, validate JWKS URI, conditionally pass
parameters

Fixes 11 type errors. No functional changes.
2025-10-28 10:23:39 -07:00
Ashwin Bharambe
4a2ea278c5
fix(mypy): resolve OpenTelemetry typing issues in telemetry.py (#3943)
Fixes mypy type errors in OpenTelemetry integration:
- Add type aliases for AttributeValue and Attributes
- Add helper to filter None values from attributes (OpenTelemetry
doesn't accept None)
- Cast metric and tracer objects to proper types
- Update imports after refactoring

No functional changes.
2025-10-28 10:10:18 -07:00
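A sketch of the None-filtering helper described above; the type aliases are illustrative (OpenTelemetry attribute values are str/bool/int/float or sequences of those), not the exact ones added in telemetry.py.

```python
AttributeValue = str | bool | int | float
Attributes = dict[str, AttributeValue]


def clean_attributes(raw: dict[str, AttributeValue | None]) -> Attributes:
    # OpenTelemetry rejects None attribute values, so drop those keys entirely.
    return {key: value for key, value in raw.items() if value is not None}


print(clean_attributes({"model": "llama-3", "user_id": None}))  # {'model': 'llama-3'}
```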
Ashwin Bharambe
85887d724f Revert "fix(mypy): resolve OpenTelemetry typing issues in telemetry.py (#3931)"
This reverts commit 9afc52a36a.
2025-10-28 09:48:46 -07:00
Ashwin Bharambe
9afc52a36a
fix(mypy): resolve OpenTelemetry typing issues in telemetry.py (#3931)
## Summary

Fix all 11 mypy type checking errors in `telemetry.py` without using any
type suppressions.

**Changes:**
- Add type aliases for OpenTelemetry attribute types (`AttributeValue`,
`Attributes`)
- Create `_clean_attributes()` helper to filter None values from
attribute dicts
- Use `cast()` for TracerProvider methods (`add_span_processor`,
`force_flush`)
- Use `cast()` for metric creation methods returning from global storage
- Fix variable reuse by renaming `span` to `end_span` in SpanEndPayload
branch
- Add None check for `parent_span` before `set_span_in_context`

**Errors Fixed:**
- TracerProvider attribute access: 2 errors
- Counter/UpDownCounter/ObservableGauge return types: 3 errors
- Attribute dict type mismatches: 4 errors
- Span assignment type conflicts: 2 errors

**Testing:**
```bash
uv run mypy src/llama_stack/core/telemetry/telemetry.py
# Success: no issues found
```

**Part of:** Mypy suppression removal plan (Phase 2a/4)

**Stack:**
- [Phase 1] Add type stubs (#3930)
- [Phase 2a] Fix OpenTelemetry types (this PR)
- [Phase 2b+] Fix remaining errors (upcoming)
- [Phase 3] Remove inline suppressions (upcoming)
- [Phase 4] Un-exclude files from mypy (upcoming)
2025-10-28 09:47:20 -07:00
Ian Miller
5598f61e12
feat(responses)!: introduce OpenAI compatible prompts to Responses API (#3942)
# What does this PR do?
This PR changes the Responses API schema to introduce OpenAI-compatible
prompts. It is an API-only change with no implementation yet; a follow-up PR
with the actual implementation will be submitted after this one lands.

The need for this functionality was raised in #3514.

> Note: #3514 is split into three separate PRs; this is the second of the
three.


## Test Plan
CI
2025-10-28 09:31:27 -07:00
Sébastien Han
d10bfb5121
chore: remove leftover llama_stack directory (#3940)
# What does this PR do?

Followup on https://github.com/llamastack/llama-stack/pull/3920 where
the llama_stack directory was moved under src.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-10-28 05:09:08 -07:00
Ashwin Bharambe
4e6c769cc4
fix(context): prevent provider data leak between streaming requests (#3924)
## Summary

- `preserve_contexts_async_generator` left `PROVIDER_DATA_VAR` (and
other context vars) populated after a streaming generator completed on
HEAD~1, so the asyncio context for request N+1 started with request N's
provider payload.
- FastAPI dependencies and middleware execute before
`request_provider_data_context` rebinds the header data, meaning
auth/logging hooks could observe a prior tenant's credentials or treat
them as authenticated. Traces and any background work that inspects the
context outside the `with` block leak as well—this is a real security
regression, not just a CLI artifact.
- The wrapper now restores each tracked `ContextVar` to the value it
held before the iteration (falling back to clearing when necessary)
after every yield and when the generator terminates, so provider data is
wiped while callers that set their own defaults keep them.

## Test Plan

- `uv run pytest tests/unit/core/test_provider_data_context.py -q`
- `uv run pytest tests/unit/distribution/test_context.py -q`

Both suites fail on HEAD~1 and pass with this change.
2025-10-27 23:01:12 -07:00
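A heavily simplified sketch of the leak being fixed above: request-scoped provider data must be restored to its prior value once the (streaming) response is done, rather than left in the context for the next request. The handler and header handling here are illustrative, not the real llama_stack implementation, which also restores the tracked vars after every yield of the streaming generator.

```python
from contextvars import ContextVar
from typing import Any

PROVIDER_DATA_VAR: ContextVar[dict[str, Any] | None] = ContextVar("provider_data", default=None)


async def handle_request(headers: dict[str, str]) -> None:
    token = PROVIDER_DATA_VAR.set({"raw": headers.get("X-LlamaStack-Provider-Data", "")})
    try:
        ...  # run the (possibly streaming) request with this tenant's credentials
    finally:
        # Restore whatever the var held before this request; without this step,
        # request N+1 would start with request N's provider payload.
        PROVIDER_DATA_VAR.reset(token)
```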
ehhuang
c077d01ddf
chore(telemetry): more cleanup: remove apis.telemetry (#3919)
# What does this PR do?


## Test Plan
CI
2025-10-27 22:20:15 -07:00
ehhuang
b7dd3f5c56
chore!: BREAKING CHANGE: vector_db_id -> vector_store_id (#3923)
# What does this PR do?


## Test Plan
CI
vector_io tests will fail until next client sync

passed with
https://github.com/llamastack/llama-stack-client-python/pull/286 checked
out locally
2025-10-27 14:26:06 -07:00
Nathan Weinberg
b6954c9882
fix: add missing shutdown methods to PromptServiceImpl and ConversationServiceImpl (#3925)
Change is visible in server shutdown logs, changes `WARNING` loglines to
`INFO`

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-10-27 13:41:38 -07:00
Matthew Farrellee
a9b00db421
feat: add provider data keys for Cerebras, Databricks, NVIDIA, and RunPod (#3734)
# What does this PR do?

Add provider-data key passing support to Cerebras, Databricks, NVIDIA, and
RunPod.

Also add missing tests for Fireworks, Anthropic, Gemini, SambaNova, and vLLM.

addresses #3517 

## Test Plan

ci w/ new tests

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-10-27 13:09:35 -07:00
Ashwin Bharambe
471b1b248b
chore(package): migrate to src/ layout (#3920)
Migrates package structure to src/ layout following Python packaging
best practices.

All code moved from `llama_stack/` to `src/llama_stack/`. Public API
unchanged - imports remain `import llama_stack.*`.

Updated build configs, pre-commit hooks, scripts, and GitHub workflows
accordingly. All hooks pass, package builds cleanly.

**Developer note**: Reinstall after pulling: `pip install -e .`
2025-10-27 12:02:21 -07:00