The `allowed_models` configuration was only filtering the model list endpoint but not enforcing restrictions during actual inference requests. This allowed users to bypass the restriction by directly requesting models not in the allowed list, potentially accessing expensive models when only cheaper ones were intended.
This change adds validation to all inference methods (`openai_chat_completion`, `openai_completion`, `openai_embeddings`) to reject requests for disallowed models with a clear error message.
**Implementation:**
- Added `_validate_model_allowed()` helper method that checks if a model is in the `allowed_models` list
- Called validation in all three inference methods before making API requests
- Validation occurs after resolving the provider model ID to ensure consistency
**Test Plan:**
- Added unit tests verifying all inference methods respect `allowed_models`
- Tests cover allowed models (success), disallowed models (rejection), and no restrictions (None allows all, empty list blocks all)
- All existing tests continue to pass
Fixes GHSA-5rjj-4jp6-fw39
# What does this PR do?
the directory structure was src/llama-stack-api/llama_stack_api
instead it should just be src/llama_stack_api to match the other
packages.
update the structure and pyproject/linting config
---------
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
# What does this PR do?
Extract API definitions and provider specifications into a standalone
llama-stack-api package that can be published to PyPI independently of
the main llama-stack server.
see: https://github.com/llamastack/llama-stack/pull/2978 and
https://github.com/llamastack/llama-stack/pull/2978#issuecomment-3145115942
Motivation
External providers currently import from llama-stack, which overrides
the installed version and causes dependency conflicts. This separation
allows external providers to:
- Install only the type definitions they need without server
dependencies
- Avoid version conflicts with the installed llama-stack package
- Be versioned and released independently
This enables us to re-enable external provider module tests that were
previously blocked by these import conflicts.
Changes
- Created llama-stack-api package with minimal dependencies (pydantic,
jsonschema)
- Moved APIs, providers datatypes, strong_typing, and schema_utils
- Updated all imports from llama_stack.* to llama_stack_api.*
- Configured local editable install for development workflow
- Updated linting and type-checking configuration for both packages
Next Steps
- Publish llama-stack-api to PyPI
- Update external provider dependencies
- Re-enable external provider module tests
Pre-cursor PRs to this one:
- #4093
- #3954
- #4064
These PRs moved key pieces _out_ of the Api pkg, limiting the scope of
change here.
relates to #3237
## Test Plan
Package builds successfully and can be imported independently. All
pre-commit hooks pass with expected exclusions maintained.
---------
Signed-off-by: Charlie Doern <cdoern@redhat.com>
This PR enables routing of fully qualified model IDs of the form
`provider_id/model_id` even when the models are not registered with the
Stack.
Here's the situation: assume a remote inference provider which works
only when users provide their own API keys via
`X-LlamaStack-Provider-Data` header. By definition, we cannot list
models and hence update our routing registry. But because we _require_ a
provider ID in the models now, we can identify which provider to route
to and let that provider decide.
Note that we still try to look up our registry since it may have a
pre-registered alias. Just that we don't outright fail when we are not
able to look it up.
Also, updated inference router so that the responses have the _exact_
model that the request had.
## Test Plan
Added an integration test
Closes#3929
---------
Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>
## Summary
- Fix OpenAI SDK NotGiven/Omit type mismatches in embeddings calls
- Fix incorrect OpenAIChatCompletionChunk import in vllm provider
- Refactor to avoid type:ignore comments by using conditional kwargs
## Changes
**openai_mixin.py (9 errors fixed):**
- Build kwargs conditionally for embeddings.create() to avoid
NotGiven/Omit mismatch
- Only include parameters when they have actual values (not None)
**gemini.py (9 errors fixed):**
- Apply same conditional kwargs pattern
- Add missing Any import
**vllm.py (2 errors fixed):**
- Use correct OpenAIChatCompletionChunk from llama_stack.apis.inference
- Remove incorrect alias from openai package
## Technical Notes
The OpenAI SDK has a type system quirk where `NOT_GIVEN` has type
`NotGiven` but parameter signatures expect `Omit`. By only passing
parameters with actual values, we avoid this mismatch entirely without
needing `# type: ignore` comments.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude <noreply@anthropic.com>
Migrates package structure to src/ layout following Python packaging
best practices.
All code moved from `llama_stack/` to `src/llama_stack/`. Public API
unchanged - imports remain `import llama_stack.*`.
Updated build configs, pre-commit hooks, scripts, and GitHub workflows
accordingly. All hooks pass, package builds cleanly.
**Developer note**: Reinstall after pulling: `pip install -e .`
2025-10-27 12:02:21 -07:00
Renamed from llama_stack/providers/utils/inference/openai_mixin.py (Browse further)