The `allowed_models` configuration was only filtering the model list endpoint but not enforcing restrictions during actual inference requests. This allowed users to bypass the restriction by directly requesting models not in the allowed list, potentially accessing expensive models when only cheaper ones were intended.

This change adds validation to all inference methods (`openai_chat_completion`, `openai_completion`, `openai_embeddings`) to reject requests for disallowed models with a clear error message.

**Implementation:**

- Added a `_validate_model_allowed()` helper method that checks if a model is in the `allowed_models` list
- Called the validation in all three inference methods before making API requests
- Validation occurs after resolving the provider model ID to ensure consistency

**Test Plan:**

- Added unit tests verifying all inference methods respect `allowed_models`
- Tests cover allowed models (success), disallowed models (rejection), and no restrictions (`None` allows all, empty list blocks all)
- All existing tests continue to pass

Fixes GHSA-5rjj-4jp6-fw39
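Below is a minimal sketch of the validation flow described above. The names `allowed_models`, `_validate_model_allowed()`, and `openai_chat_completion` come from this description; the surrounding class shape, the ID-resolution and provider-call placeholders, and the use of `ValueError` for rejection are illustrative assumptions rather than the exact llama-stack implementation.

```python
class InferenceProviderSketch:
    """Hypothetical provider showing where the allowed-models check is applied."""

    def __init__(self, allowed_models: list[str] | None = None):
        # None means "no restriction"; an empty list blocks every model.
        self.allowed_models = allowed_models

    def _validate_model_allowed(self, model: str) -> None:
        # Reject disallowed models with a clear error before any API call.
        if self.allowed_models is not None and model not in self.allowed_models:
            raise ValueError(
                f"Model '{model}' is not in the allowed_models list for this provider"
            )

    async def openai_chat_completion(self, model: str, messages: list[dict], **kwargs):
        provider_model_id = self._resolve_provider_model_id(model)
        # Validate after resolving the provider model ID so aliases and
        # provider-prefixed names are checked consistently.
        self._validate_model_allowed(provider_model_id)
        return await self._call_provider(provider_model_id, messages, **kwargs)

    # openai_completion and openai_embeddings would call the same helper
    # before issuing their requests.

    def _resolve_provider_model_id(self, model: str) -> str:
        # Placeholder for the provider's alias / model-ID resolution logic.
        return model

    async def _call_provider(self, model_id: str, messages: list[dict], **kwargs):
        # Placeholder for the actual upstream API request.
        raise NotImplementedError
```

With `allowed_models=["small-model"]`, a request for `"expensive-model"` raises before any upstream call is made; with `allowed_models=None` every model passes, and with `allowed_models=[]` every model is rejected, matching the cases covered by the test plan.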