Matthew Farrellee 88a796ca5a
fix: allow use of models registered at runtime (#1980)
# What does this PR do?

fix a bug where models registered at runtime could not be used.

```
$ llama-stack-client models register test-model --provider-id nvidia --provider-model-id meta/llama-3.1-70b-instruct

$ curl http://localhost:8321/v1/openai/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
  "model": "test-model",
  "messages": [{"role": "user", "content": "What is the weather like in Boston today?"}]
}'

=(client)=> {"detail":"Internal server error: An unexpected error occurred."}
=(server)=> TypeError: Missing required arguments; Expected either ('messages' and 'model') or ('messages', 'model' and 'stream') arguments to be given
```

*root cause:* test-model is not added to ModelRegistryHelper's
alias_to_provider_id_map.

as part of the fix, this adds tests for ModelRegistryHelper and defines
its expected behavior.
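
for illustration, a minimal sketch of the failure path. only ModelRegistryHelper and alias_to_provider_id_map are taken from the description above; the constructor and method signatures are hypothetical and simplified, not the real implementation.

```
# hypothetical, simplified sketch of the failure path, not the actual
# llama_stack implementation
class ModelRegistryHelper:
    def __init__(self, aliases: dict[str, str]) -> None:
        # alias -> provider model id, seeded from the provider's static model list
        self.alias_to_provider_id_map = dict(aliases)

    def get_provider_model_id(self, alias: str) -> str | None:
        return self.alias_to_provider_id_map.get(alias)


helper = ModelRegistryHelper({"meta/llama-3.1-70b-instruct": "meta/llama-3.1-70b-instruct"})

# "test-model" was registered at runtime but never added to the map, so the
# lookup returns None and the provider ends up calling its backend without a
# model argument, which surfaces as the TypeError shown above
assert helper.get_provider_model_id("test-model") is None
```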

user-visible behavior changes (a sketch of the new checks follows the table) -

| action | existing behavior | new behavior |
| -- | -- | -- |
| double register | success (but no change) | error |
| register unknown | success (fail when used) | error |
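
as a rough illustration of the two new checks, continuing the simplified sketch above (a plain dict stands in for alias_to_provider_id_map; the error messages echo the examples below, but the names and error types here are illustrative, not the real code):

```
# illustrative sketch of the two new error paths; simplified, not the actual
# llama_stack validation code. `registry` stands in for alias_to_provider_id_map.
def register_model(registry: dict[str, str], alias: str, provider_model_id: str) -> None:
    if provider_model_id not in registry.values():
        # register unknown: reject at registration time instead of failing later at use
        supported = ", ".join(sorted(set(registry.values())))
        raise ValueError(
            f"Model id '{provider_model_id}' is not supported. Supported ids are: {supported}"
        )
    if alias in registry:
        # double register: reject instead of silently keeping the old mapping
        raise ValueError(
            f"Model id '{alias}' is already registered. "
            "Please use a different id or unregister it first."
        )
    registry[alias] = provider_model_id
```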

existing behavior for register unknown model and double register -
```
$ llama-stack-client models register test-model --provider-id nvidia --provider-model-id meta/llama-3.1-70b-instruct-unknown
Successfully registered model test-model

$ llama-stack-client models list | grep test-model
│ llm │ test-model                               │ meta/llama-3.1-70b-instruct-unknown │     │ nv… │

$ llama-stack-client models register test-model --provider-id nvidia --provider-model-id meta/llama-3.1-70b-instruct       
Successfully registered model test-model

$ llama-stack-client models list | grep test-model
│ llm │ test-model                               │ meta/llama-3.1-70b-instruct-unknown │     │ nv… │
```

new behavior for register unknown -
```
$ llama-stack-client models register test-model --provider-id nvidia --provider-model-id meta/llama-3.1-70b-instruct-unknown
╭──────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Failed to register model                                                                         │
│                                                                                                  │
│ Error Type: BadRequestError                                                                      │
│ Details: Error code: 400 - {'detail': "Invalid value: Model id                                   │
│ 'meta/llama-3.1-70b-instruct-unknown' is not supported. Supported ids are:                       │
│ meta/llama-3.1-70b-instruct, snowflake/arctic-embed-l, meta/llama-3.2-1b-instruct,               │
│ nvidia/nv-embedqa-mistral-7b-v2, meta/llama-3.2-90b-vision-instruct, meta/llama-3.2-3b-instruct, │
│ meta/llama-3.2-11b-vision-instruct, meta/llama-3.1-405b-instruct, meta/llama3-8b-instruct,       │
│ meta/llama3-70b-instruct, nvidia/llama-3.2-nv-embedqa-1b-v2, meta/llama-3.1-8b-instruct,         │
│ nvidia/nv-embedqa-e5-v5"}                                                                        │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
```

new behavior for double register -
```
$ llama-stack-client models register test-model --provider-id nvidia --provider-model-id meta/llama-3.1-70b-instruct
Successfully registered model test-model

$ llama-stack-client models register test-model --provider-id nvidia --provider-model-id meta/llama-3.2-1b-instruct 
╭──────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Failed to register model                                                                         │
│                                                                                                  │
│ Error Type: BadRequestError                                                                      │
│ Details: Error code: 400 - {'detail': "Invalid value: Model id 'test-model' is already           │
│ registered. Please use a different id or unregister it first."}                                  │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
```


## Test Plan

```
uv run pytest -v tests/unit/providers/utils/test_model_registry.py
```
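
those tests cover the behaviors described above. an illustrative sketch of the kind of cases involved, written against the simplified register_model from the earlier sketch rather than the real ModelRegistryHelper API:

```
# illustrative pytest cases mirroring the behaviors above; they exercise the
# simplified register_model sketch, not the real ModelRegistryHelper
import pytest


def make_registry() -> dict[str, str]:
    # statically known provider model ids (alias == provider id)
    return {
        "meta/llama-3.1-70b-instruct": "meta/llama-3.1-70b-instruct",
        "meta/llama-3.2-1b-instruct": "meta/llama-3.2-1b-instruct",
    }


def test_register_known_model_resolves():
    registry = make_registry()
    register_model(registry, "test-model", "meta/llama-3.1-70b-instruct")
    assert registry["test-model"] == "meta/llama-3.1-70b-instruct"


def test_register_unknown_model_fails():
    registry = make_registry()
    with pytest.raises(ValueError, match="is not supported"):
        register_model(registry, "test-model", "meta/llama-3.1-70b-instruct-unknown")


def test_double_register_fails():
    registry = make_registry()
    register_model(registry, "test-model", "meta/llama-3.1-70b-instruct")
    with pytest.raises(ValueError, match="already registered"):
        register_model(registry, "test-model", "meta/llama-3.2-1b-instruct")
```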