llama-stack/llama_stack/providers/remote/inference
Matthew Farrellee 88a796ca5a
fix: allow use of models registered at runtime (#1980)
# What does this PR do?

fix a bug where models registered at runtime could not be used.

```
$ llama-stack-client models register test-model --provider-id nvidia --provider-model-id meta/llama-3.1-70b-instruct

$ curl http://localhost:8321/v1/openai/v1/chat/completions \                                                        
-H "Content-Type: application/json" \
-d '{
  "model": "test-model",
  "messages": [{"role": "user", "content": "What is the weather like in Boston today?"}]
}'

=(client)=> {"detail":"Internal server error: An unexpected error occurred."}
=(server)=> TypeError: Missing required arguments; Expected either ('messages' and 'model') or ('messages', 'model' and 'stream') arguments to be given
```

*root cause:* test-model is not added to ModelRegistryHelper's
alias_to_provider_id_map.

as part of the fix, this adds tests for ModelRegistryHelper and defines
its expected behavior.

user visible behavior changes -

| action | existing behavior | new behavior |
| -- | -- | -- |
| double register | success (but no change) | error |
| register unknown | success (fail when used) | error |

existing behavior for register unknown model and double register -
```
$ llama-stack-client models register test-model --provider-id nvidia --provider-model-id meta/llama-3.1-70b-instruct-unknown
Successfully registered model test-model

$ llama-stack-client models list | grep test-model
│ llm │ test-model                               │ meta/llama-3.1-70b-instruct-unknown │     │ nv… │

$ llama-stack-client models register test-model --provider-id nvidia --provider-model-id meta/llama-3.1-70b-instruct       
Successfully registered model test-model

$ llama-stack-client models list | grep test-model
│ llm │ test-model                               │ meta/llama-3.1-70b-instruct-unknown │     │ nv… │
```

new behavior for register unknown -
```
$ llama-stack-client models register test-model --provider-id nvidia --provider-model-id meta/llama-3.1-70b-instruct-unknown
╭──────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Failed to register model                                                                         │
│                                                                                                  │
│ Error Type: BadRequestError                                                                      │
│ Details: Error code: 400 - {'detail': "Invalid value: Model id                                   │
│ 'meta/llama-3.1-70b-instruct-unknown' is not supported. Supported ids are:                       │
│ meta/llama-3.1-70b-instruct, snowflake/arctic-embed-l, meta/llama-3.2-1b-instruct,               │
│ nvidia/nv-embedqa-mistral-7b-v2, meta/llama-3.2-90b-vision-instruct, meta/llama-3.2-3b-instruct, │
│ meta/llama-3.2-11b-vision-instruct, meta/llama-3.1-405b-instruct, meta/llama3-8b-instruct,       │
│ meta/llama3-70b-instruct, nvidia/llama-3.2-nv-embedqa-1b-v2, meta/llama-3.1-8b-instruct,         │
│ nvidia/nv-embedqa-e5-v5"}                                                                        │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
```

new behavior for double register -
```
$ llama-stack-client models register test-model --provider-id nvidia --provider-model-id meta/llama-3.1-70b-instruct
Successfully registered model test-model

$ llama-stack-client models register test-model --provider-id nvidia --provider-model-id meta/llama-3.2-1b-instruct 
╭──────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Failed to register model                                                                         │
│                                                                                                  │
│ Error Type: BadRequestError                                                                      │
│ Details: Error code: 400 - {'detail': "Invalid value: Model id 'test-model' is already           │
│ registered. Please use a different id or unregister it first."}                                  │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
```


## Test Plan

```
uv run pytest -v tests/unit/providers/utils/test_model_registry.py
```
2025-05-01 12:00:58 -07:00
..
anthropic feat(providers): Groq now uses LiteLLM openai-compat (#1303) 2025-02-27 13:16:50 -08:00
bedrock feat(pre-commit): enhance pre-commit hooks with additional checks (#2014) 2025-04-30 11:35:49 -07:00
cerebras fix: 100% OpenAI API verification for together and fireworks (#1946) 2025-04-14 08:56:29 -07:00
cerebras_openai_compat test: verification on provider's OAI endpoints (#1893) 2025-04-07 23:06:28 -07:00
databricks fix: 100% OpenAI API verification for together and fireworks (#1946) 2025-04-14 08:56:29 -07:00
fireworks fix: OpenAI Completions API and Fireworks (#1997) 2025-04-21 11:49:12 -07:00
fireworks_openai_compat test: verification on provider's OAI endpoints (#1893) 2025-04-07 23:06:28 -07:00
gemini feat(providers): Groq now uses LiteLLM openai-compat (#1303) 2025-02-27 13:16:50 -08:00
groq fix: 100% OpenAI API verification for together and fireworks (#1946) 2025-04-14 08:56:29 -07:00
groq_openai_compat test: verification on provider's OAI endpoints (#1893) 2025-04-07 23:06:28 -07:00
llama_openai_compat feat: add api.llama provider, llama-guard-4 model (#2058) 2025-04-29 10:07:41 -07:00
nvidia feat: NVIDIA allow non-llama model registration (#1859) 2025-04-24 17:13:33 -07:00
ollama fix: allow use of models registered at runtime (#1980) 2025-05-01 12:00:58 -07:00
openai feat(providers): Groq now uses LiteLLM openai-compat (#1303) 2025-02-27 13:16:50 -08:00
passthrough fix: 100% OpenAI API verification for together and fireworks (#1946) 2025-04-14 08:56:29 -07:00
runpod fix: 100% OpenAI API verification for together and fireworks (#1946) 2025-04-14 08:56:29 -07:00
sambanova fix: 100% OpenAI API verification for together and fireworks (#1946) 2025-04-14 08:56:29 -07:00
sambanova_openai_compat test: verification on provider's OAI endpoints (#1893) 2025-04-07 23:06:28 -07:00
tgi fix: 100% OpenAI API verification for together and fireworks (#1946) 2025-04-14 08:56:29 -07:00
together fix: Together provider shutdown and default to non-streaming (#2001) 2025-04-22 17:47:53 +02:00
together_openai_compat test: verification on provider's OAI endpoints (#1893) 2025-04-07 23:06:28 -07:00
vllm fix: allow use of models registered at runtime (#1980) 2025-05-01 12:00:58 -07:00
watsonx fix: updated watsonx inference chat apis with new repo changes (#2033) 2025-04-26 10:17:52 -07:00
__init__.py impls -> inline, adapters -> remote (#381) 2024-11-06 14:54:05 -08:00