Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-12-12 04:00:42 +00:00)
fix(inference): enable routing of models with provider_data alone
Consider a remote inference provider that works only when users supply their own API keys via provider_data. By definition, we cannot list its models and therefore cannot populate our routing registry. But because we now _require_ a provider ID in model identifiers, we can still determine which provider to route to and let that provider decide. Note that we still try the registry lookup first, since it may contain a pre-registered alias; we just no longer fail outright when the lookup misses. Also updated the inference router so that responses carry the _exact_ model identifier from the request. Added an integration test.
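A minimal sketch of the routing fallback described above, not the actual llama-stack implementation: the names `resolve_provider` and `routing_table`, and the `provider_id/model_name` prefix convention shown, are assumptions used only to illustrate the idea of consulting the registry first and falling back to the provider prefix instead of failing.

```python
# Hypothetical sketch; `resolve_provider` and `routing_table` are illustrative
# names, not real llama-stack APIs.

def resolve_provider(model_id: str, routing_table: dict[str, str]) -> tuple[str, str]:
    """Return (provider_id, model_id) for an inference request.

    The registry is consulted first because it may hold a pre-registered
    alias. A miss is no longer fatal: since model identifiers now carry a
    provider prefix, we can still decide where to route and let that provider
    validate the model (e.g. using API keys passed via provider_data).
    """
    if model_id in routing_table:
        return routing_table[model_id], model_id

    # Fallback: split "provider_id/model_name" and trust the provider prefix.
    provider_id, _, remainder = model_id.partition("/")
    if not remainder:
        raise ValueError(
            f"cannot route '{model_id}': not registered and no provider prefix"
        )
    return provider_id, model_id
```

Under this sketch, the router would also echo back the exact model string from the request in the response, rather than whatever internal alias the provider resolved it to.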
This commit is contained in:
parent 471b1b248b
commit d089a6d106
6 changed files with 209 additions and 57 deletions
@@ -64,7 +64,7 @@ def test_telemetry_format_completeness(mock_otlp_collector, llama_stack_client,
     # Verify spans
     spans = mock_otlp_collector.get_spans()
-    assert len(spans) == 5
+    assert len(spans) == 5, f"Expected 5 spans, got {len(spans)}"

     # we only need this captured one time
     logged_model_id = None