Commit graph

9 commits

Author SHA1 Message Date
Akram Ben Aissi
1970b4aa4b
fix: improve model availability checks: Allows use of unavailable models on startup (#3717)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 2s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 7s
UI Tests / ui-tests (22) (push) Successful in 39s
Pre-commit / pre-commit (push) Successful in 1m28s
- Allows use of unavailable models on startup
- Add has_model method to ModelsRoutingTable for checking pre-registered
models
- Update check_model_availability to check model_store before provider
APIs

# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->


Start llama stack and point unavailable vLLM

```
VLLM_URL=https://my-unavailable-vllm/v1 MILVUS_DB_PATH=./milvus.db INFERENCE_MODEL=vllm uv run --with llama-stack llama stack build --distro starter --image-type venv --run
```

llama stack will start without crashing but only notifying error. 

```


         - provider_id: rag-runtime
           toolgroup_id: builtin::rag
         vector_dbs: []
         version: 2

INFO     2025-10-07 06:40:41,804 llama_stack.providers.utils.inference.inference_store:74 inference: Write queue disabled for SQLite to avoid concurrency issues
INFO     2025-10-07 06:40:42,066 llama_stack.providers.utils.responses.responses_store:96 openai_responses: Write queue disabled for SQLite to avoid concurrency issues
ERROR    2025-10-07 06:40:58,882 llama_stack.providers.utils.inference.openai_mixin:436 providers::utils: VLLMInferenceAdapter.list_provider_model_ids() failed with: Request timed out.
WARNING  2025-10-07 06:40:58,883 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider vllm: Request timed out.
[...]
INFO     2025-10-07 06:40:59,036 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
INFO     2025-10-07 06:41:04,064 openai._base_client:1618 uncategorized: Retrying request to /models in 0.398814 seconds
INFO     2025-10-07 06:41:09,497 openai._base_client:1618 uncategorized: Retrying request to /models in 0.781908 seconds
ERROR    2025-10-07 06:41:15,282 llama_stack.providers.utils.inference.openai_mixin:436 providers::utils: VLLMInferenceAdapter.list_provider_model_ids() failed with: Request timed out.
WARNING  2025-10-07 06:41:15,283 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider vllm: Request timed out.
```
2025-10-07 14:27:24 -04:00
Matthew Farrellee
e892a3f7f4
feat: add refresh_models support to inference adapters (default: false) (#3719)
# What does this PR do?

inference adapters can now configure `refresh_models: bool` to control
periodic model listing from their providers

BREAKING CHANGE: together inference adapter default changed. previously
always refreshed, now follows config.

addresses "models: refresh" on #3517

## Test Plan

ci w/ new tests
2025-10-07 15:19:56 +02:00
Matthew Farrellee
d23ed26238
chore: turn OpenAIMixin into a pydantic.BaseModel (#3671)
# What does this PR do?

- implement get_api_key instead of relying on
LiteLLMOpenAIMixin.get_api_key
 - remove use of LiteLLMOpenAIMixin
 - add default initialize/shutdown methods to OpenAIMixin
 - remove __init__s to allow proper pydantic construction
- remove dead code from vllm adapter and associated / duplicate unit
tests
 - update vllm adapter to use openaimixin for model registration
 - remove ModelRegistryHelper from fireworks & together adapters
 - remove Inference from nvidia adapter
 - complete type hints on embedding_model_metadata
- allow extra fields on OpenAIMixin, for model_store, __provider_id__,
etc
 - new recordings for ollama
 - enhance the list models error handling
- update cerebras (remove cerebras-cloud-sdk) and anthropic (custom
model listing) inference adapters
 - parametrized test_inference_client_caching
- remove cerebras, databricks, fireworks, together from blanket mypy
exclude
 - removed unnecessary litellm deps

## Test Plan

ci
2025-10-06 11:33:19 -04:00
Matthew Farrellee
724dac498c
chore: give OpenAIMixin subcalsses a change to list models without leaking _model_cache details (#3682)
# What does this PR do?

close the _model_cache abstraction leak

## Test Plan

ci w/ new tests
2025-10-06 09:44:33 -04:00
Matthew Farrellee
0a41c4ead0
chore: OpenAIMixin implements ModelsProtocolPrivate (#3662)
# What does this PR do?

add ModelsProtocolPrivate methods to OpenAIMixin

this will allow providers using OpenAIMixin to use a common interface


## Test Plan

ci w/ new tests
2025-10-02 21:32:02 -07:00
Matthew Farrellee
4dbe0593f9
chore: add provider-data-api-key support to openaimixin (#3639)
# What does this PR do?

the LiteLLMOpenAIMixin provides support for reading key from provider
data (headers users send).

this adds the same functionality to the OpenAIMixin.

this is infrastructure for migrating providers.


## Test Plan

ci w/ new tests
2025-10-01 13:44:59 -07:00
Matthew Farrellee
b48d5cfed7
feat(internal): add image_url download feature to OpenAIMixin (#3516)
# What does this PR do?

simplify Ollama inference adapter by -
 - moving image_url download code to OpenAIMixin
- being a ModelRegistryHelper instead of having one (mypy blocks
check_model_availability method assignment)

## Test Plan

 - add unit tests for new download feature
- add integration tests for openai_chat_completion w/ image_url (close
test gap)
2025-09-26 17:32:16 -04:00
Matthew Farrellee
b67aef2fc4
feat: add static embedding metadata to dynamic model listings for providers using OpenAIMixin (#3547)
# What does this PR do?

- remove auto-download of ollama embedding models
- add embedding model metadata to dynamic listing w/ unit test
- add support and tests for allowed_models
- removed inference provider models.py files where dynamic listing is
enabled
- store embedding metadata in embedding_model_metadata field on
inference providers
- make model_entries optional on ModelRegistryHelper and
LiteLLMOpenAIMixin
- make OpenAIMixin a ModelRegistryHelper
- skip base64 embedding test for remote::ollama, always returns floats
- only use OpenAI client for ollama model listing
- remove unused build_model_entry function
- remove unused get_huggingface_repo function


## Test Plan

ci w/ new tests
2025-09-25 17:17:00 -04:00
Matthew Farrellee
521865c388
feat: include all models from provider's /v1/models (#3471)
# What does this PR do?

this replaces the static model listing for any provider using
OpenAIMixin

currently -
 - anthropic
 - azure openai
 - gemini
 - groq
 - llama-api
 - nvidia
 - openai
 - sambanova
 - tgi
 - vertexai
 - vllm
 - not changed: together has its own impl

## Test Plan

 - new unit tests
 - manual for llama-api, openai, groq, gemini

```
for provider in llama-openai-compat openai groq gemini; do
   uv run llama stack build --image-type venv --providers inference=remote::provider --run &
   uv run --with llama-stack-client llama-stack-client models list | grep Total
```

results (17 sep 2025):
 - llama-api: 4
 - openai: 86
 - groq: 21
 - gemini: 66


closes #3467
2025-09-18 05:17:11 -04:00