llama-stack-mirror/llama_stack/providers/remote/inference
Akram Ben Aissi a548169b99
fix: allow skipping model availability check for vLLM (#3739)
# What does this PR do?
Allows the vLLM model availability check to fail gracefully, logging the error instead of crashing the stack on startup.
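
In rough terms, the model-listing call is wrapped so a failure is logged rather than propagated. A minimal sketch of the pattern (method names mirror the log output and directory history below, but this is illustrative, not the actual adapter code):

```python
import logging

logger = logging.getLogger(__name__)


class VLLMInferenceAdapter:
    """Illustrative stand-in for the real adapter; details are assumptions."""

    async def list_provider_model_ids(self) -> list[str]:
        # Stand-in for the real call to the vLLM /v1/models endpoint.
        raise RuntimeError("endpoint unreachable or behind an OAuth redirect")

    async def refresh_models(self) -> list[str]:
        try:
            return await self.list_provider_model_ids()
        except Exception as exc:
            # Previously this exception propagated and crashed startup;
            # now it is logged and an empty model list is returned.
            logger.error(
                "VLLMInferenceAdapter.list_provider_model_ids() failed with: %s",
                exc,
            )
            return []
```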

## Test Plan

Set `VLLM_URL` to your vLLM server, then build and run the starter distribution:

```
(base) akram@Mac llama-stack % LLAMA_STACK_LOGGING="all=debug" VLLM_ENABLE_MODEL_DISCOVERY=false MILVUS_DB_PATH=./milvus.db INFERENCE_MODEL=vllm uv run --with llama-stack llama stack build --distro starter --image-type venv --run
```
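
With `VLLM_ENABLE_MODEL_DISCOVERY=false`, startup proceeds even though the vLLM endpoint sits behind an OAuth login (note the `Found` redirect in the log):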

```
INFO     2025-10-08 20:11:24,637 llama_stack.providers.utils.inference.inference_store:74 inference: Write queue disabled for SQLite to avoid concurrency issues
INFO     2025-10-08 20:11:24,866 llama_stack.providers.utils.responses.responses_store:96 openai_responses: Write queue disabled for SQLite to avoid concurrency issues
ERROR    2025-10-08 20:11:26,160 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils: VLLMInferenceAdapter.list_provider_model_ids() failed with: <a
         href="https://oauth.akram.a1ey.p3.openshiftapps.com:443/oauth/authorize?approval_prompt=force&amp;client_id=system%3Aserviceaccount%3Arhoai-30-genai%3Adefault&amp;redirect_uri=ht
         tps%3A%2F%2Fvllm-rhoai-30-genai.apps.rosa.akram.a1ey.p3.openshiftapps.com%2Foauth%2Fcallback&amp;response_type=code&amp;scope=user%3Ainfo+user%3Acheck-access&amp;state=9fba207425
         5851c718aca717a5887d76%3A%2Fmodels">Found</a>.
         
[...]
INFO     2025-10-08 20:11:26,295 uvicorn.error:84 uncategorized: Started server process [83144]
INFO     2025-10-08 20:11:26,296 uvicorn.error:48 uncategorized: Waiting for application startup.
INFO     2025-10-08 20:11:26,297 llama_stack.core.server.server:170 core::server: Starting up
INFO     2025-10-08 20:11:26,297 llama_stack.core.stack:399 core: starting registry refresh task
INFO     2025-10-08 20:11:26,311 uvicorn.error:62 uncategorized: Application startup complete.
INFO     2025-10-08 20:11:26,312 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
ERROR    2025-10-08 20:11:26,791 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils: VLLMInferenceAdapter.list_provider_model_ids() failed with: <a
         href="https://oauth.akram.a1ey.p3.openshiftapps.com:443/oauth/authorize?approval_prompt=force&amp;client_id=system%3Aserviceaccount%3Arhoai-30-genai%3Adefault&amp;redirect_uri=ht
         tps%3A%2F%2Fvllm-rhoai-30-genai.apps.rosa.akram.a1ey.p3.openshiftapps.com%2Foauth%2Fcallback&amp;response_type=code&amp;scope=user%3Ainfo+user%3Acheck-access&amp;state=8ef0cba3e1
         71a4f8b04cb445cfb91a4c%3A%2Fmodels">Found</a>.

```
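
Despite the repeated `list_provider_model_ids()` failures, the server reaches `Application startup complete` and keeps running instead of crashing.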
2025-10-10 07:23:13 -07:00
| Name | Last commit | Date |
| --- | --- | --- |
| anthropic | chore: turn OpenAIMixin into a pydantic.BaseModel (#3671) | 2025-10-06 11:33:19 -04:00 |
| azure | chore: turn OpenAIMixin into a pydantic.BaseModel (#3671) | 2025-10-06 11:33:19 -04:00 |
| bedrock | chore: remove deprecated inference.chat_completion implementations (#3654) | 2025-10-03 07:55:34 -04:00 |
| cerebras | chore: turn OpenAIMixin into a pydantic.BaseModel (#3671) | 2025-10-06 11:33:19 -04:00 |
| databricks | feat: add refresh_models support to inference adapters (default: false) (#3719) | 2025-10-07 15:19:56 +02:00 |
| fireworks | chore: turn OpenAIMixin into a pydantic.BaseModel (#3671) | 2025-10-06 11:33:19 -04:00 |
| gemini | chore: turn OpenAIMixin into a pydantic.BaseModel (#3671) | 2025-10-06 11:33:19 -04:00 |
| groq | chore: turn OpenAIMixin into a pydantic.BaseModel (#3671) | 2025-10-06 11:33:19 -04:00 |
| llama_openai_compat | chore: disable openai_embeddings on inference=remote::llama-openai-compat (#3704) | 2025-10-06 13:27:40 -04:00 |
| nvidia | chore: remove dead code (#3729) | 2025-10-07 20:26:02 -07:00 |
| ollama | feat: add refresh_models support to inference adapters (default: false) (#3719) | 2025-10-07 15:19:56 +02:00 |
| openai | chore: turn OpenAIMixin into a pydantic.BaseModel (#3671) | 2025-10-06 11:33:19 -04:00 |
| passthrough | chore: turn OpenAIMixin into a pydantic.BaseModel (#3671) | 2025-10-06 11:33:19 -04:00 |
| runpod | feat: enable Runpod inference adapter (#3707) | 2025-10-07 12:24:50 +02:00 |
| sambanova | chore: turn OpenAIMixin into a pydantic.BaseModel (#3671) | 2025-10-06 11:33:19 -04:00 |
| tgi | chore: turn OpenAIMixin into a pydantic.BaseModel (#3671) | 2025-10-06 11:33:19 -04:00 |
| together | feat: add refresh_models support to inference adapters (default: false) (#3719) | 2025-10-07 15:19:56 +02:00 |
| vertexai | chore: turn OpenAIMixin into a pydantic.BaseModel (#3671) | 2025-10-06 11:33:19 -04:00 |
| vllm | fix: allow skipping model availability check for vLLM (#3739) | 2025-10-10 07:23:13 -07:00 |
| watsonx | fix: Update watsonx.ai provider to use LiteLLM mixin and list all models (#3674) | 2025-10-08 07:29:43 -04:00 |
| __init__.py | impls -> inline, adapters -> remote (#381) | 2024-11-06 14:54:05 -08:00 |