chore: kill inline::vllm (#2824)

Inline _inference_ providers haven't proved to be very useful -- they
are rarely used. And for good reason -- it is almost never a good idea
to bundle a complex (distributed) inference engine into a distributed,
stateful front-end server that is already serving many other things.
Responsibility should be split properly.

See Discord discussion: 1395849853
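Concretely, "split properly" means running vLLM as its own OpenAI-compatible server (e.g. via `vllm serve`) and having the stack talk to it over HTTP, for example through a remote vLLM provider. A minimal sketch of that remote path, assuming a vLLM server already listening on localhost:8000 and a placeholder model name (both assumptions, not taken from this commit):

# Sketch: consume a separately-run vLLM server over its OpenAI-compatible API.
# Assumptions (not from this commit): the server was started with something like
# `vllm serve meta-llama/Llama-3.1-8B-Instruct` and listens on localhost:8000.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible endpoint
    api_key="not-needed",                 # vLLM ignores the key unless --api-key is set
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # must match the model vLLM is serving
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)

The front-end server then only holds a URL and a client, while the inference engine scales and fails independently.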
Ashwin Bharambe 2025-07-18 15:52:18 -07:00 committed by GitHub
parent 68a2dfbad7
commit ade075152e
12 changed files with 0 additions and 1388 deletions


@@ -37,16 +37,6 @@ def available_providers() -> list[ProviderSpec]:
             config_class="llama_stack.providers.inline.inference.meta_reference.MetaReferenceInferenceConfig",
             description="Meta's reference implementation of inference with support for various model formats and optimization techniques.",
         ),
-        InlineProviderSpec(
-            api=Api.inference,
-            provider_type="inline::vllm",
-            pip_packages=[
-                "vllm",
-            ],
-            module="llama_stack.providers.inline.inference.vllm",
-            config_class="llama_stack.providers.inline.inference.vllm.VLLMConfig",
-            description="vLLM inference provider for high-performance model serving with PagedAttention and continuous batching.",
-        ),
         InlineProviderSpec(
             api=Api.inference,
             provider_type="inline::sentence-transformers",