chore: kill inline::vllm (#2824)

Inline _inference_ providers haven't proven very useful -- they are rarely used. And for good reason: it is almost never a good idea to bundle a complex (distributed) inference engine into a stateful front-end server that is already serving many other things. Responsibility should be split properly.
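
In concrete terms, the split means running vLLM as its own server and referencing it through the remote vLLM adapter instead of embedding the engine. A minimal sketch of the build-spec side, assuming the existing remote::vllm provider type (the surrounding shape mirrors the removed file below):

  distribution_spec:
    providers:
      inference:
      - remote::vllm
      - inline::sentence-transformers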

See Discord discussion: 1395849853
Ashwin Bharambe 2025-07-18 15:52:18 -07:00 committed by GitHub
parent 68a2dfbad7
commit ade075152e
12 changed files with 0 additions and 1388 deletions


@@ -1,35 +0,0 @@
version: 2
distribution_spec:
  description: Use a built-in vLLM engine for running LLM inference
  providers:
    inference:
    - inline::vllm
    - inline::sentence-transformers
    vector_io:
    - inline::faiss
    - remote::chromadb
    - remote::pgvector
    safety:
    - inline::llama-guard
    agents:
    - inline::meta-reference
    telemetry:
    - inline::meta-reference
    eval:
    - inline::meta-reference
    datasetio:
    - remote::huggingface
    - inline::localfs
    scoring:
    - inline::basic
    - inline::llm-as-judge
    - inline::braintrust
    tool_runtime:
    - remote::brave-search
    - remote::tavily-search
    - inline::rag-runtime
    - remote::model-context-protocol
image_type: conda
additional_pip_packages:
- aiosqlite
- sqlalchemy[asyncio]
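
For the run-config side of the same split, a hedged sketch assuming the remote vLLM adapter accepts a url pointing at an OpenAI-compatible endpoint (the provider_id, model, and port here are illustrative, not part of this commit):

  providers:
    inference:
    - provider_id: vllm
      provider_type: remote::vllm
      config:
        # vLLM launched separately, e.g. `vllm serve <model> --port 8000`
        url: http://localhost:8000/v1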