chore: kill inline::vllm (#2824)

Inline _inference_ providers haven't proved to be very useful -- they are
rarely used. And for good reason -- it is almost never a good idea to
bundle a complex (distributed) inference engine into a stateful front-end
server that is already serving many other things. Responsibility should be
split properly.

See Discord discussion: 1395849853
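
For reference, the intended split is to run vLLM as its own server and point
Llama Stack at it through the remote provider instead. A minimal sketch of a
run.yaml inference entry, assuming the remote::vllm provider type and its url
config key (the provider_id and the localhost URL below are placeholders):

    providers:
      inference:
      - provider_id: vllm
        provider_type: remote::vllm
        config:
          url: http://localhost:8000/v1

Here the url should point at wherever the standalone vLLM OpenAI-compatible
server is listening.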
Ashwin Bharambe 2025-07-18 15:52:18 -07:00 committed by GitHub
parent 68a2dfbad7
commit ade075152e
12 changed files with 0 additions and 1388 deletions


@@ -257,7 +257,6 @@ exclude = [
     "^llama_stack/models/llama/llama4/",
     "^llama_stack/providers/inline/inference/meta_reference/quantization/fp8_impls\\.py$",
     "^llama_stack/providers/inline/inference/sentence_transformers/sentence_transformers\\.py$",
-    "^llama_stack/providers/inline/inference/vllm/",
     "^llama_stack/providers/inline/post_training/common/validator\\.py$",
     "^llama_stack/providers/inline/safety/code_scanner/",
     "^llama_stack/providers/inline/safety/llama_guard/",