Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-07-23 21:04:29 +00:00)
chore: kill inline::vllm (#2824)
Inline _inference_ providers haven't proved to be very useful -- they are rarely used. And for good reason -- it is almost never a good idea to bundle a complex (distributed) inference engine into a distributed, stateful front-end server that is already serving many other things. Responsibility should be split properly.
See Discord discussion: 1395849853
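
For anyone who was relying on the removed provider, the split-responsibility approach described above means running vLLM as its own server and pointing the stack at it remotely (for example via llama-stack's remote vLLM provider, or any OpenAI-compatible client). Below is a minimal sketch, not part of this commit, assuming the `openai` Python package and a standalone vLLM server already launched with `vllm serve` on localhost:8000; the model name is illustrative.

```python
# Sketch only: talk to a separately running vLLM server over its
# OpenAI-compatible HTTP API instead of embedding the engine in-process.
from openai import OpenAI

# vLLM's standalone server exposes an OpenAI-compatible API under /v1.
# The URL and api_key are assumptions for a local, unauthenticated setup.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # whichever model the vLLM server was launched with
    messages=[{"role": "user", "content": "Hello from a separate inference server"}],
)
print(response.choices[0].message.content)
```

This keeps the inference engine and the front-end server as independent processes, which is the separation the commit message argues for.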
This commit is contained in:
parent: 68a2dfbad7
commit: ade075152e

12 changed files with 0 additions and 1388 deletions
@@ -257,7 +257,6 @@ exclude = [
     "^llama_stack/models/llama/llama4/",
     "^llama_stack/providers/inline/inference/meta_reference/quantization/fp8_impls\\.py$",
     "^llama_stack/providers/inline/inference/sentence_transformers/sentence_transformers\\.py$",
-    "^llama_stack/providers/inline/inference/vllm/",
     "^llama_stack/providers/inline/post_training/common/validator\\.py$",
     "^llama_stack/providers/inline/safety/code_scanner/",
     "^llama_stack/providers/inline/safety/llama_guard/",