llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-03 18:00:36 +00:00

Ashwin Bharambe ade075152e chore: kill inline::vllm (#2824) Inline _inference_ providers haven't proved to be very useful -- they are rarely used. And for good reason -- it is almost never a good idea to include a complex (distributed) inference engine bundled into a distributed stateful front-end server serving many other things. Responsibility should be split properly. See Discord discussion: `1395849853`		2025-07-18 15:52:18 -07:00
meta_reference	chore: add `mypy` inference parallel utils (#2670)	2025-07-18 12:01:10 +02:00
sentence_transformers	feat: introduce APIs for retrieving chat completion requests (#2145)	2025-05-18 21:43:19 -07:00
__init__.py	precommit	2024-11-08 17:58:58 -08:00