llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-10-06 12:37:33 +00:00

Ashwin Bharambe ade075152e chore: kill inline::vllm (#2824) Inline _inference_ providers haven't proved to be very useful -- they are rarely used. And for good reason -- it is almost never a good idea to include a complex (distributed) inference engine bundled into a distributed stateful front-end server serving many other things. Responsibility should be split properly. See Discord discussion: `1395849853`	2025-07-18 15:52:18 -07:00
agents	chore(api): add `mypy` coverage to `meta_reference_safety` (#2661)	2025-07-09 10:22:34 +02:00
datasetio	chore(refact): move paginate_records fn outside of datasetio (#2137)	2025-05-12 10:56:14 -07:00
eval	chore: remove nested imports (#2515)	2025-06-26 08:01:05 +05:30
files/localfs	fix: add shutdown function for localfs provider (#2781)	2025-07-16 08:24:57 -07:00
inference	chore: kill inline::vllm (#2824)	2025-07-18 15:52:18 -07:00
ios/inference	chore: removed executorch submodule (#1265)	2025-02-25 21:57:21 -08:00
post_training	chore: add `mypy` post training (#2675)	2025-07-09 15:44:39 +02:00
safety	ci: test safety with starter (#2628)	2025-07-09 16:53:50 +02:00
scoring	fix: allow default empty vars for conditionals (#2570)	2025-07-01 14:42:05 +02:00
telemetry	feat: improve telemetry (#2590)	2025-07-04 17:29:09 +02:00
tool_runtime	feat: Add ChunkMetadata to Chunk (#2497)	2025-06-25 15:55:23 -04:00
vector_io	fix: SQLiteVecIndex.create(..., bank_id="test_bank.123") - bank_id with a dot - leads to sqlite3.OperationalError (#2770) (#2771)	2025-07-16 08:25:44 -07:00
__init__.py	`impls` -> `inline`, `adapters` -> `remote` (#381)	2024-11-06 14:54:05 -08:00