For self-hosted providers like Ollama (or vLLM), the backing server is running a set of models. That server should be treated as the source of truth, and the Stack registry should just be a cache for those models. Of course, in production environments you may not want this (because you know statically which model you are running), hence there's a config boolean to control this behavior.

_This is part of a series of PRs aimed at removing the requirement of setting `INFERENCE_MODEL` env variables for running the Llama Stack server._

## Test Plan

Copy and modify the starter.yaml template / config and enable `refresh_models: true, refresh_models_interval: 10` for the ollama provider. Then, run:

```
LLAMA_STACK_LOGGING=all=debug \
  ENABLE_OLLAMA=ollama uv run llama stack run --image-type venv /tmp/starter.yaml
```

See a gargantuan amount of logs, but verify that the provider is periodically refreshing models. Stop and prune a model from the ollama server, restart the server. Verify that the model goes away when I call `uv run llama-stack-client models list`.
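For reference, a minimal sketch of what the modified provider entry in `/tmp/starter.yaml` could look like. The surrounding layout and the `url` value are assumptions based on the stock starter template; `refresh_models` and `refresh_models_interval` are the options exercised in the test plan (the interval is presumably in seconds):

```
# Hypothetical excerpt of /tmp/starter.yaml -- only the ollama inference
# provider entry is shown; the rest of the template is left unchanged.
providers:
  inference:
    - provider_id: ollama
      provider_type: remote::ollama
      config:
        url: http://localhost:11434    # default Ollama endpoint (assumed)
        refresh_models: true           # treat the Ollama server as the source of truth
        refresh_models_interval: 10    # re-poll the server for models on this interval
```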