llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-03 09:53:45 +00:00

Ashwin Bharambe 68a2dfbad7 feat(ollama): periodically refresh models (#2805)	2025-07-18 12:20:36 -07:00

For self-hosted providers like Ollama (or vLLM), the backing server is running a set of models. That server should be treated as the source of truth and the Stack registry should just be a cache for those models. Of course, in production environments, you may not want this (because you know what model you are running statically) hence there's a config boolean to control this behavior.

_This is part of a series of PRs aimed at removing the requirement of needing to set `INFERENCE_MODEL` env variables for running Llama Stack server._

## Test Plan

Copy and modify the starter.yaml template / config and enable `refresh_models: true, refresh_models_interval: 10` for the ollama provider. Then, run:

```
LLAMA_STACK_LOGGING=all=debug \
ENABLE_OLLAMA=ollama uv run llama stack run --image-type venv /tmp/starter.yaml
```

See a gargantuan amount of logs, but verify that the provider is periodically refreshing models. Stop and prune a model from ollama server, restart the server. Verify that the model goes away when I call `uv run llama-stack-client models list`
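
The "registry as a cache of the backing server" idea described in the commit message can be illustrated with a minimal sketch. This is not the code from the PR: `ModelRegistryCache`, `list_server_models`, `refresh_interval_s`, and the model names are all hypothetical, and the in-memory list stands in for a real Ollama or vLLM server. It only shows the general pattern of periodically replacing the cached model set with whatever the server currently reports.

```python
import asyncio


class ModelRegistryCache:
    """Hypothetical registry that mirrors whatever the backing server reports."""

    def __init__(self, list_server_models, refresh_interval_s):
        # list_server_models: async callable returning an iterable of model ids
        # (an assumption for this sketch, not a Llama Stack API).
        self._list_server_models = list_server_models
        self._refresh_interval_s = refresh_interval_s
        self.models = set()

    async def refresh_once(self):
        # The server is the source of truth: replace the cache wholesale,
        # so models pruned on the server disappear from the registry too.
        self.models = set(await self._list_server_models())

    async def refresh_forever(self):
        # Periodic refresh loop; runs until the task is cancelled.
        while True:
            await self.refresh_once()
            await asyncio.sleep(self._refresh_interval_s)


async def demo():
    # Stand-in for the models a self-hosted server is running (made-up names).
    server_models = ["model-a", "model-b"]

    async def list_server_models():
        return list(server_models)

    registry = ModelRegistryCache(list_server_models, refresh_interval_s=0.01)
    task = asyncio.create_task(registry.refresh_forever())

    await asyncio.sleep(0.05)
    before = set(registry.models)

    # Simulate pruning a model on the server; the next refresh drops it.
    server_models.remove("model-b")
    await asyncio.sleep(0.05)
    after = set(registry.models)

    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return before, after


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Replacing the whole set on each refresh, rather than only adding new entries, is what makes a pruned model "go away" as in the test plan. A deployment that pins its models statically would simply never start the refresh loop, which is the role the config boolean plays in the commit message.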
agents	feat: Add webmethod for deleting openai responses (#2160)	2025-06-30 11:28:02 +02:00
batch_inference	chore: remove nested imports (#2515)	2025-06-26 08:01:05 +05:30
benchmarks	chore: remove nested imports (#2515)	2025-06-26 08:01:05 +05:30
common	chore(api): add `mypy` coverage to `apis` (#2648)	2025-07-09 12:55:16 +02:00
datasetio	chore: remove nested imports (#2515)	2025-06-26 08:01:05 +05:30
datasets	fix: finish conversion to StrEnum (#2514)	2025-06-26 08:01:26 +05:30
eval	chore: remove nested imports (#2515)	2025-06-26 08:01:05 +05:30
files	fix: finish conversion to StrEnum (#2514)	2025-06-26 08:01:26 +05:30
inference	feat(ollama): periodically refresh models (#2805)	2025-07-18 12:20:36 -07:00
inspect	chore: remove nested imports (#2515)	2025-06-26 08:01:05 +05:30
models	chore: internal change, make Model.provider_model_id non-optional (#2690)	2025-07-17 08:26:57 -07:00
post_training	fix: DPOAlignmentConfig schema to use correct DPO parameters (#2804)	2025-07-18 11:56:00 -07:00
providers	chore: remove nested imports (#2515)	2025-06-26 08:01:05 +05:30
safety	chore: remove nested imports (#2515)	2025-06-26 08:01:05 +05:30
scoring	chore: remove nested imports (#2515)	2025-06-26 08:01:05 +05:30
scoring_functions	chore: remove nested imports (#2515)	2025-06-26 08:01:05 +05:30
shields	chore: remove nested imports (#2515)	2025-06-26 08:01:05 +05:30
synthetic_data_generation	chore: remove nested imports (#2515)	2025-06-26 08:01:05 +05:30
telemetry	docs: Minor spelling fix (#2592)	2025-07-02 20:26:51 -04:00
tools	feat: add input validation for search mode of rag query config (#2275)	2025-07-14 09:11:34 -04:00
vector_dbs	fix: Fix `/vector-stores/create` API when vector store with duplicate `name` (#2617)	2025-07-15 11:24:41 -04:00
vector_io	fix: Fix `/vector-stores/create` API when vector store with duplicate `name` (#2617)	2025-07-15 11:24:41 -04:00
__init__.py	API Updates (#73)	2024-09-17 19:51:35 -07:00
datatypes.py	chore: enable pyupgrade fixes (#1806)	2025-05-01 14:23:50 -07:00
resource.py	feat: drop python 3.10 support (#2469)	2025-06-19 12:07:14 +05:30
version.py	llama-stack version alpha -> v1	2025-01-15 05:58:09 -08:00