llama-stack

forked from phoenix-oss/llama-stack-mirror


Nathan Weinberg cf158f2cb9 feat: allow ollama to use 'latest' if available but not specified (#1903)

# What does this PR do?

Ollama's CLI supports running models with commands such as `ollama run llama3.2`. This syntax did not work with the `INFERENCE_MODEL` Llama Stack variable, because specifying a tag such as `latest` was required.

With this commit, if a user passes a model name without a tag and the `latest` tag of that model is available in Ollama, the stack uses the `latest` model.

## Test Plan

Behavior before the code change:

```bash
$ INFERENCE_MODEL=llama3.2 llama stack build --template ollama --image-type venv --run
...
INFO 2025-04-08 13:42:42,842 llama_stack.providers.remote.inference.ollama.ollama:80 inference: checking connectivity to Ollama at `http://beanlab1.bss.redhat.com:11434`...
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/nathan/ai/llama-stack/repos/llama-stack/llama_stack/distribution/server/server.py", line 502, in <module>
    main()
  File "/home/nathan/ai/llama-stack/repos/llama-stack/llama_stack/distribution/server/server.py", line 401, in main
    impls = asyncio.run(construct_stack(config))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/asyncio/base_events.py", line 691, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/nathan/ai/llama-stack/repos/llama-stack/llama_stack/distribution/stack.py", line 222, in construct_stack
    await register_resources(run_config, impls)
  File "/home/nathan/ai/llama-stack/repos/llama-stack/llama_stack/distribution/stack.py", line 99, in register_resources
    await method(**obj.model_dump())
  File "/home/nathan/ai/llama-stack/repos/llama-stack/llama_stack/providers/utils/telemetry/trace_protocol.py", line 102, in async_wrapper
    result = await method(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nathan/ai/llama-stack/repos/llama-stack/llama_stack/distribution/routers/routing_tables.py", line 294, in register_model
    registered_model = await self.register_object(model)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nathan/ai/llama-stack/repos/llama-stack/llama_stack/distribution/routers/routing_tables.py", line 228, in register_object
    registered_obj = await register_object_with_provider(obj, p)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nathan/ai/llama-stack/repos/llama-stack/llama_stack/distribution/routers/routing_tables.py", line 77, in register_object_with_provider
    return await p.register_model(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nathan/ai/llama-stack/repos/llama-stack/llama_stack/providers/utils/telemetry/trace_protocol.py", line 102, in async_wrapper
    result = await method(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nathan/ai/llama-stack/repos/llama-stack/llama_stack/providers/remote/inference/ollama/ollama.py", line 315, in register_model
    raise ValueError(
ValueError: Model 'llama3.2' is not available in Ollama. Available models: llama3.2:latest
++ error_handler 108
++ echo 'Error occurred in script at line: 108'
Error occurred in script at line: 108
++ exit 1
```

Behavior after the code change:

```bash
$ INFERENCE_MODEL=llama3.2 llama stack build --template ollama --image-type venv --run
...
INFO 2025-04-08 13:58:17,365 llama_stack.providers.remote.inference.ollama.ollama:80 inference: checking connectivity to Ollama at `http://beanlab1.bss.redhat.com:11434`...
WARNING 2025-04-08 13:58:18,190 llama_stack.providers.remote.inference.ollama.ollama:317 inference: Imprecise provider resource id was used but 'latest' is available in Ollama - using 'llama3.2:latest'
INFO 2025-04-08 13:58:18,191 llama_stack.providers.remote.inference.ollama.ollama:308 inference: Pulling embedding model `all-minilm:latest` if necessary...
INFO 2025-04-08 13:58:18,799 __main__:478 server: Listening on ['::', '0.0.0.0']:8321
INFO: Started server process [28378]
INFO: Waiting for application startup.
INFO 2025-04-08 13:58:18,803 __main__:148 server: Starting up
INFO: Application startup complete.
INFO: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
...
```

## Documentation

Not documented anywhere yet, but happy to add documentation if there is an appropriate place.

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>		2025-04-14 09:03:54 -07:00
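The fallback described in this commit can be sketched in a few lines. This is a minimal sketch, assuming the adapter already holds the list of model names Ollama reports (for example `llama3.2:latest`); `resolve_ollama_model` is a hypothetical helper for illustration and is not the code in `ollama.py`.

```python
import logging
from typing import Iterable

logger = logging.getLogger(__name__)


def resolve_ollama_model(requested: str, available: Iterable[str]) -> str:
    """Return the Ollama model name to use for `requested`.

    Hypothetical helper sketching the behavior of #1903; not the actual
    llama_stack Ollama adapter code.
    """
    available_set = set(available)

    # An exact match (e.g. "llama3.2:latest" or "llama3.2:3b") is used as-is.
    if requested in available_set:
        return requested

    # Otherwise try the implicit ":latest" tag, as `ollama run llama3.2` does.
    # If `requested` already carries a tag, "name:tag:latest" never matches.
    latest = f"{requested}:latest"
    if latest in available_set:
        logger.warning(
            "Imprecise model name %r was used but 'latest' is available; using %r",
            requested,
            latest,
        )
        return latest

    raise ValueError(
        f"Model '{requested}' is not available in Ollama. "
        f"Available models: {', '.join(sorted(available_set))}"
    )


if __name__ == "__main__":
    pulled = ["llama3.2:latest", "all-minilm:latest"]
    print(resolve_ollama_model("llama3.2", pulled))  # llama3.2:latest
```

Checking for an exact match first leaves explicitly tagged names such as `llama3.2:3b` untouched; only a name that is missing as given falls back to its `:latest` variant, and anything else still raises the same kind of error shown in the pre-change log.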
anthropic	feat(providers): Groq now uses LiteLLM openai-compat (#1303)	2025-02-27 13:16:50 -08:00
bedrock	fix: 100% OpenAI API verification for together and fireworks (#1946)	2025-04-14 08:56:29 -07:00
cerebras	fix: 100% OpenAI API verification for together and fireworks (#1946)	2025-04-14 08:56:29 -07:00
cerebras_openai_compat	test: verification on provider's OAI endpoints (#1893)	2025-04-07 23:06:28 -07:00
databricks	fix: 100% OpenAI API verification for together and fireworks (#1946)	2025-04-14 08:56:29 -07:00
fireworks	fix: 100% OpenAI API verification for together and fireworks (#1946)	2025-04-14 08:56:29 -07:00
fireworks_openai_compat	test: verification on provider's OAI endpoints (#1893)	2025-04-07 23:06:28 -07:00
gemini	feat(providers): Groq now uses LiteLLM openai-compat (#1303)	2025-02-27 13:16:50 -08:00
groq	fix: 100% OpenAI API verification for together and fireworks (#1946)	2025-04-14 08:56:29 -07:00
groq_openai_compat	test: verification on provider's OAI endpoints (#1893)	2025-04-07 23:06:28 -07:00
nvidia	fix: 100% OpenAI API verification for together and fireworks (#1946)	2025-04-14 08:56:29 -07:00
ollama	feat: allow ollama to use 'latest' if available but not specified (#1903)	2025-04-14 09:03:54 -07:00
openai	feat(providers): Groq now uses LiteLLM openai-compat (#1303)	2025-02-27 13:16:50 -08:00
passthrough	fix: 100% OpenAI API verification for together and fireworks (#1946)	2025-04-14 08:56:29 -07:00
runpod	fix: 100% OpenAI API verification for together and fireworks (#1946)	2025-04-14 08:56:29 -07:00
sambanova	fix: 100% OpenAI API verification for together and fireworks (#1946)	2025-04-14 08:56:29 -07:00
sambanova_openai_compat	test: verification on provider's OAI endpoints (#1893)	2025-04-07 23:06:28 -07:00
tgi	fix: 100% OpenAI API verification for together and fireworks (#1946)	2025-04-14 08:56:29 -07:00
together	fix: 100% OpenAI API verification for together and fireworks (#1946)	2025-04-14 08:56:29 -07:00
together_openai_compat	test: verification on provider's OAI endpoints (#1893)	2025-04-07 23:06:28 -07:00
vllm	fix: 100% OpenAI API verification for together and fireworks (#1946)	2025-04-14 08:56:29 -07:00
__init__.py	`impls` -> `inline`, `adapters` -> `remote` (#381)	2024-11-06 14:54:05 -08:00