feat: remote ramalama provider implementation

Implement a remote ramalama provider, using AsyncOpenAI as the client since ramalama does not ship its own async client library.
Ramalama is similar to ollama in that it is a lightweight local inference server; however, it runs in containerized mode by default.

RAMALAMA_URL is http://localhost:8080 by default
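
Since ramalama exposes an OpenAI-compatible API, the adapter mostly just needs a standard AsyncOpenAI client pointed at RAMALAMA_URL. A minimal sketch of that wiring (not the adapter code itself; the /v1 path, the dummy api_key, and the chat() helper are assumptions, following how llama.cpp-style servers are usually exposed):

    import os
    from openai import AsyncOpenAI

    # Assumed default; matches the RAMALAMA_URL mentioned above.
    RAMALAMA_URL = os.environ.get("RAMALAMA_URL", "http://localhost:8080")

    # ramalama serves an OpenAI-compatible API, so a plain AsyncOpenAI client
    # is enough; the api_key is not checked by the local server, but the
    # client constructor expects a non-empty value.
    client = AsyncOpenAI(base_url=f"{RAMALAMA_URL}/v1", api_key="ramalama")

    async def chat(model: str, prompt: str) -> str:
        # Send a single user message and return the completion text.
        resp = await client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content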

Signed-off-by: Charlie Doern <cdoern@redhat.com>
Charlie Doern 2025-03-11 18:15:45 -04:00
parent 94f83382eb
commit 4de45560bf
8 changed files with 680 additions and 0 deletions

@@ -77,6 +77,15 @@ def available_providers() -> List[ProviderSpec]:
                 module="llama_stack.providers.remote.inference.ollama",
             ),
         ),
+        remote_provider_spec(
+            api=Api.inference,
+            adapter=AdapterSpec(
+                adapter_type="ramalama",
+                pip_packages=["ramalama", "aiohttp"],
+                config_class="llama_stack.providers.remote.inference.ramalama.RamalamaImplConfig",
+                module="llama_stack.providers.remote.inference.ramalama",
+            ),
+        ),
         remote_provider_spec(
             api=Api.inference,
             adapter=AdapterSpec(
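
For reference, a sketch of what the RamalamaImplConfig referenced in the registration above might look like, modeled on the ollama provider's config class; the field name and default here are assumptions, not the committed code:

    from pydantic import BaseModel

    DEFAULT_RAMALAMA_URL = "http://localhost:8080"

    class RamalamaImplConfig(BaseModel):
        # Base URL of the running ramalama server; typically overridden via
        # RAMALAMA_URL in a distribution's run config.
        url: str = DEFAULT_RAMALAMA_URL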