feat: remote ramalama provider implementation

Implement a remote ramalama provider, using AsyncOpenAI as the client since ramalama does not ship its own async client library.
Ramalama is similar to ollama in that it is a lightweight local inference server; however, it runs in containerized mode by default.

RAMALAMA_URL is http://localhost:8080 by default
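
Since ramalama exposes an OpenAI-compatible API, the adapter mostly just needs a standard AsyncOpenAI client pointed at RAMALAMA_URL. A minimal sketch of that wiring (not the adapter code itself; the /v1 path, the dummy api_key, and the chat() helper are assumptions, following how llama.cpp-style servers are usually exposed):

    import os
    from openai import AsyncOpenAI

    # Assumed default; matches the RAMALAMA_URL mentioned above.
    RAMALAMA_URL = os.environ.get("RAMALAMA_URL", "http://localhost:8080")

    # ramalama serves an OpenAI-compatible API, so a plain AsyncOpenAI client
    # is enough; the api_key is not checked by the local server, but the
    # client constructor expects a non-empty value.
    client = AsyncOpenAI(base_url=f"{RAMALAMA_URL}/v1", api_key="ramalama")

    async def chat(model: str, prompt: str) -> str:
        # Send a single user message and return the completion text.
        resp = await client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content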

Signed-off-by: Charlie Doern <cdoern@redhat.com>
Charlie Doern 2025-03-11 18:15:45 -04:00
parent 94f83382eb
commit 4de45560bf
8 changed files with 680 additions and 0 deletions

@@ -77,6 +77,15 @@ def available_providers() -> List[ProviderSpec]:
                 module="llama_stack.providers.remote.inference.ollama",
             ),
         ),
+        remote_provider_spec(
+            api=Api.inference,
+            adapter=AdapterSpec(
+                adapter_type="ramalama",
+                pip_packages=["ramalama", "aiohttp"],
+                config_class="llama_stack.providers.remote.inference.ramalama.RamalamaImplConfig",
+                module="llama_stack.providers.remote.inference.ramalama",
+            ),
+        ),
         remote_provider_spec(
             api=Api.inference,
             adapter=AdapterSpec(
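
For reference, a sketch of what the RamalamaImplConfig referenced in the registration above might look like, modeled on the ollama provider's config class; the field name and default here are assumptions, not the committed code:

    from pydantic import BaseModel

    DEFAULT_RAMALAMA_URL = "http://localhost:8080"

    class RamalamaImplConfig(BaseModel):
        # Base URL of the running ramalama server; typically overridden via
        # RAMALAMA_URL in a distribution's run config.
        url: str = DEFAULT_RAMALAMA_URL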