RamaLama is a fully open source tool that facilitates local
management of AI models.
https://github.com/containers/ramalama
It supports pulling models from HuggingFace, Ollama, and OCI images,
as well as from file://, http://, and https:// URIs.
It uses the llama.cpp and vLLM engines to run models, and it defaults
to running models inside containers.
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Implement a remote RamaLama provider using AsyncOpenAI as the client,
since RamaLama does not ship its own async client library.
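
As a minimal sketch of the approach (the model name and the /v1 path
suffix are assumptions for illustration, not the exact provider code),
the client can be pointed at RamaLama's OpenAI-compatible endpoint:

    import os

    from openai import AsyncOpenAI

    # RamaLama has no async client library of its own, so we talk to
    # its OpenAI-compatible endpoint through AsyncOpenAI instead.
    client = AsyncOpenAI(
        base_url=os.environ.get("RAMALAMA_URL", "http://localhost:8080") + "/v1",
        api_key="unused",  # local server; the client requires a key but it is ignored
    )

    async def chat(prompt: str) -> str:
        # Model name is illustrative; use whatever model `ramalama serve` loaded.
        resp = await client.chat.completions.create(
            model="llama3",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content or ""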
RamaLama is similar to Ollama in that it is a lightweight local inference server. However, it runs in containerized mode by default.
RAMALAMA_URL defaults to http://localhost:8080.
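
As a quick check of that default, a sketch that lists the models a
local RamaLama instance is serving (this assumes the server exposes
the OpenAI-compatible /v1/models route, as llama.cpp's server does):

    import asyncio
    import os

    from openai import AsyncOpenAI

    async def main() -> None:
        # Falls back to the default RamaLama address unless RAMALAMA_URL is set.
        client = AsyncOpenAI(
            base_url=os.environ.get("RAMALAMA_URL", "http://localhost:8080") + "/v1",
            api_key="unused",
        )
        models = await client.models.list()
        for m in models.data:
            print(m.id)

    asyncio.run(main())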
Signed-off-by: Charlie Doern <cdoern@redhat.com>