mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-06-30 03:44:20 +00:00
* Removes a bunch of distros * Removed distros were added into the "starter" distribution * Doc for "starter" has been added * Partially reverts https://github.com/meta-llama/llama-stack/pull/2482 since inference providers are disabled by default and can be turned on manually via env variable. * Disables safety in starter distro Closes: #2502 Signed-off-by: Sébastien Han <seb@redhat.com>
1.2 KiB
1.2 KiB
Using Llama Stack as a Library
Setup Llama Stack without a Server
If you are planning to use an external service for Inference (even Ollama or TGI counts as external), it is often easier to use Llama Stack as a library. This avoids the overhead of setting up a server.
# setup
uv pip install llama-stack
llama stack build --template starter --image-type venv
from llama_stack.distribution.library_client import LlamaStackAsLibraryClient
client = LlamaStackAsLibraryClient(
"ollama",
# provider_data is optional, but if you need to pass in any provider specific data, you can do so here.
provider_data={"tavily_search_api_key": os.environ["TAVILY_SEARCH_API_KEY"]},
)
client.initialize()
This will parse your config and set up any inline implementations and remote clients needed for your implementation.
Then, you can access the APIs like models
and inference
on the client and call their methods directly:
response = client.models.list()
If you've created a custom distribution, you can also use the run.yaml configuration file directly:
client = LlamaStackAsLibraryClient(config_path)
client.initialize()