mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-10-04 04:04:14 +00:00
---
title: Using Llama Stack as a Library
description: How to use Llama Stack as a Python library instead of running a server
sidebar_label: Importing as Library
sidebar_position: 5
---

# Using Llama Stack as a Library

## Setup Llama Stack without a Server

If you plan to use an external service for inference (even Ollama or TGI counts as external), it is often easier to use Llama Stack as a library.
This avoids the overhead of setting up a server.

```bash
# Install Llama Stack, then build the starter distribution into a virtual environment
uv pip install llama-stack
llama stack build --distro starter --image-type venv
```

```python
import os

from llama_stack.core.library_client import LlamaStackAsLibraryClient

client = LlamaStackAsLibraryClient(
    "starter",
    # provider_data is optional; use it to pass any provider-specific data, such as API keys.
    provider_data={"tavily_search_api_key": os.environ["TAVILY_SEARCH_API_KEY"]},
)
```

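Because `os.environ["TAVILY_SEARCH_API_KEY"]` raises `KeyError` when the variable is unset, it can help to build `provider_data` defensively. The following is a minimal sketch, not part of Llama Stack: `build_provider_data` is a hypothetical helper, and the commented-out call assumes the constructor accepts `provider_data=None`.

```python
import os


def build_provider_data(env=os.environ):
    """Return a provider_data dict containing only keys whose variables are set.

    Hypothetical helper for illustration; it is not provided by Llama Stack.
    """
    provider_data = {}
    tavily_key = env.get("TAVILY_SEARCH_API_KEY")
    if tavily_key:
        provider_data["tavily_search_api_key"] = tavily_key
    return provider_data


# Possible usage (assumes llama-stack is installed and the starter distribution is built):
# client = LlamaStackAsLibraryClient(
#     "starter",
#     provider_data=build_provider_data() or None,
# )
```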
Constructing the client parses your configuration and sets up any inline implementations and remote clients that your distribution needs.

You can then access APIs such as `models` and `inference` on the client and call their methods directly:

```python
response = client.models.list()
```

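As a rough illustration of working with that response, the sketch below collects an identifier for each returned model. The attribute names `identifier` and `id` are assumptions about the shape of the returned entries, and `model_identifiers` is a hypothetical helper; stand-in objects are used so the snippet runs without a configured stack.

```python
from types import SimpleNamespace


def model_identifiers(models):
    """Collect an identifier string for each model entry.

    Assumption: entries expose `identifier` or `id`; entries with neither are skipped.
    """
    identifiers = []
    for model in models:
        identifier = getattr(model, "identifier", None) or getattr(model, "id", None)
        if identifier is not None:
            identifiers.append(identifier)
    return identifiers


# Stand-in entries; with a real client you would pass client.models.list() instead.
sample = [
    SimpleNamespace(identifier="model-a"),
    SimpleNamespace(id="model-b"),
    SimpleNamespace(),
]
print(model_identifiers(sample))
```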
If you've created a [custom distribution](./building_distro), you can also pass the path to its `run.yaml` configuration file directly:

```python
# config_path is the path to your distribution's run.yaml file
client = LlamaStackAsLibraryClient(config_path)
```

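When passing a configuration path, it may be worth validating it first so that a typo fails early with a clear message. This is a minimal sketch under that assumption: `resolve_run_config` is a hypothetical helper, not part of Llama Stack, and the client call is left commented out.

```python
from pathlib import Path


def resolve_run_config(path_str):
    """Expand and resolve a run.yaml path, raising if the file does not exist.

    Hypothetical helper for illustration; it is not provided by Llama Stack.
    """
    path = Path(path_str).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Llama Stack config not found: {path}")
    return str(path)


# Possible usage (assumes llama-stack is installed and config_path is defined):
# client = LlamaStackAsLibraryClient(resolve_run_config(config_path))
```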
## Benefits of Library Mode

- **No server overhead**: Direct Python API calls without HTTP requests
- **Simplified deployment**: No need to manage server processes
- **Better integration**: Seamlessly embed in existing Python applications
- **Reduced latency**: Eliminate network round-trips for inline providers

## Use Cases

Library mode is a good fit for:

- Applications that use external services for most APIs (Ollama, remote inference providers, etc.)
- Python applications that need Llama Stack functionality
- Prototyping and development workflows
- Serverless or container environments where you want minimal overhead

## Related Guides

- **[Building Custom Distributions](./building_distro)** - Create your own distribution for library use
- **[Configuration Reference](./configuration)** - Understanding the configuration format
- **[Starting Llama Stack Server](./starting_llama_stack_server)** - Alternative server-based deployment