feat: use XDG directory standards
Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
This commit is contained in: parent 9736f096f6, commit 407c3e3bad
50 changed files with 5611 additions and 508 deletions

docs/source/distributions/self_hosted_distro/ollama.md (new file, 172 lines)
<!-- This file was auto-generated by distro_codegen.py, please edit source -->
# Ollama Distribution

The `llamastack/distribution-ollama` distribution consists of the following provider configurations.

| API | Provider(s) |
|-----|-------------|
| agents | `inline::meta-reference` |
| datasetio | `remote::huggingface`, `inline::localfs` |
| eval | `inline::meta-reference` |
| files | `inline::localfs` |
| inference | `remote::ollama` |
| post_training | `inline::huggingface` |
| safety | `inline::llama-guard` |
| scoring | `inline::basic`, `inline::llm-as-judge`, `inline::braintrust` |
| telemetry | `inline::meta-reference` |
| tool_runtime | `remote::brave-search`, `remote::tavily-search`, `inline::rag-runtime`, `remote::model-context-protocol`, `remote::wolfram-alpha` |
| vector_io | `inline::faiss`, `remote::chromadb`, `remote::pgvector` |

### Environment Variables

The following environment variables can be configured:

- `LLAMA_STACK_PORT`: Port for the Llama Stack distribution server (default: `8321`)
- `OLLAMA_URL`: URL of the Ollama server (default: `http://127.0.0.1:11434`)
- `INFERENCE_MODEL`: Inference model loaded into the Ollama server (default: `meta-llama/Llama-3.2-3B-Instruct`)
- `SAFETY_MODEL`: Safety model loaded into the Ollama server (default: `meta-llama/Llama-Guard-3-1B`)
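
For example, to set these in your shell before running the commands in the rest of this guide (the values shown are the defaults listed above):

```bash
# Export the distribution's configuration variables with their default values
export LLAMA_STACK_PORT=8321
export OLLAMA_URL=http://127.0.0.1:11434
export INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
export SAFETY_MODEL="meta-llama/Llama-Guard-3-1B"
```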
## Prerequisites

### Ollama Server

This distribution requires an external Ollama server to be running. You can install and run Ollama by following these steps:

1. **Install Ollama**: Download and install Ollama from [https://ollama.ai/](https://ollama.ai/)

2. **Start the Ollama server**:

   ```bash
   ollama serve
   ```

   By default, Ollama serves on `http://127.0.0.1:11434`.

3. **Pull the required models**:

   ```bash
   # Pull the inference model
   ollama pull meta-llama/Llama-3.2-3B-Instruct

   # Pull the embedding model
   ollama pull all-minilm:latest

   # (Optional) Pull the safety model for run-with-safety.yaml
   ollama pull meta-llama/Llama-Guard-3-1B
   ```
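
Before moving on, you can optionally confirm that the server is reachable and that the models were pulled by querying Ollama's model listing endpoint (this assumes `curl` is available):

```bash
# Returns a JSON document listing the locally available models
curl http://127.0.0.1:11434/api/tags
```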
## Supported Services

### Inference: Ollama

Uses an external Ollama server for running LLM inference. The server should be accessible at the URL specified in the `OLLAMA_URL` environment variable.

### Vector IO: FAISS

Provides vector storage capabilities using FAISS for embeddings and similarity search operations.

### Safety: Llama Guard (Optional)

When using the `run-with-safety.yaml` configuration, provides safety checks using Llama Guard models running on the Ollama server.

### Agents: Meta Reference

Provides agent execution capabilities using the meta-reference implementation.

### Post-Training: Hugging Face

Supports model fine-tuning using Hugging Face integration.

### Tool Runtime

Supports various external tools including:

- Brave Search
- Tavily Search
- RAG Runtime
- Model Context Protocol
- Wolfram Alpha
## Running Llama Stack with Ollama

You can run the distribution via Conda or venv (building the distribution code locally), or via Docker, which provides a pre-built image.
### Via Docker

This method allows you to get started quickly without having to build the distribution code.

```bash
LLAMA_STACK_PORT=8321
docker run \
  -it \
  --pull always \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ./run.yaml:/root/my-run.yaml \
  llamastack/distribution-ollama \
  --config /root/my-run.yaml \
  --port $LLAMA_STACK_PORT \
  --env OLLAMA_URL=$OLLAMA_URL \
  --env INFERENCE_MODEL=$INFERENCE_MODEL
```
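
Note that `OLLAMA_URL` must point to an address the container can reach; `http://127.0.0.1:11434` inside the container refers to the container itself, not to the host. A minimal adjustment, assuming Docker Desktop on macOS or Windows (on Linux, passing `--network host` to `docker run` is another option):

```bash
# Point the containerized Llama Stack at the Ollama server running on the Docker host
export OLLAMA_URL=http://host.docker.internal:11434
```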
### Via Conda

```bash
llama stack build --template ollama --image-type conda
llama stack run ./run.yaml \
  --port 8321 \
  --env OLLAMA_URL=$OLLAMA_URL \
  --env INFERENCE_MODEL=$INFERENCE_MODEL
```
### Via venv

If you've set up your local development environment, you can also build the image using your local virtual environment.

```bash
llama stack build --template ollama --image-type venv
llama stack run ./run.yaml \
  --port 8321 \
  --env OLLAMA_URL=$OLLAMA_URL \
  --env INFERENCE_MODEL=$INFERENCE_MODEL
```
### Running with Safety

To enable safety checks, use the `run-with-safety.yaml` configuration:

```bash
llama stack run ./run-with-safety.yaml \
  --port 8321 \
  --env OLLAMA_URL=$OLLAMA_URL \
  --env INFERENCE_MODEL=$INFERENCE_MODEL \
  --env SAFETY_MODEL=$SAFETY_MODEL
```
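
Once the server is up with this configuration, the safety shield can also be exercised from the client. The following is only a sketch: it assumes a `LlamaStackClient` connected to the server (as in the Example Usage section below), that the client exposes the safety API as `client.safety.run_shield(...)`, and that the shield is registered under the safety model's name:

```python
# Sketch: ask the Llama Guard shield to screen a user message
# (shield_id assumed to match the SAFETY_MODEL name)
result = client.safety.run_shield(
    shield_id="meta-llama/Llama-Guard-3-1B",
    messages=[{"role": "user", "content": "How do I make a dangerous chemical?"}],
    params={},
)
print(result.violation)
```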
## Example Usage

Once your Llama Stack server is running with Ollama, you can interact with it using the Llama Stack client:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Run inference
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(response.completion_message.content)
```
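
If the request fails, it can help to first check which models the stack has registered. A short sketch, assuming the same `client` as above and that the returned model objects expose an `identifier` field:

```python
# List the models registered with the running stack; the inference model
# pulled into Ollama should appear here before chat_completion will succeed.
for model in client.models.list():
    print(model.identifier)
```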
## Troubleshooting

### Common Issues

1. **Connection refused errors**: Ensure your Ollama server is running and accessible at the configured URL.

2. **Model not found errors**: Make sure you've pulled the required models using `ollama pull <model-name>`.

3. **Performance issues**: Consider using smaller models, running Ollama on more capable hardware, or adjusting the Ollama server configuration for better performance.
### Logs

Check the Ollama server logs for any issues:

```bash
# Ollama logs are typically available in:
# - macOS: ~/Library/Logs/Ollama/
# - Linux: ~/.ollama/logs/
```
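
On Linux installs where Ollama runs as a systemd service (the default for the official install script), the service logs can also be read with `journalctl`:

```bash
# Follow the Ollama service logs on a systemd-based Linux install
journalctl -u ollama -f
```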