# Ollama Distribution

The `llamastack/distribution-{{ name }}` distribution consists of the following provider configurations.

{{ providers_table }}
{% if run_config_env_vars %}
### Environment Variables
The following environment variables can be configured:
{% for var, (default_value, description) in run_config_env_vars.items() %}
- `{{ var }}`: {{ description }} (default: `{{ default_value }}`)
{% endfor %}
{% endif %}
{% if default_models %}
### Models
The following models are available by default:
{% for model in default_models %}
- `{{ model.model_id }} {{ model.doc_string }}`
{% endfor %}
{% endif %}
## Prerequisites
### Ollama Server
This distribution requires an external Ollama server to be running. You can install and run Ollama by following these steps:
1. **Install Ollama**: Download and install Ollama from [https://ollama.ai/](https://ollama.ai/)
2. **Start the Ollama server**:
```bash
ollama serve
```
By default, Ollama serves on `http://127.0.0.1:11434`
3. **Pull the required models**:
```bash
# Pull the inference model (the Ollama tag corresponding to meta-llama/Llama-3.2-3B-Instruct)
ollama pull llama3.2:3b-instruct-fp16

# Pull the embedding model
ollama pull all-minilm:latest

# (Optional) Pull the safety model for run-with-safety.yaml
# (the Ollama tag corresponding to meta-llama/Llama-Guard-3-1B)
ollama pull llama-guard3:1b
```
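Once the server is running and the models are pulled, you can verify the setup against Ollama's HTTP API (assuming the default address shown above):
```bash
# List the models Ollama has available locally; requires the server to be running
curl http://127.0.0.1:11434/api/tags

# Equivalent check using the Ollama CLI
ollama list
```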
## Supported Services
### Inference: Ollama
Uses an external Ollama server for running LLM inference. The server should be accessible at the URL specified in the `OLLAMA_URL` environment variable.
### Vector IO: FAISS
Provides vector storage capabilities using FAISS for embeddings and similarity search operations.
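As a rough sketch of how this is used from the client once the stack is running (the `vector_db_id` and embedding settings below are illustrative assumptions, not values fixed by this template):
```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Register a FAISS-backed vector database; "my-documents" is a hypothetical name
client.vector_dbs.register(
    vector_db_id="my-documents",
    embedding_model="all-MiniLM-L6-v2",  # served via Ollama's all-minilm
    embedding_dimension=384,
)
```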
### Safety: Llama Guard (Optional)
When using the `run-with-safety.yaml` configuration, provides safety checks using Llama Guard models running on the Ollama server.
### Agents: Meta Reference
Provides agent execution capabilities using the meta-reference implementation.
### Post-Training: Hugging Face
Supports model fine-tuning using Hugging Face integration.
### Tool Runtime
Supports various external tools including:
- Brave Search
- Tavily Search
- RAG Runtime
- Model Context Protocol
- Wolfram Alpha
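To see which of these tool runtimes are exposed on a running server, you can list the registered tool groups through the client (a minimal sketch, assuming the server is already up on the default port):
```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Print the identifier of each tool group registered with the server
for toolgroup in client.toolgroups.list():
    print(toolgroup.identifier)
```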
## Running Llama Stack with Ollama
You can run Llama Stack with Ollama either via Docker, which uses a pre-built image, or by building the distribution locally with Conda or venv.
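The commands below reference `OLLAMA_URL` and `INFERENCE_MODEL`; export them first (the values shown are typical for a local setup and should be adjusted to match your environment):
```bash
export OLLAMA_URL=http://localhost:11434
export INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
```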
### Via Docker
This method allows you to get started quickly without having to build the distribution code.
```bash
LLAMA_STACK_PORT=8321
docker run \
  -it \
  --pull always \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ./run.yaml:/root/my-run.yaml \
  llamastack/distribution-{{ name }} \
  --config /root/my-run.yaml \
  --port $LLAMA_STACK_PORT \
  --env OLLAMA_URL=$OLLAMA_URL \
  --env INFERENCE_MODEL=$INFERENCE_MODEL
```
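Note that when Ollama runs on the host machine, `localhost` inside the container does not point to it. On Docker Desktop (macOS/Windows) the host is usually reachable as `host.docker.internal`; on Linux you can add `--add-host=host.docker.internal:host-gateway` to the `docker run` command. For example:
```bash
# Point the containerized Llama Stack at an Ollama server running on the Docker host
export OLLAMA_URL=http://host.docker.internal:11434
```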
### Via Conda
```bash
llama stack build --template ollama --image-type conda
llama stack run ./run.yaml \
  --port 8321 \
  --env OLLAMA_URL=$OLLAMA_URL \
  --env INFERENCE_MODEL=$INFERENCE_MODEL
```
### Via venv
If you've set up your local development environment, you can also build the image using your local virtual environment.
```bash
llama stack build --template ollama --image-type venv
llama stack run ./run.yaml \
  --port 8321 \
  --env OLLAMA_URL=$OLLAMA_URL \
  --env INFERENCE_MODEL=$INFERENCE_MODEL
```
### Running with Safety
To enable safety checks, use the `run-with-safety.yaml` configuration:
```bash
llama stack run ./run-with-safety.yaml \
  --port 8321 \
  --env OLLAMA_URL=$OLLAMA_URL \
  --env INFERENCE_MODEL=$INFERENCE_MODEL \
  --env SAFETY_MODEL=$SAFETY_MODEL
```
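`SAFETY_MODEL` should reference the Llama Guard model registered with the stack; a typical value matching the optional model pulled earlier (adjust if your configuration differs):
```bash
export SAFETY_MODEL="meta-llama/Llama-Guard-3-1B"
```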
## Example Usage
Once your Llama Stack server is running with Ollama, you can interact with it using the Llama Stack client:
```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Run inference
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(response.completion_message.content)
```
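You can also confirm which models the stack has registered, which helps when a `model_id` is rejected (a small self-contained sketch):
```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# List the models registered with the Llama Stack server
for model in client.models.list():
    print(model.identifier)
```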
## Troubleshooting
### Common Issues
1. **Connection refused errors**: Ensure your Ollama server is running and accessible at the configured URL.
2. **Model not found errors**: Make sure you've pulled the required models using `ollama pull <model-name>`.
3. **Performance issues**: Consider using a smaller or quantized model, running Ollama on more capable hardware (e.g. with GPU acceleration), or tuning the Ollama server configuration.
### Logs
Check the Ollama server logs for any issues:
```bash
# macOS (Ollama app): server logs are typically written under ~/.ollama/logs/
cat ~/.ollama/logs/server.log

# Linux (systemd service): view the service logs with journalctl
journalctl -u ollama
```