# Ollama Distribution

The `llamastack/distribution-ollama` distribution consists of the following provider configurations.

| API | Provider(s) |
|-----|-------------|
| agents | `inline::meta-reference` |
| datasetio | `remote::huggingface`, `inline::localfs` |
| eval | `inline::meta-reference` |
| files | `inline::localfs` |
| inference | `remote::ollama` |
| post_training | `inline::huggingface` |
| safety | `inline::llama-guard` |
| scoring | `inline::basic`, `inline::llm-as-judge`, `inline::braintrust` |
| telemetry | `inline::meta-reference` |
| tool_runtime | `remote::brave-search`, `remote::tavily-search`, `inline::rag-runtime`, `remote::model-context-protocol`, `remote::wolfram-alpha` |
| vector_io | `inline::faiss`, `remote::chromadb`, `remote::pgvector` |

### Environment Variables

The following environment variables can be configured:

- `LLAMA_STACK_PORT`: Port for the Llama Stack distribution server (default: `8321`)
- `OLLAMA_URL`: URL of the Ollama server (default: `http://127.0.0.1:11434`)
- `INFERENCE_MODEL`: Inference model loaded into the Ollama server (default: `meta-llama/Llama-3.2-3B-Instruct`)
- `SAFETY_MODEL`: Safety model loaded into the Ollama server (default: `meta-llama/Llama-Guard-3-1B`)

## Prerequisites

### Ollama Server

This distribution requires an external Ollama server to be running. You can install and run Ollama by following these steps:

1. **Install Ollama**: Download and install Ollama from [https://ollama.ai/](https://ollama.ai/)

2. **Start the Ollama server**:

   ```bash
   ollama serve
   ```

   By default, Ollama serves on `http://127.0.0.1:11434`.

3. **Pull the required models**. Note that Ollama uses its own model tags, while Llama Stack refers to the same models by their Llama model IDs (the `INFERENCE_MODEL` and `SAFETY_MODEL` values above):

   ```bash
   # Pull the inference model (Ollama's tag for Llama-3.2-3B-Instruct)
   ollama pull llama3.2:3b-instruct-fp16

   # Pull the embedding model
   ollama pull all-minilm:latest

   # (Optional) Pull the safety model for run-with-safety.yaml
   # (Ollama's tag for Llama-Guard-3-1B)
   ollama pull llama-guard3:1b
   ```

## Supported Services

### Inference: Ollama

Uses an external Ollama server for LLM inference. The server must be reachable at the URL given by the `OLLAMA_URL` environment variable.

### Vector IO: FAISS

Provides vector storage using FAISS for embedding storage and similarity search.

### Safety: Llama Guard (Optional)

When using the `run-with-safety.yaml` configuration, provides safety checks with Llama Guard models running on the Ollama server.

### Agents: Meta Reference

Provides agent execution capabilities using the meta-reference implementation.

### Post-Training: Hugging Face

Supports model fine-tuning via the Hugging Face integration.

### Tool Runtime

Supports external tools including:

- Brave Search
- Tavily Search
- RAG Runtime
- Model Context Protocol
- Wolfram Alpha

## Running Llama Stack with Ollama

You can run the distribution via Conda or venv (building the distribution code locally), or via Docker, which uses a pre-built image.

### Via Docker

This method lets you get started quickly without having to build the distribution code.
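The commands in the sections below reference `OLLAMA_URL`, `INFERENCE_MODEL`, and `SAFETY_MODEL` without setting them first. Here is a minimal sketch of exporting them, using the defaults from the Environment Variables section above; adjust the values to your setup:

```bash
# Defaults taken from the Environment Variables section; adjust as needed.
export OLLAMA_URL=http://127.0.0.1:11434
export INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
export SAFETY_MODEL=meta-llama/Llama-Guard-3-1B   # only needed for run-with-safety.yaml
```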
```bash
export LLAMA_STACK_PORT=8321
docker run \
  -it \
  --pull always \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ./run.yaml:/root/my-run.yaml \
  llamastack/distribution-ollama \
  --config /root/my-run.yaml \
  --port $LLAMA_STACK_PORT \
  --env OLLAMA_URL=$OLLAMA_URL \
  --env INFERENCE_MODEL=$INFERENCE_MODEL
```

When the server runs in Docker, `OLLAMA_URL` must point to an address reachable from inside the container: `http://127.0.0.1:11434` refers to the container itself, so use, for example, `http://host.docker.internal:11434` with Docker Desktop (macOS/Windows), or run the container with `--network host` on Linux.

### Via Conda

```bash
llama stack build --template ollama --image-type conda
llama stack run ./run.yaml \
  --port 8321 \
  --env OLLAMA_URL=$OLLAMA_URL \
  --env INFERENCE_MODEL=$INFERENCE_MODEL
```

### Via venv

If you've set up your local development environment, you can also build the image using your local virtual environment.

```bash
llama stack build --template ollama --image-type venv
llama stack run ./run.yaml \
  --port 8321 \
  --env OLLAMA_URL=$OLLAMA_URL \
  --env INFERENCE_MODEL=$INFERENCE_MODEL
```

### Running with Safety

To enable safety checks, use the `run-with-safety.yaml` configuration:

```bash
llama stack run ./run-with-safety.yaml \
  --port 8321 \
  --env OLLAMA_URL=$OLLAMA_URL \
  --env INFERENCE_MODEL=$INFERENCE_MODEL \
  --env SAFETY_MODEL=$SAFETY_MODEL
```

## Example Usage

Once your Llama Stack server is running with Ollama, you can interact with it using the Llama Stack client:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Run inference
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(response.completion_message.content)
```

## Troubleshooting

### Common Issues

1. **Connection refused errors**: Ensure the Ollama server is running and reachable at the configured `OLLAMA_URL`.
2. **Model not found errors**: Make sure you've pulled the required models with `ollama pull <model-name>`.
3. **Performance issues**: If inference is slow, consider using a smaller or more heavily quantized model, or adjust the Ollama server configuration.

### Logs

Check the Ollama server logs for any issues:

```bash
# Ollama logs are typically available in:
# - macOS: ~/.ollama/logs/server.log
# - Linux (systemd install): journalctl -u ollama
```
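As a quick connectivity check (a sketch that assumes the default `OLLAMA_URL`), you can query the Ollama HTTP API directly; if this request fails, Llama Stack will not be able to reach the server either:

```bash
# Lists the models the Ollama server has pulled; a connection error here means
# the server is not running or is not reachable at this address.
curl http://127.0.0.1:11434/api/tags
```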