feat: use XDG directory standards
Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
This commit is contained in: parent 9736f096f6, commit 407c3e3bad
50 changed files with 5611 additions and 508 deletions

docs/source/distributions/self_hosted_distro/ollama.md (new file, 172 lines)
<!-- This file was auto-generated by distro_codegen.py, please edit source -->
# Ollama Distribution

The `llamastack/distribution-ollama` distribution consists of the following provider configurations.

| API | Provider(s) |
|-----|-------------|
| agents | `inline::meta-reference` |
| datasetio | `remote::huggingface`, `inline::localfs` |
| eval | `inline::meta-reference` |
| files | `inline::localfs` |
| inference | `remote::ollama` |
| post_training | `inline::huggingface` |
| safety | `inline::llama-guard` |
| scoring | `inline::basic`, `inline::llm-as-judge`, `inline::braintrust` |
| telemetry | `inline::meta-reference` |
| tool_runtime | `remote::brave-search`, `remote::tavily-search`, `inline::rag-runtime`, `remote::model-context-protocol`, `remote::wolfram-alpha` |
| vector_io | `inline::faiss`, `remote::chromadb`, `remote::pgvector` |

### Environment Variables

The following environment variables can be configured:

- `LLAMA_STACK_PORT`: Port for the Llama Stack distribution server (default: `8321`)
- `OLLAMA_URL`: URL of the Ollama server (default: `http://127.0.0.1:11434`)
- `INFERENCE_MODEL`: Inference model loaded into the Ollama server (default: `meta-llama/Llama-3.2-3B-Instruct`)
- `SAFETY_MODEL`: Safety model loaded into the Ollama server (default: `meta-llama/Llama-Guard-3-1B`)
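
For example, to set these in your shell before running the commands in the rest of this guide (the values shown are the defaults listed above):

```bash
# Export the distribution's configuration variables with their default values
export LLAMA_STACK_PORT=8321
export OLLAMA_URL=http://127.0.0.1:11434
export INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
export SAFETY_MODEL="meta-llama/Llama-Guard-3-1B"
```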
## Prerequisites

### Ollama Server

This distribution requires an external Ollama server to be running. You can install and run Ollama by following these steps:

1. **Install Ollama**: Download and install Ollama from [https://ollama.ai/](https://ollama.ai/)

2. **Start the Ollama server**:

   ```bash
   ollama serve
   ```

   By default, Ollama serves on `http://127.0.0.1:11434`.

3. **Pull the required models**:

   ```bash
   # Pull the inference model
   ollama pull meta-llama/Llama-3.2-3B-Instruct

   # Pull the embedding model
   ollama pull all-minilm:latest

   # (Optional) Pull the safety model for run-with-safety.yaml
   ollama pull meta-llama/Llama-Guard-3-1B
   ```
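
Before moving on, you can optionally confirm that the server is reachable and that the models were pulled by querying Ollama's model listing endpoint (this assumes `curl` is available):

```bash
# Returns a JSON document listing the locally available models
curl http://127.0.0.1:11434/api/tags
```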
## Supported Services

### Inference: Ollama

Uses an external Ollama server for running LLM inference. The server should be accessible at the URL specified in the `OLLAMA_URL` environment variable.

### Vector IO: FAISS

Provides vector storage capabilities using FAISS for embeddings and similarity search operations.

### Safety: Llama Guard (Optional)

When using the `run-with-safety.yaml` configuration, provides safety checks using Llama Guard models running on the Ollama server.

### Agents: Meta Reference

Provides agent execution capabilities using the meta-reference implementation.

### Post-Training: Hugging Face

Supports model fine-tuning using Hugging Face integration.

### Tool Runtime

Supports various external tools including:

- Brave Search
- Tavily Search
- RAG Runtime
- Model Context Protocol
- Wolfram Alpha
## Running Llama Stack with Ollama

You can run the distribution via Conda or venv (building the distribution code locally), or via Docker, which provides a pre-built image.
### Via Docker

This method allows you to get started quickly without having to build the distribution code.

```bash
LLAMA_STACK_PORT=8321
docker run \
  -it \
  --pull always \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ./run.yaml:/root/my-run.yaml \
  llamastack/distribution-ollama \
  --config /root/my-run.yaml \
  --port $LLAMA_STACK_PORT \
  --env OLLAMA_URL=$OLLAMA_URL \
  --env INFERENCE_MODEL=$INFERENCE_MODEL
```
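
Note that `OLLAMA_URL` must point to an address the container can reach; `http://127.0.0.1:11434` inside the container refers to the container itself, not to the host. A minimal adjustment, assuming Docker Desktop on macOS or Windows (on Linux, passing `--network host` to `docker run` is another option):

```bash
# Point the containerized Llama Stack at the Ollama server running on the Docker host
export OLLAMA_URL=http://host.docker.internal:11434
```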
### Via Conda

```bash
llama stack build --template ollama --image-type conda
llama stack run ./run.yaml \
  --port 8321 \
  --env OLLAMA_URL=$OLLAMA_URL \
  --env INFERENCE_MODEL=$INFERENCE_MODEL
```
### Via venv

If you've set up your local development environment, you can also build the image using your local virtual environment.

```bash
llama stack build --template ollama --image-type venv
llama stack run ./run.yaml \
  --port 8321 \
  --env OLLAMA_URL=$OLLAMA_URL \
  --env INFERENCE_MODEL=$INFERENCE_MODEL
```
### Running with Safety

To enable safety checks, use the `run-with-safety.yaml` configuration:

```bash
llama stack run ./run-with-safety.yaml \
  --port 8321 \
  --env OLLAMA_URL=$OLLAMA_URL \
  --env INFERENCE_MODEL=$INFERENCE_MODEL \
  --env SAFETY_MODEL=$SAFETY_MODEL
```
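
Once the server is up with this configuration, the safety shield can also be exercised from the client. The following is only a sketch: it assumes a `LlamaStackClient` connected to the server (as in the Example Usage section below), that the client exposes the safety API as `client.safety.run_shield(...)`, and that the shield is registered under the safety model's name:

```python
# Sketch: ask the Llama Guard shield to screen a user message
# (shield_id assumed to match the SAFETY_MODEL name)
result = client.safety.run_shield(
    shield_id="meta-llama/Llama-Guard-3-1B",
    messages=[{"role": "user", "content": "How do I make a dangerous chemical?"}],
    params={},
)
print(result.violation)
```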
## Example Usage

Once your Llama Stack server is running with Ollama, you can interact with it using the Llama Stack client:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Run inference
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(response.completion_message.content)
```
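
If the request fails, it can help to first check which models the stack has registered. A short sketch, assuming the same `client` as above and that the returned model objects expose an `identifier` field:

```python
# List the models registered with the running stack; the inference model
# pulled into Ollama should appear here before chat_completion will succeed.
for model in client.models.list():
    print(model.identifier)
```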
## Troubleshooting

### Common Issues

1. **Connection refused errors**: Ensure your Ollama server is running and accessible at the configured URL.

2. **Model not found errors**: Make sure you've pulled the required models using `ollama pull <model-name>`.

3. **Performance issues**: Consider using smaller models, running Ollama on more capable hardware, or adjusting the Ollama server configuration for better performance.
### Logs

Check the Ollama server logs for any issues:

```bash
# Ollama logs are typically available in:
# - macOS: ~/Library/Logs/Ollama/
# - Linux: ~/.ollama/logs/
```
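
On Linux installs where Ollama runs as a systemd service (the default for the official install script), the service logs can also be read with `journalctl`:

```bash
# Follow the Ollama service logs on a systemd-based Linux install
journalctl -u ollama -f
```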