# Ollama Distribution

The `llamastack/distribution-{{ name }}` distribution consists of the following provider configurations.

{{ providers_table }}

{% if run_config_env_vars %}
### Environment Variables

The following environment variables can be configured:

{% for var, (default_value, description) in run_config_env_vars.items() %}
- `{{ var }}`: {{ description }} (default: `{{ default_value }}`)
{% endfor %}
{% endif %}

{% if default_models %}
### Models

The following models are available by default:

{% for model in default_models %}
- `{{ model.model_id }} {{ model.doc_string }}`
{% endfor %}
{% endif %}

## Prerequisites

### Ollama Server

This distribution requires an external Ollama server to be running. You can install and run Ollama by following these steps:

1. **Install Ollama**: Download and install Ollama from [https://ollama.ai/](https://ollama.ai/).

2. **Start the Ollama server**:
   ```bash
   ollama serve
   ```
   By default, Ollama serves on `http://127.0.0.1:11434`.

3. **Pull the required models**:
   ```bash
   # Pull the inference model
   ollama pull meta-llama/Llama-3.2-3B-Instruct

   # Pull the embedding model
   ollama pull all-minilm:latest

   # (Optional) Pull the safety model for run-with-safety.yaml
   ollama pull meta-llama/Llama-Guard-3-1B
   ```

## Supported Services

### Inference: Ollama
Uses an external Ollama server for running LLM inference. The server should be accessible at the URL specified in the `OLLAMA_URL` environment variable.

### Vector IO: FAISS
Provides vector storage capabilities using FAISS for embeddings and similarity search operations.

### Safety: Llama Guard (Optional)
When using the `run-with-safety.yaml` configuration, provides safety checks using Llama Guard models running on the Ollama server.

### Agents: Meta Reference
Provides agent execution capabilities using the meta-reference implementation.

### Post-Training: Hugging Face
Supports model fine-tuning using Hugging Face integration.

### Tool Runtime
Supports various external tools, including:
- Brave Search
- Tavily Search
- RAG Runtime
- Model Context Protocol
- Wolfram Alpha

## Running Llama Stack with Ollama

You can run the distribution via Docker, which has a pre-built image, or by building the distribution code with Conda or venv.

### Via Docker

This method allows you to get started quickly without having to build the distribution code.

```bash
LLAMA_STACK_PORT=8321
docker run \
  -it \
  --pull always \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ./run.yaml:/root/my-run.yaml \
  llamastack/distribution-{{ name }} \
  --config /root/my-run.yaml \
  --port $LLAMA_STACK_PORT \
  --env OLLAMA_URL=$OLLAMA_URL \
  --env INFERENCE_MODEL=$INFERENCE_MODEL
```

### Via Conda

```bash
llama stack build --template ollama --image-type conda
llama stack run ./run.yaml \
  --port 8321 \
  --env OLLAMA_URL=$OLLAMA_URL \
  --env INFERENCE_MODEL=$INFERENCE_MODEL
```
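The commands above (and the venv flow below) expect `OLLAMA_URL` and `INFERENCE_MODEL` to be set in your shell before you run them. A minimal sketch with example values taken from this guide; adjust them to match your own Ollama server and models:

```bash
# Example values only; point these at your own Ollama server and models.
export OLLAMA_URL=http://127.0.0.1:11434
export INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
# Only needed when running with run-with-safety.yaml
export SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
```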
### Via venv

If you've set up your local development environment, you can also build the image using your local virtual environment.

```bash
llama stack build --template ollama --image-type venv
llama stack run ./run.yaml \
  --port 8321 \
  --env OLLAMA_URL=$OLLAMA_URL \
  --env INFERENCE_MODEL=$INFERENCE_MODEL
```

### Running with Safety

To enable safety checks, use the `run-with-safety.yaml` configuration:

```bash
llama stack run ./run-with-safety.yaml \
  --port 8321 \
  --env OLLAMA_URL=$OLLAMA_URL \
  --env INFERENCE_MODEL=$INFERENCE_MODEL \
  --env SAFETY_MODEL=$SAFETY_MODEL
```

## Example Usage

Once your Llama Stack server is running with Ollama, you can interact with it using the Llama Stack client:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Run inference
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(response.completion_message.content)
```

## Troubleshooting

### Common Issues

1. **Connection refused errors**: Ensure your Ollama server is running and accessible at the configured URL.
2. **Model not found errors**: Make sure you've pulled the required models using `ollama pull <model-name>`.
3. **Performance issues**: Consider using more powerful models or adjusting the Ollama server configuration for better performance.

### Logs

Check the Ollama server logs for any issues:

```bash
# Ollama logs are typically available in:
# - macOS: ~/Library/Logs/Ollama/
# - Linux: ~/.ollama/logs/
```
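If the logs don't point to the problem, you can also confirm that the Ollama server is reachable and see which models it has pulled. A minimal check, assuming the default Ollama URL from the prerequisites above:

```bash
# Verify the Ollama API responds (adjust the URL if you changed OLLAMA_URL).
curl http://127.0.0.1:11434/api/tags

# List the models Ollama has pulled locally.
ollama list
```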