Added documentation for launching multiple llama stack servers
parent 267f658968 · commit 3e77ebf772
2 changed files with 469 additions and 0 deletions

docs/docs/deploying/Launching_Multiple_LlamaStack_Servers.md (new file, 468 lines)

@@ -0,0 +1,468 @@

# Multiple Llama Stack Servers: Starter Distro Guide

A complete guide to running multiple Llama Stack servers using the **starter distribution**, written for first-time users.

## Table of Contents

1. [System Requirements](#system-requirements)
2. [Verify Llama Stack](#verify-llama-stack)
3. [Set Up Multiple Servers](#set-up-multiple-servers)
4. [Configure API Keys](#configure-api-keys)
5. [Set Up Ollama (Optional)](#set-up-ollama-optional)
6. [Start the Servers](#start-the-servers)
7. [Test Your Setup](#test-your-setup)
8. [Manage Your Servers](#manage-your-servers)
9. [Troubleshooting](#troubleshooting)

---

## System Requirements

### Minimum Requirements

- **Operating System**: Linux, macOS, or Windows with WSL2
- **Python**: Version 3.12 or higher
- **RAM**: 8 GB minimum (16 GB recommended)
- **Storage**: 10 GB of free space minimum
- **Network**: Stable internet connection

### Check Your System

```bash
# Check Python version
python3 --version

# Check available RAM (Linux; on macOS use `sysctl hw.memsize`)
free -h

# Check disk space
df -h
```
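
If you'd rather run one preflight command, here is a minimal sketch (an illustrative helper, not part of the starter distribution) that enforces the Python 3.12 floor:

```bash
# Hypothetical preflight check: fails loudly if Python is older than 3.12.
python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 12) else 1)' \
  && echo "Python version OK" \
  || echo "Python 3.12+ is required for Llama Stack"
```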

---

## Verify Llama Stack

### Step 1: Verify that Llama Stack is installed

```bash
# Verify installation
llama stack --help
```

### Step 2: Initialize Starter Distribution

```bash
# Initialize the starter distribution
llama stack build --template starter --name starter

# This creates ~/.llama/distributions/starter/
```

---

## Set Up Multiple Servers

The starter distribution ships a comprehensive configuration with multiple providers. We'll create **two servers** from this one starter config:

- **Server 1** (port 8321): the full starter config with all providers
- **Server 2** (port 8322): the same config with different database paths, using the CLI port override

### Step 1: Examine the Base Configuration

```bash
# View the starter configuration
cat ~/.llama/distributions/starter/starter-run.yaml
```

### Step 2: Create Server 1 Configuration (Full Starter)

```bash
# Copy the starter config for Server 1
cp ~/.llama/distributions/starter/starter-run.yaml ~/server1-starter.yaml
```

### Step 3: Create Server 2 Configuration (Same Config, Different Databases)

```bash
# Copy the starter config for Server 2
cp ~/.llama/distributions/starter/starter-run.yaml ~/server2-starter.yaml

# Change the database paths to avoid conflicts (the only change needed!)
# Note: GNU sed syntax; on macOS (BSD sed) use: sed -i '' 's|...|...|g'
sed -i 's|~/.llama/distributions/starter|~/.llama/distributions/starter2|g' ~/server2-starter.yaml
```

### Step 4: Create Separate Database Directories

```bash
# Create separate directories for Server 2
mkdir -p ~/.llama/distributions/starter2
```

**That's it!** There's no need to touch the ports in the YAML files; we'll pass the CLI `--port` flag instead. If you ever need more than two servers, the same pattern scales, as the sketch below shows.
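
A minimal sketch of that pattern, assuming the starter config lives at `~/.llama/distributions/starter/starter-run.yaml` as above; the loop bounds and file names are illustrative, not part of the official docs:

```bash
# Generate configs and database directories for servers 2..4.
for i in 2 3 4; do
  cp ~/.llama/distributions/starter/starter-run.yaml ~/server${i}-starter.yaml
  # GNU sed; on macOS use: sed -i '' ...
  sed -i "s|~/.llama/distributions/starter|~/.llama/distributions/starter${i}|g" ~/server${i}-starter.yaml
  mkdir -p ~/.llama/distributions/starter${i}
done
```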

---

## Configure API Keys

The starter configuration supports many providers. Set up the API keys you need:

### Essential API Keys

```bash
# Groq (fast inference)
export GROQ_API_KEY="your_groq_api_key_here"

# OpenAI (if you want to use GPT models)
export OPENAI_API_KEY="your_openai_api_key_here"

# Anthropic (if you want Claude models)
export ANTHROPIC_API_KEY="your_anthropic_api_key_here"

# Ollama (for local models)
export OLLAMA_URL="http://localhost:11434"
```

### Optional API Keys (set only if you plan to use these providers)

```bash
# Fireworks AI
export FIREWORKS_API_KEY="your_fireworks_api_key"

# Together AI
export TOGETHER_API_KEY="your_together_api_key"

# Gemini
export GEMINI_API_KEY="your_gemini_api_key"

# NVIDIA
export NVIDIA_API_KEY="your_nvidia_api_key"
```
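
Exported variables disappear when the shell closes. One option is an env file you `source` before launching servers; a sketch (the path `~/llama-stack.env` is just an example):

```bash
# Write the keys once...
cat > ~/llama-stack.env <<'EOF'
export GROQ_API_KEY="your_groq_api_key_here"
export OLLAMA_URL="http://localhost:11434"
EOF
chmod 600 ~/llama-stack.env   # keys are secrets; restrict permissions

# ...then load them in any new shell before starting a server.
source ~/llama-stack.env
```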

---

## Set Up Ollama (Optional)

If you want to use local models through Ollama:

### Install and Start Ollama

**Linux:**
```bash
curl -fsSL https://ollama.com/install.sh | sh
ollama serve
```

**macOS:**
```bash
brew install ollama
ollama serve
```

### Download Models (in a new terminal)

```bash
# Download popular models
ollama pull llama3.1:8b
ollama pull llama-guard3:8b
ollama pull all-minilm:l6-v2

# Verify models
ollama list
```
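
`ollama serve` can take a moment to come up, so a short wait loop helps scripts avoid racing it. This sketch polls the same `/api/version` endpoint used in Troubleshooting below:

```bash
# Block until the Ollama API responds, then proceed.
until curl -sf http://localhost:11434/api/version > /dev/null; do
  echo "Waiting for Ollama..."
  sleep 1
done
echo "Ollama is up"
```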

---

## Start the Servers

### Method 1: Run in Separate Terminals (Recommended for Development)

**Terminal 1 - Server 1:**
```bash
cd ~
llama stack run ~/server1-starter.yaml --port 8321
```

**Terminal 2 - Server 2 (uses the CLI port override):**
```bash
cd ~
llama stack run ~/server2-starter.yaml --port 8322
```

### Method 2: Run in Background

```bash
# Start Server 1 in the background
cd ~
nohup llama stack run ~/server1-starter.yaml --port 8321 > server1.log 2>&1 &

# Start Server 2 in the background with the port override
nohup llama stack run ~/server2-starter.yaml --port 8322 > server2.log 2>&1 &
```
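
If you want to stop exactly these processes later without hunting by port, record each PID at launch. A sketch; the `.pid` file names are arbitrary:

```bash
# $! holds the PID of the most recent background job.
nohup llama stack run ~/server1-starter.yaml --port 8321 > server1.log 2>&1 &
echo $! > server1.pid
nohup llama stack run ~/server2-starter.yaml --port 8322 > server2.log 2>&1 &
echo $! > server2.pid

# Later: stop them by the recorded PIDs.
kill "$(cat server1.pid)" "$(cat server2.pid)"
```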

### Method 3: Use an Environment Variable (Alternative)

```bash
# You can also set the port via an environment variable
export LLAMA_STACK_PORT=8322
llama stack run ~/server2-starter.yaml

# Or inline
LLAMA_STACK_PORT=8322 llama stack run ~/server2-starter.yaml
```

### Expected Output

Both servers should start successfully:

```
Starting server on port 8321...
Server is running at http://localhost:8321
```

```
Starting server on port 8322...
Server is running at http://localhost:8322
```

---

## Test Your Setup

### Step 1: Health Check

```bash
# Test both servers
curl http://localhost:8321/v1/health
curl http://localhost:8322/v1/health
```

**Expected Response:**
```json
{"status": "OK"}
```
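
The same check as a loop, which scales once you run more than two servers (`curl -f` exits non-zero on HTTP errors):

```bash
for port in 8321 8322; do
  curl -sf "http://localhost:${port}/v1/health" > /dev/null \
    && echo "Server on port ${port}: OK" \
    || echo "Server on port ${port}: NOT healthy"
done
```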

### Step 2: List Available Models

```bash
# Check models on Server 1
curl -s http://localhost:8321/v1/models | python3 -m json.tool

# Check models on Server 2
curl -s http://localhost:8322/v1/models | python3 -m json.tool
```
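
To print just the model names, filter the JSON. A sketch that assumes the response carries a `data` array whose entries expose an `identifier` field; adjust the keys if your version's response shape differs:

```bash
curl -s http://localhost:8321/v1/models | python3 -c '
import json, sys
for model in json.load(sys.stdin)["data"]:
    print(model["identifier"])
'
```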

### Step 3: Test Inference with Different Providers

**Test Groq on Server 1:**
```bash
curl -X POST http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello! How are you?"}],
    "model": "groq/llama-3.1-8b-instant"
  }'
```

**Test OpenAI on Server 2 (if you have an OpenAI API key):**
```bash
curl -X POST http://localhost:8322/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello from server 2!"}],
    "model": "openai/gpt-4o-mini"
  }'
```

**Test Ollama (if you set it up):**
```bash
curl -X POST http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello from Ollama!"}],
    "model": "ollama/llama3.1:8b"
  }'
```
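
The endpoint accepts OpenAI-style payloads, so a streamed response should also work; a hedged sketch (`curl -N` disables output buffering so tokens appear as they arrive):

```bash
curl -N -X POST http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Count to five."}],
    "model": "ollama/llama3.1:8b",
    "stream": true
  }'
```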

### Step 4: Test Embeddings

```bash
curl -X POST http://localhost:8321/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello world",
    "model": "sentence-transformers/all-MiniLM-L6-v2"
  }'
```
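
To sanity-check the result without scrolling through raw vectors, print the embedding length instead. A sketch assuming the OpenAI-style response shape `data[0].embedding`:

```bash
curl -s -X POST http://localhost:8321/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello world", "model": "sentence-transformers/all-MiniLM-L6-v2"}' \
  | python3 -c 'import json, sys; print(len(json.load(sys.stdin)["data"][0]["embedding"]), "dimensions")'
```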

---

## Manage Your Servers

### Check What's Running

```bash
# Check server processes by port
lsof -i :8321 -i :8322

# Check all llama stack processes
ps aux | grep "llama.*stack"
```

### Stop Servers

**Stop individual servers:**
```bash
# Stop Server 1
kill $(lsof -t -i:8321)

# Stop Server 2
kill $(lsof -t -i:8322)
```

**Stop all servers:**
```bash
pkill -f "llama.*stack.*run"
```

### View Logs (if running in the background)

```bash
# Watch Server 1 logs
tail -f server1.log

# Watch Server 2 logs
tail -f server2.log
```

### Restart Servers

```bash
# Stop all servers first
pkill -f "llama.*stack.*run"
sleep 3

# Restart both servers (keep the port overrides!)
cd ~
nohup llama stack run ~/server1-starter.yaml --port 8321 > server1.log 2>&1 &
nohup llama stack run ~/server2-starter.yaml --port 8322 > server2.log 2>&1 &
```
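
If you restart often, the stop/start pair fits in a small script. A hypothetical helper (the name `~/restart-stacks.sh` and its contents are illustrative):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Stop any running servers, then relaunch both with their port overrides.
pkill -f "llama.*stack.*run" || true   # ignore "no process found"
sleep 3

cd ~
nohup llama stack run ~/server1-starter.yaml --port 8321 > server1.log 2>&1 &
nohup llama stack run ~/server2-starter.yaml --port 8322 > server2.log 2>&1 &
echo "Servers restarting on ports 8321 and 8322"
```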

---

## Troubleshooting

### Problem: "Port already in use"

```bash
# Find what's using the ports
lsof -i :8321 -i :8322

# Kill the processes using the ports
kill $(lsof -t -i:8321)
kill $(lsof -t -i:8322)
```
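
On Linux systems without `lsof`, `ss` reports the same information; a sketch:

```bash
# Show listeners on the two ports, including owning PIDs (needs appropriate privileges).
ss -ltnp | grep -E ':(8321|8322)'
```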

### Problem: "Provider not available"

The starter config includes many providers, and some may not have API keys set. That is expected:

```bash
# Check which environment variables are set
env | grep -E "(GROQ|OPENAI|ANTHROPIC|OLLAMA)_"

# Set any missing API keys you want to use
export GROQ_API_KEY="your_key_here"
```

### Problem: "No models available"

```bash
# Check available models
curl -s http://localhost:8321/v1/models | python3 -m json.tool

# If the list is empty, check that your API keys are set correctly
echo $GROQ_API_KEY
echo $OPENAI_API_KEY
```

### Problem: Ollama connection issues

```bash
# Check whether Ollama is running
curl http://localhost:11434/api/version

# If it is not running, start it
ollama serve

# Verify OLLAMA_URL is set
echo $OLLAMA_URL
```

---

## Advanced Usage

### Customize Provider Selection

You can edit the YAML files to enable or disable specific providers:

```yaml
# In your server config, comment out the providers you don't want
providers:
  inference:
    # - provider_id: openai            # Disabled
    #   provider_type: remote::openai
    #   config:
    #     api_key: ${env.OPENAI_API_KEY:=}

    - provider_id: groq                # Enabled
      provider_type: remote::groq
      config:
        api_key: ${env.GROQ_API_KEY:=}
```

### Use Different Providers on Different Servers

**Server 1 - Local providers:**
- Enable: Ollama, vLLM, and other local providers
- Disable: OpenAI, Anthropic, Groq, Fireworks

**Server 2 - Remote providers:**
- Enable: OpenAI, Anthropic, Gemini
- Disable: Ollama, vLLM, and other local providers

---

## Summary

You now have **two Llama Stack servers** running from the starter distribution:

### Server Configuration

- **Server 1**: `http://localhost:8321` (full starter config)
- **Server 2**: `http://localhost:8322` (same config, separate databases)

### Key Files

- `~/server1-starter.yaml` - Server 1 configuration
- `~/server2-starter.yaml` - Server 2 configuration
- `server1.log` - Server 1 logs (if running in the background)
- `server2.log` - Server 2 logs (if running in the background)

### Key Commands

```bash
# Health check
curl http://localhost:8321/v1/health
curl http://localhost:8322/v1/health

# Stop servers
kill $(lsof -t -i:8321)
kill $(lsof -t -i:8322)

# Check processes
lsof -i :8321 -i :8322
```

### Next Steps

1. Create more servers with different configurations if needed.
2. Set up API keys for the providers you want to use.
3. Test different models and providers.
4. Customize the configurations for your specific needs.
5. Set up monitoring and logging for production use.

---

*This guide uses the official Llama Stack starter distribution for maximum compatibility and feature coverage.*

@@ -10,5 +10,6 @@ import TabItem from '@theme/TabItem';

# Deploying Llama Stack

[**→ Multiple Llama Stack Servers Guide**](./Launching_Multiple_LlamaStack_Servers.md)
[**→ Kubernetes Deployment Guide**](./kubernetes_deployment.mdx)
[**→ AWS EKS Deployment Guide**](./aws_eks_deployment.mdx)