diff --git a/docs/docs/getting_started/configuring_and_launching_llama_stack.md b/docs/docs/getting_started/configuring_and_launching_llama_stack.md
new file mode 100644
index 000000000..bb925c2c7
--- /dev/null
+++ b/docs/docs/getting_started/configuring_and_launching_llama_stack.md
@@ -0,0 +1,303 @@
+# Configuring and Launching Llama Stack
+
+This guide walks you through the two primary methods for setting up and running Llama Stack: using Docker containers and configuring the server manually.
+
+## Method 1: Using the Starter Docker Container
+
+The easiest way to get started with Llama Stack is to use the pre-built Docker container. This approach eliminates manual dependency management and provides a consistent environment across different systems.
+
+### Prerequisites
+
+- Docker installed and running on your system
+- Access to external model providers (e.g., Ollama running locally)
+
+### Basic Docker Usage
+
+Here's an example of spinning up the Llama Stack server using Docker:
+
+```bash
+docker run -it \
+  -v ~/.llama:/root/.llama \
+  --network=host \
+  llamastack/distribution-starter \
+  --env OLLAMA_URL=http://localhost:11434
+```
+
+### Docker Command Breakdown
+
+- `-it`: Run in interactive mode with TTY allocation
+- `-v ~/.llama:/root/.llama`: Mount your local Llama Stack configuration directory
+- `--network=host`: Use host networking to access local services like Ollama
+- `llamastack/distribution-starter`: The official Llama Stack Docker image
+- `--env OLLAMA_URL=http://localhost:11434`: Pass the Ollama URL to the Llama Stack server as an environment variable (this flag is handled by the server entrypoint, not by Docker)
+
+### Advanced Docker Configuration
+
+You can customize the Docker deployment with additional environment variables:
+
+```bash
+docker run -it \
+  -v ~/.llama:/root/.llama \
+  -p 8321:8321 \
+  -e OLLAMA_URL=http://host.docker.internal:11434 \
+  -e BRAVE_SEARCH_API_KEY=your_api_key_here \
+  -e TAVILY_SEARCH_API_KEY=your_api_key_here \
+  llamastack/distribution-starter \
+  --port 8321
+```
+
+Without `--network=host`, the container cannot reach services on the host via `localhost`, so this example uses `host.docker.internal` for the Ollama URL (on Linux, add `--add-host=host.docker.internal:host-gateway` to the command).
+
+### Environment Variables
+
+Common environment variables you can set:
+
+| Variable | Description | Example |
+|----------|-------------|---------|
+| `OLLAMA_URL` | URL for the Ollama service | `http://localhost:11434` |
+| `BRAVE_SEARCH_API_KEY` | API key for Brave Search | `your_brave_api_key` |
+| `TAVILY_SEARCH_API_KEY` | API key for Tavily Search | `your_tavily_api_key` |
+| `TOGETHER_API_KEY` | API key for Together AI | `your_together_api_key` |
+| `OPENAI_API_KEY` | API key for OpenAI | `your_openai_api_key` |
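+
+### Running the Container in the Background
+
+If you'd rather not keep an interactive terminal attached, the same image can be run detached. This is a minimal sketch using standard Docker flags; adjust the environment variables to whichever providers you actually use:
+
+```bash
+# Run detached with a name so the container is easy to manage later
+docker run -d --name llama-stack \
+  -v ~/.llama:/root/.llama \
+  --network=host \
+  -e OLLAMA_URL=http://localhost:11434 \
+  llamastack/distribution-starter
+
+# Follow the server logs; stop and remove the container when finished
+docker logs -f llama-stack
+docker stop llama-stack && docker rm llama-stack
+```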
+
+## Method 2: Manual Server Configuration and Launch
+
+For more control over your Llama Stack deployment, you can configure and run the server manually.
+
+### Prerequisites
+
+1. **Install Llama Stack**:
+
+   Using pip:
+   ```bash
+   pip install llama-stack
+   ```
+
+   Using uv (alternative):
+   ```bash
+   # Initialize a new project (if starting fresh)
+   uv init
+
+   # Add llama-stack as a dependency
+   uv add llama-stack
+
+   # Note: If using uv, prefix subsequent commands with 'uv run'
+   # Example: uv run llama stack build --list-distros
+   ```
+
+### Step 1: Build a Distribution
+
+Choose a distro and build your Llama Stack distribution:
+
+```bash
+# List available distributions
+llama stack build --list-distros
+
+# Build with a specific distro
+llama stack build --distro watsonx --image-type venv --image-name watsonx-stack
+
+# Or build the meta-reference distro
+llama stack build --distro meta-reference-gpu --image-type venv --image-name meta-reference-gpu-stack
+```
+
+#### Advanced: Custom Provider Selection (Step 1.a)
+
+If you know the specific providers you want to use, you can supply them directly on the command line instead of using a pre-built distribution:
+
+```bash
+llama stack build --providers inference=remote::ollama,agents=inline::meta-reference,safety=inline::llama-guard,vector_io=inline::faiss,tool_runtime=inline::rag-runtime --image-type venv --image-name custom-stack
+```
+
+**Discover Available Options:**
+
+```bash
+# List all available APIs
+llama stack list-apis
+
+# List all available providers
+llama stack list-providers
+```
+
+This approach gives you complete control over which providers are included in your stack, allowing for highly customized configurations tailored to your specific needs.
+
+### Available Distributions
+
+- **ci-tests**: CI tests for Llama Stack
+- **dell**: Dell's distribution of Llama Stack. TGI inference via Dell's custom container
+- **meta-reference-gpu**: Use Meta Reference for running LLM inference
+- **nvidia**: Use NVIDIA NIM for running LLM inference, evaluation and safety
+- **open-benchmark**: Distribution for running open benchmarks
+- **postgres-demo**: Quick start template for running Llama Stack with several popular providers
+- **starter**: Quick start template for running Llama Stack with several popular providers. Intended for CPU-only environments; used as the running example in the steps below
+- **starter-gpu**: Quick start template for running Llama Stack with several popular providers. Intended for GPU-enabled environments
+- **watsonx**: Use watsonx for running LLM inference
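+
+The remaining steps use the CPU-only `starter` distribution as the running example. To follow along with it, the build command looks like this (swap in a different distro and image name as needed):
+
+```bash
+# Build the CPU-only starter distribution into a virtual environment
+llama stack build --distro starter --image-type venv --image-name starter
+```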
+
+### Step 2: Configure Your Stack
+
+After building, you can customize the configuration files:
+
+#### Configuration File Locations
+
+- Build config: `~/.llama/distributions/{stack-name}/{stack-name}-build.yaml`
+- Runtime config: `~/.llama/distributions/{stack-name}/{stack-name}-run.yaml`
+
+#### Sample Runtime Configuration
+
+```yaml
+version: 2
+
+apis:
+- inference
+- safety
+- tool_runtime
+
+providers:
+  inference:
+  - provider_id: ollama
+    provider_type: remote::ollama
+    config:
+      url: http://localhost:11434
+
+  safety:
+  - provider_id: llama-guard
+    provider_type: inline::llama-guard
+    config:
+      excluded_categories: []
+
+  tool_runtime:
+  - provider_id: brave-search
+    provider_type: remote::brave-search
+    config:
+      api_key: ${env.BRAVE_SEARCH_API_KEY:=}
+```
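+
+After editing the runtime config, you can launch the server directly from the file rather than by distro name, which is convenient when you maintain several variants of the same stack. A minimal sketch, assuming the default file locations shown above and a stack named `starter` (recent CLI releases accept a config path here):
+
+```bash
+# Run the server from an explicit run config path
+llama stack run ~/.llama/distributions/starter/starter-run.yaml --port 8321
+```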
+
+### Step 3: Launch the Server
+
+Start your configured Llama Stack server:
+
+```bash
+# Run on a specific port
+llama stack run {stack-name} --port 8321
+
+# Run with environment variables
+OLLAMA_URL=http://localhost:11434 llama stack run starter --port 8321
+
+# Run in the background
+nohup llama stack run starter --port 8321 > llama_stack.log 2>&1 &
+```
+
+### Step 4: Verify Installation
+
+Test your Llama Stack server:
+
+#### Basic HTTP Health Checks
+
+```bash
+# Check server health
+curl http://localhost:8321/v1/health
+
+# List available models
+curl http://localhost:8321/v1/models
+```
+
+#### Comprehensive Verification (Recommended)
+
+Use the official Llama Stack client for better verification:
+
+```bash
+# List all configured providers (recommended)
+uv run --with llama-stack-client llama-stack-client providers list
+
+# Alternative if you have llama-stack-client installed
+llama-stack-client providers list
+```
+
+#### Test Chat Completion
+
+```bash
+# Basic HTTP test
+curl -X POST http://localhost:8321/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "llama3.1:8b",
+    "messages": [{"role": "user", "content": "Hello!"}]
+  }'
+
+# Or using the client (more robust)
+uv run --with llama-stack-client llama-stack-client inference chat-completion \
+  --model llama3.1:8b \
+  --message "Hello!"
+```
+
+## Configuration Management
+
+### Managing Multiple Stacks
+
+You can maintain multiple stack configurations:
+
+```bash
+# List all built stacks
+llama stack list
+
+# Remove a stack
+llama stack rm {stack-name}
+
+# Rebuild with updates
+llama stack build --distro starter --image-type venv --image-name starter-v2
+```
+
+### Common Configuration Issues
+
+#### Port Conflicts
+
+If port 8321 is already in use:
+
+```bash
+# Check what's using the port
+netstat -tlnp | grep :8321
+
+# Use a different port
+llama stack run starter --port 8322
+```
+
+## Troubleshooting
+
+### Common Issues
+
+1. **Docker Permission Denied**:
+   ```bash
+   sudo docker run -it \
+     -v ~/.llama:/root/.llama \
+     --network=host \
+     llamastack/distribution-starter
+   ```
+
+2. **Provider Connection Issues**:
+   - Verify external services (Ollama, APIs) are running
+   - Check network connectivity and firewall settings
+   - Validate API keys and URLs
+
+### Logs and Debugging
+
+Enable detailed logging:
+
+```bash
+# Run with debug logging
+llama stack run starter --port 8321 --log-level DEBUG
+
+# Check logs when running in Docker
+docker logs <container-name>
+```
+
+## Next Steps
+
+Once your Llama Stack server is running:
+
+1. **Explore the APIs**: Test the inference, safety, and embeddings endpoints
+2. **Integrate with Applications**: Use the server with LangChain, custom applications, or API clients
+3. **Scale Your Deployment**: Consider load balancing and high-availability setups
+4. **Monitor Performance**: Set up logging and monitoring for production use
+
+For more advanced configurations and production deployments, refer to the [Advanced Configuration Guide](advanced_configuration.md) and [Production Deployment Guide](production_deployment.md).
diff --git a/docs/docs/index.mdx b/docs/docs/index.mdx
index bed931fe7..422cae01a 100644
--- a/docs/docs/index.mdx
+++ b/docs/docs/index.mdx
@@ -46,6 +46,7 @@ Llama Stack consists of a server (with multiple pluggable API providers) and Cli
 ## Quick Links
 
 - Ready to build? Check out the [Getting Started Guide](https://llama-stack.github.io/getting_started/quickstart) to get started.
+- Need help with setup? See the [Configuration and Launch Guide](./getting_started/configuring_and_launching_llama_stack) for detailed Docker and manual installation instructions.
 - Want to contribute? See the [Contributing Guide](https://github.com/llamastack/llama-stack/blob/main/CONTRIBUTING.md).
 - Explore [Example Applications](https://github.com/llamastack/llama-stack-apps) built with Llama Stack.