Merge 0fcd32eb3e into 28bbbcf2c1
commit 0791847a41
2 changed files with 304 additions and 0 deletions

@@ -0,0 +1,303 @@

# Configuring and Launching Llama Stack

This guide walks you through the two primary methods for setting up and running Llama Stack: using Docker containers and configuring the server manually.

## Method 1: Using the Starter Docker Container

The easiest way to get started with Llama Stack is the pre-built Docker container. This approach eliminates the need for manual dependency management and provides a consistent environment across different systems.

### Prerequisites

- Docker installed and running on your system
- Access to external model providers (e.g., Ollama running locally)

### Basic Docker Usage

Here's an example of spinning up the Llama Stack server using Docker:

```bash
docker run -it \
  -v ~/.llama:/root/.llama \
  --network=host \
  llamastack/distribution-starter \
  --env OLLAMA_URL=http://localhost:11434
```

### Docker Command Breakdown

- `-it`: Run in interactive mode with TTY allocation
- `-v ~/.llama:/root/.llama`: Mount your local Llama Stack configuration directory
- `--network=host`: Use host networking to access local services like Ollama
- `llamastack/distribution-starter`: The official Llama Stack Docker image
- `--env OLLAMA_URL=http://localhost:11434`: Set the Ollama URL environment variable for the server (passed to the container's entrypoint, not to Docker itself)

### Advanced Docker Configuration

You can customize the Docker deployment with additional environment variables:

```bash
docker run -it \
  -v ~/.llama:/root/.llama \
  -p 8321:8321 \
  -e OLLAMA_URL=http://localhost:11434 \
  -e BRAVE_SEARCH_API_KEY=your_api_key_here \
  -e TAVILY_SEARCH_API_KEY=your_api_key_here \
  llamastack/distribution-starter \
  --port 8321
```
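
A quick way to confirm the container came up as expected (the image name and log command follow the examples in this guide):

```bash
# Show the running container started from the starter image
docker ps --filter "ancestor=llamastack/distribution-starter"

# Follow its logs while the server starts up
docker logs -f <container-id>
```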

### Environment Variables

Common environment variables you can set:

| Variable | Description | Example |
|----------|-------------|---------|
| `OLLAMA_URL` | URL for Ollama service | `http://localhost:11434` |
| `BRAVE_SEARCH_API_KEY` | API key for Brave search | `your_brave_api_key` |
| `TAVILY_SEARCH_API_KEY` | API key for Tavily search | `your_tavily_api_key` |
| `TOGETHER_API_KEY` | API key for Together AI | `your_together_api_key` |
| `OPENAI_API_KEY` | API key for OpenAI | `your_openai_api_key` |
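
As an alternative to repeating `-e` flags, you can collect these variables in an env file and pass it with Docker's `--env-file` option (the file name and key values below are placeholders):

```bash
# Write the variables to a local env file (placeholder values)
cat > llama-stack.env <<'EOF'
OLLAMA_URL=http://localhost:11434
BRAVE_SEARCH_API_KEY=your_brave_api_key
TAVILY_SEARCH_API_KEY=your_tavily_api_key
EOF

# Start the container with the env file instead of individual -e flags
docker run -it \
  -v ~/.llama:/root/.llama \
  --network=host \
  --env-file ./llama-stack.env \
  llamastack/distribution-starter
```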

## Method 2: Manual Server Configuration and Launch

For more control over your Llama Stack deployment, you can configure and run the server manually.

### Prerequisites

1. **Install Llama Stack**:

   Using pip:
   ```bash
   pip install llama-stack
   ```

   Using uv (alternative):
   ```bash
   # Initialize a new project (if starting fresh)
   uv init

   # Add llama-stack as a dependency
   uv add llama-stack

   # Note: If using uv, prefix subsequent commands with 'uv run'
   # Example: uv run llama stack build --list-distros
   ```
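
   A quick sanity check that the CLI is on your `PATH` (prefix with `uv run` if you installed with uv):

   ```bash
   # Should print the `llama` CLI usage and its subcommands
   llama --help
   ```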

### Step 1: Build a Distribution

Choose a distro and build your Llama Stack distribution:

```bash
# List available distributions
llama stack build --list-distros

# Build with a specific distro
llama stack build --distro watsonx --image-type venv --image-name watsonx-stack

# Or build with the meta-reference GPU distro
llama stack build --distro meta-reference-gpu --image-type venv --image-name meta-reference-gpu-stack
```

#### Advanced: Custom Provider Selection (Step 1.a)

If you know the specific providers you want to use, you can supply them directly on the command line instead of using a pre-built distribution:

```bash
llama stack build --providers inference=remote::ollama,agents=inline::meta-reference,safety=inline::llama-guard,vector_io=inline::faiss,tool_runtime=inline::rag-runtime --image-type venv --image-name custom-stack
```

**Discover Available Options:**

```bash
# List all available APIs
llama stack list-apis

# List all available providers
llama stack list-providers
```

This approach gives you complete control over which providers are included in your stack, allowing for configurations tailored to your specific needs.

### Available Distributions

- **ci-tests**: CI tests for Llama Stack
- **dell**: Dell's distribution of Llama Stack; TGI inference via Dell's custom container
- **meta-reference-gpu**: Use Meta Reference for running LLM inference
- **nvidia**: Use NVIDIA NIM for running LLM inference, evaluation, and safety
- **open-benchmark**: Distribution for running open benchmarks
- **postgres-demo**: Quick start template for running Llama Stack with several popular providers
- **starter**: Quick start template for running Llama Stack with several popular providers; intended for CPU-only environments
- **starter-gpu**: Quick start template for running Llama Stack with several popular providers; intended for GPU-enabled environments
- **watsonx**: Use watsonx for running LLM inference

### Step 2: Configure Your Stack

After building, you can customize the configuration files:

#### Configuration File Locations

- Build config: `~/.llama/distributions/{stack-name}/{stack-name}-build.yaml`
- Runtime config: `~/.llama/distributions/{stack-name}/{stack-name}-run.yaml`
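
For example, to inspect or edit the runtime configuration of the `watsonx-stack` build from Step 1 (substitute your own stack name):

```bash
# View the generated runtime configuration
cat ~/.llama/distributions/watsonx-stack/watsonx-stack-run.yaml

# Open it in your editor of choice
${EDITOR:-nano} ~/.llama/distributions/watsonx-stack/watsonx-stack-run.yaml
```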

#### Sample Runtime Configuration

```yaml
version: 2

apis:
- inference
- safety
- embeddings
- tool_runtime

providers:
  inference:
  - provider_id: ollama
    provider_type: remote::ollama
    config:
      url: http://localhost:11434

  safety:
  - provider_id: llama-guard
    provider_type: inline::llama-guard
    config: {}

  embeddings:
  - provider_id: ollama-embeddings
    provider_type: remote::ollama
    config:
      url: http://localhost:11434

  tool_runtime:
  - provider_id: brave-search
    provider_type: remote::brave-search
    config:
      api_key: ${env.BRAVE_SEARCH_API_KEY:=}
```
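
The `${env.BRAVE_SEARCH_API_KEY:=}` entry is environment-variable substitution with an empty default, so export the key (placeholder value shown) in the shell you will launch the server from:

```bash
# Placeholder value; replace with your real Brave Search API key
export BRAVE_SEARCH_API_KEY=your_api_key_here
```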

### Step 3: Launch the Server

Start your configured Llama Stack server:

```bash
# Run with specific port
llama stack run {stack-name} --port 8321

# Run with environment variables
OLLAMA_URL=http://localhost:11434 llama stack run starter --port 8321

# Run in background
nohup llama stack run starter --port 8321 > llama_stack.log 2>&1 &
```
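
If you start the server in the background as above, you can follow its log and locate the process later (file and command names follow the example):

```bash
# Follow the server log
tail -f llama_stack.log

# Print the PID of the background server (pass it to `kill` to stop it)
pgrep -f "llama stack run"
```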

### Step 4: Verify Installation

Test your Llama Stack server:

#### Basic HTTP Health Checks

```bash
# Check server health
curl http://localhost:8321/health

# List available models
curl http://localhost:8321/v1/models
```

#### Comprehensive Verification (Recommended)

Use the official Llama Stack client for better verification:

```bash
# List all configured providers (recommended)
uv run --with llama-stack-client llama-stack-client providers list

# Alternative if you have llama-stack-client installed
llama-stack-client providers list
```

#### Test Chat Completion

```bash
# Basic HTTP test
curl -X POST http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Or using the client (more robust)
uv run --with llama-stack-client llama-stack-client inference chat-completion \
  --model llama3.1:8b \
  --message "Hello!"
```

## Configuration Management

### Managing Multiple Stacks

You can maintain multiple stack configurations:

```bash
# List all built stacks
llama stack list

# Remove a stack
llama stack rm {stack-name}

# Rebuild with updates
llama stack build --distro starter --image-type venv --image-name starter-v2
```

### Common Configuration Issues

#### Port Conflicts

If port 8321 is already in use:

```bash
# Check what's using the port
netstat -tlnp | grep :8321

# Use a different port
llama stack run starter --port 8322
```

## Troubleshooting

### Common Issues

1. **Docker Permission Denied**:

   ```bash
   sudo docker run -it \
     -v ~/.llama:/root/.llama \
     --network=host \
     llamastack/distribution-starter
   ```
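
   Alternatively, add your user to the `docker` group so `sudo` is not required (log out and back in for the change to take effect):

   ```bash
   # Add the current user to the docker group
   sudo usermod -aG docker $USER
   ```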

2. **Provider Connection Issues**:
   - Verify external services (Ollama, APIs) are running (see the check below)
   - Check network connectivity and firewall settings
   - Validate API keys and URLs
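
   For example, if Ollama is your inference provider, you can confirm it is reachable at the configured URL; Ollama's model-listing endpoint works as a simple liveness check:

   ```bash
   # Should return the locally available Ollama models as JSON
   curl http://localhost:11434/api/tags
   ```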

### Logs and Debugging

Enable detailed logging:

```bash
# Run with debug logging
llama stack run starter --port 8321 --log-level DEBUG

# Check logs in Docker
docker logs <container-id>
```

## Next Steps

Once your Llama Stack server is running:

1. **Explore the APIs**: Test inference, safety, and embeddings endpoints
2. **Integrate with Applications**: Use the server with LangChain, custom applications, or API clients
3. **Scale Your Deployment**: Consider load balancing and high-availability setups
4. **Monitor Performance**: Set up logging and monitoring for production use

For more advanced configurations and production deployments, refer to the [Advanced Configuration Guide](advanced_configuration.md) and [Production Deployment Guide](production_deployment.md).

@@ -46,6 +46,7 @@ Llama Stack consists of a server (with multiple pluggable API providers) and Cli

## Quick Links

- Ready to build? Check out the [Getting Started Guide](https://llama-stack.github.io/getting_started/quickstart) to get started.
- Need help with setup? See the [Configuration and Launch Guide](./getting_started/configuring_and_launching_llama_stack) for detailed Docker and manual installation instructions.
- Want to contribute? See the [Contributing Guide](https://github.com/llamastack/llama-stack/blob/main/CONTRIBUTING.md).
- Explore [Example Applications](https://github.com/llamastack/llama-stack-apps) built with Llama Stack.