llama-stack-mirror/docs/docs
Roy Belio e72583cd9c feat(cli): use gunicorn to manage server workers on unix systems
Implement Gunicorn + Uvicorn deployment strategy for Unix systems to provide
multi-process parallelism and high-concurrency async request handling.

Key Features:
- Platform detection: Uses Gunicorn on Unix (Linux/macOS), falls back to
  Uvicorn on Windows
- Worker management: Auto-calculates workers as (2 * CPU cores) + 1 with
  env var overrides (GUNICORN_WORKERS, WEB_CONCURRENCY)
- Production optimizations:
  * Worker recycling (--max-requests, --max-requests-jitter) limits the impact of memory leaks
  * Configurable worker connections (default: 1000 per worker)
  * Connection keepalive (default: 5s) so clients can reuse connections
  * Automatic log level mapping from Python logging to Gunicorn
  * Optional --preload for memory efficiency (disabled by default)
- IPv6 support: Proper bind address formatting for IPv6 addresses
- SSL/TLS: Passes through certificate configuration from uvicorn_config
- Comprehensive logging: Reports workers, capacity, and configuration details
- Graceful fallback: Falls back to Uvicorn if Gunicorn is not installed
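
The worker calculation and server selection described above can be sketched
roughly as follows. This is a minimal illustration, not the code from this
commit: the function names are hypothetical, and the precedence of
GUNICORN_WORKERS over WEB_CONCURRENCY is an assumption, since the message does
not say which one wins.

```python
import os
import sys


def default_worker_count(cpu_count=None):
    """Return (2 * CPU cores) + 1, the default stated in the commit message."""
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return 2 * cores + 1


def resolve_worker_count(env=None, cpu_count=None):
    """Apply the env var overrides, else fall back to the computed default.

    Assumption: GUNICORN_WORKERS is checked before WEB_CONCURRENCY, and
    non-numeric or non-positive values are ignored.
    """
    env = os.environ if env is None else env
    for name in ("GUNICORN_WORKERS", "WEB_CONCURRENCY"):
        raw = env.get(name)
        if not raw:
            continue
        try:
            value = int(raw)
        except ValueError:
            continue
        if value > 0:
            return value
    return default_worker_count(cpu_count)


def choose_server(platform=None, gunicorn_available=True):
    """Pick Gunicorn on Unix, Uvicorn on Windows or when Gunicorn is missing."""
    platform = sys.platform if platform is None else platform
    if platform.startswith("win") or not gunicorn_available:
        return "uvicorn"
    return "gunicorn"
```

On a 4-core Linux host with no overrides, this sketch would select Gunicorn
with 9 workers.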

Configuration via Environment Variables:
- GUNICORN_WORKERS / WEB_CONCURRENCY: Override worker count
- GUNICORN_WORKER_CONNECTIONS: Concurrent connections per worker
- GUNICORN_TIMEOUT: Worker timeout (default: 120s for async workers)
- GUNICORN_KEEPALIVE: Connection keepalive (default: 5s)
- GUNICORN_MAX_REQUESTS: Worker recycling interval (default: 10000)
- GUNICORN_MAX_REQUESTS_JITTER: Randomize restart timing (default: 1000)
- GUNICORN_PRELOAD: Enable app preloading for production (default: false)
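
One plausible way to turn these variables into a Gunicorn command line is
sketched below. The defaults are the ones listed above; the helper name, the
use of uvicorn.workers.UvicornWorker as the worker class, and the set of
values accepted as "true" for GUNICORN_PRELOAD are assumptions made for
illustration. The application target is left out because the message does not
name it.

```python
import os

# Defaults taken from the list above.
DEFAULTS = {
    "GUNICORN_WORKER_CONNECTIONS": "1000",
    "GUNICORN_TIMEOUT": "120",
    "GUNICORN_KEEPALIVE": "5",
    "GUNICORN_MAX_REQUESTS": "10000",
    "GUNICORN_MAX_REQUESTS_JITTER": "1000",
}

# Gunicorn CLI flag for each variable (note: --keep-alive, with a hyphen).
FLAGS = {
    "GUNICORN_WORKER_CONNECTIONS": "--worker-connections",
    "GUNICORN_TIMEOUT": "--timeout",
    "GUNICORN_KEEPALIVE": "--keep-alive",
    "GUNICORN_MAX_REQUESTS": "--max-requests",
    "GUNICORN_MAX_REQUESTS_JITTER": "--max-requests-jitter",
}


def build_gunicorn_args(bind, workers, env=None):
    """Build an argument list for Gunicorn from environment variables."""
    env = os.environ if env is None else env
    args = [
        "gunicorn",
        "--bind", bind,
        "--workers", str(workers),
        # Assumption: Uvicorn's Gunicorn worker class handles the async app.
        "--worker-class", "uvicorn.workers.UvicornWorker",
    ]
    for name, flag in FLAGS.items():
        args += [flag, env.get(name, DEFAULTS[name])]
    # Assumption: these spellings count as enabling preload.
    if env.get("GUNICORN_PRELOAD", "false").strip().lower() in ("1", "true", "yes"):
        args.append("--preload")
    return args
```

Calling build_gunicorn_args("0.0.0.0:8000", 9, {}) yields the documented
defaults with no --preload flag; the bind address there is only an example.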

Based on best practices from:
- DeepWiki analysis of encode/uvicorn and benoitc/gunicorn repositories
- Medium article: "Mastering Gunicorn and Uvicorn: The Right Way to Deploy
  FastAPI Applications"

Fixes:
- Avoids worker multiplication anti-pattern (nested workers)
- Proper IPv6 bind address formatting ([::]:port)
- Correct Gunicorn parameter names (--keep-alive, not --keepalive)
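
The IPv6 bind formatting can be illustrated with a short sketch. The helper
name format_bind is hypothetical and this is not necessarily how the commit
implements it; the idea is only that IPv6 literals need brackets so the port
separator is unambiguous.

```python
import ipaddress


def format_bind(host, port):
    """Return a host:port bind string, bracketing IPv6 literals."""
    try:
        is_ipv6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        # Not an IP literal (for example a hostname); leave it unbracketed.
        is_ipv6 = False
    return f"[{host}]:{port}" if is_ipv6 else f"{host}:{port}"
```

With this sketch, format_bind("::", 8000) gives "[::]:8000", while IPv4
addresses and hostnames pass through as host:port.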

Dependencies:
- Added gunicorn>=23.0.0 to pyproject.toml

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-29 17:09:17 +02:00
Name                   Last commit                                                        Date
advanced_apis          chore: update doc (#3857)                                          2025-10-20 10:33:21 -07:00
building_applications  chore: update docs for telemetry api removal (#3900)               2025-10-24 13:57:28 -07:00
concepts               chore: update docs for telemetry api removal (#3900)               2025-10-24 13:57:28 -07:00
contributing           feat: Add static file import system for docs (#3882)               2025-10-24 14:01:33 -04:00
deploying              chore: use uvicorn to start llama stack server everywhere (#3625)  2025-10-06 14:27:40 +02:00
distributions          feat(cli): use gunicorn to manage server workers on unix systems   2025-10-29 17:09:17 +02:00
getting_started        feat: Add static file import system for docs (#3882)               2025-10-24 14:01:33 -04:00
providers              feat: openai files provider (#3946)                                2025-10-28 16:25:03 -07:00
references             chore: update docs for telemetry api removal (#3900)               2025-10-24 13:57:28 -07:00
api-overview.md        docs: api separation (#3630)                                       2025-10-01 10:13:31 -07:00
index.mdx              chore: update docs for telemetry api removal (#3900)               2025-10-24 13:57:28 -07:00