llama-stack-mirror/src/llama_stack
Roy Belio e72583cd9c feat(cli): use gunicorn to manage server workers on unix systems
Implement Gunicorn + Uvicorn deployment strategy for Unix systems to provide
multi-process parallelism and high-concurrency async request handling.

Key Features:
- Platform detection: Uses Gunicorn on Unix (Linux/macOS), falls back to
  Uvicorn on Windows
- Worker management: Auto-calculates workers as (2 * CPU cores) + 1 with
  env var overrides (GUNICORN_WORKERS, WEB_CONCURRENCY)
- Production optimizations:
  * Worker recycling (--max-requests, --max-requests-jitter) limits the impact of memory leaks
  * Configurable worker connections (default: 1000 per worker)
  * Connection keepalive for improved performance
  * Automatic log level mapping from Python logging to Gunicorn
  * Optional --preload for memory efficiency (disabled by default)
- IPv6 support: Proper bind address formatting for IPv6 addresses
- SSL/TLS: Passes through certificate configuration from uvicorn_config
- Comprehensive logging: Reports workers, capacity, and configuration details
- Graceful fallback: Falls back to Uvicorn if Gunicorn is not installed
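
The worker auto-calculation described above can be sketched roughly as follows.
This is an illustration under stated assumptions, not the code in this commit:
the function names are hypothetical, and giving GUNICORN_WORKERS precedence
over WEB_CONCURRENCY is an assumed ordering.

```python
import os


def default_worker_count(cpu_count=None):
    """Return (2 * CPU cores) + 1, treating an unknown core count as 1."""
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return 2 * cores + 1


def resolve_worker_count(env=None, cpu_count=None):
    """Pick the worker count from env var overrides, else the default formula.

    Assumption: GUNICORN_WORKERS is checked before WEB_CONCURRENCY, and
    values that are not positive integers are ignored.
    """
    env = os.environ if env is None else env
    for name in ("GUNICORN_WORKERS", "WEB_CONCURRENCY"):
        raw = env.get(name)
        if not raw:
            continue
        try:
            value = int(raw)
        except ValueError:
            continue  # not an integer, try the next variable
        if value > 0:
            return value
    return default_worker_count(cpu_count)
```

On a 4-core machine with no overrides this yields 9 workers.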

Configuration via Environment Variables:
- GUNICORN_WORKERS / WEB_CONCURRENCY: Override worker count
- GUNICORN_WORKER_CONNECTIONS: Concurrent connections per worker
- GUNICORN_TIMEOUT: Worker timeout (default: 120s for async workers)
- GUNICORN_KEEPALIVE: Connection keepalive (default: 5s)
- GUNICORN_MAX_REQUESTS: Worker recycling interval (default: 10000)
- GUNICORN_MAX_REQUESTS_JITTER: Randomize restart timing (default: 1000)
- GUNICORN_PRELOAD: Enable app preloading for production (default: false)
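
One way the variables above could be turned into Gunicorn command-line
arguments is sketched below. The helper name and the set of accepted truthy
strings for GUNICORN_PRELOAD are assumptions made for illustration. The
defaults are the ones listed above, and the flags are standard Gunicorn options.

```python
import os

# Environment variable -> (Gunicorn flag, default listed in this commit message)
_TUNING = {
    "GUNICORN_WORKER_CONNECTIONS": ("--worker-connections", 1000),
    "GUNICORN_TIMEOUT": ("--timeout", 120),
    "GUNICORN_KEEPALIVE": ("--keep-alive", 5),
    "GUNICORN_MAX_REQUESTS": ("--max-requests", 10000),
    "GUNICORN_MAX_REQUESTS_JITTER": ("--max-requests-jitter", 1000),
}


def gunicorn_tuning_args(env=None):
    """Build the tuning part of a Gunicorn argv list (hypothetical helper)."""
    env = os.environ if env is None else env
    args = []
    for var, (flag, default) in _TUNING.items():
        raw = env.get(var, "").strip()
        # Use the default when the variable is unset or not a plain integer.
        value = int(raw) if raw.isdigit() else default
        args += [flag, str(value)]
    # Assumption: these strings count as "enabled"; preload stays off otherwise.
    if env.get("GUNICORN_PRELOAD", "false").strip().lower() in ("1", "true", "yes"):
        args.append("--preload")
    return args
```

The returned list is meant to be appended to a command such as
["gunicorn", "--workers", "9", ...] before the application path.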

Based on best practices from:
- DeepWiki analysis of encode/uvicorn and benoitc/gunicorn repositories
- Medium article: "Mastering Gunicorn and Uvicorn: The Right Way to Deploy
  FastAPI Applications"

Fixes:
- Avoids worker multiplication anti-pattern (nested workers)
- Proper IPv6 bind address formatting ([::]:port)
- Correct Gunicorn parameter name (--keep-alive, not --keepalive)
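
The IPv6 bind formatting can be illustrated with a small sketch. The function
name is hypothetical and the actual implementation may differ. It only wraps
the host in brackets when the host parses as an IPv6 literal.

```python
import ipaddress


def format_bind(host, port):
    """Return a host:port bind string, bracketing IPv6 literals."""
    try:
        is_ipv6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        # Hostnames (and already bracketed hosts) are not IP literals.
        is_ipv6 = False
    return f"[{host}]:{port}" if is_ipv6 else f"{host}:{port}"
```

For example, format_bind("::", 8321) gives "[::]:8321", while IPv4 addresses
and hostnames pass through unchanged.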

Dependencies:
- Added gunicorn>=23.0.0 to pyproject.toml

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-29 17:09:17 +02:00
apis feat(responses)!: introduce OpenAI compatible prompts to Responses API (#3942) 2025-10-28 09:31:27 -07:00
cli feat(cli): use gunicorn to manage server workers on unix systems 2025-10-29 17:09:17 +02:00
core chore: remove unused methods from InferenceRouter (#3953) 2025-10-28 17:12:41 -07:00
distributions docs: add documentation on how to use custom run yaml in docker (#3949) 2025-10-28 16:05:44 -07:00
models fix(mypy): resolve provider utility and testing type issues (#3935) 2025-10-28 10:37:27 -07:00
providers feat: openai files provider (#3946) 2025-10-28 16:25:03 -07:00
strong_typing chore(package): migrate to src/ layout (#3920) 2025-10-27 12:02:21 -07:00
testing fix(mypy): add type stubs and fix typing issues (#3938) 2025-10-28 11:00:09 -07:00
ui chore(package): migrate to src/ layout (#3920) 2025-10-27 12:02:21 -07:00
__init__.py chore(package): migrate to src/ layout (#3920) 2025-10-27 12:02:21 -07:00
env.py chore(package): migrate to src/ layout (#3920) 2025-10-27 12:02:21 -07:00
log.py chore(package): migrate to src/ layout (#3920) 2025-10-27 12:02:21 -07:00
schema_utils.py chore(package): migrate to src/ layout (#3920) 2025-10-27 12:02:21 -07:00