mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-12-03 09:53:45 +00:00
feat(cli): use gunicorn to manage server workers on unix systems
Implement Gunicorn + Uvicorn deployment strategy for Unix systems to provide multi-process parallelism and high-concurrency async request handling. Key Features: - Platform detection: Uses Gunicorn on Unix (Linux/macOS), falls back to Uvicorn on Windows - Worker management: Auto-calculates workers as (2 * CPU cores) + 1 with env var overrides (GUNICORN_WORKERS, WEB_CONCURRENCY) - Production optimizations: * Worker recycling (--max-requests, --max-requests-jitter) prevents memory leaks * Configurable worker connections (default: 1000 per worker) * Connection keepalive for improved performance * Automatic log level mapping from Python logging to Gunicorn * Optional --preload for memory efficiency (disabled by default) - IPv6 support: Proper bind address formatting for IPv6 addresses - SSL/TLS: Passes through certificate configuration from uvicorn_config - Comprehensive logging: Reports workers, capacity, and configuration details - Graceful fallback: Falls back to Uvicorn if Gunicorn not installed Configuration via Environment Variables: - GUNICORN_WORKERS / WEB_CONCURRENCY: Override worker count - GUNICORN_WORKER_CONNECTIONS: Concurrent connections per worker - GUNICORN_TIMEOUT: Worker timeout (default: 120s for async workers) - GUNICORN_KEEPALIVE: Connection keepalive (default: 5s) - GUNICORN_MAX_REQUESTS: Worker recycling interval (default: 10000) - GUNICORN_MAX_REQUESTS_JITTER: Randomize restart timing (default: 1000) - GUNICORN_PRELOAD: Enable app preloading for production (default: false) Based on best practices from: - DeepWiki analysis of encode/uvicorn and benoitc/gunicorn repositories - Medium article: "Mastering Gunicorn and Uvicorn: The Right Way to Deploy FastAPI Applications" Fixes: - Avoids worker multiplication anti-pattern (nested workers) - Proper IPv6 bind address formatting ([::]:port) - Correct Gunicorn parameter names (--keep-alive vs --keepalive) Dependencies: - Added gunicorn>=23.0.0 to pyproject.toml Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
parent
e809d21357
commit
e72583cd9c
6 changed files with 198 additions and 1 deletions
|
|
@ -44,6 +44,7 @@ dependencies = [
|
|||
"h11>=0.16.0",
|
||||
"python-multipart>=0.0.20", # For fastapi Form
|
||||
"uvicorn>=0.34.0", # server
|
||||
"gunicorn>=23.0.0", # production server for Unix systems
|
||||
"opentelemetry-sdk>=1.30.0", # server
|
||||
"opentelemetry-exporter-otlp-proto-http>=1.30.0", # server
|
||||
"aiosqlite>=0.21.0", # server - for metadata store
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue