llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-03 09:53:45 +00:00

Roy Belio e72583cd9c feat(cli): use gunicorn to manage server workers on unix systems		2025-10-29 17:09:17 +02:00

Implement Gunicorn + Uvicorn deployment strategy for Unix systems to provide multi-process parallelism and high-concurrency async request handling.

Key Features:
- Platform detection: Uses Gunicorn on Unix (Linux/macOS), falls back to Uvicorn on Windows
- Worker management: Auto-calculates workers as (2 * CPU cores) + 1 with env var overrides (GUNICORN_WORKERS, WEB_CONCURRENCY)
- Production optimizations:
  * Worker recycling (--max-requests, --max-requests-jitter) prevents memory leaks
  * Configurable worker connections (default: 1000 per worker)
  * Connection keepalive for improved performance
  * Automatic log level mapping from Python logging to Gunicorn
  * Optional --preload for memory efficiency (disabled by default)
- IPv6 support: Proper bind address formatting for IPv6 addresses
- SSL/TLS: Passes through certificate configuration from uvicorn_config
- Comprehensive logging: Reports workers, capacity, and configuration details
- Graceful fallback: Falls back to Uvicorn if Gunicorn not installed

Configuration via Environment Variables:
- GUNICORN_WORKERS / WEB_CONCURRENCY: Override worker count
- GUNICORN_WORKER_CONNECTIONS: Concurrent connections per worker
- GUNICORN_TIMEOUT: Worker timeout (default: 120s for async workers)
- GUNICORN_KEEPALIVE: Connection keepalive (default: 5s)
- GUNICORN_MAX_REQUESTS: Worker recycling interval (default: 10000)
- GUNICORN_MAX_REQUESTS_JITTER: Randomize restart timing (default: 1000)
- GUNICORN_PRELOAD: Enable app preloading for production (default: false)

Based on best practices from:
- DeepWiki analysis of encode/uvicorn and benoitc/gunicorn repositories
- Medium article: "Mastering Gunicorn and Uvicorn: The Right Way to Deploy FastAPI Applications"

Fixes:
- Avoids worker multiplication anti-pattern (nested workers)
- Proper IPv6 bind address formatting ([::]:port)
- Correct Gunicorn parameter names (--keep-alive vs --keepalive)

Dependencies:
- Added gunicorn>=23.0.0 to pyproject.toml

Co-Authored-By: Claude <noreply@anthropic.com>
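
The worker-count rule and the IPv6 bind formatting described in this commit message can be illustrated with a minimal sketch. This is not the code from the commit: the function names `resolve_worker_count` and `format_bind_address` are hypothetical, and the precedence of GUNICORN_WORKERS over WEB_CONCURRENCY is an assumption, since the message does not say which variable wins when both are set.

```python
import os


def resolve_worker_count(env=None, cpu_count=None):
    """Return the worker count: an env override if present, else (2 * cores) + 1.

    Hypothetical helper; the real CLI code may differ.
    """
    env = os.environ if env is None else env
    # Assumption: GUNICORN_WORKERS is consulted before WEB_CONCURRENCY.
    for name in ("GUNICORN_WORKERS", "WEB_CONCURRENCY"):
        raw = env.get(name)
        if raw:
            # Never return fewer than one worker.
            return max(1, int(raw))
    # os.cpu_count() can return None, so fall back to a single core.
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return 2 * cores + 1


def format_bind_address(host, port):
    """Build a host:port bind string, bracketing IPv6 literals.

    An IPv6 literal contains colons, so it is wrapped in brackets
    to keep the port separator unambiguous, e.g. [::]:port.
    """
    if ":" in host and not host.startswith("["):
        return f"[{host}]:{port}"
    return f"{host}:{port}"
```

With these assumptions, a 4-core machine with neither variable set would get 9 workers, and a host of `::` would be bound as `[::]:<port>`.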
..
apis	feat(responses)!: introduce OpenAI compatible prompts to Responses API (#3942)	2025-10-28 09:31:27 -07:00
cli	feat(cli): use gunicorn to manage server workers on unix systems	2025-10-29 17:09:17 +02:00
core	chore: remove unused methods from InferenceRouter (#3953)	2025-10-28 17:12:41 -07:00
distributions	docs: add documentation on how to use custom run yaml in docker (#3949)	2025-10-28 16:05:44 -07:00
models	fix(mypy): resolve provider utility and testing type issues (#3935)	2025-10-28 10:37:27 -07:00
providers	feat: openai files provider (#3946)	2025-10-28 16:25:03 -07:00
strong_typing	chore(package): migrate to src/ layout (#3920)	2025-10-27 12:02:21 -07:00
testing	fix(mypy): add type stubs and fix typing issues (#3938)	2025-10-28 11:00:09 -07:00
ui	chore(package): migrate to src/ layout (#3920)	2025-10-27 12:02:21 -07:00
__init__.py	chore(package): migrate to src/ layout (#3920)	2025-10-27 12:02:21 -07:00
env.py	chore(package): migrate to src/ layout (#3920)	2025-10-27 12:02:21 -07:00
log.py	chore(package): migrate to src/ layout (#3920)	2025-10-27 12:02:21 -07:00
schema_utils.py	chore(package): migrate to src/ layout (#3920)	2025-10-27 12:02:21 -07:00