mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-12-03 09:53:45 +00:00
refactor: address PR feedback - improve naming, error handling, and documentation
Address all feedback from PR #3962:

**Code Quality Improvements:**
- Rename `_uvicorn_run` → `_run_server` for accurate method naming
- Refactor error handling: move Gunicorn fallback logic from `_run_with_gunicorn` to caller
- Update comments to reflect both Uvicorn and Gunicorn behavior
- Update test mock from `_uvicorn_run` to `_run_server`

**Environment Variable:**
- Change `LLAMA_STACK_DISABLE_GUNICORN` → `LLAMA_STACK_ENABLE_GUNICORN`
- More intuitive positive logic (no double negatives)
- Defaults to `true` on Unix systems
- Clearer log messages distinguishing platform limitations vs explicit disable

**Documentation:**
- Remove unnecessary `uv sync --group unit --group test` from user docs
- Clarify SQLite limitations: "SQLite only allows one writer at a time"
- Accurate explanation: WAL mode enables concurrent reads but writes are serialized
- Strong recommendation for PostgreSQL in production with high traffic

**Architecture:**
- Better separation of concerns: `_run_with_gunicorn` just executes, caller handles fallback
- Exceptions propagate to caller for centralized decision making

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
- parent: 9ff881a28a
- commit: 241e189fee

10 changed files with 75 additions and 63 deletions
````diff
@@ -90,7 +90,8 @@ On Unix-based systems (Linux, macOS), the server automatically uses Gunicorn wit
 **Important Notes**:
 - On Windows, the server automatically falls back to single-process Uvicorn.
-- **Database Race Condition**: When using multiple workers without `GUNICORN_PRELOAD=true`, you may encounter database initialization race conditions (e.g., "table already exists" errors) as multiple workers simultaneously attempt to create database tables. To avoid this issue in production, set `GUNICORN_PRELOAD=true` and ensure all dependencies are installed with `uv sync --group unit --group test`.
+- **Database Race Condition**: When using multiple workers without `GUNICORN_PRELOAD=true`, you may encounter database initialization race conditions (e.g., "table already exists" errors) as multiple workers simultaneously attempt to create database tables. To avoid this issue in production, set `GUNICORN_PRELOAD=true`.
+- **SQLite with Multiple Workers**: SQLite works with Gunicorn's multi-process mode for development and low-to-moderate traffic scenarios. The system automatically enables WAL (Write-Ahead Logging) mode and sets a 5-second busy timeout. However, **SQLite only allows one writer at a time** - even with WAL mode, write operations from multiple workers are serialized, causing workers to wait for database locks under concurrent write load. **For production deployments with high traffic or multiple workers, we strongly recommend using PostgreSQL or another production-grade database** for true concurrent write performance.
 
 **Example production configuration:**
 
 ```bash
````
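The SQLite behavior described in this diff (WAL mode plus a 5-second busy timeout, with concurrent reads but serialized writes) can be reproduced with Python's standard `sqlite3` module. A minimal sketch, not llama-stack's actual initialization code:

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

# timeout=5.0 is the connection-level busy timeout: a writer blocked by
# another writer's lock waits up to 5 seconds before raising.
conn = sqlite3.connect(db_path, timeout=5.0)

# WAL mode allows concurrent readers alongside a single writer.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

# IF NOT EXISTS sidesteps the "table already exists" race when several
# workers initialize the schema at once.
conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
conn.execute("INSERT OR REPLACE INTO kv VALUES ('a', '1')")
conn.commit()

# A second connection can read while the first stays open, but concurrent
# writes from multiple workers would still be serialized by the lock.
reader = sqlite3.connect(db_path, timeout=5.0)
value = reader.execute("SELECT v FROM kv WHERE k = 'a'").fetchone()[0]
```

This is exactly why the docs recommend PostgreSQL for write-heavy production traffic: WAL removes reader/writer contention, but writer/writer contention remains.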
````diff
@@ -44,7 +44,9 @@ Configure Gunicorn behavior using environment variables:
 - `GUNICORN_MAX_REQUESTS_JITTER`: Randomize worker restart timing (default: `1000`)
 - `GUNICORN_PRELOAD`: Preload app before forking workers for memory efficiency (default: `true`, as set in `run.py` line 264)
 
-**Important**: When using multiple workers without `GUNICORN_PRELOAD=true`, you may encounter database initialization race conditions. To avoid this, set `GUNICORN_PRELOAD=true` and install all dependencies with `uv sync --group unit --group test`.
+**Important Notes**:
+- When using multiple workers without `GUNICORN_PRELOAD=true`, you may encounter database initialization race conditions. To avoid this, set `GUNICORN_PRELOAD=true`.
+- **SQLite with Multiple Workers**: SQLite works with Gunicorn's multi-process mode for development and low-to-moderate traffic scenarios. The system automatically enables WAL (Write-Ahead Logging) mode and sets a 5-second busy timeout. However, **SQLite only allows one writer at a time** - even with WAL mode, write operations from multiple workers are serialized, causing workers to wait for database locks under concurrent write load. **For production deployments with high traffic or multiple workers, we strongly recommend using PostgreSQL or another production-grade database** for true concurrent write performance.
 
 **Example production configuration:**
 
 ```bash
````
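Pulling the settings from this diff together, a production-leaning environment sketch. The variable names (`GUNICORN_PRELOAD`, `GUNICORN_MAX_REQUESTS_JITTER`, `LLAMA_STACK_ENABLE_GUNICORN`) come from the diff and commit message; the values shown are illustrative defaults, not recommendations beyond what the docs state:

```shell
# Preload the app before forking workers to avoid the
# "table already exists" initialization race described above.
export GUNICORN_PRELOAD=true

# Randomize worker restart timing (documented default: 1000).
export GUNICORN_MAX_REQUESTS_JITTER=1000

# Positive-logic switch introduced by this commit; defaults to true on Unix.
export LLAMA_STACK_ENABLE_GUNICORN=true

echo "preload=${GUNICORN_PRELOAD} jitter=${GUNICORN_MAX_REQUESTS_JITTER}"
```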
```diff
@@ -1,7 +1,7 @@
 ---
 description: "Agents
 
-APIs for creating and interacting with agentic systems."
+APIs for creating and interacting with agentic systems."
 sidebar_label: Agents
 title: Agents
 ---
@@ -12,6 +12,6 @@ title: Agents
 
 Agents
 
-APIs for creating and interacting with agentic systems.
+APIs for creating and interacting with agentic systems.
 
 This section contains documentation for all available providers for the **agents** API.
```
```diff
@@ -1,14 +1,14 @@
 ---
 description: "The Batches API enables efficient processing of multiple requests in a single operation,
-particularly useful for processing large datasets, batch evaluation workflows, and
-cost-effective inference at scale.
+particularly useful for processing large datasets, batch evaluation workflows, and
+cost-effective inference at scale.
 
-The API is designed to allow use of openai client libraries for seamless integration.
+The API is designed to allow use of openai client libraries for seamless integration.
 
-This API provides the following extensions:
-- idempotent batch creation
+This API provides the following extensions:
+- idempotent batch creation
 
-Note: This API is currently under active development and may undergo changes."
+Note: This API is currently under active development and may undergo changes."
 sidebar_label: Batches
 title: Batches
 ---
@@ -18,14 +18,14 @@ title: Batches
 ## Overview
 
 The Batches API enables efficient processing of multiple requests in a single operation,
-particularly useful for processing large datasets, batch evaluation workflows, and
-cost-effective inference at scale.
+particularly useful for processing large datasets, batch evaluation workflows, and
+cost-effective inference at scale.
 
-The API is designed to allow use of openai client libraries for seamless integration.
+The API is designed to allow use of openai client libraries for seamless integration.
 
-This API provides the following extensions:
-- idempotent batch creation
+This API provides the following extensions:
+- idempotent batch creation
 
-Note: This API is currently under active development and may undergo changes.
+Note: This API is currently under active development and may undergo changes.
 
 This section contains documentation for all available providers for the **batches** API.
```
```diff
@@ -1,7 +1,7 @@
 ---
 description: "Evaluations
 
-Llama Stack Evaluation API for running evaluations on model and agent candidates."
+Llama Stack Evaluation API for running evaluations on model and agent candidates."
 sidebar_label: Eval
 title: Eval
 ---
@@ -12,6 +12,6 @@ title: Eval
 
 Evaluations
 
-Llama Stack Evaluation API for running evaluations on model and agent candidates.
+Llama Stack Evaluation API for running evaluations on model and agent candidates.
 
 This section contains documentation for all available providers for the **eval** API.
```
```diff
@@ -1,7 +1,7 @@
 ---
 description: "Files
 
-This API is used to upload documents that can be used with other Llama Stack APIs."
+This API is used to upload documents that can be used with other Llama Stack APIs."
 sidebar_label: Files
 title: Files
 ---
@@ -12,6 +12,6 @@ title: Files
 
 Files
 
-This API is used to upload documents that can be used with other Llama Stack APIs.
+This API is used to upload documents that can be used with other Llama Stack APIs.
 
 This section contains documentation for all available providers for the **files** API.
```
```diff
@@ -1,12 +1,12 @@
 ---
 description: "Inference
 
-Llama Stack Inference API for generating completions, chat completions, and embeddings.
+Llama Stack Inference API for generating completions, chat completions, and embeddings.
 
-This API provides the raw interface to the underlying models. Three kinds of models are supported:
-- LLM models: these models generate \"raw\" and \"chat\" (conversational) completions.
-- Embedding models: these models generate embeddings to be used for semantic search.
-- Rerank models: these models reorder the documents based on their relevance to a query."
+This API provides the raw interface to the underlying models. Three kinds of models are supported:
+- LLM models: these models generate \"raw\" and \"chat\" (conversational) completions.
+- Embedding models: these models generate embeddings to be used for semantic search.
+- Rerank models: these models reorder the documents based on their relevance to a query."
 sidebar_label: Inference
 title: Inference
 ---
@@ -17,11 +17,11 @@ title: Inference
 
 Inference
 
-Llama Stack Inference API for generating completions, chat completions, and embeddings.
+Llama Stack Inference API for generating completions, chat completions, and embeddings.
 
-This API provides the raw interface to the underlying models. Three kinds of models are supported:
-- LLM models: these models generate "raw" and "chat" (conversational) completions.
-- Embedding models: these models generate embeddings to be used for semantic search.
-- Rerank models: these models reorder the documents based on their relevance to a query.
+This API provides the raw interface to the underlying models. Three kinds of models are supported:
+- LLM models: these models generate "raw" and "chat" (conversational) completions.
+- Embedding models: these models generate embeddings to be used for semantic search.
+- Rerank models: these models reorder the documents based on their relevance to a query.
 
 This section contains documentation for all available providers for the **inference** API.
```
```diff
@@ -1,7 +1,7 @@
 ---
 description: "Safety
 
-OpenAI-compatible Moderations API."
+OpenAI-compatible Moderations API."
 sidebar_label: Safety
 title: Safety
 ---
@@ -12,6 +12,6 @@ title: Safety
 
 Safety
 
-OpenAI-compatible Moderations API.
+OpenAI-compatible Moderations API.
 
 This section contains documentation for all available providers for the **safety** API.
```