docs: Add detailed docstrings to API models and update OpenAPI spec (#2889)

This PR adds comprehensive docstrings to the API data models across
Llama Stack. The docstrings explain each model and its fields, making
the API easier to understand and use.

**Key changes:**
- **Added Docstrings:** Added reST-formatted docstrings to Pydantic
models in the `llama_stack/apis/` directory. This includes models for:
  - Agents (`agents.py`)
  - Benchmarks (`benchmarks.py`)
  - Datasets (`datasets.py`)
  - Inference (`inference.py`)
  - And many other API modules.
- **OpenAPI Spec Update:** Regenerated the OpenAPI specification
(`docs/_static/llama-stack-spec.yaml` and
`docs/_static/llama-stack-spec.html`) to include the new docstrings.
This will be reflected in the API documentation, providing richer
information to users.

**Impact:**
- Developers using the Llama Stack API will have a better understanding
of the data structures.
- The auto-generated API documentation is now more informative.

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
Sai Prashanth S 2025-07-30 16:32:59 -07:00 committed by GitHub
parent cd5c6a2fcd
commit cb7354a9ce
28 changed files with 4079 additions and 812 deletions

@@ -14,7 +14,15 @@ from llama_stack.schema_utils import json_schema_type, webmethod
 class FilteringFunction(Enum):
-    """The type of filtering function."""
+    """The type of filtering function.
+
+    :cvar none: No filtering applied, accept all generated synthetic data
+    :cvar random: Random sampling of generated data points
+    :cvar top_k: Keep only the top-k highest scoring synthetic data samples
+    :cvar top_p: Nucleus-style filtering, keep samples exceeding cumulative score threshold
+    :cvar top_k_top_p: Combined top-k and top-p filtering strategy
+    :cvar sigmoid: Apply sigmoid function for probability-based filtering
+    """
     none = "none"
     random = "random"
@@ -26,7 +34,12 @@ class FilteringFunction(Enum):
 @json_schema_type
 class SyntheticDataGenerationRequest(BaseModel):
-    """Request to generate synthetic data. A small batch of prompts and a filtering function"""
+    """Request to generate synthetic data. A small batch of prompts and a filtering function
+
+    :param dialogs: List of conversation messages to use as input for synthetic data generation
+    :param filtering_function: Type of filtering to apply to generated synthetic data samples
+    :param model: (Optional) The identifier of the model to use. The model must be registered with Llama Stack and available via the /models endpoint
+    """
     dialogs: list[Message]
     filtering_function: FilteringFunction = FilteringFunction.none
@@ -35,7 +48,11 @@ class SyntheticDataGenerationRequest(BaseModel):
 @json_schema_type
 class SyntheticDataGenerationResponse(BaseModel):
-    """Response from the synthetic data generation. Batch of (prompt, response, score) tuples that pass the threshold."""
+    """Response from the synthetic data generation. Batch of (prompt, response, score) tuples that pass the threshold.
+
+    :param synthetic_data: List of generated synthetic data samples that passed the filtering criteria
+    :param statistics: (Optional) Statistical information about the generation process and filtering results
+    """
     synthetic_data: list[dict[str, Any]]
     statistics: dict[str, Any] | None = None
@@ -48,4 +65,12 @@ class SyntheticDataGeneration(Protocol):
         dialogs: list[Message],
         filtering_function: FilteringFunction = FilteringFunction.none,
         model: str | None = None,
-    ) -> SyntheticDataGenerationResponse: ...
+    ) -> SyntheticDataGenerationResponse:
+        """Generate synthetic data based on input dialogs and apply filtering.
+
+        :param dialogs: List of conversation messages to use as input for synthetic data generation
+        :param filtering_function: Type of filtering to apply to generated synthetic data samples
+        :param model: (Optional) The identifier of the model to use. The model must be registered with Llama Stack and available via the /models endpoint
+        :returns: Response containing filtered synthetic data samples and optional statistics
+        """
+        ...
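
The `:cvar` and `:param` fields added in this diff are machine-readable, which is what lets a documentation generator carry them into a spec. The following is a minimal sketch of that idea using only the standard library. The helper `field_descriptions` and its regex are hypothetical and are not taken from the Llama Stack OpenAPI generator, which may parse docstrings differently. The enum is a trimmed stand-in for `FilteringFunction`, not the real class.

```python
import inspect
import re
from enum import Enum


class FilteringFunction(Enum):
    """The type of filtering function.

    :cvar none: No filtering applied, accept all generated synthetic data
    :cvar top_k: Keep only the top-k highest scoring synthetic data samples
    """

    # Trimmed stand-in: only two of the members from the diff are reproduced.
    none = "none"
    top_k = "top_k"


# Matches single-line reST fields such as ":param name: text" or ":cvar name: text".
_FIELD = re.compile(r"^:(param|cvar)\s+(\w+):\s*(.*)$")


def field_descriptions(obj) -> dict[str, str]:
    """Hypothetical helper: map each documented field name to its description."""
    # inspect.getdoc strips the common indentation from the docstring.
    doc = inspect.getdoc(obj) or ""
    fields: dict[str, str] = {}
    for line in doc.splitlines():
        match = _FIELD.match(line.strip())
        if match:
            _, name, text = match.groups()
            fields[name] = text
    return fields


descriptions = field_descriptions(FilteringFunction)
for name, text in descriptions.items():
    print(f"{name}: {text}")
```

This sketch only handles fields that fit on one line, which matches the docstrings in this diff. Descriptions that wrap onto continuation lines would need extra handling.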