Merge remote-tracking branch 'origin/main' into resp_branching

Ashwin Bharambe 2025-10-01 21:13:12 -07:00
commit 1536ae0333
144 changed files with 62682 additions and 51560 deletions

docs/docs/api-overview.md

@ -0,0 +1,49 @@
# API Reference Overview
Llama Stack organizes its APIs by stability level so you can choose endpoints that match how much change your use case can tolerate.
## 🟢 Stable APIs
**Production-ready APIs with backward compatibility guarantees.**
These APIs are fully tested, documented, and stable. They follow semantic versioning principles and maintain backward compatibility within major versions. Recommended for production applications.
[**Browse Stable APIs →**](./api/llama-stack-specification)
**Key Features:**
- ✅ Backward compatibility guaranteed
- ✅ Comprehensive testing and validation
- ✅ Production-ready reliability
- ✅ Long-term support
---
## 🟡 Experimental APIs
**Preview APIs that may change before becoming stable.**
These APIs include v1alpha and v1beta endpoints that are feature-complete but may undergo changes based on feedback. Great for exploring new capabilities and providing feedback.
[**Browse Experimental APIs →**](./api-experimental/llama-stack-specification-experimental-apis)
**Key Features:**
- 🧪 Latest features and capabilities
- 🧪 May change based on user feedback
- 🧪 Active development and iteration
- 🧪 Opportunity to influence final design
---
## 🔴 Deprecated APIs
**Legacy APIs for migration reference.**
These APIs are deprecated and will be removed in future versions. They are provided for migration purposes and to help transition to newer, stable alternatives.
[**Browse Deprecated APIs →**](./api-deprecated/llama-stack-specification-deprecated-apis)
**Key Features:**
- ⚠️ Will be removed in future versions
- ⚠️ Migration guidance provided
- ⚠️ Use for compatibility during transition
- ⚠️ Not recommended for new projects


@ -181,7 +181,7 @@ Once defined, simply pass the tool to the agent config. `Agent` will take care o
agent = Agent(client, ..., tools=[my_tool])
```
Refer to [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/e2e_loop_with_client_tools.py) for an example of how to use client provided tools.
Refer to [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/) for an example of how to use client provided tools.
## Tool Invocation


@ -131,4 +131,4 @@ graph TD
3. **Configure your providers** with API keys or local models
4. **Start building** with Llama Stack!
For help choosing or troubleshooting, check our [Getting Started Guide](/docs/getting_started/quickstart) or [Community Support](https://github.com/llama-stack/llama-stack/discussions).
For help choosing or troubleshooting, check our [Getting Started Guide](/docs/getting_started/quickstart) or [Community Support](https://github.com/llamastack/llama-stack/discussions).


@ -45,7 +45,7 @@ Llama Stack consists of a server (with multiple pluggable API providers) and Cli
## Quick Links
- Ready to build? Check out the [Getting Started Guide](https://llama-stack.github.io/getting_started/quickstart) to get started.
- Ready to build? Check out the [Getting Started Guide](/docs/getting_started/quickstart) to get started.
- Want to contribute? See the [Contributing Guide](https://github.com/llamastack/llama-stack/blob/main/CONTRIBUTING.md).
- Explore [Example Applications](https://github.com/llamastack/llama-stack-apps) built with Llama Stack.
@ -59,13 +59,13 @@ Llama Stack provides adapters for popular providers across all API categories:
- **Training & Evaluation**: HuggingFace, TorchTune, NVIDIA NEMO
:::info Provider Details
For complete provider compatibility and setup instructions, see our [Providers Documentation](https://llamastack.github.io/providers/).
For complete provider compatibility and setup instructions, see our [Providers Documentation](https://llamastack.github.io/docs/providers/).
:::
## Get Started Today
<div style={{display: 'flex', gap: '1rem', flexWrap: 'wrap', margin: '2rem 0'}}>
<a href="https://llama-stack.github.io/getting_started/quickstart"
<a href="/docs/getting_started/quickstart"
style={{
background: 'var(--ifm-color-primary)',
color: 'white',


@ -1,12 +1,7 @@
---
description: "Agents API for creating and interacting with agentic systems.
description: "Agents
Main functionalities provided by this API:
- Create agents with specific instructions and ability to use tools.
- Interactions with agents are grouped into sessions (\"threads\"), and each interaction is called a \"turn\".
- Agents can be provided with various tools (see the ToolGroups and ToolRuntime APIs for more details).
- Agents can be provided with various shields (see the Safety API for more details).
- Agents can also use Memory to retrieve information from knowledge bases. See the RAG Tool and Vector IO APIs for more details."
APIs for creating and interacting with agentic systems."
sidebar_label: Agents
title: Agents
---
@ -15,13 +10,8 @@ title: Agents
## Overview
Agents API for creating and interacting with agentic systems.
Agents
Main functionalities provided by this API:
- Create agents with specific instructions and ability to use tools.
- Interactions with agents are grouped into sessions ("threads"), and each interaction is called a "turn".
- Agents can be provided with various tools (see the ToolGroups and ToolRuntime APIs for more details).
- Agents can be provided with various shields (see the Safety API for more details).
- Agents can also use Memory to retrieve information from knowledge bases. See the RAG Tool and Vector IO APIs for more details.
APIs for creating and interacting with agentic systems.
This section contains documentation for all available providers for the **agents** API.


@ -26,9 +26,6 @@ const config: Config = {
{
docs: {
sidebarPath: require.resolve("./sidebars.ts"),
// Please change this to your repo.
// Remove this to remove the "edit this page" links.
editUrl: 'https://github.com/meta-llama/llama-stack/tree/main/docs/',
docItemComponent: "@theme/ApiItem", // Derived from docusaurus-theme-openapi
},
blog: false,
@ -55,10 +52,27 @@ const config: Config = {
label: 'Docs',
},
{
type: 'docSidebar',
sidebarId: 'apiSidebar',
position: 'left',
type: 'dropdown',
label: 'API Reference',
position: 'left',
to: '/docs/api-overview',
items: [
{
type: 'docSidebar',
sidebarId: 'stableApiSidebar',
label: '🟢 Stable APIs',
},
{
type: 'docSidebar',
sidebarId: 'experimentalApiSidebar',
label: '🟡 Experimental APIs',
},
{
type: 'docSidebar',
sidebarId: 'deprecatedApiSidebar',
label: '🔴 Deprecated APIs',
},
],
},
{
href: 'https://github.com/llamastack/llama-stack',
@ -83,7 +97,7 @@ const config: Config = {
},
{
label: 'API Reference',
to: '/docs/api/llama-stack-specification',
to: '/docs/api-overview',
},
],
},
@ -170,7 +184,7 @@ const config: Config = {
id: "openapi",
docsPluginId: "classic",
config: {
llamastack: {
stable: {
specPath: "static/llama-stack-spec.yaml",
outputDir: "docs/api",
downloadUrl: "https://raw.githubusercontent.com/meta-llama/llama-stack/main/docs/static/llama-stack-spec.yaml",
@ -179,6 +193,24 @@ const config: Config = {
categoryLinkSource: "tag",
},
} satisfies OpenApiPlugin.Options,
experimental: {
specPath: "static/experimental-llama-stack-spec.yaml",
outputDir: "docs/api-experimental",
downloadUrl: "https://raw.githubusercontent.com/meta-llama/llama-stack/main/docs/static/experimental-llama-stack-spec.yaml",
sidebarOptions: {
groupPathsBy: "tag",
categoryLinkSource: "tag",
},
} satisfies OpenApiPlugin.Options,
deprecated: {
specPath: "static/deprecated-llama-stack-spec.yaml",
outputDir: "docs/api-deprecated",
downloadUrl: "https://raw.githubusercontent.com/meta-llama/llama-stack/main/docs/static/deprecated-llama-stack-spec.yaml",
sidebarOptions: {
groupPathsBy: "tag",
categoryLinkSource: "tag",
},
} satisfies OpenApiPlugin.Options,
} satisfies Plugin.PluginOptions,
},
],


@ -34,40 +34,52 @@ def str_presenter(dumper, data):
return dumper.represent_scalar("tag:yaml.org,2002:str", data, style=style)
def main(output_dir: str):
output_dir = Path(output_dir)
if not output_dir.exists():
raise ValueError(f"Directory {output_dir} does not exist")
def generate_spec(output_dir: Path, stability_filter: str | None = None, main_spec: bool = False):
"""Generate OpenAPI spec with optional stability filtering."""
# Validate API protocols before generating spec
return_type_errors = validate_api()
if return_type_errors:
print("\nAPI Method Return Type Validation Errors:\n")
for error in return_type_errors:
print(error, file=sys.stderr)
sys.exit(1)
now = str(datetime.now())
print(
"Converting the spec to YAML (openapi.yaml) and HTML (openapi.html) at " + now
)
print("")
if stability_filter:
title_suffix = {
"stable": " - Stable APIs" if not main_spec else "",
"experimental": " - Experimental APIs",
"deprecated": " - Deprecated APIs"
}.get(stability_filter, f" - {stability_filter.title()} APIs")
# Use main spec filename for stable when main_spec=True
if main_spec and stability_filter == "stable":
filename_prefix = ""
else:
filename_prefix = f"{stability_filter}-"
description_suffix = {
"stable": "\n\n**✅ STABLE**: Production-ready APIs with backward compatibility guarantees.",
"experimental": "\n\n**🧪 EXPERIMENTAL**: Pre-release APIs (v1alpha, v1beta) that may change before becoming stable.",
"deprecated": "\n\n**⚠️ DEPRECATED**: Legacy APIs that may be removed in future versions. Use for migration reference only."
}.get(stability_filter, "")
else:
title_suffix = ""
filename_prefix = ""
description_suffix = ""
spec = Specification(
LlamaStack,
Options(
server=Server(url="http://any-hosted-llama-stack.com"),
info=Info(
title="Llama Stack Specification",
title=f"Llama Stack Specification{title_suffix}",
version=LLAMA_STACK_API_V1,
description="""This is the specification of the Llama Stack that provides
description=f"""This is the specification of the Llama Stack that provides
a set of endpoints and their corresponding interfaces that are tailored to
best leverage Llama Models.""",
best leverage Llama Models.{description_suffix}""",
),
include_standard_error_responses=True,
stability_filter=stability_filter, # Pass the filter to the generator
),
)
with open(output_dir / "llama-stack-spec.yaml", "w", encoding="utf-8") as fp:
yaml_filename = f"{filename_prefix}llama-stack-spec.yaml"
html_filename = f"{filename_prefix}llama-stack-spec.html"
with open(output_dir / yaml_filename, "w", encoding="utf-8") as fp:
y = yaml.YAML()
y.default_flow_style = False
y.block_seq_indent = 2
@ -83,9 +95,36 @@ def main(output_dir: str):
fp,
)
with open(output_dir / "llama-stack-spec.html", "w") as fp:
with open(output_dir / html_filename, "w") as fp:
spec.write_html(fp, pretty_print=True)
print(f"Generated {yaml_filename} and {html_filename}")
def main(output_dir: str):
output_dir = Path(output_dir)
if not output_dir.exists():
raise ValueError(f"Directory {output_dir} does not exist")
# Validate API protocols before generating spec
return_type_errors = validate_api()
if return_type_errors:
print("\nAPI Method Return Type Validation Errors:\n")
for error in return_type_errors:
print(error, file=sys.stderr)
sys.exit(1)
now = str(datetime.now())
print(f"Converting the spec to YAML (openapi.yaml) and HTML (openapi.html) at {now}")
print("")
# Generate main spec as stable APIs (llama-stack-spec.yaml)
print("Generating main specification (stable APIs)...")
generate_spec(output_dir, "stable", main_spec=True)
print("Generating other stability-filtered specifications...")
generate_spec(output_dir, "experimental")
generate_spec(output_dir, "deprecated")
if __name__ == "__main__":
fire.Fire(main)
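
The output naming rule in `generate_spec` can be isolated as a small pure function. The following is a minimal sketch, not code from the commit: `spec_filenames` is a hypothetical helper that mirrors the `filename_prefix` logic above, so the stable spec keeps the unprefixed name only when `main_spec=True`.

```python
from typing import Optional, Tuple


def spec_filenames(
    stability_filter: Optional[str] = None, main_spec: bool = False
) -> Tuple[str, str]:
    """Return (yaml_filename, html_filename) for a stability filter.

    Hypothetical helper: mirrors the prefix rule in generate_spec.
    """
    if stability_filter and not (main_spec and stability_filter == "stable"):
        # Any filtered spec other than the main stable one gets a prefix.
        prefix = f"{stability_filter}-"
    else:
        # No filter, or the stable spec written as the main spec.
        prefix = ""
    return f"{prefix}llama-stack-spec.yaml", f"{prefix}llama-stack-spec.html"


if __name__ == "__main__":
    # The three calls made by main() in the diff above.
    for args in [("stable", True), ("experimental", False), ("deprecated", False)]:
        print(args, spec_filenames(*args))
```

Note that calling the helper with `"stable"` but without `main_spec=True` yields `stable-llama-stack-spec.yaml`, which matches the `else` branch of the original code.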


@ -7,13 +7,14 @@
import hashlib
import inspect
import ipaddress
import os
import types
import typing
from dataclasses import make_dataclass
from pathlib import Path
from typing import Annotated, Any, Dict, get_args, get_origin, Set, Union
from fastapi import UploadFile
from pydantic import BaseModel
from llama_stack.apis.datatypes import Error
from llama_stack.strong_typing.core import JsonType
@ -35,6 +36,7 @@ from llama_stack.strong_typing.schema import (
SchemaOptions,
)
from llama_stack.strong_typing.serialization import json_dump_string, object_to_json
from pydantic import BaseModel
from .operations import (
EndpointOperation,
@ -546,6 +548,84 @@ class Generator:
return extra_tags
def _get_api_group_for_operation(self, op) -> str | None:
"""
Determine the API group for an operation based on its route path.
Args:
op: The endpoint operation
Returns:
The API group name derived from the route, or None if unable to determine
"""
if not hasattr(op, 'webmethod') or not op.webmethod or not hasattr(op.webmethod, 'route'):
return None
route = op.webmethod.route
if not route or not route.startswith('/'):
return None
# Extract API group from route path
# Examples: /v1/agents/list -> agents-api
# /v1/responses -> responses-api
# /v1/models -> models-api
path_parts = route.strip('/').split('/')
if len(path_parts) < 2:
return None
# Skip version prefix (v1, v1alpha, v1beta, etc.)
if path_parts[0].startswith('v1'):
if len(path_parts) < 2:
return None
api_segment = path_parts[1]
else:
api_segment = path_parts[0]
# Convert to supplementary file naming convention
# agents -> agents-api, responses -> responses-api, etc.
return f"{api_segment}-api"
def _load_supplemental_content(self, api_group: str | None) -> str:
"""
Load supplemental content for an API group based on stability level.
Follows this resolution order:
1. docs/supplementary/{stability}/{api_group}.md
2. docs/supplementary/shared/{api_group}.md (fallback)
3. Empty string if no files found
Args:
api_group: The API group name (e.g., "agents-responses-api"), or None if no mapping exists
Returns:
The supplemental content as markdown string, or empty string if not found
"""
if not api_group:
return ""
base_path = Path(__file__).parent.parent.parent / "supplementary"
# Try stability-specific content first if stability filter is set
if self.options.stability_filter:
stability_path = base_path / self.options.stability_filter / f"{api_group}.md"
if stability_path.exists():
try:
return stability_path.read_text(encoding="utf-8")
except Exception as e:
print(f"Warning: Could not read stability-specific supplemental content from {stability_path}: {e}")
# Fall back to shared content
shared_path = base_path / "shared" / f"{api_group}.md"
if shared_path.exists():
try:
return shared_path.read_text(encoding="utf-8")
except Exception as e:
print(f"Warning: Could not read shared supplemental content from {shared_path}: {e}")
# No supplemental content found
return ""
def _build_operation(self, op: EndpointOperation) -> Operation:
if op.defining_class.__name__ in [
"SyntheticDataGeneration",
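
The two helpers added in this hunk amount to a route-to-group mapping followed by a two-step file lookup. The sketch below restates that behavior outside the `Generator` class, assuming the same directory layout (`{stability}/{api_group}.md`, then `shared/{api_group}.md`). The function names and the explicit `base` parameter are hypothetical and are introduced only so the logic can run standalone.

```python
import tempfile
from pathlib import Path
from typing import Optional


def api_group_for_route(route: str) -> Optional[str]:
    """Map a route such as /v1/agents/list to 'agents-api'."""
    if not route or not route.startswith("/"):
        return None
    parts = route.strip("/").split("/")
    if len(parts) < 2:
        return None
    # Skip a version prefix (v1, v1alpha, v1beta) when present.
    segment = parts[1] if parts[0].startswith("v1") else parts[0]
    return f"{segment}-api"


def load_supplemental(base: Path, api_group: Optional[str], stability: Optional[str]) -> str:
    """Return stability-specific content, else shared content, else ''."""
    if not api_group:
        return ""
    candidates = []
    if stability:
        candidates.append(base / stability / f"{api_group}.md")
    candidates.append(base / "shared" / f"{api_group}.md")
    for path in candidates:
        if path.exists():
            try:
                return path.read_text(encoding="utf-8")
            except (OSError, UnicodeDecodeError) as exc:
                # Warn and fall through to the next candidate.
                print(f"Warning: could not read {path}: {exc}")
    return ""


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "shared").mkdir()
        (base / "shared" / "agents-api.md").write_text("shared notes", encoding="utf-8")
        group = api_group_for_route("/v1/agents/list")
        # No stable-specific file exists, so the shared file is used.
        print(group, "->", load_supplemental(base, group, "stable"))
```

Unlike the original, which catches any `Exception`, this sketch narrows the handler to read and decode errors; that is a design choice of the sketch, not a change made by the commit.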
@ -797,10 +877,14 @@ class Generator:
else:
callbacks = None
description = "\n".join(
# Build base description from docstring
base_description = "\n".join(
filter(None, [doc_string.short_description, doc_string.long_description])
)
# Individual endpoints get clean descriptions only
description = base_description
return Operation(
tags=[
getattr(op.defining_class, "API_NAMESPACE", op.defining_class.__name__)
@ -811,16 +895,121 @@ class Generator:
requestBody=requestBody,
responses=responses,
callbacks=callbacks,
deprecated=True if "DEPRECATED" in op.func_name else None,
deprecated=getattr(op.webmethod, "deprecated", False)
or "DEPRECATED" in op.func_name,
security=[] if op.public else None,
)
def _get_api_stability_priority(self, api_level: str) -> int:
"""
Return sorting priority for API stability levels.
Lower numbers = higher priority (appear first)
:param api_level: The API level (e.g., "v1", "v1beta", "v1alpha")
:return: Priority number for sorting
"""
stability_order = {
"v1": 0, # Stable - highest priority
"v1beta": 1, # Beta - medium priority
"v1alpha": 2, # Alpha - lowest priority
}
return stability_order.get(api_level, 999) # Unknown levels go last
def generate(self) -> Document:
paths: Dict[str, PathItem] = {}
endpoint_classes: Set[type] = set()
for op in get_endpoint_operations(
self.endpoint, use_examples=self.options.use_examples
):
# Collect all operations and filter by stability if specified
operations = list(
get_endpoint_operations(
self.endpoint, use_examples=self.options.use_examples
)
)
# Filter operations by stability level if requested
if self.options.stability_filter:
filtered_operations = []
for op in operations:
deprecated = (
getattr(op.webmethod, "deprecated", False)
or "DEPRECATED" in op.func_name
)
stability_level = op.webmethod.level
if self.options.stability_filter == "stable":
# Include v1 non-deprecated endpoints
if stability_level == "v1" and not deprecated:
filtered_operations.append(op)
elif self.options.stability_filter == "experimental":
# Include v1alpha and v1beta endpoints (deprecated or not)
if stability_level in ["v1alpha", "v1beta"]:
filtered_operations.append(op)
elif self.options.stability_filter == "deprecated":
# Include only deprecated endpoints
if deprecated:
filtered_operations.append(op)
operations = filtered_operations
print(
f"Filtered to {len(operations)} operations for stability level: {self.options.stability_filter}"
)
# Sort operations by multiple criteria for consistent ordering:
# 1. Stability level with deprecation handling (global priority):
# - Active stable (v1) comes first
# - Beta (v1beta) comes next
# - Alpha (v1alpha) comes next
# - Deprecated stable (v1 deprecated) comes last
# 2. Route path (group related endpoints within same stability level)
# 3. HTTP method (GET, POST, PUT, DELETE, PATCH)
# 4. Operation name (alphabetical)
def sort_key(op):
http_method_order = {
HTTPMethod.GET: 0,
HTTPMethod.POST: 1,
HTTPMethod.PUT: 2,
HTTPMethod.DELETE: 3,
HTTPMethod.PATCH: 4,
}
# Enhanced stability priority for migration pattern support
deprecated = (
getattr(op.webmethod, "deprecated", False)
or "DEPRECATED" in op.func_name
)
stability_priority = self._get_api_stability_priority(op.webmethod.level)
# Deprecated versions should appear after everything else
# This ensures deprecated stable endpoints come last globally
if deprecated:
stability_priority += 10 # Push deprecated endpoints to the end
return (
stability_priority, # Global stability handling comes first
op.get_route(
op.webmethod
), # Group by route path within stability level
http_method_order.get(op.http_method, 999),
op.func_name,
)
operations.sort(key=sort_key)
# Debug output for migration pattern tracking
migration_routes = {}
for op in operations:
route_key = (op.get_route(op.webmethod), op.http_method)
if route_key not in migration_routes:
migration_routes[route_key] = []
migration_routes[route_key].append(
(op.webmethod.level, getattr(op.webmethod, "deprecated", False))
)
for route_key, versions in migration_routes.items():
if len(versions) > 1:
print(f"Migration pattern detected for {route_key[1]} {route_key[0]}:")
for level, deprecated in versions:
status = "DEPRECATED" if deprecated else "ACTIVE"
print(f" - {level} ({status})")
for op in operations:
endpoint_classes.add(op.defining_class)
operation = self._build_operation(op)
@ -851,10 +1040,22 @@ class Generator:
doc_string = parse_type(cls)
if hasattr(cls, "API_NAMESPACE") and cls.API_NAMESPACE != cls.__name__:
continue
# Add supplemental content to tag pages
api_group = f"{cls.__name__.lower()}-api"
supplemental_content = self._load_supplemental_content(api_group)
tag_description = doc_string.long_description or ""
if supplemental_content:
if tag_description:
tag_description = f"{tag_description}\n\n{supplemental_content}"
else:
tag_description = supplemental_content
operation_tags.append(
Tag(
name=cls.__name__,
description=doc_string.long_description,
description=tag_description,
displayName=doc_string.short_description,
)
)


@ -54,6 +54,7 @@ class Options:
property_description_fun: Optional[Callable[[type, str, str], str]] = None
captions: Optional[Dict[str, str]] = None
include_standard_error_responses: bool = True
stability_filter: Optional[str] = None
default_captions: ClassVar[Dict[str, str]] = {
"Operations": "Operations",


@ -335,8 +335,10 @@ const sidebars: SidebarsConfig = {
},
],
// API Reference sidebar - use plugin-generated sidebar
apiSidebar: require('./docs/api/sidebar.ts').default,
// API Reference sidebars - use plugin-generated sidebars
stableApiSidebar: require('./docs/api/sidebar.ts').default,
experimentalApiSidebar: require('./docs/api-experimental/sidebar.ts').default,
deprecatedApiSidebar: require('./docs/api-deprecated/sidebar.ts').default,
};
export default sidebars;


@ -189,3 +189,29 @@ button[class*="button"]:hover,
.pagination-nav__link--prev:hover {
background-color: #f3f4f6 !important;
}
/* Deprecated endpoint styling */
.menu__list-item--deprecated .menu__link {
text-decoration: line-through !important;
opacity: 0.7;
font-style: italic;
}
.menu__list-item--deprecated .menu__link:hover {
opacity: 0.9;
}
/* Deprecated endpoint badges - slightly muted */
.menu__list-item--deprecated.api-method > .menu__link::before {
opacity: 0.7;
border-style: dashed !important;
}
/* Dark theme adjustments for deprecated endpoints */
[data-theme='dark'] .menu__list-item--deprecated .menu__link {
opacity: 0.6;
}
[data-theme='dark'] .menu__list-item--deprecated .menu__link:hover {
opacity: 0.8;
}

File diff suppressed because it is too large (6 files)


@ -0,0 +1,9 @@
## Deprecated APIs
> **⚠️ DEPRECATED**: These APIs are provided for migration reference and will be removed in future versions. Not recommended for new projects.
### Migration Guidance
If you are using deprecated versions of the Agents or Responses APIs, please migrate to:
- **Responses API**: Use the stable v1 Responses API endpoints


@ -0,0 +1,21 @@
## Agents API (Experimental)
> **🧪 EXPERIMENTAL**: This API is in preview and may change based on user feedback. Great for exploring new capabilities and providing feedback to influence the final design.
Main functionalities provided by this API:
- Create agents with specific instructions and ability to use tools.
- Interactions with agents are grouped into sessions ("threads"), and each interaction is called a "turn".
- Agents can be provided with various tools (see the ToolGroups and ToolRuntime APIs for more details).
- Agents can be provided with various shields (see the Safety API for more details).
- Agents can also use Memory to retrieve information from knowledge bases. See the RAG Tool and Vector IO APIs for more details.
### 🧪 Feedback Welcome
This API is actively being developed. We welcome feedback on:
- API design and usability
- Performance characteristics
- Missing features or capabilities
- Integration patterns
**Provide Feedback**: [GitHub Discussions](https://github.com/llamastack/llama-stack/discussions) or [GitHub Issues](https://github.com/llamastack/llama-stack/issues)


@ -0,0 +1,40 @@
## Responses API
The Responses API provides OpenAI-compatible functionality with enhanced capabilities for dynamic, stateful interactions.
> **✅ STABLE**: This API is production-ready with backward compatibility guarantees. Recommended for production applications.
### ✅ Supported Tools
The Responses API supports the following tool types:
- **`web_search`**: Search the web for current information and real-time data
- **`file_search`**: Search through uploaded files and vector stores
- Supports dynamic `vector_store_ids` per call
- Compatible with OpenAI file search patterns
- **`function`**: Call custom functions with JSON schema validation
- **`mcp_tool`**: Model Context Protocol integration
### ✅ Supported Fields & Features
**Core Capabilities:**
- **Dynamic Configuration**: Switch models, vector stores, and tools per request without pre-configuration
- **Conversation Branching**: Use `previous_response_id` to branch conversations and explore different paths
- **Rich Annotations**: Automatic file citations, URL citations, and container file citations
- **Status Tracking**: Monitor tool call execution status and handle failures gracefully
### 🚧 Work in Progress
- Full real-time response streaming support
- `tool_choice` parameter
- `max_tool_calls` parameter
- Built-in tools (code interpreter, containers API)
- Safety & guardrails
- `reasoning` capabilities
- `service_tier`
- `logprobs`
- `max_output_tokens`
- `metadata` handling
- `instructions`
- `incomplete_details`
- `background`
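
The conversation branching capability listed under Core Capabilities can be illustrated by building request bodies only, without calling a server. This is a rough sketch under stated assumptions: `build_response_request` is a hypothetical helper, the `model` and `input` field names follow the OpenAI-compatible Responses request shape, and the model name and response ID are placeholders.

```python
from typing import Any, Dict, Optional


def build_response_request(
    model: str, user_input: str, previous_response_id: Optional[str] = None
) -> Dict[str, Any]:
    """Build a Responses API request body (hypothetical helper)."""
    body: Dict[str, Any] = {"model": model, "input": user_input}
    if previous_response_id is not None:
        # Continue from an earlier response instead of starting fresh.
        body["previous_response_id"] = previous_response_id
    return body


if __name__ == "__main__":
    parent_id = "resp_123"  # placeholder ID of an earlier response
    # Two requests that reuse the same parent form two branches.
    branch_a = build_response_request("my-model", "Summarize it briefly.", parent_id)
    branch_b = build_response_request("my-model", "Explain it in detail.", parent_id)
    print(branch_a)
    print(branch_b)
```

Because both requests point at the same `previous_response_id`, each one continues the conversation from that point independently of the other.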