This commit introduces a new FastAPI router-based system for defining
API endpoints, enabling a migration path away from the legacy @webmethod
decorator system. The implementation includes router infrastructure,
migration of the Batches API as the first example, and updates
to server, OpenAPI generation, and inspection systems to
support both routing approaches.
The router infrastructure consists of a router registry system
that allows APIs to register FastAPI router factories, which
are then automatically discovered and included in the server
application. Standard error responses are centralized in
router_utils to ensure consistent OpenAPI specification
generation with proper $ref references to component responses.
The Batches API has been migrated to demonstrate the new
pattern. The protocol definition and models remain in
llama_stack_api/batches, maintaining clear separation between
API contracts and server implementation. The FastAPI router
implementation lives in
llama_stack/core/server/routers/batches, following the
established pattern where API contracts are defined in
llama_stack_api and server routing logic lives in
llama_stack/core/server.
The server now checks for registered routers before falling
back to the legacy webmethod-based route discovery, ensuring
backward compatibility during the migration period. The
OpenAPI generator has been updated to handle both router-based
and webmethod-based routes, correctly extracting metadata from
FastAPI route decorators and Pydantic Field descriptions. The
inspect endpoint now includes routes from both systems, with
proper filtering for deprecated routes and API levels.
Response descriptions are now explicitly defined in router decorators,
ensuring the generated OpenAPI specification matches the
previous format. Error responses use $ref references to
component responses (BadRequest400, TooManyRequests429, etc.)
as required by the specification. This is neat and will allow us to
remove a lot of boiler plate code from our generator once the
migration is done.
This implementation provides a foundation for incrementally migrating
other APIs to the router system while maintaining full backward
compatibility with existing webmethod-based APIs.
Closes: https://github.com/llamastack/llama-stack/issues/4188
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
This replaces the legacy "pyopenapi + strong_typing" pipeline with a
FastAPI-backed generator that has an explicit schema registry inside
`llama_stack_api`. The key changes:
1. **New generator architecture.** FastAPI now builds the OpenAPI schema
directly from the real routes, while helper modules
(`schema_collection`, `endpoints`, `schema_transforms`, etc.)
post-process the result. The old pyopenapi stack and its strong_typing
helpers are removed entirely, so we no longer rely on fragile AST
analysis or top-level import side effects.
2. **Schema registry in `llama_stack_api`.** `schema_utils.py` keeps a
`SchemaInfo` record for every `@json_schema_type`, `register_schema`,
and dynamically created request model. The OpenAPI generator and other
tooling query this registry instead of scanning the package tree,
producing deterministic names (e.g., `{MethodName}Request`), capturing
all optional/nullable fields, and making schema discovery testable. A
new unit test covers the registry behavior.
3. **Regenerated specs + CI alignment.** All docs/Stainless specs are
regenerated from the new pipeline, so optional/nullable fields now match
reality (expect the API Conformance workflow to report breaking
changes—this PR establishes the new baseline). The workflow itself is
back to the stock oasdiff invocation so future regressions surface
normally.
*Conformance will be RED on this PR; we choose to accept the
deviations.*
## Test Plan
- `uv run pytest tests/unit/server/test_schema_registry.py`
- `uv run python -m scripts.openapi_generator.main docs/static`
---------
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
Mark all register_* / unregister_* APIs as deprecated across models,
shields, tool groups, datasets, benchmarks, and scoring functions. This
is the first step toward moving resource mutations to an `/admin`
namespace as outlined in
https://github.com/llamastack/llama-stack/issues/3809#issuecomment-3492931585.
The deprecation flag will be reflected in the OpenAPI schema to warn API
users that these endpoints are being phased out. Next step will be
implementing the `/admin` route namespace for these resource management
operations.
- `register_model` / `unregister_model`
- `register_shield` / `unregister_shield`
- `register_tool_group` / `unregister_toolgroup`
- `register_dataset` / `unregister_dataset`
- `register_benchmark` / `unregister_benchmark`
- `register_scoring_function` / `unregister_scoring_function`
- Removes the deprecated agents (sessions and turns) API that was marked
alpha in 0.3.0
- Cleans up unused imports and orphaned types after the API removal
- Removes `SessionNotFoundError` and `AgentTurnInputType` which are no
longer needed
The agents API is completely superseded by the Responses + Conversations
APIs, and the client SDK Agent class already uses those implementations.
Corresponding client-side PR:
https://github.com/llamastack/llama-stack-client-python/pull/295
This patch ensures if max tokens is not defined, then is set to None
instead of 0 when calling openai_chat_completion. This way some
providers (like gemini) that cannot handle the `max_tokens = 0` will not
fail
Issue: #3666
# What does this PR do?
As discussed on discord, we do not need to reinvent the wheel for
telemetry. Instead we'll lean into the canonical OTEL stack.
Logs/traces/metrics will still be sent via OTEL - they just won't be
stored on, queried through Stack.
This is the first of many PRs to remove telemetry API from Stack.
1) removed webmethod decorators to remove from API spec
2) removed tests as @iamemilio is adding them on otel directly.
## Test Plan
This is a sweeping change to clean up some gunk around our "Tool"
definitions.
First, we had two types `Tool` and `ToolDef`. The first of these was a
"Resource" type for the registry but we had stopped registering tools
inside the Registry long back (and only registered ToolGroups.) The
latter was for specifying tools for the Agents API. This PR removes the
former and adds an optional `toolgroup_id` field to the latter.
Secondly, as pointed out by @bbrowning in
https://github.com/llamastack/llama-stack/pull/3003#issuecomment-3245270132,
we were doing a lossy conversion from a full JSON schema from the MCP
tool specification into our ToolDefinition to send it to the model.
There is no necessity to do this -- we ourselves aren't doing any
execution at all but merely passing it to the chat completions API which
supports this. By doing this (and by doing it poorly), we encountered
limitations like not supporting array items, or not resolving $refs,
etc.
To fix this, we replaced the `parameters` field by `{ input_schema,
output_schema }` which can be full blown JSON schemas.
Finally, there were some types in our llama-related chat format
conversion which needed some cleanup. We are taking this opportunity to
clean those up.
This PR is a substantial breaking change to the API. However, given our
window for introducing breaking changes, this suits us just fine. I will
be landing a concurrent `llama-stack-client` change as well since API
shapes are changing.
# What does this PR do?
Adds supplementary static content to root API spec pages. This is useful for giving context behind a specific API group, adding information on supported features or work in progress, etc.
This PR introduces supplementary information for Agents (experimental, deprecated) and Responses (stable) APIs.
<!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. -->
<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
## Test Plan
Documentation server renders rich static content for the Agents API group:

<!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* -->
# What does this PR do?
First step towards cleaning up the API reference section of the docs.
- Separates API reference into 3 sections: stable (`v1`), experimental (`v1alpha` and `v1beta`), and deprecated (`deprecated=True`)
- Each section is accessible via the dropdown menu and `docs/api-overview`
<img width="1237" height="321" alt="Screenshot 2025-09-30 at 5 47 30 PM" src="https://github.com/user-attachments/assets/fe0e498c-b066-46ed-a48e-4739d3b6724c" />
<img width="860" height="510" alt="Screenshot 2025-09-30 at 5 47 49 PM" src="https://github.com/user-attachments/assets/a92a8d8c-94bf-42d5-9f5b-b47bb2b14f9c" />
- Deprecated APIs: Added styling to the sidebar, and a notice on the endpoint pages
<img width="867" height="428" alt="Screenshot 2025-09-30 at 5 47 43 PM" src="https://github.com/user-attachments/assets/9e6e050d-c782-461b-8084-5ff6496d7bd9" />
Closes#3628
TODO in follow-up PRs:
- Add the ability to annotate API groups with supplementary content (so we can have longer descriptions of complex APIs like Responses)
- Clean up docstrings to show API endpoints (or short semantic titles) in the sidebar
## Test Plan
- Local testing
- Made sure API conformance test still passes