llama-stack-mirror/docs
Sébastien Han 7f43051a63
feat: Implement FastAPI router system (#4191)
# What does this PR do?

This commit introduces a new FastAPI router-based system for defining
API endpoints, enabling a migration path away from the legacy @webmethod
decorator system. The implementation includes router infrastructure,
migration of the Batches API as the first example, and updates to
server, OpenAPI generation, and inspection systems to support both
routing approaches.

The router infrastructure consists of a router registry system that
allows APIs to register FastAPI router factories, which are then
automatically discovered and included in the server application.
Standard error responses are centralized in router_utils to ensure
consistent OpenAPI specification generation with proper $ref references
to component responses.

The Batches API has been migrated to demonstrate the new pattern. The
protocol definition and models remain in llama_stack_api/batches,
maintaining clear separation between API contracts and server
implementation. The FastAPI router implementation lives in
llama_stack/core/server/routers/batches, following the established
pattern where API contracts are defined in llama_stack_api and server
routing logic lives in
llama_stack/core/server.

The server now checks for registered routers before falling back to the
legacy webmethod-based route discovery, ensuring backward compatibility
during the migration period. The OpenAPI generator has been updated to
handle both router-based and webmethod-based routes, correctly
extracting metadata from FastAPI route decorators and Pydantic Field
descriptions. The inspect endpoint now includes routes from both
systems, with proper filtering for deprecated routes and API levels.

Response descriptions are now explicitly defined in router decorators,
ensuring the generated OpenAPI specification matches the previous
format. Error responses use $ref references to component responses
(BadRequest400, TooManyRequests429, etc.) as required by the
specification. This is neat and will allow us to remove a lot of boiler
plate code from our generator once the migration is done.

This implementation provides a foundation for incrementally migrating
other APIs to the router system while maintaining full backward
compatibility with existing webmethod-based APIs.

Closes: https://github.com/llamastack/llama-stack/issues/4188

## Test Plan

CI, the server should start, same routes should be visible.

```
curl http://localhost:8321/v1/inspect/routes | jq '.data[] | select(.route | contains("batches"))'
```

Also:

```
 uv run pytest tests/integration/batches/ -vv --stack-config=http://localhost:8321
================================================== test session starts ==================================================
platform darwin -- Python 3.12.8, pytest-8.4.2, pluggy-1.6.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.12.8', 'Platform': 'macOS-26.0.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.2', 'pluggy': '1.6.0'}, 'Plugins': {'anyio': '4.9.0', 'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval': '0.11.0'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1, nbval-0.11.0
asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 24 items                                                                                                      

tests/integration/batches/test_batches.py::TestBatchesIntegration::test_batch_creation_and_retrieval[None] SKIPPED [  4%]
tests/integration/batches/test_batches.py::TestBatchesIntegration::test_batch_listing[None] SKIPPED               [  8%]
tests/integration/batches/test_batches.py::TestBatchesIntegration::test_batch_immediate_cancellation[None] SKIPPED [ 12%]
tests/integration/batches/test_batches.py::TestBatchesIntegration::test_batch_e2e_chat_completions[None] SKIPPED  [ 16%]
tests/integration/batches/test_batches.py::TestBatchesIntegration::test_batch_e2e_completions[None] SKIPPED       [ 20%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_invalid_endpoint[None] SKIPPED [ 25%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_cancel_completed[None] SKIPPED [ 29%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_missing_required_fields[None] SKIPPED [ 33%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_invalid_completion_window[None] SKIPPED [ 37%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_streaming_not_supported[None] SKIPPED [ 41%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_mixed_streaming_requests[None] SKIPPED [ 45%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_endpoint_mismatch[None] SKIPPED [ 50%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_missing_required_body_fields[None] SKIPPED [ 54%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_invalid_metadata_types[None] SKIPPED [ 58%]
tests/integration/batches/test_batches.py::TestBatchesIntegration::test_batch_e2e_embeddings[None] SKIPPED        [ 62%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_nonexistent_file_id PASSED [ 66%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_malformed_jsonl PASSED     [ 70%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_file_malformed_batch_file[empty] XFAIL [ 75%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_file_malformed_batch_file[malformed] XFAIL [ 79%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_retrieve_nonexistent PASSED [ 83%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_cancel_nonexistent PASSED  [ 87%]
tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_error_handling_invalid_model PASSED [ 91%]
tests/integration/batches/test_batches_idempotency.py::TestBatchesIdempotencyIntegration::test_idempotent_batch_creation_successful PASSED [ 95%]
tests/integration/batches/test_batches_idempotency.py::TestBatchesIdempotencyIntegration::test_idempotency_conflict_with_different_params PASSED [100%]

================================================= slowest 10 durations ==================================================
1.01s call     tests/integration/batches/test_batches_idempotency.py::TestBatchesIdempotencyIntegration::test_idempotent_batch_creation_successful
0.21s call     tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_nonexistent_file_id
0.17s call     tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_malformed_jsonl
0.12s call     tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_error_handling_invalid_model
0.05s setup    tests/integration/batches/test_batches.py::TestBatchesIntegration::test_batch_creation_and_retrieval[None]
0.02s call     tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_file_malformed_batch_file[empty]
0.01s call     tests/integration/batches/test_batches_idempotency.py::TestBatchesIdempotencyIntegration::test_idempotency_conflict_with_different_params
0.01s call     tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_file_malformed_batch_file[malformed]
0.01s call     tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_retrieve_nonexistent
0.00s call     tests/integration/batches/test_batches_errors.py::TestBatchesErrorHandling::test_batch_cancel_nonexistent
======================================= 7 passed, 15 skipped, 2 xfailed in 1.78s ========================================
```

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-12-03 12:25:54 +01:00
..
docs fix(docs): Updated the LS documentation to point users to the correct docker container (#4267) 2025-12-01 21:03:34 -08:00
notebooks chore: Stack server no longer depends on llama-stack-client (#4094) 2025-11-07 09:54:09 -08:00
scripts feat: Add static file import system for docs (#3882) 2025-10-24 14:01:33 -04:00
src feat!: Architect Llama Stack Telemetry Around Automatic Open Telemetry Instrumentation (#4127) 2025-12-01 10:33:18 -08:00
static feat: Implement FastAPI router system (#4191) 2025-12-03 12:25:54 +01:00
supplementary docs: adding supplementary markdown content to API specs (#3632) 2025-10-01 10:15:30 -07:00
zero_to_hero_guide chore: update doc (#3857) 2025-10-20 10:33:21 -07:00
docusaurus.config.ts feat: Add static file import system for docs (#3882) 2025-10-24 14:01:33 -04:00
dog.jpg Support for Llama3.2 models and Swift SDK (#98) 2024-09-25 10:29:58 -07:00
getting_started.ipynb chore: update getting_started (#3875) 2025-10-21 11:09:45 -07:00
getting_started_llama4.ipynb chore: update doc (#3857) 2025-10-20 10:33:21 -07:00
getting_started_llama_api.ipynb chore: update doc (#3857) 2025-10-20 10:33:21 -07:00
license_header.txt Initial commit 2024-07-23 08:32:33 -07:00
original_rfc.md chore(rename): move llama_stack.distribution to llama_stack.core (#2975) 2025-07-30 23:30:53 -07:00
package-lock.json fix(docs): fix glob vulnerability (#4193) 2025-11-19 11:01:52 -08:00
package.json fix(docs): fix glob vulnerability (#4193) 2025-11-19 11:01:52 -08:00
quick_start.ipynb chore: update quick_start (#3878) 2025-10-21 11:33:23 -07:00
README.md feat: Add static file import system for docs (#3882) 2025-10-24 14:01:33 -04:00
sidebars.ts chore(ui): add npm package and dockerfile (#4100) 2025-11-11 10:40:31 -08:00
tsconfig.json docs: docusaurus setup (#3541) 2025-09-24 14:11:30 -07:00

Llama Stack Documentation

Here's a collection of comprehensive guides, examples, and resources for building AI applications with Llama Stack. For the complete documentation, visit our Github page.

Render locally

From the llama-stack docs/ directory, run the following commands to render the docs locally:

npm install
npm run gen-api-docs all
npm run build
npm run serve

You can open up the docs in your browser at http://localhost:3000

File Import System

This documentation uses remark-code-import to import files directly from the repository, eliminating copy-paste maintenance. Files are automatically embedded during build time.

Importing Code Files

To import Python code (or any code files) with syntax highlighting, use this syntax in .mdx files:

```python file=./demo_script.py title="demo_script.py"

This automatically imports the file content and displays it as a formatted code block with Python syntax highlighting.

**Note:** Paths are relative to the current `.mdx` file location, not the repository root.

### Importing Markdown Files as Content

For importing and rendering markdown files (like CONTRIBUTING.md), use the raw-loader approach:

```jsx
import Contributing from '!!raw-loader!../../../CONTRIBUTING.md';
import ReactMarkdown from 'react-markdown';

<ReactMarkdown>{Contributing}</ReactMarkdown>

Requirements:

  • Install dependencies: npm install --save-dev raw-loader react-markdown

Path Resolution:

  • For remark-code-import: Paths are relative to the current .mdx file location
  • For raw-loader: Paths are relative to the current .mdx file location
  • Use ../ to navigate up directories as needed

Content

Try out Llama Stack's capabilities through our detailed Jupyter notebooks: