llama-stack-mirror/docs
Ashwin Bharambe 5cdb29758a
feat(responses): add output_text delta events to responses (#2265)
This adds initial streaming support to the Responses API. 

This PR makes sure that the _first_ inference call made to chat
completions streams out.

There's more to be done:
- tool call output tokens need to stream out when possible
- we need to loop through multiple rounds of inference, and they all need
  to stream out
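
As a rough illustration of what a client does with the new delta events, here is a minimal sketch that reassembles streamed text from a sequence of events. It runs against a simulated stream, not a live server. The `StreamEvent` class and both helper functions are hypothetical names introduced for this example, and the event type string `response.output_text.delta` is an assumption based on the OpenAI Responses streaming event naming; check it against what the server actually emits.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class StreamEvent:
    """Hypothetical stand-in for one streamed Responses API event."""
    type: str
    delta: str = ""


def iter_text_deltas(events: Iterable[StreamEvent]) -> Iterator[str]:
    """Yield only the text fragments carried by output_text delta events."""
    for event in events:
        # Assumed event type name; every other event type is skipped.
        if event.type == "response.output_text.delta":
            yield event.delta


def collect_output_text(events: Iterable[StreamEvent]) -> str:
    """Join all streamed text fragments into the full output text."""
    return "".join(iter_text_deltas(events))


# Simulated stream: lifecycle events around two text deltas.
simulated_stream = [
    StreamEvent("response.created"),
    StreamEvent("response.output_text.delta", "Hello"),
    StreamEvent("response.output_text.delta", ", world"),
    StreamEvent("response.completed"),
]

print(collect_output_text(simulated_stream))
```

A real client would iterate over the events returned by a streaming request instead of `simulated_stream`, printing each fragment as it arrives.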

## Test Plan

Added a test. Executed as:

```
FIREWORKS_API_KEY=... \
  pytest -s -v 'tests/verifications/openai_api/test_responses.py' \
  --provider=stack:fireworks --model meta-llama/Llama-4-Scout-17B-16E-Instruct
```

Then started a Llama Stack Fireworks distro and tested against it like
this:

```
OPENAI_API_KEY=blah \
  pytest -s -v 'tests/verifications/openai_api/test_responses.py' \
  --base-url http://localhost:8321/v1/openai/v1 \
  --model meta-llama/Llama-4-Scout-17B-16E-Instruct
```
2025-05-27 13:07:14 -07:00
| Name | Last commit | Date |
|---|---|---|
| `_static` | feat(responses): add output_text delta events to responses (#2265) | 2025-05-27 13:07:14 -07:00 |
| `notebooks` | docs: fix evals notebook preview (#2277) | 2025-05-27 15:18:20 +02:00 |
| `openapi_generator` | feat: introduce APIs for retrieving chat completion requests (#2145) | 2025-05-18 21:43:19 -07:00 |
| `resources` | Several documentation fixes and fix link to API reference | 2025-02-04 14:00:43 -08:00 |
| `source` | fix: use pypi browser agent (#2260) | 2025-05-24 23:26:30 -07:00 |
| `zero_to_hero_guide` | feat: add additional logging to llama stack build (#1689) | 2025-04-30 11:06:24 -07:00 |
| `conftest.py` | fix: sleep after notebook test | 2025-03-23 14:03:35 -07:00 |
| `contbuild.sh` | Fix broken links with docs | 2024-11-22 20:42:17 -08:00 |
| `dog.jpg` | Support for Llama3.2 models and Swift SDK (#98) | 2024-09-25 10:29:58 -07:00 |
| `getting_started.ipynb` | chore: remove last instances of code-interpreter provider (#2143) | 2025-05-12 10:54:43 -07:00 |
| `getting_started_llama4.ipynb` | docs: llama4 getting started nb (#1878) | 2025-04-06 18:51:34 -07:00 |
| `getting_started_llama_api.ipynb` | feat: add api.llama provider, llama-guard-4 model (#2058) | 2025-04-29 10:07:41 -07:00 |
| `license_header.txt` | Initial commit | 2024-07-23 08:32:33 -07:00 |
| `make.bat` | feat(pre-commit): enhance pre-commit hooks with additional checks (#2014) | 2025-04-30 11:35:49 -07:00 |
| `Makefile` | first version of readthedocs (#278) | 2024-10-22 10:15:58 +05:30 |
| `readme.md` | docs: misc cleanup (#2223) | 2025-05-21 17:35:27 +02:00 |

# Llama Stack Documentation

Here's a collection of comprehensive guides, examples, and resources for building AI applications with Llama Stack. For the complete documentation, visit our ReadTheDocs page.

## Render locally

From the llama-stack root directory, run the following command to render the docs locally:

uv run --with ".[docs]" sphinx-autobuild docs/source docs/build/html --write-all

Then open the docs in your browser at http://localhost:8000.

## Content

Try out Llama Stack's capabilities through our detailed Jupyter notebooks: