# llama-stack-mirror/llama_stack
Ashwin Bharambe · 5cdb29758a · 2025-05-27 13:07:14 -07:00

feat(responses): add output_text delta events to responses (#2265)

This adds initial streaming support to the Responses API. 

This PR ensures that the _first_ inference call made to chat
completions streams its output.
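
As a quick illustration of what this enables on the client side, here is a minimal sketch of consuming the new delta events through the OpenAI-compatible endpoint. The base URL, dummy API key, and model follow the test plan below; the prompt is a placeholder, and the event name follows the OpenAI Responses API that this feature mirrors:

```python
# Minimal sketch: consume output_text delta events from the streaming
# Responses API. Assumes a running stack server exposing the
# OpenAI-compatible endpoint, as in the test plan below.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8321/v1/openai/v1",
    api_key="blah",  # placeholder; the local endpoint does not check it
)

stream = client.responses.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    input="Write a haiku about streaming.",  # placeholder prompt
    stream=True,
)

for event in stream:
    # Each delta event carries the next fragment of generated text.
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
print()
```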

There's more to be done:
- tool call output tokens need to stream out when possible
- we need to loop through multiple rounds of inference, and each round needs
to stream out as well

## Test Plan

Added a test. Executed as:

```
FIREWORKS_API_KEY=... \
  pytest -s -v 'tests/verifications/openai_api/test_responses.py' \
  --provider=stack:fireworks --model meta-llama/Llama-4-Scout-17B-16E-Instruct
```

Then started a Llama Stack fireworks distro and tested against it like
this:

```
OPENAI_API_KEY=blah \
  pytest -s -v 'tests/verifications/openai_api/test_responses.py' \
  --base-url http://localhost:8321/v1/openai/v1 \
  --model meta-llama/Llama-4-Scout-17B-16E-Instruct
```
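
For reference, a streaming test of this shape might assert that the concatenated delta events reconstruct the final output text. The following is a hypothetical sketch pointed at the same stack endpoint as the commands above, not the actual test added by this PR:

```python
# Hypothetical sketch of a streaming assertion; not the actual test
# from this PR. Uses the OpenAI-compatible stack endpoint shown above.
from openai import OpenAI

def test_output_text_delta_events_reconstruct_final_text():
    client = OpenAI(
        base_url="http://localhost:8321/v1/openai/v1",
        api_key="blah",  # placeholder
    )
    stream = client.responses.create(
        model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
        input="Say hello in one short sentence.",  # placeholder prompt
        stream=True,
    )
    deltas = []
    final_text = None
    for event in stream:
        if event.type == "response.output_text.delta":
            deltas.append(event.delta)
        elif event.type == "response.completed":
            final_text = event.response.output_text
    assert deltas, "expected at least one output_text delta event"
    # The concatenated deltas should reconstruct the final output text.
    assert final_text == "".join(deltas)
```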

| Path | Last commit | Date |
| --- | --- | --- |
| `apis` | feat(responses): add output_text delta events to responses (#2265) | 2025-05-27 13:07:14 -07:00 |
| `cli` | fix: handle None external_providers_dir in build with run arg (#2269) | 2025-05-27 09:41:12 +02:00 |
| `distribution` | fix: index non-MCP toolgroups at registration time (#2272) | 2025-05-26 20:33:36 -07:00 |
| `models` | chore: make cprint write to stderr (#2250) | 2025-05-24 23:39:57 -07:00 |
| `providers` | feat(responses): add output_text delta events to responses (#2265) | 2025-05-27 13:07:14 -07:00 |
| `strong_typing` | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| `templates` | chore: remove dependencies.json (#2281) | 2025-05-27 10:26:57 -07:00 |
| `ui` | feat: start ui server in llama stack run (#2170) | 2025-05-23 20:00:09 -07:00 |
| `__init__.py` | export LibraryClient | 2024-12-13 12:08:00 -08:00 |
| `env.py` | refactor(test): move tools, evals, datasetio, scoring and post training tests (#1401) | 2025-03-04 14:53:47 -08:00 |
| `log.py` | chore: make cprint write to stderr (#2250) | 2025-05-24 23:39:57 -07:00 |
| `schema_utils.py` | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |