llama-stack

forked from phoenix-oss/llama-stack-mirror

Ashwin Bharambe 5cdb29758a feat(responses): add output_text delta events to responses (#2265)	2025-05-27 13:07:14 -07:00

This adds initial streaming support to the Responses API. This PR makes sure that the _first_ inference call made to chat completions streams out.

There's more to be done:
- tool call output tokens need to stream out when possible
- we need to loop through multiple rounds of inference and they all need to stream out.

## Test Plan

Added a test. Executed as:

```
FIREWORKS_API_KEY=... \
  pytest -s -v 'tests/verifications/openai_api/test_responses.py' \
  --provider=stack:fireworks --model meta-llama/Llama-4-Scout-17B-16E-Instruct
```

Then, started a llama stack fireworks distro and tested against it like this:

```
OPENAI_API_KEY=blah \
  pytest -s -v 'tests/verifications/openai_api/test_responses.py' \
  --base-url http://localhost:8321/v1/openai/v1 \
  --model meta-llama/Llama-4-Scout-17B-16E-Instruct
```

css	fix: improve Mermaid diagram visibility in dark mode (#2092)	2025-05-02 13:09:45 -07:00
js	chore: Fix to persist the theme preference across page navigation. (#1974)	2025-04-16 13:58:25 -07:00
providers/vector_io	docs: Document sqlite-vec faiss comparison (#1821)	2025-03-28 17:41:33 +01:00
llama-stack-logo.png	first version of readthedocs (#278)	2024-10-22 10:15:58 +05:30
llama-stack-spec.html	feat(responses): add output_text delta events to responses (#2265)	2025-05-27 13:07:14 -07:00
llama-stack-spec.yaml	feat(responses): add output_text delta events to responses (#2265)	2025-05-27 13:07:14 -07:00
llama-stack.png	Make a new llama stack image	2024-11-22 23:49:22 -08:00
remote_or_local.gif	[docs] update documentations (#356)	2024-11-04 16:52:38 -08:00
safety_system.webp	[Docs] Zero-to-Hero notebooks and quick start documentation (#368)	2024-11-08 17:16:44 -08:00