llama-stack-mirror/tests/unit/providers
Ashwin Bharambe 5cdb29758a
feat(responses): add output_text delta events to responses (#2265)
This adds initial streaming support to the Responses API. 

This PR makes sure that the _first_ inference call made to chat
completions streams out.
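
Conceptually, the change translates chat-completion stream chunks into Responses API `response.output_text.delta` events. Below is a minimal sketch of that idea, assuming OpenAI-style chunk objects; the function name and the plain-dict event shape are illustrative, not the actual implementation.

```python
from typing import Any, AsyncIterator


async def emit_output_text_deltas(
    chat_stream: AsyncIterator[Any],
) -> AsyncIterator[dict]:
    """Re-emit chat-completion text deltas as Responses API delta events."""
    async for chunk in chat_stream:
        for choice in chunk.choices:
            content = getattr(choice.delta, "content", None)
            if content:
                # One Responses API event per chat-completion text delta.
                yield {"type": "response.output_text.delta", "delta": content}
```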

There's more to be done:
- tool call output tokens need to stream out when possible
- we need to loop through multiple rounds of inference, and they all need to stream out

## Test Plan

Added a test. Executed as:

```
FIREWORKS_API_KEY=... \
  pytest -s -v 'tests/verifications/openai_api/test_responses.py' \
  --provider=stack:fireworks --model meta-llama/Llama-4-Scout-17B-16E-Instruct
```
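
The test checks, roughly, that streaming a response yields `response.output_text.delta` events whose deltas concatenate into non-empty output text. Here is a hedged sketch of that shape; the fixture names `openai_client` and `model` are assumptions, not the actual fixtures in the test file.

```python
def test_streaming_emits_output_text_deltas(openai_client, model):
    # Stream a response and collect the text delta events as they arrive.
    stream = openai_client.responses.create(
        model=model,
        input="Say hello.",
        stream=True,
    )
    deltas = [
        event.delta
        for event in stream
        if event.type == "response.output_text.delta"
    ]
    # At least one delta event should arrive, with non-empty text overall.
    assert deltas
    assert "".join(deltas).strip()
```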

Then, started a llama stack fireworks distro and tested against it like
this:

```
OPENAI_API_KEY=blah \
  pytest -s -v 'tests/verifications/openai_api/test_responses.py' \
  --base-url http://localhost:8321/v1/openai/v1 \
  --model meta-llama/Llama-4-Scout-17B-16E-Instruct
```
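
For a quick manual check against the running distro, something like the following prints text as it streams; the base URL, dummy API key, and model are taken from the command above.

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8321/v1/openai/v1",
    api_key="blah",  # dummy key, mirroring the test command above
)

stream = client.responses.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    input="Write one sentence about streaming.",
    stream=True,
)

for event in stream:
    if event.type == "response.output_text.delta":
        # Print tokens as they arrive instead of waiting for the full response.
        print(event.delta, end="", flush=True)
print()
```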
2025-05-27 13:07:14 -07:00
| Name | Latest commit | Date |
|------|---------------|------|
| agent | feat: add list responses API (#2233) | 2025-05-23 13:16:48 -07:00 |
| agents | feat(responses): add output_text delta events to responses (#2265) | 2025-05-27 13:07:14 -07:00 |
| inference | fix: multiple tool calls in remote-vllm chat_completion (#2161) | 2025-05-15 11:23:29 -07:00 |
| nvidia | fix: Pass model parameter as config name to NeMo Customizer (#2218) | 2025-05-20 09:51:39 -07:00 |
| utils | fix: add check for interleavedContent (#1973) | 2025-05-06 09:55:07 -07:00 |
| vector_io | feat(sqlite-vec): enable keyword search for sqlite-vec (#1439) | 2025-05-21 15:24:24 -04:00 |
| test_configs.py | feat(api): don't return a payload on file delete (#1640) | 2025-03-25 17:12:36 -07:00 |