Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-12-03 09:53:45 +00:00)
feat(responses): implement usage tracking in streaming responses (#3771)
Implements usage accumulation in StreamingResponseOrchestrator.
The key change is passing `stream_options = { "include_usage": true }` to the chat_completion call. This means all responses tests will have to be re-recorded, because the request hash will change :)
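To see why the recordings must be redone: test recordings are typically keyed by a hash of the request payload, so adding `stream_options` to the request produces a different key. A minimal sketch of that idea (the `request_hash` helper and hash scheme are illustrative, not llama-stack's actual implementation):

```python
# Illustrative sketch: recordings keyed by a hash of the request payload.
# Adding stream_options changes the payload, hence the hash, hence the key.
import hashlib
import json


def request_hash(payload: dict) -> str:
    # Canonicalize the payload (sorted keys) so the hash is deterministic.
    canonical = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


old = {"model": "example-model", "stream": True, "temperature": 0.1}
new = {**old, "stream_options": {"include_usage": True}}

print(request_hash(old) != request_hash(new))  # True: the keys differ
```

Any recording looked up under the old hash will miss once the new parameter is added, so every affected test has to be captured again.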
Test changes:
- Add usage assertions to streaming and non-streaming tests
- Update test recordings with actual usage data from OpenAI
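The accumulation described above can be sketched as follows. The chunk shapes mirror the OpenAI streaming API, where `stream_options={"include_usage": True}` makes the final chunk carry a `usage` object alongside an empty `choices` list; the `Usage`, `Chunk`, and `accumulate_usage` names here are illustrative stand-ins, not the actual llama-stack types:

```python
# Hedged sketch of usage accumulation over a streamed chat completion.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Usage:
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


@dataclass
class Chunk:
    choices: list          # text deltas; empty on the usage-bearing chunk
    usage: Optional[Usage] = None


def accumulate_usage(chunks) -> Usage:
    """Sum the usage reported across all chunks of one stream."""
    total = Usage(0, 0, 0)
    for chunk in chunks:
        if chunk.usage is not None:  # only the final chunk carries usage
            total.prompt_tokens += chunk.usage.prompt_tokens
            total.completion_tokens += chunk.usage.completion_tokens
            total.total_tokens += chunk.usage.total_tokens
    return total


stream = [
    Chunk(choices=["Hel"]),
    Chunk(choices=["lo"]),
    Chunk(choices=[], usage=Usage(12, 5, 17)),  # final usage chunk
]
print(accumulate_usage(stream))
```

Accumulating (rather than just taking the last chunk's usage) also covers orchestrations that make multiple chat_completion calls per response, such as tool-calling loops.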
This commit is contained in:
parent e7d21e1ee3
commit 1394403360
21 changed files with 15099 additions and 612 deletions
@@ -167,6 +167,9 @@ async def test_create_openai_response_with_string_input(openai_responses_impl, m
         tools=None,
         stream=True,
         temperature=0.1,
+        stream_options={
+            "include_usage": True,
+        },
     )

     # Should have content part events for text streaming