Mirror of https://github.com/meta-llama/llama-stack.git, synced 2025-12-07 02:47:21 +00:00
Implements usage accumulation in StreamingResponseOrchestrator.

The most important part was passing `stream_options = { "include_usage":
true }` to the chat_completion call. This means I will have to re-record
all responses tests, because the request hash will change :)
Test changes:
- Add usage assertions to streaming and non-streaming tests
- Update test recordings with actual usage data from OpenAI
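The accumulation logic can be sketched roughly as follows. This is a minimal, hypothetical illustration (the `Chunk`/`Usage` dataclasses and `accumulate` helper are stand-ins, not the actual StreamingResponseOrchestrator API): with `stream_options={"include_usage": True}`, OpenAI-style streaming sends the `usage` object on a final chunk, with `usage` set to `None` on earlier chunks, so the orchestrator must tolerate missing usage while summing whatever arrives.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass
class Usage:
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0

@dataclass
class Chunk:
    # Simplified stand-in for a streaming chat-completion chunk.
    content: str = ""
    usage: Optional[Usage] = None  # only the final chunk carries usage

def accumulate(chunks: Iterable[Chunk]) -> tuple[str, Usage]:
    """Concatenate streamed content and sum usage across chunks."""
    parts: list[str] = []
    total = Usage()
    for chunk in chunks:
        parts.append(chunk.content)
        if chunk.usage is not None:
            total.prompt_tokens += chunk.usage.prompt_tokens
            total.completion_tokens += chunk.usage.completion_tokens
            total.total_tokens += chunk.usage.total_tokens
    return "".join(parts), total
```

Summing (rather than overwriting) keeps the helper correct even if a provider reports usage on more than one chunk.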