Mirror of https://github.com/meta-llama/llama-stack.git, synced 2025-12-07 02:47:21 +00:00
Implements usage accumulation in StreamingResponseOrchestrator.

The most important part was passing `stream_options = { "include_usage":
true }` to the chat_completion call. This means I will have to re-record
all responses tests, because the request hash will change :)
Test changes:
- Add usage assertions to streaming and non-streaming tests
- Update test recordings with actual usage data from OpenAI
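The accumulation logic can be sketched roughly as follows. This is a minimal, hypothetical illustration (the `Chunk`/`Usage` dataclasses and `accumulate` helper are stand-ins, not the actual StreamingResponseOrchestrator API): with `stream_options={"include_usage": True}`, OpenAI-style streaming sends the `usage` object on a final chunk, with `usage` set to `None` on earlier chunks, so the orchestrator must tolerate missing usage while summing whatever arrives.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass
class Usage:
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0

@dataclass
class Chunk:
    # Simplified stand-in for a streaming chat-completion chunk.
    content: str = ""
    usage: Optional[Usage] = None  # only the final chunk carries usage

def accumulate(chunks: Iterable[Chunk]) -> tuple[str, Usage]:
    """Concatenate streamed content and sum usage across chunks."""
    parts: list[str] = []
    total = Usage()
    for chunk in chunks:
        parts.append(chunk.content)
        if chunk.usage is not None:
            total.prompt_tokens += chunk.usage.prompt_tokens
            total.completion_tokens += chunk.usage.completion_tokens
            total.total_tokens += chunk.usage.total_tokens
    return "".join(parts), total
```

Summing (rather than overwriting) keeps the helper correct even if a provider reports usage on more than one chunk.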