llama-stack-mirror/llama_stack/apis/inference
Ashwin Bharambe · c92a1c99f0 · feat(responses): add usage types to inference and responses APIs
Add OpenAI-compatible usage tracking types:
- OpenAIChatCompletionUsage with prompt/completion token counts
- OpenAIResponseUsage with input/output token counts
- Token detail types for cached_tokens and reasoning_tokens
- Add usage field to chat completion and response objects

This enables reporting token consumption for both streaming and
non-streaming responses, matching OpenAI's usage reporting format.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-09 21:12:29 -07:00
__init__.py · chore: remove nested imports (#2515) · 2025-06-26 08:01:05 +05:30
event_logger.py · pre-commit lint · 2024-09-28 16:04:41 -07:00
inference.py · feat(responses): add usage types to inference and responses APIs · 2025-10-09 21:12:29 -07:00
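The two usage shapes the commit describes can be sketched roughly as below. This is a minimal illustration, not the actual `inference.py` code: llama-stack API types are typically pydantic models rather than dataclasses, and the nested detail-type names here are assumptions modeled on OpenAI's usage format (prompt/completion counts for chat completions, input/output counts for responses, with `cached_tokens` and `reasoning_tokens` details).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromptTokensDetails:
    # Hypothetical name: tokens served from a prompt cache, when the
    # provider reports it.
    cached_tokens: Optional[int] = None


@dataclass
class CompletionTokensDetails:
    # Hypothetical name: tokens spent on hidden reasoning, for
    # reasoning-capable models.
    reasoning_tokens: Optional[int] = None


@dataclass
class OpenAIChatCompletionUsage:
    # Chat Completions style: prompt/completion token counts.
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    prompt_tokens_details: Optional[PromptTokensDetails] = None
    completion_tokens_details: Optional[CompletionTokensDetails] = None


@dataclass
class OpenAIResponseUsage:
    # Responses API style: input/output token counts instead of
    # prompt/completion.
    input_tokens: int
    output_tokens: int
    total_tokens: int
```

In OpenAI's format, `total_tokens` is the sum of the two counts, and a `usage` field carrying one of these objects is attached to both streaming and non-streaming completion/response payloads.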