Mirror of https://github.com/meta-llama/llama-stack.git, synced 2025-10-11 13:44:38 +00:00
feat(responses): add usage types to inference and responses APIs (#3764)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 6s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 23s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 27s
API Conformance Tests / check-schema-compatibility (push) Successful in 36s
UI Tests / ui-tests (22) (push) Successful in 55s
Pre-commit / pre-commit (push) Successful in 2m7s
## Summary

Adds OpenAI-compatible usage tracking types to enable reporting token consumption for both streaming and non-streaming responses.

## Type Definitions

**Chat Completion Usage** (inference API):

```python
class OpenAIChatCompletionUsage(BaseModel):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    prompt_tokens_details: OpenAIChatCompletionUsagePromptTokensDetails | None
    completion_tokens_details: OpenAIChatCompletionUsageCompletionTokensDetails | None
```

**Response Usage** (responses API):

```python
class OpenAIResponseUsage(BaseModel):
    input_tokens: int
    output_tokens: int
    total_tokens: int
    input_tokens_details: OpenAIResponseUsageInputTokensDetails | None
    output_tokens_details: OpenAIResponseUsageOutputTokensDetails | None
```

This matches OpenAI's usage reporting format and enables PR #3766 to implement usage tracking in streaming responses.

Co-authored-by: Claude <noreply@anthropic.com>
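The two usage types above carry the same counts under different field names. As a rough illustration of how they line up, here is a minimal sketch; it is not the code from this commit or from PR #3766. The dataclasses are dependency-free stand-ins for the pydantic models (the `*_details` fields are omitted), and `to_response_usage` is a hypothetical helper name.

```python
from dataclasses import dataclass


@dataclass
class ChatCompletionUsage:
    """Stand-in for OpenAIChatCompletionUsage (details fields omitted)."""

    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


@dataclass
class ResponseUsage:
    """Stand-in for OpenAIResponseUsage (details fields omitted)."""

    input_tokens: int
    output_tokens: int
    total_tokens: int


def to_response_usage(usage: ChatCompletionUsage) -> ResponseUsage:
    """Hypothetical helper: rename prompt/completion counts to input/output."""
    return ResponseUsage(
        input_tokens=usage.prompt_tokens,
        output_tokens=usage.completion_tokens,
        total_tokens=usage.total_tokens,
    )


chat_usage = ChatCompletionUsage(prompt_tokens=12, completion_tokens=30, total_tokens=42)
response_usage = to_response_usage(chat_usage)
print(response_usage)
```

The mapping assumes `prompt_tokens` corresponds to `input_tokens` and `completion_tokens` to `output_tokens`, which follows the naming in OpenAI's two APIs; how the real implementation populates these fields is not shown in this commit.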
parent ebae0385bb
commit aaf5036235
8 changed files with 747 additions and 0 deletions
@@ -346,6 +346,42 @@ class OpenAIResponseText(BaseModel):
     format: OpenAIResponseTextFormat | None = None
 
 
+class OpenAIResponseUsageOutputTokensDetails(BaseModel):
+    """Token details for output tokens in OpenAI response usage.
+
+    :param reasoning_tokens: Number of tokens used for reasoning (o1/o3 models)
+    """
+
+    reasoning_tokens: int | None = None
+
+
+class OpenAIResponseUsageInputTokensDetails(BaseModel):
+    """Token details for input tokens in OpenAI response usage.
+
+    :param cached_tokens: Number of tokens retrieved from cache
+    """
+
+    cached_tokens: int | None = None
+
+
+@json_schema_type
+class OpenAIResponseUsage(BaseModel):
+    """Usage information for OpenAI response.
+
+    :param input_tokens: Number of tokens in the input
+    :param output_tokens: Number of tokens in the output
+    :param total_tokens: Total tokens used (input + output)
+    :param input_tokens_details: Detailed breakdown of input token usage
+    :param output_tokens_details: Detailed breakdown of output token usage
+    """
+
+    input_tokens: int
+    output_tokens: int
+    total_tokens: int
+    input_tokens_details: OpenAIResponseUsageInputTokensDetails | None = None
+    output_tokens_details: OpenAIResponseUsageOutputTokensDetails | None = None
+
+
 @json_schema_type
 class OpenAIResponseObject(BaseModel):
     """Complete OpenAI response object containing generation results and metadata.
@@ -363,6 +399,7 @@ class OpenAIResponseObject(BaseModel):
     :param text: Text formatting configuration for the response
     :param top_p: (Optional) Nucleus sampling parameter used for generation
     :param truncation: (Optional) Truncation strategy applied to the response
+    :param usage: (Optional) Token usage information for the response
     """
 
     created_at: int
@@ -380,6 +417,7 @@ class OpenAIResponseObject(BaseModel):
     text: OpenAIResponseText = OpenAIResponseText(format=OpenAIResponseTextFormat(type="text"))
     top_p: float | None = None
     truncation: str | None = None
+    usage: OpenAIResponseUsage | None = None
 
 
 @json_schema_type