Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-10-11 13:44:38 +00:00)
feat(responses): add usage types to inference and responses APIs (#3764)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 6s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 23s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 27s
API Conformance Tests / check-schema-compatibility (push) Successful in 36s
UI Tests / ui-tests (22) (push) Successful in 55s
Pre-commit / pre-commit (push) Successful in 2m7s
## Summary

Adds OpenAI-compatible usage tracking types to enable reporting token consumption for both streaming and non-streaming responses.

## Type Definitions

**Chat Completion Usage** (inference API):

```python
class OpenAIChatCompletionUsage(BaseModel):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    prompt_tokens_details: OpenAIChatCompletionUsagePromptTokensDetails | None
    completion_tokens_details: OpenAIChatCompletionUsageCompletionTokensDetails | None
```

**Response Usage** (responses API):

```python
class OpenAIResponseUsage(BaseModel):
    input_tokens: int
    output_tokens: int
    total_tokens: int
    input_tokens_details: OpenAIResponseUsageInputTokensDetails | None
    output_tokens_details: OpenAIResponseUsageOutputTokensDetails | None
```

This matches OpenAI's usage reporting format and enables PR #3766 to implement usage tracking in streaming responses.

Co-authored-by: Claude <noreply@anthropic.com>
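For illustration (not part of the commit message): a minimal sketch constructing both models directly, with invented token counts; in practice servers populate these fields on responses.

```python
# Hypothetical values; field names match the type definitions above.
# The *_details fields default to None, so they can be omitted.
chat_usage = OpenAIChatCompletionUsage(
    prompt_tokens=12,
    completion_tokens=34,
    total_tokens=46,
)
response_usage = OpenAIResponseUsage(
    input_tokens=12,
    output_tokens=34,
    total_tokens=46,
)
# Same accounting, two naming conventions: prompt/completion vs. input/output.
assert chat_usage.total_tokens == chat_usage.prompt_tokens + chat_usage.completion_tokens
assert response_usage.total_tokens == response_usage.input_tokens + response_usage.output_tokens
```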
Parent: ebae0385bb
Commit: aaf5036235

8 changed files with 747 additions and 0 deletions
docs/static/deprecated-llama-stack-spec.html (vendored): 120 additions

@@ -6781,6 +6781,10 @@
           "type": "string",
           "description": "The model that was used to generate the chat completion"
         },
+        "usage": {
+          "$ref": "#/components/schemas/OpenAIChatCompletionUsage",
+          "description": "Token usage information for the completion"
+        },
         "input_messages": {
           "type": "array",
           "items": {
@@ -6983,6 +6987,55 @@
       "title": "OpenAIChatCompletionToolCallFunction",
       "description": "Function call details for OpenAI-compatible tool calls."
     },
+    "OpenAIChatCompletionUsage": {
+      "type": "object",
+      "properties": {
+        "prompt_tokens": {
+          "type": "integer",
+          "description": "Number of tokens in the prompt"
+        },
+        "completion_tokens": {
+          "type": "integer",
+          "description": "Number of tokens in the completion"
+        },
+        "total_tokens": {
+          "type": "integer",
+          "description": "Total tokens used (prompt + completion)"
+        },
+        "prompt_tokens_details": {
+          "type": "object",
+          "properties": {
+            "cached_tokens": {
+              "type": "integer",
+              "description": "Number of tokens retrieved from cache"
+            }
+          },
+          "additionalProperties": false,
+          "title": "OpenAIChatCompletionUsagePromptTokensDetails",
+          "description": "Token details for prompt tokens in OpenAI chat completion usage."
+        },
+        "completion_tokens_details": {
+          "type": "object",
+          "properties": {
+            "reasoning_tokens": {
+              "type": "integer",
+              "description": "Number of tokens used for reasoning (o1/o3 models)"
+            }
+          },
+          "additionalProperties": false,
+          "title": "OpenAIChatCompletionUsageCompletionTokensDetails",
+          "description": "Token details for output tokens in OpenAI chat completion usage."
+        }
+      },
+      "additionalProperties": false,
+      "required": [
+        "prompt_tokens",
+        "completion_tokens",
+        "total_tokens"
+      ],
+      "title": "OpenAIChatCompletionUsage",
+      "description": "Usage information for OpenAI chat completion."
+    },
     "OpenAIChoice": {
       "type": "object",
       "properties": {
@@ -7745,6 +7798,10 @@
         "model": {
           "type": "string",
           "description": "The model that was used to generate the chat completion"
+        },
+        "usage": {
+          "$ref": "#/components/schemas/OpenAIChatCompletionUsage",
+          "description": "Token usage information for the completion"
         }
       },
       "additionalProperties": false,
@@ -7785,6 +7842,10 @@
         "model": {
           "type": "string",
           "description": "The model that was used to generate the chat completion"
+        },
+        "usage": {
+          "$ref": "#/components/schemas/OpenAIChatCompletionUsage",
+          "description": "Token usage information (typically included in final chunk with stream_options)"
         }
       },
       "additionalProperties": false,
@@ -7882,6 +7943,10 @@
           "type": "string",
           "description": "The model that was used to generate the chat completion"
         },
+        "usage": {
+          "$ref": "#/components/schemas/OpenAIChatCompletionUsage",
+          "description": "Token usage information for the completion"
+        },
         "input_messages": {
           "type": "array",
           "items": {
@@ -9096,6 +9161,10 @@
           "type": "string",
           "description": "(Optional) Truncation strategy applied to the response"
         },
+        "usage": {
+          "$ref": "#/components/schemas/OpenAIResponseUsage",
+          "description": "(Optional) Token usage information for the response"
+        },
         "input": {
           "type": "array",
           "items": {
@@ -9541,6 +9610,53 @@
       "title": "OpenAIResponseText",
       "description": "Text response configuration for OpenAI responses."
     },
+    "OpenAIResponseUsage": {
+      "type": "object",
+      "properties": {
+        "input_tokens": {
+          "type": "integer",
+          "description": "Number of tokens in the input"
+        },
+        "output_tokens": {
+          "type": "integer",
+          "description": "Number of tokens in the output"
+        },
+        "total_tokens": {
+          "type": "integer",
+          "description": "Total tokens used (input + output)"
+        },
+        "input_tokens_details": {
+          "type": "object",
+          "properties": {
+            "cached_tokens": {
+              "type": "integer",
+              "description": "Number of tokens retrieved from cache"
+            }
+          },
+          "additionalProperties": false,
+          "description": "Detailed breakdown of input token usage"
+        },
+        "output_tokens_details": {
+          "type": "object",
+          "properties": {
+            "reasoning_tokens": {
+              "type": "integer",
+              "description": "Number of tokens used for reasoning (o1/o3 models)"
+            }
+          },
+          "additionalProperties": false,
+          "description": "Detailed breakdown of output token usage"
+        }
+      },
+      "additionalProperties": false,
+      "required": [
+        "input_tokens",
+        "output_tokens",
+        "total_tokens"
+      ],
+      "title": "OpenAIResponseUsage",
+      "description": "Usage information for OpenAI response."
+    },
     "ResponseShieldSpec": {
       "type": "object",
       "properties": {
@@ -9983,6 +10099,10 @@
         "truncation": {
           "type": "string",
           "description": "(Optional) Truncation strategy applied to the response"
+        },
+        "usage": {
+          "$ref": "#/components/schemas/OpenAIResponseUsage",
+          "description": "(Optional) Token usage information for the response"
         }
       },
       "additionalProperties": false,
docs/static/deprecated-llama-stack-spec.yaml (vendored): 103 additions

@@ -4999,6 +4999,10 @@ components:
           type: string
           description: >-
             The model that was used to generate the chat completion
+        usage:
+          $ref: '#/components/schemas/OpenAIChatCompletionUsage'
+          description: >-
+            Token usage information for the completion
         input_messages:
           type: array
           items:
@@ -5165,6 +5169,49 @@ components:
       title: OpenAIChatCompletionToolCallFunction
       description: >-
         Function call details for OpenAI-compatible tool calls.
+    OpenAIChatCompletionUsage:
+      type: object
+      properties:
+        prompt_tokens:
+          type: integer
+          description: Number of tokens in the prompt
+        completion_tokens:
+          type: integer
+          description: Number of tokens in the completion
+        total_tokens:
+          type: integer
+          description: Total tokens used (prompt + completion)
+        prompt_tokens_details:
+          type: object
+          properties:
+            cached_tokens:
+              type: integer
+              description: Number of tokens retrieved from cache
+          additionalProperties: false
+          title: >-
+            OpenAIChatCompletionUsagePromptTokensDetails
+          description: >-
+            Token details for prompt tokens in OpenAI chat completion usage.
+        completion_tokens_details:
+          type: object
+          properties:
+            reasoning_tokens:
+              type: integer
+              description: >-
+                Number of tokens used for reasoning (o1/o3 models)
+          additionalProperties: false
+          title: >-
+            OpenAIChatCompletionUsageCompletionTokensDetails
+          description: >-
+            Token details for output tokens in OpenAI chat completion usage.
+      additionalProperties: false
+      required:
+        - prompt_tokens
+        - completion_tokens
+        - total_tokens
+      title: OpenAIChatCompletionUsage
+      description: >-
+        Usage information for OpenAI chat completion.
     OpenAIChoice:
       type: object
       properties:
@@ -5696,6 +5743,10 @@ components:
           type: string
           description: >-
             The model that was used to generate the chat completion
+        usage:
+          $ref: '#/components/schemas/OpenAIChatCompletionUsage'
+          description: >-
+            Token usage information for the completion
       additionalProperties: false
       required:
         - id
@@ -5731,6 +5782,10 @@ components:
           type: string
           description: >-
             The model that was used to generate the chat completion
+        usage:
+          $ref: '#/components/schemas/OpenAIChatCompletionUsage'
+          description: >-
+            Token usage information (typically included in final chunk with stream_options)
       additionalProperties: false
       required:
         - id
@@ -5811,6 +5866,10 @@ components:
           type: string
           description: >-
             The model that was used to generate the chat completion
+        usage:
+          $ref: '#/components/schemas/OpenAIChatCompletionUsage'
+          description: >-
+            Token usage information for the completion
         input_messages:
           type: array
           items:
@@ -6747,6 +6806,10 @@ components:
           type: string
           description: >-
             (Optional) Truncation strategy applied to the response
+        usage:
+          $ref: '#/components/schemas/OpenAIResponseUsage'
+          description: >-
+            (Optional) Token usage information for the response
         input:
           type: array
           items:
@@ -7095,6 +7158,42 @@ components:
       title: OpenAIResponseText
       description: >-
         Text response configuration for OpenAI responses.
+    OpenAIResponseUsage:
+      type: object
+      properties:
+        input_tokens:
+          type: integer
+          description: Number of tokens in the input
+        output_tokens:
+          type: integer
+          description: Number of tokens in the output
+        total_tokens:
+          type: integer
+          description: Total tokens used (input + output)
+        input_tokens_details:
+          type: object
+          properties:
+            cached_tokens:
+              type: integer
+              description: Number of tokens retrieved from cache
+          additionalProperties: false
+          description: Detailed breakdown of input token usage
+        output_tokens_details:
+          type: object
+          properties:
+            reasoning_tokens:
+              type: integer
+              description: >-
+                Number of tokens used for reasoning (o1/o3 models)
+          additionalProperties: false
+          description: Detailed breakdown of output token usage
+      additionalProperties: false
+      required:
+        - input_tokens
+        - output_tokens
+        - total_tokens
+      title: OpenAIResponseUsage
+      description: Usage information for OpenAI response.
     ResponseShieldSpec:
       type: object
       properties:
@@ -7421,6 +7520,10 @@ components:
           type: string
          description: >-
             (Optional) Truncation strategy applied to the response
+        usage:
+          $ref: '#/components/schemas/OpenAIResponseUsage'
+          description: >-
+            (Optional) Token usage information for the response
       additionalProperties: false
       required:
         - created_at
docs/static/llama-stack-spec.html (vendored): 120 additions

@@ -4277,6 +4277,10 @@
           "type": "string",
           "description": "The model that was used to generate the chat completion"
         },
+        "usage": {
+          "$ref": "#/components/schemas/OpenAIChatCompletionUsage",
+          "description": "Token usage information for the completion"
+        },
         "input_messages": {
           "type": "array",
           "items": {
@@ -4479,6 +4483,55 @@
       "title": "OpenAIChatCompletionToolCallFunction",
       "description": "Function call details for OpenAI-compatible tool calls."
     },
+    "OpenAIChatCompletionUsage": {
+      "type": "object",
+      "properties": {
+        "prompt_tokens": {
+          "type": "integer",
+          "description": "Number of tokens in the prompt"
+        },
+        "completion_tokens": {
+          "type": "integer",
+          "description": "Number of tokens in the completion"
+        },
+        "total_tokens": {
+          "type": "integer",
+          "description": "Total tokens used (prompt + completion)"
+        },
+        "prompt_tokens_details": {
+          "type": "object",
+          "properties": {
+            "cached_tokens": {
+              "type": "integer",
+              "description": "Number of tokens retrieved from cache"
+            }
+          },
+          "additionalProperties": false,
+          "title": "OpenAIChatCompletionUsagePromptTokensDetails",
+          "description": "Token details for prompt tokens in OpenAI chat completion usage."
+        },
+        "completion_tokens_details": {
+          "type": "object",
+          "properties": {
+            "reasoning_tokens": {
+              "type": "integer",
+              "description": "Number of tokens used for reasoning (o1/o3 models)"
+            }
+          },
+          "additionalProperties": false,
+          "title": "OpenAIChatCompletionUsageCompletionTokensDetails",
+          "description": "Token details for output tokens in OpenAI chat completion usage."
+        }
+      },
+      "additionalProperties": false,
+      "required": [
+        "prompt_tokens",
+        "completion_tokens",
+        "total_tokens"
+      ],
+      "title": "OpenAIChatCompletionUsage",
+      "description": "Usage information for OpenAI chat completion."
+    },
     "OpenAIChoice": {
       "type": "object",
       "properties": {
@@ -5241,6 +5294,10 @@
         "model": {
           "type": "string",
           "description": "The model that was used to generate the chat completion"
+        },
+        "usage": {
+          "$ref": "#/components/schemas/OpenAIChatCompletionUsage",
+          "description": "Token usage information for the completion"
         }
       },
       "additionalProperties": false,
@@ -5281,6 +5338,10 @@
         "model": {
           "type": "string",
           "description": "The model that was used to generate the chat completion"
+        },
+        "usage": {
+          "$ref": "#/components/schemas/OpenAIChatCompletionUsage",
+          "description": "Token usage information (typically included in final chunk with stream_options)"
         }
       },
       "additionalProperties": false,
@@ -5378,6 +5439,10 @@
           "type": "string",
           "description": "The model that was used to generate the chat completion"
         },
+        "usage": {
+          "$ref": "#/components/schemas/OpenAIChatCompletionUsage",
+          "description": "Token usage information for the completion"
+        },
         "input_messages": {
           "type": "array",
           "items": {
@@ -7503,6 +7568,10 @@
           "type": "string",
           "description": "(Optional) Truncation strategy applied to the response"
         },
+        "usage": {
+          "$ref": "#/components/schemas/OpenAIResponseUsage",
+          "description": "(Optional) Token usage information for the response"
+        },
         "input": {
           "type": "array",
           "items": {
@@ -7636,6 +7705,53 @@
       "title": "OpenAIResponseText",
       "description": "Text response configuration for OpenAI responses."
     },
+    "OpenAIResponseUsage": {
+      "type": "object",
+      "properties": {
+        "input_tokens": {
+          "type": "integer",
+          "description": "Number of tokens in the input"
+        },
+        "output_tokens": {
+          "type": "integer",
+          "description": "Number of tokens in the output"
+        },
+        "total_tokens": {
+          "type": "integer",
+          "description": "Total tokens used (input + output)"
+        },
+        "input_tokens_details": {
+          "type": "object",
+          "properties": {
+            "cached_tokens": {
+              "type": "integer",
+              "description": "Number of tokens retrieved from cache"
+            }
+          },
+          "additionalProperties": false,
+          "description": "Detailed breakdown of input token usage"
+        },
+        "output_tokens_details": {
+          "type": "object",
+          "properties": {
+            "reasoning_tokens": {
+              "type": "integer",
+              "description": "Number of tokens used for reasoning (o1/o3 models)"
+            }
+          },
+          "additionalProperties": false,
+          "description": "Detailed breakdown of output token usage"
+        }
+      },
+      "additionalProperties": false,
+      "required": [
+        "input_tokens",
+        "output_tokens",
+        "total_tokens"
+      ],
+      "title": "OpenAIResponseUsage",
+      "description": "Usage information for OpenAI response."
+    },
     "ResponseShieldSpec": {
       "type": "object",
       "properties": {
@@ -8078,6 +8194,10 @@
         "truncation": {
           "type": "string",
           "description": "(Optional) Truncation strategy applied to the response"
+        },
+        "usage": {
+          "$ref": "#/components/schemas/OpenAIResponseUsage",
+          "description": "(Optional) Token usage information for the response"
         }
       },
       "additionalProperties": false,
docs/static/llama-stack-spec.yaml (vendored): 103 additions

@@ -3248,6 +3248,10 @@ components:
           type: string
           description: >-
             The model that was used to generate the chat completion
+        usage:
+          $ref: '#/components/schemas/OpenAIChatCompletionUsage'
+          description: >-
+            Token usage information for the completion
         input_messages:
           type: array
           items:
@@ -3414,6 +3418,49 @@ components:
       title: OpenAIChatCompletionToolCallFunction
       description: >-
         Function call details for OpenAI-compatible tool calls.
+    OpenAIChatCompletionUsage:
+      type: object
+      properties:
+        prompt_tokens:
+          type: integer
+          description: Number of tokens in the prompt
+        completion_tokens:
+          type: integer
+          description: Number of tokens in the completion
+        total_tokens:
+          type: integer
+          description: Total tokens used (prompt + completion)
+        prompt_tokens_details:
+          type: object
+          properties:
+            cached_tokens:
+              type: integer
+              description: Number of tokens retrieved from cache
+          additionalProperties: false
+          title: >-
+            OpenAIChatCompletionUsagePromptTokensDetails
+          description: >-
+            Token details for prompt tokens in OpenAI chat completion usage.
+        completion_tokens_details:
+          type: object
+          properties:
+            reasoning_tokens:
+              type: integer
+              description: >-
+                Number of tokens used for reasoning (o1/o3 models)
+          additionalProperties: false
+          title: >-
+            OpenAIChatCompletionUsageCompletionTokensDetails
+          description: >-
+            Token details for output tokens in OpenAI chat completion usage.
+      additionalProperties: false
+      required:
+        - prompt_tokens
+        - completion_tokens
+        - total_tokens
+      title: OpenAIChatCompletionUsage
+      description: >-
+        Usage information for OpenAI chat completion.
     OpenAIChoice:
       type: object
       properties:
@@ -3945,6 +3992,10 @@ components:
           type: string
           description: >-
             The model that was used to generate the chat completion
+        usage:
+          $ref: '#/components/schemas/OpenAIChatCompletionUsage'
+          description: >-
+            Token usage information for the completion
       additionalProperties: false
       required:
         - id
@@ -3980,6 +4031,10 @@ components:
           type: string
           description: >-
             The model that was used to generate the chat completion
+        usage:
+          $ref: '#/components/schemas/OpenAIChatCompletionUsage'
+          description: >-
+            Token usage information (typically included in final chunk with stream_options)
       additionalProperties: false
       required:
         - id
@@ -4060,6 +4115,10 @@ components:
           type: string
          description: >-
             The model that was used to generate the chat completion
+        usage:
+          $ref: '#/components/schemas/OpenAIChatCompletionUsage'
+          description: >-
+            Token usage information for the completion
         input_messages:
           type: array
           items:
@@ -5700,6 +5759,10 @@ components:
           type: string
           description: >-
             (Optional) Truncation strategy applied to the response
+        usage:
+          $ref: '#/components/schemas/OpenAIResponseUsage'
+          description: >-
+            (Optional) Token usage information for the response
         input:
           type: array
           items:
@@ -5791,6 +5854,42 @@ components:
       title: OpenAIResponseText
       description: >-
         Text response configuration for OpenAI responses.
+    OpenAIResponseUsage:
+      type: object
+      properties:
+        input_tokens:
+          type: integer
+          description: Number of tokens in the input
+        output_tokens:
+          type: integer
+          description: Number of tokens in the output
+        total_tokens:
+          type: integer
+          description: Total tokens used (input + output)
+        input_tokens_details:
+          type: object
+          properties:
+            cached_tokens:
+              type: integer
+              description: Number of tokens retrieved from cache
+          additionalProperties: false
+          description: Detailed breakdown of input token usage
+        output_tokens_details:
+          type: object
+          properties:
+            reasoning_tokens:
+              type: integer
+              description: >-
+                Number of tokens used for reasoning (o1/o3 models)
+          additionalProperties: false
+          description: Detailed breakdown of output token usage
+      additionalProperties: false
+      required:
+        - input_tokens
+        - output_tokens
+        - total_tokens
+      title: OpenAIResponseUsage
+      description: Usage information for OpenAI response.
     ResponseShieldSpec:
       type: object
       properties:
@@ -6117,6 +6216,10 @@ components:
           type: string
           description: >-
             (Optional) Truncation strategy applied to the response
+        usage:
+          $ref: '#/components/schemas/OpenAIResponseUsage'
+          description: >-
+            (Optional) Token usage information for the response
      additionalProperties: false
       required:
         - created_at
docs/static/stainless-llama-stack-spec.html (vendored): 120 additions

@@ -6286,6 +6286,10 @@
           "type": "string",
           "description": "The model that was used to generate the chat completion"
         },
+        "usage": {
+          "$ref": "#/components/schemas/OpenAIChatCompletionUsage",
+          "description": "Token usage information for the completion"
+        },
         "input_messages": {
           "type": "array",
           "items": {
@@ -6488,6 +6492,55 @@
       "title": "OpenAIChatCompletionToolCallFunction",
       "description": "Function call details for OpenAI-compatible tool calls."
     },
+    "OpenAIChatCompletionUsage": {
+      "type": "object",
+      "properties": {
+        "prompt_tokens": {
+          "type": "integer",
+          "description": "Number of tokens in the prompt"
+        },
+        "completion_tokens": {
+          "type": "integer",
+          "description": "Number of tokens in the completion"
+        },
+        "total_tokens": {
+          "type": "integer",
+          "description": "Total tokens used (prompt + completion)"
+        },
+        "prompt_tokens_details": {
+          "type": "object",
+          "properties": {
+            "cached_tokens": {
+              "type": "integer",
+              "description": "Number of tokens retrieved from cache"
+            }
+          },
+          "additionalProperties": false,
+          "title": "OpenAIChatCompletionUsagePromptTokensDetails",
+          "description": "Token details for prompt tokens in OpenAI chat completion usage."
+        },
+        "completion_tokens_details": {
+          "type": "object",
+          "properties": {
+            "reasoning_tokens": {
+              "type": "integer",
+              "description": "Number of tokens used for reasoning (o1/o3 models)"
+            }
+          },
+          "additionalProperties": false,
+          "title": "OpenAIChatCompletionUsageCompletionTokensDetails",
+          "description": "Token details for output tokens in OpenAI chat completion usage."
+        }
+      },
+      "additionalProperties": false,
+      "required": [
+        "prompt_tokens",
+        "completion_tokens",
+        "total_tokens"
+      ],
+      "title": "OpenAIChatCompletionUsage",
+      "description": "Usage information for OpenAI chat completion."
+    },
     "OpenAIChoice": {
       "type": "object",
       "properties": {
@@ -7250,6 +7303,10 @@
         "model": {
           "type": "string",
           "description": "The model that was used to generate the chat completion"
+        },
+        "usage": {
+          "$ref": "#/components/schemas/OpenAIChatCompletionUsage",
+          "description": "Token usage information for the completion"
         }
       },
       "additionalProperties": false,
@@ -7290,6 +7347,10 @@
         "model": {
           "type": "string",
           "description": "The model that was used to generate the chat completion"
+        },
+        "usage": {
+          "$ref": "#/components/schemas/OpenAIChatCompletionUsage",
+          "description": "Token usage information (typically included in final chunk with stream_options)"
         }
       },
       "additionalProperties": false,
@@ -7387,6 +7448,10 @@
           "type": "string",
           "description": "The model that was used to generate the chat completion"
         },
+        "usage": {
+          "$ref": "#/components/schemas/OpenAIChatCompletionUsage",
+          "description": "Token usage information for the completion"
+        },
         "input_messages": {
           "type": "array",
           "items": {
@@ -9512,6 +9577,10 @@
           "type": "string",
           "description": "(Optional) Truncation strategy applied to the response"
         },
+        "usage": {
+          "$ref": "#/components/schemas/OpenAIResponseUsage",
+          "description": "(Optional) Token usage information for the response"
+        },
         "input": {
           "type": "array",
           "items": {
@@ -9645,6 +9714,53 @@
       "title": "OpenAIResponseText",
       "description": "Text response configuration for OpenAI responses."
     },
+    "OpenAIResponseUsage": {
+      "type": "object",
+      "properties": {
+        "input_tokens": {
+          "type": "integer",
+          "description": "Number of tokens in the input"
+        },
+        "output_tokens": {
+          "type": "integer",
+          "description": "Number of tokens in the output"
+        },
+        "total_tokens": {
+          "type": "integer",
+          "description": "Total tokens used (input + output)"
+        },
+        "input_tokens_details": {
+          "type": "object",
+          "properties": {
+            "cached_tokens": {
+              "type": "integer",
+              "description": "Number of tokens retrieved from cache"
+            }
+          },
+          "additionalProperties": false,
+          "description": "Detailed breakdown of input token usage"
+        },
+        "output_tokens_details": {
+          "type": "object",
+          "properties": {
+            "reasoning_tokens": {
+              "type": "integer",
+              "description": "Number of tokens used for reasoning (o1/o3 models)"
+            }
+          },
+          "additionalProperties": false,
+          "description": "Detailed breakdown of output token usage"
+        }
+      },
+      "additionalProperties": false,
+      "required": [
+        "input_tokens",
+        "output_tokens",
+        "total_tokens"
+      ],
+      "title": "OpenAIResponseUsage",
+      "description": "Usage information for OpenAI response."
+    },
     "ResponseShieldSpec": {
       "type": "object",
       "properties": {
@@ -10087,6 +10203,10 @@
         "truncation": {
           "type": "string",
           "description": "(Optional) Truncation strategy applied to the response"
+        },
+        "usage": {
+          "$ref": "#/components/schemas/OpenAIResponseUsage",
+          "description": "(Optional) Token usage information for the response"
         }
       },
       "additionalProperties": false,
docs/static/stainless-llama-stack-spec.yaml (vendored): 103 additions

@@ -4693,6 +4693,10 @@ components:
           type: string
           description: >-
             The model that was used to generate the chat completion
+        usage:
+          $ref: '#/components/schemas/OpenAIChatCompletionUsage'
+          description: >-
+            Token usage information for the completion
         input_messages:
           type: array
           items:
@@ -4859,6 +4863,49 @@ components:
       title: OpenAIChatCompletionToolCallFunction
       description: >-
         Function call details for OpenAI-compatible tool calls.
+    OpenAIChatCompletionUsage:
+      type: object
+      properties:
+        prompt_tokens:
+          type: integer
+          description: Number of tokens in the prompt
+        completion_tokens:
+          type: integer
+          description: Number of tokens in the completion
+        total_tokens:
+          type: integer
+          description: Total tokens used (prompt + completion)
+        prompt_tokens_details:
+          type: object
+          properties:
+            cached_tokens:
+              type: integer
+              description: Number of tokens retrieved from cache
+          additionalProperties: false
+          title: >-
+            OpenAIChatCompletionUsagePromptTokensDetails
+          description: >-
+            Token details for prompt tokens in OpenAI chat completion usage.
+        completion_tokens_details:
+          type: object
+          properties:
+            reasoning_tokens:
+              type: integer
+              description: >-
+                Number of tokens used for reasoning (o1/o3 models)
+          additionalProperties: false
+          title: >-
+            OpenAIChatCompletionUsageCompletionTokensDetails
+          description: >-
+            Token details for output tokens in OpenAI chat completion usage.
+      additionalProperties: false
+      required:
+        - prompt_tokens
+        - completion_tokens
+        - total_tokens
+      title: OpenAIChatCompletionUsage
+      description: >-
+        Usage information for OpenAI chat completion.
     OpenAIChoice:
       type: object
       properties:
@@ -5390,6 +5437,10 @@ components:
           type: string
           description: >-
             The model that was used to generate the chat completion
+        usage:
+          $ref: '#/components/schemas/OpenAIChatCompletionUsage'
+          description: >-
+            Token usage information for the completion
       additionalProperties: false
       required:
         - id
@@ -5425,6 +5476,10 @@ components:
           type: string
           description: >-
             The model that was used to generate the chat completion
+        usage:
+          $ref: '#/components/schemas/OpenAIChatCompletionUsage'
+          description: >-
+            Token usage information (typically included in final chunk with stream_options)
       additionalProperties: false
       required:
         - id
@@ -5505,6 +5560,10 @@ components:
           type: string
           description: >-
             The model that was used to generate the chat completion
+        usage:
+          $ref: '#/components/schemas/OpenAIChatCompletionUsage'
+          description: >-
+            Token usage information for the completion
         input_messages:
           type: array
           items:
@@ -7145,6 +7204,10 @@ components:
           type: string
           description: >-
             (Optional) Truncation strategy applied to the response
+        usage:
+          $ref: '#/components/schemas/OpenAIResponseUsage'
+          description: >-
+            (Optional) Token usage information for the response
         input:
           type: array
           items:
@@ -7236,6 +7299,42 @@ components:
       title: OpenAIResponseText
       description: >-
         Text response configuration for OpenAI responses.
+    OpenAIResponseUsage:
+      type: object
+      properties:
+        input_tokens:
+          type: integer
+          description: Number of tokens in the input
+        output_tokens:
+          type: integer
+          description: Number of tokens in the output
+        total_tokens:
+          type: integer
+          description: Total tokens used (input + output)
+        input_tokens_details:
+          type: object
+          properties:
+            cached_tokens:
+              type: integer
+              description: Number of tokens retrieved from cache
+          additionalProperties: false
+          description: Detailed breakdown of input token usage
+        output_tokens_details:
+          type: object
+          properties:
+            reasoning_tokens:
+              type: integer
+              description: >-
+                Number of tokens used for reasoning (o1/o3 models)
+          additionalProperties: false
+          description: Detailed breakdown of output token usage
+      additionalProperties: false
+      required:
+        - input_tokens
+        - output_tokens
+        - total_tokens
+      title: OpenAIResponseUsage
+      description: Usage information for OpenAI response.
     ResponseShieldSpec:
       type: object
       properties:
@@ -7562,6 +7661,10 @@ components:
           type: string
           description: >-
             (Optional) Truncation strategy applied to the response
+        usage:
+          $ref: '#/components/schemas/OpenAIResponseUsage'
+          description: >-
+            (Optional) Token usage information for the response
       additionalProperties: false
       required:
         - created_at
@@ -346,6 +346,42 @@ class OpenAIResponseText(BaseModel):
     format: OpenAIResponseTextFormat | None = None
 
 
+class OpenAIResponseUsageOutputTokensDetails(BaseModel):
+    """Token details for output tokens in OpenAI response usage.
+
+    :param reasoning_tokens: Number of tokens used for reasoning (o1/o3 models)
+    """
+
+    reasoning_tokens: int | None = None
+
+
+class OpenAIResponseUsageInputTokensDetails(BaseModel):
+    """Token details for input tokens in OpenAI response usage.
+
+    :param cached_tokens: Number of tokens retrieved from cache
+    """
+
+    cached_tokens: int | None = None
+
+
+@json_schema_type
+class OpenAIResponseUsage(BaseModel):
+    """Usage information for OpenAI response.
+
+    :param input_tokens: Number of tokens in the input
+    :param output_tokens: Number of tokens in the output
+    :param total_tokens: Total tokens used (input + output)
+    :param input_tokens_details: Detailed breakdown of input token usage
+    :param output_tokens_details: Detailed breakdown of output token usage
+    """
+
+    input_tokens: int
+    output_tokens: int
+    total_tokens: int
+    input_tokens_details: OpenAIResponseUsageInputTokensDetails | None = None
+    output_tokens_details: OpenAIResponseUsageOutputTokensDetails | None = None
+
+
 @json_schema_type
 class OpenAIResponseObject(BaseModel):
     """Complete OpenAI response object containing generation results and metadata.
@@ -363,6 +399,7 @@ class OpenAIResponseObject(BaseModel):
     :param text: Text formatting configuration for the response
     :param top_p: (Optional) Nucleus sampling parameter used for generation
     :param truncation: (Optional) Truncation strategy applied to the response
+    :param usage: (Optional) Token usage information for the response
     """
 
     created_at: int
@@ -380,6 +417,7 @@ class OpenAIResponseObject(BaseModel):
     text: OpenAIResponseText = OpenAIResponseText(format=OpenAIResponseTextFormat(type="text"))
     top_p: float | None = None
     truncation: str | None = None
+    usage: OpenAIResponseUsage | None = None
 
 
 @json_schema_type
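For illustration (not part of the commit): a minimal sketch constructing the new responses-API usage model directly, with invented token counts; `model_dump` is standard Pydantic v2 serialization.

```python
# Field names exactly as defined in the diff above; values are hypothetical.
usage = OpenAIResponseUsage(
    input_tokens=120,
    output_tokens=48,
    total_tokens=168,
    input_tokens_details=OpenAIResponseUsageInputTokensDetails(cached_tokens=64),
    output_tokens_details=OpenAIResponseUsageOutputTokensDetails(reasoning_tokens=16),
)
print(usage.model_dump(exclude_none=True))
# {'input_tokens': 120, 'output_tokens': 48, 'total_tokens': 168,
#  'input_tokens_details': {'cached_tokens': 64},
#  'output_tokens_details': {'reasoning_tokens': 16}}
```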
@@ -816,6 +816,42 @@ class OpenAIChoice(BaseModel):
     logprobs: OpenAIChoiceLogprobs | None = None
 
 
+class OpenAIChatCompletionUsageCompletionTokensDetails(BaseModel):
+    """Token details for output tokens in OpenAI chat completion usage.
+
+    :param reasoning_tokens: Number of tokens used for reasoning (o1/o3 models)
+    """
+
+    reasoning_tokens: int | None = None
+
+
+class OpenAIChatCompletionUsagePromptTokensDetails(BaseModel):
+    """Token details for prompt tokens in OpenAI chat completion usage.
+
+    :param cached_tokens: Number of tokens retrieved from cache
+    """
+
+    cached_tokens: int | None = None
+
+
+@json_schema_type
+class OpenAIChatCompletionUsage(BaseModel):
+    """Usage information for OpenAI chat completion.
+
+    :param prompt_tokens: Number of tokens in the prompt
+    :param completion_tokens: Number of tokens in the completion
+    :param total_tokens: Total tokens used (prompt + completion)
+    :param prompt_tokens_details: Detailed breakdown of prompt token usage
+    :param completion_tokens_details: Detailed breakdown of completion token usage
+    """
+
+    prompt_tokens: int
+    completion_tokens: int
+    total_tokens: int
+    prompt_tokens_details: OpenAIChatCompletionUsagePromptTokensDetails | None = None
+    completion_tokens_details: OpenAIChatCompletionUsageCompletionTokensDetails | None = None
+
+
 @json_schema_type
 class OpenAIChatCompletion(BaseModel):
     """Response from an OpenAI-compatible chat completion request.
@@ -825,6 +861,7 @@ class OpenAIChatCompletion(BaseModel):
     :param object: The object type, which will be "chat.completion"
     :param created: The Unix timestamp in seconds when the chat completion was created
     :param model: The model that was used to generate the chat completion
+    :param usage: Token usage information for the completion
     """
 
     id: str
@@ -832,6 +869,7 @@ class OpenAIChatCompletion(BaseModel):
     object: Literal["chat.completion"] = "chat.completion"
     created: int
     model: str
+    usage: OpenAIChatCompletionUsage | None = None
 
 
 @json_schema_type
@@ -843,6 +881,7 @@ class OpenAIChatCompletionChunk(BaseModel):
     :param object: The object type, which will be "chat.completion.chunk"
     :param created: The Unix timestamp in seconds when the chat completion was created
     :param model: The model that was used to generate the chat completion
+    :param usage: Token usage information (typically included in final chunk with stream_options)
     """
 
     id: str
@@ -850,6 +889,7 @@ class OpenAIChatCompletionChunk(BaseModel):
     object: Literal["chat.completion.chunk"] = "chat.completion.chunk"
     created: int
     model: str
+    usage: OpenAIChatCompletionUsage | None = None
 
 
 @json_schema_type
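For illustration (not part of the commit): the chunk docstring above notes that usage is typically included only in the final chunk, following OpenAI's `stream_options={"include_usage": True}` convention. A client consuming a stream might collect it like this sketch, which assumes nothing beyond an iterable of chunks.

```python
from collections.abc import Iterable

def usage_from_stream(
    chunks: Iterable[OpenAIChatCompletionChunk],
) -> OpenAIChatCompletionUsage | None:
    """Return the usage reported by a streamed completion, if any."""
    usage = None
    for chunk in chunks:
        # All chunks except the last normally carry usage=None.
        if chunk.usage is not None:
            usage = chunk.usage
    return usage
```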