(feat) openai prompt caching (non-streaming) - add prompt_tokens_details in usage response (#6039)

* add prompt_tokens_details in usage response

* use _prompt_tokens_details as a param in Usage

* fix linting errors

* fix type error

* fix ci/cd deps

* bump deps for openai

* bump deps openai

* fix llm translation testing

* fix llm translation embedding
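
In practice, the new field surfaces on the usage object of a non-streaming completion. A minimal sketch of reading it (illustrative, not part of this diff: the model name and message are placeholders, and `cached_tokens` follows OpenAI's prompt-caching usage schema):

```python
import litellm

# Non-streaming call; usage arrives on the final response object.
response = litellm.completion(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Hello"}],
)

usage = response.usage
print(usage.prompt_tokens, usage.completion_tokens, usage.total_tokens)

# prompt_tokens_details may be None for providers that do not report it.
if usage.prompt_tokens_details is not None:
    # OpenAI reports tokens served from the prompt cache here.
    print("cached:", usage.prompt_tokens_details.cached_tokens)
```
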
Ishaan Jaff, 2024-10-03 11:01:10 -07:00 (committed by GitHub)
parent 9fccb4a0da
commit 4e88fd65e1
10 changed files with 1515 additions and 1428 deletions

@@ -1022,10 +1022,11 @@ class Huggingface(BaseLLM):
             model_response,
             "usage",
             litellm.Usage(
-                **{
-                    "prompt_tokens": input_tokens,
-                    "total_tokens": input_tokens,
-                }
+                prompt_tokens=input_tokens,
+                completion_tokens=input_tokens,
+                total_tokens=input_tokens,
+                prompt_tokens_details=None,
+                completion_tokens_details=None,
             ),
         )
         return model_response
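
The hunk above swaps dict unpacking for explicit keyword arguments so the two new detail fields are always present on the constructed object. A minimal sketch of building it directly, assuming the keyword signature shown in the diff (token counts are illustrative):

```python
import litellm

usage = litellm.Usage(
    prompt_tokens=10,
    completion_tokens=10,
    total_tokens=20,
    prompt_tokens_details=None,      # HuggingFace reports no cache breakdown
    completion_tokens_details=None,  # nor a completion-token breakdown
)
print(usage)
```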