Support caching on reasoning content + other fixes (#8973)

* fix(factory.py): pass on anthropic thinking content from assistant call

* fix(factory.py): fix anthropic messages to handle thinking blocks

Fixes https://github.com/BerriAI/litellm/issues/8961
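
A minimal sketch of what the message conversion needs to preserve. It assumes Anthropic's extended-thinking block shape (`type: "thinking"` with `thinking` text and a `signature`) and a `thinking_blocks` field on the OpenAI-style assistant message; litellm's real helper lives in `factory.py` and may differ:

```python
from typing import Any, Dict, List


def assistant_message_to_anthropic(message: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild an Anthropic assistant turn, carrying thinking blocks through."""
    content: List[Dict[str, Any]] = []

    # Assumed field: thinking blocks captured from a prior Anthropic response.
    for block in message.get("thinking_blocks") or []:
        content.append(
            {
                "type": "thinking",
                "thinking": block.get("thinking", ""),
                "signature": block.get("signature", ""),
            }
        )

    if message.get("content"):
        content.append({"type": "text", "text": message["content"]})

    return {"role": "assistant", "content": content}


print(
    assistant_message_to_anthropic(
        {
            "role": "assistant",
            "content": "The answer is 42.",
            "thinking_blocks": [
                {"type": "thinking", "thinking": "Work it out step by step...", "signature": "sig"}
            ],
        }
    )
)
```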

* fix(factory.py): fix bedrock handling for assistant content in messages

Fixes https://github.com/BerriAI/litellm/issues/8961
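
For Bedrock the analogous fix maps the same assistant fields into Converse-style content blocks. A rough sketch only, assuming Bedrock Converse's `reasoningContent`/`reasoningText` block shape (verify against the AWS docs) and the same assumed `thinking_blocks` field:

```python
from typing import Any, Dict, List


def assistant_message_to_bedrock(message: Dict[str, Any]) -> Dict[str, Any]:
    """Sketch: convert an assistant message into Bedrock Converse content blocks."""
    content: List[Dict[str, Any]] = []

    # Assumed Converse block shape for reasoning content.
    for block in message.get("thinking_blocks") or []:
        content.append(
            {
                "reasoningContent": {
                    "reasoningText": {
                        "text": block.get("thinking", ""),
                        "signature": block.get("signature", ""),
                    }
                }
            }
        )

    if message.get("content"):
        content.append({"text": message["content"]})

    return {"role": "assistant", "content": content}
```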

* feat(convert_dict_to_response.py): handle reasoning content + thinking blocks in the chat completion response

ensures caching works for anthropic thinking blocks
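
A rough illustration of why this matters for caching: if the thinking fields are attached to the message object that gets serialized into the cache, a cache hit reconstructs the same message. The `reasoning_content` / `thinking_blocks` names mirror what the commit describes, but the snippet is a standalone sketch, not litellm's actual `convert_dict_to_response.py` code:

```python
import json
from typing import Any, Dict


def build_message(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Attach reasoning fields to the message so they survive cache serialization."""
    message: Dict[str, Any] = {"role": "assistant", "content": raw.get("content")}
    if raw.get("reasoning_content") is not None:
        message["reasoning_content"] = raw["reasoning_content"]
    if raw.get("thinking_blocks") is not None:
        message["thinking_blocks"] = raw["thinking_blocks"]
    return message


raw = {
    "content": "42",
    "reasoning_content": "Worked through the arithmetic step by step.",
    "thinking_blocks": [{"type": "thinking", "thinking": "...", "signature": "sig"}],
}

cached = json.dumps(build_message(raw))          # what a response cache would store
assert json.loads(cached) == build_message(raw)  # a cache hit rebuilds the same message
```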

* fix(convert_dict_to_response.py): pass all message params to delta block

ensures the streaming delta also contains the reasoning content / thinking blocks
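
In spirit, the streaming side just forwards every message param into the delta instead of picking out role/content; a sketch, not the actual Delta constructor:

```python
from typing import Any, Dict


def message_to_delta(message: Dict[str, Any]) -> Dict[str, Any]:
    """Forward all message params into the streaming delta so
    reasoning_content / thinking_blocks appear in streamed chunks too."""
    return dict(message)


chunk_delta = message_to_delta(
    {"role": "assistant", "content": "42", "reasoning_content": "step-by-step work"}
)
assert "reasoning_content" in chunk_delta
```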

* test(test_prompt_factory.py): remove redundant test

anthropic now supports an assistant message as the first message

* fix(factory.py): fix linting errors

* fix: fix code qa

* test: remove falsy test

* fix(litellm_logging.py): fix str conversion
Krish Dholakia, 2025-03-04 21:12:16 -08:00 (committed by GitHub)
parent 4c8b4fefc9
commit 662c59adcf
11 changed files with 230 additions and 50 deletions


@@ -1048,6 +1048,7 @@ def client(original_function): # noqa: PLR0915
    )
    if caching_handler_response.cached_result is not None:
        verbose_logger.debug("Cache hit!")
        return caching_handler_response.cached_result
    # CHECK MAX TOKENS
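
The hunk above shows the early return on a cache hit: the wrapper consults the caching handler before calling the model and only falls through to the real call on a miss. A simplified sketch of that pattern, with hypothetical cache methods (`get`, `set`) rather than litellm's actual handler interface:

```python
import logging

logger = logging.getLogger(__name__)


def with_cache(call_model, cache):
    """Wrap a model call with a response cache (hypothetical cache interface)."""

    def wrapper(**kwargs):
        cached = cache.get(**kwargs)  # hypothetical lookup keyed on the request params
        if cached is not None:
            logger.debug("Cache hit!")
            # Only lossless if the cached message kept reasoning_content / thinking_blocks.
            return cached
        response = call_model(**kwargs)
        cache.set(response, **kwargs)  # hypothetical store
        return response

    return wrapper
```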