Support caching on reasoning content + other fixes (#8973)

* fix(factory.py): pass on anthropic thinking content from assistant call

* fix(factory.py): fix anthropic messages to handle thinking blocks

Fixes https://github.com/BerriAI/litellm/issues/8961
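
A minimal sketch of what the message conversion needs to preserve. It assumes Anthropic's extended-thinking block shape (`type: "thinking"` with `thinking` text and a `signature`) and a `thinking_blocks` field on the OpenAI-style assistant message; litellm's real helper lives in `factory.py` and may differ:

```python
from typing import Any, Dict, List


def assistant_message_to_anthropic(message: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild an Anthropic assistant turn, carrying thinking blocks through."""
    content: List[Dict[str, Any]] = []

    # Assumed field: thinking blocks captured from a prior Anthropic response.
    for block in message.get("thinking_blocks") or []:
        content.append(
            {
                "type": "thinking",
                "thinking": block.get("thinking", ""),
                "signature": block.get("signature", ""),
            }
        )

    if message.get("content"):
        content.append({"type": "text", "text": message["content"]})

    return {"role": "assistant", "content": content}


print(
    assistant_message_to_anthropic(
        {
            "role": "assistant",
            "content": "The answer is 42.",
            "thinking_blocks": [
                {"type": "thinking", "thinking": "Work it out step by step...", "signature": "sig"}
            ],
        }
    )
)
```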

* fix(factory.py): fix bedrock handling for assistant content in messages

Fixes https://github.com/BerriAI/litellm/issues/8961
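
For Bedrock the analogous fix maps the same assistant fields into Converse-style content blocks. A rough sketch only, assuming Bedrock Converse's `reasoningContent`/`reasoningText` block shape (verify against the AWS docs) and the same assumed `thinking_blocks` field:

```python
from typing import Any, Dict, List


def assistant_message_to_bedrock(message: Dict[str, Any]) -> Dict[str, Any]:
    """Sketch: convert an assistant message into Bedrock Converse content blocks."""
    content: List[Dict[str, Any]] = []

    # Assumed Converse block shape for reasoning content.
    for block in message.get("thinking_blocks") or []:
        content.append(
            {
                "reasoningContent": {
                    "reasoningText": {
                        "text": block.get("thinking", ""),
                        "signature": block.get("signature", ""),
                    }
                }
            }
        )

    if message.get("content"):
        content.append({"text": message["content"]})

    return {"role": "assistant", "content": content}
```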

* feat(convert_dict_to_response.py): handle reasoning content + thinking blocks in the chat completion response

ensures caching works for anthropic thinking blocks
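
A rough illustration of why this matters for caching: if the thinking fields are attached to the message object that gets serialized into the cache, a cache hit reconstructs the same message. The `reasoning_content` / `thinking_blocks` names mirror what the commit describes, but the snippet is a standalone sketch, not litellm's actual `convert_dict_to_response.py` code:

```python
import json
from typing import Any, Dict


def build_message(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Attach reasoning fields to the message so they survive cache serialization."""
    message: Dict[str, Any] = {"role": "assistant", "content": raw.get("content")}
    if raw.get("reasoning_content") is not None:
        message["reasoning_content"] = raw["reasoning_content"]
    if raw.get("thinking_blocks") is not None:
        message["thinking_blocks"] = raw["thinking_blocks"]
    return message


raw = {
    "content": "42",
    "reasoning_content": "Worked through the arithmetic step by step.",
    "thinking_blocks": [{"type": "thinking", "thinking": "...", "signature": "sig"}],
}

cached = json.dumps(build_message(raw))          # what a response cache would store
assert json.loads(cached) == build_message(raw)  # a cache hit rebuilds the same message
```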

* fix(convert_dict_to_response.py): pass all message params to delta block

ensures the streaming delta also contains the reasoning content / thinking blocks
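
In spirit, the streaming side just forwards every message param into the delta instead of picking out role/content; a sketch, not the actual Delta constructor:

```python
from typing import Any, Dict


def message_to_delta(message: Dict[str, Any]) -> Dict[str, Any]:
    """Forward all message params into the streaming delta so
    reasoning_content / thinking_blocks appear in streamed chunks too."""
    return dict(message)


chunk_delta = message_to_delta(
    {"role": "assistant", "content": "42", "reasoning_content": "step-by-step work"}
)
assert "reasoning_content" in chunk_delta
```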

* test(test_prompt_factory.py): remove redundant test

anthropic now supports an assistant message as the first message

* fix(factory.py): fix linting errors

* fix: fix code qa

* test: remove falsy test

* fix(litellm_logging.py): fix str conversion
Krish Dholakia, 2025-03-04 21:12:16 -08:00 (committed by GitHub)
parent 4c8b4fefc9
commit 662c59adcf
11 changed files with 230 additions and 50 deletions


@@ -1048,6 +1048,7 @@ def client(original_function): # noqa: PLR0915
    )
    if caching_handler_response.cached_result is not None:
        verbose_logger.debug("Cache hit!")
        return caching_handler_response.cached_result
    # CHECK MAX TOKENS
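
The hunk above shows the early return on a cache hit: the wrapper consults the caching handler before calling the model and only falls through to the real call on a miss. A simplified sketch of that pattern, with hypothetical cache methods (`get`, `set`) rather than litellm's actual handler interface:

```python
import logging

logger = logging.getLogger(__name__)


def with_cache(call_model, cache):
    """Wrap a model call with a response cache (hypothetical cache interface)."""

    def wrapper(**kwargs):
        cached = cache.get(**kwargs)  # hypothetical lookup keyed on the request params
        if cached is not None:
            logger.debug("Cache hit!")
            # Only lossless if the cached message kept reasoning_content / thinking_blocks.
            return cached
        response = call_model(**kwargs)
        cache.set(response, **kwargs)  # hypothetical store
        return response

    return wrapper
```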