Litellm dev 12 12 2024 (#7203)

* fix(azure/): support passing headers to azure openai endpoints
  Fixes https://github.com/BerriAI/litellm/issues/6217
* fix(utils.py): move default tokenizer to just openai
  hf tokenizer makes network calls when trying to get the tokenizer - this slows down execution time calls
* fix(router.py): fix pattern matching router - add generic "*" to it as well
  Fixes issue where generic "*" model access group wouldn't show up
* fix(pattern_match_deployments.py): match to more specific pattern
  match to more specific pattern allows setting generic wildcard model access group and excluding specific models more easily
* fix(proxy_server.py): fix _delete_deployment to handle base case where db_model list is empty
  don't delete all router models b/c of empty list
  Fixes https://github.com/BerriAI/litellm/issues/7196
* fix(anthropic/): fix handling response_format for anthropic messages with anthropic api
* fix(fireworks_ai/): support passing response_format + tool call in same message
  Addresses https://github.com/BerriAI/litellm/issues/7135
* Revert "fix(fireworks_ai/): support passing response_format + tool call in same message"
  This reverts commit 6a30dc6929.
* test: fix test
* fix(replicate/): fix replicate default retry/polling logic
* test: add unit testing for router pattern matching
* test: update test to use default oai tokenizer
* test: mark flaky test
* test: skip flaky test
2025-04-25 18:54:30 +00:00 · 2024-12-13 08:54:03 -08:00 · 2024-12-13 08:54:03 -08:00 · a42f008cd0
commit a42f008cd0
parent e65f990319
19 changed files with 496 additions and 103 deletions
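
The "match to more specific pattern" fix in the commit message can be pictured with a minimal sketch. This is not the code from pattern_match_deployments.py: the helpers `pattern_specificity`, `wildcard_to_regex`, and `pick_most_specific` are hypothetical names, and "more specific" is assumed here to mean "more literal characters, then fewer wildcards".

```python
import re
from typing import Dict, List, Optional, Tuple


def pattern_specificity(pattern: str) -> Tuple[int, int]:
    """Rank a wildcard pattern: more literal characters first, then fewer wildcards."""
    wildcards = pattern.count("*")
    literal_chars = len(pattern) - wildcards
    return (literal_chars, -wildcards)


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a '*' wildcard pattern into an anchored regular expression."""
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile("^" + ".*".join(parts) + "$")


def pick_most_specific(
    model: str, deployments: Dict[str, List[str]]
) -> Optional[List[str]]:
    """Return the deployments of the most specific pattern that matches `model`."""
    # Try patterns from most to least specific, so "*" is only a fallback.
    for pattern in sorted(deployments, key=pattern_specificity, reverse=True):
        if wildcard_to_regex(pattern).match(model):
            return deployments[pattern]
    return None


# Illustrative deployment names only; they do not come from the commit.
deployments = {
    "*": ["generic-deployment"],
    "openai/*": ["openai-deployment"],
    "openai/gpt-4*": ["gpt-4-deployment"],
}
print(pick_most_specific("openai/gpt-4o", deployments))
print(pick_most_specific("anthropic/claude-3", deployments))
```

Under this ordering a generic "*" entry can coexist with narrower patterns: the narrower pattern wins whenever it matches, and "*" only catches what is left over.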
--- a/litellm/constants.py
+++ b/litellm/constants.py
@@ -2,6 +2,8 @@ ROUTER_MAX_FALLBACKS = 5
 DEFAULT_BATCH_SIZE = 512
 DEFAULT_FLUSH_INTERVAL_SECONDS = 5
 DEFAULT_MAX_RETRIES = 2
+DEFAULT_REPLICATE_POLLING_RETRIES = 5
+DEFAULT_REPLICATE_POLLING_DELAY_SECONDS = 1
 DEFAULT_IMAGE_TOKEN_COUNT = 250
 DEFAULT_IMAGE_WIDTH = 300
 DEFAULT_IMAGE_HEIGHT = 300
@@ -67,6 +69,7 @@ LITELLM_CHAT_PROVIDERS = [
    "galadriel",
 ]

+RESPONSE_FORMAT_TOOL_NAME = "json_tool_call"  # default tool name used when converting response format to tool call

 ########################### LiteLLM Proxy Specific Constants ###########################
 MAX_SPENDLOG_ROWS_TO_QUERY = (
@@ -74,4 +77,3 @@ MAX_SPENDLOG_ROWS_TO_QUERY = (
 )
 # makes it clear this is a rate limit error for a litellm virtual key
 RATE_LIMIT_ERROR_MESSAGE_FOR_VIRTUAL_KEY = "LiteLLM Virtual Key user_api_key_hash"
-
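
The two Replicate constants added above suggest a bounded polling loop. The sketch below is one way such defaults might be consumed, not the actual litellm implementation: `poll_prediction` and the `fetch_status` callback are hypothetical, and the status strings ("succeeded", "failed", "canceled") are assumed from Replicate's prediction lifecycle.

```python
import time
from typing import Any, Callable, Dict

# Values copied from the constants added in this diff.
DEFAULT_REPLICATE_POLLING_RETRIES = 5
DEFAULT_REPLICATE_POLLING_DELAY_SECONDS = 1


def poll_prediction(
    fetch_status: Callable[[], Dict[str, Any]],
    max_retries: int = DEFAULT_REPLICATE_POLLING_RETRIES,
    delay_seconds: float = DEFAULT_REPLICATE_POLLING_DELAY_SECONDS,
) -> Dict[str, Any]:
    """Call `fetch_status` until it reports a terminal state or retries run out."""
    for attempt in range(max_retries):
        result = fetch_status()
        status = result.get("status")
        if status == "succeeded":
            return result
        if status in ("failed", "canceled"):
            raise RuntimeError(f"prediction ended with status {status!r}")
        # Sleep between polls, but not after the final attempt.
        if attempt < max_retries - 1:
            time.sleep(delay_seconds)
    raise TimeoutError(f"prediction not finished after {max_retries} polls")


# Usage with a stand-in for the HTTP call; delay set to 0 to keep the demo fast.
responses = iter(
    [
        {"status": "starting"},
        {"status": "processing"},
        {"status": "succeeded", "output": "hi"},
    ]
)
result = poll_prediction(lambda: next(responses), delay_seconds=0)
print(result["output"])
```

Keeping the retry count and delay as named constants means callers can override them per request while the defaults stay in one place.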