LiteLLM Minor Fixes & Improvements (10/15/2024) (#6242)

* feat(litellm_pre_call_utils.py): support forwarding request headers to backend llm api

* fix(litellm_pre_call_utils.py): handle custom litellm key header

* test(router_code_coverage.py): check if all router functions are dire… (#6186)

* test(router_code_coverage.py): check if all router functions are directly tested

prevent regressions

* docs(configs.md): document all environment variables (#6185)

* docs: make it easier to find anthropic/openai prompt caching doc

* added codecov yml (#6207)

* fix codecov.yaml

* run ci/cd again

* (refactor) caching use LLMCachingHandler for async_get_cache and set_cache  (#6208)

* use folder for caching

* fix importing caching

* fix clickhouse pyright

* fix linting

* fix correctly pass kwargs and args

* fix test case for embedding

* fix linting

* fix embedding caching logic

* fix refactor handle utils.py

* fix test_embedding_caching_azure_individual_items_reordered

* (feat) prometheus have well defined latency buckets (#6211)

* fix prometheus have well defined latency buckets

* use a well-defined latency bucket

* use types file for prometheus logging

* add test for LATENCY_BUCKETS

* fix prom testing

* fix config.yml

* (refactor caching) use LLMCachingHandler for caching streaming responses  (#6210)

* use folder for caching

* fix importing caching

* fix clickhouse pyright

* fix linting

* fix correctly pass kwargs and args

* fix test case for embedding

* fix linting

* fix embedding caching logic

* fix refactor handle utils.py

* refactor async set stream cache

* fix linting

* bump (#6187)

* update code cov yaml

* fix config.yml

* add caching component to code cov

* fix config.yml ci/cd

* add coverage for proxy auth

* (refactor caching) use common `_retrieve_from_cache` helper  (#6212)

* use folder for caching

* fix importing caching

* fix clickhouse pyright

* fix linting

* fix correctly pass kwargs and args

* fix test case for embedding

* fix linting

* fix embedding caching logic

* fix refactor handle utils.py

* refactor async set stream cache

* fix linting

* refactor - use _retrieve_from_cache

* refactor use _convert_cached_result_to_model_response

* fix linting errors

* bump: version 1.49.2 → 1.49.3

* fix code cov components

* test(test_router_helpers.py): add router component unit tests

* test: add additional router tests

* test: add more router testing

* test: add more router testing + more mock functions

* ci(router_code_coverage.py): fix check

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: yujonglee <yujonglee.dev@gmail.com>

* bump: version 1.49.3 → 1.49.4

* (refactor) use helper function `_assemble_complete_response_from_streaming_chunks` to assemble complete responses in caching and logging callbacks (#6220)

* (refactor) use _assemble_complete_response_from_streaming_chunks

* add unit test for test_assemble_complete_response_from_streaming_chunks_1

* fix assemble complete_streaming_response

* config add logging_testing

* add logging_coverage in codecov

* test test_assemble_complete_response_from_streaming_chunks_3

* add unit tests for _assemble_complete_response_from_streaming_chunks

* fix remove unused / junk function

* add test for streaming_chunks when error assembling

* (refactor) OTEL - use safe_set_attribute for setting attributes (#6226)

* otel - use safe_set_attribute for setting attributes

* fix OTEL only use safe_set_attribute

* (fix) prompt caching cost calculation OpenAI, Azure OpenAI  (#6231)

* fix prompt caching cost calculation

* fix testing for prompt cache cost calc

* fix(allowed_model_region): allow us as allowed region (#6234)


* fix(litellm_pre_call_utils.py): support 'us' region routing + fix header forwarding to filter on `x-` headers

* docs(customer_routing.md): fix region-based routing example

* feat(azure.py): handle empty arguments function call - azure

Closes https://github.com/BerriAI/litellm/issues/6241

* feat(guardrails_ai.py): support guardrails ai integration

Adds support for on-prem guardrails via guardrails ai

* fix(proxy/utils.py): prevent sql injection attack

Fixes https://huntr.com/bounties/a4f6d357-5b44-4e00-9cac-f1cc351211d2

* fix: fix linting errors

* fix(litellm_pre_call_utils.py): don't log litellm api key in proxy server request headers

* fix(litellm_pre_call_utils.py): don't forward stainless headers

* docs(guardrails_ai.md): add guardrails ai quick start to docs

* test: handle flaky test

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: yujonglee <yujonglee.dev@gmail.com>
Co-authored-by: Marcus Elwin <marcus@elwin.com>
commit 54ebdbf7ce (parent fc5b75d171)
Krish Dholakia, 2024-10-16 07:32:06 -07:00, committed by GitHub
32 changed files with 982 additions and 314 deletions


@ -24,21 +24,25 @@ curl -X POST --location 'http://0.0.0.0:4000/end_user/new' \

 ### 2. Add eu models to model-group

-Add eu models to a model group. For azure models, litellm can automatically infer the region (no need to set it).
+Add eu models to a model group. Use the 'region_name' param to specify the region for each model.
+Supported regions are 'eu' and 'us'.

 ```yaml
 model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
-     model: azure/gpt-35-turbo-eu # 👈 EU azure model
+     model: azure/gpt-35-turbo # 👈 EU azure model
      api_base: https://my-endpoint-europe-berri-992.openai.azure.com/
      api_key: os.environ/AZURE_EUROPE_API_KEY
+     region_name: "eu"
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/chatgpt-v-2
      api_base: https://openai-gpt-4-test-v-1.openai.azure.com/
      api_version: "2023-05-15"
      api_key: os.environ/AZURE_API_KEY
+     region_name: "us"

 router_settings:
   enable_pre_call_checks: true # 👈 IMPORTANT

@ -74,6 +78,7 @@ Expected API Base in response headers

 ```
 x-litellm-api-base: "https://my-endpoint-europe-berri-992.openai.azure.com/"
+x-litellm-model-region: "eu" # 👈 CONFIRMS REGION-BASED ROUTING WORKED
 ```

 ### FAQ
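For orientation, a minimal sketch of exercising the region-based routing documented above from Python. It assumes a local proxy at `http://0.0.0.0:4000`, the `sk-1234` master key used elsewhere in these docs, that `/end_user/new` accepts a `user_id` plus the new `allowed_model_region` field, and that the end-user id is passed on requests via the standard `user` field; treat the field names as illustrative rather than authoritative.

```python
import requests

PROXY_BASE = "http://0.0.0.0:4000"              # assumed local proxy address
HEADERS = {"Authorization": "Bearer sk-1234"}   # assumed master key

# 1. Create an end user pinned to US-region deployments
resp = requests.post(
    f"{PROXY_BASE}/end_user/new",
    headers=HEADERS,
    json={"user_id": "my-us-customer-1", "allowed_model_region": "us"},
)
resp.raise_for_status()

# 2. Requests tagged with that end user should only be routed to
#    deployments whose region_name resolves to "us"
resp = requests.post(
    f"{PROXY_BASE}/chat/completions",
    headers=HEADERS,
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "hi"}],
        "user": "my-us-customer-1",  # end-user id used for the region lookup
    },
)
print(resp.headers.get("x-litellm-model-region"))  # expect "us" if routing worked
```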

View file

@ -0,0 +1,118 @@
import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Guardrails.ai
Use [Guardrails.ai](https://www.guardrailsai.com/) to add checks to LLM output.
## Pre-requisites
- Setup Guardrails AI Server. [quick start](https://www.guardrailsai.com/docs/getting_started/guardrails_server)
## Usage
1. Setup config.yaml
```yaml
model_list:
- model_name: gpt-3.5-turbo
litellm_params:
model: gpt-3.5-turbo
api_key: os.environ/OPENAI_API_KEY
guardrails:
- guardrail_name: "guardrails_ai-guard"
litellm_params:
guardrail: guardrails_ai
guard_name: "gibberish_guard" # 👈 Guardrail AI guard name
mode: "post_call"
api_base: os.environ/GUARDRAILS_AI_API_BASE # 👈 Guardrails AI API Base. Defaults to "http://0.0.0.0:8000"
```
2. Start LiteLLM Gateway
```shell
litellm --config config.yaml --detailed_debug
```
3. Test request
**[Langchain, OpenAI SDK Usage Examples](../proxy/user_keys#request-format)**
```shell
curl -i http://localhost:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-npnwjPQciVRok5yNZgKmFQ" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [
{"role": "user", "content": "hi my email is ishaan@berri.ai"}
],
"guardrails": ["guardrails_ai-guard"]
}'
```
## ✨ Control Guardrails per Project (API Key)
:::info
✨ This is an Enterprise only feature [Contact us to get a free trial](https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat)
:::
Use this to control what guardrails run per project. In this tutorial we only want the following guardrails to run for 1 project (API Key)
- `guardrails`: ["aporia-pre-guard", "aporia-post-guard"]
**Step 1** Create Key with guardrail settings
<Tabs>
<TabItem value="/key/generate" label="/key/generate">
```shell
curl -X POST 'http://0.0.0.0:4000/key/generate' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-D '{
"guardrails": ["guardrails_ai-guard"]
}
}'
```
</TabItem>
<TabItem value="/key/update" label="/key/update">
```shell
curl --location 'http://0.0.0.0:4000/key/update' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
"key": "sk-jNm1Zar7XfNdZXp49Z1kSQ",
"guardrails": ["guardrails_ai-guard"]
}
}'
```
</TabItem>
</Tabs>
**Step 2** Test it with new key
```shell
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Authorization: Bearer sk-jNm1Zar7XfNdZXp49Z1kSQ' \
--header 'Content-Type: application/json' \
--data '{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "user",
"content": "my email is ishaan@berri.ai"
}
]
}'
```
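For readers using the OpenAI Python SDK instead of curl, a sketch of the same test request. It assumes the gateway and virtual key from the steps above, and that the LiteLLM-specific `guardrails` field is passed via `extra_body`.

```python
from openai import OpenAI

# Assumes the LiteLLM gateway from step 2 is running locally and the key below is valid.
client = OpenAI(api_key="sk-npnwjPQciVRok5yNZgKmFQ", base_url="http://localhost:4000")

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "hi my email is ishaan@berri.ai"}],
    # "guardrails" is a LiteLLM-specific field, so it rides along in extra_body
    extra_body={"guardrails": ["guardrails_ai-guard"]},
)
print(response.choices[0].message.content)
```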


@ -95,6 +95,7 @@ const sidebars = {
         items: [
           "proxy/guardrails/quick_start",
           "proxy/guardrails/aporia_api",
+          "proxy/guardrails/guardrails_ai",
           "proxy/guardrails/lakera_ai",
           "proxy/guardrails/bedrock",
           "proxy/guardrails/pii_masking_v2",


@ -1005,10 +1005,11 @@ from .llms.fireworks_ai.embed.fireworks_ai_transformation import (
 from .llms.volcengine import VolcEngineConfig
 from .llms.text_completion_codestral import MistralTextCompletionConfig
 from .llms.AzureOpenAI.azure import (
-    AzureOpenAIConfig,
     AzureOpenAIError,
     AzureOpenAIAssistantsAPIConfig,
 )
+from .llms.AzureOpenAI.chat.gpt_transformation import AzureOpenAIConfig
 from .llms.hosted_vllm.chat.transformation import HostedVLLMChatConfig
 from .llms.AzureOpenAI.chat.o1_transformation import AzureOpenAIO1Config
 from .llms.watsonx import IBMWatsonXAIConfig


@ -1,4 +1,4 @@
-from typing import Literal, Optional
+from typing import List, Literal, Optional

 from litellm._logging import verbose_logger
 from litellm.integrations.custom_logger import CustomLogger

@ -10,11 +10,20 @@ class CustomGuardrail(CustomLogger):
     def __init__(
         self,
         guardrail_name: Optional[str] = None,
+        supported_event_hooks: Optional[List[GuardrailEventHooks]] = None,
         event_hook: Optional[GuardrailEventHooks] = None,
         **kwargs,
     ):
         self.guardrail_name = guardrail_name
+        self.supported_event_hooks = supported_event_hooks
         self.event_hook: Optional[GuardrailEventHooks] = event_hook
+
+        if supported_event_hooks:
+            ## validate event_hook is in supported_event_hooks
+            if event_hook and event_hook not in supported_event_hooks:
+                raise ValueError(
+                    f"Event hook {event_hook} is not in the supported event hooks {supported_event_hooks}"
+                )
         super().__init__(**kwargs)

     def should_run_guardrail(self, data, event_type: GuardrailEventHooks) -> bool:
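A self-contained sketch of the constructor validation added in this hunk. The enum and class below are local stand-ins, not the real `litellm.types.guardrails.GuardrailEventHooks` or `CustomGuardrail`; they only mirror the new `supported_event_hooks` check.

```python
from enum import Enum
from typing import List, Optional


class GuardrailEventHooks(str, Enum):
    # stand-in for litellm.types.guardrails.GuardrailEventHooks
    pre_call = "pre_call"
    post_call = "post_call"
    during_call = "during_call"


class CustomGuardrailSketch:
    """Stand-in mirroring CustomGuardrail.__init__ from the diff above."""

    def __init__(
        self,
        guardrail_name: Optional[str] = None,
        supported_event_hooks: Optional[List[GuardrailEventHooks]] = None,
        event_hook: Optional[GuardrailEventHooks] = None,
    ):
        self.guardrail_name = guardrail_name
        self.supported_event_hooks = supported_event_hooks
        self.event_hook = event_hook
        if supported_event_hooks:
            # new in this PR: reject hooks the integration cannot handle
            if event_hook and event_hook not in supported_event_hooks:
                raise ValueError(
                    f"Event hook {event_hook} is not in the supported event hooks {supported_event_hooks}"
                )


# A post_call-only guardrail accepts post_call ...
CustomGuardrailSketch(
    guardrail_name="demo",
    supported_event_hooks=[GuardrailEventHooks.post_call],
    event_hook=GuardrailEventHooks.post_call,
)

# ... and raises ValueError if configured with pre_call
try:
    CustomGuardrailSketch(
        guardrail_name="demo",
        supported_event_hooks=[GuardrailEventHooks.post_call],
        event_hook=GuardrailEventHooks.pre_call,
    )
except ValueError as e:
    print(e)
```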


@ -26,10 +26,6 @@ from litellm.utils import (
 from ...types.llms.openai import (
     Batch,
     CancelBatchRequest,
-    ChatCompletionToolChoiceFunctionParam,
-    ChatCompletionToolChoiceObjectParam,
-    ChatCompletionToolParam,
-    ChatCompletionToolParamFunctionChunk,
     CreateBatchRequest,
     HttpxBinaryResponseContent,
     RetrieveBatchRequest,
@ -67,214 +63,6 @@ class AzureOpenAIError(Exception):
         )  # Call the base class constructor with the parameters it needs
class AzureOpenAIConfig:
"""
Reference: https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#chat-completions
The class `AzureOpenAIConfig` provides configuration for the OpenAI's Chat API interface, for use with Azure. It inherits from `OpenAIConfig`. Below are the parameters::
- `frequency_penalty` (number or null): Defaults to 0. Allows a value between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, thereby minimizing repetition.
- `function_call` (string or object): This optional parameter controls how the model calls functions.
- `functions` (array): An optional parameter. It is a list of functions for which the model may generate JSON inputs.
- `logit_bias` (map): This optional parameter modifies the likelihood of specified tokens appearing in the completion.
- `max_tokens` (integer or null): This optional parameter helps to set the maximum number of tokens to generate in the chat completion.
- `n` (integer or null): This optional parameter helps to set how many chat completion choices to generate for each input message.
- `presence_penalty` (number or null): Defaults to 0. It penalizes new tokens based on if they appear in the text so far, hence increasing the model's likelihood to talk about new topics.
- `stop` (string / array / null): Specifies up to 4 sequences where the API will stop generating further tokens.
- `temperature` (number or null): Defines the sampling temperature to use, varying between 0 and 2.
- `top_p` (number or null): An alternative to sampling with temperature, used for nucleus sampling.
"""
def __init__(
self,
frequency_penalty: Optional[int] = None,
function_call: Optional[Union[str, dict]] = None,
functions: Optional[list] = None,
logit_bias: Optional[dict] = None,
max_tokens: Optional[int] = None,
n: Optional[int] = None,
presence_penalty: Optional[int] = None,
stop: Optional[Union[str, list]] = None,
temperature: Optional[int] = None,
top_p: Optional[int] = None,
) -> None:
locals_ = locals().copy()
for key, value in locals_.items():
if key != "self" and value is not None:
setattr(self.__class__, key, value)
@classmethod
def get_config(cls):
return {
k: v
for k, v in cls.__dict__.items()
if not k.startswith("__")
and not isinstance(
v,
(
types.FunctionType,
types.BuiltinFunctionType,
classmethod,
staticmethod,
),
)
and v is not None
}
def get_supported_openai_params(self):
return [
"temperature",
"n",
"stream",
"stream_options",
"stop",
"max_tokens",
"max_completion_tokens",
"tools",
"tool_choice",
"presence_penalty",
"frequency_penalty",
"logit_bias",
"user",
"function_call",
"functions",
"tools",
"tool_choice",
"top_p",
"logprobs",
"top_logprobs",
"response_format",
"seed",
"extra_headers",
]
def map_openai_params(
self,
non_default_params: dict,
optional_params: dict,
model: str,
api_version: str, # Y-M-D-{optional}
drop_params,
) -> dict:
supported_openai_params = self.get_supported_openai_params()
api_version_times = api_version.split("-")
api_version_year = api_version_times[0]
api_version_month = api_version_times[1]
api_version_day = api_version_times[2]
for param, value in non_default_params.items():
if param == "tool_choice":
"""
This parameter requires API version 2023-12-01-preview or later
tool_choice='required' is not supported as of 2024-05-01-preview
"""
## check if api version supports this param ##
if (
api_version_year < "2023"
or (api_version_year == "2023" and api_version_month < "12")
or (
api_version_year == "2023"
and api_version_month == "12"
and api_version_day < "01"
)
):
if litellm.drop_params is True or (
drop_params is not None and drop_params is True
):
pass
else:
raise UnsupportedParamsError(
status_code=400,
message=f"""Azure does not support 'tool_choice', for api_version={api_version}. Bump your API version to '2023-12-01-preview' or later. This parameter requires 'api_version="2023-12-01-preview"' or later. Azure API Reference: https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#chat-completions""",
)
elif value == "required" and (
api_version_year == "2024" and api_version_month <= "05"
): ## check if tool_choice value is supported ##
if litellm.drop_params is True or (
drop_params is not None and drop_params is True
):
pass
else:
raise UnsupportedParamsError(
status_code=400,
message=f"Azure does not support '{value}' as a {param} param, for api_version={api_version}. To drop 'tool_choice=required' for calls with this Azure API version, set `litellm.drop_params=True` or for proxy:\n\n`litellm_settings:\n drop_params: true`\nAzure API Reference: https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#chat-completions",
)
else:
optional_params["tool_choice"] = value
elif param == "response_format" and isinstance(value, dict):
json_schema: Optional[dict] = None
schema_name: str = ""
if "response_schema" in value:
json_schema = value["response_schema"]
schema_name = "json_tool_call"
elif "json_schema" in value:
json_schema = value["json_schema"]["schema"]
schema_name = value["json_schema"]["name"]
"""
Follow similar approach to anthropic - translate to a single tool call.
When using tools in this way: - https://docs.anthropic.com/en/docs/build-with-claude/tool-use#json-mode
- You usually want to provide a single tool
- You should set tool_choice (see Forcing tool use) to instruct the model to explicitly use that tool
- Remember that the model will pass the input to the tool, so the name of the tool and description should be from the models perspective.
"""
if json_schema is not None and (
(api_version_year <= "2024" and api_version_month < "08")
or "gpt-4o" not in model
): # azure api version "2024-08-01-preview" onwards supports 'json_schema' only for gpt-4o
_tool_choice = ChatCompletionToolChoiceObjectParam(
type="function",
function=ChatCompletionToolChoiceFunctionParam(
name=schema_name
),
)
_tool = ChatCompletionToolParam(
type="function",
function=ChatCompletionToolParamFunctionChunk(
name=schema_name, parameters=json_schema
),
)
optional_params["tools"] = [_tool]
optional_params["tool_choice"] = _tool_choice
optional_params["json_mode"] = True
else:
optional_params["response_format"] = value
elif param == "max_completion_tokens":
# TODO - Azure OpenAI will probably add support for this, we should pass it through when Azure adds support
optional_params["max_tokens"] = value
elif param in supported_openai_params:
optional_params[param] = value
return optional_params
def get_mapped_special_auth_params(self) -> dict:
return {"token": "azure_ad_token"}
def map_special_auth_params(self, non_default_params: dict, optional_params: dict):
for param, value in non_default_params.items():
if param == "token":
optional_params["azure_ad_token"] = value
return optional_params
def get_eu_regions(self) -> List[str]:
"""
Source: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#gpt-4-and-gpt-4-turbo-model-availability
"""
return ["europe", "sweden", "switzerland", "france", "uk"]
class AzureOpenAIAssistantsAPIConfig:
    """
    Reference: https://learn.microsoft.com/en-us/azure/ai-services/openai/assistants-reference-messages?tabs=python#create-message
@ -620,11 +408,9 @@ class AzureChatCompletion(BaseLLM):
                 data = {"model": None, "messages": messages, **optional_params}
             else:
-                data = {
-                    "model": model,  # type: ignore
-                    "messages": messages,
-                    **optional_params,
-                }
+                data = litellm.AzureOpenAIConfig.transform_request(
+                    model=model, messages=messages, optional_params=optional_params
+                )

         if acompletion is True:
             if optional_params.get("stream", False):


@ -0,0 +1,250 @@
import types
from typing import List, Optional, Type, Union
import litellm
from ....exceptions import UnsupportedParamsError
from ....types.llms.openai import (
AllMessageValues,
ChatCompletionToolChoiceFunctionParam,
ChatCompletionToolChoiceObjectParam,
ChatCompletionToolParam,
ChatCompletionToolParamFunctionChunk,
)
from ...prompt_templates.factory import convert_to_azure_openai_messages
class AzureOpenAIConfig:
"""
Reference: https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#chat-completions
The class `AzureOpenAIConfig` provides configuration for the OpenAI's Chat API interface, for use with Azure. It inherits from `OpenAIConfig`. Below are the parameters::
- `frequency_penalty` (number or null): Defaults to 0. Allows a value between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, thereby minimizing repetition.
- `function_call` (string or object): This optional parameter controls how the model calls functions.
- `functions` (array): An optional parameter. It is a list of functions for which the model may generate JSON inputs.
- `logit_bias` (map): This optional parameter modifies the likelihood of specified tokens appearing in the completion.
- `max_tokens` (integer or null): This optional parameter helps to set the maximum number of tokens to generate in the chat completion.
- `n` (integer or null): This optional parameter helps to set how many chat completion choices to generate for each input message.
- `presence_penalty` (number or null): Defaults to 0. It penalizes new tokens based on if they appear in the text so far, hence increasing the model's likelihood to talk about new topics.
- `stop` (string / array / null): Specifies up to 4 sequences where the API will stop generating further tokens.
- `temperature` (number or null): Defines the sampling temperature to use, varying between 0 and 2.
- `top_p` (number or null): An alternative to sampling with temperature, used for nucleus sampling.
"""
def __init__(
self,
frequency_penalty: Optional[int] = None,
function_call: Optional[Union[str, dict]] = None,
functions: Optional[list] = None,
logit_bias: Optional[dict] = None,
max_tokens: Optional[int] = None,
n: Optional[int] = None,
presence_penalty: Optional[int] = None,
stop: Optional[Union[str, list]] = None,
temperature: Optional[int] = None,
top_p: Optional[int] = None,
) -> None:
locals_ = locals().copy()
for key, value in locals_.items():
if key != "self" and value is not None:
setattr(self.__class__, key, value)
@classmethod
def get_config(cls):
return {
k: v
for k, v in cls.__dict__.items()
if not k.startswith("__")
and not isinstance(
v,
(
types.FunctionType,
types.BuiltinFunctionType,
classmethod,
staticmethod,
),
)
and v is not None
}
def get_supported_openai_params(self):
return [
"temperature",
"n",
"stream",
"stream_options",
"stop",
"max_tokens",
"max_completion_tokens",
"tools",
"tool_choice",
"presence_penalty",
"frequency_penalty",
"logit_bias",
"user",
"function_call",
"functions",
"tools",
"tool_choice",
"top_p",
"logprobs",
"top_logprobs",
"response_format",
"seed",
"extra_headers",
]
def map_openai_params(
self,
non_default_params: dict,
optional_params: dict,
model: str,
api_version: str, # Y-M-D-{optional}
drop_params,
) -> dict:
supported_openai_params = self.get_supported_openai_params()
api_version_times = api_version.split("-")
api_version_year = api_version_times[0]
api_version_month = api_version_times[1]
api_version_day = api_version_times[2]
for param, value in non_default_params.items():
if param == "tool_choice":
"""
This parameter requires API version 2023-12-01-preview or later
tool_choice='required' is not supported as of 2024-05-01-preview
"""
## check if api version supports this param ##
if (
api_version_year < "2023"
or (api_version_year == "2023" and api_version_month < "12")
or (
api_version_year == "2023"
and api_version_month == "12"
and api_version_day < "01"
)
):
if litellm.drop_params is True or (
drop_params is not None and drop_params is True
):
pass
else:
raise UnsupportedParamsError(
status_code=400,
message=f"""Azure does not support 'tool_choice', for api_version={api_version}. Bump your API version to '2023-12-01-preview' or later. This parameter requires 'api_version="2023-12-01-preview"' or later. Azure API Reference: https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#chat-completions""",
)
elif value == "required" and (
api_version_year == "2024" and api_version_month <= "05"
): ## check if tool_choice value is supported ##
if litellm.drop_params is True or (
drop_params is not None and drop_params is True
):
pass
else:
raise UnsupportedParamsError(
status_code=400,
message=f"Azure does not support '{value}' as a {param} param, for api_version={api_version}. To drop 'tool_choice=required' for calls with this Azure API version, set `litellm.drop_params=True` or for proxy:\n\n`litellm_settings:\n drop_params: true`\nAzure API Reference: https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#chat-completions",
)
else:
optional_params["tool_choice"] = value
elif param == "response_format" and isinstance(value, dict):
json_schema: Optional[dict] = None
schema_name: str = ""
if "response_schema" in value:
json_schema = value["response_schema"]
schema_name = "json_tool_call"
elif "json_schema" in value:
json_schema = value["json_schema"]["schema"]
schema_name = value["json_schema"]["name"]
"""
Follow similar approach to anthropic - translate to a single tool call.
When using tools in this way: - https://docs.anthropic.com/en/docs/build-with-claude/tool-use#json-mode
- You usually want to provide a single tool
- You should set tool_choice (see Forcing tool use) to instruct the model to explicitly use that tool
- Remember that the model will pass the input to the tool, so the name of the tool and description should be from the models perspective.
"""
if json_schema is not None and (
(api_version_year <= "2024" and api_version_month < "08")
or "gpt-4o" not in model
): # azure api version "2024-08-01-preview" onwards supports 'json_schema' only for gpt-4o
_tool_choice = ChatCompletionToolChoiceObjectParam(
type="function",
function=ChatCompletionToolChoiceFunctionParam(
name=schema_name
),
)
_tool = ChatCompletionToolParam(
type="function",
function=ChatCompletionToolParamFunctionChunk(
name=schema_name, parameters=json_schema
),
)
optional_params["tools"] = [_tool]
optional_params["tool_choice"] = _tool_choice
optional_params["json_mode"] = True
else:
optional_params["response_format"] = value
elif param == "max_completion_tokens":
# TODO - Azure OpenAI will probably add support for this, we should pass it through when Azure adds support
optional_params["max_tokens"] = value
elif param in supported_openai_params:
optional_params[param] = value
return optional_params
@classmethod
def transform_request(
cls, model: str, messages: List[AllMessageValues], optional_params: dict
) -> dict:
messages = convert_to_azure_openai_messages(messages)
return {
"model": model,
"messages": messages,
**optional_params,
}
def get_mapped_special_auth_params(self) -> dict:
return {"token": "azure_ad_token"}
def map_special_auth_params(self, non_default_params: dict, optional_params: dict):
for param, value in non_default_params.items():
if param == "token":
optional_params["azure_ad_token"] = value
return optional_params
def get_eu_regions(self) -> List[str]:
"""
Source: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#gpt-4-and-gpt-4-turbo-model-availability
"""
return ["europe", "sweden", "switzerland", "france", "uk"]
def get_us_regions(self) -> List[str]:
"""
Source: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#gpt-4-and-gpt-4-turbo-model-availability
"""
return [
"us",
"eastus",
"eastus2",
"eastus2euap",
"eastus3",
"southcentralus",
"westus",
"westus2",
"westus3",
"westus4",
]


@ -53,6 +53,17 @@ class AmazonBedrockGlobalConfig:
             "eu-central-1",
         ]

+    def get_us_regions(self) -> List[str]:
+        """
+        Source: https://www.aws-services.info/bedrock.html
+        """
+        return [
+            "us-east-2",
+            "us-east-1",
+            "us-west-2",
+            "us-gov-west-1",
+        ]
+

 class AmazonTitanConfig:
     """


@ -2,9 +2,11 @@
 Common utility functions used for translating messages across providers
 """

-from typing import List
+import json
+from typing import Dict, List

 from litellm.types.llms.openai import AllMessageValues
+from litellm.types.utils import Choices, ModelResponse, StreamingChoices


 def convert_content_list_to_str(message: AllMessageValues) -> str:

@ -27,3 +29,43 @@ def convert_content_list_to_str(message: AllMessageValues) -> str:
             texts = message_content

     return texts
def convert_openai_message_to_only_content_messages(
messages: List[AllMessageValues],
) -> List[Dict[str, str]]:
"""
Converts OpenAI messages to only content messages
Used for calling guardrails integrations which expect string content
"""
converted_messages = []
user_roles = ["user", "tool", "function"]
for message in messages:
if message.get("role") in user_roles:
converted_messages.append(
{"role": "user", "content": convert_content_list_to_str(message)}
)
elif message.get("role") == "assistant":
converted_messages.append(
{"role": "assistant", "content": convert_content_list_to_str(message)}
)
return converted_messages
def get_content_from_model_response(response: ModelResponse) -> str:
"""
Gets content from model response
"""
content = ""
for choice in response.choices:
if isinstance(choice, Choices):
content += choice.message.content if choice.message.content else ""
if choice.message.function_call:
content += choice.message.function_call.model_dump_json()
if choice.message.tool_calls:
for tc in choice.message.tool_calls:
content += tc.model_dump_json()
elif isinstance(choice, StreamingChoices):
content += getattr(choice, "delta", {}).get("content", "") or ""
return content
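A standalone sketch of how the new helpers flatten OpenAI-style messages down to plain string content for guardrail calls. The `_content_to_str` helper here is a simplified stand-in for `convert_content_list_to_str`, and the dict-based messages are illustrative only.

```python
from typing import Dict, List


def _content_to_str(message: Dict) -> str:
    """Stand-in for convert_content_list_to_str: flattens list-of-parts content."""
    content = message.get("content") or ""
    if isinstance(content, list):
        return "".join(p.get("text", "") for p in content if isinstance(p, dict))
    return content


def to_only_content_messages(messages: List[Dict]) -> List[Dict[str, str]]:
    """Mirrors convert_openai_message_to_only_content_messages from the hunk above."""
    converted = []
    user_roles = ["user", "tool", "function"]
    for m in messages:
        if m.get("role") in user_roles:
            converted.append({"role": "user", "content": _content_to_str(m)})
        elif m.get("role") == "assistant":
            converted.append({"role": "assistant", "content": _content_to_str(m)})
    return converted


messages = [
    {"role": "system", "content": "be terse"},                      # dropped
    {"role": "user", "content": [{"type": "text", "text": "hi"}]},  # flattened
    {"role": "assistant", "content": "hello"},
    {"role": "tool", "content": "42"},                              # mapped to user
]
print(to_only_content_messages(messages))
# [{'role': 'user', 'content': 'hi'}, {'role': 'assistant', 'content': 'hello'}, {'role': 'user', 'content': '42'}]
```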


@ -920,6 +920,31 @@ def anthropic_messages_pt_xml(messages: list):
 # ------------------------------------------------------------------------------
def _azure_tool_call_invoke_helper(
function_call_params: ChatCompletionToolCallFunctionChunk,
) -> Optional[ChatCompletionToolCallFunctionChunk]:
"""
Azure requires 'arguments' to be a string.
"""
if function_call_params.get("arguments") is None:
function_call_params["arguments"] = ""
return function_call_params
def convert_to_azure_openai_messages(
messages: List[AllMessageValues],
) -> List[AllMessageValues]:
for m in messages:
if m["role"] == "assistant":
function_call = m.get("function_call", None)
if function_call is not None:
m["function_call"] = _azure_tool_call_invoke_helper(function_call)
return messages
# ------------------------------------------------------------------------------
 def infer_protocol_value(
     value: Any,
 ) -> Literal[
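A plain-dict sketch of the empty-arguments handling these Azure helpers add (the case behind issue #6241 in the commit message). The function names below are local stand-ins for `_azure_tool_call_invoke_helper` and `convert_to_azure_openai_messages`, kept independent of litellm's typed message classes.

```python
from typing import List, Optional


def _coerce_function_call_arguments(function_call: Optional[dict]) -> Optional[dict]:
    """Azure requires 'arguments' to be a string, so None becomes ''. """
    if function_call is None:
        return None
    if function_call.get("arguments") is None:
        function_call["arguments"] = ""
    return function_call


def to_azure_openai_messages(messages: List[dict]) -> List[dict]:
    """Stand-in for convert_to_azure_openai_messages, for plain-dict messages."""
    for m in messages:
        if m.get("role") == "assistant" and m.get("function_call") is not None:
            m["function_call"] = _coerce_function_call_arguments(m["function_call"])
    return messages


# An assistant message produced with no function arguments
messages = [
    {"role": "assistant", "function_call": {"name": "get_time", "arguments": None}},
]
print(to_azure_openai_messages(messages))
# [{'role': 'assistant', 'function_call': {'name': 'get_time', 'arguments': ''}}]
```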


@ -254,6 +254,21 @@ class VertexAIConfig:
"europe-west9", "europe-west9",
] ]
def get_us_regions(self) -> List[str]:
"""
Source: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/locations#available-regions
"""
return [
"us-central1",
"us-east1",
"us-east4",
"us-east5",
"us-south1",
"us-west1",
"us-west4",
"us-west5",
]
 class VertexGeminiConfig:
     """


@ -178,6 +178,14 @@ class IBMWatsonXAIConfig:
"eu-gb", "eu-gb",
] ]
def get_us_regions(self) -> List[str]:
"""
Source: https://www.ibm.com/docs/en/watsonx/saas?topic=integrations-regional-availability
"""
return [
"us-south",
]
 def convert_messages_to_prompt(model, messages, provider, custom_prompt_dict) -> str:
     # handle anthropic prompts and amazon titan prompts


@ -4059,12 +4059,12 @@ def text_completion(
                 return response["choices"][0]

         with concurrent.futures.ThreadPoolExecutor() as executor:
-            futures = [
+            completed_futures = [
                 executor.submit(process_prompt, i, individual_prompt)
                 for i, individual_prompt in enumerate(prompt)
             ]
             for i, future in enumerate(
-                concurrent.futures.as_completed(futures)
+                concurrent.futures.as_completed(completed_futures)
             ):
                 responses[i] = future.result()

         text_completion_response.choices = responses  # type: ignore


@ -1,15 +1,27 @@
 model_list:
-  # - model_name: openai-gpt-4o-realtime-audio
-  #   litellm_params:
-  #     model: azure/gpt-4o-realtime-preview
-  #     api_key: os.environ/AZURE_SWEDEN_API_KEY
-  #     api_base: os.environ/AZURE_SWEDEN_API_BASE
-  - model_name: openai-gpt-4o-realtime-audio
+  - model_name: gpt-3.5-turbo
     litellm_params:
-      model: openai/gpt-4o-realtime-preview-2024-10-01
-      api_key: os.environ/OPENAI_API_KEY
-      api_base: http://localhost:8080
+      model: azure/gpt-35-turbo # 👈 EU azure model
+      api_base: https://my-endpoint-europe-berri-992.openai.azure.com/
+      api_key: os.environ/AZURE_EUROPE_API_KEY
+      region_name: "eu"
+  - model_name: gpt-4o
+    litellm_params:
+      model: azure/gpt-4o
+      api_base: https://openai-gpt-4-test-v-1.openai.azure.com/
+      api_key: os.environ/AZURE_API_KEY
+      region_name: "us"
+  - model_name: gpt-3.5-turbo-end-user-test
+    litellm_params:
+      model: gpt-3.5-turbo
+      region_name: "eu"
+    model_info:
+      id: "1"

-litellm_settings:
-  success_callback: ["langfuse"]
+# guardrails:
+#   - guardrail_name: "gibberish-guard"
+#     litellm_params:
+#       guardrail: guardrails_ai
+#       guard_name: "gibberish_guard"
+#       mode: "post_call"
+#       api_base: os.environ/GUARDRAILS_AI_API_BASE


@ -767,6 +767,9 @@ class DeleteUserRequest(LiteLLMBase):
     user_ids: List[str]  # required


+AllowedModelRegion = Literal["eu", "us"]
+
+
 class NewCustomerRequest(LiteLLMBase):
     """
     Create a new customer, allocate a budget to them

@ -777,7 +780,7 @@ class NewCustomerRequest(LiteLLMBase):
     blocked: bool = False  # allow/disallow requests for this end-user
     max_budget: Optional[float] = None
     budget_id: Optional[str] = None  # give either a budget_id or max_budget
-    allowed_model_region: Optional[Literal["eu"]] = (
+    allowed_model_region: Optional[AllowedModelRegion] = (
         None  # require all user requests to use models in this specific region
     )
     default_model: Optional[str] = (

@ -804,7 +807,7 @@ class UpdateCustomerRequest(LiteLLMBase):
     blocked: bool = False  # allow/disallow requests for this end-user
     max_budget: Optional[float] = None
     budget_id: Optional[str] = None  # give either a budget_id or max_budget
-    allowed_model_region: Optional[Literal["eu"]] = (
+    allowed_model_region: Optional[AllowedModelRegion] = (
         None  # require all user requests to use models in this specific region
     )
     default_model: Optional[str] = (

@ -1384,7 +1387,7 @@ class UserAPIKeyAuth(
     api_key: Optional[str] = None
     user_role: Optional[LitellmUserRoles] = None
-    allowed_model_region: Optional[Literal["eu"]] = None
+    allowed_model_region: Optional[AllowedModelRegion] = None
     parent_otel_span: Optional[Span] = None
     rpm_limit_per_model: Optional[Dict[str, int]] = None
     tpm_limit_per_model: Optional[Dict[str, int]] = None

@ -1466,7 +1469,7 @@ class LiteLLM_EndUserTable(LiteLLMBase):
     blocked: bool
     alias: Optional[str] = None
     spend: float = 0.0
-    allowed_model_region: Optional[Literal["eu"]] = None
+    allowed_model_region: Optional[AllowedModelRegion] = None
     default_model: Optional[str] = None
     litellm_budget_table: Optional[LiteLLM_BudgetTable] = None

@ -2019,3 +2022,11 @@ class LoggingCallbackStatus(TypedDict, total=False):
 class KeyHealthResponse(TypedDict, total=False):
     key: Literal["healthy", "unhealthy"]
     logging_callbacks: Optional[LoggingCallbackStatus]
+
+
+class SpecialHeaders(enum.Enum):
+    """Used by user_api_key_auth.py to get litellm key"""
+
+    openai_authorization = "Authorization"
+    azure_authorization = "API-Key"
+    anthropic_authorization = "x-api-key"


@ -75,15 +75,17 @@ from litellm.proxy.common_utils.http_parsing_utils import _read_request_body
 from litellm.proxy.utils import _to_ns

 api_key_header = APIKeyHeader(
-    name="Authorization", auto_error=False, description="Bearer token"
+    name=SpecialHeaders.openai_authorization.value,
+    auto_error=False,
+    description="Bearer token",
 )
 azure_api_key_header = APIKeyHeader(
-    name="API-Key",
+    name=SpecialHeaders.azure_authorization.value,
     auto_error=False,
     description="Some older versions of the openai Python package will send an API-Key header with just the API key ",
 )
 anthropic_api_key_header = APIKeyHeader(
-    name="x-api-key",
+    name=SpecialHeaders.anthropic_authorization.value,
     auto_error=False,
     description="If anthropic client used.",
 )


@ -0,0 +1,109 @@
# +-------------------------------------------------------------+
#
# Use GuardrailsAI for your LLM calls
#
# +-------------------------------------------------------------+
# Thank you for using Litellm! - Krrish & Ishaan
import json
from typing import Any, Dict, List, Literal, Optional, TypedDict, Union
from fastapi import HTTPException
import litellm
from litellm._logging import verbose_proxy_logger
from litellm.caching.caching import DualCache
from litellm.integrations.custom_guardrail import CustomGuardrail
from litellm.llms.prompt_templates.common_utils import (
convert_openai_message_to_only_content_messages,
get_content_from_model_response,
)
from litellm.proxy._types import UserAPIKeyAuth
from litellm.proxy.common_utils.callback_utils import (
add_guardrail_to_applied_guardrails_header,
)
from litellm.proxy.guardrails.guardrail_helpers import should_proceed_based_on_metadata
from litellm.types.guardrails import GuardrailEventHooks
from litellm.types.llms.openai import AllMessageValues
class GuardrailsAIResponse(TypedDict):
callId: str
rawLlmOutput: str
validatedOutput: str
validationPassed: bool
class GuardrailsAI(CustomGuardrail):
def __init__(
self,
guard_name: str,
api_base: Optional[str] = None,
**kwargs,
):
if guard_name is None:
raise Exception(
"GuardrailsAIException - Please pass the Guardrails AI guard name via 'litellm_params::guard_name'"
)
# store kwargs as optional_params
self.guardrails_ai_api_base = api_base or "http://0.0.0.0:8000"
self.guardrails_ai_guard_name = guard_name
self.optional_params = kwargs
supported_event_hooks = [GuardrailEventHooks.post_call]
super().__init__(supported_event_hooks=supported_event_hooks, **kwargs)
async def make_guardrails_ai_api_request(self, llm_output: str):
from httpx import URL
data = {"llmOutput": llm_output}
_json_data = json.dumps(data)
response = await litellm.module_level_aclient.post(
url=str(
URL(self.guardrails_ai_api_base).join(
f"guards/{self.guardrails_ai_guard_name}/validate"
)
),
data=_json_data,
headers={
"Content-Type": "application/json",
},
)
verbose_proxy_logger.debug("guardrails_ai response: %s", response)
_json_response = GuardrailsAIResponse(**response.json()) # type: ignore
if _json_response.get("validationPassed") is False:
raise HTTPException(
status_code=400,
detail={
"error": "Violated guardrail policy",
"guardrails_ai_response": _json_response,
},
)
return _json_response
async def async_post_call_success_hook(
self,
data: dict,
user_api_key_dict: UserAPIKeyAuth,
response,
):
"""
Runs on response from LLM API call
It can be used to reject a response
"""
event_type: GuardrailEventHooks = GuardrailEventHooks.post_call
if self.should_run_guardrail(data=data, event_type=event_type) is not True:
return
if not isinstance(response, litellm.ModelResponse):
return
response_str: str = get_content_from_model_response(response)
if response_str is not None and len(response_str) > 0:
await self.make_guardrails_ai_api_request(llm_output=response_str)
add_guardrail_to_applied_guardrails_header(
request_data=data, guardrail_name=self.guardrail_name
)
return
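A sketch of the request this hook sends to a Guardrails AI server, written as a standalone synchronous call with httpx (the hook itself uses litellm's module-level async client). The endpoint shape and `llmOutput` payload come from the code above; the guard name and base URL are assumptions for illustration.

```python
import httpx

GUARDRAILS_API_BASE = "http://0.0.0.0:8000"  # default used by the hook above
GUARD_NAME = "gibberish_guard"               # assumed guard configured on the server


def validate_output(llm_output: str) -> dict:
    """Call POST /guards/{guard}/validate and fail if the guard rejects the output."""
    resp = httpx.post(
        f"{GUARDRAILS_API_BASE}/guards/{GUARD_NAME}/validate",
        json={"llmOutput": llm_output},
        headers={"Content-Type": "application/json"},
    )
    result = resp.json()
    if result.get("validationPassed") is False:
        raise ValueError(
            {"error": "Violated guardrail policy", "guardrails_ai_response": result}
        )
    return result


# Requires a running Guardrails AI server:
# validate_output("The weather today is sunny with a high of 72F.")
```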


@ -17,6 +17,7 @@ from litellm.types.guardrails import (
     GuardrailItemSpec,
     LakeraCategoryThresholds,
     LitellmParams,
+    SupportedGuardrailIntegrations,
 )

 all_guardrails: List[GuardrailItem] = []

@ -86,8 +87,8 @@ Map guardrail_name: <pre_call>, <post_call>, during_call


 def init_guardrails_v2(
-    all_guardrails: dict,
-    config_file_path: str,
+    all_guardrails: List[Dict],
+    config_file_path: Optional[str] = None,
 ):
     # Convert the loaded data to the TypedDict structure
     guardrail_list = []

@ -124,7 +125,7 @@ def init_guardrails_v2(
             litellm_params["api_base"] = str(get_secret(litellm_params["api_base"]))  # type: ignore

         # Init guardrail CustomLoggerClass
-        if litellm_params["guardrail"] == "aporia":
+        if litellm_params["guardrail"] == SupportedGuardrailIntegrations.APORIA.value:
             from litellm.proxy.guardrails.guardrail_hooks.aporia_ai import (
                 AporiaGuardrail,
             )

@ -136,7 +137,9 @@ def init_guardrails_v2(
                 event_hook=litellm_params["mode"],
             )
             litellm.callbacks.append(_aporia_callback)  # type: ignore
-        if litellm_params["guardrail"] == "bedrock":
+        elif (
+            litellm_params["guardrail"] == SupportedGuardrailIntegrations.BEDROCK.value
+        ):
             from litellm.proxy.guardrails.guardrail_hooks.bedrock_guardrails import (
                 BedrockGuardrail,
             )

@ -148,7 +151,7 @@ def init_guardrails_v2(
                 guardrailVersion=litellm_params["guardrailVersion"],
             )
             litellm.callbacks.append(_bedrock_callback)  # type: ignore
-        elif litellm_params["guardrail"] == "lakera":
+        elif litellm_params["guardrail"] == SupportedGuardrailIntegrations.LAKERA.value:
             from litellm.proxy.guardrails.guardrail_hooks.lakera_ai import (
                 lakeraAI_Moderation,
             )

@ -161,7 +164,9 @@ def init_guardrails_v2(
                 category_thresholds=litellm_params.get("category_thresholds"),
             )
             litellm.callbacks.append(_lakera_callback)  # type: ignore
-        elif litellm_params["guardrail"] == "presidio":
+        elif (
+            litellm_params["guardrail"] == SupportedGuardrailIntegrations.PRESIDIO.value
+        ):
             from litellm.proxy.guardrails.guardrail_hooks.presidio import (
                 _OPTIONAL_PresidioPIIMasking,
             )

@ -189,7 +194,10 @@ def init_guardrails_v2(
             litellm.callbacks.append(_success_callback)  # type: ignore
             litellm.callbacks.append(_presidio_callback)  # type: ignore
-        elif litellm_params["guardrail"] == "hide-secrets":
+        elif (
+            litellm_params["guardrail"]
+            == SupportedGuardrailIntegrations.HIDE_SECRETS.value
+        ):
             from enterprise.enterprise_hooks.secret_detection import (
                 _ENTERPRISE_SecretDetection,
             )

@ -201,10 +209,34 @@ def init_guardrails_v2(
             )
             litellm.callbacks.append(_secret_detection_object)  # type: ignore
+        elif (
+            litellm_params["guardrail"]
+            == SupportedGuardrailIntegrations.GURDRAILS_AI.value
+        ):
+            from litellm.proxy.guardrails.guardrail_hooks.guardrails_ai import (
+                GuardrailsAI,
+            )
+
+            _guard_name = litellm_params.get("guard_name")
+            if _guard_name is None:
+                raise Exception(
+                    "GuardrailsAIException - Please pass the Guardrails AI guard name via 'litellm_params::guard_name'"
+                )
+            _guardrails_ai_callback = GuardrailsAI(
+                api_base=litellm_params.get("api_base"),
+                guard_name=_guard_name,
+                guardrail_name=SupportedGuardrailIntegrations.GURDRAILS_AI.value,
+            )
+            litellm.callbacks.append(_guardrails_ai_callback)  # type: ignore
         elif (
             isinstance(litellm_params["guardrail"], str)
             and "." in litellm_params["guardrail"]
         ):
+            if config_file_path is None:
+                raise Exception(
+                    "GuardrailsAIException - Please pass the config_file_path to initialize_guardrails_v2"
+                )
             import os

             from litellm.proxy.utils import get_instance_fn

@ -238,6 +270,8 @@ def init_guardrails_v2(
                 event_hook=litellm_params["mode"],
             )
             litellm.callbacks.append(_guardrail_callback)  # type: ignore
+        else:
+            raise ValueError(f"Unsupported guardrail: {litellm_params['guardrail']}")

         parsed_guardrail = Guardrail(
             guardrail_name=guardrail["guardrail_name"],


@ -1,16 +1,20 @@
 import copy
-from typing import TYPE_CHECKING, Any, Dict, Optional
+from typing import TYPE_CHECKING, Any, Dict, Optional, Union

 from fastapi import Request
+from starlette.datastructures import Headers

 import litellm
 from litellm._logging import verbose_logger, verbose_proxy_logger
 from litellm.proxy._types import (
     AddTeamCallback,
     CommonProxyErrors,
+    LiteLLMRoutes,
+    SpecialHeaders,
     TeamCallbackMetadata,
     UserAPIKeyAuth,
 )
+from litellm.proxy.auth.auth_utils import get_request_route
 from litellm.types.utils import SupportedCacheControls

 if TYPE_CHECKING:

@ -137,6 +141,40 @@ def _get_dynamic_logging_metadata(
     return callback_settings_obj


+def clean_headers(
+    headers: Headers, litellm_key_header_name: Optional[str] = None
+) -> dict:
+    """
+    Removes litellm api key from headers
+    """
+    special_headers = [v.value.lower() for v in SpecialHeaders._member_map_.values()]
+    special_headers = special_headers
+    if litellm_key_header_name is not None:
+        special_headers.append(litellm_key_header_name.lower())
+    clean_headers = {}
+    for header, value in headers.items():
+        if header.lower() not in special_headers:
+            clean_headers[header] = value
+    return clean_headers
+
+
+def get_forwardable_headers(
+    headers: Union[Headers, dict],
+):
+    """
+    Get the headers that should be forwarded to the LLM Provider.
+
+    Looks for any `x-` headers and sends them to the LLM Provider.
+    """
+    forwarded_headers = {}
+    for header, value in headers.items():
+        if header.lower().startswith("x-") and not header.lower().startswith(
+            "x-stainless"
+        ):  # causes openai sdk to fail
+            forwarded_headers[header] = value
+
+    return forwarded_headers
+
+
 async def add_litellm_data_to_request(
     data: dict,
     request: Request,

@ -163,8 +201,17 @@ async def add_litellm_data_to_request(

     safe_add_api_version_from_query_params(data, request)

-    _headers = dict(request.headers)
+    _headers = clean_headers(
+        request.headers,
+        litellm_key_header_name=(
+            general_settings.get("litellm_key_header_name")
+            if general_settings is not None
+            else None
+        ),
+    )
+
+    if get_forwardable_headers(_headers) != {}:
+        data["headers"] = get_forwardable_headers(_headers)

     # Include original request and headers in the data
     data["proxy_server_request"] = {
         "url": str(request.url),


@ -1992,7 +1992,7 @@ class ProxyConfig:
             )  # type:ignore

         # Guardrail settings
-        guardrails_v2: Optional[dict] = None
+        guardrails_v2: Optional[List[Dict]] = None

         if config is not None:
             guardrails_v2 = config.get("guardrails", None)

@ -5894,7 +5894,7 @@ async def new_end_user(
     - blocked: bool - Flag to allow or disallow requests for this end-user. Default is False.
     - max_budget: Optional[float] - The maximum budget allocated to the user. Either 'max_budget' or 'budget_id' should be provided, not both.
     - budget_id: Optional[str] - The identifier for an existing budget allocated to the user. Either 'max_budget' or 'budget_id' should be provided, not both.
-    - allowed_model_region: Optional[Literal["eu"]] - Require all user requests to use models in this specific region.
+    - allowed_model_region: Optional[Union[Literal["eu"], Literal["us"]]] - Require all user requests to use models in this specific region.
     - default_model: Optional[str] - If no equivalent model in the allowed region, default all requests to this model.
     - metadata: Optional[dict] = Metadata for customer, store information for customer. Example metadata = {"data_training_opt_out": True}


@ -1580,15 +1580,9 @@ class PrismaClient:
                         }
                     )
                 elif query_type == "find_all" and user_id_list is not None:
-                    user_id_values = ", ".join(f"'{item}'" for item in user_id_list)
-                    sql_query = f"""
-                    SELECT *
-                    FROM "LiteLLM_UserTable"
-                    WHERE "user_id" IN ({user_id_values})
-                    """
-
-                    # Execute the raw query
-                    # The asterisk before `user_id_list` unpacks the list into separate arguments
-                    response = await self.db.query_raw(sql_query)
+                    response = await self.db.litellm_usertable.find_many(
+                        where={"user_id": {"in": user_id_list}}
+                    )
                 elif query_type == "find_all":
                     if expires is not None:
                         response = await self.db.litellm_usertable.find_many(  # type: ignore
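The removed code built the `IN (...)` clause by interpolating user-supplied ids into the SQL text, which is the injectable pattern reported in the huntr bounty above; the fix routes the filter through Prisma's `find_many` instead. A self-contained illustration of why interpolation is dangerous and placeholders are not, using sqlite3 purely for demonstration (sqlite3 and the toy table are assumptions of this sketch, not part of the actual fix):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id TEXT, spend REAL)")
conn.execute("INSERT INTO users VALUES ('alice', 1.0), ('bob', 2.0)")

user_id_list = ["alice", "x') OR ('1'='1"]  # second value is an injection payload

# Vulnerable pattern (what the old raw query did): the payload is spliced into the
# SQL text, turns the WHERE clause into a tautology, and matches every row.
unsafe_sql = "SELECT * FROM users WHERE user_id IN ({})".format(
    ", ".join(f"'{item}'" for item in user_id_list)
)
print(len(conn.execute(unsafe_sql).fetchall()))  # 2 rows - injection succeeded

# Safe pattern (equivalent in spirit to find_many's {"in": [...]} filter): the
# payload stays data, so only literal matches come back.
placeholders = ", ".join("?" for _ in user_id_list)
safe_sql = f"SELECT * FROM users WHERE user_id IN ({placeholders})"
print(len(conn.execute(safe_sql, user_id_list).fetchall()))  # 1 row - only 'alice'
```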


@ -126,6 +126,7 @@ from litellm.utils import (
get_llm_provider, get_llm_provider,
get_secret, get_secret,
get_utc_datetime, get_utc_datetime,
is_region_allowed,
) )
from .router_utils.pattern_match_deployments import PatternMatchRouter from .router_utils.pattern_match_deployments import PatternMatchRouter
@@ -4888,25 +4889,14 @@ class Router:
             if (
                 request_kwargs is not None
                 and request_kwargs.get("allowed_model_region") is not None
-                and request_kwargs["allowed_model_region"] == "eu"
             ):
-                if _litellm_params.get("region_name") is not None and isinstance(
-                    _litellm_params["region_name"], str
-                ):
-                    # check if in allowed_model_region
-                    if (
-                        _is_region_eu(litellm_params=LiteLLM_Params(**_litellm_params))
-                        is False
-                    ):
-                        invalid_model_indices.append(idx)
-                        continue
-                else:
-                    verbose_router_logger.debug(
-                        "Filtering out model - {}, as model_region=None, and allowed_model_region={}".format(
-                            model_id, request_kwargs.get("allowed_model_region")
-                        )
-                    )
-                    # filter out since region unknown, and user wants to filter for specific region
-                    invalid_model_indices.append(idx)
-                    continue
+                allowed_model_region = request_kwargs.get("allowed_model_region")
+
+                if allowed_model_region is not None:
+                    if not is_region_allowed(
+                        litellm_params=LiteLLM_Params(**_litellm_params),
+                        allowed_model_region=allowed_model_region,
+                    ):
+                        invalid_model_indices.append(idx)
+                        continue
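
Conceptually, the new pre-call check keeps a deployment only when `is_region_allowed` matches its `region_name` against the caller's `allowed_model_region`, and filters nothing when no region is requested. A stripped-down sketch of that filtering, mirroring the expectations in `test_router_region_pre_call_check` further down (illustrative code, not the Router implementation):

```python
from typing import List, Optional


def filter_by_region(
    deployments: List[dict], allowed_model_region: Optional[str]
) -> List[dict]:
    # Keep only deployments whose declared region matches the allowed region;
    # when no region constraint is set, keep everything.
    if allowed_model_region is None:
        return deployments
    return [
        d
        for d in deployments
        if d.get("litellm_params", {}).get("region_name") == allowed_model_region
    ]


deployments = [
    {"model_info": {"id": "1"}, "litellm_params": {"region_name": "eu"}},
    {"model_info": {"id": "2"}, "litellm_params": {"region_name": None}},
]
assert [d["model_info"]["id"] for d in filter_by_region(deployments, "eu")] == ["1"]
assert len(filter_by_region(deployments, None)) == 2
```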

@@ -19,6 +19,15 @@ litellm_settings:
 """


+class SupportedGuardrailIntegrations(Enum):
+    APORIA = "aporia"
+    BEDROCK = "bedrock"
+    GURDRAILS_AI = "guardrails_ai"
+    LAKERA = "lakera"
+    PRESIDIO = "presidio"
+    HIDE_SECRETS = "hide-secrets"
+
+
 class Role(Enum):
     SYSTEM = "system"
     ASSISTANT = "assistant"

@@ -92,6 +101,9 @@ class LitellmParams(TypedDict):
     # hide secrets params
     detect_secrets_config: Optional[dict]

+    # guardrails ai params
+    guard_name: Optional[str]
+
 class Guardrail(TypedDict):
     guardrail_name: str
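
With the new `guard_name` field, a guardrails-ai entry passed to the guardrail initializer would carry a payload shaped roughly like the following (values copied from the test added later in this PR; the shape follows the `Guardrail`/`LitellmParams` TypedDicts above):

```python
guardrail_entry = {
    "guardrail_name": "gibberish-guard",
    "litellm_params": {
        "guardrail": "guardrails_ai",     # matches SupportedGuardrailIntegrations.GURDRAILS_AI
        "guard_name": "gibberish_guard",  # new guardrails-ai-specific param
        "mode": "post_call",
    },
}
```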

@@ -175,7 +175,7 @@ from .exceptions import (
     UnprocessableEntityError,
     UnsupportedParamsError,
 )
-from .proxy._types import KeyManagementSystem
+from .proxy._types import AllowedModelRegion, KeyManagementSystem
 from .types.llms.openai import (
     ChatCompletionDeltaToolCallChunk,
     ChatCompletionToolCallChunk,
@@ -3839,18 +3839,13 @@ def _get_model_region(
     return litellm_params.region_name


-def _is_region_eu(litellm_params: LiteLLM_Params) -> bool:
+def _infer_model_region(litellm_params: LiteLLM_Params) -> Optional[AllowedModelRegion]:
     """
-    Return true/false if a deployment is in the EU
-    """
-    if litellm_params.region_name == "eu":
-        return True
-
-    ## ELSE ##
-    """
-    - get provider
-    - get provider regions
-    - return true if given region (get_provider_region) in eu region (config.get_eu_regions())
+    Infer if a model is in the EU or US region
+
+    Returns:
+    - str (region) - "eu" or "us"
+    - None (if region not found)
     """
     model, custom_llm_provider, _, _ = litellm.get_llm_provider(
         model=litellm_params.model, litellm_params=litellm_params
@@ -3861,21 +3856,71 @@ def _is_region_eu(litellm_params: LiteLLM_Params) -> bool:
     )

     if model_region is None:
-        return False
+        verbose_logger.debug(
+            "Cannot infer model region for model: {}".format(litellm_params.model)
+        )
+        return None

     if custom_llm_provider == "azure":
         eu_regions = litellm.AzureOpenAIConfig().get_eu_regions()
+        us_regions = litellm.AzureOpenAIConfig().get_us_regions()
     elif custom_llm_provider == "vertex_ai":
         eu_regions = litellm.VertexAIConfig().get_eu_regions()
+        us_regions = litellm.VertexAIConfig().get_us_regions()
     elif custom_llm_provider == "bedrock":
         eu_regions = litellm.AmazonBedrockGlobalConfig().get_eu_regions()
+        us_regions = litellm.AmazonBedrockGlobalConfig().get_us_regions()
     elif custom_llm_provider == "watsonx":
         eu_regions = litellm.IBMWatsonXAIConfig().get_eu_regions()
+        us_regions = litellm.IBMWatsonXAIConfig().get_us_regions()
     else:
-        return False
+        eu_regions = []
+        us_regions = []

     for region in eu_regions:
         if region in model_region.lower():
-            return True
-    return False
+            return "eu"
+
+    for region in us_regions:
+        if region in model_region.lower():
+            return "us"
+
+    return None
+
+
+def _is_region_eu(litellm_params: LiteLLM_Params) -> bool:
+    """
+    Return true/false if a deployment is in the EU
+    """
+    if litellm_params.region_name == "eu":
+        return True
+
+    ## Else - try and infer from model region
+    model_region = _infer_model_region(litellm_params=litellm_params)
+    if model_region is not None and model_region == "eu":
+        return True
+    return False
+
+
+def _is_region_us(litellm_params: LiteLLM_Params) -> bool:
+    """
+    Return true/false if a deployment is in the US
+    """
+    if litellm_params.region_name == "us":
+        return True
+
+    ## Else - try and infer from model region
+    model_region = _infer_model_region(litellm_params=litellm_params)
+    if model_region is not None and model_region == "us":
+        return True
+    return False
+
+
+def is_region_allowed(
+    litellm_params: LiteLLM_Params, allowed_model_region: str
+) -> bool:
+    """
+    Return true/false if a deployment is in the EU
+    """
+    if litellm_params.region_name == allowed_model_region:
+        return True
+    return False
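
Taken together, `_is_region_eu`/`_is_region_us` resolve a deployment's region in two steps: trust an explicit `region_name` first, otherwise fall back to `_infer_model_region`, which substring-matches the deployment's region against each provider's EU/US region lists. A self-contained sketch of that resolution order (region lists here are made up for illustration; the real ones come from the provider configs' `get_eu_regions()`/`get_us_regions()`):

```python
from typing import Optional

EU_REGIONS = ["francecentral", "swedencentral", "eu-west-1"]  # illustrative only
US_REGIONS = ["eastus", "westus", "us-east-1"]                # illustrative only


def infer_region_sketch(model_region: Optional[str]) -> Optional[str]:
    # Mirrors _infer_model_region's matching step: substring-match the
    # deployment's region against the EU/US lists, else return None.
    if model_region is None:
        return None
    lowered = model_region.lower()
    if any(r in lowered for r in EU_REGIONS):
        return "eu"
    if any(r in lowered for r in US_REGIONS):
        return "us"
    return None


def is_region_sketch(region_name: Optional[str], wanted: str) -> bool:
    # Explicit region_name wins; otherwise try to infer it.
    if region_name == wanted:
        return True
    return infer_region_sketch(region_name) == wanted


assert is_region_sketch("eu", "eu") is True
assert is_region_sketch("francecentral", "eu") is True  # inferred from the region list
assert is_region_sketch(None, "us") is False
```

Note that the `is_region_allowed` helper added above only performs the direct `region_name` comparison; the inference fallback lives in `_is_region_eu`/`_is_region_us`.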

@@ -226,7 +226,7 @@ def test_all_model_configs():
         optional_params={},
     ) == {"max_tokens": 10}

-    from litellm.llms.AzureOpenAI.azure import AzureOpenAIConfig
+    from litellm.llms.AzureOpenAI.chat.gpt_transformation import AzureOpenAIConfig

     assert "max_completion_tokens" in AzureOpenAIConfig().get_supported_openai_params()
     assert AzureOpenAIConfig().map_openai_params(

@@ -82,6 +82,7 @@ def user_api_key_auth() -> UserAPIKeyAuth:
 @pytest.mark.parametrize("num_projects", [1, 2, 100])
 @pytest.mark.asyncio
+@pytest.mark.flaky(retries=3, delay=1)
 async def test_available_tpm(num_projects, dynamic_rate_limit_handler):
     model = "my-fake-model"

     ## SET CACHE W/ ACTIVE PROJECTS

@@ -0,0 +1,28 @@
+import os
+import sys
+import traceback
+
+sys.path.insert(
+    0, os.path.abspath("../..")
+)  # Adds the parent directory to the system path
+
+import litellm
+from litellm.proxy.guardrails.init_guardrails import init_guardrails_v2
+
+
+def test_guardrails_ai():
+    litellm.set_verbose = True
+    litellm.guardrail_name_config_map = {}
+    init_guardrails_v2(
+        all_guardrails=[
+            {
+                "guardrail_name": "gibberish-guard",
+                "litellm_params": {
+                    "guardrail": "guardrails_ai",
+                    "guard_name": "gibberish_guard",
+                    "mode": "post_call",
+                },
+            }
+        ],
+        config_file_path="",
+    )

@@ -436,3 +436,24 @@ def test_vertex_only_image_user_message():
 def test_convert_url():
     convert_url_to_base64("https://picsum.photos/id/237/200/300")

+
+def test_azure_tool_call_invoke_helper():
+    messages = [
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "What is the weather in Copenhagen?"},
+        {"role": "assistant", "function_call": {"name": "get_weather"}},
+    ]
+
+    transformed_messages = litellm.AzureOpenAIConfig.transform_request(
+        model="gpt-4o", messages=messages, optional_params={}
+    )
+
+    assert transformed_messages["messages"] == [
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "What is the weather in Copenhagen?"},
+        {
+            "role": "assistant",
+            "function_call": {"name": "get_weather", "arguments": ""},
+        },
+    ]
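
The new test documents one detail of the Azure transform: an assistant message whose `function_call` has no `arguments` key comes back with `"arguments": ""` filled in. A minimal sketch of that normalization step (illustrative only, not the library's transform code):

```python
def normalize_function_call(message: dict) -> dict:
    # Ensure function_call always carries an "arguments" field, defaulting
    # to an empty string, matching the assertion in the test above.
    fc = message.get("function_call")
    if isinstance(fc, dict) and "arguments" not in fc:
        message = {**message, "function_call": {**fc, "arguments": ""}}
    return message


msg = {"role": "assistant", "function_call": {"name": "get_weather"}}
assert normalize_function_call(msg) == {
    "role": "assistant",
    "function_call": {"name": "get_weather", "arguments": ""},
}
```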

@@ -72,7 +72,7 @@ def test_embedding(client):
     # assert len(litellm.callbacks) == 1 # assert litellm is initialized with 1 callback
     print("my_custom_logger", my_custom_logger)
-    assert my_custom_logger.async_success_embedding == False
+    assert my_custom_logger.async_success_embedding is False

     test_data = {"model": "azure-embedding-model", "input": ["hello"]}
     response = client.post("/embeddings", json=test_data, headers=headers)

@@ -84,7 +84,7 @@ def test_embedding(client):
         id(my_custom_logger),
     )
     assert (
-        my_custom_logger.async_success_embedding == True
+        my_custom_logger.async_success_embedding is True
     ) # checks if the status of async_success is True, only the async_log_success_event can set this to true
     assert (
         my_custom_logger.async_embedding_kwargs["model"] == "azure-embedding-model"

@@ -107,7 +107,6 @@ def test_embedding(client):
             "accept-encoding": "gzip, deflate",
             "connection": "keep-alive",
             "user-agent": "testclient",
-            "authorization": "Bearer sk-1234",
             "content-length": "54",
             "content-type": "application/json",
         },

@@ -194,6 +193,8 @@ def test_chat_completion(client):
         "mode": "chat",
         "db_model": False,
     }

+    assert "authorization" not in proxy_server_request_object["headers"]
+
     assert proxy_server_request_object == {
         "url": "http://testserver/chat/completions",
         "method": "POST",

@@ -203,7 +204,6 @@ def test_chat_completion(client):
             "accept-encoding": "gzip, deflate",
             "connection": "keep-alive",
             "user-agent": "testclient",
-            "authorization": "Bearer sk-1234",
             "content-length": "123",
             "content-type": "application/json",
         },

@@ -173,6 +173,96 @@ def test_chat_completion(mock_acompletion, client_no_auth):
         pytest.fail(f"LiteLLM Proxy test failed. Exception - {str(e)}")


+@pytest.mark.parametrize(
+    "litellm_key_header_name",
+    ["x-litellm-key", None],
+)
+def test_add_headers_to_request(litellm_key_header_name):
+    from fastapi import Request
+    from starlette.datastructures import URL
+    import json
+    from litellm.proxy.litellm_pre_call_utils import (
+        clean_headers,
+        get_forwardable_headers,
+    )
+
+    headers = {
+        "Authorization": "Bearer 1234",
+        "X-Custom-Header": "Custom-Value",
+        "X-Stainless-Header": "Stainless-Value",
+    }
+    request = Request(scope={"type": "http"})
+    request._url = URL(url="/chat/completions")
+    request._body = json.dumps({"model": "gpt-3.5-turbo"}).encode("utf-8")
+    request_headers = clean_headers(headers, litellm_key_header_name)
+    forwarded_headers = get_forwardable_headers(request_headers)
+    assert forwarded_headers == {"X-Custom-Header": "Custom-Value"}
+
+
+@pytest.mark.parametrize(
+    "litellm_key_header_name",
+    ["x-litellm-key", None],
+)
+@mock_patch_acompletion()
+def test_chat_completion_forward_headers(
+    mock_acompletion, client_no_auth, litellm_key_header_name
+):
+    global headers
+    try:
+        if litellm_key_header_name is not None:
+            gs = getattr(litellm.proxy.proxy_server, "general_settings")
+            gs["litellm_key_header_name"] = litellm_key_header_name
+            setattr(litellm.proxy.proxy_server, "general_settings", gs)
+        # Your test data
+        test_data = {
+            "model": "gpt-3.5-turbo",
+            "messages": [
+                {"role": "user", "content": "hi"},
+            ],
+            "max_tokens": 10,
+        }
+
+        headers_to_forward = {
+            "X-Custom-Header": "Custom-Value",
+            "X-Another-Header": "Another-Value",
+        }
+
+        if litellm_key_header_name is not None:
+            headers_to_not_forward = {litellm_key_header_name: "Bearer 1234"}
+        else:
+            headers_to_not_forward = {"Authorization": "Bearer 1234"}
+
+        received_headers = {**headers_to_forward, **headers_to_not_forward}
+
+        print("testing proxy server with chat completions")
+        response = client_no_auth.post(
+            "/v1/chat/completions", json=test_data, headers=received_headers
+        )
+        mock_acompletion.assert_called_once_with(
+            model="gpt-3.5-turbo",
+            messages=[
+                {"role": "user", "content": "hi"},
+            ],
+            max_tokens=10,
+            litellm_call_id=mock.ANY,
+            litellm_logging_obj=mock.ANY,
+            request_timeout=mock.ANY,
+            specific_deployment=True,
+            metadata=mock.ANY,
+            proxy_server_request=mock.ANY,
+            headers={
+                "x-custom-header": "Custom-Value",
+                "x-another-header": "Another-Value",
+            },
+        )
+        print(f"response - {response.text}")
+        assert response.status_code == 200
+        result = response.json()
+        print(f"Received response: {result}")
+    except Exception as e:
+        pytest.fail(f"LiteLLM Proxy test failed. Exception - {str(e)}")
+
+
 @mock_patch_acompletion()
 @pytest.mark.asyncio
 async def test_team_disable_guardrails(mock_acompletion, client_no_auth):

@@ -1050,7 +1050,7 @@ def test_filter_invalid_params_pre_call_check():
         pytest.fail(f"Got unexpected exception on router! - {str(e)}")

-@pytest.mark.parametrize("allowed_model_region", ["eu", None])
+@pytest.mark.parametrize("allowed_model_region", ["eu", None, "us"])
 def test_router_region_pre_call_check(allowed_model_region):
     """
     If region based routing set

@@ -1065,7 +1065,7 @@ def test_router_region_pre_call_check(allowed_model_region):
                 "api_version": os.getenv("AZURE_API_VERSION"),
                 "api_base": os.getenv("AZURE_API_BASE"),
                 "base_model": "azure/gpt-35-turbo",
-                "region_name": "eu",
+                "region_name": allowed_model_region,
             },
             "model_info": {"id": "1"},
         },

@@ -1091,7 +1091,9 @@ def test_router_region_pre_call_check(allowed_model_region):
     if allowed_model_region is None:
         assert len(_healthy_deployments) == 2
     else:
-        assert len(_healthy_deployments) == 1, "No models selected as healthy"
+        assert len(_healthy_deployments) == 1, "{} models selected as healthy".format(
+            len(_healthy_deployments)
+        )
         assert (
             _healthy_deployments[0]["model_info"]["id"] == "1"
         ), "Incorrect model id picked. Got id={}, expected id=1".format(

@@ -102,7 +102,6 @@ def test_spend_logs_payload():
             "method": "POST",
             "headers": {
                 "content-type": "application/json",
-                "authorization": "Bearer sk-1234",
                 "user-agent": "PostmanRuntime/7.32.3",
                 "accept": "*/*",
                 "postman-token": "92300061-eeaa-423b-a420-0b44896ecdc4",