Merge branch 'main' into litellm_gemini_context_caching

Commit 08bd4788dc by Krish Dholakia, 2024-08-26 22:22:17 -07:00, committed by GitHub
78 changed files with 1284 additions and 354 deletions


@ -193,7 +193,7 @@ response2 = completion(
],
max_tokens=20,
)
print(f"response2: {response1}")
print(f"response2: {response2}")
assert response1.id == response2.id
# response1 == response2, response 1 is cached
```


@ -14,7 +14,7 @@ https://github.com/BerriAI/litellm
## How to use LiteLLM
You can use litellm through either:
1. [LiteLLM Proxy Server](#openai-proxy) - Server (LLM Gateway) to call 100+ LLMs, load balance, cost tracking across projects
1. [LiteLLM Proxy Server](#litellm-proxy-server-llm-gateway) - Server (LLM Gateway) to call 100+ LLMs, load balance, cost tracking across projects
2. [LiteLLM python SDK](#basic-usage) - Python Client to call 100+ LLMs, load balance, cost tracking
### **When to use LiteLLM Proxy Server (LLM Gateway)**


@ -0,0 +1,89 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# LiteLLM Proxy (LLM Gateway)
:::tip
[LiteLLM provides a **self hosted** proxy server (AI Gateway)](../simple_proxy) to call all LLMs in the OpenAI format
:::
**[LiteLLM Proxy](../simple_proxy) is OpenAI compatible**; you just need the `openai/` prefix before the model name
## Required Variables
```python
os.environ["OPENAI_API_KEY"] = "" # "sk-1234" your litellm proxy api key
os.environ["OPENAI_API_BASE"] = "" # "http://localhost:4000" your litellm proxy api base
```
## Usage (Non Streaming)
```python
import os
import litellm
from litellm import completion
os.environ["OPENAI_API_KEY"] = ""
# set custom api base to your proxy
# either set .env or litellm.api_base
# os.environ["OPENAI_API_BASE"] = ""
litellm.api_base = "your-openai-proxy-url"
messages = [{ "content": "Hello, how are you?","role": "user"}]
# openai call
response = completion(model="openai/your-model-name", messages=messages)
```
## Usage - passing `api_base`, `api_key` per request
If you need to set `api_base` dynamically, just pass it in the `completion` call instead: `completion(..., api_base="your-proxy-api-base")`
```python
import os
import litellm
from litellm import completion
os.environ["OPENAI_API_KEY"] = ""
messages = [{ "content": "Hello, how are you?","role": "user"}]
# openai call
response = completion(
    model="openai/your-model-name",
    messages=messages,
    api_base="your-litellm-proxy-url",
    api_key="your-litellm-proxy-api-key"
)
```
## Usage - Streaming
```python
import os
import litellm
from litellm import completion
os.environ["OPENAI_API_KEY"] = ""
messages = [{ "content": "Hello, how are you?","role": "user"}]
# openai call
response = completion(
    model="openai/your-model-name",
    messages=messages,
    api_base="your-litellm-proxy-url",
    stream=True
)
for chunk in response:
    print(chunk)
```
## **Usage with Langchain, LlamaIndex, OpenAI JS, Anthropic SDK, Instructor**
#### [Follow this doc to see how to use litellm proxy with Langchain, LlamaIndex, Anthropic, etc.](../proxy/user_keys)
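As a quick reference, here is a minimal sketch of calling the proxy with the official OpenAI Python SDK; the base URL, API key, and model name below are placeholders, not values from this commit.
```python
# Minimal sketch: LiteLLM Proxy exposes an OpenAI-compatible /chat/completions
# endpoint, so the OpenAI SDK works once base_url points at the proxy.
# base_url, api_key, and the model name are placeholders (assumptions).
from openai import OpenAI

client = OpenAI(
    api_key="your-litellm-proxy-api-key",
    base_url="http://localhost:4000",  # your LiteLLM proxy base URL
)

response = client.chat.completions.create(
    model="your-model-name",  # a model configured on the proxy
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(response.choices[0].message.content)
```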


@ -1194,6 +1194,14 @@ response = completion(
|------------------|--------------------------------------|
| gemini-pro | `completion('gemini-pro', messages)`, `completion('vertex_ai/gemini-pro', messages)` |
## Fine-tuned Models
Fine-tuned models on Vertex AI have a numeric model/endpoint ID.
| Model Name | Function Call |
|------------------|--------------------------------------|
| your fine tuned model | `completion(model='vertex_ai/4965075652664360960', messages)`|
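For illustration, a minimal sketch of such a call, assuming Vertex AI credentials are already configured (e.g. `GOOGLE_APPLICATION_CREDENTIALS`); the numeric ID is the placeholder endpoint ID from the table, and the project/location values are assumptions.
```python
# Minimal sketch: call a fine-tuned Vertex AI Gemini model by its numeric endpoint ID.
# The ID below is the placeholder from the table; project/location are assumptions.
from litellm import completion

response = completion(
    model="vertex_ai/4965075652664360960",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    vertex_project="your-gcp-project-id",
    vertex_location="us-central1",
)
print(response.choices[0].message.content)
```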
## Gemini Pro Vision
| Model Name | Function Call |
|------------------|--------------------------------------|


@ -71,7 +71,7 @@ litellm --config config.yaml --detailed_debug
## 4. Test request
**[Langchain, OpenAI SDK Usage Examples](../proxy/user_keys##request-format)**
**[Langchain, OpenAI SDK Usage Examples](../proxy/user_keys#request-format)**
<Tabs>
<TabItem label="Unsuccessful call" value = "not-allowed">


@ -40,7 +40,7 @@ litellm --config config.yaml --detailed_debug
### 3. Test request
**[Langchain, OpenAI SDK Usage Examples](../proxy/user_keys##request-format)**
**[Langchain, OpenAI SDK Usage Examples](../proxy/user_keys#request-format)**
<Tabs>
<TabItem label="Unsuccessful call" value = "not-allowed">


@ -202,7 +202,7 @@ litellm --config config.yaml --detailed_debug
#### Test `"custom-pre-guard"`
**[Langchain, OpenAI SDK Usage Examples](../proxy/user_keys##request-format)**
**[Langchain, OpenAI SDK Usage Examples](../proxy/user_keys#request-format)**
<Tabs>
<TabItem label="Modify input" value = "not-allowed">
@ -282,7 +282,7 @@ curl -i http://localhost:4000/v1/chat/completions \
#### Test `"custom-during-guard"`
**[Langchain, OpenAI SDK Usage Examples](../proxy/user_keys##request-format)**
**[Langchain, OpenAI SDK Usage Examples](../proxy/user_keys#request-format)**
<Tabs>
<TabItem label="Unsuccessful call" value = "not-allowed">
@ -346,7 +346,7 @@ curl -i http://localhost:4000/v1/chat/completions \
**[Langchain, OpenAI SDK Usage Examples](../proxy/user_keys##request-format)**
**[Langchain, OpenAI SDK Usage Examples](../proxy/user_keys#request-format)**
<Tabs>
<TabItem label="Unsuccessful call" value = "not-allowed">


@ -46,7 +46,7 @@ litellm --config config.yaml --detailed_debug
### 3. Test request
**[Langchain, OpenAI SDK Usage Examples](../proxy/user_keys##request-format)**
**[Langchain, OpenAI SDK Usage Examples](../proxy/user_keys#request-format)**
<Tabs>
<TabItem label="Unsuccessful call" value = "not-allowed">


@ -48,7 +48,7 @@ litellm --config config.yaml --detailed_debug
## 3. Test request
**[Langchain, OpenAI SDK Usage Examples](../proxy/user_keys##request-format)**
**[Langchain, OpenAI SDK Usage Examples](../proxy/user_keys#request-format)**
<Tabs>
<TabItem label="Unsuccessful call" value = "not-allowed">


@ -810,6 +810,9 @@ print(result)
</TabItem>
</Tabs>
## Using with Vertex, Boto3, Anthropic SDK (Native format)
👉 **[Here's how to use litellm proxy with Vertex, boto3, Anthropic SDK - in the native format](../pass_through/vertex_ai.md)**
## Advanced


@ -72,7 +72,7 @@ litellm --config config.yaml --detailed_debug
## 4. Test request
**[Langchain, OpenAI SDK Usage Examples](../proxy/user_keys##request-format)**
**[Langchain, OpenAI SDK Usage Examples](../proxy/user_keys#request-format)**
<Tabs>
<TabItem label="Unsuccessful call" value = "not-allowed">


@ -128,6 +128,7 @@ const sidebars = {
"providers/anthropic",
"providers/aws_sagemaker",
"providers/bedrock",
"providers/litellm_proxy",
"providers/mistral",
"providers/codestral",
"providers/cohere",


@ -838,7 +838,7 @@ from .llms.databricks import DatabricksConfig, DatabricksEmbeddingConfig
from .llms.predibase import PredibaseConfig
from .llms.anthropic_text import AnthropicTextConfig
from .llms.replicate import ReplicateConfig
from .llms.cohere import CohereConfig
from .llms.cohere.completion import CohereConfig
from .llms.clarifai import ClarifaiConfig
from .llms.ai21 import AI21Config
from .llms.together_ai import TogetherAIConfig


@ -10,7 +10,5 @@ def generic_chunk_has_all_required_fields(chunk: dict) -> bool:
"""
_all_fields = GChunk.__annotations__
# this is an optional field in GenericStreamingChunk, it's not required to be present
_all_fields.pop("provider_specific_fields", None)
return all(key in chunk for key in _all_fields)
decision = all(key in _all_fields for key in chunk)
return decision


@ -13,7 +13,7 @@ import litellm
from litellm.types.llms.cohere import ToolResultObject
from litellm.utils import Choices, Message, ModelResponse, Usage
from .prompt_templates.factory import cohere_message_pt, cohere_messages_pt_v2
from ..prompt_templates.factory import cohere_message_pt, cohere_messages_pt_v2
class CohereError(Exception):


@ -1,6 +1,5 @@
#################### OLD ########################
##### See `cohere_chat.py` for `/chat` calls ####
#################################################
##### Calls /generate endpoint #######
import json
import os
import time
@ -252,163 +251,3 @@ def completion(
)
setattr(model_response, "usage", usage)
return model_response
def _process_embedding_response(
embeddings: list,
model_response: litellm.EmbeddingResponse,
model: str,
encoding: Any,
input: list,
) -> litellm.EmbeddingResponse:
output_data = []
for idx, embedding in enumerate(embeddings):
output_data.append(
{"object": "embedding", "index": idx, "embedding": embedding}
)
model_response.object = "list"
model_response.data = output_data
model_response.model = model
input_tokens = 0
for text in input:
input_tokens += len(encoding.encode(text))
setattr(
model_response,
"usage",
Usage(
prompt_tokens=input_tokens, completion_tokens=0, total_tokens=input_tokens
),
)
return model_response
async def async_embedding(
model: str,
data: dict,
input: list,
model_response: litellm.utils.EmbeddingResponse,
timeout: Union[float, httpx.Timeout],
logging_obj: LiteLLMLoggingObj,
optional_params: dict,
api_base: str,
api_key: Optional[str],
headers: dict,
encoding: Callable,
client: Optional[AsyncHTTPHandler] = None,
):
## LOGGING
logging_obj.pre_call(
input=input,
api_key=api_key,
additional_args={
"complete_input_dict": data,
"headers": headers,
"api_base": api_base,
},
)
## COMPLETION CALL
if client is None:
client = AsyncHTTPHandler(concurrent_limit=1)
response = await client.post(api_base, headers=headers, data=json.dumps(data))
## LOGGING
logging_obj.post_call(
input=input,
api_key=api_key,
additional_args={"complete_input_dict": data},
original_response=response,
)
embeddings = response.json()["embeddings"]
## PROCESS RESPONSE ##
return _process_embedding_response(
embeddings=embeddings,
model_response=model_response,
model=model,
encoding=encoding,
input=input,
)
def embedding(
model: str,
input: list,
model_response: litellm.EmbeddingResponse,
logging_obj: LiteLLMLoggingObj,
optional_params: dict,
headers: dict,
encoding: Any,
api_key: Optional[str] = None,
aembedding: Optional[bool] = None,
timeout: Union[float, httpx.Timeout] = httpx.Timeout(None),
client: Optional[Union[HTTPHandler, AsyncHTTPHandler]] = None,
):
headers = validate_environment(api_key, headers=headers)
embed_url = "https://api.cohere.ai/v1/embed"
model = model
data = {"model": model, "texts": input, **optional_params}
if "3" in model and "input_type" not in data:
# cohere v3 embedding models require input_type, if no input_type is provided, default to "search_document"
data["input_type"] = "search_document"
## LOGGING
logging_obj.pre_call(
input=input,
api_key=api_key,
additional_args={"complete_input_dict": data},
)
## ROUTING
if aembedding is True:
return async_embedding(
model=model,
data=data,
input=input,
model_response=model_response,
timeout=timeout,
logging_obj=logging_obj,
optional_params=optional_params,
api_base=embed_url,
api_key=api_key,
headers=headers,
encoding=encoding,
)
## COMPLETION CALL
if client is None or not isinstance(client, HTTPHandler):
client = HTTPHandler(concurrent_limit=1)
response = client.post(embed_url, headers=headers, data=json.dumps(data))
## LOGGING
logging_obj.post_call(
input=input,
api_key=api_key,
additional_args={"complete_input_dict": data},
original_response=response,
)
"""
response
{
'object': "list",
'data': [
]
'model',
'usage'
}
"""
if response.status_code != 200:
raise CohereError(message=response.text, status_code=response.status_code)
embeddings = response.json()["embeddings"]
return _process_embedding_response(
embeddings=embeddings,
model_response=model_response,
model=model,
encoding=encoding,
input=input,
)


@ -0,0 +1,201 @@
import json
import os
import time
import traceback
import types
from enum import Enum
from typing import Any, Callable, Optional, Union
import httpx # type: ignore
import requests # type: ignore
import litellm
from litellm.litellm_core_utils.litellm_logging import Logging as LiteLLMLoggingObj
from litellm.llms.custom_httpx.http_handler import AsyncHTTPHandler, HTTPHandler
from litellm.utils import Choices, Message, ModelResponse, Usage
def validate_environment(api_key, headers: dict):
headers.update(
{
"Request-Source": "unspecified:litellm",
"accept": "application/json",
"content-type": "application/json",
}
)
if api_key:
headers["Authorization"] = f"Bearer {api_key}"
return headers
class CohereError(Exception):
def __init__(self, status_code, message):
self.status_code = status_code
self.message = message
self.request = httpx.Request(
method="POST", url="https://api.cohere.ai/v1/generate"
)
self.response = httpx.Response(status_code=status_code, request=self.request)
super().__init__(
self.message
) # Call the base class constructor with the parameters it needs
def _process_embedding_response(
embeddings: list,
model_response: litellm.EmbeddingResponse,
model: str,
encoding: Any,
input: list,
) -> litellm.EmbeddingResponse:
output_data = []
for idx, embedding in enumerate(embeddings):
output_data.append(
{"object": "embedding", "index": idx, "embedding": embedding}
)
model_response.object = "list"
model_response.data = output_data
model_response.model = model
input_tokens = 0
for text in input:
input_tokens += len(encoding.encode(text))
setattr(
model_response,
"usage",
Usage(
prompt_tokens=input_tokens, completion_tokens=0, total_tokens=input_tokens
),
)
return model_response
async def async_embedding(
model: str,
data: dict,
input: list,
model_response: litellm.utils.EmbeddingResponse,
timeout: Union[float, httpx.Timeout],
logging_obj: LiteLLMLoggingObj,
optional_params: dict,
api_base: str,
api_key: Optional[str],
headers: dict,
encoding: Callable,
client: Optional[AsyncHTTPHandler] = None,
):
## LOGGING
logging_obj.pre_call(
input=input,
api_key=api_key,
additional_args={
"complete_input_dict": data,
"headers": headers,
"api_base": api_base,
},
)
## COMPLETION CALL
if client is None:
client = AsyncHTTPHandler(concurrent_limit=1)
response = await client.post(api_base, headers=headers, data=json.dumps(data))
## LOGGING
logging_obj.post_call(
input=input,
api_key=api_key,
additional_args={"complete_input_dict": data},
original_response=response,
)
embeddings = response.json()["embeddings"]
## PROCESS RESPONSE ##
return _process_embedding_response(
embeddings=embeddings,
model_response=model_response,
model=model,
encoding=encoding,
input=input,
)
def embedding(
model: str,
input: list,
model_response: litellm.EmbeddingResponse,
logging_obj: LiteLLMLoggingObj,
optional_params: dict,
headers: dict,
encoding: Any,
api_key: Optional[str] = None,
aembedding: Optional[bool] = None,
timeout: Union[float, httpx.Timeout] = httpx.Timeout(None),
client: Optional[Union[HTTPHandler, AsyncHTTPHandler]] = None,
):
headers = validate_environment(api_key, headers=headers)
embed_url = "https://api.cohere.ai/v1/embed"
model = model
data = {"model": model, "texts": input, **optional_params}
if "3" in model and "input_type" not in data:
# cohere v3 embedding models require input_type, if no input_type is provided, default to "search_document"
data["input_type"] = "search_document"
## LOGGING
logging_obj.pre_call(
input=input,
api_key=api_key,
additional_args={"complete_input_dict": data},
)
## ROUTING
if aembedding is True:
return async_embedding(
model=model,
data=data,
input=input,
model_response=model_response,
timeout=timeout,
logging_obj=logging_obj,
optional_params=optional_params,
api_base=embed_url,
api_key=api_key,
headers=headers,
encoding=encoding,
)
## COMPLETION CALL
if client is None or not isinstance(client, HTTPHandler):
client = HTTPHandler(concurrent_limit=1)
response = client.post(embed_url, headers=headers, data=json.dumps(data))
## LOGGING
logging_obj.post_call(
input=input,
api_key=api_key,
additional_args={"complete_input_dict": data},
original_response=response,
)
"""
response
{
'object': "list",
'data': [
]
'model',
'usage'
}
"""
if response.status_code != 200:
raise CohereError(message=response.text, status_code=response.status_code)
embeddings = response.json()["embeddings"]
return _process_embedding_response(
embeddings=embeddings,
model_response=model_response,
model=model,
encoding=encoding,
input=input,
)


@ -7,7 +7,7 @@ import time
import types
from enum import Enum
from functools import partial
from typing import Callable, List, Literal, Optional, Tuple, Union
from typing import Any, Callable, List, Literal, Optional, Tuple, Union
import httpx # type: ignore
import requests # type: ignore
@ -22,7 +22,11 @@ from litellm.types.llms.openai import (
ChatCompletionToolCallFunctionChunk,
ChatCompletionUsageBlock,
)
from litellm.types.utils import GenericStreamingChunk, ProviderField
from litellm.types.utils import (
CustomStreamingDecoder,
GenericStreamingChunk,
ProviderField,
)
from litellm.utils import CustomStreamWrapper, EmbeddingResponse, ModelResponse, Usage
from .base import BaseLLM
@ -171,15 +175,21 @@ async def make_call(
model: str,
messages: list,
logging_obj,
streaming_decoder: Optional[CustomStreamingDecoder] = None,
):
response = await client.post(api_base, headers=headers, data=data, stream=True)
if response.status_code != 200:
raise DatabricksError(status_code=response.status_code, message=response.text)
completion_stream = ModelResponseIterator(
streaming_response=response.aiter_lines(), sync_stream=False
)
if streaming_decoder is not None:
completion_stream: Any = streaming_decoder.aiter_bytes(
response.aiter_bytes(chunk_size=1024)
)
else:
completion_stream = ModelResponseIterator(
streaming_response=response.aiter_lines(), sync_stream=False
)
# LOGGING
logging_obj.post_call(
input=messages,
@ -199,6 +209,7 @@ def make_sync_call(
model: str,
messages: list,
logging_obj,
streaming_decoder: Optional[CustomStreamingDecoder] = None,
):
if client is None:
client = HTTPHandler() # Create a new client if none provided
@ -208,9 +219,14 @@ def make_sync_call(
if response.status_code != 200:
raise DatabricksError(status_code=response.status_code, message=response.read())
completion_stream = ModelResponseIterator(
streaming_response=response.iter_lines(), sync_stream=True
)
if streaming_decoder is not None:
completion_stream = streaming_decoder.iter_bytes(
response.iter_bytes(chunk_size=1024)
)
else:
completion_stream = ModelResponseIterator(
streaming_response=response.iter_lines(), sync_stream=True
)
# LOGGING
logging_obj.post_call(
@ -283,6 +299,7 @@ class DatabricksChatCompletion(BaseLLM):
logger_fn=None,
headers={},
client: Optional[AsyncHTTPHandler] = None,
streaming_decoder: Optional[CustomStreamingDecoder] = None,
) -> CustomStreamWrapper:
data["stream"] = True
@ -296,6 +313,7 @@ class DatabricksChatCompletion(BaseLLM):
model=model,
messages=messages,
logging_obj=logging_obj,
streaming_decoder=streaming_decoder,
),
model=model,
custom_llm_provider=custom_llm_provider,
@ -371,6 +389,9 @@ class DatabricksChatCompletion(BaseLLM):
timeout: Optional[Union[float, httpx.Timeout]] = None,
client: Optional[Union[HTTPHandler, AsyncHTTPHandler]] = None,
custom_endpoint: Optional[bool] = None,
streaming_decoder: Optional[
CustomStreamingDecoder
] = None, # if openai-compatible api needs custom stream decoder - e.g. sagemaker
):
custom_endpoint = custom_endpoint or optional_params.pop(
"custom_endpoint", None
@ -436,6 +457,7 @@ class DatabricksChatCompletion(BaseLLM):
headers=headers,
client=client,
custom_llm_provider=custom_llm_provider,
streaming_decoder=streaming_decoder,
)
else:
return self.acompletion_function(
@ -473,6 +495,7 @@ class DatabricksChatCompletion(BaseLLM):
model=model,
messages=messages,
logging_obj=logging_obj,
streaming_decoder=streaming_decoder,
),
model=model,
custom_llm_provider=custom_llm_provider,


@ -24,8 +24,11 @@ from litellm.llms.custom_httpx.http_handler import (
from litellm.types.llms.openai import (
ChatCompletionToolCallChunk,
ChatCompletionUsageBlock,
OpenAIChatCompletionChunk,
)
from litellm.types.utils import CustomStreamingDecoder
from litellm.types.utils import GenericStreamingChunk as GChunk
from litellm.types.utils import StreamingChatCompletionChunk
from litellm.utils import (
CustomStreamWrapper,
EmbeddingResponse,
@ -34,8 +37,8 @@ from litellm.utils import (
get_secret,
)
from .base_aws_llm import BaseAWSLLM
from .prompt_templates.factory import custom_prompt, prompt_factory
from ..base_aws_llm import BaseAWSLLM
from ..prompt_templates.factory import custom_prompt, prompt_factory
_response_stream_shape_cache = None
@ -241,6 +244,10 @@ class SagemakerLLM(BaseAWSLLM):
aws_region_name=aws_region_name,
)
custom_stream_decoder = AWSEventStreamDecoder(
model="", is_messages_api=True
)
return openai_like_chat_completions.completion(
model=model,
messages=messages,
@ -259,6 +266,7 @@ class SagemakerLLM(BaseAWSLLM):
headers=prepared_request.headers,
custom_endpoint=True,
custom_llm_provider="sagemaker_chat",
streaming_decoder=custom_stream_decoder, # type: ignore
)
## Load Config
@ -332,7 +340,7 @@ class SagemakerLLM(BaseAWSLLM):
)
return response
else:
if stream is not None and stream == True:
if stream is not None and stream is True:
sync_handler = _get_httpx_client()
sync_response = sync_handler.post(
url=prepared_request.url,
@ -847,12 +855,21 @@ def get_response_stream_shape():
class AWSEventStreamDecoder:
def __init__(self, model: str) -> None:
def __init__(self, model: str, is_messages_api: Optional[bool] = None) -> None:
from botocore.parsers import EventStreamJSONParser
self.model = model
self.parser = EventStreamJSONParser()
self.content_blocks: List = []
self.is_messages_api = is_messages_api
def _chunk_parser_messages_api(
self, chunk_data: dict
) -> StreamingChatCompletionChunk:
openai_chunk = StreamingChatCompletionChunk(**chunk_data)
return openai_chunk
def _chunk_parser(self, chunk_data: dict) -> GChunk:
verbose_logger.debug("in sagemaker chunk parser, chunk_data %s", chunk_data)
@ -868,6 +885,7 @@ class AWSEventStreamDecoder:
index=_index,
is_finished=True,
finish_reason="stop",
usage=None,
)
return GChunk(
@ -875,9 +893,12 @@ class AWSEventStreamDecoder:
index=_index,
is_finished=is_finished,
finish_reason=finish_reason,
usage=None,
)
def iter_bytes(self, iterator: Iterator[bytes]) -> Iterator[GChunk]:
def iter_bytes(
self, iterator: Iterator[bytes]
) -> Iterator[Optional[Union[GChunk, StreamingChatCompletionChunk]]]:
"""Given an iterator that yields lines, iterate over it & yield every event encountered"""
from botocore.eventstream import EventStreamBuffer
@ -898,7 +919,10 @@ class AWSEventStreamDecoder:
# Try to parse the accumulated JSON
try:
_data = json.loads(accumulated_json)
yield self._chunk_parser(chunk_data=_data)
if self.is_messages_api:
yield self._chunk_parser_messages_api(chunk_data=_data)
else:
yield self._chunk_parser(chunk_data=_data)
# Reset accumulated_json after successful parsing
accumulated_json = ""
except json.JSONDecodeError:
@ -909,16 +933,20 @@ class AWSEventStreamDecoder:
if accumulated_json:
try:
_data = json.loads(accumulated_json)
yield self._chunk_parser(chunk_data=_data)
except json.JSONDecodeError:
if self.is_messages_api:
yield self._chunk_parser_messages_api(chunk_data=_data)
else:
yield self._chunk_parser(chunk_data=_data)
except json.JSONDecodeError as e:
# Handle or log any unparseable data at the end
verbose_logger.error(
f"Warning: Unparseable JSON data remained: {accumulated_json}"
)
yield None
async def aiter_bytes(
self, iterator: AsyncIterator[bytes]
) -> AsyncIterator[GChunk]:
) -> AsyncIterator[Optional[Union[GChunk, StreamingChatCompletionChunk]]]:
"""Given an async iterator that yields lines, iterate over it & yield every event encountered"""
from botocore.eventstream import EventStreamBuffer
@ -940,7 +968,10 @@ class AWSEventStreamDecoder:
# Try to parse the accumulated JSON
try:
_data = json.loads(accumulated_json)
yield self._chunk_parser(chunk_data=_data)
if self.is_messages_api:
yield self._chunk_parser_messages_api(chunk_data=_data)
else:
yield self._chunk_parser(chunk_data=_data)
# Reset accumulated_json after successful parsing
accumulated_json = ""
except json.JSONDecodeError:
@ -951,12 +982,16 @@ class AWSEventStreamDecoder:
if accumulated_json:
try:
_data = json.loads(accumulated_json)
yield self._chunk_parser(chunk_data=_data)
if self.is_messages_api:
yield self._chunk_parser_messages_api(chunk_data=_data)
else:
yield self._chunk_parser(chunk_data=_data)
except json.JSONDecodeError:
# Handle or log any unparseable data at the end
verbose_logger.error(
f"Warning: Unparseable JSON data remained: {accumulated_json}"
)
yield None
def _parse_message_from_event(self, event) -> Optional[str]:
response_dict = event.to_response_dict()


@ -32,6 +32,7 @@ from litellm.types.llms.openai import (
ChatCompletionResponseMessage,
ChatCompletionToolCallChunk,
ChatCompletionToolCallFunctionChunk,
ChatCompletionToolParamFunctionChunk,
ChatCompletionUsageBlock,
)
from litellm.types.llms.vertex_ai import (
@ -303,10 +304,48 @@ class GoogleAIStudioGeminiConfig: # key diff from VertexAI - 'frequency_penalty
"stream",
"tools",
"tool_choice",
"functions",
"response_format",
"n",
"stop",
]
def _map_function(self, value: List[dict]) -> List[Tools]:
gtool_func_declarations = []
googleSearchRetrieval: Optional[dict] = None
for tool in value:
openai_function_object: Optional[ChatCompletionToolParamFunctionChunk] = (
None
)
if "function" in tool: # tools list
openai_function_object = ChatCompletionToolParamFunctionChunk( # type: ignore
**tool["function"]
)
elif "name" in tool: # functions list
openai_function_object = ChatCompletionToolParamFunctionChunk(**tool) # type: ignore
# check if grounding
if tool.get("googleSearchRetrieval", None) is not None:
googleSearchRetrieval = tool["googleSearchRetrieval"]
elif openai_function_object is not None:
gtool_func_declaration = FunctionDeclaration(
name=openai_function_object["name"],
description=openai_function_object.get("description", ""),
parameters=openai_function_object.get("parameters", {}),
)
gtool_func_declarations.append(gtool_func_declaration)
else:
# assume it's a provider-specific param
verbose_logger.warning(
"Invalid tool={}. Use `litellm.set_verbose` or `litellm --detailed_debug` to see raw request."
)
_tools = Tools(
function_declarations=gtool_func_declarations,
)
if googleSearchRetrieval is not None:
_tools["googleSearchRetrieval"] = googleSearchRetrieval
return [_tools]
def map_tool_choice_values(
self, model: str, tool_choice: Union[str, dict]
@ -370,26 +409,11 @@ class GoogleAIStudioGeminiConfig: # key diff from VertexAI - 'frequency_penalty
if "json_schema" in value and "schema" in value["json_schema"]: # type: ignore
optional_params["response_mime_type"] = "application/json"
optional_params["response_schema"] = value["json_schema"]["schema"] # type: ignore
if param == "tools" and isinstance(value, list):
gtool_func_declarations = []
for tool in value:
_parameters = tool.get("function", {}).get("parameters", {})
_properties = _parameters.get("properties", {})
if isinstance(_properties, dict):
for _, _property in _properties.items():
if "enum" in _property and "format" not in _property:
_property["format"] = "enum"
gtool_func_declaration = FunctionDeclaration(
name=tool["function"]["name"],
description=tool["function"].get("description", ""),
)
if len(_parameters.keys()) > 0:
gtool_func_declaration["parameters"] = _parameters
gtool_func_declarations.append(gtool_func_declaration)
optional_params["tools"] = [
Tools(function_declarations=gtool_func_declarations)
]
if (param == "tools" or param == "functions") and isinstance(value, list):
optional_params["tools"] = self._map_function(value=value)
optional_params["litellm_param_is_function_call"] = (
True if param == "functions" else False
)
if param == "tool_choice" and (
isinstance(value, str) or isinstance(value, dict)
):
@ -513,6 +537,7 @@ class VertexGeminiConfig:
"max_tokens",
"stream",
"tools",
"functions",
"tool_choice",
"response_format",
"n",
@ -548,6 +573,44 @@ class VertexGeminiConfig:
status_code=400,
)
def _map_function(self, value: List[dict]) -> List[Tools]:
gtool_func_declarations = []
googleSearchRetrieval: Optional[dict] = None
for tool in value:
openai_function_object: Optional[ChatCompletionToolParamFunctionChunk] = (
None
)
if "function" in tool: # tools list
openai_function_object = ChatCompletionToolParamFunctionChunk( # type: ignore
**tool["function"]
)
elif "name" in tool: # functions list
openai_function_object = ChatCompletionToolParamFunctionChunk(**tool) # type: ignore
# check if grounding
if tool.get("googleSearchRetrieval", None) is not None:
googleSearchRetrieval = tool["googleSearchRetrieval"]
elif openai_function_object is not None:
gtool_func_declaration = FunctionDeclaration(
name=openai_function_object["name"],
description=openai_function_object.get("description", ""),
parameters=openai_function_object.get("parameters", {}),
)
gtool_func_declarations.append(gtool_func_declaration)
else:
# assume it's a provider-specific param
verbose_logger.warning(
"Invalid tool={}. Use `litellm.set_verbose` or `litellm --detailed_debug` to see raw request."
)
_tools = Tools(
function_declarations=gtool_func_declarations,
)
if googleSearchRetrieval is not None:
_tools["googleSearchRetrieval"] = googleSearchRetrieval
return [_tools]
def map_openai_params(
self,
model: str,
@ -589,33 +652,11 @@ class VertexGeminiConfig:
optional_params["frequency_penalty"] = value
if param == "presence_penalty":
optional_params["presence_penalty"] = value
if param == "tools" and isinstance(value, list):
gtool_func_declarations = []
googleSearchRetrieval: Optional[dict] = None
provider_specific_tools: List[dict] = []
for tool in value:
# check if grounding
try:
gtool_func_declaration = FunctionDeclaration(
name=tool["function"]["name"],
description=tool["function"].get("description", ""),
parameters=tool["function"].get("parameters", {}),
)
gtool_func_declarations.append(gtool_func_declaration)
except KeyError:
if tool.get("googleSearchRetrieval", None) is not None:
googleSearchRetrieval = tool["googleSearchRetrieval"]
else:
# assume it's a provider-specific param
verbose_logger.warning(
"Got KeyError parsing tool={}. Assuming it's a provider-specific param. Use `litellm.set_verbose` or `litellm --detailed_debug` to see raw request."
)
_tools = Tools(
function_declarations=gtool_func_declarations,
if (param == "tools" or param == "functions") and isinstance(value, list):
optional_params["tools"] = self._map_function(value=value)
optional_params["litellm_param_is_function_call"] = (
True if param == "functions" else False
)
if googleSearchRetrieval is not None:
_tools["googleSearchRetrieval"] = googleSearchRetrieval
optional_params["tools"] = [_tools] + provider_specific_tools
if param == "tool_choice" and (
isinstance(value, str) or isinstance(value, dict)
):
@ -774,6 +815,7 @@ class VertexLLM(BaseLLM):
model_response: ModelResponse,
logging_obj: litellm.litellm_core_utils.litellm_logging.Logging,
optional_params: dict,
litellm_params: dict,
api_key: str,
data: Union[dict, str],
messages: List,
@ -790,7 +832,6 @@ class VertexLLM(BaseLLM):
)
print_verbose(f"raw model_response: {response.text}")
## RESPONSE OBJECT
try:
completion_response = GenerateContentResponseBody(**response.json()) # type: ignore
@ -898,6 +939,7 @@ class VertexLLM(BaseLLM):
chat_completion_message = {"role": "assistant"}
content_str = ""
tools: List[ChatCompletionToolCallChunk] = []
functions: Optional[ChatCompletionToolCallFunctionChunk] = None
for idx, candidate in enumerate(completion_response["candidates"]):
if "content" not in candidate:
continue
@ -920,19 +962,24 @@ class VertexLLM(BaseLLM):
candidate["content"]["parts"][0]["functionCall"]["args"]
),
)
_tool_response_chunk = ChatCompletionToolCallChunk(
id=f"call_{str(uuid.uuid4())}",
type="function",
function=_function_chunk,
index=candidate.get("index", idx),
)
tools.append(_tool_response_chunk)
if litellm_params.get("litellm_param_is_function_call") is True:
functions = _function_chunk
else:
_tool_response_chunk = ChatCompletionToolCallChunk(
id=f"call_{str(uuid.uuid4())}",
type="function",
function=_function_chunk,
index=candidate.get("index", idx),
)
tools.append(_tool_response_chunk)
chat_completion_message["content"] = (
content_str if len(content_str) > 0 else None
)
chat_completion_message["tool_calls"] = tools
if len(tools) > 0:
chat_completion_message["tool_calls"] = tools
if functions is not None:
chat_completion_message["function_call"] = functions
choice = litellm.Choices(
finish_reason=candidate.get("finishReason", "stop"),
index=candidate.get("index", idx),
@ -1155,6 +1202,15 @@ class VertexLLM(BaseLLM):
else:
url = f"https://{vertex_location}-aiplatform.googleapis.com/{version}/projects/{vertex_project}/locations/{vertex_location}/publishers/google/models/{model}:{endpoint}"
# if model is only numeric chars then it's a fine tuned gemini model
# model = 4965075652664360960
# send to this url: url = f"https://{vertex_location}-aiplatform.googleapis.com/{version}/projects/{vertex_project}/locations/{vertex_location}/endpoints/{model}:{endpoint}"
if model.isdigit():
# It's a fine-tuned Gemini model
url = f"https://{vertex_location}-aiplatform.googleapis.com/{version}/projects/{vertex_project}/locations/{vertex_location}/endpoints/{model}:{endpoint}"
if stream is True:
url += "?alt=sse"
if (
api_base is not None
): # for cloudflare ai gateway - https://github.com/BerriAI/litellm/issues/4317
@ -1220,7 +1276,7 @@ class VertexLLM(BaseLLM):
logging_obj,
stream,
optional_params: dict,
litellm_params=None,
litellm_params: dict,
logger_fn=None,
headers={},
client: Optional[AsyncHTTPHandler] = None,
@ -1254,6 +1310,7 @@ class VertexLLM(BaseLLM):
messages=messages,
print_verbose=print_verbose,
optional_params=optional_params,
litellm_params=litellm_params,
encoding=encoding,
)
@ -1275,7 +1332,7 @@ class VertexLLM(BaseLLM):
vertex_location: Optional[str],
vertex_credentials: Optional[str],
gemini_api_key: Optional[str],
litellm_params=None,
litellm_params: dict,
logger_fn=None,
extra_headers: Optional[dict] = None,
client: Optional[Union[AsyncHTTPHandler, HTTPHandler]] = None,
@ -1287,7 +1344,6 @@ class VertexLLM(BaseLLM):
optional_params=optional_params
)
print_verbose("Incoming Vertex Args - {}".format(locals()))
auth_header, url = self._get_token_and_url(
model=model,
gemini_api_key=gemini_api_key,
@ -1299,7 +1355,6 @@ class VertexLLM(BaseLLM):
api_base=api_base,
should_use_v1beta1_features=should_use_v1beta1_features,
)
print_verbose("Updated URL - {}".format(url))
## TRANSFORMATION ##
### CHECK CONTEXT CACHING ###
@ -1339,6 +1394,18 @@ class VertexLLM(BaseLLM):
)
optional_params.pop("response_schema")
# Check for any 'litellm_param_*' set during optional param mapping
remove_keys = []
for k, v in optional_params.items():
if k.startswith("litellm_param_"):
litellm_params.update({k: v})
remove_keys.append(k)
optional_params = {
k: v for k, v in optional_params.items() if k not in remove_keys
}
try:
content = _gemini_convert_messages_with_history(messages=messages)
tools: Optional[Tools] = optional_params.pop("tools", None)
@ -1470,6 +1537,7 @@ class VertexLLM(BaseLLM):
model_response=model_response,
logging_obj=logging_obj,
optional_params=optional_params,
litellm_params=litellm_params,
api_key="",
data=data, # type: ignore
messages=messages,


@ -82,8 +82,6 @@ from .llms import (
bedrock,
clarifai,
cloudflare,
cohere,
cohere_chat,
gemini,
huggingface_restapi,
maritalk,
@ -105,6 +103,9 @@ from .llms.anthropic_text import AnthropicTextCompletion
from .llms.azure import AzureChatCompletion, _check_dynamic_azure_params
from .llms.azure_text import AzureTextCompletion
from .llms.bedrock_httpx import BedrockConverseLLM, BedrockLLM
from .llms.cohere import chat as cohere_chat
from .llms.cohere import completion as cohere_completion # type: ignore
from .llms.cohere import embed as cohere_embed
from .llms.custom_llm import CustomLLM, custom_chat_llm_router
from .llms.databricks import DatabricksChatCompletion
from .llms.huggingface_restapi import Huggingface
@ -117,7 +118,7 @@ from .llms.prompt_templates.factory import (
prompt_factory,
stringify_json_tool_call_content,
)
from .llms.sagemaker import SagemakerLLM
from .llms.sagemaker.sagemaker import SagemakerLLM
from .llms.text_completion_codestral import CodestralTextCompletion
from .llms.text_to_speech.vertex_ai import VertexTextToSpeechAPI
from .llms.triton import TritonChatCompletion
@ -1651,7 +1652,7 @@ def completion(
if extra_headers is not None:
headers.update(extra_headers)
model_response = cohere.completion(
model_response = cohere_completion.completion(
model=model,
messages=messages,
api_base=api_base,
@ -2014,7 +2015,7 @@ def completion(
model_response=model_response,
print_verbose=print_verbose,
optional_params=new_params,
litellm_params=litellm_params,
litellm_params=litellm_params, # type: ignore
logger_fn=logger_fn,
encoding=encoding,
vertex_location=vertex_ai_location,
@ -2101,7 +2102,7 @@ def completion(
model_response=model_response,
print_verbose=print_verbose,
optional_params=new_params,
litellm_params=litellm_params,
litellm_params=litellm_params, # type: ignore
logger_fn=logger_fn,
encoding=encoding,
vertex_location=vertex_ai_location,
@ -3463,7 +3464,7 @@ def embedding(
headers = extra_headers
else:
headers = {}
response = cohere.embedding(
response = cohere_embed.embedding(
model=model,
input=input,
optional_params=optional_params,


@ -2189,6 +2189,18 @@
"mode": "image_generation",
"source": "https://cloud.google.com/vertex-ai/generative-ai/pricing"
},
"vertex_ai/imagen-3.0-generate-001": {
"cost_per_image": 0.04,
"litellm_provider": "vertex_ai-image-models",
"mode": "image_generation",
"source": "https://cloud.google.com/vertex-ai/generative-ai/pricing"
},
"vertex_ai/imagen-3.0-fast-generate-001": {
"cost_per_image": 0.02,
"litellm_provider": "vertex_ai-image-models",
"mode": "image_generation",
"source": "https://cloud.google.com/vertex-ai/generative-ai/pricing"
},
"text-embedding-004": {
"max_tokens": 3072,
"max_input_tokens": 3072,

7 file diffs suppressed because one or more lines are too long


@ -1 +1 @@
<!DOCTYPE html><html id="__next_error__"><head><meta charSet="utf-8"/><meta name="viewport" content="width=device-width, initial-scale=1"/><link rel="preload" as="script" fetchPriority="low" href="/ui/_next/static/chunks/webpack-193a7eac80c8baba.js" crossorigin=""/><script src="/ui/_next/static/chunks/fd9d1056-f593049e31b05aeb.js" async="" crossorigin=""></script><script src="/ui/_next/static/chunks/69-8316d07d1f41e39f.js" async="" crossorigin=""></script><script src="/ui/_next/static/chunks/main-app-9b4fb13a7db53edf.js" async="" crossorigin=""></script><title>LiteLLM Dashboard</title><meta name="description" content="LiteLLM Proxy Admin UI"/><link rel="icon" href="/ui/favicon.ico" type="image/x-icon" sizes="16x16"/><meta name="next-size-adjust"/><script src="/ui/_next/static/chunks/polyfills-c67a75d1b6f99dc8.js" crossorigin="" noModule=""></script></head><body><script src="/ui/_next/static/chunks/webpack-193a7eac80c8baba.js" crossorigin="" async=""></script><script>(self.__next_f=self.__next_f||[]).push([0]);self.__next_f.push([2,null])</script><script>self.__next_f.push([1,"1:HL[\"/ui/_next/static/media/a34f9d1faa5f3315-s.p.woff2\",\"font\",{\"crossOrigin\":\"\",\"type\":\"font/woff2\"}]\n2:HL[\"/ui/_next/static/css/cd10067a0a3408b4.css\",\"style\",{\"crossOrigin\":\"\"}]\n0:\"$L3\"\n"])</script><script>self.__next_f.push([1,"4:I[47690,[],\"\"]\n6:I[77831,[],\"\"]\n7:I[26520,[\"665\",\"static/chunks/3014691f-b24e8254c7593934.js\",\"936\",\"static/chunks/2f6dbc85-cac2949a76539886.js\",\"505\",\"static/chunks/505-5ff3c318fddfa35c.js\",\"131\",\"static/chunks/131-cb6bfe24e23e121b.js\",\"684\",\"static/chunks/684-16b194c83a169f6d.js\",\"605\",\"static/chunks/605-8e4b96f972af8eaf.js\",\"777\",\"static/chunks/777-50d836152fad178b.js\",\"931\",\"static/chunks/app/page-b77076dbc8208d12.js\"],\"\"]\n8:I[5613,[],\"\"]\n9:I[31778,[],\"\"]\nb:I[48955,[],\"\"]\nc:[]\n"])</script><script>self.__next_f.push([1,"3:[[[\"$\",\"link\",\"0\",{\"rel\":\"stylesheet\",\"href\":\"/ui/_next/static/css/cd10067a0a3408b4.css\",\"precedence\":\"next\",\"crossOrigin\":\"\"}]],[\"$\",\"$L4\",null,{\"buildId\":\"cjLC-FNUY9ME2ZrO3jtsn\",\"assetPrefix\":\"/ui\",\"initialCanonicalUrl\":\"/\",\"initialTree\":[\"\",{\"children\":[\"__PAGE__\",{}]},\"$undefined\",\"$undefined\",true],\"initialSeedData\":[\"\",{\"children\":[\"__PAGE__\",{},[\"$L5\",[\"$\",\"$L6\",null,{\"propsForComponent\":{\"params\":{}},\"Component\":\"$7\",\"isStaticGeneration\":true}],null]]},[null,[\"$\",\"html\",null,{\"lang\":\"en\",\"children\":[\"$\",\"body\",null,{\"className\":\"__className_86ef86\",\"children\":[\"$\",\"$L8\",null,{\"parallelRouterKey\":\"children\",\"segmentPath\":[\"children\"],\"loading\":\"$undefined\",\"loadingStyles\":\"$undefined\",\"loadingScripts\":\"$undefined\",\"hasLoading\":false,\"error\":\"$undefined\",\"errorStyles\":\"$undefined\",\"errorScripts\":\"$undefined\",\"template\":[\"$\",\"$L9\",null,{}],\"templateStyles\":\"$undefined\",\"templateScripts\":\"$undefined\",\"notFound\":[[\"$\",\"title\",null,{\"children\":\"404: This page could not be found.\"}],[\"$\",\"div\",null,{\"style\":{\"fontFamily\":\"system-ui,\\\"Segoe UI\\\",Roboto,Helvetica,Arial,sans-serif,\\\"Apple Color Emoji\\\",\\\"Segoe UI 
Emoji\\\"\",\"height\":\"100vh\",\"textAlign\":\"center\",\"display\":\"flex\",\"flexDirection\":\"column\",\"alignItems\":\"center\",\"justifyContent\":\"center\"},\"children\":[\"$\",\"div\",null,{\"children\":[[\"$\",\"style\",null,{\"dangerouslySetInnerHTML\":{\"__html\":\"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}\"}}],[\"$\",\"h1\",null,{\"className\":\"next-error-h1\",\"style\":{\"display\":\"inline-block\",\"margin\":\"0 20px 0 0\",\"padding\":\"0 23px 0 0\",\"fontSize\":24,\"fontWeight\":500,\"verticalAlign\":\"top\",\"lineHeight\":\"49px\"},\"children\":\"404\"}],[\"$\",\"div\",null,{\"style\":{\"display\":\"inline-block\"},\"children\":[\"$\",\"h2\",null,{\"style\":{\"fontSize\":14,\"fontWeight\":400,\"lineHeight\":\"49px\",\"margin\":0},\"children\":\"This page could not be found.\"}]}]]}]}]],\"notFoundStyles\":[],\"styles\":null}]}]}],null]],\"initialHead\":[false,\"$La\"],\"globalErrorComponent\":\"$b\",\"missingSlots\":\"$Wc\"}]]\n"])</script><script>self.__next_f.push([1,"a:[[\"$\",\"meta\",\"0\",{\"name\":\"viewport\",\"content\":\"width=device-width, initial-scale=1\"}],[\"$\",\"meta\",\"1\",{\"charSet\":\"utf-8\"}],[\"$\",\"title\",\"2\",{\"children\":\"LiteLLM Dashboard\"}],[\"$\",\"meta\",\"3\",{\"name\":\"description\",\"content\":\"LiteLLM Proxy Admin UI\"}],[\"$\",\"link\",\"4\",{\"rel\":\"icon\",\"href\":\"/ui/favicon.ico\",\"type\":\"image/x-icon\",\"sizes\":\"16x16\"}],[\"$\",\"meta\",\"5\",{\"name\":\"next-size-adjust\"}]]\n5:null\n"])</script><script>self.__next_f.push([1,""])</script></body></html>
<!DOCTYPE html><html id="__next_error__"><head><meta charSet="utf-8"/><meta name="viewport" content="width=device-width, initial-scale=1"/><link rel="preload" as="script" fetchPriority="low" href="/ui/_next/static/chunks/webpack-193a7eac80c8baba.js" crossorigin=""/><script src="/ui/_next/static/chunks/fd9d1056-f593049e31b05aeb.js" async="" crossorigin=""></script><script src="/ui/_next/static/chunks/69-8316d07d1f41e39f.js" async="" crossorigin=""></script><script src="/ui/_next/static/chunks/main-app-9b4fb13a7db53edf.js" async="" crossorigin=""></script><title>LiteLLM Dashboard</title><meta name="description" content="LiteLLM Proxy Admin UI"/><link rel="icon" href="/ui/favicon.ico" type="image/x-icon" sizes="16x16"/><meta name="next-size-adjust"/><script src="/ui/_next/static/chunks/polyfills-c67a75d1b6f99dc8.js" crossorigin="" noModule=""></script></head><body><script src="/ui/_next/static/chunks/webpack-193a7eac80c8baba.js" crossorigin="" async=""></script><script>(self.__next_f=self.__next_f||[]).push([0]);self.__next_f.push([2,null])</script><script>self.__next_f.push([1,"1:HL[\"/ui/_next/static/media/a34f9d1faa5f3315-s.p.woff2\",\"font\",{\"crossOrigin\":\"\",\"type\":\"font/woff2\"}]\n2:HL[\"/ui/_next/static/css/cd10067a0a3408b4.css\",\"style\",{\"crossOrigin\":\"\"}]\n0:\"$L3\"\n"])</script><script>self.__next_f.push([1,"4:I[47690,[],\"\"]\n6:I[77831,[],\"\"]\n7:I[18018,[\"665\",\"static/chunks/3014691f-b24e8254c7593934.js\",\"936\",\"static/chunks/2f6dbc85-cac2949a76539886.js\",\"505\",\"static/chunks/505-5ff3c318fddfa35c.js\",\"131\",\"static/chunks/131-73d0a4f8e09896fe.js\",\"684\",\"static/chunks/684-16b194c83a169f6d.js\",\"605\",\"static/chunks/605-35a95945041f7699.js\",\"777\",\"static/chunks/777-5360b5460eba0779.js\",\"931\",\"static/chunks/app/page-01641b817a14ea88.js\"],\"\"]\n8:I[5613,[],\"\"]\n9:I[31778,[],\"\"]\nb:I[48955,[],\"\"]\nc:[]\n"])</script><script>self.__next_f.push([1,"3:[[[\"$\",\"link\",\"0\",{\"rel\":\"stylesheet\",\"href\":\"/ui/_next/static/css/cd10067a0a3408b4.css\",\"precedence\":\"next\",\"crossOrigin\":\"\"}]],[\"$\",\"$L4\",null,{\"buildId\":\"LO0Sm6uVF0pa4RdHSL0dN\",\"assetPrefix\":\"/ui\",\"initialCanonicalUrl\":\"/\",\"initialTree\":[\"\",{\"children\":[\"__PAGE__\",{}]},\"$undefined\",\"$undefined\",true],\"initialSeedData\":[\"\",{\"children\":[\"__PAGE__\",{},[\"$L5\",[\"$\",\"$L6\",null,{\"propsForComponent\":{\"params\":{}},\"Component\":\"$7\",\"isStaticGeneration\":true}],null]]},[null,[\"$\",\"html\",null,{\"lang\":\"en\",\"children\":[\"$\",\"body\",null,{\"className\":\"__className_86ef86\",\"children\":[\"$\",\"$L8\",null,{\"parallelRouterKey\":\"children\",\"segmentPath\":[\"children\"],\"loading\":\"$undefined\",\"loadingStyles\":\"$undefined\",\"loadingScripts\":\"$undefined\",\"hasLoading\":false,\"error\":\"$undefined\",\"errorStyles\":\"$undefined\",\"errorScripts\":\"$undefined\",\"template\":[\"$\",\"$L9\",null,{}],\"templateStyles\":\"$undefined\",\"templateScripts\":\"$undefined\",\"notFound\":[[\"$\",\"title\",null,{\"children\":\"404: This page could not be found.\"}],[\"$\",\"div\",null,{\"style\":{\"fontFamily\":\"system-ui,\\\"Segoe UI\\\",Roboto,Helvetica,Arial,sans-serif,\\\"Apple Color Emoji\\\",\\\"Segoe UI 
Emoji\\\"\",\"height\":\"100vh\",\"textAlign\":\"center\",\"display\":\"flex\",\"flexDirection\":\"column\",\"alignItems\":\"center\",\"justifyContent\":\"center\"},\"children\":[\"$\",\"div\",null,{\"children\":[[\"$\",\"style\",null,{\"dangerouslySetInnerHTML\":{\"__html\":\"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}\"}}],[\"$\",\"h1\",null,{\"className\":\"next-error-h1\",\"style\":{\"display\":\"inline-block\",\"margin\":\"0 20px 0 0\",\"padding\":\"0 23px 0 0\",\"fontSize\":24,\"fontWeight\":500,\"verticalAlign\":\"top\",\"lineHeight\":\"49px\"},\"children\":\"404\"}],[\"$\",\"div\",null,{\"style\":{\"display\":\"inline-block\"},\"children\":[\"$\",\"h2\",null,{\"style\":{\"fontSize\":14,\"fontWeight\":400,\"lineHeight\":\"49px\",\"margin\":0},\"children\":\"This page could not be found.\"}]}]]}]}]],\"notFoundStyles\":[],\"styles\":null}]}]}],null]],\"initialHead\":[false,\"$La\"],\"globalErrorComponent\":\"$b\",\"missingSlots\":\"$Wc\"}]]\n"])</script><script>self.__next_f.push([1,"a:[[\"$\",\"meta\",\"0\",{\"name\":\"viewport\",\"content\":\"width=device-width, initial-scale=1\"}],[\"$\",\"meta\",\"1\",{\"charSet\":\"utf-8\"}],[\"$\",\"title\",\"2\",{\"children\":\"LiteLLM Dashboard\"}],[\"$\",\"meta\",\"3\",{\"name\":\"description\",\"content\":\"LiteLLM Proxy Admin UI\"}],[\"$\",\"link\",\"4\",{\"rel\":\"icon\",\"href\":\"/ui/favicon.ico\",\"type\":\"image/x-icon\",\"sizes\":\"16x16\"}],[\"$\",\"meta\",\"5\",{\"name\":\"next-size-adjust\"}]]\n5:null\n"])</script><script>self.__next_f.push([1,""])</script></body></html>


@ -1,7 +1,7 @@
2:I[77831,[],""]
3:I[26520,["665","static/chunks/3014691f-b24e8254c7593934.js","936","static/chunks/2f6dbc85-cac2949a76539886.js","505","static/chunks/505-5ff3c318fddfa35c.js","131","static/chunks/131-cb6bfe24e23e121b.js","684","static/chunks/684-16b194c83a169f6d.js","605","static/chunks/605-8e4b96f972af8eaf.js","777","static/chunks/777-50d836152fad178b.js","931","static/chunks/app/page-b77076dbc8208d12.js"],""]
3:I[18018,["665","static/chunks/3014691f-b24e8254c7593934.js","936","static/chunks/2f6dbc85-cac2949a76539886.js","505","static/chunks/505-5ff3c318fddfa35c.js","131","static/chunks/131-73d0a4f8e09896fe.js","684","static/chunks/684-16b194c83a169f6d.js","605","static/chunks/605-35a95945041f7699.js","777","static/chunks/777-5360b5460eba0779.js","931","static/chunks/app/page-01641b817a14ea88.js"],""]
4:I[5613,[],""]
5:I[31778,[],""]
0:["cjLC-FNUY9ME2ZrO3jtsn",[[["",{"children":["__PAGE__",{}]},"$undefined","$undefined",true],["",{"children":["__PAGE__",{},["$L1",["$","$L2",null,{"propsForComponent":{"params":{}},"Component":"$3","isStaticGeneration":true}],null]]},[null,["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_86ef86","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"loading":"$undefined","loadingStyles":"$undefined","loadingScripts":"$undefined","hasLoading":false,"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[],"styles":null}]}]}],null]],[[["$","link","0",{"rel":"stylesheet","href":"/ui/_next/static/css/cd10067a0a3408b4.css","precedence":"next","crossOrigin":""}]],"$L6"]]]]
0:["LO0Sm6uVF0pa4RdHSL0dN",[[["",{"children":["__PAGE__",{}]},"$undefined","$undefined",true],["",{"children":["__PAGE__",{},["$L1",["$","$L2",null,{"propsForComponent":{"params":{}},"Component":"$3","isStaticGeneration":true}],null]]},[null,["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_86ef86","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"loading":"$undefined","loadingStyles":"$undefined","loadingScripts":"$undefined","hasLoading":false,"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[],"styles":null}]}]}],null]],[[["$","link","0",{"rel":"stylesheet","href":"/ui/_next/static/css/cd10067a0a3408b4.css","precedence":"next","crossOrigin":""}]],"$L6"]]]]
6:[["$","meta","0",{"name":"viewport","content":"width=device-width, initial-scale=1"}],["$","meta","1",{"charSet":"utf-8"}],["$","title","2",{"children":"LiteLLM Dashboard"}],["$","meta","3",{"name":"description","content":"LiteLLM Proxy Admin UI"}],["$","link","4",{"rel":"icon","href":"/ui/favicon.ico","type":"image/x-icon","sizes":"16x16"}],["$","meta","5",{"name":"next-size-adjust"}]]
1:null

File diff suppressed because one or more lines are too long


@ -1,7 +1,7 @@
2:I[77831,[],""]
3:I[87494,["505","static/chunks/505-5ff3c318fddfa35c.js","131","static/chunks/131-cb6bfe24e23e121b.js","777","static/chunks/777-50d836152fad178b.js","418","static/chunks/app/model_hub/page-79eee78ed9fccf89.js"],""]
3:I[87494,["505","static/chunks/505-5ff3c318fddfa35c.js","131","static/chunks/131-73d0a4f8e09896fe.js","777","static/chunks/777-5360b5460eba0779.js","418","static/chunks/app/model_hub/page-baad96761e038837.js"],""]
4:I[5613,[],""]
5:I[31778,[],""]
0:["cjLC-FNUY9ME2ZrO3jtsn",[[["",{"children":["model_hub",{"children":["__PAGE__",{}]}]},"$undefined","$undefined",true],["",{"children":["model_hub",{"children":["__PAGE__",{},["$L1",["$","$L2",null,{"propsForComponent":{"params":{}},"Component":"$3","isStaticGeneration":true}],null]]},["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children","model_hub","children"],"loading":"$undefined","loadingStyles":"$undefined","loadingScripts":"$undefined","hasLoading":false,"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined","styles":null}]]},[null,["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_86ef86","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"loading":"$undefined","loadingStyles":"$undefined","loadingScripts":"$undefined","hasLoading":false,"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[],"styles":null}]}]}],null]],[[["$","link","0",{"rel":"stylesheet","href":"/ui/_next/static/css/cd10067a0a3408b4.css","precedence":"next","crossOrigin":""}]],"$L6"]]]]
0:["LO0Sm6uVF0pa4RdHSL0dN",[[["",{"children":["model_hub",{"children":["__PAGE__",{}]}]},"$undefined","$undefined",true],["",{"children":["model_hub",{"children":["__PAGE__",{},["$L1",["$","$L2",null,{"propsForComponent":{"params":{}},"Component":"$3","isStaticGeneration":true}],null]]},["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children","model_hub","children"],"loading":"$undefined","loadingStyles":"$undefined","loadingScripts":"$undefined","hasLoading":false,"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined","styles":null}]]},[null,["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_86ef86","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"loading":"$undefined","loadingStyles":"$undefined","loadingScripts":"$undefined","hasLoading":false,"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[],"styles":null}]}]}],null]],[[["$","link","0",{"rel":"stylesheet","href":"/ui/_next/static/css/cd10067a0a3408b4.css","precedence":"next","crossOrigin":""}]],"$L6"]]]]
6:[["$","meta","0",{"name":"viewport","content":"width=device-width, initial-scale=1"}],["$","meta","1",{"charSet":"utf-8"}],["$","title","2",{"children":"LiteLLM Dashboard"}],["$","meta","3",{"name":"description","content":"LiteLLM Proxy Admin UI"}],["$","link","4",{"rel":"icon","href":"/ui/favicon.ico","type":"image/x-icon","sizes":"16x16"}],["$","meta","5",{"name":"next-size-adjust"}]]
1:null

File diff suppressed because one or more lines are too long

View file

@@ -1,7 +1,7 @@
2:I[77831,[],""]
3:I[667,["665","static/chunks/3014691f-b24e8254c7593934.js","505","static/chunks/505-5ff3c318fddfa35c.js","684","static/chunks/684-16b194c83a169f6d.js","777","static/chunks/777-50d836152fad178b.js","461","static/chunks/app/onboarding/page-8be9c2a4a5c886c5.js"],""]
3:I[667,["665","static/chunks/3014691f-b24e8254c7593934.js","505","static/chunks/505-5ff3c318fddfa35c.js","684","static/chunks/684-16b194c83a169f6d.js","777","static/chunks/777-5360b5460eba0779.js","461","static/chunks/app/onboarding/page-0034957a9fa387e0.js"],""]
4:I[5613,[],""]
5:I[31778,[],""]
0:["cjLC-FNUY9ME2ZrO3jtsn",[[["",{"children":["onboarding",{"children":["__PAGE__",{}]}]},"$undefined","$undefined",true],["",{"children":["onboarding",{"children":["__PAGE__",{},["$L1",["$","$L2",null,{"propsForComponent":{"params":{}},"Component":"$3","isStaticGeneration":true}],null]]},["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children","onboarding","children"],"loading":"$undefined","loadingStyles":"$undefined","loadingScripts":"$undefined","hasLoading":false,"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined","styles":null}]]},[null,["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_86ef86","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"loading":"$undefined","loadingStyles":"$undefined","loadingScripts":"$undefined","hasLoading":false,"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[],"styles":null}]}]}],null]],[[["$","link","0",{"rel":"stylesheet","href":"/ui/_next/static/css/cd10067a0a3408b4.css","precedence":"next","crossOrigin":""}]],"$L6"]]]]
0:["LO0Sm6uVF0pa4RdHSL0dN",[[["",{"children":["onboarding",{"children":["__PAGE__",{}]}]},"$undefined","$undefined",true],["",{"children":["onboarding",{"children":["__PAGE__",{},["$L1",["$","$L2",null,{"propsForComponent":{"params":{}},"Component":"$3","isStaticGeneration":true}],null]]},["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children","onboarding","children"],"loading":"$undefined","loadingStyles":"$undefined","loadingScripts":"$undefined","hasLoading":false,"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined","styles":null}]]},[null,["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_86ef86","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"loading":"$undefined","loadingStyles":"$undefined","loadingScripts":"$undefined","hasLoading":false,"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[],"styles":null}]}]}],null]],[[["$","link","0",{"rel":"stylesheet","href":"/ui/_next/static/css/cd10067a0a3408b4.css","precedence":"next","crossOrigin":""}]],"$L6"]]]]
6:[["$","meta","0",{"name":"viewport","content":"width=device-width, initial-scale=1"}],["$","meta","1",{"charSet":"utf-8"}],["$","title","2",{"children":"LiteLLM Dashboard"}],["$","meta","3",{"name":"description","content":"LiteLLM Proxy Admin UI"}],["$","link","4",{"rel":"icon","href":"/ui/favicon.ico","type":"image/x-icon","sizes":"16x16"}],["$","meta","5",{"name":"next-size-adjust"}]]
1:null

View file

@@ -1299,7 +1299,6 @@ class LiteLLM_VerificationToken(LiteLLMBase):
model_max_budget: Dict = {}
soft_budget_cooldown: bool = False
litellm_budget_table: Optional[dict] = None
org_id: Optional[str] = None # org id for a given key
model_config = ConfigDict(protected_namespaces=())

View file

@@ -966,3 +966,96 @@ async def delete_verification_token(tokens: List, user_id: Optional[str] = None)
verbose_proxy_logger.debug(traceback.format_exc())
raise e
return deleted_tokens
@router.post(
"/key/{key:path}/regenerate",
tags=["key management"],
dependencies=[Depends(user_api_key_auth)],
)
@management_endpoint_wrapper
async def regenerate_key_fn(
key: str,
user_api_key_dict: UserAPIKeyAuth = Depends(user_api_key_auth),
litellm_changed_by: Optional[str] = Header(
None,
description="The litellm-changed-by header enables tracking of actions performed by authorized users on behalf of other users, providing an audit trail for accountability",
),
) -> GenerateKeyResponse:
from litellm.proxy.proxy_server import (
hash_token,
premium_user,
prisma_client,
user_api_key_cache,
)
"""
Endpoint for regenerating a key
"""
if premium_user is not True:
raise ValueError(
f"Regenerating Virtual Keys is an Enterprise feature, {CommonProxyErrors.not_premium_user.value}"
)
# Check if key exists, raise exception if key is not in the DB
### 1. Create New copy that is duplicate of existing key
######################################################################
# create duplicate of existing key
# set token = new token generated
# insert new token in DB
# create hash of token
if prisma_client is None:
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail={"error": "DB not connected. prisma_client is None"},
)
if "sk" not in key:
hashed_api_key = key
else:
hashed_api_key = hash_token(key)
_key_in_db = await prisma_client.db.litellm_verificationtoken.find_unique(
where={"token": hashed_api_key},
)
if _key_in_db is None:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail={"error": f"Key {key} not found."},
)
verbose_proxy_logger.debug("key_in_db: %s", _key_in_db)
new_token = f"sk-{secrets.token_urlsafe(16)}"
new_token_hash = hash_token(new_token)
new_token_key_name = f"sk-...{new_token[-4:]}"
# update new token in DB
updated_token = await prisma_client.db.litellm_verificationtoken.update(
where={"token": hashed_api_key},
data={
"token": new_token_hash,
"key_name": new_token_key_name,
},
)
updated_token_dict = {}
if updated_token is not None:
updated_token_dict = dict(updated_token)
updated_token_dict["token"] = new_token
### 3. remove existing key entry from cache
######################################################################
if key:
user_api_key_cache.delete_cache(key)
if hashed_api_key:
user_api_key_cache.delete_cache(hashed_api_key)
return GenerateKeyResponse(
**updated_token_dict,
)

View file

@@ -51,6 +51,10 @@ while retry_count < max_retries and exit_code != 0:
retry_count += 1
print(f"Attempt {retry_count}...") # noqa
# run prisma generate
result = subprocess.run(["prisma", "generate"], capture_output=True)
exit_code = result.returncode
# Run the Prisma db push command
result = subprocess.run(
["prisma", "db", "push", "--accept-data-loss"], capture_output=True

View file

@@ -2121,6 +2121,90 @@ def test_get_token_url():
pass
@pytest.mark.asyncio
async def test_completion_fine_tuned_model():
# load_vertex_ai_credentials()
mock_response = AsyncMock()
def return_val():
return {
"candidates": [
{
"content": {
"role": "model",
"parts": [
{
"text": "A canvas vast, a boundless blue,\nWhere clouds paint tales and winds imbue.\nThe sun descends in fiery hue,\nStars shimmer bright, a gentle few.\n\nThe moon ascends, a pearl of light,\nGuiding travelers through the night.\nThe sky embraces, holds all tight,\nA tapestry of wonder, bright."
}
],
},
"finishReason": "STOP",
"safetyRatings": [
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"probability": "NEGLIGIBLE",
"probabilityScore": 0.028930664,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severityScore": 0.041992188,
},
# ... other safety ratings ...
],
"avgLogprobs": -0.95772853367765187,
}
],
"usageMetadata": {
"promptTokenCount": 7,
"candidatesTokenCount": 71,
"totalTokenCount": 78,
},
}
mock_response.json = return_val
mock_response.status_code = 200
expected_payload = {
"contents": [
{"role": "user", "parts": [{"text": "Write a short poem about the sky"}]}
],
"generationConfig": {},
}
with patch(
"litellm.llms.custom_httpx.http_handler.AsyncHTTPHandler.post",
return_value=mock_response,
) as mock_post:
# Act: Call the litellm.completion function
response = await litellm.acompletion(
model="vertex_ai_beta/4965075652664360960",
messages=[{"role": "user", "content": "Write a short poem about the sky"}],
)
# Assert
mock_post.assert_called_once()
url, kwargs = mock_post.call_args
print("url = ", url)
# this is the fine-tuned model endpoint
assert (
url[0]
== "https://us-central1-aiplatform.googleapis.com/v1/projects/adroit-crow-413218/locations/us-central1/endpoints/4965075652664360960:generateContent"
)
print("call args = ", kwargs)
args_to_vertexai = kwargs["json"]
print("args to vertex ai call:", args_to_vertexai)
assert args_to_vertexai == expected_payload
assert response.choices[0].message.content.startswith("A canvas vast")
assert response.choices[0].finish_reason == "stop"
assert response.usage.total_tokens == 78
# Optional: Print for debugging
print("Arguments passed to Vertex AI:", args_to_vertexai)
print("Response:", response)
def mock_gemini_request(*args, **kwargs):
print(f"kwargs: {kwargs}")
mock_response = MagicMock()

View file

@@ -2691,8 +2691,61 @@ def test_completion_hf_model_no_provider():
# test_completion_hf_model_no_provider()
@pytest.mark.skip(reason="anyscale stopped serving public api endpoints")
def test_completion_anyscale_with_functions():
def gemini_mock_post(*args, **kwargs):
mock_response = MagicMock()
mock_response.status_code = 200
mock_response.headers = {"Content-Type": "application/json"}
mock_response.json = MagicMock(
return_value={
"candidates": [
{
"content": {
"parts": [
{
"functionCall": {
"name": "get_current_weather",
"args": {"location": "Boston, MA"},
}
}
],
"role": "model",
},
"finishReason": "STOP",
"index": 0,
"safetyRatings": [
{
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"probability": "NEGLIGIBLE",
},
{
"category": "HARM_CATEGORY_HARASSMENT",
"probability": "NEGLIGIBLE",
},
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"probability": "NEGLIGIBLE",
},
{
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"probability": "NEGLIGIBLE",
},
],
}
],
"usageMetadata": {
"promptTokenCount": 86,
"candidatesTokenCount": 19,
"totalTokenCount": 105,
},
}
)
return mock_response
@pytest.mark.asyncio
async def test_completion_functions_param():
litellm.set_verbose = True
function1 = [
{
"name": "get_current_weather",
@@ -2711,18 +2764,33 @@ def test_completion_anyscale_with_functions():
}
]
try:
messages = [{"role": "user", "content": "What is the weather like in Boston?"}]
response = completion(
model="anyscale/mistralai/Mistral-7B-Instruct-v0.1",
messages=messages,
functions=function1,
)
# Add any assertions here to check the response
print(response)
from litellm.llms.custom_httpx.http_handler import AsyncHTTPHandler
cost = litellm.completion_cost(completion_response=response)
print("cost to make anyscale completion=", cost)
assert cost > 0.0
messages = [{"role": "user", "content": "What is the weather like in Boston?"}]
client = AsyncHTTPHandler(concurrent_limit=1)
with patch.object(client, "post", side_effect=gemini_mock_post) as mock_client:
response: litellm.ModelResponse = await litellm.acompletion(
model="gemini/gemini-1.5-pro",
messages=messages,
functions=function1,
client=client,
)
print(response)
# Add any assertions here to check the response
mock_client.assert_called()
print(f"mock_client.call_args.kwargs: {mock_client.call_args.kwargs}")
assert "tools" in mock_client.call_args.kwargs["json"]
assert (
"litellm_param_is_function_call"
not in mock_client.call_args.kwargs["json"]
)
assert (
"litellm_param_is_function_call"
not in mock_client.call_args.kwargs["json"]["generationConfig"]
)
assert response.choices[0].message.function_call is not None
except Exception as e:
pytest.fail(f"Error occurred: {e}")

View file

@@ -142,6 +142,8 @@ def test_parallel_function_call(model):
drop_params=True,
) # get a new response from the model where it can see the function response
print("second response\n", second_response)
except litellm.InternalServerError:
pass
except litellm.RateLimitError:
pass
except Exception as e:

View file

@@ -56,6 +56,7 @@ from litellm.proxy.management_endpoints.key_management_endpoints import (
generate_key_fn,
generate_key_helper_fn,
info_key_fn,
regenerate_key_fn,
update_key_fn,
)
from litellm.proxy.management_endpoints.team_endpoints import (
@@ -2935,3 +2936,105 @@ async def test_team_access_groups(prisma_client):
"not allowed to call model" in e.message
and "Allowed team models" in e.message
)
################ Unit Tests for testing regeneration of keys ###########
@pytest.mark.asyncio()
async def test_regenerate_api_key(prisma_client):
litellm.set_verbose = True
setattr(litellm.proxy.proxy_server, "prisma_client", prisma_client)
setattr(litellm.proxy.proxy_server, "master_key", "sk-1234")
await litellm.proxy.proxy_server.prisma_client.connect()
import uuid
# generate new key
key_alias = f"test_alias_regenerate_key-{uuid.uuid4()}"
spend = 100
max_budget = 400
models = ["fake-openai-endpoint"]
new_key = await generate_key_fn(
data=GenerateKeyRequest(
key_alias=key_alias, spend=spend, max_budget=max_budget, models=models
),
user_api_key_dict=UserAPIKeyAuth(
user_role=LitellmUserRoles.PROXY_ADMIN,
api_key="sk-1234",
user_id="1234",
),
)
generated_key = new_key.key
print(generated_key)
# assert the new key works as expected
request = Request(scope={"type": "http"})
request._url = URL(url="/chat/completions")
async def return_body():
return_string = f'{{"model": "fake-openai-endpoint"}}'
# return string as bytes
return return_string.encode()
request.body = return_body
result = await user_api_key_auth(request=request, api_key=f"Bearer {generated_key}")
print(result)
# regenerate the key
new_key = await regenerate_key_fn(
key=generated_key,
user_api_key_dict=UserAPIKeyAuth(
user_role=LitellmUserRoles.PROXY_ADMIN,
api_key="sk-1234",
user_id="1234",
),
)
print("response from regenerate_key_fn", new_key)
# assert the new key works as expected
request = Request(scope={"type": "http"})
request._url = URL(url="/chat/completions")
async def return_body_2():
return_string = f'{{"model": "fake-openai-endpoint"}}'
# return string as bytes
return return_string.encode()
request.body = return_body_2
result = await user_api_key_auth(request=request, api_key=f"Bearer {new_key.key}")
print(result)
# assert the old key stops working
request = Request(scope={"type": "http"})
request._url = URL(url="/chat/completions")
async def return_body_3():
return_string = f'{{"model": "fake-openai-endpoint"}}'
# return string as bytes
return return_string.encode()
request.body = return_body_3
try:
result = await user_api_key_auth(
request=request, api_key=f"Bearer {generated_key}"
)
print(result)
pytest.fail(f"This should have failed!. the key has been regenerated")
except Exception as e:
print("got expected exception", e)
assert "Invalid proxy server token passed" in e.message
# Check that the regenerated key has the same spend, max_budget, models and key_alias
assert new_key.spend == spend, f"Expected spend {spend} but got {new_key.spend}"
assert (
new_key.max_budget == max_budget
), f"Expected max_budget {max_budget} but got {new_key.max_budget}"
assert (
new_key.key_alias == key_alias
), f"Expected key_alias {key_alias} but got {new_key.key_alias}"
assert (
new_key.models == models
), f"Expected models {models} but got {new_key.models}"
assert new_key.key_name == f"sk-...{new_key.key[-4:]}"
pass

View file

@@ -120,15 +120,24 @@ async def test_completion_sagemaker_messages_api(sync_mode):
@pytest.mark.asyncio()
@pytest.mark.parametrize("sync_mode", [False, True])
async def test_completion_sagemaker_stream(sync_mode):
@pytest.mark.parametrize(
"model",
[
"sagemaker_chat/huggingface-pytorch-tgi-inference-2024-08-23-15-48-59-245",
"sagemaker/jumpstart-dft-hf-textgeneration1-mp-20240815-185614",
],
)
async def test_completion_sagemaker_stream(sync_mode, model):
try:
from litellm.tests.test_streaming import streaming_format_tests
litellm.set_verbose = False
print("testing sagemaker")
verbose_logger.setLevel(logging.DEBUG)
full_text = ""
if sync_mode is True:
response = litellm.completion(
model="sagemaker/jumpstart-dft-hf-textgeneration1-mp-20240815-185614",
model=model,
messages=[
{"role": "user", "content": "hi - what is ur name"},
],
@@ -138,14 +147,15 @@ async def test_completion_sagemaker_stream(sync_mode):
input_cost_per_second=0.000420,
)
for chunk in response:
for idx, chunk in enumerate(response):
print(chunk)
streaming_format_tests(idx=idx, chunk=chunk)
full_text += chunk.choices[0].delta.content or ""
print("SYNC RESPONSE full text", full_text)
else:
response = await litellm.acompletion(
model="sagemaker/jumpstart-dft-hf-textgeneration1-mp-20240815-185614",
model=model,
messages=[
{"role": "user", "content": "hi - what is ur name"},
],
@@ -156,10 +166,12 @@ async def test_completion_sagemaker_stream(sync_mode):
)
print("streaming response")
idx = 0
async for chunk in response:
print(chunk)
streaming_format_tests(idx=idx, chunk=chunk)
full_text += chunk.choices[0].delta.content or ""
idx += 1
print("ASYNC RESPONSE full text", full_text)

View file

@@ -755,27 +755,40 @@ async def test_completion_gemini_stream(sync_mode):
try:
litellm.set_verbose = True
print("Streaming gemini response")
messages = [
{"role": "system", "content": "You are a helpful assistant."},
function1 = [
{
"role": "user",
"content": "Who was Alexander?",
},
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
},
"required": ["location"],
},
}
]
messages = [{"role": "user", "content": "What is the weather like in Boston?"}]
print("testing gemini streaming")
complete_response = ""
# Add any assertions here to check the response
non_empty_chunks = 0
chunks = []
if sync_mode:
response = completion(
model="gemini/gemini-1.5-flash",
messages=messages,
stream=True,
functions=function1,
)
for idx, chunk in enumerate(response):
print(chunk)
chunks.append(chunk)
# print(chunk.choices[0].delta)
chunk, finished = streaming_format_tests(idx, chunk)
if finished:
@@ -787,11 +800,13 @@ async def test_completion_gemini_stream(sync_mode):
model="gemini/gemini-1.5-flash",
messages=messages,
stream=True,
functions=function1,
)
idx = 0
async for chunk in response:
print(chunk)
chunks.append(chunk)
# print(chunk.choices[0].delta)
chunk, finished = streaming_format_tests(idx, chunk)
if finished:
@@ -800,10 +815,17 @@ async def test_completion_gemini_stream(sync_mode):
complete_response += chunk
idx += 1
if complete_response.strip() == "":
raise Exception("Empty response received")
# if complete_response.strip() == "":
# raise Exception("Empty response received")
print(f"completion_response: {complete_response}")
assert non_empty_chunks > 1
complete_response = litellm.stream_chunk_builder(
chunks=chunks, messages=messages
)
assert complete_response.choices[0].message.function_call is not None
# assert non_empty_chunks > 1
except litellm.InternalServerError as e:
pass
except litellm.RateLimitError as e:

View file

@@ -29,6 +29,7 @@ from openai.types.beta.thread_create_params import (
from openai.types.beta.threads.message import Message as OpenAIMessage
from openai.types.beta.threads.message_content import MessageContent
from openai.types.beta.threads.run import Run
from openai.types.chat import ChatCompletionChunk
from pydantic import BaseModel, Field
from typing_extensions import Dict, Required, TypedDict, override
@@ -458,6 +459,7 @@ class ChatCompletionResponseMessage(TypedDict, total=False):
content: Optional[str]
tool_calls: List[ChatCompletionToolCallChunk]
role: Literal["assistant"]
function_call: ChatCompletionToolCallFunctionChunk
class ChatCompletionUsageBlock(TypedDict):
@@ -466,6 +468,13 @@ class ChatCompletionUsageBlock(TypedDict):
total_tokens: int
class OpenAIChatCompletionChunk(ChatCompletionChunk):
def __init__(self, **kwargs):
# Set the 'object' kwarg to 'chat.completion.chunk'
kwargs["object"] = "chat.completion.chunk"
super().__init__(**kwargs)
class Hyperparameters(BaseModel):
batch_size: Optional[Union[str, int]] = None # "Number of examples in each batch."
learning_rate_multiplier: Optional[Union[str, float]] = (

View file

@@ -90,7 +90,7 @@ class Schema(TypedDict, total=False):
class FunctionDeclaration(TypedDict, total=False):
name: Required[str]
description: str
parameters: Schema
parameters: Union[Schema, dict]
response: Schema

View file

@@ -5,11 +5,16 @@ from enum import Enum
from typing import Any, Dict, List, Literal, Optional, Tuple, Union
from openai._models import BaseModel as OpenAIObject
from openai.types.completion_usage import CompletionUsage
from pydantic import ConfigDict, Field, PrivateAttr
from typing_extensions import Callable, Dict, Required, TypedDict, override
from ..litellm_core_utils.core_helpers import map_finish_reason
from .llms.openai import ChatCompletionToolCallChunk, ChatCompletionUsageBlock
from .llms.openai import (
ChatCompletionToolCallChunk,
ChatCompletionUsageBlock,
OpenAIChatCompletionChunk,
)
def _generate_id(): # private helper function
@@ -85,7 +90,7 @@ class GenericStreamingChunk(TypedDict, total=False):
tool_use: Optional[ChatCompletionToolCallChunk]
is_finished: Required[bool]
finish_reason: Required[str]
usage: Optional[ChatCompletionUsageBlock]
usage: Required[Optional[ChatCompletionUsageBlock]]
index: int
# use this dict if you want to return any provider specific fields in the response
@@ -448,9 +453,6 @@ class Choices(OpenAIObject):
setattr(self, key, value)
from openai.types.completion_usage import CompletionUsage
class Usage(CompletionUsage):
def __init__(
self,
@ -499,7 +501,7 @@ class StreamingChoices(OpenAIObject):
):
super(StreamingChoices, self).__init__(**params)
if finish_reason:
self.finish_reason = finish_reason
self.finish_reason = map_finish_reason(finish_reason)
else:
self.finish_reason = None
self.index = index
@ -535,6 +537,17 @@ class StreamingChoices(OpenAIObject):
setattr(self, key, value)
class StreamingChatCompletionChunk(OpenAIChatCompletionChunk):
def __init__(self, **kwargs):
new_choices = []
for choice in kwargs["choices"]:
new_choice = StreamingChoices(**choice).model_dump()
new_choices.append(new_choice)
kwargs["choices"] = new_choices
super().__init__(**kwargs)
class ModelResponse(OpenAIObject):
id: str
"""A unique identifier for the completion."""
@@ -1231,3 +1244,20 @@ class StandardLoggingPayload(TypedDict):
response: Optional[Union[str, list, dict]]
model_parameters: dict
hidden_params: StandardLoggingHiddenParams
from typing import AsyncIterator, Iterator
class CustomStreamingDecoder:
async def aiter_bytes(
self, iterator: AsyncIterator[bytes]
) -> AsyncIterator[
Optional[Union[GenericStreamingChunk, StreamingChatCompletionChunk]]
]:
raise NotImplementedError
def iter_bytes(
self, iterator: Iterator[bytes]
) -> Iterator[Optional[Union[GenericStreamingChunk, StreamingChatCompletionChunk]]]:
raise NotImplementedError
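The `CustomStreamingDecoder` base class above only defines the interface. Below is a rough sketch of a subclass; the import path (`litellm.types.utils`) and the newline-delimited-JSON wire format are assumptions for illustration, not taken from this commit:

```python
import json
from typing import Iterator, Optional

# assumed import path; adjust to wherever these types are actually exposed
from litellm.types.utils import CustomStreamingDecoder, GenericStreamingChunk


class NewlineJSONDecoder(CustomStreamingDecoder):
    """Hypothetical decoder: turns newline-delimited JSON bytes into GenericStreamingChunk dicts."""

    def iter_bytes(
        self, iterator: Iterator[bytes]
    ) -> Iterator[Optional[GenericStreamingChunk]]:
        for raw in iterator:
            if not raw.strip():
                continue
            payload = json.loads(raw)
            # field names follow the GenericStreamingChunk TypedDict shown earlier in this diff
            yield GenericStreamingChunk(
                text=payload.get("text", ""),
                tool_use=None,
                is_finished=payload.get("done", False),
                finish_reason="stop" if payload.get("done") else "",
                usage=None,
                index=0,
            )

    # a real decoder would implement aiter_bytes the same way for async streams
```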

View file

@@ -4613,7 +4613,11 @@ def get_llm_provider(
if custom_llm_provider == "perplexity":
# perplexity is openai compatible, we just need to set this to custom_openai and have the api_base be https://api.perplexity.ai
api_base = api_base or get_secret("PERPLEXITY_API_BASE") or "https://api.perplexity.ai" # type: ignore
dynamic_api_key = api_key or get_secret("PERPLEXITYAI_API_KEY")
dynamic_api_key = (
api_key
or get_secret("PERPLEXITYAI_API_KEY")
or get_secret("PERPLEXITY_API_KEY")
)
elif custom_llm_provider == "anyscale":
# anyscale is openai compatible, we just need to set this to custom_openai and have the api_base be https://api.endpoints.anyscale.com/v1
api_base = api_base or get_secret("ANYSCALE_API_BASE") or "https://api.endpoints.anyscale.com/v1" # type: ignore
@@ -6679,10 +6683,14 @@ def exception_type(
else:
message = str(original_exception)
if message is not None and isinstance(message, str):
if message is not None and isinstance(
message, str
): # done to prevent user-confusion. Relevant issue - https://github.com/BerriAI/litellm/issues/1414
message = message.replace("OPENAI", custom_llm_provider.upper())
message = message.replace("openai", custom_llm_provider)
message = message.replace("OpenAI", custom_llm_provider)
message = message.replace(
"openai.OpenAIError",
"{}.{}Error".format(custom_llm_provider, custom_llm_provider),
)
if custom_llm_provider == "openai":
exception_provider = "OpenAI" + "Exception"
else:
@ -8805,6 +8813,7 @@ class CustomStreamWrapper:
self.chunks: List = (
[]
) # keep track of the returned chunks - used for calculating the input/output tokens for stream options
self.is_function_call = self.check_is_function_call(logging_obj=logging_obj)
def __iter__(self):
return self
@ -8812,6 +8821,19 @@ class CustomStreamWrapper:
def __aiter__(self):
return self
def check_is_function_call(self, logging_obj) -> bool:
if hasattr(logging_obj, "optional_params") and isinstance(
logging_obj.optional_params, dict
):
if (
"litellm_param_is_function_call" in logging_obj.optional_params
and logging_obj.optional_params["litellm_param_is_function_call"]
is True
):
return True
return False
def process_chunk(self, chunk: str):
"""
NLP Cloud streaming returns the entire response, for each chunk. Process this, to only return the delta.
@ -10309,6 +10331,12 @@ class CustomStreamWrapper:
## CHECK FOR TOOL USE
if "tool_calls" in completion_obj and len(completion_obj["tool_calls"]) > 0:
if self.is_function_call is True: # user passed in 'functions' param
completion_obj["function_call"] = completion_obj["tool_calls"][0][
"function"
]
completion_obj["tool_calls"] = None
self.tool_call = True
## RETURN ARG
@ -10320,8 +10348,13 @@ class CustomStreamWrapper:
)
or (
"tool_calls" in completion_obj
and completion_obj["tool_calls"] is not None
and len(completion_obj["tool_calls"]) > 0
)
or (
"function_call" in completion_obj
and completion_obj["function_call"] is not None
)
): # cannot set content of an OpenAI Object to be an empty string
self.safety_checker()
hold, model_response_str = self.check_special_tokens(
@ -10381,6 +10414,7 @@ class CustomStreamWrapper:
if self.sent_first_chunk is False:
completion_obj["role"] = "assistant"
self.sent_first_chunk = True
model_response.choices[0].delta = Delta(**completion_obj)
if completion_obj.get("index") is not None:
model_response.choices[0].index = completion_obj.get(

View file

@@ -2189,6 +2189,18 @@
"mode": "image_generation",
"source": "https://cloud.google.com/vertex-ai/generative-ai/pricing"
},
"vertex_ai/imagen-3.0-generate-001": {
"cost_per_image": 0.04,
"litellm_provider": "vertex_ai-image-models",
"mode": "image_generation",
"source": "https://cloud.google.com/vertex-ai/generative-ai/pricing"
},
"vertex_ai/imagen-3.0-fast-generate-001": {
"cost_per_image": 0.02,
"litellm_provider": "vertex_ai-image-models",
"mode": "image_generation",
"source": "https://cloud.google.com/vertex-ai/generative-ai/pricing"
},
"text-embedding-004": {
"max_tokens": 3072,
"max_input_tokens": 3072,

View file

@@ -1,6 +1,6 @@
[tool.poetry]
name = "litellm"
version = "1.44.6"
version = "1.44.7"
description = "Library to easily interface with LLM API providers"
authors = ["BerriAI"]
license = "MIT"
@@ -91,7 +91,7 @@ requires = ["poetry-core", "wheel"]
build-backend = "poetry.core.masonry.api"
[tool.commitizen]
version = "1.44.6"
version = "1.44.7"
version_files = [
"pyproject.toml:^version"
]

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View file

@@ -1 +1 @@
<!DOCTYPE html><html id="__next_error__"><head><meta charSet="utf-8"/><meta name="viewport" content="width=device-width, initial-scale=1"/><link rel="preload" as="script" fetchPriority="low" href="/ui/_next/static/chunks/webpack-193a7eac80c8baba.js" crossorigin=""/><script src="/ui/_next/static/chunks/fd9d1056-f593049e31b05aeb.js" async="" crossorigin=""></script><script src="/ui/_next/static/chunks/69-8316d07d1f41e39f.js" async="" crossorigin=""></script><script src="/ui/_next/static/chunks/main-app-9b4fb13a7db53edf.js" async="" crossorigin=""></script><title>LiteLLM Dashboard</title><meta name="description" content="LiteLLM Proxy Admin UI"/><link rel="icon" href="/ui/favicon.ico" type="image/x-icon" sizes="16x16"/><meta name="next-size-adjust"/><script src="/ui/_next/static/chunks/polyfills-c67a75d1b6f99dc8.js" crossorigin="" noModule=""></script></head><body><script src="/ui/_next/static/chunks/webpack-193a7eac80c8baba.js" crossorigin="" async=""></script><script>(self.__next_f=self.__next_f||[]).push([0]);self.__next_f.push([2,null])</script><script>self.__next_f.push([1,"1:HL[\"/ui/_next/static/media/a34f9d1faa5f3315-s.p.woff2\",\"font\",{\"crossOrigin\":\"\",\"type\":\"font/woff2\"}]\n2:HL[\"/ui/_next/static/css/cd10067a0a3408b4.css\",\"style\",{\"crossOrigin\":\"\"}]\n0:\"$L3\"\n"])</script><script>self.__next_f.push([1,"4:I[47690,[],\"\"]\n6:I[77831,[],\"\"]\n7:I[26520,[\"665\",\"static/chunks/3014691f-b24e8254c7593934.js\",\"936\",\"static/chunks/2f6dbc85-cac2949a76539886.js\",\"505\",\"static/chunks/505-5ff3c318fddfa35c.js\",\"131\",\"static/chunks/131-cb6bfe24e23e121b.js\",\"684\",\"static/chunks/684-16b194c83a169f6d.js\",\"605\",\"static/chunks/605-8e4b96f972af8eaf.js\",\"777\",\"static/chunks/777-50d836152fad178b.js\",\"931\",\"static/chunks/app/page-b77076dbc8208d12.js\"],\"\"]\n8:I[5613,[],\"\"]\n9:I[31778,[],\"\"]\nb:I[48955,[],\"\"]\nc:[]\n"])</script><script>self.__next_f.push([1,"3:[[[\"$\",\"link\",\"0\",{\"rel\":\"stylesheet\",\"href\":\"/ui/_next/static/css/cd10067a0a3408b4.css\",\"precedence\":\"next\",\"crossOrigin\":\"\"}]],[\"$\",\"$L4\",null,{\"buildId\":\"cjLC-FNUY9ME2ZrO3jtsn\",\"assetPrefix\":\"/ui\",\"initialCanonicalUrl\":\"/\",\"initialTree\":[\"\",{\"children\":[\"__PAGE__\",{}]},\"$undefined\",\"$undefined\",true],\"initialSeedData\":[\"\",{\"children\":[\"__PAGE__\",{},[\"$L5\",[\"$\",\"$L6\",null,{\"propsForComponent\":{\"params\":{}},\"Component\":\"$7\",\"isStaticGeneration\":true}],null]]},[null,[\"$\",\"html\",null,{\"lang\":\"en\",\"children\":[\"$\",\"body\",null,{\"className\":\"__className_86ef86\",\"children\":[\"$\",\"$L8\",null,{\"parallelRouterKey\":\"children\",\"segmentPath\":[\"children\"],\"loading\":\"$undefined\",\"loadingStyles\":\"$undefined\",\"loadingScripts\":\"$undefined\",\"hasLoading\":false,\"error\":\"$undefined\",\"errorStyles\":\"$undefined\",\"errorScripts\":\"$undefined\",\"template\":[\"$\",\"$L9\",null,{}],\"templateStyles\":\"$undefined\",\"templateScripts\":\"$undefined\",\"notFound\":[[\"$\",\"title\",null,{\"children\":\"404: This page could not be found.\"}],[\"$\",\"div\",null,{\"style\":{\"fontFamily\":\"system-ui,\\\"Segoe UI\\\",Roboto,Helvetica,Arial,sans-serif,\\\"Apple Color Emoji\\\",\\\"Segoe UI 
Emoji\\\"\",\"height\":\"100vh\",\"textAlign\":\"center\",\"display\":\"flex\",\"flexDirection\":\"column\",\"alignItems\":\"center\",\"justifyContent\":\"center\"},\"children\":[\"$\",\"div\",null,{\"children\":[[\"$\",\"style\",null,{\"dangerouslySetInnerHTML\":{\"__html\":\"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}\"}}],[\"$\",\"h1\",null,{\"className\":\"next-error-h1\",\"style\":{\"display\":\"inline-block\",\"margin\":\"0 20px 0 0\",\"padding\":\"0 23px 0 0\",\"fontSize\":24,\"fontWeight\":500,\"verticalAlign\":\"top\",\"lineHeight\":\"49px\"},\"children\":\"404\"}],[\"$\",\"div\",null,{\"style\":{\"display\":\"inline-block\"},\"children\":[\"$\",\"h2\",null,{\"style\":{\"fontSize\":14,\"fontWeight\":400,\"lineHeight\":\"49px\",\"margin\":0},\"children\":\"This page could not be found.\"}]}]]}]}]],\"notFoundStyles\":[],\"styles\":null}]}]}],null]],\"initialHead\":[false,\"$La\"],\"globalErrorComponent\":\"$b\",\"missingSlots\":\"$Wc\"}]]\n"])</script><script>self.__next_f.push([1,"a:[[\"$\",\"meta\",\"0\",{\"name\":\"viewport\",\"content\":\"width=device-width, initial-scale=1\"}],[\"$\",\"meta\",\"1\",{\"charSet\":\"utf-8\"}],[\"$\",\"title\",\"2\",{\"children\":\"LiteLLM Dashboard\"}],[\"$\",\"meta\",\"3\",{\"name\":\"description\",\"content\":\"LiteLLM Proxy Admin UI\"}],[\"$\",\"link\",\"4\",{\"rel\":\"icon\",\"href\":\"/ui/favicon.ico\",\"type\":\"image/x-icon\",\"sizes\":\"16x16\"}],[\"$\",\"meta\",\"5\",{\"name\":\"next-size-adjust\"}]]\n5:null\n"])</script><script>self.__next_f.push([1,""])</script></body></html>
<!DOCTYPE html><html id="__next_error__"><head><meta charSet="utf-8"/><meta name="viewport" content="width=device-width, initial-scale=1"/><link rel="preload" as="script" fetchPriority="low" href="/ui/_next/static/chunks/webpack-193a7eac80c8baba.js" crossorigin=""/><script src="/ui/_next/static/chunks/fd9d1056-f593049e31b05aeb.js" async="" crossorigin=""></script><script src="/ui/_next/static/chunks/69-8316d07d1f41e39f.js" async="" crossorigin=""></script><script src="/ui/_next/static/chunks/main-app-9b4fb13a7db53edf.js" async="" crossorigin=""></script><title>LiteLLM Dashboard</title><meta name="description" content="LiteLLM Proxy Admin UI"/><link rel="icon" href="/ui/favicon.ico" type="image/x-icon" sizes="16x16"/><meta name="next-size-adjust"/><script src="/ui/_next/static/chunks/polyfills-c67a75d1b6f99dc8.js" crossorigin="" noModule=""></script></head><body><script src="/ui/_next/static/chunks/webpack-193a7eac80c8baba.js" crossorigin="" async=""></script><script>(self.__next_f=self.__next_f||[]).push([0]);self.__next_f.push([2,null])</script><script>self.__next_f.push([1,"1:HL[\"/ui/_next/static/media/a34f9d1faa5f3315-s.p.woff2\",\"font\",{\"crossOrigin\":\"\",\"type\":\"font/woff2\"}]\n2:HL[\"/ui/_next/static/css/cd10067a0a3408b4.css\",\"style\",{\"crossOrigin\":\"\"}]\n0:\"$L3\"\n"])</script><script>self.__next_f.push([1,"4:I[47690,[],\"\"]\n6:I[77831,[],\"\"]\n7:I[18018,[\"665\",\"static/chunks/3014691f-b24e8254c7593934.js\",\"936\",\"static/chunks/2f6dbc85-cac2949a76539886.js\",\"505\",\"static/chunks/505-5ff3c318fddfa35c.js\",\"131\",\"static/chunks/131-73d0a4f8e09896fe.js\",\"684\",\"static/chunks/684-16b194c83a169f6d.js\",\"605\",\"static/chunks/605-35a95945041f7699.js\",\"777\",\"static/chunks/777-5360b5460eba0779.js\",\"931\",\"static/chunks/app/page-01641b817a14ea88.js\"],\"\"]\n8:I[5613,[],\"\"]\n9:I[31778,[],\"\"]\nb:I[48955,[],\"\"]\nc:[]\n"])</script><script>self.__next_f.push([1,"3:[[[\"$\",\"link\",\"0\",{\"rel\":\"stylesheet\",\"href\":\"/ui/_next/static/css/cd10067a0a3408b4.css\",\"precedence\":\"next\",\"crossOrigin\":\"\"}]],[\"$\",\"$L4\",null,{\"buildId\":\"LO0Sm6uVF0pa4RdHSL0dN\",\"assetPrefix\":\"/ui\",\"initialCanonicalUrl\":\"/\",\"initialTree\":[\"\",{\"children\":[\"__PAGE__\",{}]},\"$undefined\",\"$undefined\",true],\"initialSeedData\":[\"\",{\"children\":[\"__PAGE__\",{},[\"$L5\",[\"$\",\"$L6\",null,{\"propsForComponent\":{\"params\":{}},\"Component\":\"$7\",\"isStaticGeneration\":true}],null]]},[null,[\"$\",\"html\",null,{\"lang\":\"en\",\"children\":[\"$\",\"body\",null,{\"className\":\"__className_86ef86\",\"children\":[\"$\",\"$L8\",null,{\"parallelRouterKey\":\"children\",\"segmentPath\":[\"children\"],\"loading\":\"$undefined\",\"loadingStyles\":\"$undefined\",\"loadingScripts\":\"$undefined\",\"hasLoading\":false,\"error\":\"$undefined\",\"errorStyles\":\"$undefined\",\"errorScripts\":\"$undefined\",\"template\":[\"$\",\"$L9\",null,{}],\"templateStyles\":\"$undefined\",\"templateScripts\":\"$undefined\",\"notFound\":[[\"$\",\"title\",null,{\"children\":\"404: This page could not be found.\"}],[\"$\",\"div\",null,{\"style\":{\"fontFamily\":\"system-ui,\\\"Segoe UI\\\",Roboto,Helvetica,Arial,sans-serif,\\\"Apple Color Emoji\\\",\\\"Segoe UI 
Emoji\\\"\",\"height\":\"100vh\",\"textAlign\":\"center\",\"display\":\"flex\",\"flexDirection\":\"column\",\"alignItems\":\"center\",\"justifyContent\":\"center\"},\"children\":[\"$\",\"div\",null,{\"children\":[[\"$\",\"style\",null,{\"dangerouslySetInnerHTML\":{\"__html\":\"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}\"}}],[\"$\",\"h1\",null,{\"className\":\"next-error-h1\",\"style\":{\"display\":\"inline-block\",\"margin\":\"0 20px 0 0\",\"padding\":\"0 23px 0 0\",\"fontSize\":24,\"fontWeight\":500,\"verticalAlign\":\"top\",\"lineHeight\":\"49px\"},\"children\":\"404\"}],[\"$\",\"div\",null,{\"style\":{\"display\":\"inline-block\"},\"children\":[\"$\",\"h2\",null,{\"style\":{\"fontSize\":14,\"fontWeight\":400,\"lineHeight\":\"49px\",\"margin\":0},\"children\":\"This page could not be found.\"}]}]]}]}]],\"notFoundStyles\":[],\"styles\":null}]}]}],null]],\"initialHead\":[false,\"$La\"],\"globalErrorComponent\":\"$b\",\"missingSlots\":\"$Wc\"}]]\n"])</script><script>self.__next_f.push([1,"a:[[\"$\",\"meta\",\"0\",{\"name\":\"viewport\",\"content\":\"width=device-width, initial-scale=1\"}],[\"$\",\"meta\",\"1\",{\"charSet\":\"utf-8\"}],[\"$\",\"title\",\"2\",{\"children\":\"LiteLLM Dashboard\"}],[\"$\",\"meta\",\"3\",{\"name\":\"description\",\"content\":\"LiteLLM Proxy Admin UI\"}],[\"$\",\"link\",\"4\",{\"rel\":\"icon\",\"href\":\"/ui/favicon.ico\",\"type\":\"image/x-icon\",\"sizes\":\"16x16\"}],[\"$\",\"meta\",\"5\",{\"name\":\"next-size-adjust\"}]]\n5:null\n"])</script><script>self.__next_f.push([1,""])</script></body></html>

View file

@@ -1,7 +1,7 @@
2:I[77831,[],""]
3:I[26520,["665","static/chunks/3014691f-b24e8254c7593934.js","936","static/chunks/2f6dbc85-cac2949a76539886.js","505","static/chunks/505-5ff3c318fddfa35c.js","131","static/chunks/131-cb6bfe24e23e121b.js","684","static/chunks/684-16b194c83a169f6d.js","605","static/chunks/605-8e4b96f972af8eaf.js","777","static/chunks/777-50d836152fad178b.js","931","static/chunks/app/page-b77076dbc8208d12.js"],""]
3:I[18018,["665","static/chunks/3014691f-b24e8254c7593934.js","936","static/chunks/2f6dbc85-cac2949a76539886.js","505","static/chunks/505-5ff3c318fddfa35c.js","131","static/chunks/131-73d0a4f8e09896fe.js","684","static/chunks/684-16b194c83a169f6d.js","605","static/chunks/605-35a95945041f7699.js","777","static/chunks/777-5360b5460eba0779.js","931","static/chunks/app/page-01641b817a14ea88.js"],""]
4:I[5613,[],""]
5:I[31778,[],""]
0:["cjLC-FNUY9ME2ZrO3jtsn",[[["",{"children":["__PAGE__",{}]},"$undefined","$undefined",true],["",{"children":["__PAGE__",{},["$L1",["$","$L2",null,{"propsForComponent":{"params":{}},"Component":"$3","isStaticGeneration":true}],null]]},[null,["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_86ef86","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"loading":"$undefined","loadingStyles":"$undefined","loadingScripts":"$undefined","hasLoading":false,"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[],"styles":null}]}]}],null]],[[["$","link","0",{"rel":"stylesheet","href":"/ui/_next/static/css/cd10067a0a3408b4.css","precedence":"next","crossOrigin":""}]],"$L6"]]]]
0:["LO0Sm6uVF0pa4RdHSL0dN",[[["",{"children":["__PAGE__",{}]},"$undefined","$undefined",true],["",{"children":["__PAGE__",{},["$L1",["$","$L2",null,{"propsForComponent":{"params":{}},"Component":"$3","isStaticGeneration":true}],null]]},[null,["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_86ef86","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"loading":"$undefined","loadingStyles":"$undefined","loadingScripts":"$undefined","hasLoading":false,"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[],"styles":null}]}]}],null]],[[["$","link","0",{"rel":"stylesheet","href":"/ui/_next/static/css/cd10067a0a3408b4.css","precedence":"next","crossOrigin":""}]],"$L6"]]]]
6:[["$","meta","0",{"name":"viewport","content":"width=device-width, initial-scale=1"}],["$","meta","1",{"charSet":"utf-8"}],["$","title","2",{"children":"LiteLLM Dashboard"}],["$","meta","3",{"name":"description","content":"LiteLLM Proxy Admin UI"}],["$","link","4",{"rel":"icon","href":"/ui/favicon.ico","type":"image/x-icon","sizes":"16x16"}],["$","meta","5",{"name":"next-size-adjust"}]]
1:null

File diff suppressed because one or more lines are too long

View file

@@ -1,7 +1,7 @@
2:I[77831,[],""]
3:I[87494,["505","static/chunks/505-5ff3c318fddfa35c.js","131","static/chunks/131-cb6bfe24e23e121b.js","777","static/chunks/777-50d836152fad178b.js","418","static/chunks/app/model_hub/page-79eee78ed9fccf89.js"],""]
3:I[87494,["505","static/chunks/505-5ff3c318fddfa35c.js","131","static/chunks/131-73d0a4f8e09896fe.js","777","static/chunks/777-5360b5460eba0779.js","418","static/chunks/app/model_hub/page-baad96761e038837.js"],""]
4:I[5613,[],""]
5:I[31778,[],""]
0:["cjLC-FNUY9ME2ZrO3jtsn",[[["",{"children":["model_hub",{"children":["__PAGE__",{}]}]},"$undefined","$undefined",true],["",{"children":["model_hub",{"children":["__PAGE__",{},["$L1",["$","$L2",null,{"propsForComponent":{"params":{}},"Component":"$3","isStaticGeneration":true}],null]]},["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children","model_hub","children"],"loading":"$undefined","loadingStyles":"$undefined","loadingScripts":"$undefined","hasLoading":false,"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined","styles":null}]]},[null,["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_86ef86","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"loading":"$undefined","loadingStyles":"$undefined","loadingScripts":"$undefined","hasLoading":false,"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[],"styles":null}]}]}],null]],[[["$","link","0",{"rel":"stylesheet","href":"/ui/_next/static/css/cd10067a0a3408b4.css","precedence":"next","crossOrigin":""}]],"$L6"]]]]
0:["LO0Sm6uVF0pa4RdHSL0dN",[[["",{"children":["model_hub",{"children":["__PAGE__",{}]}]},"$undefined","$undefined",true],["",{"children":["model_hub",{"children":["__PAGE__",{},["$L1",["$","$L2",null,{"propsForComponent":{"params":{}},"Component":"$3","isStaticGeneration":true}],null]]},["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children","model_hub","children"],"loading":"$undefined","loadingStyles":"$undefined","loadingScripts":"$undefined","hasLoading":false,"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined","styles":null}]]},[null,["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_86ef86","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"loading":"$undefined","loadingStyles":"$undefined","loadingScripts":"$undefined","hasLoading":false,"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[],"styles":null}]}]}],null]],[[["$","link","0",{"rel":"stylesheet","href":"/ui/_next/static/css/cd10067a0a3408b4.css","precedence":"next","crossOrigin":""}]],"$L6"]]]]
6:[["$","meta","0",{"name":"viewport","content":"width=device-width, initial-scale=1"}],["$","meta","1",{"charSet":"utf-8"}],["$","title","2",{"children":"LiteLLM Dashboard"}],["$","meta","3",{"name":"description","content":"LiteLLM Proxy Admin UI"}],["$","link","4",{"rel":"icon","href":"/ui/favicon.ico","type":"image/x-icon","sizes":"16x16"}],["$","meta","5",{"name":"next-size-adjust"}]]
1:null

File diff suppressed because one or more lines are too long

View file

@@ -1,7 +1,7 @@
2:I[77831,[],""]
3:I[667,["665","static/chunks/3014691f-b24e8254c7593934.js","505","static/chunks/505-5ff3c318fddfa35c.js","684","static/chunks/684-16b194c83a169f6d.js","777","static/chunks/777-50d836152fad178b.js","461","static/chunks/app/onboarding/page-8be9c2a4a5c886c5.js"],""]
3:I[667,["665","static/chunks/3014691f-b24e8254c7593934.js","505","static/chunks/505-5ff3c318fddfa35c.js","684","static/chunks/684-16b194c83a169f6d.js","777","static/chunks/777-5360b5460eba0779.js","461","static/chunks/app/onboarding/page-0034957a9fa387e0.js"],""]
4:I[5613,[],""]
5:I[31778,[],""]
0:["cjLC-FNUY9ME2ZrO3jtsn",[[["",{"children":["onboarding",{"children":["__PAGE__",{}]}]},"$undefined","$undefined",true],["",{"children":["onboarding",{"children":["__PAGE__",{},["$L1",["$","$L2",null,{"propsForComponent":{"params":{}},"Component":"$3","isStaticGeneration":true}],null]]},["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children","onboarding","children"],"loading":"$undefined","loadingStyles":"$undefined","loadingScripts":"$undefined","hasLoading":false,"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined","styles":null}]]},[null,["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_86ef86","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"loading":"$undefined","loadingStyles":"$undefined","loadingScripts":"$undefined","hasLoading":false,"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[],"styles":null}]}]}],null]],[[["$","link","0",{"rel":"stylesheet","href":"/ui/_next/static/css/cd10067a0a3408b4.css","precedence":"next","crossOrigin":""}]],"$L6"]]]]
0:["LO0Sm6uVF0pa4RdHSL0dN",[[["",{"children":["onboarding",{"children":["__PAGE__",{}]}]},"$undefined","$undefined",true],["",{"children":["onboarding",{"children":["__PAGE__",{},["$L1",["$","$L2",null,{"propsForComponent":{"params":{}},"Component":"$3","isStaticGeneration":true}],null]]},["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children","onboarding","children"],"loading":"$undefined","loadingStyles":"$undefined","loadingScripts":"$undefined","hasLoading":false,"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined","styles":null}]]},[null,["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_86ef86","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"loading":"$undefined","loadingStyles":"$undefined","loadingScripts":"$undefined","hasLoading":false,"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[],"styles":null}]}]}],null]],[[["$","link","0",{"rel":"stylesheet","href":"/ui/_next/static/css/cd10067a0a3408b4.css","precedence":"next","crossOrigin":""}]],"$L6"]]]]
6:[["$","meta","0",{"name":"viewport","content":"width=device-width, initial-scale=1"}],["$","meta","1",{"charSet":"utf-8"}],["$","title","2",{"children":"LiteLLM Dashboard"}],["$","meta","3",{"name":"description","content":"LiteLLM Proxy Admin UI"}],["$","link","4",{"rel":"icon","href":"/ui/favicon.ico","type":"image/x-icon","sizes":"16x16"}],["$","meta","5",{"name":"next-size-adjust"}]]
1:null

View file

@@ -4800,11 +4800,11 @@
]
},
"node_modules/micromatch": {
"version": "4.0.5",
"resolved": "https://registry.npmjs.org/micromatch/-/micromatch-4.0.5.tgz",
"integrity": "sha512-DMy+ERcEW2q8Z2Po+WNXuw3c5YaUSFjAO5GsJqfEl7UjvtIuFKO6ZrKvcItdy98dwFI2N1tg3zNIdKaQT+aNdA==",
"version": "4.0.8",
"resolved": "https://registry.npmjs.org/micromatch/-/micromatch-4.0.8.tgz",
"integrity": "sha512-PXwfBhYu0hBCPw8Dn0E+WDYb7af3dSLVWKi3HGv84IdF4TyFoC0ysxFd0Goxw7nSv4T/PzEJQxsYsEiFCKo2BA==",
"dependencies": {
"braces": "^3.0.2",
"braces": "^3.0.3",
"picomatch": "^2.3.1"
},
"engines": {

View file

@@ -141,6 +141,7 @@ const CreateKeyPage = () => {
<UserDashboard
userID={userID}
userRole={userRole}
premiumUser={premiumUser}
teams={teams}
keys={keys}
setUserRole={setUserRole}
@@ -175,6 +176,7 @@ const CreateKeyPage = () => {
<UserDashboard
userID={userID}
userRole={userRole}
premiumUser={premiumUser}
teams={teams}
keys={keys}
setUserRole={setUserRole}

View file

@@ -770,6 +770,37 @@ export const claimOnboardingToken = async (
throw error;
}
};
export const regenerateKeyCall = async (accessToken: string, keyToRegenerate: string) => {
try {
const url = proxyBaseUrl
? `${proxyBaseUrl}/key/${keyToRegenerate}/regenerate`
: `/key/${keyToRegenerate}/regenerate`;
const response = await fetch(url, {
method: "POST",
headers: {
[globalLitellmHeaderName]: `Bearer ${accessToken}`,
"Content-Type": "application/json",
},
body: JSON.stringify({}),
});
if (!response.ok) {
const errorData = await response.text();
handleError(errorData);
throw new Error("Network response was not ok");
}
const data = await response.json();
console.log("Regenerate key Response:", data);
return data;
} catch (error) {
console.error("Failed to regenerate key:", error);
throw error;
}
};
let ModelListerrorShown = false;
let errorTimer: NodeJS.Timeout | null = null;
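
The new `regenerateKeyCall` helper above POSTs to the proxy's `/key/{key}/regenerate` route and returns the refreshed key data. Below is a minimal usage sketch, assuming the response carries the `key` and `key_name` fields that the `handleRegenerateKey` handler added later in this commit reads; the function and variable names here are illustrative and not part of this commit.

```typescript
// Hypothetical usage sketch -- not part of this commit.
// Assumes regenerateKeyCall resolves to an object containing `key` and `key_name`,
// as the handleRegenerateKey handler below expects.
import { regenerateKeyCall } from "./networking";

async function rotateKey(accessToken: string, token: string): Promise<string> {
  // POSTs to `${proxyBaseUrl}/key/${token}/regenerate` with a Bearer token header
  const response = await regenerateKeyCall(accessToken, token);
  console.log("regenerated key name:", response.key_name);
  return response.key; // the replacement secret; surfaced to the user only once
}
```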

View file

@@ -48,6 +48,7 @@ interface UserDashboardProps {
setKeys: React.Dispatch<React.SetStateAction<Object[] | null>>;
setProxySettings: React.Dispatch<React.SetStateAction<any>>;
proxySettings: any;
premiumUser: boolean;
}
type TeamInterface = {
@@ -68,6 +69,7 @@ const UserDashboard: React.FC<UserDashboardProps> = ({
setKeys,
setProxySettings,
proxySettings,
premiumUser,
}) => {
const [userSpendData, setUserSpendData] = useState<UserSpendData | null>(
null
@@ -328,6 +330,7 @@ const UserDashboard: React.FC<UserDashboardProps> = ({
selectedTeam={selectedTeam ? selectedTeam : null}
data={keys}
setData={setKeys}
premiumUser={premiumUser}
teams={teams}
/>
<CreateKey

View file

@@ -1,12 +1,14 @@
"use client";
import React, { useEffect, useState } from "react";
import { keyDeleteCall, modelAvailableCall } from "./networking";
import { InformationCircleIcon, StatusOnlineIcon, TrashIcon, PencilAltIcon } from "@heroicons/react/outline";
import { keySpendLogsCall, PredictedSpendLogsCall, keyUpdateCall, modelInfoCall } from "./networking";
import { InformationCircleIcon, StatusOnlineIcon, TrashIcon, PencilAltIcon, RefreshIcon } from "@heroicons/react/outline";
import { keySpendLogsCall, PredictedSpendLogsCall, keyUpdateCall, modelInfoCall, regenerateKeyCall } from "./networking";
import {
Badge,
Card,
Table,
Grid,
Col,
Button,
TableBody,
TableCell,
@@ -33,6 +35,8 @@ import {
Select,
} from "antd";
import { CopyToClipboard } from "react-copy-to-clipboard";
const { Option } = Select;
const isLocal = process.env.NODE_ENV === "development";
const proxyBaseUrl = isLocal ? "http://localhost:4000" : null;
@@ -65,6 +69,7 @@ interface ViewKeyTableProps {
data: any[] | null;
setData: React.Dispatch<React.SetStateAction<any[] | null>>;
teams: any[] | null;
premiumUser: boolean;
}
interface ItemData {
@@ -92,7 +97,8 @@ const ViewKeyTable: React.FC<ViewKeyTableProps> = ({
selectedTeam,
data,
setData,
teams
teams,
premiumUser
}) => {
const [isButtonClicked, setIsButtonClicked] = useState(false);
const [isDeleteModalOpen, setIsDeleteModalOpen] = useState(false);
@@ -109,6 +115,8 @@ const ViewKeyTable: React.FC<ViewKeyTableProps> = ({
const [userModels, setUserModels] = useState([]);
const initialKnownTeamIDs: Set<string> = new Set();
const [modelLimitModalVisible, setModelLimitModalVisible] = useState(false);
const [regenerateDialogVisible, setRegenerateDialogVisible] = useState(false);
const [regeneratedKey, setRegeneratedKey] = useState<string | null>(null);
const [knownTeamIDs, setKnownTeamIDs] = useState(initialKnownTeamIDs);
@@ -612,6 +620,38 @@ const ViewKeyTable: React.FC<ViewKeyTableProps> = ({
setKeyToDelete(null);
};
const handleRegenerateKey = async () => {
if (!premiumUser) {
message.error("Regenerate API Key is an Enterprise feature. Please upgrade to use this feature.");
return;
}
try {
if (selectedToken == null) {
message.error("Please select a key to regenerate");
return;
}
const response = await regenerateKeyCall(accessToken, selectedToken.token);
setRegeneratedKey(response.key);
// Update the data state with the new key_name
if (data) {
const updatedData = data.map(item =>
item.token === selectedToken.token
? { ...item, key_name: response.key_name }
: item
);
setData(updatedData);
}
setRegenerateDialogVisible(false);
message.success("API Key regenerated successfully");
} catch (error) {
console.error("Error regenerating key:", error);
message.error("Failed to regenerate API Key");
}
};
if (data == null) {
return;
}
@@ -768,6 +808,7 @@ const ViewKeyTable: React.FC<ViewKeyTableProps> = ({
size="sm"
/>
<Modal
open={infoDialogVisible}
@@ -867,6 +908,14 @@ const ViewKeyTable: React.FC<ViewKeyTableProps> = ({
size="sm"
onClick={() => handleEditClick(item)}
/>
<Icon
onClick={() => {
setSelectedToken(item);
setRegenerateDialogVisible(true);
}}
icon={RefreshIcon}
size="sm"
/>
<Icon
onClick={() => handleDelete(item)}
icon={TrashIcon}
@@ -942,6 +991,98 @@ const ViewKeyTable: React.FC<ViewKeyTableProps> = ({
accessToken={accessToken}
/>
)}
{/* Regenerate Key Confirmation Dialog */}
<Modal
title="Regenerate API Key"
visible={regenerateDialogVisible}
onCancel={() => setRegenerateDialogVisible(false)}
footer={[
<Button key="cancel" onClick={() => setRegenerateDialogVisible(false)} className="mr-2">
Cancel
</Button>,
<Button
key="regenerate"
onClick={handleRegenerateKey}
disabled={!premiumUser}
>
{premiumUser ? "Regenerate" : "Upgrade to Regenerate"}
</Button>
]}
>
{premiumUser ? (
<>
<p>Are you sure you want to regenerate this key?</p>
<p>Key Alias:</p>
<pre>{selectedToken?.key_alias || 'No alias set'}</pre>
</>
) : (
<div>
<p className="mb-2 text-gray-500 italic text-[12px]">Upgrade to use this feature</p>
<Button variant="primary" className="mb-2">
<a href="https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat" target="_blank">
Get Free Trial
</a>
</Button>
</div>
)}
</Modal>
{/* Regenerated Key Display Modal */}
{regeneratedKey && (
<Modal
visible={!!regeneratedKey}
onCancel={() => setRegeneratedKey(null)}
footer={[
<Button key="close" onClick={() => setRegeneratedKey(null)}>
Close
</Button>
]}
>
<Grid numItems={1} className="gap-2 w-full">
<Title>Regenerated Key</Title>
<Col numColSpan={1}>
<p>
Please replace your old key with the newly generated key. For
security reasons, <b>you will not be able to view it again</b> through
your LiteLLM account. If you lose this secret key, you will need to
generate a new one.
</p>
</Col>
<Col numColSpan={1}>
<Text className="mt-3">Key Alias:</Text>
<div
style={{
background: "#f8f8f8",
padding: "10px",
borderRadius: "5px",
marginBottom: "10px",
}}
>
<pre style={{ wordWrap: "break-word", whiteSpace: "normal" }}>
{selectedToken?.key_alias || 'No alias set'}
</pre>
</div>
<Text className="mt-3">New API Key:</Text>
<div
style={{
background: "#f8f8f8",
padding: "10px",
borderRadius: "5px",
marginBottom: "10px",
}}
>
<pre style={{ wordWrap: "break-word", whiteSpace: "normal" }}>
{regeneratedKey}
</pre>
</div>
<CopyToClipboard text={regeneratedKey} onCopy={() => message.success("API Key copied to clipboard")}>
<Button className="mt-3">Copy API Key</Button>
</CopyToClipboard>
</Col>
</Grid>
</Modal>
)}
</div>
);
};
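
Taken together, these hunks gate key regeneration behind `premiumUser` in the dashboard and route the actual rotation through the proxy's `POST /key/{key}/regenerate` endpoint. For reference, here is a hedged sketch of exercising that endpoint directly, assuming a locally running proxy on port 4000 (the `isLocal` default above) and that `globalLitellmHeaderName` resolves to the standard `Authorization` header; both are assumptions, not confirmed by this diff.

```typescript
// Hypothetical direct call against the proxy -- a sketch, not part of this commit.
// Assumes http://localhost:4000 and an "Authorization" header; adjust to your deployment.
async function regenerateKeyDirect(adminKey: string, tokenToRegenerate: string) {
  const res = await fetch(`http://localhost:4000/key/${tokenToRegenerate}/regenerate`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${adminKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({}),
  });
  if (!res.ok) {
    throw new Error(`Regenerate failed: ${await res.text()}`);
  }
  return res.json(); // expected to include the new `key` and updated `key_name`
}
```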