forked from phoenix/litellm-mirror
Merge branch 'main' into main
This commit is contained in:
commit
b22517845e
98 changed files with 3926 additions and 997 deletions
47
.github/pull_request_template.md
vendored
Normal file
|
@ -0,0 +1,47 @@
|
|||
<!-- These are just examples. You can remove all items if you want. -->
|
||||
<!-- Please remove all comments. -->
|
||||
|
||||
## Title
|
||||
|
||||
<!-- e.g. "Implement user authentication feature" -->
|
||||
|
||||
## Relevant issues
|
||||
|
||||
<!-- e.g. "Fixes #000" -->
|
||||
|
||||
## Type
|
||||
|
||||
<!-- Select the type of Pull Request -->
|
||||
<!-- Keep only the necessary ones -->
|
||||
|
||||
🆕 New Feature
|
||||
🐛 Bug Fix
|
||||
🧹 Refactoring
|
||||
📖 Documentation
|
||||
💻 Development Environment
|
||||
🚄 Infrastructure
|
||||
✅ Test
|
||||
|
||||
## Changes
|
||||
|
||||
<!-- List of changes -->
|
||||
|
||||
## Testing
|
||||
|
||||
<!-- Test procedure -->
|
||||
|
||||
## Notes
|
||||
|
||||
<!-- Test results -->
|
||||
|
||||
<!-- Points to note for the reviewer, consultation content, concerns -->
|
||||
|
||||
## Pre-Submission Checklist (optional but appreciated):
|
||||
|
||||
- [ ] I have included relevant documentation updates (stored in /docs/my-website)
|
||||
|
||||
## OS Tests (optional but appreciated):
|
||||
|
||||
- [ ] Tested on Windows
|
||||
- [ ] Tested on MacOS
|
||||
- [ ] Tested on Linux
|
|
@ -248,7 +248,7 @@ Step 2: Navigate into the project, and install dependencies:
|
|||
|
||||
```
|
||||
cd litellm
|
||||
poetry install
|
||||
poetry install -E extra_proxy -E proxy
|
||||
```
|
||||
|
||||
Step 3: Test your change:
|
||||
|
|
|
@ -84,7 +84,7 @@ def completion(
|
|||
n: Optional[int] = None,
|
||||
stream: Optional[bool] = None,
|
||||
stop=None,
|
||||
max_tokens: Optional[float] = None,
|
||||
max_tokens: Optional[int] = None,
|
||||
presence_penalty: Optional[float] = None,
|
||||
frequency_penalty: Optional[float] = None,
|
||||
logit_bias: Optional[dict] = None,
|
||||
|
|
|
@ -1,7 +1,7 @@
|
|||
# Completion Token Usage & Cost
|
||||
By default LiteLLM returns token usage in all completion requests ([See here](https://litellm.readthedocs.io/en/latest/output/))
|
||||
|
||||
However, we also expose 5 helper functions + **[NEW]** an API to calculate token usage across providers:
|
||||
However, we also expose some helper functions + **[NEW]** an API to calculate token usage across providers:
|
||||
|
||||
- `encode`: This encodes the text passed in, using the model-specific tokenizer. [**Jump to code**](#1-encode)
|
||||
|
||||
|
@ -9,17 +9,19 @@ However, we also expose 5 helper functions + **[NEW]** an API to calculate token
|
|||
|
||||
- `token_counter`: This returns the number of tokens for a given input - it uses the tokenizer based on the model, and defaults to tiktoken if no model-specific tokenizer is available. [**Jump to code**](#3-token_counter)
|
||||
|
||||
- `cost_per_token`: This returns the cost (in USD) for prompt (input) and completion (output) tokens. Uses the live list from `api.litellm.ai`. [**Jump to code**](#4-cost_per_token)
|
||||
- `create_pretrained_tokenizer` and `create_tokenizer`: LiteLLM provides default tokenizer support for OpenAI, Cohere, Anthropic, Llama2, and Llama3 models. If you are using a different model, you can create a custom tokenizer and pass it as `custom_tokenizer` to the `encode`, `decode`, and `token_counter` methods. [**Jump to code**](#4-create_pretrained_tokenizer-and-create_tokenizer)
|
||||
|
||||
- `completion_cost`: This returns the overall cost (in USD) for a given LLM API Call. It combines `token_counter` and `cost_per_token` to return the cost for that query (counting both cost of input and output). [**Jump to code**](#5-completion_cost)
|
||||
- `cost_per_token`: This returns the cost (in USD) for prompt (input) and completion (output) tokens. Uses the live list from `api.litellm.ai`. [**Jump to code**](#5-cost_per_token)
|
||||
|
||||
- `get_max_tokens`: This returns the maximum number of tokens allowed for the given model. [**Jump to code**](#6-get_max_tokens)
|
||||
- `completion_cost`: This returns the overall cost (in USD) for a given LLM API Call. It combines `token_counter` and `cost_per_token` to return the cost for that query (counting both cost of input and output). [**Jump to code**](#6-completion_cost)
|
||||
|
||||
- `model_cost`: This returns a dictionary for all models, with their max_tokens, input_cost_per_token and output_cost_per_token. It uses the `api.litellm.ai` call shown below. [**Jump to code**](#7-model_cost)
|
||||
- `get_max_tokens`: This returns the maximum number of tokens allowed for the given model. [**Jump to code**](#7-get_max_tokens)
|
||||
|
||||
- `register_model`: This registers new / overrides existing models (and their pricing details) in the model cost dictionary. [**Jump to code**](#8-register_model)
|
||||
- `model_cost`: This returns a dictionary for all models, with their max_tokens, input_cost_per_token and output_cost_per_token. It uses the `api.litellm.ai` call shown below. [**Jump to code**](#8-model_cost)
|
||||
|
||||
- `api.litellm.ai`: Live token + price count across [all supported models](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json). [**Jump to code**](#9-apilitellmai)
|
||||
- `register_model`: This registers new / overrides existing models (and their pricing details) in the model cost dictionary. [**Jump to code**](#9-register_model)
|
||||
|
||||
- `api.litellm.ai`: Live token + price count across [all supported models](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json). [**Jump to code**](#10-apilitellmai)
|
||||
|
||||
📣 This is a community maintained list. Contributions are welcome! ❤️
|
||||
|
||||
|
@ -60,7 +62,24 @@ messages = [{"user": "role", "content": "Hey, how's it going"}]
|
|||
print(token_counter(model="gpt-3.5-turbo", messages=messages))
|
||||
```
|
||||
|
||||
### 4. `cost_per_token`
|
||||
### 4. `create_pretrained_tokenizer` and `create_tokenizer`
|
||||
|
||||
```python
|
||||
import json
from litellm import create_pretrained_tokenizer, create_tokenizer
|
||||
|
||||
# get tokenizer from huggingface repo
|
||||
custom_tokenizer_1 = create_pretrained_tokenizer("Xenova/llama-3-tokenizer")
|
||||
|
||||
# use tokenizer from json file
|
||||
with open("tokenizer.json") as f:
    json_data = json.load(f)
|
||||
|
||||
json_str = json.dumps(json_data)
|
||||
|
||||
custom_tokenizer_2 = create_tokenizer(json_str)
|
||||
```
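Once created, a custom tokenizer can be passed to the token-counting helpers described above. A minimal sketch, assuming the `custom_tokenizer` parameter mentioned earlier (the tokenizer repo name is just an illustration):

```python
from litellm import create_pretrained_tokenizer, token_counter

# hypothetical tokenizer repo - swap in the tokenizer matching your model
custom_tokenizer = create_pretrained_tokenizer("Xenova/llama-3-tokenizer")

# count tokens using the custom tokenizer instead of the default model mapping
print(token_counter(custom_tokenizer=custom_tokenizer, text="Hey, how's it going?"))
```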
|
||||
|
||||
### 5. `cost_per_token`
|
||||
|
||||
```python
|
||||
from litellm import cost_per_token
|
||||
|
@ -72,7 +91,7 @@ prompt_tokens_cost_usd_dollar, completion_tokens_cost_usd_dollar = cost_per_toke
|
|||
print(prompt_tokens_cost_usd_dollar, completion_tokens_cost_usd_dollar)
|
||||
```
|
||||
|
||||
### 5. `completion_cost`
|
||||
### 6. `completion_cost`
|
||||
|
||||
* Input: Accepts a `litellm.completion()` response **OR** prompt + completion strings
|
||||
* Output: Returns a `float` of cost for the `completion` call
|
||||
|
@ -99,7 +118,7 @@ cost = completion_cost(model="bedrock/anthropic.claude-v2", prompt="Hey!", compl
|
|||
formatted_string = f"${float(cost):.10f}"
|
||||
print(formatted_string)
|
||||
```
|
||||
### 6. `get_max_tokens`
|
||||
### 7. `get_max_tokens`
|
||||
|
||||
Input: Accepts a model name - e.g., gpt-3.5-turbo (to get a complete list, call litellm.model_list).
|
||||
Output: Returns the maximum number of tokens allowed for the given model
|
||||
|
@ -112,7 +131,7 @@ model = "gpt-3.5-turbo"
|
|||
print(get_max_tokens(model)) # Output: 4097
|
||||
```
|
||||
|
||||
### 7. `model_cost`
|
||||
### 8. `model_cost`
|
||||
|
||||
* Output: Returns a dict object containing the max_tokens, input_cost_per_token, output_cost_per_token for all models on [community-maintained list](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json)
|
||||
|
||||
|
@ -122,7 +141,7 @@ from litellm import model_cost
|
|||
print(model_cost) # {'gpt-3.5-turbo': {'max_tokens': 4000, 'input_cost_per_token': 1.5e-06, 'output_cost_per_token': 2e-06}, ...}
|
||||
```
|
||||
|
||||
### 8. `register_model`
|
||||
### 9. `register_model`
|
||||
|
||||
* Input: Provide EITHER a model cost dictionary or a url to a hosted json blob
|
||||
* Output: Returns updated model_cost dictionary + updates litellm.model_cost with model details.
|
||||
|
@ -157,5 +176,3 @@ export LITELLM_LOCAL_MODEL_COST_MAP="True"
|
|||
```
|
||||
|
||||
Note: this means you will need to upgrade to get updated pricing, and newer models.
|
||||
|
||||
|
||||
|
|
|
@ -13,7 +13,7 @@ LiteLLM maps exceptions across all providers to their OpenAI counterparts.
|
|||
| >=500 | InternalServerError |
|
||||
| N/A | ContextWindowExceededError|
|
||||
| 400 | ContentPolicyViolationError|
|
||||
| N/A | APIConnectionError |
|
||||
| 500 | APIConnectionError |
|
||||
|
||||
|
||||
Base case - we return APIConnectionError.
|
||||
|
@ -74,6 +74,28 @@ except Exception as e:
|
|||
|
||||
```
|
||||
|
||||
## Usage - Should you retry exception?
|
||||
|
||||
```python
import litellm
import openai

try:
    response = litellm.completion(
        model="gpt-4",
        messages=[
            {
                "role": "user",
                "content": "hello, write a 20 page essay"
            }
        ],
        timeout=0.01,  # this will raise a timeout exception
    )
except openai.APITimeoutError as e:
    should_retry = litellm._should_retry(e.status_code)
    print(f"should_retry: {should_retry}")
```
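If `_should_retry` returns `True`, one option is a small manual retry loop. The sketch below is illustrative only - the retry count and backoff strategy are assumptions, not part of LiteLLM's API:

```python
import time
import litellm
import openai

def completion_with_retries(max_attempts: int = 3):
    # retry only when the mapped status code is considered retryable
    for attempt in range(max_attempts):
        try:
            return litellm.completion(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": "hello"}],
            )
        except openai.APIStatusError as e:
            if attempt == max_attempts - 1 or not litellm._should_retry(e.status_code):
                raise
            time.sleep(2**attempt)  # simple exponential backoff
```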
|
||||
|
||||
## Details
|
||||
|
||||
To see how it's implemented - [check out the code](https://github.com/BerriAI/litellm/blob/a42c197e5a6de56ea576c73715e6c7c6b19fa249/litellm/utils.py#L1217)
|
||||
|
@ -86,21 +108,34 @@ To see how it's implemented - [check out the code](https://github.com/BerriAI/li
|
|||
|
||||
Base case - we return the original exception.
|
||||
|
||||
| | ContextWindowExceededError | AuthenticationError | InvalidRequestError | RateLimitError | ServiceUnavailableError |
|
||||
|---------------|----------------------------|---------------------|---------------------|---------------|-------------------------|
|
||||
| Anthropic | ✅ | ✅ | ✅ | ✅ | |
|
||||
| OpenAI | ✅ | ✅ |✅ |✅ |✅|
|
||||
| Azure OpenAI | ✅ | ✅ |✅ |✅ |✅|
|
||||
| Replicate | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||
| Cohere | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||
| Huggingface | ✅ | ✅ | ✅ | ✅ | |
|
||||
| Openrouter | ✅ | ✅ | ✅ | ✅ | |
|
||||
| AI21 | ✅ | ✅ | ✅ | ✅ | |
|
||||
| VertexAI | | |✅ | | |
|
||||
| Bedrock | | |✅ | | |
|
||||
| Sagemaker | | |✅ | | |
|
||||
| TogetherAI | ✅ | ✅ | ✅ | ✅ | |
|
||||
| AlephAlpha | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||
| custom_llm_provider | Timeout | ContextWindowExceededError | BadRequestError | NotFoundError | ContentPolicyViolationError | AuthenticationError | APIError | RateLimitError | ServiceUnavailableError | PermissionDeniedError | UnprocessableEntityError |
|
||||
|----------------------------|---------|----------------------------|------------------|---------------|-----------------------------|---------------------|----------|----------------|-------------------------|-----------------------|-------------------------|
|
||||
| openai | ✓ | ✓ | ✓ | | ✓ | ✓ | | | | | |
|
||||
| text-completion-openai | ✓ | ✓ | ✓ | | ✓ | ✓ | | | | | |
|
||||
| custom_openai | ✓ | ✓ | ✓ | | ✓ | ✓ | | | | | |
|
||||
| openai_compatible_providers| ✓ | ✓ | ✓ | | ✓ | ✓ | | | | | |
|
||||
| anthropic | ✓ | ✓ | ✓ | ✓ | | ✓ | | | ✓ | ✓ | |
|
||||
| replicate | ✓ | ✓ | ✓ | ✓ | | ✓ | | ✓ | ✓ | | |
|
||||
| bedrock | ✓ | ✓ | ✓ | ✓ | | ✓ | | ✓ | ✓ | ✓ | |
|
||||
| sagemaker | | ✓ | ✓ | | | | | | | | |
|
||||
| vertex_ai | ✓ | | ✓ | | | | ✓ | | | | ✓ |
|
||||
| palm | ✓ | ✓ | | | | | ✓ | | | | |
|
||||
| gemini | ✓ | ✓ | | | | | ✓ | | | | |
|
||||
| cloudflare | | | ✓ | | | ✓ | | | | | |
|
||||
| cohere | | ✓ | ✓ | | | ✓ | | | ✓ | | |
|
||||
| cohere_chat | | ✓ | ✓ | | | ✓ | | | ✓ | | |
|
||||
| huggingface | ✓ | ✓ | ✓ | | | ✓ | | ✓ | ✓ | | |
|
||||
| ai21 | ✓ | ✓ | ✓ | ✓ | | ✓ | | ✓ | | | |
|
||||
| nlp_cloud | ✓ | ✓ | ✓ | | | ✓ | ✓ | ✓ | ✓ | | |
|
||||
| together_ai | ✓ | ✓ | ✓ | | | ✓ | | | | | |
|
||||
| aleph_alpha | | | ✓ | | | ✓ | | | | | |
|
||||
| ollama | ✓ | | ✓ | | | | | | ✓ | | |
|
||||
| ollama_chat | ✓ | | ✓ | | | | | | ✓ | | |
|
||||
| vllm | | | | | | ✓ | ✓ | | | | |
|
||||
| azure | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | | ✓ | | |
|
||||
|
||||
- "✓" indicates that the specified `custom_llm_provider` can raise the corresponding exception.
|
||||
- Empty cells indicate that the provider does not raise that particular exception type.
|
||||
|
||||
|
||||
> For a deeper understanding of these exceptions, you can check out [this](https://github.com/BerriAI/litellm/blob/d7e58d13bf9ba9edbab2ab2f096f3de7547f35fa/litellm/utils.py#L1544) implementation for additional insights.
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
# Greenscale Tutorial
|
||||
# Greenscale - Track LLM Spend and Responsible Usage
|
||||
|
||||
[Greenscale](https://greenscale.ai/) is a production monitoring platform for your LLM-powered app that provides you granular key insights into your GenAI spending and responsible usage. Greenscale only captures metadata to minimize the exposure risk of personally identifiable information (PII).
|
||||
|
||||
|
|
|
@ -535,7 +535,8 @@ print(response)
|
|||
|
||||
| Model Name | Function Call |
|
||||
|----------------------|---------------------------------------------|
|
||||
| Titan Embeddings - G1 | `embedding(model="bedrock/amazon.titan-embed-text-v1", input=input)` |
|
||||
| Titan Embeddings V2 | `embedding(model="bedrock/amazon.titan-embed-text-v2:0", input=input)` |
|
||||
| Titan Embeddings - V1 | `embedding(model="bedrock/amazon.titan-embed-text-v1", input=input)` |
|
||||
| Cohere Embeddings - English | `embedding(model="bedrock/cohere.embed-english-v3", input=input)` |
|
||||
| Cohere Embeddings - Multilingual | `embedding(model="bedrock/cohere.embed-multilingual-v3", input=input)` |
|
||||
|
||||
|
|
|
@ -914,39 +914,72 @@ Test Request
|
|||
litellm --test
|
||||
```
|
||||
|
||||
## Logging Proxy Input/Output Traceloop (OpenTelemetry)
|
||||
## Logging Proxy Input/Output in OpenTelemetry format using Traceloop's OpenLLMetry
|
||||
|
||||
Traceloop allows you to log LLM Input/Output in the OpenTelemetry format
|
||||
[OpenLLMetry](https://github.com/traceloop/openllmetry) _(built and maintained by Traceloop)_ is a set of extensions
|
||||
built on top of [OpenTelemetry](https://opentelemetry.io/) that gives you complete observability over your LLM
|
||||
application. Because it uses OpenTelemetry under the
|
||||
hood, [it can be connected to various observability solutions](https://www.traceloop.com/docs/openllmetry/integrations/introduction)
|
||||
like:
|
||||
|
||||
We will use the `--config` to set `litellm.success_callback = ["traceloop"]` this will log all successfull LLM calls to traceloop
|
||||
* [Traceloop](https://www.traceloop.com/docs/openllmetry/integrations/traceloop)
|
||||
* [Axiom](https://www.traceloop.com/docs/openllmetry/integrations/axiom)
|
||||
* [Azure Application Insights](https://www.traceloop.com/docs/openllmetry/integrations/azure)
|
||||
* [Datadog](https://www.traceloop.com/docs/openllmetry/integrations/datadog)
|
||||
* [Dynatrace](https://www.traceloop.com/docs/openllmetry/integrations/dynatrace)
|
||||
* [Grafana Tempo](https://www.traceloop.com/docs/openllmetry/integrations/grafana)
|
||||
* [Honeycomb](https://www.traceloop.com/docs/openllmetry/integrations/honeycomb)
|
||||
* [HyperDX](https://www.traceloop.com/docs/openllmetry/integrations/hyperdx)
|
||||
* [Instana](https://www.traceloop.com/docs/openllmetry/integrations/instana)
|
||||
* [New Relic](https://www.traceloop.com/docs/openllmetry/integrations/newrelic)
|
||||
* [OpenTelemetry Collector](https://www.traceloop.com/docs/openllmetry/integrations/otel-collector)
|
||||
* [Service Now Cloud Observability](https://www.traceloop.com/docs/openllmetry/integrations/service-now)
|
||||
* [Sentry](https://www.traceloop.com/docs/openllmetry/integrations/sentry)
|
||||
* [SigNoz](https://www.traceloop.com/docs/openllmetry/integrations/signoz)
|
||||
* [Splunk](https://www.traceloop.com/docs/openllmetry/integrations/splunk)
|
||||
|
||||
**Step 1** Install traceloop-sdk and set Traceloop API key
|
||||
We will use `--config` to set `litellm.success_callback = ["traceloop"]` to achieve this; the steps are listed below.
|
||||
|
||||
**Step 1:** Install the SDK
|
||||
|
||||
```shell
|
||||
pip install traceloop-sdk -U
|
||||
pip install traceloop-sdk
|
||||
```
|
||||
|
||||
Traceloop outputs standard OpenTelemetry data that can be connected to your observability stack. Send standard OpenTelemetry from LiteLLM Proxy to [Traceloop](https://www.traceloop.com/docs/openllmetry/integrations/traceloop), [Dynatrace](https://www.traceloop.com/docs/openllmetry/integrations/dynatrace), [Datadog](https://www.traceloop.com/docs/openllmetry/integrations/datadog)
|
||||
, [New Relic](https://www.traceloop.com/docs/openllmetry/integrations/newrelic), [Honeycomb](https://www.traceloop.com/docs/openllmetry/integrations/honeycomb), [Grafana Tempo](https://www.traceloop.com/docs/openllmetry/integrations/grafana), [Splunk](https://www.traceloop.com/docs/openllmetry/integrations/splunk), [OpenTelemetry Collector](https://www.traceloop.com/docs/openllmetry/integrations/otel-collector)
|
||||
**Step 2:** Configure Environment Variable for trace exporting
|
||||
|
||||
You will need to configure where to export your traces. This is controlled with environment variables: for example, Traceloop uses `TRACELOOP_API_KEY`, whereas Datadog uses `TRACELOOP_BASE_URL`. For more options, visit [the Integrations Catalog](https://www.traceloop.com/docs/openllmetry/integrations/introduction).
|
||||
|
||||
If you are using Datadog as the observability solution, you can set `TRACELOOP_BASE_URL` as:
|
||||
|
||||
```shell
|
||||
TRACELOOP_BASE_URL=http://<datadog-agent-hostname>:4318
|
||||
```
|
||||
|
||||
**Step 3**: Create a `config.yaml` file and set `litellm_settings`: `success_callback`
|
||||
|
||||
**Step 2**: Create a `config.yaml` file and set `litellm_settings`: `success_callback`
|
||||
```yaml
|
||||
model_list:
|
||||
- model_name: gpt-3.5-turbo
|
||||
litellm_params:
|
||||
model: gpt-3.5-turbo
|
||||
api_key: my-fake-key # replace api_key with actual key
|
||||
litellm_settings:
|
||||
success_callback: ["traceloop"]
|
||||
success_callback: [ "traceloop" ]
|
||||
```
|
||||
|
||||
**Step 3**: Start the proxy, make a test request
|
||||
**Step 4**: Start the proxy, make a test request
|
||||
|
||||
Start proxy
|
||||
|
||||
```shell
|
||||
litellm --config config.yaml --debug
|
||||
```
|
||||
|
||||
Test Request
|
||||
|
||||
```
|
||||
curl --location 'http://0.0.0.0:4000/chat/completions' \
|
||||
--header 'Content-Type: application/json' \
|
||||
|
|
|
@ -3,34 +3,38 @@ import TabItem from '@theme/TabItem';
|
|||
|
||||
# ⚡ Best Practices for Production
|
||||
|
||||
Expected Performance in Production
|
||||
## 1. Use this config.yaml
|
||||
Use this config.yaml in production (with your own LLMs)
|
||||
|
||||
1 LiteLLM Uvicorn Worker on Kubernetes
|
||||
|
||||
| Description | Value |
|
||||
|--------------|-------|
|
||||
| Avg latency | `50ms` |
|
||||
| Median latency | `51ms` |
|
||||
| `/chat/completions` Requests/second | `35` |
|
||||
| `/chat/completions` Requests/minute | `2100` |
|
||||
| `/chat/completions` Requests/hour | `126K` |
|
||||
|
||||
|
||||
## 1. Switch off Debug Logging
|
||||
|
||||
Remove `set_verbose: True` from your config.yaml
|
||||
```yaml
|
||||
model_list:
|
||||
- model_name: fake-openai-endpoint
|
||||
litellm_params:
|
||||
model: openai/fake
|
||||
api_key: fake-key
|
||||
api_base: https://exampleopenaiendpoint-production.up.railway.app/
|
||||
|
||||
general_settings:
|
||||
master_key: sk-1234 # enter your own master key, ensure it starts with 'sk-'
|
||||
alerting: ["slack"] # Setup slack alerting - get alerts on LLM exceptions, Budget Alerts, Slow LLM Responses
|
||||
proxy_batch_write_at: 60 # Batch write spend updates every 60s
|
||||
|
||||
litellm_settings:
|
||||
set_verbose: True
|
||||
set_verbose: False # Switch off Debug Logging, ensure your logs do not have any debugging on
|
||||
```
|
||||
|
||||
You should only see the following level of details in logs on the proxy server
|
||||
Set the Slack webhook URL in your env
|
||||
```shell
|
||||
# INFO: 192.168.2.205:11774 - "POST /chat/completions HTTP/1.1" 200 OK
|
||||
# INFO: 192.168.2.205:34717 - "POST /chat/completions HTTP/1.1" 200 OK
|
||||
# INFO: 192.168.2.205:29734 - "POST /chat/completions HTTP/1.1" 200 OK
|
||||
export SLACK_WEBHOOK_URL="https://hooks.slack.com/services/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"
|
||||
```
|
||||
|
||||
:::info
|
||||
|
||||
Need help or want dedicated support? Talk to a founder [here](https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat).
|
||||
|
||||
:::
|
||||
|
||||
|
||||
## 2. On Kubernetes - Use 1 Uvicorn worker [Suggested CMD]
|
||||
|
||||
Use this Docker `CMD`. This will start the proxy with 1 Uvicorn Async Worker
|
||||
|
@ -40,21 +44,12 @@ Use this Docker `CMD`. This will start the proxy with 1 Uvicorn Async Worker
|
|||
CMD ["--port", "4000", "--config", "./proxy_server_config.yaml"]
|
||||
```
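If you want to sanity-check the same single-worker invocation locally before deploying to Kubernetes, a rough `docker run` equivalent looks like the following (the image tag and mounted config path are assumptions - adjust them to your setup):

```shell
docker run \
    -v $(pwd)/proxy_server_config.yaml:/app/proxy_server_config.yaml \
    -p 4000:4000 \
    ghcr.io/berriai/litellm:main-latest \
    --port 4000 --config /app/proxy_server_config.yaml
```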
|
||||
|
||||
## 3. Batch write spend updates every 60s
|
||||
|
||||
The default proxy batch write is 10s. This is to make it easy to see spend when debugging locally.
|
||||
## 3. Use Redis 'port','host', 'password'. NOT 'redis_url'
|
||||
|
||||
In production, we recommend using a longer interval period of 60s. This reduces the number of connections used to make DB writes.
|
||||
If you decide to use Redis, DO NOT use 'redis_url'. We recommend using redis port, host, and password params.
|
||||
|
||||
```yaml
|
||||
general_settings:
|
||||
master_key: sk-1234
|
||||
proxy_batch_write_at: 60 # 👈 Frequency of batch writing logs to server (in seconds)
|
||||
```
|
||||
|
||||
## 4. use Redis 'port','host', 'password'. NOT 'redis_url'
|
||||
|
||||
When connecting to Redis use redis port, host, and password params. Not 'redis_url'. We've seen a 80 RPS difference between these 2 approaches when using the async redis client.
|
||||
`redis_url` is 80 RPS slower.
|
||||
|
||||
This is still something we're investigating. Keep track of it [here](https://github.com/BerriAI/litellm/issues/3188)
|
||||
|
||||
|
@ -69,103 +64,31 @@ router_settings:
|
|||
redis_password: os.environ/REDIS_PASSWORD
|
||||
```
|
||||
|
||||
## 5. Switch off resetting budgets
|
||||
## Extras
|
||||
### Expected Performance in Production
|
||||
|
||||
Add this to your config.yaml. (Only spend per Key, User and Team will be tracked - spend per API Call will not be written to the LiteLLM Database)
|
||||
```yaml
|
||||
general_settings:
|
||||
disable_reset_budget: true
|
||||
```
|
||||
1 LiteLLM Uvicorn Worker on Kubernetes
|
||||
|
||||
## 6. Move spend logs to separate server (BETA)
|
||||
|
||||
Writing each spend log to the db can slow down your proxy. In testing we saw a 70% improvement in median response time, by moving writing spend logs to a separate server.
|
||||
|
||||
👉 [LiteLLM Spend Logs Server](https://github.com/BerriAI/litellm/tree/main/litellm-js/spend-logs)
|
||||
| Description | Value |
|
||||
|--------------|-------|
|
||||
| Avg latency | `50ms` |
|
||||
| Median latency | `51ms` |
|
||||
| `/chat/completions` Requests/second | `35` |
|
||||
| `/chat/completions` Requests/minute | `2100` |
|
||||
| `/chat/completions` Requests/hour | `126K` |
|
||||
|
||||
|
||||
**Spend Logs**
|
||||
This is a log of the key, tokens, model, and latency for each call on the proxy.
|
||||
### Verifying Debugging logs are off
|
||||
|
||||
[**Full Payload**](https://github.com/BerriAI/litellm/blob/8c9623a6bc4ad9da0a2dac64249a60ed8da719e8/litellm/proxy/utils.py#L1769)
|
||||
|
||||
|
||||
**1. Start the spend logs server**
|
||||
|
||||
```bash
|
||||
docker run -p 3000:3000 \
|
||||
-e DATABASE_URL="postgres://.." \
|
||||
ghcr.io/berriai/litellm-spend_logs:main-latest
|
||||
|
||||
# RUNNING on http://0.0.0.0:3000
|
||||
```
|
||||
|
||||
**2. Connect to proxy**
|
||||
|
||||
|
||||
Example litellm_config.yaml
|
||||
|
||||
```yaml
|
||||
model_list:
|
||||
- model_name: fake-openai-endpoint
|
||||
litellm_params:
|
||||
model: openai/my-fake-model
|
||||
api_key: my-fake-key
|
||||
api_base: https://exampleopenaiendpoint-production.up.railway.app/
|
||||
|
||||
general_settings:
|
||||
master_key: sk-1234
|
||||
proxy_batch_write_at: 5 # 👈 Frequency of batch writing logs to server (in seconds)
|
||||
```
|
||||
|
||||
Add `SPEND_LOGS_URL` as an environment variable when starting the proxy
|
||||
|
||||
```bash
|
||||
docker run \
|
||||
-v $(pwd)/litellm_config.yaml:/app/config.yaml \
|
||||
-e DATABASE_URL="postgresql://.." \
|
||||
-e SPEND_LOGS_URL="http://host.docker.internal:3000" \ # 👈 KEY CHANGE
|
||||
-p 4000:4000 \
|
||||
ghcr.io/berriai/litellm:main-latest \
|
||||
--config /app/config.yaml --detailed_debug
|
||||
|
||||
# Running on http://0.0.0.0:4000
|
||||
```
|
||||
|
||||
**3. Test Proxy!**
|
||||
|
||||
|
||||
```bash
|
||||
curl --location 'http://0.0.0.0:4000/v1/chat/completions' \
|
||||
--header 'Content-Type: application/json' \
|
||||
--header 'Authorization: Bearer sk-1234' \
|
||||
--data '{
|
||||
"model": "fake-openai-endpoint",
|
||||
"messages": [
|
||||
{"role": "system", "content": "Be helpful"},
|
||||
{"role": "user", "content": "What do you know?"}
|
||||
]
|
||||
}'
|
||||
```
|
||||
|
||||
In your LiteLLM Spend Logs Server, you should see
|
||||
|
||||
**Expected Response**
|
||||
|
||||
```
|
||||
Received and stored 1 logs. Total logs in memory: 1
|
||||
...
|
||||
Flushed 1 log to the DB.
|
||||
You should only see the following level of details in logs on the proxy server
|
||||
```shell
|
||||
# INFO: 192.168.2.205:11774 - "POST /chat/completions HTTP/1.1" 200 OK
|
||||
# INFO: 192.168.2.205:34717 - "POST /chat/completions HTTP/1.1" 200 OK
|
||||
# INFO: 192.168.2.205:29734 - "POST /chat/completions HTTP/1.1" 200 OK
|
||||
```
|
||||
|
||||
|
||||
### Machine Specification
|
||||
|
||||
A t2.micro should be sufficient to handle 1k logs / minute on this server.
|
||||
|
||||
This consumes at max 120MB, and <0.1 vCPU.
|
||||
|
||||
## Machine Specifications to Deploy LiteLLM
|
||||
### Machine Specifications to Deploy LiteLLM
|
||||
|
||||
| Service | Spec | CPUs | Memory | Architecture | Version|
|
||||
| --- | --- | --- | --- | --- | --- |
|
||||
|
@ -173,7 +96,7 @@ This consumes at max 120MB, and <0.1 vCPU.
|
|||
| Redis Cache | - | - | - | - | 7.0+ Redis Engine|
|
||||
|
||||
|
||||
## Reference Kubernetes Deployment YAML
|
||||
### Reference Kubernetes Deployment YAML
|
||||
|
||||
Reference Kubernetes `deployment.yaml` that was load tested by us
|
||||
|
||||
|
|
|
@ -616,6 +616,57 @@ response = router.completion(model="gpt-3.5-turbo", messages=messages)
|
|||
print(f"response: {response}")
|
||||
```
|
||||
|
||||
#### Retries based on Error Type
|
||||
|
||||
Use `RetryPolicy` if you want to set a `num_retries` based on the exception received
|
||||
|
||||
Example:
|
||||
- 3 retries for `ContentPolicyViolationError`
- 0 retries for `AuthenticationError`
|
||||
|
||||
Example Usage
|
||||
|
||||
```python
import os
import litellm
from litellm.router import RetryPolicy

retry_policy = RetryPolicy(
    ContentPolicyViolationErrorRetries=3,  # run 3 retries for ContentPolicyViolationErrors
    AuthenticationErrorRetries=0,  # run 0 retries for AuthenticationErrors
    BadRequestErrorRetries=1,
    TimeoutErrorRetries=2,
    RateLimitErrorRetries=3,
)

router = litellm.Router(
    model_list=[
        {
            "model_name": "gpt-3.5-turbo",  # openai model name
            "litellm_params": {  # params for litellm completion/embedding call
                "model": "azure/chatgpt-v-2",
                "api_key": os.getenv("AZURE_API_KEY"),
                "api_version": os.getenv("AZURE_API_VERSION"),
                "api_base": os.getenv("AZURE_API_BASE"),
            },
        },
        {
            "model_name": "bad-model",  # openai model name
            "litellm_params": {  # params for litellm completion/embedding call
                "model": "azure/chatgpt-v-2",
                "api_key": "bad-key",
                "api_version": os.getenv("AZURE_API_VERSION"),
                "api_base": os.getenv("AZURE_API_BASE"),
            },
        },
    ],
    retry_policy=retry_policy,
)

response = await router.acompletion(
    model=model,
    messages=messages,
)
```
|
||||
|
||||
|
||||
### Fallbacks
|
||||
|
||||
If a call fails after num_retries, fall back to another model group.
|
||||
|
|
|
@ -178,6 +178,7 @@ const sidebars = {
|
|||
"observability/traceloop_integration",
|
||||
"observability/athina_integration",
|
||||
"observability/lunary_integration",
|
||||
"observability/greenscale_integration",
|
||||
"observability/helicone_integration",
|
||||
"observability/supabase_integration",
|
||||
`observability/telemetry`,
|
||||
|
|
8
litellm-js/spend-logs/package-lock.json
generated
|
@ -5,7 +5,7 @@
|
|||
"packages": {
|
||||
"": {
|
||||
"dependencies": {
|
||||
"@hono/node-server": "^1.9.0",
|
||||
"@hono/node-server": "^1.10.1",
|
||||
"hono": "^4.2.7"
|
||||
},
|
||||
"devDependencies": {
|
||||
|
@ -382,9 +382,9 @@
|
|||
}
|
||||
},
|
||||
"node_modules/@hono/node-server": {
|
||||
"version": "1.9.0",
|
||||
"resolved": "https://registry.npmjs.org/@hono/node-server/-/node-server-1.9.0.tgz",
|
||||
"integrity": "sha512-oJjk7WXBlENeHhWiMqSyxPIZ3Kmf5ZYxqdlcSIXyN8Rn50bNJsPl99G4POBS03Jxh56FdfRJ0SEnC8mAVIiavQ==",
|
||||
"version": "1.10.1",
|
||||
"resolved": "https://registry.npmjs.org/@hono/node-server/-/node-server-1.10.1.tgz",
|
||||
"integrity": "sha512-5BKW25JH5PQKPDkTcIgv3yNUPtOAbnnjFFgWvIxxAY/B/ZNeYjjWoAeDmqhIiCgOAJ3Tauuw+0G+VainhuZRYQ==",
|
||||
"engines": {
|
||||
"node": ">=18.14.1"
|
||||
}
|
||||
|
|
|
@ -3,7 +3,7 @@
|
|||
"dev": "tsx watch src/index.ts"
|
||||
},
|
||||
"dependencies": {
|
||||
"@hono/node-server": "^1.9.0",
|
||||
"@hono/node-server": "^1.10.1",
|
||||
"hono": "^4.2.7"
|
||||
},
|
||||
"devDependencies": {
|
||||
|
|
|
@ -542,7 +542,11 @@ models_by_provider: dict = {
|
|||
"together_ai": together_ai_models,
|
||||
"baseten": baseten_models,
|
||||
"openrouter": openrouter_models,
|
||||
"vertex_ai": vertex_chat_models + vertex_text_models,
|
||||
"vertex_ai": vertex_chat_models
|
||||
+ vertex_text_models
|
||||
+ vertex_anthropic_models
|
||||
+ vertex_vision_models
|
||||
+ vertex_language_models,
|
||||
"ai21": ai21_models,
|
||||
"bedrock": bedrock_models,
|
||||
"petals": petals_models,
|
||||
|
@ -601,7 +605,6 @@ all_embedding_models = (
|
|||
####### IMAGE GENERATION MODELS ###################
|
||||
openai_image_generation_models = ["dall-e-2", "dall-e-3"]
|
||||
|
||||
|
||||
from .timeout import timeout
|
||||
from .utils import (
|
||||
client,
|
||||
|
@ -609,6 +612,8 @@ from .utils import (
|
|||
get_optional_params,
|
||||
modify_integration,
|
||||
token_counter,
|
||||
create_pretrained_tokenizer,
|
||||
create_tokenizer,
|
||||
cost_per_token,
|
||||
completion_cost,
|
||||
supports_function_calling,
|
||||
|
@ -632,6 +637,7 @@ from .utils import (
|
|||
get_secret,
|
||||
get_supported_openai_params,
|
||||
get_api_base,
|
||||
get_first_chars_messages,
|
||||
)
|
||||
from .llms.huggingface_restapi import HuggingfaceConfig
|
||||
from .llms.anthropic import AnthropicConfig
|
||||
|
@ -688,3 +694,4 @@ from .exceptions import (
|
|||
from .budget_manager import BudgetManager
|
||||
from .proxy.proxy_cli import run_server
|
||||
from .router import Router
|
||||
from .assistants.main import *
|
||||
|
|
495
litellm/assistants/main.py
Normal file
|
@ -0,0 +1,495 @@
|
|||
# What is this?
|
||||
## Main file for assistants API logic
|
||||
from typing import Iterable
|
||||
import os
|
||||
import litellm
|
||||
from openai import OpenAI
|
||||
from litellm import client
|
||||
from litellm.utils import supports_httpx_timeout
|
||||
from ..llms.openai import OpenAIAssistantsAPI
|
||||
from ..types.llms.openai import *
|
||||
from ..types.router import *
|
||||
|
||||
####### ENVIRONMENT VARIABLES ###################
|
||||
openai_assistants_api = OpenAIAssistantsAPI()
|
||||
|
||||
### ASSISTANTS ###
|
||||
|
||||
|
||||
def get_assistants(
|
||||
custom_llm_provider: Literal["openai"],
|
||||
client: Optional[OpenAI] = None,
|
||||
**kwargs,
|
||||
) -> SyncCursorPage[Assistant]:
|
||||
optional_params = GenericLiteLLMParams(**kwargs)
|
||||
|
||||
### TIMEOUT LOGIC ###
|
||||
timeout = optional_params.timeout or kwargs.get("request_timeout", 600) or 600
|
||||
# set timeout for 10 minutes by default
|
||||
|
||||
if (
|
||||
timeout is not None
|
||||
and isinstance(timeout, httpx.Timeout)
|
||||
and supports_httpx_timeout(custom_llm_provider) == False
|
||||
):
|
||||
read_timeout = timeout.read or 600
|
||||
timeout = read_timeout # default 10 min timeout
|
||||
elif timeout is not None and not isinstance(timeout, httpx.Timeout):
|
||||
timeout = float(timeout) # type: ignore
|
||||
elif timeout is None:
|
||||
timeout = 600.0
|
||||
|
||||
response: Optional[SyncCursorPage[Assistant]] = None
|
||||
if custom_llm_provider == "openai":
|
||||
api_base = (
|
||||
optional_params.api_base # for deepinfra/perplexity/anyscale/groq we check in get_llm_provider and pass in the api base from there
|
||||
or litellm.api_base
|
||||
or os.getenv("OPENAI_API_BASE")
|
||||
or "https://api.openai.com/v1"
|
||||
)
|
||||
organization = (
|
||||
optional_params.organization
|
||||
or litellm.organization
|
||||
or os.getenv("OPENAI_ORGANIZATION", None)
|
||||
or None # default - https://github.com/openai/openai-python/blob/284c1799070c723c6a553337134148a7ab088dd8/openai/util.py#L105
|
||||
)
|
||||
# set API KEY
|
||||
api_key = (
|
||||
optional_params.api_key
|
||||
or litellm.api_key # for deepinfra/perplexity/anyscale we check in get_llm_provider and pass in the api key from there
|
||||
or litellm.openai_key
|
||||
or os.getenv("OPENAI_API_KEY")
|
||||
)
|
||||
response = openai_assistants_api.get_assistants(
|
||||
api_base=api_base,
|
||||
api_key=api_key,
|
||||
timeout=timeout,
|
||||
max_retries=optional_params.max_retries,
|
||||
organization=organization,
|
||||
client=client,
|
||||
)
|
||||
else:
|
||||
raise litellm.exceptions.BadRequestError(
|
||||
message="LiteLLM doesn't support {} for 'get_assistants'. Only 'openai' is supported.".format(
|
||||
custom_llm_provider
|
||||
),
|
||||
model="n/a",
|
||||
llm_provider=custom_llm_provider,
|
||||
response=httpx.Response(
|
||||
status_code=400,
|
||||
content="Unsupported provider",
|
||||
request=httpx.Request(method="create_thread", url="https://github.com/BerriAI/litellm"), # type: ignore
|
||||
),
|
||||
)
|
||||
return response
|
||||
|
||||
|
||||
### THREADS ###
|
||||
|
||||
|
||||
def create_thread(
|
||||
custom_llm_provider: Literal["openai"],
|
||||
messages: Optional[Iterable[OpenAICreateThreadParamsMessage]] = None,
|
||||
metadata: Optional[dict] = None,
|
||||
tool_resources: Optional[OpenAICreateThreadParamsToolResources] = None,
|
||||
client: Optional[OpenAI] = None,
|
||||
**kwargs,
|
||||
) -> Thread:
|
||||
"""
|
||||
- get the llm provider
|
||||
- if openai - route it there
|
||||
- pass through relevant params
|
||||
|
||||
```
|
||||
from litellm import create_thread
|
||||
|
||||
create_thread(
|
||||
custom_llm_provider="openai",
|
||||
### OPTIONAL ###
|
||||
messages = [{
|
||||
"role": "user",
|
||||
"content": "Hello, what is AI?"
|
||||
},
|
||||
{
|
||||
"role": "user",
|
||||
"content": "How does AI work? Explain it in simple terms."
|
||||
}]
|
||||
)
|
||||
```
|
||||
"""
|
||||
optional_params = GenericLiteLLMParams(**kwargs)
|
||||
|
||||
### TIMEOUT LOGIC ###
|
||||
timeout = optional_params.timeout or kwargs.get("request_timeout", 600) or 600
|
||||
# set timeout for 10 minutes by default
|
||||
|
||||
if (
|
||||
timeout is not None
|
||||
and isinstance(timeout, httpx.Timeout)
|
||||
and supports_httpx_timeout(custom_llm_provider) == False
|
||||
):
|
||||
read_timeout = timeout.read or 600
|
||||
timeout = read_timeout # default 10 min timeout
|
||||
elif timeout is not None and not isinstance(timeout, httpx.Timeout):
|
||||
timeout = float(timeout) # type: ignore
|
||||
elif timeout is None:
|
||||
timeout = 600.0
|
||||
|
||||
response: Optional[Thread] = None
|
||||
if custom_llm_provider == "openai":
|
||||
api_base = (
|
||||
optional_params.api_base # for deepinfra/perplexity/anyscale/groq we check in get_llm_provider and pass in the api base from there
|
||||
or litellm.api_base
|
||||
or os.getenv("OPENAI_API_BASE")
|
||||
or "https://api.openai.com/v1"
|
||||
)
|
||||
organization = (
|
||||
optional_params.organization
|
||||
or litellm.organization
|
||||
or os.getenv("OPENAI_ORGANIZATION", None)
|
||||
or None # default - https://github.com/openai/openai-python/blob/284c1799070c723c6a553337134148a7ab088dd8/openai/util.py#L105
|
||||
)
|
||||
# set API KEY
|
||||
api_key = (
|
||||
optional_params.api_key
|
||||
or litellm.api_key # for deepinfra/perplexity/anyscale we check in get_llm_provider and pass in the api key from there
|
||||
or litellm.openai_key
|
||||
or os.getenv("OPENAI_API_KEY")
|
||||
)
|
||||
response = openai_assistants_api.create_thread(
|
||||
messages=messages,
|
||||
metadata=metadata,
|
||||
api_base=api_base,
|
||||
api_key=api_key,
|
||||
timeout=timeout,
|
||||
max_retries=optional_params.max_retries,
|
||||
organization=organization,
|
||||
client=client,
|
||||
)
|
||||
else:
|
||||
raise litellm.exceptions.BadRequestError(
|
||||
message="LiteLLM doesn't support {} for 'create_thread'. Only 'openai' is supported.".format(
|
||||
custom_llm_provider
|
||||
),
|
||||
model="n/a",
|
||||
llm_provider=custom_llm_provider,
|
||||
response=httpx.Response(
|
||||
status_code=400,
|
||||
content="Unsupported provider",
|
||||
request=httpx.Request(method="create_thread", url="https://github.com/BerriAI/litellm"), # type: ignore
|
||||
),
|
||||
)
|
||||
return response
|
||||
|
||||
|
||||
def get_thread(
|
||||
custom_llm_provider: Literal["openai"],
|
||||
thread_id: str,
|
||||
client: Optional[OpenAI] = None,
|
||||
**kwargs,
|
||||
) -> Thread:
|
||||
"""Get the thread object, given a thread_id"""
|
||||
optional_params = GenericLiteLLMParams(**kwargs)
|
||||
|
||||
### TIMEOUT LOGIC ###
|
||||
timeout = optional_params.timeout or kwargs.get("request_timeout", 600) or 600
|
||||
# set timeout for 10 minutes by default
|
||||
|
||||
if (
|
||||
timeout is not None
|
||||
and isinstance(timeout, httpx.Timeout)
|
||||
and supports_httpx_timeout(custom_llm_provider) == False
|
||||
):
|
||||
read_timeout = timeout.read or 600
|
||||
timeout = read_timeout # default 10 min timeout
|
||||
elif timeout is not None and not isinstance(timeout, httpx.Timeout):
|
||||
timeout = float(timeout) # type: ignore
|
||||
elif timeout is None:
|
||||
timeout = 600.0
|
||||
|
||||
response: Optional[Thread] = None
|
||||
if custom_llm_provider == "openai":
|
||||
api_base = (
|
||||
optional_params.api_base # for deepinfra/perplexity/anyscale/groq we check in get_llm_provider and pass in the api base from there
|
||||
or litellm.api_base
|
||||
or os.getenv("OPENAI_API_BASE")
|
||||
or "https://api.openai.com/v1"
|
||||
)
|
||||
organization = (
|
||||
optional_params.organization
|
||||
or litellm.organization
|
||||
or os.getenv("OPENAI_ORGANIZATION", None)
|
||||
or None # default - https://github.com/openai/openai-python/blob/284c1799070c723c6a553337134148a7ab088dd8/openai/util.py#L105
|
||||
)
|
||||
# set API KEY
|
||||
api_key = (
|
||||
optional_params.api_key
|
||||
or litellm.api_key # for deepinfra/perplexity/anyscale we check in get_llm_provider and pass in the api key from there
|
||||
or litellm.openai_key
|
||||
or os.getenv("OPENAI_API_KEY")
|
||||
)
|
||||
response = openai_assistants_api.get_thread(
|
||||
thread_id=thread_id,
|
||||
api_base=api_base,
|
||||
api_key=api_key,
|
||||
timeout=timeout,
|
||||
max_retries=optional_params.max_retries,
|
||||
organization=organization,
|
||||
client=client,
|
||||
)
|
||||
else:
|
||||
raise litellm.exceptions.BadRequestError(
|
||||
message="LiteLLM doesn't support {} for 'get_thread'. Only 'openai' is supported.".format(
|
||||
custom_llm_provider
|
||||
),
|
||||
model="n/a",
|
||||
llm_provider=custom_llm_provider,
|
||||
response=httpx.Response(
|
||||
status_code=400,
|
||||
content="Unsupported provider",
|
||||
request=httpx.Request(method="create_thread", url="https://github.com/BerriAI/litellm"), # type: ignore
|
||||
),
|
||||
)
|
||||
return response
|
||||
|
||||
|
||||
### MESSAGES ###
|
||||
|
||||
|
||||
def add_message(
|
||||
custom_llm_provider: Literal["openai"],
|
||||
thread_id: str,
|
||||
role: Literal["user", "assistant"],
|
||||
content: str,
|
||||
attachments: Optional[List[Attachment]] = None,
|
||||
metadata: Optional[dict] = None,
|
||||
client: Optional[OpenAI] = None,
|
||||
**kwargs,
|
||||
) -> OpenAIMessage:
|
||||
### COMMON OBJECTS ###
|
||||
message_data = MessageData(
|
||||
role=role, content=content, attachments=attachments, metadata=metadata
|
||||
)
|
||||
optional_params = GenericLiteLLMParams(**kwargs)
|
||||
|
||||
### TIMEOUT LOGIC ###
|
||||
timeout = optional_params.timeout or kwargs.get("request_timeout", 600) or 600
|
||||
# set timeout for 10 minutes by default
|
||||
|
||||
if (
|
||||
timeout is not None
|
||||
and isinstance(timeout, httpx.Timeout)
|
||||
and supports_httpx_timeout(custom_llm_provider) == False
|
||||
):
|
||||
read_timeout = timeout.read or 600
|
||||
timeout = read_timeout # default 10 min timeout
|
||||
elif timeout is not None and not isinstance(timeout, httpx.Timeout):
|
||||
timeout = float(timeout) # type: ignore
|
||||
elif timeout is None:
|
||||
timeout = 600.0
|
||||
|
||||
response: Optional[OpenAIMessage] = None
|
||||
if custom_llm_provider == "openai":
|
||||
api_base = (
|
||||
optional_params.api_base # for deepinfra/perplexity/anyscale/groq we check in get_llm_provider and pass in the api base from there
|
||||
or litellm.api_base
|
||||
or os.getenv("OPENAI_API_BASE")
|
||||
or "https://api.openai.com/v1"
|
||||
)
|
||||
organization = (
|
||||
optional_params.organization
|
||||
or litellm.organization
|
||||
or os.getenv("OPENAI_ORGANIZATION", None)
|
||||
or None # default - https://github.com/openai/openai-python/blob/284c1799070c723c6a553337134148a7ab088dd8/openai/util.py#L105
|
||||
)
|
||||
# set API KEY
|
||||
api_key = (
|
||||
optional_params.api_key
|
||||
or litellm.api_key # for deepinfra/perplexity/anyscale we check in get_llm_provider and pass in the api key from there
|
||||
or litellm.openai_key
|
||||
or os.getenv("OPENAI_API_KEY")
|
||||
)
|
||||
response = openai_assistants_api.add_message(
|
||||
thread_id=thread_id,
|
||||
message_data=message_data,
|
||||
api_base=api_base,
|
||||
api_key=api_key,
|
||||
timeout=timeout,
|
||||
max_retries=optional_params.max_retries,
|
||||
organization=organization,
|
||||
client=client,
|
||||
)
|
||||
else:
|
||||
raise litellm.exceptions.BadRequestError(
|
||||
message="LiteLLM doesn't support {} for 'create_thread'. Only 'openai' is supported.".format(
|
||||
custom_llm_provider
|
||||
),
|
||||
model="n/a",
|
||||
llm_provider=custom_llm_provider,
|
||||
response=httpx.Response(
|
||||
status_code=400,
|
||||
content="Unsupported provider",
|
||||
request=httpx.Request(method="create_thread", url="https://github.com/BerriAI/litellm"), # type: ignore
|
||||
),
|
||||
)
|
||||
|
||||
return response
|
||||
|
||||
|
||||
def get_messages(
|
||||
custom_llm_provider: Literal["openai"],
|
||||
thread_id: str,
|
||||
client: Optional[OpenAI] = None,
|
||||
**kwargs,
|
||||
) -> SyncCursorPage[OpenAIMessage]:
|
||||
optional_params = GenericLiteLLMParams(**kwargs)
|
||||
|
||||
### TIMEOUT LOGIC ###
|
||||
timeout = optional_params.timeout or kwargs.get("request_timeout", 600) or 600
|
||||
# set timeout for 10 minutes by default
|
||||
|
||||
if (
|
||||
timeout is not None
|
||||
and isinstance(timeout, httpx.Timeout)
|
||||
and supports_httpx_timeout(custom_llm_provider) == False
|
||||
):
|
||||
read_timeout = timeout.read or 600
|
||||
timeout = read_timeout # default 10 min timeout
|
||||
elif timeout is not None and not isinstance(timeout, httpx.Timeout):
|
||||
timeout = float(timeout) # type: ignore
|
||||
elif timeout is None:
|
||||
timeout = 600.0
|
||||
|
||||
response: Optional[SyncCursorPage[OpenAIMessage]] = None
|
||||
if custom_llm_provider == "openai":
|
||||
api_base = (
|
||||
optional_params.api_base # for deepinfra/perplexity/anyscale/groq we check in get_llm_provider and pass in the api base from there
|
||||
or litellm.api_base
|
||||
or os.getenv("OPENAI_API_BASE")
|
||||
or "https://api.openai.com/v1"
|
||||
)
|
||||
organization = (
|
||||
optional_params.organization
|
||||
or litellm.organization
|
||||
or os.getenv("OPENAI_ORGANIZATION", None)
|
||||
or None # default - https://github.com/openai/openai-python/blob/284c1799070c723c6a553337134148a7ab088dd8/openai/util.py#L105
|
||||
)
|
||||
# set API KEY
|
||||
api_key = (
|
||||
optional_params.api_key
|
||||
or litellm.api_key # for deepinfra/perplexity/anyscale we check in get_llm_provider and pass in the api key from there
|
||||
or litellm.openai_key
|
||||
or os.getenv("OPENAI_API_KEY")
|
||||
)
|
||||
response = openai_assistants_api.get_messages(
|
||||
thread_id=thread_id,
|
||||
api_base=api_base,
|
||||
api_key=api_key,
|
||||
timeout=timeout,
|
||||
max_retries=optional_params.max_retries,
|
||||
organization=organization,
|
||||
client=client,
|
||||
)
|
||||
else:
|
||||
raise litellm.exceptions.BadRequestError(
|
||||
message="LiteLLM doesn't support {} for 'get_messages'. Only 'openai' is supported.".format(
|
||||
custom_llm_provider
|
||||
),
|
||||
model="n/a",
|
||||
llm_provider=custom_llm_provider,
|
||||
response=httpx.Response(
|
||||
status_code=400,
|
||||
content="Unsupported provider",
|
||||
request=httpx.Request(method="create_thread", url="https://github.com/BerriAI/litellm"), # type: ignore
|
||||
),
|
||||
)
|
||||
|
||||
return response
|
||||
|
||||
|
||||
### RUNS ###
|
||||
|
||||
|
||||
def run_thread(
|
||||
custom_llm_provider: Literal["openai"],
|
||||
thread_id: str,
|
||||
assistant_id: str,
|
||||
additional_instructions: Optional[str] = None,
|
||||
instructions: Optional[str] = None,
|
||||
metadata: Optional[dict] = None,
|
||||
model: Optional[str] = None,
|
||||
stream: Optional[bool] = None,
|
||||
tools: Optional[Iterable[AssistantToolParam]] = None,
|
||||
client: Optional[OpenAI] = None,
|
||||
**kwargs,
|
||||
) -> Run:
|
||||
"""Run a given thread + assistant."""
|
||||
optional_params = GenericLiteLLMParams(**kwargs)
|
||||
|
||||
### TIMEOUT LOGIC ###
|
||||
timeout = optional_params.timeout or kwargs.get("request_timeout", 600) or 600
|
||||
# set timeout for 10 minutes by default
|
||||
|
||||
if (
|
||||
timeout is not None
|
||||
and isinstance(timeout, httpx.Timeout)
|
||||
and supports_httpx_timeout(custom_llm_provider) == False
|
||||
):
|
||||
read_timeout = timeout.read or 600
|
||||
timeout = read_timeout # default 10 min timeout
|
||||
elif timeout is not None and not isinstance(timeout, httpx.Timeout):
|
||||
timeout = float(timeout) # type: ignore
|
||||
elif timeout is None:
|
||||
timeout = 600.0
|
||||
|
||||
response: Optional[Run] = None
|
||||
if custom_llm_provider == "openai":
|
||||
api_base = (
|
||||
optional_params.api_base # for deepinfra/perplexity/anyscale/groq we check in get_llm_provider and pass in the api base from there
|
||||
or litellm.api_base
|
||||
or os.getenv("OPENAI_API_BASE")
|
||||
or "https://api.openai.com/v1"
|
||||
)
|
||||
organization = (
|
||||
optional_params.organization
|
||||
or litellm.organization
|
||||
or os.getenv("OPENAI_ORGANIZATION", None)
|
||||
or None # default - https://github.com/openai/openai-python/blob/284c1799070c723c6a553337134148a7ab088dd8/openai/util.py#L105
|
||||
)
|
||||
# set API KEY
|
||||
api_key = (
|
||||
optional_params.api_key
|
||||
or litellm.api_key # for deepinfra/perplexity/anyscale we check in get_llm_provider and pass in the api key from there
|
||||
or litellm.openai_key
|
||||
or os.getenv("OPENAI_API_KEY")
|
||||
)
|
||||
response = openai_assistants_api.run_thread(
|
||||
thread_id=thread_id,
|
||||
assistant_id=assistant_id,
|
||||
additional_instructions=additional_instructions,
|
||||
instructions=instructions,
|
||||
metadata=metadata,
|
||||
model=model,
|
||||
stream=stream,
|
||||
tools=tools,
|
||||
api_base=api_base,
|
||||
api_key=api_key,
|
||||
timeout=timeout,
|
||||
max_retries=optional_params.max_retries,
|
||||
organization=organization,
|
||||
client=client,
|
||||
)
|
||||
else:
|
||||
raise litellm.exceptions.BadRequestError(
|
||||
message="LiteLLM doesn't support {} for 'run_thread'. Only 'openai' is supported.".format(
|
||||
custom_llm_provider
|
||||
),
|
||||
model="n/a",
|
||||
llm_provider=custom_llm_provider,
|
||||
response=httpx.Response(
|
||||
status_code=400,
|
||||
content="Unsupported provider",
|
||||
request=httpx.Request(method="create_thread", url="https://github.com/BerriAI/litellm"), # type: ignore
|
||||
),
|
||||
)
|
||||
return response
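Taken together, a typical calling sequence for this new module might look like the sketch below. These functions are re-exported from `litellm` via `from .assistants.main import *` (see the `__init__.py` change above); the assistant id is a placeholder, and only the "openai" provider is supported here:

```python
from litellm import create_thread, add_message, run_thread

# create a thread, attach a user message, then run it with an existing assistant
thread = create_thread(custom_llm_provider="openai")
add_message(
    custom_llm_provider="openai",
    thread_id=thread.id,
    role="user",
    content="Hello, what is AI?",
)
run = run_thread(
    custom_llm_provider="openai",
    thread_id=thread.id,
    assistant_id="asst_123",  # placeholder - use a real assistant id
)
print(run)
```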
|
|
@ -177,11 +177,18 @@ class RedisCache(BaseCache):
|
|||
try:
|
||||
# asyncio.get_running_loop().create_task(self.ping())
|
||||
result = asyncio.get_running_loop().create_task(self.ping())
|
||||
except Exception:
|
||||
pass
|
||||
except Exception as e:
|
||||
verbose_logger.error(
|
||||
"Error connecting to Async Redis client", extra={"error": str(e)}
|
||||
)
|
||||
|
||||
### SYNC HEALTH PING ###
|
||||
try:
|
||||
self.redis_client.ping()
|
||||
except Exception as e:
|
||||
verbose_logger.error(
|
||||
"Error connecting to Sync Redis client", extra={"error": str(e)}
|
||||
)
|
||||
|
||||
def init_async_client(self):
|
||||
from ._redis import get_redis_async_client
|
||||
|
|
|
@ -38,7 +38,7 @@ class OpenMeterLogger(CustomLogger):
|
|||
in the environment
|
||||
"""
|
||||
missing_keys = []
|
||||
if litellm.get_secret("OPENMETER_API_KEY", None) is None:
|
||||
if os.getenv("OPENMETER_API_KEY", None) is None:
|
||||
missing_keys.append("OPENMETER_API_KEY")
|
||||
|
||||
if len(missing_keys) > 0:
|
||||
|
@ -60,47 +60,56 @@ class OpenMeterLogger(CustomLogger):
|
|||
"total_tokens": response_obj["usage"].get("total_tokens"),
|
||||
}
|
||||
|
||||
subject = kwargs.get("user", None)  # end-user passed in via 'user' param
|
||||
if not subject:
|
||||
raise Exception("OpenMeter: user is required")
|
||||
|
||||
return {
|
||||
"specversion": "1.0",
|
||||
"type": os.getenv("OPENMETER_EVENT_TYPE", "litellm_tokens"),
|
||||
"id": call_id,
|
||||
"time": dt,
|
||||
"subject": kwargs.get("user", ""), # end-user passed in via 'user' param
|
||||
"subject": subject,
|
||||
"source": "litellm-proxy",
|
||||
"data": {"model": model, "cost": cost, **usage},
|
||||
}
|
||||
|
||||
def log_success_event(self, kwargs, response_obj, start_time, end_time):
|
||||
_url = litellm.get_secret(
|
||||
"OPENMETER_API_ENDPOINT", default_value="https://openmeter.cloud"
|
||||
)
|
||||
_url = os.getenv("OPENMETER_API_ENDPOINT", "https://openmeter.cloud")
|
||||
if _url.endswith("/"):
|
||||
_url += "api/v1/events"
|
||||
else:
|
||||
_url += "/api/v1/events"
|
||||
|
||||
api_key = litellm.get_secret("OPENMETER_API_KEY")
|
||||
api_key = os.getenv("OPENMETER_API_KEY")
|
||||
|
||||
_data = self._common_logic(kwargs=kwargs, response_obj=response_obj)
|
||||
self.sync_http_handler.post(
|
||||
url=_url,
|
||||
data=_data,
|
||||
headers={
|
||||
_headers = {
|
||||
"Content-Type": "application/cloudevents+json",
|
||||
"Authorization": "Bearer {}".format(api_key),
|
||||
},
|
||||
}
|
||||
|
||||
try:
|
||||
response = self.sync_http_handler.post(
|
||||
url=_url,
|
||||
data=json.dumps(_data),
|
||||
headers=_headers,
|
||||
)
|
||||
|
||||
response.raise_for_status()
|
||||
except Exception as e:
|
||||
if hasattr(response, "text"):
|
||||
litellm.print_verbose(f"\nError Message: {response.text}")
|
||||
raise e
|
||||
|
||||
async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
|
||||
_url = litellm.get_secret(
|
||||
"OPENMETER_API_ENDPOINT", default_value="https://openmeter.cloud"
|
||||
)
|
||||
_url = os.getenv("OPENMETER_API_ENDPOINT", "https://openmeter.cloud")
|
||||
if _url.endswith("/"):
|
||||
_url += "api/v1/events"
|
||||
else:
|
||||
_url += "/api/v1/events"
|
||||
|
||||
api_key = litellm.get_secret("OPENMETER_API_KEY")
|
||||
api_key = os.getenv("OPENMETER_API_KEY")
|
||||
|
||||
_data = self._common_logic(kwargs=kwargs, response_obj=response_obj)
|
||||
_headers = {
|
||||
|
@ -117,7 +126,6 @@ class OpenMeterLogger(CustomLogger):
|
|||
|
||||
                response.raise_for_status()
            except Exception as e:
                print(f"\nAn Exception Occurred - {str(e)}")
                if hasattr(response, "text"):
                    print(f"\nError Message: {response.text}")
                    litellm.print_verbose(f"\nError Message: {response.text}")
                raise e

@@ -48,19 +48,6 @@ class SlackAlerting:
        self.internal_usage_cache = DualCache()
        self.async_http_handler = AsyncHTTPHandler()
        self.alert_to_webhook_url = alert_to_webhook_url
        self.langfuse_logger = None

        try:
            from litellm.integrations.langfuse import LangFuseLogger

            self.langfuse_logger = LangFuseLogger(
                os.getenv("LANGFUSE_PUBLIC_KEY"),
                os.getenv("LANGFUSE_SECRET_KEY"),
                flush_interval=1,
            )
        except:
            pass

        pass

    def update_values(

@@ -110,62 +97,8 @@ class SlackAlerting:
        start_time: Optional[datetime.datetime] = None,
        end_time: Optional[datetime.datetime] = None,
    ):
        import uuid

        # For now: do nothing as we're debugging why this is not working as expected
        if request_data is not None:
            trace_id = request_data.get("metadata", {}).get(
                "trace_id", None
            )  # get langfuse trace id
            if trace_id is None:
                trace_id = "litellm-alert-trace-" + str(uuid.uuid4())
                request_data["metadata"]["trace_id"] = trace_id
        elif kwargs is not None:
            _litellm_params = kwargs.get("litellm_params", {})
            trace_id = _litellm_params.get("metadata", {}).get(
                "trace_id", None
            )  # get langfuse trace id
            if trace_id is None:
                trace_id = "litellm-alert-trace-" + str(uuid.uuid4())
                _litellm_params["metadata"]["trace_id"] = trace_id

        # Log hanging request as an error on langfuse
        if type == "hanging_request":
            if self.langfuse_logger is not None:
                _logging_kwargs = copy.deepcopy(request_data)
                if _logging_kwargs is None:
                    _logging_kwargs = {}
                _logging_kwargs["litellm_params"] = {}
                request_data = request_data or {}
                _logging_kwargs["litellm_params"]["metadata"] = request_data.get(
                    "metadata", {}
                )
                # log to langfuse in a separate thread
                import threading

                threading.Thread(
                    target=self.langfuse_logger.log_event,
                    args=(
                        _logging_kwargs,
                        None,
                        start_time,
                        end_time,
                        None,
                        print,
                        "ERROR",
                        "Requests is hanging",
                    ),
                ).start()

        _langfuse_host = os.environ.get("LANGFUSE_HOST", "https://cloud.langfuse.com")
        _langfuse_project_id = os.environ.get("LANGFUSE_PROJECT_ID")

        # langfuse urls look like: https://us.cloud.langfuse.com/project/************/traces/litellm-alert-trace-ididi9dk-09292-************

        _langfuse_url = (
            f"{_langfuse_host}/project/{_langfuse_project_id}/traces/{trace_id}"
        )
        request_info += f"\n🪢 Langfuse Trace: {_langfuse_url}"
        # do nothing for now
        pass
        return request_info

    def _response_taking_too_long_callback(

@@ -242,10 +175,6 @@ class SlackAlerting:
        request_info = f"\nRequest Model: `{model}`\nAPI Base: `{api_base}`\nMessages: `{messages}`"
        slow_message = f"`Responses are slow - {round(time_difference_float,2)}s response time > Alerting threshold: {self.alerting_threshold}s`"
        if time_difference_float > self.alerting_threshold:
            if "langfuse" in litellm.success_callback:
                request_info = self._add_langfuse_trace_id_to_alert(
                    request_info=request_info, kwargs=kwargs, type="slow_response"
                )
            # add deployment latencies to alert
            if (
                kwargs is not None
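The hunk above builds the Langfuse trace link that gets appended to a hanging-request alert. A minimal sketch of that URL construction, assuming the same `LANGFUSE_HOST` / `LANGFUSE_PROJECT_ID` environment variables the diff reads (the project id value here is made up):

```
import os
import uuid

# Illustrative values only; the real ones come from the environment, as in the hunk above.
os.environ.setdefault("LANGFUSE_HOST", "https://cloud.langfuse.com")
os.environ.setdefault("LANGFUSE_PROJECT_ID", "my-project-id")  # hypothetical project id

trace_id = "litellm-alert-trace-" + str(uuid.uuid4())
_langfuse_host = os.environ.get("LANGFUSE_HOST", "https://cloud.langfuse.com")
_langfuse_project_id = os.environ.get("LANGFUSE_PROJECT_ID")

# Same URL shape as the comment in the hunk: {host}/project/{project_id}/traces/{trace_id}
print(f"{_langfuse_host}/project/{_langfuse_project_id}/traces/{trace_id}")
```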
@@ -84,6 +84,51 @@ class AnthropicConfig:
            and v is not None
        }

    def get_supported_openai_params(self):
        return [
            "stream",
            "stop",
            "temperature",
            "top_p",
            "max_tokens",
            "tools",
            "tool_choice",
        ]

    def map_openai_params(self, non_default_params: dict, optional_params: dict):
        for param, value in non_default_params.items():
            if param == "max_tokens":
                optional_params["max_tokens"] = value
            if param == "tools":
                optional_params["tools"] = value
            if param == "stream" and value == True:
                optional_params["stream"] = value
            if param == "stop":
                if isinstance(value, str):
                    if (
                        value == "\n"
                    ) and litellm.drop_params == True:  # anthropic doesn't allow whitespace characters as stop-sequences
                        continue
                    value = [value]
                elif isinstance(value, list):
                    new_v = []
                    for v in value:
                        if (
                            v == "\n"
                        ) and litellm.drop_params == True:  # anthropic doesn't allow whitespace characters as stop-sequences
                            continue
                        new_v.append(v)
                    if len(new_v) > 0:
                        value = new_v
                    else:
                        continue
                optional_params["stop_sequences"] = value
            if param == "temperature":
                optional_params["temperature"] = value
            if param == "top_p":
                optional_params["top_p"] = value
        return optional_params


# makes headers for API call
def validate_environment(api_key, user_headers):
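A minimal usage sketch of the param mapping shown above, assuming `litellm.drop_params = True` so the lone "\n" stop sequence is dropped (the parameter values are illustrative, and the import path is the one litellm's llms module used at the time of this diff):

```
import litellm
from litellm.llms.anthropic import AnthropicConfig

litellm.drop_params = True  # lets the "\n" stop-sequence be silently dropped

optional_params = AnthropicConfig().map_openai_params(
    non_default_params={"max_tokens": 256, "temperature": 0.2, "stop": ["\n", "END"]},
    optional_params={},
)
# Expected shape: {"max_tokens": 256, "temperature": 0.2, "stop_sequences": ["END"]}
print(optional_params)
```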
@@ -151,7 +151,7 @@ class AzureChatCompletion(BaseLLM):
        api_type: str,
        azure_ad_token: str,
        print_verbose: Callable,
        timeout,
        timeout: Union[float, httpx.Timeout],
        logging_obj,
        optional_params,
        litellm_params,
@@ -4,7 +4,13 @@ from enum import Enum
import time, uuid
from typing import Callable, Optional, Any, Union, List
import litellm
from litellm.utils import ModelResponse, get_secret, Usage, ImageResponse
from litellm.utils import (
    ModelResponse,
    get_secret,
    Usage,
    ImageResponse,
    map_finish_reason,
)
from .prompt_templates.factory import (
    prompt_factory,
    custom_prompt,

@@ -545,7 +551,7 @@ def init_bedrock_client(
    aws_profile_name: Optional[str] = None,
    aws_role_name: Optional[str] = None,
    extra_headers: Optional[dict] = None,
    timeout: Optional[int] = None,
    timeout: Optional[Union[float, httpx.Timeout]] = None,
):
    # check for custom AWS_REGION_NAME and use it if not passed to init_bedrock_client
    litellm_aws_region_name = get_secret("AWS_REGION_NAME", None)

@@ -603,7 +609,14 @@ def init_bedrock_client(

    import boto3

    if isinstance(timeout, float):
        config = boto3.session.Config(connect_timeout=timeout, read_timeout=timeout)
    elif isinstance(timeout, httpx.Timeout):
        config = boto3.session.Config(
            connect_timeout=timeout.connect, read_timeout=timeout.read
        )
    else:
        config = boto3.session.Config()

    ### CHECK STS ###
    if aws_role_name is not None and aws_session_name is not None:

@@ -1058,7 +1071,9 @@ def completion(
                logging_obj=logging_obj,
            )

            model_response["finish_reason"] = response_body["stop_reason"]
            model_response["finish_reason"] = map_finish_reason(
                response_body["stop_reason"]
            )
            _usage = litellm.Usage(
                prompt_tokens=response_body["usage"]["input_tokens"],
                completion_tokens=response_body["usage"]["output_tokens"],
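The `init_bedrock_client` change above maps either a plain float or an `httpx.Timeout` onto the botocore config used for the Bedrock client. A minimal sketch of that branch in isolation, assuming boto3 is installed (the timeout values are illustrative):

```
import boto3
import httpx

timeout = httpx.Timeout(600.0, connect=5.0)  # example value

if isinstance(timeout, float):
    config = boto3.session.Config(connect_timeout=timeout, read_timeout=timeout)
elif isinstance(timeout, httpx.Timeout):
    config = boto3.session.Config(
        connect_timeout=timeout.connect, read_timeout=timeout.read
    )
else:
    config = boto3.session.Config()

print(config.connect_timeout, config.read_timeout)  # 5.0 600.0
```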
@@ -1,4 +1,13 @@
from typing import Optional, Union, Any, BinaryIO
from typing import (
    Optional,
    Union,
    Any,
    BinaryIO,
    Literal,
    Iterable,
)
from typing_extensions import override
from pydantic import BaseModel
import types, time, json, traceback
import httpx
from .base import BaseLLM

@@ -17,6 +26,7 @@ import aiohttp, requests
import litellm
from .prompt_templates.factory import prompt_factory, custom_prompt
from openai import OpenAI, AsyncOpenAI
from ..types.llms.openai import *


class OpenAIError(Exception):

@@ -246,7 +256,7 @@ class OpenAIChatCompletion(BaseLLM):
    def completion(
        self,
        model_response: ModelResponse,
        timeout: float,
        timeout: Union[float, httpx.Timeout],
        model: Optional[str] = None,
        messages: Optional[list] = None,
        print_verbose: Optional[Callable] = None,

@@ -271,9 +281,12 @@ class OpenAIChatCompletion(BaseLLM):
        if model is None or messages is None:
            raise OpenAIError(status_code=422, message=f"Missing model or messages")

        if not isinstance(timeout, float):
        if not isinstance(timeout, float) and not isinstance(
            timeout, httpx.Timeout
        ):
            raise OpenAIError(
                status_code=422, message=f"Timeout needs to be a float"
                status_code=422,
                message=f"Timeout needs to be a float or httpx.Timeout",
            )

        if custom_llm_provider != "openai":

@@ -425,7 +438,7 @@ class OpenAIChatCompletion(BaseLLM):
        self,
        data: dict,
        model_response: ModelResponse,
        timeout: float,
        timeout: Union[float, httpx.Timeout],
        api_key: Optional[str] = None,
        api_base: Optional[str] = None,
        organization: Optional[str] = None,

@@ -480,7 +493,7 @@ class OpenAIChatCompletion(BaseLLM):
    def streaming(
        self,
        logging_obj,
        timeout: float,
        timeout: Union[float, httpx.Timeout],
        data: dict,
        model: str,
        api_key: Optional[str] = None,

@@ -524,7 +537,7 @@ class OpenAIChatCompletion(BaseLLM):
    async def async_streaming(
        self,
        logging_obj,
        timeout: float,
        timeout: Union[float, httpx.Timeout],
        data: dict,
        model: str,
        api_key: Optional[str] = None,

@@ -1233,3 +1246,223 @@ class OpenAITextCompletion(BaseLLM):

        async for transformed_chunk in streamwrapper:
            yield transformed_chunk


class OpenAIAssistantsAPI(BaseLLM):
    def __init__(self) -> None:
        super().__init__()

    def get_openai_client(
        self,
        api_key: Optional[str],
        api_base: Optional[str],
        timeout: Union[float, httpx.Timeout],
        max_retries: Optional[int],
        organization: Optional[str],
        client: Optional[OpenAI] = None,
    ) -> OpenAI:
        received_args = locals()
        if client is None:
            data = {}
            for k, v in received_args.items():
                if k == "self" or k == "client":
                    pass
                elif k == "api_base" and v is not None:
                    data["base_url"] = v
                elif v is not None:
                    data[k] = v
            openai_client = OpenAI(**data)  # type: ignore
        else:
            openai_client = client

        return openai_client

    ### ASSISTANTS ###

    def get_assistants(
        self,
        api_key: Optional[str],
        api_base: Optional[str],
        timeout: Union[float, httpx.Timeout],
        max_retries: Optional[int],
        organization: Optional[str],
        client: Optional[OpenAI],
    ) -> SyncCursorPage[Assistant]:
        openai_client = self.get_openai_client(
            api_key=api_key,
            api_base=api_base,
            timeout=timeout,
            max_retries=max_retries,
            organization=organization,
            client=client,
        )

        response = openai_client.beta.assistants.list()

        return response

    ### MESSAGES ###

    def add_message(
        self,
        thread_id: str,
        message_data: MessageData,
        api_key: Optional[str],
        api_base: Optional[str],
        timeout: Union[float, httpx.Timeout],
        max_retries: Optional[int],
        organization: Optional[str],
        client: Optional[OpenAI] = None,
    ) -> OpenAIMessage:

        openai_client = self.get_openai_client(
            api_key=api_key,
            api_base=api_base,
            timeout=timeout,
            max_retries=max_retries,
            organization=organization,
            client=client,
        )

        thread_message: OpenAIMessage = openai_client.beta.threads.messages.create(
            thread_id, **message_data
        )

        response_obj: Optional[OpenAIMessage] = None
        if getattr(thread_message, "status", None) is None:
            thread_message.status = "completed"
            response_obj = OpenAIMessage(**thread_message.dict())
        else:
            response_obj = OpenAIMessage(**thread_message.dict())
        return response_obj

    def get_messages(
        self,
        thread_id: str,
        api_key: Optional[str],
        api_base: Optional[str],
        timeout: Union[float, httpx.Timeout],
        max_retries: Optional[int],
        organization: Optional[str],
        client: Optional[OpenAI] = None,
    ) -> SyncCursorPage[OpenAIMessage]:
        openai_client = self.get_openai_client(
            api_key=api_key,
            api_base=api_base,
            timeout=timeout,
            max_retries=max_retries,
            organization=organization,
            client=client,
        )

        response = openai_client.beta.threads.messages.list(thread_id=thread_id)

        return response

    ### THREADS ###

    def create_thread(
        self,
        metadata: Optional[dict],
        api_key: Optional[str],
        api_base: Optional[str],
        timeout: Union[float, httpx.Timeout],
        max_retries: Optional[int],
        organization: Optional[str],
        client: Optional[OpenAI],
        messages: Optional[Iterable[OpenAICreateThreadParamsMessage]],
    ) -> Thread:
        """
        Here's an example:
        ```
        from litellm.llms.openai import OpenAIAssistantsAPI, MessageData

        # create thread
        message: MessageData = {"role": "user", "content": "Hey, how's it going?"}
        openai_api.create_thread(messages=[message])
        ```
        """
        openai_client = self.get_openai_client(
            api_key=api_key,
            api_base=api_base,
            timeout=timeout,
            max_retries=max_retries,
            organization=organization,
            client=client,
        )

        data = {}
        if messages is not None:
            data["messages"] = messages  # type: ignore
        if metadata is not None:
            data["metadata"] = metadata  # type: ignore

        message_thread = openai_client.beta.threads.create(**data)  # type: ignore

        return Thread(**message_thread.dict())

    def get_thread(
        self,
        thread_id: str,
        api_key: Optional[str],
        api_base: Optional[str],
        timeout: Union[float, httpx.Timeout],
        max_retries: Optional[int],
        organization: Optional[str],
        client: Optional[OpenAI],
    ) -> Thread:
        openai_client = self.get_openai_client(
            api_key=api_key,
            api_base=api_base,
            timeout=timeout,
            max_retries=max_retries,
            organization=organization,
            client=client,
        )

        response = openai_client.beta.threads.retrieve(thread_id=thread_id)

        return Thread(**response.dict())

    def delete_thread(self):
        pass

    ### RUNS ###

    def run_thread(
        self,
        thread_id: str,
        assistant_id: str,
        additional_instructions: Optional[str],
        instructions: Optional[str],
        metadata: Optional[object],
        model: Optional[str],
        stream: Optional[bool],
        tools: Optional[Iterable[AssistantToolParam]],
        api_key: Optional[str],
        api_base: Optional[str],
        timeout: Union[float, httpx.Timeout],
        max_retries: Optional[int],
        organization: Optional[str],
        client: Optional[OpenAI],
    ) -> Run:
        openai_client = self.get_openai_client(
            api_key=api_key,
            api_base=api_base,
            timeout=timeout,
            max_retries=max_retries,
            organization=organization,
            client=client,
        )

        response = openai_client.beta.threads.runs.create_and_poll(
            thread_id=thread_id,
            assistant_id=assistant_id,
            additional_instructions=additional_instructions,
            instructions=instructions,
            metadata=metadata,
            model=model,
            tools=tools,
        )

        return response
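A minimal end-to-end sketch of the `OpenAIAssistantsAPI` helper added above, assuming `OPENAI_API_KEY` is set in the environment; the assistant id and the 600-second timeout are illustrative, not taken from the diff:

```
from litellm.llms.openai import OpenAIAssistantsAPI, MessageData

api = OpenAIAssistantsAPI()
common = dict(
    api_key=None, api_base=None, timeout=600.0,
    max_retries=2, organization=None, client=None,
)

# create a thread with one user message, then run it against an existing assistant
message: MessageData = {"role": "user", "content": "Hey, how's it going?"}
thread = api.create_thread(metadata=None, messages=[message], **common)
run = api.run_thread(
    thread_id=thread.id,
    assistant_id="asst_...",  # hypothetical assistant id
    additional_instructions=None, instructions=None, metadata=None,
    model=None, stream=None, tools=None, **common,
)
print(run.status)
```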
@@ -12,6 +12,16 @@ from typing import (
    Sequence,
)
import litellm
from litellm.types.completion import (
    ChatCompletionUserMessageParam,
    ChatCompletionSystemMessageParam,
    ChatCompletionMessageParam,
    ChatCompletionFunctionMessageParam,
    ChatCompletionMessageToolCallParam,
    ChatCompletionToolMessageParam,
)
from litellm.types.llms.anthropic import *
import uuid


def default_pt(messages):

@@ -22,6 +32,41 @@ def prompt_injection_detection_default_pt():
    return """Detect if a prompt is safe to run. Return 'UNSAFE' if not."""


def map_system_message_pt(messages: list) -> list:
    """
    Convert 'system' message to 'user' message if provider doesn't support 'system' role.

    Enabled via `completion(...,supports_system_message=False)`

    If next message is a user message or assistant message -> merge system prompt into it

    if next message is system -> append a user message instead of the system message
    """

    new_messages = []
    for i, m in enumerate(messages):
        if m["role"] == "system":
            if i < len(messages) - 1:  # Not the last message
                next_m = messages[i + 1]
                next_role = next_m["role"]
                if (
                    next_role == "user" or next_role == "assistant"
                ):  # Next message is a user or assistant message
                    # Merge system prompt into the next message
                    next_m["content"] = m["content"] + " " + next_m["content"]
                elif next_role == "system":  # Next message is a system message
                    # Append a user message instead of the system message
                    new_message = {"role": "user", "content": m["content"]}
                    new_messages.append(new_message)
            else:  # Last message
                new_message = {"role": "user", "content": m["content"]}
                new_messages.append(new_message)
        else:  # Not a system message
            new_messages.append(m)

    return new_messages


# alpaca prompt template - for models like mythomax, etc.
def alpaca_pt(messages):
    prompt = custom_prompt(
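A short sketch of what `map_system_message_pt` above does to a conversation when a provider has no system role; the output in the comment is what the logic in the hunk produces for this input:

```
from litellm.llms.prompt_templates.factory import map_system_message_pt

messages = [
    {"role": "system", "content": "You are a terse assistant."},
    {"role": "user", "content": "Hi!"},
]
print(map_system_message_pt(messages))
# [{"role": "user", "content": "You are a terse assistant. Hi!"}]
```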
@@ -805,6 +850,13 @@ def convert_to_anthropic_tool_result(message: dict) -> dict:
        "name": "get_current_weather",
        "content": "function result goes here",
    },

    OpenAI message with a function call result looks like:
    {
        "role": "function",
        "name": "get_current_weather",
        "content": "function result goes here",
    }
    """

    """

@@ -821,6 +873,7 @@ def convert_to_anthropic_tool_result(message: dict) -> dict:
        ]
    }
    """
    if message["role"] == "tool":
        tool_call_id = message.get("tool_call_id")
        content = message.get("content")

@@ -831,8 +884,31 @@ def convert_to_anthropic_tool_result(message: dict) -> dict:
            "tool_use_id": tool_call_id,
            "content": content,
        }

        return anthropic_tool_result
    elif message["role"] == "function":
        content = message.get("content")
        anthropic_tool_result = {
            "type": "tool_result",
            "tool_use_id": str(uuid.uuid4()),
            "content": content,
        }
        return anthropic_tool_result
    return {}


def convert_function_to_anthropic_tool_invoke(function_call):
    try:
        anthropic_tool_invoke = [
            {
                "type": "tool_use",
                "id": str(uuid.uuid4()),
                "name": get_attribute_or_key(function_call, "name"),
                "input": json.loads(get_attribute_or_key(function_call, "arguments")),
            }
        ]
        return anthropic_tool_invoke
    except Exception as e:
        raise e


def convert_to_anthropic_tool_invoke(tool_calls: list) -> list:

@@ -895,7 +971,7 @@ def convert_to_anthropic_tool_invoke(tool_calls: list) -> list:
def anthropic_messages_pt(messages: list):
    """
    format messages for anthropic
    1. Anthropic supports roles like "user" and "assistant", (here litellm translates system-> assistant)
    1. Anthropic supports roles like "user" and "assistant" (system prompt sent separately)
    2. The first message always needs to be of role "user"
    3. Each message must alternate between "user" and "assistant" (this is not addressed as now by litellm)
    4. final assistant content cannot end with trailing whitespace (anthropic raises an error otherwise)

@@ -903,12 +979,14 @@ def anthropic_messages_pt(messages: list):
    6. Ensure we only accept role, content. (message.name is not supported)
    """
    # add role=tool support to allow function call result/error submission
    user_message_types = {"user", "tool"}
    user_message_types = {"user", "tool", "function"}
    # reformat messages to ensure user/assistant are alternating, if there's either 2 consecutive 'user' messages or 2 consecutive 'assistant' message, merge them.
    new_messages = []
    msg_i = 0
    tool_use_param = False
    while msg_i < len(messages):
        user_content = []
        init_msg_i = msg_i
        ## MERGE CONSECUTIVE USER CONTENT ##
        while msg_i < len(messages) and messages[msg_i]["role"] in user_message_types:
            if isinstance(messages[msg_i]["content"], list):

@@ -924,7 +1002,10 @@ def anthropic_messages_pt(messages: list):
                    )
                    elif m.get("type", "") == "text":
                        user_content.append({"type": "text", "text": m["text"]})
            elif messages[msg_i]["role"] == "tool":
            elif (
                messages[msg_i]["role"] == "tool"
                or messages[msg_i]["role"] == "function"
            ):
                # OpenAI's tool message content will always be a string
                user_content.append(convert_to_anthropic_tool_result(messages[msg_i]))
            else:

@@ -953,11 +1034,24 @@ def anthropic_messages_pt(messages: list):
                    convert_to_anthropic_tool_invoke(messages[msg_i]["tool_calls"])
                )

            if messages[msg_i].get("function_call"):
                assistant_content.extend(
                    convert_function_to_anthropic_tool_invoke(
                        messages[msg_i]["function_call"]
                    )
                )

            msg_i += 1

        if assistant_content:
            new_messages.append({"role": "assistant", "content": assistant_content})

        if msg_i == init_msg_i:  # prevent infinite loops
            raise Exception(
                "Invalid Message passed in - {}. File an issue https://github.com/BerriAI/litellm/issues".format(
                    messages[msg_i]
                )
            )
    if not new_messages or new_messages[0]["role"] != "user":
        if litellm.modify_params:
            new_messages.insert(

@@ -969,6 +1063,9 @@ def anthropic_messages_pt(messages: list):
            )

    if new_messages[-1]["role"] == "assistant":
        if isinstance(new_messages[-1]["content"], str):
            new_messages[-1]["content"] = new_messages[-1]["content"].rstrip()
        elif isinstance(new_messages[-1]["content"], list):
            for content in new_messages[-1]["content"]:
                if isinstance(content, dict) and content["type"] == "text":
                    content["text"] = content[
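A brief sketch of how the updated `convert_to_anthropic_tool_result` above treats a legacy OpenAI `function` message (the weather-function content mirrors the docstring example; the printed shape follows the code in the hunk):

```
from litellm.llms.prompt_templates.factory import convert_to_anthropic_tool_result

function_message = {
    "role": "function",
    "name": "get_current_weather",
    "content": "function result goes here",
}
result = convert_to_anthropic_tool_result(function_message)
# {"type": "tool_result", "tool_use_id": "<random uuid>", "content": "function result goes here"}
print(result)
```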
@@ -12,9 +12,9 @@ from typing import Any, Literal, Union, BinaryIO
from functools import partial
import dotenv, traceback, random, asyncio, time, contextvars
from copy import deepcopy

import httpx
import litellm

from ._logging import verbose_logger
from litellm import (  # type: ignore
    client,

@@ -34,9 +34,12 @@ from litellm.utils import (
    async_mock_completion_streaming_obj,
    convert_to_model_response_object,
    token_counter,
    create_pretrained_tokenizer,
    create_tokenizer,
    Usage,
    get_optional_params_embeddings,
    get_optional_params_image_gen,
    supports_httpx_timeout,
)
from .llms import (
    anthropic_text,

@@ -75,6 +78,7 @@ from .llms.prompt_templates.factory import (
    prompt_factory,
    custom_prompt,
    function_call_prompt,
    map_system_message_pt,
)
import tiktoken
from concurrent.futures import ThreadPoolExecutor

@@ -448,7 +452,7 @@ def completion(
    model: str,
    # Optional OpenAI params: see https://platform.openai.com/docs/api-reference/chat/create
    messages: List = [],
    timeout: Optional[Union[float, int]] = None,
    timeout: Optional[Union[float, str, httpx.Timeout]] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    n: Optional[int] = None,

@@ -551,6 +555,7 @@ def completion(
    eos_token = kwargs.get("eos_token", None)
    preset_cache_key = kwargs.get("preset_cache_key", None)
    hf_model_name = kwargs.get("hf_model_name", None)
    supports_system_message = kwargs.get("supports_system_message", None)
    ### TEXT COMPLETION CALLS ###
    text_completion = kwargs.get("text_completion", False)
    atext_completion = kwargs.get("atext_completion", False)

@@ -616,6 +621,7 @@ def completion(
        "model_list",
        "num_retries",
        "context_window_fallback_dict",
        "retry_policy",
        "roles",
        "final_prompt_value",
        "bos_token",

@@ -641,16 +647,27 @@ def completion(
        "no-log",
        "base_model",
        "stream_timeout",
        "supports_system_message",
    ]
    default_params = openai_params + litellm_params
    non_default_params = {
        k: v for k, v in kwargs.items() if k not in default_params
    }  # model-specific params - pass them straight to the model/provider
    if timeout is None:
        timeout = (
            kwargs.get("request_timeout", None) or 600
        )  # set timeout for 10 minutes by default
    timeout = float(timeout)

    ### TIMEOUT LOGIC ###
    timeout = timeout or kwargs.get("request_timeout", 600) or 600
    # set timeout for 10 minutes by default

    if (
        timeout is not None
        and isinstance(timeout, httpx.Timeout)
        and supports_httpx_timeout(custom_llm_provider) == False
    ):
        read_timeout = timeout.read or 600
        timeout = read_timeout  # default 10 min timeout
    elif timeout is not None and not isinstance(timeout, httpx.Timeout):
        timeout = float(timeout)  # type: ignore

    try:
        if base_url is not None:
            api_base = base_url

@@ -745,6 +762,13 @@ def completion(
                custom_prompt_dict[model]["bos_token"] = bos_token
            if eos_token:
                custom_prompt_dict[model]["eos_token"] = eos_token

        if (
            supports_system_message is not None
            and isinstance(supports_system_message, bool)
            and supports_system_message == False
        ):
            messages = map_system_message_pt(messages=messages)
        model_api_key = get_api_key(
            llm_provider=custom_llm_provider, dynamic_api_key=api_key
        )  # get the api key from the environment if required for the model

@@ -871,7 +895,7 @@ def completion(
                logger_fn=logger_fn,
                logging_obj=logging,
                acompletion=acompletion,
                timeout=timeout,
                timeout=timeout,  # type: ignore
                client=client,  # pass AsyncAzureOpenAI, AzureOpenAI client
            )

@@ -1012,7 +1036,7 @@ def completion(
                optional_params=optional_params,
                litellm_params=litellm_params,
                logger_fn=logger_fn,
                timeout=timeout,
                timeout=timeout,  # type: ignore
                custom_prompt_dict=custom_prompt_dict,
                client=client,  # pass AsyncOpenAI, OpenAI client
                organization=organization,

@@ -1097,7 +1121,7 @@ def completion(
                optional_params=optional_params,
                litellm_params=litellm_params,
                logger_fn=logger_fn,
                timeout=timeout,
                timeout=timeout,  # type: ignore
            )

            if (

@@ -1471,7 +1495,7 @@ def completion(
                acompletion=acompletion,
                logging_obj=logging,
                custom_prompt_dict=custom_prompt_dict,
                timeout=timeout,
                timeout=timeout,  # type: ignore
            )
            if (
                "stream" in optional_params

@@ -1564,7 +1588,7 @@ def completion(
                logger_fn=logger_fn,
                logging_obj=logging,
                acompletion=acompletion,
                timeout=timeout,
                timeout=timeout,  # type: ignore
            )
            ## LOGGING
            logging.post_call(

@@ -1892,7 +1916,7 @@ def completion(
                logger_fn=logger_fn,
                encoding=encoding,
                logging_obj=logging,
                timeout=timeout,
                timeout=timeout,  # type: ignore
            )
            if (
                "stream" in optional_params

@@ -2273,7 +2297,7 @@ def batch_completion(
    n: Optional[int] = None,
    stream: Optional[bool] = None,
    stop=None,
    max_tokens: Optional[float] = None,
    max_tokens: Optional[int] = None,
    presence_penalty: Optional[float] = None,
    frequency_penalty: Optional[float] = None,
    logit_bias: Optional[dict] = None,

@@ -2666,6 +2690,7 @@ def embedding(
        "model_list",
        "num_retries",
        "context_window_fallback_dict",
        "retry_policy",
        "roles",
        "final_prompt_value",
        "bos_token",

@@ -3535,6 +3560,7 @@ def image_generation(
        "model_list",
        "num_retries",
        "context_window_fallback_dict",
        "retry_policy",
        "roles",
        "final_prompt_value",
        "bos_token",
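A minimal sketch of the two `completion()` knobs touched above: an `httpx.Timeout` (which falls back to its read timeout for providers without httpx-timeout support) and `supports_system_message=False` (which routes messages through `map_system_message_pt`). The model name, prompt, and timeout values are illustrative:

```
import httpx
import litellm

response = litellm.completion(
    model="gpt-3.5-turbo",  # example model
    messages=[
        {"role": "system", "content": "You are a terse assistant."},
        {"role": "user", "content": "Hi!"},
    ],
    timeout=httpx.Timeout(600.0, connect=5.0),
    supports_system_message=False,  # merges the system prompt into the next user message
)
print(response.choices[0].message.content)
```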
@@ -338,6 +338,18 @@
        "output_cost_per_second": 0.0001,
        "litellm_provider": "azure"
    },
    "azure/gpt-4-turbo-2024-04-09": {
        "max_tokens": 4096,
        "max_input_tokens": 128000,
        "max_output_tokens": 4096,
        "input_cost_per_token": 0.00001,
        "output_cost_per_token": 0.00003,
        "litellm_provider": "azure",
        "mode": "chat",
        "supports_function_calling": true,
        "supports_parallel_function_calling": true,
        "supports_vision": true
    },
    "azure/gpt-4-0125-preview": {
        "max_tokens": 4096,
        "max_input_tokens": 128000,

@@ -813,6 +825,7 @@
        "litellm_provider": "anthropic",
        "mode": "chat",
        "supports_function_calling": true,
        "supports_vision": true,
        "tool_use_system_prompt_tokens": 264
    },
    "claude-3-opus-20240229": {

@@ -824,6 +837,7 @@
        "litellm_provider": "anthropic",
        "mode": "chat",
        "supports_function_calling": true,
        "supports_vision": true,
        "tool_use_system_prompt_tokens": 395
    },
    "claude-3-sonnet-20240229": {

@@ -835,6 +849,7 @@
        "litellm_provider": "anthropic",
        "mode": "chat",
        "supports_function_calling": true,
        "supports_vision": true,
        "tool_use_system_prompt_tokens": 159
    },
    "text-bison": {

@@ -1142,7 +1157,8 @@
        "output_cost_per_token": 0.000015,
        "litellm_provider": "vertex_ai-anthropic_models",
        "mode": "chat",
        "supports_function_calling": true
        "supports_function_calling": true,
        "supports_vision": true
    },
    "vertex_ai/claude-3-haiku@20240307": {
        "max_tokens": 4096,

@@ -1152,7 +1168,8 @@
        "output_cost_per_token": 0.00000125,
        "litellm_provider": "vertex_ai-anthropic_models",
        "mode": "chat",
        "supports_function_calling": true
        "supports_function_calling": true,
        "supports_vision": true
    },
    "vertex_ai/claude-3-opus@20240229": {
        "max_tokens": 4096,

@@ -1162,7 +1179,8 @@
        "output_cost_per_token": 0.0000075,
        "litellm_provider": "vertex_ai-anthropic_models",
        "mode": "chat",
        "supports_function_calling": true
        "supports_function_calling": true,
        "supports_vision": true
    },
    "textembedding-gecko": {
        "max_tokens": 3072,

@@ -1581,6 +1599,7 @@
        "litellm_provider": "openrouter",
        "mode": "chat",
        "supports_function_calling": true,
        "supports_vision": true,
        "tool_use_system_prompt_tokens": 395
    },
    "openrouter/google/palm-2-chat-bison": {

@@ -1813,6 +1832,15 @@
        "litellm_provider": "bedrock",
        "mode": "embedding"
    },
    "amazon.titan-embed-text-v2:0": {
        "max_tokens": 8192,
        "max_input_tokens": 8192,
        "output_vector_size": 1024,
        "input_cost_per_token": 0.0000002,
        "output_cost_per_token": 0.0,
        "litellm_provider": "bedrock",
        "mode": "embedding"
    },
    "mistral.mistral-7b-instruct-v0:2": {
        "max_tokens": 8191,
        "max_input_tokens": 32000,

@@ -1929,7 +1957,8 @@
        "output_cost_per_token": 0.000015,
        "litellm_provider": "bedrock",
        "mode": "chat",
        "supports_function_calling": true
        "supports_function_calling": true,
        "supports_vision": true
    },
    "anthropic.claude-3-haiku-20240307-v1:0": {
        "max_tokens": 4096,

@@ -1939,7 +1968,8 @@
        "output_cost_per_token": 0.00000125,
        "litellm_provider": "bedrock",
        "mode": "chat",
        "supports_function_calling": true
        "supports_function_calling": true,
        "supports_vision": true
    },
    "anthropic.claude-3-opus-20240229-v1:0": {
        "max_tokens": 4096,

@@ -1949,7 +1979,8 @@
        "output_cost_per_token": 0.000075,
        "litellm_provider": "bedrock",
        "mode": "chat",
        "supports_function_calling": true
        "supports_function_calling": true,
        "supports_vision": true
    },
    "anthropic.claude-v1": {
        "max_tokens": 8191,
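As a quick sanity check of the new `azure/gpt-4-turbo-2024-04-09` pricing entry above, a small cost calculation using only the per-token rates shown in the hunk (the token counts are made up):

```
# Rates from the model_prices entry added in this diff
input_cost_per_token = 0.00001   # $ per prompt token
output_cost_per_token = 0.00003  # $ per completion token

prompt_tokens, completion_tokens = 1_000, 500  # illustrative usage
cost = prompt_tokens * input_cost_per_token + completion_tokens * output_cost_per_token
print(f"${cost:.4f}")  # $0.0250
```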
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
|
@ -1 +0,0 @@
|
|||
(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[185],{93553:function(n,e,t){Promise.resolve().then(t.t.bind(t,63385,23)),Promise.resolve().then(t.t.bind(t,99646,23))},63385:function(){},99646:function(n){n.exports={style:{fontFamily:"'__Inter_12bbc4', '__Inter_Fallback_12bbc4'",fontStyle:"normal"},className:"__className_12bbc4"}}},function(n){n.O(0,[971,69,744],function(){return n(n.s=93553)}),_N_E=n.O()}]);
|
|
@ -0,0 +1 @@
|
|||
(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[185],{87421:function(n,e,t){Promise.resolve().then(t.t.bind(t,99646,23)),Promise.resolve().then(t.t.bind(t,63385,23))},63385:function(){},99646:function(n){n.exports={style:{fontFamily:"'__Inter_c23dc8', '__Inter_Fallback_c23dc8'",fontStyle:"normal"},className:"__className_c23dc8"}}},function(n){n.O(0,[971,69,744],function(){return n(n.s=87421)}),_N_E=n.O()}]);
|
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
|
@ -1 +1 @@
|
|||
(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[744],{70377:function(e,n,t){Promise.resolve().then(t.t.bind(t,47690,23)),Promise.resolve().then(t.t.bind(t,48955,23)),Promise.resolve().then(t.t.bind(t,5613,23)),Promise.resolve().then(t.t.bind(t,11902,23)),Promise.resolve().then(t.t.bind(t,31778,23)),Promise.resolve().then(t.t.bind(t,77831,23))}},function(e){var n=function(n){return e(e.s=n)};e.O(0,[971,69],function(){return n(35317),n(70377)}),_N_E=e.O()}]);
|
||||
(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[744],{32028:function(e,n,t){Promise.resolve().then(t.t.bind(t,47690,23)),Promise.resolve().then(t.t.bind(t,48955,23)),Promise.resolve().then(t.t.bind(t,5613,23)),Promise.resolve().then(t.t.bind(t,11902,23)),Promise.resolve().then(t.t.bind(t,31778,23)),Promise.resolve().then(t.t.bind(t,77831,23))}},function(e){var n=function(n){return e(e.s=n)};e.O(0,[971,69],function(){return n(35317),n(32028)}),_N_E=e.O()}]);
|
|
@ -1 +1 @@
|
|||
!function(){"use strict";var e,t,n,r,o,u,i,c,f,a={},l={};function d(e){var t=l[e];if(void 0!==t)return t.exports;var n=l[e]={id:e,loaded:!1,exports:{}},r=!0;try{a[e](n,n.exports,d),r=!1}finally{r&&delete l[e]}return n.loaded=!0,n.exports}d.m=a,e=[],d.O=function(t,n,r,o){if(n){o=o||0;for(var u=e.length;u>0&&e[u-1][2]>o;u--)e[u]=e[u-1];e[u]=[n,r,o];return}for(var i=1/0,u=0;u<e.length;u++){for(var n=e[u][0],r=e[u][1],o=e[u][2],c=!0,f=0;f<n.length;f++)i>=o&&Object.keys(d.O).every(function(e){return d.O[e](n[f])})?n.splice(f--,1):(c=!1,o<i&&(i=o));if(c){e.splice(u--,1);var a=r();void 0!==a&&(t=a)}}return t},d.n=function(e){var t=e&&e.__esModule?function(){return e.default}:function(){return e};return d.d(t,{a:t}),t},n=Object.getPrototypeOf?function(e){return Object.getPrototypeOf(e)}:function(e){return e.__proto__},d.t=function(e,r){if(1&r&&(e=this(e)),8&r||"object"==typeof e&&e&&(4&r&&e.__esModule||16&r&&"function"==typeof e.then))return e;var o=Object.create(null);d.r(o);var u={};t=t||[null,n({}),n([]),n(n)];for(var i=2&r&&e;"object"==typeof i&&!~t.indexOf(i);i=n(i))Object.getOwnPropertyNames(i).forEach(function(t){u[t]=function(){return e[t]}});return u.default=function(){return e},d.d(o,u),o},d.d=function(e,t){for(var n in t)d.o(t,n)&&!d.o(e,n)&&Object.defineProperty(e,n,{enumerable:!0,get:t[n]})},d.f={},d.e=function(e){return Promise.all(Object.keys(d.f).reduce(function(t,n){return d.f[n](e,t),t},[]))},d.u=function(e){},d.miniCssF=function(e){return"static/css/9f51f0573c6b0365.css"},d.g=function(){if("object"==typeof globalThis)return globalThis;try{return this||Function("return this")()}catch(e){if("object"==typeof window)return window}}(),d.o=function(e,t){return Object.prototype.hasOwnProperty.call(e,t)},r={},o="_N_E:",d.l=function(e,t,n,u){if(r[e]){r[e].push(t);return}if(void 0!==n)for(var i,c,f=document.getElementsByTagName("script"),a=0;a<f.length;a++){var l=f[a];if(l.getAttribute("src")==e||l.getAttribute("data-webpack")==o+n){i=l;break}}i||(c=!0,(i=document.createElement("script")).charset="utf-8",i.timeout=120,d.nc&&i.setAttribute("nonce",d.nc),i.setAttribute("data-webpack",o+n),i.src=d.tu(e)),r[e]=[t];var s=function(t,n){i.onerror=i.onload=null,clearTimeout(p);var o=r[e];if(delete r[e],i.parentNode&&i.parentNode.removeChild(i),o&&o.forEach(function(e){return e(n)}),t)return t(n)},p=setTimeout(s.bind(null,void 0,{type:"timeout",target:i}),12e4);i.onerror=s.bind(null,i.onerror),i.onload=s.bind(null,i.onload),c&&document.head.appendChild(i)},d.r=function(e){"undefined"!=typeof Symbol&&Symbol.toStringTag&&Object.defineProperty(e,Symbol.toStringTag,{value:"Module"}),Object.defineProperty(e,"__esModule",{value:!0})},d.nmd=function(e){return e.paths=[],e.children||(e.children=[]),e},d.tt=function(){return void 0===u&&(u={createScriptURL:function(e){return e}},"undefined"!=typeof trustedTypes&&trustedTypes.createPolicy&&(u=trustedTypes.createPolicy("nextjs#bundler",u))),u},d.tu=function(e){return d.tt().createScriptURL(e)},d.p="/ui/_next/",i={272:0},d.f.j=function(e,t){var n=d.o(i,e)?i[e]:void 0;if(0!==n){if(n)t.push(n[2]);else if(272!=e){var r=new Promise(function(t,r){n=i[e]=[t,r]});t.push(n[2]=r);var o=d.p+d.u(e),u=Error();d.l(o,function(t){if(d.o(i,e)&&(0!==(n=i[e])&&(i[e]=void 0),n)){var r=t&&("load"===t.type?"missing":t.type),o=t&&t.target&&t.target.src;u.message="Loading chunk "+e+" failed.\n("+r+": "+o+")",u.name="ChunkLoadError",u.type=r,u.request=o,n[1](u)}},"chunk-"+e,e)}else i[e]=0}},d.O.j=function(e){return 0===i[e]},c=function(e,t){var 
n,r,o=t[0],u=t[1],c=t[2],f=0;if(o.some(function(e){return 0!==i[e]})){for(n in u)d.o(u,n)&&(d.m[n]=u[n]);if(c)var a=c(d)}for(e&&e(t);f<o.length;f++)r=o[f],d.o(i,r)&&i[r]&&i[r][0](),i[r]=0;return d.O(a)},(f=self.webpackChunk_N_E=self.webpackChunk_N_E||[]).forEach(c.bind(null,0)),f.push=c.bind(null,f.push.bind(f))}();
|
||||
!function(){"use strict";var e,t,n,r,o,u,i,c,f,a={},l={};function d(e){var t=l[e];if(void 0!==t)return t.exports;var n=l[e]={id:e,loaded:!1,exports:{}},r=!0;try{a[e](n,n.exports,d),r=!1}finally{r&&delete l[e]}return n.loaded=!0,n.exports}d.m=a,e=[],d.O=function(t,n,r,o){if(n){o=o||0;for(var u=e.length;u>0&&e[u-1][2]>o;u--)e[u]=e[u-1];e[u]=[n,r,o];return}for(var i=1/0,u=0;u<e.length;u++){for(var n=e[u][0],r=e[u][1],o=e[u][2],c=!0,f=0;f<n.length;f++)i>=o&&Object.keys(d.O).every(function(e){return d.O[e](n[f])})?n.splice(f--,1):(c=!1,o<i&&(i=o));if(c){e.splice(u--,1);var a=r();void 0!==a&&(t=a)}}return t},d.n=function(e){var t=e&&e.__esModule?function(){return e.default}:function(){return e};return d.d(t,{a:t}),t},n=Object.getPrototypeOf?function(e){return Object.getPrototypeOf(e)}:function(e){return e.__proto__},d.t=function(e,r){if(1&r&&(e=this(e)),8&r||"object"==typeof e&&e&&(4&r&&e.__esModule||16&r&&"function"==typeof e.then))return e;var o=Object.create(null);d.r(o);var u={};t=t||[null,n({}),n([]),n(n)];for(var i=2&r&&e;"object"==typeof i&&!~t.indexOf(i);i=n(i))Object.getOwnPropertyNames(i).forEach(function(t){u[t]=function(){return e[t]}});return u.default=function(){return e},d.d(o,u),o},d.d=function(e,t){for(var n in t)d.o(t,n)&&!d.o(e,n)&&Object.defineProperty(e,n,{enumerable:!0,get:t[n]})},d.f={},d.e=function(e){return Promise.all(Object.keys(d.f).reduce(function(t,n){return d.f[n](e,t),t},[]))},d.u=function(e){},d.miniCssF=function(e){return"static/css/00c2ddbcd01819c0.css"},d.g=function(){if("object"==typeof globalThis)return globalThis;try{return this||Function("return this")()}catch(e){if("object"==typeof window)return window}}(),d.o=function(e,t){return Object.prototype.hasOwnProperty.call(e,t)},r={},o="_N_E:",d.l=function(e,t,n,u){if(r[e]){r[e].push(t);return}if(void 0!==n)for(var i,c,f=document.getElementsByTagName("script"),a=0;a<f.length;a++){var l=f[a];if(l.getAttribute("src")==e||l.getAttribute("data-webpack")==o+n){i=l;break}}i||(c=!0,(i=document.createElement("script")).charset="utf-8",i.timeout=120,d.nc&&i.setAttribute("nonce",d.nc),i.setAttribute("data-webpack",o+n),i.src=d.tu(e)),r[e]=[t];var s=function(t,n){i.onerror=i.onload=null,clearTimeout(p);var o=r[e];if(delete r[e],i.parentNode&&i.parentNode.removeChild(i),o&&o.forEach(function(e){return e(n)}),t)return t(n)},p=setTimeout(s.bind(null,void 0,{type:"timeout",target:i}),12e4);i.onerror=s.bind(null,i.onerror),i.onload=s.bind(null,i.onload),c&&document.head.appendChild(i)},d.r=function(e){"undefined"!=typeof Symbol&&Symbol.toStringTag&&Object.defineProperty(e,Symbol.toStringTag,{value:"Module"}),Object.defineProperty(e,"__esModule",{value:!0})},d.nmd=function(e){return e.paths=[],e.children||(e.children=[]),e},d.tt=function(){return void 0===u&&(u={createScriptURL:function(e){return e}},"undefined"!=typeof trustedTypes&&trustedTypes.createPolicy&&(u=trustedTypes.createPolicy("nextjs#bundler",u))),u},d.tu=function(e){return d.tt().createScriptURL(e)},d.p="/ui/_next/",i={272:0},d.f.j=function(e,t){var n=d.o(i,e)?i[e]:void 0;if(0!==n){if(n)t.push(n[2]);else if(272!=e){var r=new Promise(function(t,r){n=i[e]=[t,r]});t.push(n[2]=r);var o=d.p+d.u(e),u=Error();d.l(o,function(t){if(d.o(i,e)&&(0!==(n=i[e])&&(i[e]=void 0),n)){var r=t&&("load"===t.type?"missing":t.type),o=t&&t.target&&t.target.src;u.message="Loading chunk "+e+" failed.\n("+r+": "+o+")",u.name="ChunkLoadError",u.type=r,u.request=o,n[1](u)}},"chunk-"+e,e)}else i[e]=0}},d.O.j=function(e){return 0===i[e]},c=function(e,t){var 
n,r,o=t[0],u=t[1],c=t[2],f=0;if(o.some(function(e){return 0!==i[e]})){for(n in u)d.o(u,n)&&(d.m[n]=u[n]);if(c)var a=c(d)}for(e&&e(t);f<o.length;f++)r=o[f],d.o(i,r)&&i[r]&&i[r][0](),i[r]=0;return d.O(a)},(f=self.webpackChunk_N_E=self.webpackChunk_N_E||[]).forEach(c.bind(null,0)),f.push=c.bind(null,f.push.bind(f))}();
|
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
|
@ -1,5 +1 @@
|
|||
<<<<<<< HEAD
|
||||
<!DOCTYPE html><html id="__next_error__"><head><meta charSet="utf-8"/><meta name="viewport" content="width=device-width, initial-scale=1"/><link rel="preload" as="script" fetchPriority="low" href="/ui/_next/static/chunks/webpack-202e312607f242a1.js" crossorigin=""/><script src="/ui/_next/static/chunks/fd9d1056-dafd44dfa2da140c.js" async="" crossorigin=""></script><script src="/ui/_next/static/chunks/69-e49705773ae41779.js" async="" crossorigin=""></script><script src="/ui/_next/static/chunks/main-app-9b4fb13a7db53edf.js" async="" crossorigin=""></script><title>LiteLLM Dashboard</title><meta name="description" content="LiteLLM Proxy Admin UI"/><link rel="icon" href="/ui/favicon.ico" type="image/x-icon" sizes="16x16"/><meta name="next-size-adjust"/><script src="/ui/_next/static/chunks/polyfills-c67a75d1b6f99dc8.js" crossorigin="" noModule=""></script></head><body><script src="/ui/_next/static/chunks/webpack-202e312607f242a1.js" crossorigin="" async=""></script><script>(self.__next_f=self.__next_f||[]).push([0]);self.__next_f.push([2,null])</script><script>self.__next_f.push([1,"1:HL[\"/ui/_next/static/media/c9a5bc6a7c948fb0-s.p.woff2\",\"font\",{\"crossOrigin\":\"\",\"type\":\"font/woff2\"}]\n2:HL[\"/ui/_next/static/css/00c2ddbcd01819c0.css\",\"style\",{\"crossOrigin\":\"\"}]\n0:\"$L3\"\n"])</script><script>self.__next_f.push([1,"4:I[47690,[],\"\"]\n6:I[77831,[],\"\"]\n7:I[46414,[\"761\",\"static/chunks/761-05f8a8451296476c.js\",\"931\",\"static/chunks/app/page-5a4a198eefedc775.js\"],\"\"]\n8:I[5613,[],\"\"]\n9:I[31778,[],\"\"]\nb:I[48955,[],\"\"]\nc:[]\n"])</script><script>self.__next_f.push([1,"3:[[[\"$\",\"link\",\"0\",{\"rel\":\"stylesheet\",\"href\":\"/ui/_next/static/css/00c2ddbcd01819c0.css\",\"precedence\":\"next\",\"crossOrigin\":\"\"}]],[\"$\",\"$L4\",null,{\"buildId\":\"c5rha8cqAah-saaczjn02\",\"assetPrefix\":\"/ui\",\"initialCanonicalUrl\":\"/\",\"initialTree\":[\"\",{\"children\":[\"__PAGE__\",{}]},\"$undefined\",\"$undefined\",true],\"initialSeedData\":[\"\",{\"children\":[\"__PAGE__\",{},[\"$L5\",[\"$\",\"$L6\",null,{\"propsForComponent\":{\"params\":{}},\"Component\":\"$7\",\"isStaticGeneration\":true}],null]]},[null,[\"$\",\"html\",null,{\"lang\":\"en\",\"children\":[\"$\",\"body\",null,{\"className\":\"__className_c23dc8\",\"children\":[\"$\",\"$L8\",null,{\"parallelRouterKey\":\"children\",\"segmentPath\":[\"children\"],\"loading\":\"$undefined\",\"loadingStyles\":\"$undefined\",\"loadingScripts\":\"$undefined\",\"hasLoading\":false,\"error\":\"$undefined\",\"errorStyles\":\"$undefined\",\"errorScripts\":\"$undefined\",\"template\":[\"$\",\"$L9\",null,{}],\"templateStyles\":\"$undefined\",\"templateScripts\":\"$undefined\",\"notFound\":[[\"$\",\"title\",null,{\"children\":\"404: This page could not be found.\"}],[\"$\",\"div\",null,{\"style\":{\"fontFamily\":\"system-ui,\\\"Segoe UI\\\",Roboto,Helvetica,Arial,sans-serif,\\\"Apple Color Emoji\\\",\\\"Segoe UI Emoji\\\"\",\"height\":\"100vh\",\"textAlign\":\"center\",\"display\":\"flex\",\"flexDirection\":\"column\",\"alignItems\":\"center\",\"justifyContent\":\"center\"},\"children\":[\"$\",\"div\",null,{\"children\":[[\"$\",\"style\",null,{\"dangerouslySetInnerHTML\":{\"__html\":\"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}\"}}],[\"$\",\"h1\",null,{\"className\":\"next-error-h1\",\"style\":{\"display\":\"inline-block\",\"margin\":\"0 20px 0 
0\",\"padding\":\"0 23px 0 0\",\"fontSize\":24,\"fontWeight\":500,\"verticalAlign\":\"top\",\"lineHeight\":\"49px\"},\"children\":\"404\"}],[\"$\",\"div\",null,{\"style\":{\"display\":\"inline-block\"},\"children\":[\"$\",\"h2\",null,{\"style\":{\"fontSize\":14,\"fontWeight\":400,\"lineHeight\":\"49px\",\"margin\":0},\"children\":\"This page could not be found.\"}]}]]}]}]],\"notFoundStyles\":[],\"styles\":null}]}]}],null]],\"initialHead\":[false,\"$La\"],\"globalErrorComponent\":\"$b\",\"missingSlots\":\"$Wc\"}]]\n"])</script><script>self.__next_f.push([1,"a:[[\"$\",\"meta\",\"0\",{\"name\":\"viewport\",\"content\":\"width=device-width, initial-scale=1\"}],[\"$\",\"meta\",\"1\",{\"charSet\":\"utf-8\"}],[\"$\",\"title\",\"2\",{\"children\":\"LiteLLM Dashboard\"}],[\"$\",\"meta\",\"3\",{\"name\":\"description\",\"content\":\"LiteLLM Proxy Admin UI\"}],[\"$\",\"link\",\"4\",{\"rel\":\"icon\",\"href\":\"/ui/favicon.ico\",\"type\":\"image/x-icon\",\"sizes\":\"16x16\"}],[\"$\",\"meta\",\"5\",{\"name\":\"next-size-adjust\"}]]\n5:null\n"])</script><script>self.__next_f.push([1,""])</script></body></html>
|
||||
=======
|
||||
<!DOCTYPE html><html id="__next_error__"><head><meta charSet="utf-8"/><meta name="viewport" content="width=device-width, initial-scale=1"/><link rel="preload" as="script" fetchPriority="low" href="/ui/_next/static/chunks/webpack-65a932b4e8bd8abb.js" crossorigin=""/><script src="/ui/_next/static/chunks/fd9d1056-dafd44dfa2da140c.js" async="" crossorigin=""></script><script src="/ui/_next/static/chunks/69-e49705773ae41779.js" async="" crossorigin=""></script><script src="/ui/_next/static/chunks/main-app-096338c8e1915716.js" async="" crossorigin=""></script><title>LiteLLM Dashboard</title><meta name="description" content="LiteLLM Proxy Admin UI"/><link rel="icon" href="/ui/favicon.ico" type="image/x-icon" sizes="16x16"/><meta name="next-size-adjust"/><script src="/ui/_next/static/chunks/polyfills-c67a75d1b6f99dc8.js" crossorigin="" noModule=""></script></head><body><script src="/ui/_next/static/chunks/webpack-65a932b4e8bd8abb.js" crossorigin="" async=""></script><script>(self.__next_f=self.__next_f||[]).push([0]);self.__next_f.push([2,null])</script><script>self.__next_f.push([1,"1:HL[\"/ui/_next/static/media/c9a5bc6a7c948fb0-s.p.woff2\",\"font\",{\"crossOrigin\":\"\",\"type\":\"font/woff2\"}]\n2:HL[\"/ui/_next/static/css/9f51f0573c6b0365.css\",\"style\",{\"crossOrigin\":\"\"}]\n0:\"$L3\"\n"])</script><script>self.__next_f.push([1,"4:I[47690,[],\"\"]\n6:I[77831,[],\"\"]\n7:I[46414,[\"386\",\"static/chunks/386-d811195b597a2122.js\",\"931\",\"static/chunks/app/page-e0ee34389254cdf2.js\"],\"\"]\n8:I[5613,[],\"\"]\n9:I[31778,[],\"\"]\nb:I[48955,[],\"\"]\nc:[]\n"])</script><script>self.__next_f.push([1,"3:[[[\"$\",\"link\",\"0\",{\"rel\":\"stylesheet\",\"href\":\"/ui/_next/static/css/9f51f0573c6b0365.css\",\"precedence\":\"next\",\"crossOrigin\":\"\"}]],[\"$\",\"$L4\",null,{\"buildId\":\"dWGL92c5LzTMn7XX6utn2\",\"assetPrefix\":\"/ui\",\"initialCanonicalUrl\":\"/\",\"initialTree\":[\"\",{\"children\":[\"__PAGE__\",{}]},\"$undefined\",\"$undefined\",true],\"initialSeedData\":[\"\",{\"children\":[\"__PAGE__\",{},[\"$L5\",[\"$\",\"$L6\",null,{\"propsForComponent\":{\"params\":{}},\"Component\":\"$7\",\"isStaticGeneration\":true}],null]]},[null,[\"$\",\"html\",null,{\"lang\":\"en\",\"children\":[\"$\",\"body\",null,{\"className\":\"__className_12bbc4\",\"children\":[\"$\",\"$L8\",null,{\"parallelRouterKey\":\"children\",\"segmentPath\":[\"children\"],\"loading\":\"$undefined\",\"loadingStyles\":\"$undefined\",\"loadingScripts\":\"$undefined\",\"hasLoading\":false,\"error\":\"$undefined\",\"errorStyles\":\"$undefined\",\"errorScripts\":\"$undefined\",\"template\":[\"$\",\"$L9\",null,{}],\"templateStyles\":\"$undefined\",\"templateScripts\":\"$undefined\",\"notFound\":[[\"$\",\"title\",null,{\"children\":\"404: This page could not be found.\"}],[\"$\",\"div\",null,{\"style\":{\"fontFamily\":\"system-ui,\\\"Segoe UI\\\",Roboto,Helvetica,Arial,sans-serif,\\\"Apple Color Emoji\\\",\\\"Segoe UI Emoji\\\"\",\"height\":\"100vh\",\"textAlign\":\"center\",\"display\":\"flex\",\"flexDirection\":\"column\",\"alignItems\":\"center\",\"justifyContent\":\"center\"},\"children\":[\"$\",\"div\",null,{\"children\":[[\"$\",\"style\",null,{\"dangerouslySetInnerHTML\":{\"__html\":\"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}\"}}],[\"$\",\"h1\",null,{\"className\":\"next-error-h1\",\"style\":{\"display\":\"inline-block\",\"margin\":\"0 20px 0 
0\",\"padding\":\"0 23px 0 0\",\"fontSize\":24,\"fontWeight\":500,\"verticalAlign\":\"top\",\"lineHeight\":\"49px\"},\"children\":\"404\"}],[\"$\",\"div\",null,{\"style\":{\"display\":\"inline-block\"},\"children\":[\"$\",\"h2\",null,{\"style\":{\"fontSize\":14,\"fontWeight\":400,\"lineHeight\":\"49px\",\"margin\":0},\"children\":\"This page could not be found.\"}]}]]}]}]],\"notFoundStyles\":[],\"styles\":null}]}]}],null]],\"initialHead\":[false,\"$La\"],\"globalErrorComponent\":\"$b\",\"missingSlots\":\"$Wc\"}]]\n"])</script><script>self.__next_f.push([1,"a:[[\"$\",\"meta\",\"0\",{\"name\":\"viewport\",\"content\":\"width=device-width, initial-scale=1\"}],[\"$\",\"meta\",\"1\",{\"charSet\":\"utf-8\"}],[\"$\",\"title\",\"2\",{\"children\":\"LiteLLM Dashboard\"}],[\"$\",\"meta\",\"3\",{\"name\":\"description\",\"content\":\"LiteLLM Proxy Admin UI\"}],[\"$\",\"link\",\"4\",{\"rel\":\"icon\",\"href\":\"/ui/favicon.ico\",\"type\":\"image/x-icon\",\"sizes\":\"16x16\"}],[\"$\",\"meta\",\"5\",{\"name\":\"next-size-adjust\"}]]\n5:null\n"])</script><script>self.__next_f.push([1,""])</script></body></html>
|
||||
>>>>>>> 73a7b4f4 (refactor(main.py): trigger new build)
|
||||
<!DOCTYPE html><html id="__next_error__"><head><meta charSet="utf-8"/><meta name="viewport" content="width=device-width, initial-scale=1"/><link rel="preload" as="script" fetchPriority="low" href="/ui/_next/static/chunks/webpack-202e312607f242a1.js" crossorigin=""/><script src="/ui/_next/static/chunks/fd9d1056-dafd44dfa2da140c.js" async="" crossorigin=""></script><script src="/ui/_next/static/chunks/69-e49705773ae41779.js" async="" crossorigin=""></script><script src="/ui/_next/static/chunks/main-app-9b4fb13a7db53edf.js" async="" crossorigin=""></script><title>LiteLLM Dashboard</title><meta name="description" content="LiteLLM Proxy Admin UI"/><link rel="icon" href="/ui/favicon.ico" type="image/x-icon" sizes="16x16"/><meta name="next-size-adjust"/><script src="/ui/_next/static/chunks/polyfills-c67a75d1b6f99dc8.js" crossorigin="" noModule=""></script></head><body><script src="/ui/_next/static/chunks/webpack-202e312607f242a1.js" crossorigin="" async=""></script><script>(self.__next_f=self.__next_f||[]).push([0]);self.__next_f.push([2,null])</script><script>self.__next_f.push([1,"1:HL[\"/ui/_next/static/media/c9a5bc6a7c948fb0-s.p.woff2\",\"font\",{\"crossOrigin\":\"\",\"type\":\"font/woff2\"}]\n2:HL[\"/ui/_next/static/css/00c2ddbcd01819c0.css\",\"style\",{\"crossOrigin\":\"\"}]\n0:\"$L3\"\n"])</script><script>self.__next_f.push([1,"4:I[47690,[],\"\"]\n6:I[77831,[],\"\"]\n7:I[58854,[\"936\",\"static/chunks/2f6dbc85-17d29013b8ff3da5.js\",\"142\",\"static/chunks/142-11990a208bf93746.js\",\"931\",\"static/chunks/app/page-d9bdfedbff191985.js\"],\"\"]\n8:I[5613,[],\"\"]\n9:I[31778,[],\"\"]\nb:I[48955,[],\"\"]\nc:[]\n"])</script><script>self.__next_f.push([1,"3:[[[\"$\",\"link\",\"0\",{\"rel\":\"stylesheet\",\"href\":\"/ui/_next/static/css/00c2ddbcd01819c0.css\",\"precedence\":\"next\",\"crossOrigin\":\"\"}]],[\"$\",\"$L4\",null,{\"buildId\":\"e55gTzpa2g2-9SwXgA9Uo\",\"assetPrefix\":\"/ui\",\"initialCanonicalUrl\":\"/\",\"initialTree\":[\"\",{\"children\":[\"__PAGE__\",{}]},\"$undefined\",\"$undefined\",true],\"initialSeedData\":[\"\",{\"children\":[\"__PAGE__\",{},[\"$L5\",[\"$\",\"$L6\",null,{\"propsForComponent\":{\"params\":{}},\"Component\":\"$7\",\"isStaticGeneration\":true}],null]]},[null,[\"$\",\"html\",null,{\"lang\":\"en\",\"children\":[\"$\",\"body\",null,{\"className\":\"__className_c23dc8\",\"children\":[\"$\",\"$L8\",null,{\"parallelRouterKey\":\"children\",\"segmentPath\":[\"children\"],\"loading\":\"$undefined\",\"loadingStyles\":\"$undefined\",\"loadingScripts\":\"$undefined\",\"hasLoading\":false,\"error\":\"$undefined\",\"errorStyles\":\"$undefined\",\"errorScripts\":\"$undefined\",\"template\":[\"$\",\"$L9\",null,{}],\"templateStyles\":\"$undefined\",\"templateScripts\":\"$undefined\",\"notFound\":[[\"$\",\"title\",null,{\"children\":\"404: This page could not be found.\"}],[\"$\",\"div\",null,{\"style\":{\"fontFamily\":\"system-ui,\\\"Segoe UI\\\",Roboto,Helvetica,Arial,sans-serif,\\\"Apple Color Emoji\\\",\\\"Segoe UI Emoji\\\"\",\"height\":\"100vh\",\"textAlign\":\"center\",\"display\":\"flex\",\"flexDirection\":\"column\",\"alignItems\":\"center\",\"justifyContent\":\"center\"},\"children\":[\"$\",\"div\",null,{\"children\":[[\"$\",\"style\",null,{\"dangerouslySetInnerHTML\":{\"__html\":\"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid 
rgba(255,255,255,.3)}}\"}}],[\"$\",\"h1\",null,{\"className\":\"next-error-h1\",\"style\":{\"display\":\"inline-block\",\"margin\":\"0 20px 0 0\",\"padding\":\"0 23px 0 0\",\"fontSize\":24,\"fontWeight\":500,\"verticalAlign\":\"top\",\"lineHeight\":\"49px\"},\"children\":\"404\"}],[\"$\",\"div\",null,{\"style\":{\"display\":\"inline-block\"},\"children\":[\"$\",\"h2\",null,{\"style\":{\"fontSize\":14,\"fontWeight\":400,\"lineHeight\":\"49px\",\"margin\":0},\"children\":\"This page could not be found.\"}]}]]}]}]],\"notFoundStyles\":[],\"styles\":null}]}]}],null]],\"initialHead\":[false,\"$La\"],\"globalErrorComponent\":\"$b\",\"missingSlots\":\"$Wc\"}]]\n"])</script><script>self.__next_f.push([1,"a:[[\"$\",\"meta\",\"0\",{\"name\":\"viewport\",\"content\":\"width=device-width, initial-scale=1\"}],[\"$\",\"meta\",\"1\",{\"charSet\":\"utf-8\"}],[\"$\",\"title\",\"2\",{\"children\":\"LiteLLM Dashboard\"}],[\"$\",\"meta\",\"3\",{\"name\":\"description\",\"content\":\"LiteLLM Proxy Admin UI\"}],[\"$\",\"link\",\"4\",{\"rel\":\"icon\",\"href\":\"/ui/favicon.ico\",\"type\":\"image/x-icon\",\"sizes\":\"16x16\"}],[\"$\",\"meta\",\"5\",{\"name\":\"next-size-adjust\"}]]\n5:null\n"])</script><script>self.__next_f.push([1,""])</script></body></html>
|
|
@ -1,14 +1,7 @@
|
|||
2:I[77831,[],""]
|
||||
<<<<<<< HEAD
|
||||
3:I[46414,["761","static/chunks/761-05f8a8451296476c.js","931","static/chunks/app/page-5a4a198eefedc775.js"],""]
|
||||
3:I[58854,["936","static/chunks/2f6dbc85-17d29013b8ff3da5.js","142","static/chunks/142-11990a208bf93746.js","931","static/chunks/app/page-d9bdfedbff191985.js"],""]
|
||||
4:I[5613,[],""]
|
||||
5:I[31778,[],""]
|
||||
0:["c5rha8cqAah-saaczjn02",[[["",{"children":["__PAGE__",{}]},"$undefined","$undefined",true],["",{"children":["__PAGE__",{},["$L1",["$","$L2",null,{"propsForComponent":{"params":{}},"Component":"$3","isStaticGeneration":true}],null]]},[null,["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_c23dc8","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"loading":"$undefined","loadingStyles":"$undefined","loadingScripts":"$undefined","hasLoading":false,"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[],"styles":null}]}]}],null]],[[["$","link","0",{"rel":"stylesheet","href":"/ui/_next/static/css/00c2ddbcd01819c0.css","precedence":"next","crossOrigin":""}]],"$L6"]]]]
|
||||
=======
|
||||
3:I[46414,["386","static/chunks/386-d811195b597a2122.js","931","static/chunks/app/page-e0ee34389254cdf2.js"],""]
|
||||
4:I[5613,[],""]
|
||||
5:I[31778,[],""]
|
||||
0:["dWGL92c5LzTMn7XX6utn2",[[["",{"children":["__PAGE__",{}]},"$undefined","$undefined",true],["",{"children":["__PAGE__",{},["$L1",["$","$L2",null,{"propsForComponent":{"params":{}},"Component":"$3","isStaticGeneration":true}],null]]},[null,["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_12bbc4","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"loading":"$undefined","loadingStyles":"$undefined","loadingScripts":"$undefined","hasLoading":false,"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[],"styles":null}]}]}],null]],[[["$","link","0",{"rel":"stylesheet","href":"/ui/_next/static/css/9f51f0573c6b0365.css","precedence":"next","crossOrigin":""}]],"$L6"]]]]
|
||||
>>>>>>> 73a7b4f4 (refactor(main.py): trigger new build)
|
||||
0:["e55gTzpa2g2-9SwXgA9Uo",[[["",{"children":["__PAGE__",{}]},"$undefined","$undefined",true],["",{"children":["__PAGE__",{},["$L1",["$","$L2",null,{"propsForComponent":{"params":{}},"Component":"$3","isStaticGeneration":true}],null]]},[null,["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_c23dc8","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"loading":"$undefined","loadingStyles":"$undefined","loadingScripts":"$undefined","hasLoading":false,"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[],"styles":null}]}]}],null]],[[["$","link","0",{"rel":"stylesheet","href":"/ui/_next/static/css/00c2ddbcd01819c0.css","precedence":"next","crossOrigin":""}]],"$L6"]]]]
|
||||
6:[["$","meta","0",{"name":"viewport","content":"width=device-width, initial-scale=1"}],["$","meta","1",{"charSet":"utf-8"}],["$","title","2",{"children":"LiteLLM Dashboard"}],["$","meta","3",{"name":"description","content":"LiteLLM Proxy Admin UI"}],["$","link","4",{"rel":"icon","href":"/ui/favicon.ico","type":"image/x-icon","sizes":"16x16"}],["$","meta","5",{"name":"next-size-adjust"}]]
|
||||
1:null
|
||||
|
|
|
@ -11,5 +11,12 @@ router_settings:
|
|||
redis_password: os.environ/REDIS_PASSWORD
|
||||
redis_port: os.environ/REDIS_PORT
|
||||
|
||||
router_settings:
|
||||
routing_strategy: "latency-based-routing"
|
||||
|
||||
litellm_settings:
|
||||
success_callback: ["openmeter"]
|
||||
|
||||
general_settings:
|
||||
alerting: ["slack"]
|
||||
alert_types: ["llm_exceptions"]
|
|
@ -3446,172 +3446,6 @@ def model_list(
|
|||
)
|
||||
|
||||
|
||||
@router.post(
|
||||
"/v1/completions", dependencies=[Depends(user_api_key_auth)], tags=["completions"]
|
||||
)
|
||||
@router.post(
|
||||
"/completions", dependencies=[Depends(user_api_key_auth)], tags=["completions"]
|
||||
)
|
||||
@router.post(
|
||||
"/engines/{model:path}/completions",
|
||||
dependencies=[Depends(user_api_key_auth)],
|
||||
tags=["completions"],
|
||||
)
|
||||
@router.post(
|
||||
"/openai/deployments/{model:path}/completions",
|
||||
dependencies=[Depends(user_api_key_auth)],
|
||||
tags=["completions"],
|
||||
)
|
||||
async def completion(
|
||||
request: Request,
|
||||
fastapi_response: Response,
|
||||
model: Optional[str] = None,
|
||||
user_api_key_dict: UserAPIKeyAuth = Depends(user_api_key_auth),
|
||||
):
|
||||
global user_temperature, user_request_timeout, user_max_tokens, user_api_base
|
||||
try:
|
||||
body = await request.body()
|
||||
body_str = body.decode()
|
||||
try:
|
||||
data = ast.literal_eval(body_str)
|
||||
except:
|
||||
data = json.loads(body_str)
|
||||
|
||||
data["user"] = data.get("user", user_api_key_dict.user_id)
|
||||
data["model"] = (
|
||||
general_settings.get("completion_model", None) # server default
|
||||
or user_model # model name passed via cli args
|
||||
or model # for azure deployments
|
||||
or data["model"] # default passed in http request
|
||||
)
|
||||
if user_model:
|
||||
data["model"] = user_model
|
||||
if "metadata" not in data:
|
||||
data["metadata"] = {}
|
||||
data["metadata"]["user_api_key"] = user_api_key_dict.api_key
|
||||
data["metadata"]["user_api_key_metadata"] = user_api_key_dict.metadata
|
||||
data["metadata"]["user_api_key_alias"] = getattr(
|
||||
user_api_key_dict, "key_alias", None
|
||||
)
|
||||
data["metadata"]["user_api_key_user_id"] = user_api_key_dict.user_id
|
||||
data["metadata"]["user_api_key_team_id"] = getattr(
|
||||
user_api_key_dict, "team_id", None
|
||||
)
|
||||
data["metadata"]["user_api_key_team_alias"] = getattr(
|
||||
user_api_key_dict, "team_alias", None
|
||||
)
|
||||
_headers = dict(request.headers)
|
||||
_headers.pop(
|
||||
"authorization", None
|
||||
) # do not store the original `sk-..` api key in the db
|
||||
data["metadata"]["headers"] = _headers
|
||||
data["metadata"]["endpoint"] = str(request.url)
|
||||
|
||||
# override with user settings, these are params passed via cli
|
||||
if user_temperature:
|
||||
data["temperature"] = user_temperature
|
||||
if user_request_timeout:
|
||||
data["request_timeout"] = user_request_timeout
|
||||
if user_max_tokens:
|
||||
data["max_tokens"] = user_max_tokens
|
||||
if user_api_base:
|
||||
data["api_base"] = user_api_base
|
||||
|
||||
### MODEL ALIAS MAPPING ###
|
||||
# check if model name in model alias map
|
||||
# get the actual model name
|
||||
if data["model"] in litellm.model_alias_map:
|
||||
data["model"] = litellm.model_alias_map[data["model"]]
|
||||
|
||||
### CALL HOOKS ### - modify incoming data before calling the model
|
||||
data = await proxy_logging_obj.pre_call_hook(
|
||||
user_api_key_dict=user_api_key_dict, data=data, call_type="completion"
|
||||
)
|
||||
|
||||
### ROUTE THE REQUESTs ###
|
||||
router_model_names = llm_router.model_names if llm_router is not None else []
|
||||
# skip router if user passed their key
|
||||
if "api_key" in data:
|
||||
response = await litellm.atext_completion(**data)
|
||||
elif (
|
||||
llm_router is not None and data["model"] in router_model_names
|
||||
): # model in router model list
|
||||
response = await llm_router.atext_completion(**data)
|
||||
elif (
|
||||
llm_router is not None
|
||||
and llm_router.model_group_alias is not None
|
||||
and data["model"] in llm_router.model_group_alias
|
||||
): # model set in model_group_alias
|
||||
response = await llm_router.atext_completion(**data)
|
||||
elif (
|
||||
llm_router is not None and data["model"] in llm_router.deployment_names
|
||||
): # model in router deployments, calling a specific deployment on the router
|
||||
response = await llm_router.atext_completion(
|
||||
**data, specific_deployment=True
|
||||
)
|
||||
elif (
|
||||
llm_router is not None
|
||||
and data["model"] not in router_model_names
|
||||
and llm_router.default_deployment is not None
|
||||
): # model in router deployments, calling a specific deployment on the router
|
||||
response = await llm_router.atext_completion(**data)
|
||||
elif user_model is not None: # `litellm --model <your-model-name>`
|
||||
response = await litellm.atext_completion(**data)
|
||||
else:
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_400_BAD_REQUEST,
|
||||
detail={
|
||||
"error": "Invalid model name passed in model="
|
||||
+ data.get("model", "")
|
||||
},
|
||||
)
|
||||
|
||||
if hasattr(response, "_hidden_params"):
|
||||
model_id = response._hidden_params.get("model_id", None) or ""
|
||||
original_response = (
|
||||
response._hidden_params.get("original_response", None) or ""
|
||||
)
|
||||
else:
|
||||
model_id = ""
|
||||
original_response = ""
|
||||
|
||||
verbose_proxy_logger.debug("final response: %s", response)
|
||||
if (
|
||||
"stream" in data and data["stream"] == True
|
||||
): # use generate_responses to stream responses
|
||||
custom_headers = {
|
||||
"x-litellm-model-id": model_id,
|
||||
}
|
||||
selected_data_generator = select_data_generator(
|
||||
response=response, user_api_key_dict=user_api_key_dict
|
||||
)
|
||||
|
||||
return StreamingResponse(
|
||||
selected_data_generator,
|
||||
media_type="text/event-stream",
|
||||
headers=custom_headers,
|
||||
)
|
||||
|
||||
fastapi_response.headers["x-litellm-model-id"] = model_id
|
||||
return response
|
||||
except Exception as e:
|
||||
data["litellm_status"] = "fail" # used for alerting
|
||||
verbose_proxy_logger.debug("EXCEPTION RAISED IN PROXY MAIN.PY")
|
||||
verbose_proxy_logger.debug(
|
||||
"\033[1;31mAn error occurred: %s\n\n Debug this by setting `--debug`, e.g. `litellm --model gpt-3.5-turbo --debug`",
|
||||
e,
|
||||
)
|
||||
traceback.print_exc()
|
||||
error_traceback = traceback.format_exc()
|
||||
error_msg = f"{str(e)}"
|
||||
raise ProxyException(
|
||||
message=getattr(e, "message", error_msg),
|
||||
type=getattr(e, "type", "None"),
|
||||
param=getattr(e, "param", "None"),
|
||||
code=getattr(e, "status_code", 500),
|
||||
)
|
||||
|
||||
|
||||
@router.post(
|
||||
"/v1/chat/completions",
|
||||
dependencies=[Depends(user_api_key_auth)],
|
||||
|
@ -3810,7 +3644,7 @@ async def chat_completion(
|
|||
raise HTTPException(
|
||||
status_code=status.HTTP_400_BAD_REQUEST,
|
||||
detail={
|
||||
"error": "Invalid model name passed in model="
|
||||
"error": "chat_completion: Invalid model name passed in model="
|
||||
+ data.get("model", "")
|
||||
},
|
||||
)
|
||||
|
@ -3824,6 +3658,7 @@ async def chat_completion(
|
|||
hidden_params = getattr(response, "_hidden_params", {}) or {}
|
||||
model_id = hidden_params.get("model_id", None) or ""
|
||||
cache_key = hidden_params.get("cache_key", None) or ""
|
||||
api_base = hidden_params.get("api_base", None) or ""
|
||||
|
||||
# Post Call Processing
|
||||
if llm_router is not None:
|
||||
|
@ -3836,6 +3671,7 @@ async def chat_completion(
|
|||
custom_headers = {
|
||||
"x-litellm-model-id": model_id,
|
||||
"x-litellm-cache-key": cache_key,
|
||||
"x-litellm-model-api-base": api_base,
|
||||
}
|
||||
selected_data_generator = select_data_generator(
|
||||
response=response, user_api_key_dict=user_api_key_dict
|
||||
|
@ -3848,6 +3684,7 @@ async def chat_completion(
|
|||
|
||||
fastapi_response.headers["x-litellm-model-id"] = model_id
|
||||
fastapi_response.headers["x-litellm-cache-key"] = cache_key
|
||||
fastapi_response.headers["x-litellm-model-api-base"] = api_base
|
||||
|
||||
### CALL HOOKS ### - modify outgoing data
|
||||
response = await proxy_logging_obj.post_call_success_hook(
|
||||
|
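For reference, the extra headers added in this hunk (`x-litellm-cache-key`, `x-litellm-model-api-base`, alongside the existing `x-litellm-model-id`) are visible to any HTTP client. A minimal sketch, assuming a proxy running at http://0.0.0.0:4000, a virtual key `sk-1234`, and a configured `gpt-3.5-turbo` model group (all placeholders, not taken from this diff):

```python
import requests

# Hypothetical proxy address and virtual key - replace with your own deployment values.
resp = requests.post(
    "http://0.0.0.0:4000/v1/chat/completions",
    headers={"Authorization": "Bearer sk-1234"},
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "hello"}],
    },
    timeout=30,
)

# Headers attached by the proxy in the hunk above.
print(resp.headers.get("x-litellm-model-id"))
print(resp.headers.get("x-litellm-cache-key"))
print(resp.headers.get("x-litellm-model-api-base"))
```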
@ -3884,6 +3721,172 @@ async def chat_completion(
|
|||
)
|
||||
|
||||
|
||||
@router.post(
|
||||
"/v1/completions", dependencies=[Depends(user_api_key_auth)], tags=["completions"]
|
||||
)
|
||||
@router.post(
|
||||
"/completions", dependencies=[Depends(user_api_key_auth)], tags=["completions"]
|
||||
)
|
||||
@router.post(
|
||||
"/engines/{model:path}/completions",
|
||||
dependencies=[Depends(user_api_key_auth)],
|
||||
tags=["completions"],
|
||||
)
|
||||
@router.post(
|
||||
"/openai/deployments/{model:path}/completions",
|
||||
dependencies=[Depends(user_api_key_auth)],
|
||||
tags=["completions"],
|
||||
)
|
||||
async def completion(
|
||||
request: Request,
|
||||
fastapi_response: Response,
|
||||
model: Optional[str] = None,
|
||||
user_api_key_dict: UserAPIKeyAuth = Depends(user_api_key_auth),
|
||||
):
|
||||
global user_temperature, user_request_timeout, user_max_tokens, user_api_base
|
||||
try:
|
||||
body = await request.body()
|
||||
body_str = body.decode()
|
||||
try:
|
||||
data = ast.literal_eval(body_str)
|
||||
except:
|
||||
data = json.loads(body_str)
|
||||
|
||||
data["user"] = data.get("user", user_api_key_dict.user_id)
|
||||
data["model"] = (
|
||||
general_settings.get("completion_model", None) # server default
|
||||
or user_model # model name passed via cli args
|
||||
or model # for azure deployments
|
||||
or data["model"] # default passed in http request
|
||||
)
|
||||
if user_model:
|
||||
data["model"] = user_model
|
||||
if "metadata" not in data:
|
||||
data["metadata"] = {}
|
||||
data["metadata"]["user_api_key"] = user_api_key_dict.api_key
|
||||
data["metadata"]["user_api_key_metadata"] = user_api_key_dict.metadata
|
||||
data["metadata"]["user_api_key_alias"] = getattr(
|
||||
user_api_key_dict, "key_alias", None
|
||||
)
|
||||
data["metadata"]["user_api_key_user_id"] = user_api_key_dict.user_id
|
||||
data["metadata"]["user_api_key_team_id"] = getattr(
|
||||
user_api_key_dict, "team_id", None
|
||||
)
|
||||
data["metadata"]["user_api_key_team_alias"] = getattr(
|
||||
user_api_key_dict, "team_alias", None
|
||||
)
|
||||
_headers = dict(request.headers)
|
||||
_headers.pop(
|
||||
"authorization", None
|
||||
) # do not store the original `sk-..` api key in the db
|
||||
data["metadata"]["headers"] = _headers
|
||||
data["metadata"]["endpoint"] = str(request.url)
|
||||
|
||||
# override with user settings, these are params passed via cli
|
||||
if user_temperature:
|
||||
data["temperature"] = user_temperature
|
||||
if user_request_timeout:
|
||||
data["request_timeout"] = user_request_timeout
|
||||
if user_max_tokens:
|
||||
data["max_tokens"] = user_max_tokens
|
||||
if user_api_base:
|
||||
data["api_base"] = user_api_base
|
||||
|
||||
### MODEL ALIAS MAPPING ###
|
||||
# check if model name in model alias map
|
||||
# get the actual model name
|
||||
if data["model"] in litellm.model_alias_map:
|
||||
data["model"] = litellm.model_alias_map[data["model"]]
|
||||
|
||||
### CALL HOOKS ### - modify incoming data before calling the model
|
||||
data = await proxy_logging_obj.pre_call_hook(
|
||||
user_api_key_dict=user_api_key_dict, data=data, call_type="completion"
|
||||
)
|
||||
|
||||
### ROUTE THE REQUESTs ###
|
||||
router_model_names = llm_router.model_names if llm_router is not None else []
|
||||
# skip router if user passed their key
|
||||
if "api_key" in data:
|
||||
response = await litellm.atext_completion(**data)
|
||||
elif (
|
||||
llm_router is not None and data["model"] in router_model_names
|
||||
): # model in router model list
|
||||
response = await llm_router.atext_completion(**data)
|
||||
elif (
|
||||
llm_router is not None
|
||||
and llm_router.model_group_alias is not None
|
||||
and data["model"] in llm_router.model_group_alias
|
||||
): # model set in model_group_alias
|
||||
response = await llm_router.atext_completion(**data)
|
||||
elif (
|
||||
llm_router is not None and data["model"] in llm_router.deployment_names
|
||||
): # model in router deployments, calling a specific deployment on the router
|
||||
response = await llm_router.atext_completion(
|
||||
**data, specific_deployment=True
|
||||
)
|
||||
elif (
|
||||
llm_router is not None
|
||||
and data["model"] not in router_model_names
|
||||
and llm_router.default_deployment is not None
|
||||
): # model in router deployments, calling a specific deployment on the router
|
||||
response = await llm_router.atext_completion(**data)
|
||||
elif user_model is not None: # `litellm --model <your-model-name>`
|
||||
response = await litellm.atext_completion(**data)
|
||||
else:
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_400_BAD_REQUEST,
|
||||
detail={
|
||||
"error": "completion: Invalid model name passed in model="
|
||||
+ data.get("model", "")
|
||||
},
|
||||
)
|
||||
|
||||
if hasattr(response, "_hidden_params"):
|
||||
model_id = response._hidden_params.get("model_id", None) or ""
|
||||
original_response = (
|
||||
response._hidden_params.get("original_response", None) or ""
|
||||
)
|
||||
else:
|
||||
model_id = ""
|
||||
original_response = ""
|
||||
|
||||
verbose_proxy_logger.debug("final response: %s", response)
|
||||
if (
|
||||
"stream" in data and data["stream"] == True
|
||||
): # use generate_responses to stream responses
|
||||
custom_headers = {
|
||||
"x-litellm-model-id": model_id,
|
||||
}
|
||||
selected_data_generator = select_data_generator(
|
||||
response=response, user_api_key_dict=user_api_key_dict
|
||||
)
|
||||
|
||||
return StreamingResponse(
|
||||
selected_data_generator,
|
||||
media_type="text/event-stream",
|
||||
headers=custom_headers,
|
||||
)
|
||||
|
||||
fastapi_response.headers["x-litellm-model-id"] = model_id
|
||||
return response
|
||||
except Exception as e:
|
||||
data["litellm_status"] = "fail" # used for alerting
|
||||
verbose_proxy_logger.debug("EXCEPTION RAISED IN PROXY MAIN.PY")
|
||||
verbose_proxy_logger.debug(
|
||||
"\033[1;31mAn error occurred: %s\n\n Debug this by setting `--debug`, e.g. `litellm --model gpt-3.5-turbo --debug`",
|
||||
e,
|
||||
)
|
||||
traceback.print_exc()
|
||||
error_traceback = traceback.format_exc()
|
||||
error_msg = f"{str(e)}"
|
||||
raise ProxyException(
|
||||
message=getattr(e, "message", error_msg),
|
||||
type=getattr(e, "type", "None"),
|
||||
param=getattr(e, "param", "None"),
|
||||
code=getattr(e, "status_code", 500),
|
||||
)
|
||||
|
||||
|
||||
@router.post(
|
||||
"/v1/embeddings",
|
||||
dependencies=[Depends(user_api_key_auth)],
|
||||
|
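The relocated handler keeps the OpenAI-compatible text-completion routes (`/completions`, `/v1/completions`, and the engine/deployment variants), so the standard OpenAI client works against it unchanged. A minimal sketch, assuming a proxy at http://0.0.0.0:4000, a virtual key `sk-1234`, and a `gpt-3.5-turbo-instruct` entry in the proxy's model list (all placeholders):

```python
from openai import OpenAI

# Hypothetical base URL and key for a running LiteLLM proxy.
client = OpenAI(base_url="http://0.0.0.0:4000", api_key="sk-1234")

completion = client.completions.create(
    model="gpt-3.5-turbo-instruct",  # must match a model_name configured on the proxy
    prompt="Say this is a test",
    max_tokens=20,
)
print(completion.choices[0].text)
```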
@ -4041,7 +4044,7 @@ async def embeddings(
|
|||
raise HTTPException(
|
||||
status_code=status.HTTP_400_BAD_REQUEST,
|
||||
detail={
|
||||
"error": "Invalid model name passed in model="
|
||||
"error": "embeddings: Invalid model name passed in model="
|
||||
+ data.get("model", "")
|
||||
},
|
||||
)
|
||||
|
@ -4197,7 +4200,7 @@ async def image_generation(
|
|||
raise HTTPException(
|
||||
status_code=status.HTTP_400_BAD_REQUEST,
|
||||
detail={
|
||||
"error": "Invalid model name passed in model="
|
||||
"error": "image_generation: Invalid model name passed in model="
|
||||
+ data.get("model", "")
|
||||
},
|
||||
)
|
||||
|
@ -4372,7 +4375,7 @@ async def audio_transcriptions(
|
|||
raise HTTPException(
|
||||
status_code=status.HTTP_400_BAD_REQUEST,
|
||||
detail={
|
||||
"error": "Invalid model name passed in model="
|
||||
"error": "audio_transcriptions: Invalid model name passed in model="
|
||||
+ data.get("model", "")
|
||||
},
|
||||
)
|
||||
|
@ -4538,7 +4541,7 @@ async def moderations(
|
|||
raise HTTPException(
|
||||
status_code=status.HTTP_400_BAD_REQUEST,
|
||||
detail={
|
||||
"error": "Invalid model name passed in model="
|
||||
"error": "moderations: Invalid model name passed in model="
|
||||
+ data.get("model", "")
|
||||
},
|
||||
)
|
||||
|
@ -7549,7 +7552,7 @@ async def model_metrics(
|
|||
FROM
|
||||
"LiteLLM_SpendLogs"
|
||||
WHERE
|
||||
"startTime" >= NOW() - INTERVAL '30 days'
|
||||
"startTime" BETWEEN $2::timestamp AND $3::timestamp
|
||||
AND "model" = $1 AND "cache_hit" != 'True'
|
||||
GROUP BY
|
||||
api_base,
|
||||
|
@ -7650,6 +7653,8 @@ FROM
|
|||
WHERE
|
||||
"model" = $2
|
||||
AND "cache_hit" != 'True'
|
||||
AND "startTime" >= $3::timestamp
|
||||
AND "startTime" <= $4::timestamp
|
||||
GROUP BY
|
||||
api_base
|
||||
ORDER BY
|
||||
|
@ -7657,7 +7662,7 @@ ORDER BY
|
|||
"""
|
||||
|
||||
db_response = await prisma_client.db.query_raw(
|
||||
sql_query, alerting_threshold, _selected_model_group
|
||||
sql_query, alerting_threshold, _selected_model_group, startTime, endTime
|
||||
)
|
||||
|
||||
if db_response is not None:
|
||||
|
@ -7703,7 +7708,7 @@ async def model_metrics_exceptions(
|
|||
exception_type,
|
||||
COUNT(*) AS num_exceptions
|
||||
FROM "LiteLLM_ErrorLogs"
|
||||
WHERE "startTime" >= $1::timestamp AND "endTime" <= $2::timestamp
|
||||
WHERE "startTime" >= $1::timestamp AND "endTime" <= $2::timestamp AND model_group = $3
|
||||
GROUP BY combined_model_api_base, exception_type
|
||||
)
|
||||
SELECT
|
||||
|
@ -7715,7 +7720,9 @@ async def model_metrics_exceptions(
|
|||
ORDER BY total_exceptions DESC
|
||||
LIMIT 200;
|
||||
"""
|
||||
db_response = await prisma_client.db.query_raw(sql_query, startTime, endTime)
|
||||
db_response = await prisma_client.db.query_raw(
|
||||
sql_query, startTime, endTime, _selected_model_group
|
||||
)
|
||||
response: List[dict] = []
|
||||
exception_types = set()
|
||||
|
||||
|
@ -8708,11 +8715,11 @@ async def update_config(config_info: ConfigYAML):
|
|||
# overwrite existing settings with updated values
|
||||
if k == "alert_to_webhook_url":
|
||||
# check if slack is already enabled. if not, enable it
|
||||
if "slack" not in _existing_settings:
|
||||
if "alerting" not in _existing_settings:
|
||||
_existing_settings["alerting"] = ["slack"]
|
||||
elif isinstance(_existing_settings["alerting"], list):
|
||||
_existing_settings["alerting"].append("slack")
|
||||
if "slack" not in _existing_settings["alerting"]:
|
||||
_existing_settings["alerting"] = ["slack"]
|
||||
_existing_settings[k] = v
|
||||
config["general_settings"] = _existing_settings
|
||||
|
||||
|
@ -9197,6 +9204,62 @@ def _db_health_readiness_check():
|
|||
return db_health_cache
|
||||
|
||||
|
||||
@router.get(
|
||||
"/active/callbacks",
|
||||
tags=["health"],
|
||||
dependencies=[Depends(user_api_key_auth)],
|
||||
)
|
||||
async def active_callbacks():
|
||||
"""
|
||||
Returns a list of active callbacks on litellm.callbacks, litellm.input_callback, litellm.failure_callback, litellm.success_callback
|
||||
"""
|
||||
global proxy_logging_obj
|
||||
_alerting = str(general_settings.get("alerting"))
|
||||
# get success callback
|
||||
success_callback_names = []
|
||||
try:
|
||||
# this was returning a JSON of the values in some of the callbacks
|
||||
# all we need is the callback name, hence we do str(callback)
|
||||
success_callback_names = [str(x) for x in litellm.success_callback]
|
||||
except:
|
||||
# don't let this block the /health/readiness response, if we can't convert to str -> return litellm.success_callback
|
||||
success_callback_names = litellm.success_callback
|
||||
|
||||
_num_callbacks = (
|
||||
len(litellm.callbacks)
|
||||
+ len(litellm.input_callback)
|
||||
+ len(litellm.failure_callback)
|
||||
+ len(litellm.success_callback)
|
||||
+ len(litellm._async_failure_callback)
|
||||
+ len(litellm._async_success_callback)
|
||||
+ len(litellm._async_input_callback)
|
||||
)
|
||||
|
||||
alerting = proxy_logging_obj.alerting
|
||||
_num_alerting = 0
|
||||
if alerting and isinstance(alerting, list):
|
||||
_num_alerting = len(alerting)
|
||||
|
||||
return {
|
||||
"alerting": _alerting,
|
||||
"litellm.callbacks": [str(x) for x in litellm.callbacks],
|
||||
"litellm.input_callback": [str(x) for x in litellm.input_callback],
|
||||
"litellm.failure_callback": [str(x) for x in litellm.failure_callback],
|
||||
"litellm.success_callback": [str(x) for x in litellm.success_callback],
|
||||
"litellm._async_success_callback": [
|
||||
str(x) for x in litellm._async_success_callback
|
||||
],
|
||||
"litellm._async_failure_callback": [
|
||||
str(x) for x in litellm._async_failure_callback
|
||||
],
|
||||
"litellm._async_input_callback": [
|
||||
str(x) for x in litellm._async_input_callback
|
||||
],
|
||||
"num_callbacks": _num_callbacks,
|
||||
"num_alerting": _num_alerting,
|
||||
}
|
||||
|
||||
|
||||
@router.get(
|
||||
"/health/readiness",
|
||||
tags=["health"],
|
||||
|
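The new `/active/callbacks` endpoint is a plain authenticated GET, so a quick way to sanity-check callback wiring is to hit it directly. A minimal sketch, assuming the proxy listens on http://0.0.0.0:4000 and `sk-1234` is a valid virtual key (both placeholders):

```python
import requests

resp = requests.get(
    "http://0.0.0.0:4000/active/callbacks",       # hypothetical proxy address
    headers={"Authorization": "Bearer sk-1234"},  # endpoint sits behind user_api_key_auth
    timeout=10,
)
resp.raise_for_status()
info = resp.json()

# Fields populated by the handler above.
print(info["alerting"])
print(info["num_callbacks"], info["num_alerting"])
print(info["litellm.success_callback"])
```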
@ -9206,9 +9269,11 @@ async def health_readiness():
|
|||
"""
|
||||
Unprotected endpoint for checking if worker can receive requests
|
||||
"""
|
||||
global general_settings
|
||||
try:
|
||||
# get success callback
|
||||
success_callback_names = []
|
||||
|
||||
try:
|
||||
# this was returning a JSON of the values in some of the callbacks
|
||||
# all we need is the callback name, hence we do str(callback)
|
||||
|
@ -9236,7 +9301,6 @@ async def health_readiness():
|
|||
# check DB
|
||||
if prisma_client is not None: # if db passed in, check if it's connected
|
||||
db_health_status = _db_health_readiness_check()
|
||||
|
||||
return {
|
||||
"status": "healthy",
|
||||
"db": "connected",
|
||||
|
|
|
@ -387,8 +387,14 @@ class ProxyLogging:
|
|||
"""
|
||||
|
||||
### ALERTING ###
|
||||
if "llm_exceptions" not in self.alert_types:
|
||||
return
|
||||
if "llm_exceptions" in self.alert_types and not isinstance(
|
||||
original_exception, HTTPException
|
||||
):
|
||||
"""
|
||||
Just alert on LLM API exceptions. Do not alert on user errors
|
||||
|
||||
Related issue - https://github.com/BerriAI/litellm/issues/3395
|
||||
"""
|
||||
asyncio.create_task(
|
||||
self.alerting_handler(
|
||||
message=f"LLM API call failed: {str(original_exception)}",
|
||||
|
@ -679,8 +685,8 @@ class PrismaClient:
|
|||
@backoff.on_exception(
|
||||
backoff.expo,
|
||||
Exception, # base exception to catch for the backoff
|
||||
max_tries=3, # maximum number of retries
|
||||
max_time=10, # maximum total time to retry for
|
||||
max_tries=1, # maximum number of retries
|
||||
max_time=2, # maximum total time to retry for
|
||||
on_backoff=on_backoff, # specifying the function to call on backoff
|
||||
)
|
||||
async def get_generic_data(
|
||||
|
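For context, `backoff.on_exception` wraps the decorated coroutine with exponential-backoff retries; the change above only tightens the retry budget from 3 tries / 10s to 1 try / 2s. A small standalone sketch of the same decorator usage (the function body and handler are illustrative, not from this diff):

```python
import asyncio

import backoff


def on_backoff(details):
    # "details" carries keys like "tries", "wait" and "elapsed".
    print(f"backing off after {details['tries']} tries")


@backoff.on_exception(
    backoff.expo,
    Exception,    # base exception to catch for the backoff
    max_tries=1,  # give up after a single attempt
    max_time=2,   # or after 2 seconds in total, whichever comes first
    on_backoff=on_backoff,
)
async def flaky_db_call():
    raise RuntimeError("simulated transient DB error")


# Raises RuntimeError once the retry budget is exhausted.
asyncio.run(flaky_db_call())
```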
@ -718,7 +724,8 @@ class PrismaClient:
|
|||
import traceback
|
||||
|
||||
error_msg = f"LiteLLM Prisma Client Exception get_generic_data: {str(e)}"
|
||||
print_verbose(error_msg)
|
||||
verbose_proxy_logger.error(error_msg)
|
||||
error_msg = error_msg + "\nException Type: {}".format(type(e))
|
||||
error_traceback = error_msg + "\n" + traceback.format_exc()
|
||||
end_time = time.time()
|
||||
_duration = end_time - start_time
|
||||
|
|
|
@ -42,6 +42,7 @@ from litellm.types.router import (
|
|||
RouterErrors,
|
||||
updateDeployment,
|
||||
updateLiteLLMParams,
|
||||
RetryPolicy,
|
||||
)
|
||||
from litellm.integrations.custom_logger import CustomLogger
|
||||
|
||||
|
@ -82,6 +83,12 @@ class Router:
|
|||
model_group_alias: Optional[dict] = {},
|
||||
enable_pre_call_checks: bool = False,
|
||||
retry_after: int = 0, # min time to wait before retrying a failed request
|
||||
retry_policy: Optional[
|
||||
RetryPolicy
|
||||
] = None, # set custom retries for different exceptions
|
||||
model_group_retry_policy: Optional[
|
||||
Dict[str, RetryPolicy]
|
||||
] = {}, # set custom retry policies based on model group
|
||||
allowed_fails: Optional[
|
||||
int
|
||||
] = None, # Number of times a deployment can fail before being added to cooldown
|
||||
|
@ -303,6 +310,10 @@ class Router:
|
|||
f"Intialized router with Routing strategy: {self.routing_strategy}\n\nRouting fallbacks: {self.fallbacks}\n\nRouting context window fallbacks: {self.context_window_fallbacks}\n\nRouter Redis Caching={self.cache.redis_cache}"
|
||||
) # noqa
|
||||
self.routing_strategy_args = routing_strategy_args
|
||||
self.retry_policy: Optional[RetryPolicy] = retry_policy
|
||||
self.model_group_retry_policy: Optional[Dict[str, RetryPolicy]] = (
|
||||
model_group_retry_policy
|
||||
)
|
||||
|
||||
def routing_strategy_init(self, routing_strategy: str, routing_strategy_args: dict):
|
||||
if routing_strategy == "least-busy":
|
||||
|
@ -375,7 +386,9 @@ class Router:
|
|||
except Exception as e:
|
||||
raise e
|
||||
|
||||
def _completion(self, model: str, messages: List[Dict[str, str]], **kwargs):
|
||||
def _completion(
|
||||
self, model: str, messages: List[Dict[str, str]], **kwargs
|
||||
) -> Union[ModelResponse, CustomStreamWrapper]:
|
||||
model_name = None
|
||||
try:
|
||||
# pick the one that is available (lowest TPM/RPM)
|
||||
|
@ -438,7 +451,9 @@ class Router:
|
|||
)
|
||||
raise e
|
||||
|
||||
async def acompletion(self, model: str, messages: List[Dict[str, str]], **kwargs):
|
||||
async def acompletion(
|
||||
self, model: str, messages: List[Dict[str, str]], **kwargs
|
||||
) -> Union[ModelResponse, CustomStreamWrapper]:
|
||||
try:
|
||||
kwargs["model"] = model
|
||||
kwargs["messages"] = messages
|
||||
|
@ -454,7 +469,9 @@ class Router:
|
|||
except Exception as e:
|
||||
raise e
|
||||
|
||||
async def _acompletion(self, model: str, messages: List[Dict[str, str]], **kwargs):
|
||||
async def _acompletion(
|
||||
self, model: str, messages: List[Dict[str, str]], **kwargs
|
||||
) -> Union[ModelResponse, CustomStreamWrapper]:
|
||||
"""
|
||||
- Get an available deployment
|
||||
- call it with a semaphore over the call
|
||||
|
@ -1455,48 +1472,24 @@ class Router:
|
|||
):
|
||||
raise original_exception
|
||||
### RETRY
|
||||
#### check if it should retry + back-off if required
|
||||
# if "No models available" in str(
|
||||
# e
|
||||
# ) or RouterErrors.no_deployments_available.value in str(e):
|
||||
# timeout = litellm._calculate_retry_after(
|
||||
# remaining_retries=num_retries,
|
||||
# max_retries=num_retries,
|
||||
# min_timeout=self.retry_after,
|
||||
# )
|
||||
# await asyncio.sleep(timeout)
|
||||
# elif RouterErrors.user_defined_ratelimit_error.value in str(e):
|
||||
# raise e # don't wait to retry if deployment hits user-defined rate-limit
|
||||
|
||||
# elif hasattr(original_exception, "status_code") and litellm._should_retry(
|
||||
# status_code=original_exception.status_code
|
||||
# ):
|
||||
# if hasattr(original_exception, "response") and hasattr(
|
||||
# original_exception.response, "headers"
|
||||
# ):
|
||||
# timeout = litellm._calculate_retry_after(
|
||||
# remaining_retries=num_retries,
|
||||
# max_retries=num_retries,
|
||||
# response_headers=original_exception.response.headers,
|
||||
# min_timeout=self.retry_after,
|
||||
# )
|
||||
# else:
|
||||
# timeout = litellm._calculate_retry_after(
|
||||
# remaining_retries=num_retries,
|
||||
# max_retries=num_retries,
|
||||
# min_timeout=self.retry_after,
|
||||
# )
|
||||
# await asyncio.sleep(timeout)
|
||||
# else:
|
||||
# raise original_exception
|
||||
|
||||
### RETRY
|
||||
_timeout = self._router_should_retry(
|
||||
e=original_exception,
|
||||
remaining_retries=num_retries,
|
||||
num_retries=num_retries,
|
||||
)
|
||||
await asyncio.sleep(_timeout)
|
||||
|
||||
if (
|
||||
self.retry_policy is not None
|
||||
or self.model_group_retry_policy is not None
|
||||
):
|
||||
# get num_retries from retry policy
|
||||
_retry_policy_retries = self.get_num_retries_from_retry_policy(
|
||||
exception=original_exception, model_group=kwargs.get("model")
|
||||
)
|
||||
if _retry_policy_retries is not None:
|
||||
num_retries = _retry_policy_retries
|
||||
## LOGGING
|
||||
if num_retries > 0:
|
||||
kwargs = self.log_retry(kwargs=kwargs, e=original_exception)
|
||||
|
@ -1524,6 +1517,10 @@ class Router:
|
|||
num_retries=num_retries,
|
||||
)
|
||||
await asyncio.sleep(_timeout)
|
||||
try:
|
||||
original_exception.message += f"\nNumber Retries = {current_attempt}"
|
||||
except:
|
||||
pass
|
||||
raise original_exception
|
||||
|
||||
def function_with_fallbacks(self, *args, **kwargs):
|
||||
|
@ -2590,6 +2587,16 @@ class Router:
|
|||
return model
|
||||
return None
|
||||
|
||||
def get_model_info(self, id: str) -> Optional[dict]:
|
||||
"""
|
||||
For a given model id, return the model info
|
||||
"""
|
||||
for model in self.model_list:
|
||||
if "model_info" in model and "id" in model["model_info"]:
|
||||
if id == model["model_info"]["id"]:
|
||||
return model
|
||||
return None
|
||||
|
||||
def get_model_ids(self):
|
||||
ids = []
|
||||
for model in self.model_list:
|
||||
|
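A short usage sketch for the new helper: `get_model_info` scans `model_list` for a matching `model_info.id` and returns the full deployment dict, or `None`. The deployment below is hypothetical (placeholder model name, key, and id):

```python
import os

from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "gpt-3.5-turbo",
            "litellm_params": {
                "model": "gpt-3.5-turbo",
                "api_key": os.environ.get("OPENAI_API_KEY", "sk-placeholder"),
            },
            "model_info": {"id": "my-deployment-1"},  # explicit id so it is easy to look up
        }
    ]
)

print(router.get_model_info(id="my-deployment-1"))  # full deployment dict
print(router.get_model_info(id="unknown-id"))       # None
```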
@ -2659,13 +2666,18 @@ class Router:
|
|||
"cooldown_time",
|
||||
]
|
||||
|
||||
_existing_router_settings = self.get_settings()
|
||||
for var in kwargs:
|
||||
if var in _allowed_settings:
|
||||
if var in _int_settings:
|
||||
_casted_value = int(kwargs[var])
|
||||
setattr(self, var, _casted_value)
|
||||
else:
|
||||
if var == "routing_strategy":
|
||||
# only run routing strategy init if it has changed
|
||||
if (
|
||||
var == "routing_strategy"
|
||||
and _existing_router_settings["routing_strategy"] != kwargs[var]
|
||||
):
|
||||
self.routing_strategy_init(
|
||||
routing_strategy=kwargs[var],
|
||||
routing_strategy_args=kwargs.get(
|
||||
|
@ -2904,15 +2916,10 @@ class Router:
|
|||
m for m in self.model_list if m["litellm_params"]["model"] == model
|
||||
]
|
||||
|
||||
verbose_router_logger.debug(
|
||||
f"initial list of deployments: {healthy_deployments}"
|
||||
)
|
||||
litellm.print_verbose(f"initial list of deployments: {healthy_deployments}")
|
||||
|
||||
verbose_router_logger.debug(
|
||||
f"healthy deployments: length {len(healthy_deployments)} {healthy_deployments}"
|
||||
)
|
||||
if len(healthy_deployments) == 0:
|
||||
raise ValueError(f"No healthy deployment available, passed model={model}")
|
||||
raise ValueError(f"No healthy deployment available, passed model={model}. ")
|
||||
if litellm.model_alias_map and model in litellm.model_alias_map:
|
||||
model = litellm.model_alias_map[
|
||||
model
|
||||
|
@ -3238,6 +3245,53 @@ class Router:
|
|||
except Exception as e:
|
||||
verbose_router_logger.error(f"Error in _track_deployment_metrics: {str(e)}")
|
||||
|
||||
def get_num_retries_from_retry_policy(
|
||||
self, exception: Exception, model_group: Optional[str] = None
|
||||
):
|
||||
"""
|
||||
BadRequestErrorRetries: Optional[int] = None
|
||||
AuthenticationErrorRetries: Optional[int] = None
|
||||
TimeoutErrorRetries: Optional[int] = None
|
||||
RateLimitErrorRetries: Optional[int] = None
|
||||
ContentPolicyViolationErrorRetries: Optional[int] = None
|
||||
"""
|
||||
# if we can find the exception in the retry policy -> return the configured number of retries
|
||||
retry_policy = self.retry_policy
|
||||
if (
|
||||
self.model_group_retry_policy is not None
|
||||
and model_group is not None
|
||||
and model_group in self.model_group_retry_policy
|
||||
):
|
||||
retry_policy = self.model_group_retry_policy.get(model_group, None)
|
||||
|
||||
if retry_policy is None:
|
||||
return None
|
||||
if (
|
||||
isinstance(exception, litellm.BadRequestError)
|
||||
and retry_policy.BadRequestErrorRetries is not None
|
||||
):
|
||||
return retry_policy.BadRequestErrorRetries
|
||||
if (
|
||||
isinstance(exception, litellm.AuthenticationError)
|
||||
and retry_policy.AuthenticationErrorRetries is not None
|
||||
):
|
||||
return retry_policy.AuthenticationErrorRetries
|
||||
if (
|
||||
isinstance(exception, litellm.Timeout)
|
||||
and retry_policy.TimeoutErrorRetries is not None
|
||||
):
|
||||
return retry_policy.TimeoutErrorRetries
|
||||
if (
|
||||
isinstance(exception, litellm.RateLimitError)
|
||||
and retry_policy.RateLimitErrorRetries is not None
|
||||
):
|
||||
return retry_policy.RateLimitErrorRetries
|
||||
if (
|
||||
isinstance(exception, litellm.ContentPolicyViolationError)
|
||||
and retry_policy.ContentPolicyViolationErrorRetries is not None
|
||||
):
|
||||
return retry_policy.ContentPolicyViolationErrorRetries
|
||||
|
||||
def flush_cache(self):
|
||||
litellm.cache = None
|
||||
self.cache.flush_cache()
|
||||
|
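Putting the new pieces together (RetryPolicy, the retry_policy / model_group_retry_policy constructor arguments, and this lookup), per-exception retry counts can be configured on the router. A minimal sketch with a single hypothetical OpenAI deployment; the model name, key handling, and retry counts are placeholders:

```python
import os

from litellm import Router
from litellm.types.router import RetryPolicy

retry_policy = RetryPolicy(
    TimeoutErrorRetries=3,         # retry transient timeouts
    RateLimitErrorRetries=3,       # and rate limits
    AuthenticationErrorRetries=0,  # but fail fast on bad credentials
    BadRequestErrorRetries=0,
    ContentPolicyViolationErrorRetries=0,
)

router = Router(
    model_list=[
        {
            "model_name": "gpt-3.5-turbo",
            "litellm_params": {
                "model": "gpt-3.5-turbo",
                "api_key": os.environ.get("OPENAI_API_KEY", "sk-placeholder"),
            },
        }
    ],
    retry_policy=retry_policy,
    # Optionally override the policy for a specific model group:
    model_group_retry_policy={"gpt-3.5-turbo": retry_policy},
)

response = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "hello"}],
)
print(response.choices[0].message.content)
```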
@ -3248,4 +3302,5 @@ class Router:
|
|||
litellm.__async_success_callback = []
|
||||
litellm.failure_callback = []
|
||||
litellm._async_failure_callback = []
|
||||
self.retry_policy = None
|
||||
self.flush_cache()
|
||||
|
|
|
@ -31,6 +31,7 @@ class LiteLLMBase(BaseModel):
|
|||
class RoutingArgs(LiteLLMBase):
|
||||
ttl: int = 1 * 60 * 60 # 1 hour
|
||||
lowest_latency_buffer: float = 0
|
||||
max_latency_list_size: int = 10
|
||||
|
||||
|
||||
class LowestLatencyLoggingHandler(CustomLogger):
|
||||
|
@ -103,7 +104,18 @@ class LowestLatencyLoggingHandler(CustomLogger):
|
|||
request_count_dict[id] = {}
|
||||
|
||||
## Latency
|
||||
if (
|
||||
len(request_count_dict[id].get("latency", []))
|
||||
< self.routing_args.max_latency_list_size
|
||||
):
|
||||
request_count_dict[id].setdefault("latency", []).append(final_value)
|
||||
else:
|
||||
request_count_dict[id]["latency"] = request_count_dict[id][
|
||||
"latency"
|
||||
][: self.routing_args.max_latency_list_size - 1] + [final_value]
|
||||
|
||||
if precise_minute not in request_count_dict[id]:
|
||||
request_count_dict[id][precise_minute] = {}
|
||||
|
||||
if precise_minute not in request_count_dict[id]:
|
||||
request_count_dict[id][precise_minute] = {}
|
||||
|
@ -170,8 +182,17 @@ class LowestLatencyLoggingHandler(CustomLogger):
|
|||
if id not in request_count_dict:
|
||||
request_count_dict[id] = {}
|
||||
|
||||
## Latency
|
||||
## Latency - give 1000s penalty for failing
|
||||
if (
|
||||
len(request_count_dict[id].get("latency", []))
|
||||
< self.routing_args.max_latency_list_size
|
||||
):
|
||||
request_count_dict[id].setdefault("latency", []).append(1000.0)
|
||||
else:
|
||||
request_count_dict[id]["latency"] = request_count_dict[id][
|
||||
"latency"
|
||||
][: self.routing_args.max_latency_list_size - 1] + [1000.0]
|
||||
|
||||
self.router_cache.set_cache(
|
||||
key=latency_key,
|
||||
value=request_count_dict,
|
||||
|
@ -242,7 +263,15 @@ class LowestLatencyLoggingHandler(CustomLogger):
|
|||
request_count_dict[id] = {}
|
||||
|
||||
## Latency
|
||||
if (
|
||||
len(request_count_dict[id].get("latency", []))
|
||||
< self.routing_args.max_latency_list_size
|
||||
):
|
||||
request_count_dict[id].setdefault("latency", []).append(final_value)
|
||||
else:
|
||||
request_count_dict[id]["latency"] = request_count_dict[id][
|
||||
"latency"
|
||||
][: self.routing_args.max_latency_list_size - 1] + [final_value]
|
||||
|
||||
if precise_minute not in request_count_dict[id]:
|
||||
request_count_dict[id][precise_minute] = {}
|
||||
|
|
|
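All three hunks above apply the same capping rule: while fewer than `max_latency_list_size` samples are stored for a deployment, append the new latency (or a 1000s penalty on failure); once the cap is reached, keep the first `max_latency_list_size - 1` samples and append the new value. A small standalone sketch of that rule, for illustration only (function name and values are made up):

```python
def append_capped_latency(latency_list, value, max_size=10):
    """Append a latency sample while keeping the list at most max_size entries long."""
    if len(latency_list) < max_size:
        return latency_list + [value]
    # mirror the handler: keep the first max_size - 1 samples and append the new one
    return latency_list[: max_size - 1] + [value]


samples = []
for observed in [0.4, 0.9, 1.1]:
    samples = append_capped_latency(samples, observed, max_size=2)
print(samples)  # [0.4, 1.1]
```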
@ -79,10 +79,12 @@ class LowestTPMLoggingHandler_v2(CustomLogger):
|
|||
model=deployment.get("litellm_params", {}).get("model"),
|
||||
response=httpx.Response(
|
||||
status_code=429,
|
||||
content="{} rpm limit={}. current usage={}".format(
|
||||
content="{} rpm limit={}. current usage={}. id={}, model_group={}. Get the model info by calling 'router.get_model_info(id)".format(
|
||||
RouterErrors.user_defined_ratelimit_error.value,
|
||||
deployment_rpm,
|
||||
local_result,
|
||||
model_id,
|
||||
deployment.get("model_name", ""),
|
||||
),
|
||||
request=httpx.Request(method="tpm_rpm_limits", url="https://github.com/BerriAI/litellm"), # type: ignore
|
||||
),
|
||||
|
|
|
@ -0,0 +1,88 @@
|
|||
int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
|
||||
Traceback (most recent call last):
|
||||
File "/opt/homebrew/lib/python3.11/site-packages/langfuse/client.py", line 778, in generation
|
||||
"usage": _convert_usage_input(usage) if usage is not None else None,
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
File "/opt/homebrew/lib/python3.11/site-packages/langfuse/utils.py", line 77, in _convert_usage_input
|
||||
"totalCost": extract_by_priority(usage, ["totalCost", "total_cost"]),
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
File "/opt/homebrew/lib/python3.11/site-packages/langfuse/utils.py", line 32, in extract_by_priority
|
||||
return int(usage[key])
|
||||
^^^^^^^^^^^^^^^
|
||||
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
|
||||
int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
|
||||
Traceback (most recent call last):
|
||||
File "/opt/homebrew/lib/python3.11/site-packages/langfuse/client.py", line 778, in generation
|
||||
"usage": _convert_usage_input(usage) if usage is not None else None,
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
File "/opt/homebrew/lib/python3.11/site-packages/langfuse/utils.py", line 77, in _convert_usage_input
|
||||
"totalCost": extract_by_priority(usage, ["totalCost", "total_cost"]),
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
File "/opt/homebrew/lib/python3.11/site-packages/langfuse/utils.py", line 32, in extract_by_priority
|
||||
return int(usage[key])
|
||||
^^^^^^^^^^^^^^^
|
||||
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
|
||||
int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
|
||||
Traceback (most recent call last):
|
||||
File "/opt/homebrew/lib/python3.11/site-packages/langfuse/client.py", line 778, in generation
|
||||
"usage": _convert_usage_input(usage) if usage is not None else None,
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
File "/opt/homebrew/lib/python3.11/site-packages/langfuse/utils.py", line 77, in _convert_usage_input
|
||||
"totalCost": extract_by_priority(usage, ["totalCost", "total_cost"]),
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
File "/opt/homebrew/lib/python3.11/site-packages/langfuse/utils.py", line 32, in extract_by_priority
|
||||
return int(usage[key])
|
||||
^^^^^^^^^^^^^^^
|
||||
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
|
||||
int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
|
||||
Traceback (most recent call last):
|
||||
File "/opt/homebrew/lib/python3.11/site-packages/langfuse/client.py", line 778, in generation
|
||||
"usage": _convert_usage_input(usage) if usage is not None else None,
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
File "/opt/homebrew/lib/python3.11/site-packages/langfuse/utils.py", line 77, in _convert_usage_input
|
||||
"totalCost": extract_by_priority(usage, ["totalCost", "total_cost"]),
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
File "/opt/homebrew/lib/python3.11/site-packages/langfuse/utils.py", line 32, in extract_by_priority
|
||||
return int(usage[key])
|
||||
^^^^^^^^^^^^^^^
|
||||
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
|
||||
int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
|
||||
Traceback (most recent call last):
|
||||
File "/opt/homebrew/lib/python3.11/site-packages/langfuse/client.py", line 778, in generation
|
||||
"usage": _convert_usage_input(usage) if usage is not None else None,
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
File "/opt/homebrew/lib/python3.11/site-packages/langfuse/utils.py", line 77, in _convert_usage_input
|
||||
"totalCost": extract_by_priority(usage, ["totalCost", "total_cost"]),
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
File "/opt/homebrew/lib/python3.11/site-packages/langfuse/utils.py", line 32, in extract_by_priority
|
||||
return int(usage[key])
|
||||
^^^^^^^^^^^^^^^
|
||||
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
|
||||
consumer is running...
|
||||
Getting observations... None, None, None, None, litellm-test-98e1cc75-bef8-4280-a2b9-e08633b81acd, None, GENERATION
|
||||
consumer is running...
|
||||
Getting observations... None, None, None, None, litellm-test-532d2bc8-f8d6-42fd-8f78-416bae79925d, None, GENERATION
|
||||
joining 1 consumer threads
|
||||
consumer thread 0 joined
|
||||
joining 1 consumer threads
|
||||
consumer thread 0 joined
|
||||
joining 1 consumer threads
|
||||
consumer thread 0 joined
|
||||
joining 1 consumer threads
|
||||
consumer thread 0 joined
|
||||
joining 1 consumer threads
|
||||
consumer thread 0 joined
|
||||
joining 1 consumer threads
|
||||
consumer thread 0 joined
|
||||
joining 1 consumer threads
|
||||
consumer thread 0 joined
|
||||
joining 1 consumer threads
|
||||
consumer thread 0 joined
|
||||
joining 1 consumer threads
|
||||
consumer thread 0 joined
|
||||
joining 1 consumer threads
|
||||
consumer thread 0 joined
|
||||
joining 1 consumer threads
|
||||
consumer thread 0 joined
|
||||
joining 1 consumer threads
|
||||
consumer thread 0 joined
|
|
@ -5,74 +5,99 @@ plugins: timeout-2.2.0, asyncio-0.23.2, anyio-3.7.1, xdist-3.3.1
|
|||
asyncio: mode=Mode.STRICT
|
||||
collected 1 item
|
||||
|
||||
test_custom_logger.py Chunks have a created at hidden param
|
||||
Chunks sorted
|
||||
token_counter messages received: [{'role': 'user', 'content': 'write a one sentence poem about: 73348'}]
|
||||
Token Counter - using OpenAI token counter, for model=gpt-3.5-turbo
|
||||
LiteLLM: Utils - Counting tokens for OpenAI model=gpt-3.5-turbo
|
||||
Logging Details LiteLLM-Success Call: None
|
||||
success callbacks: []
|
||||
Token Counter - using OpenAI token counter, for model=gpt-3.5-turbo
|
||||
LiteLLM: Utils - Counting tokens for OpenAI model=gpt-3.5-turbo
|
||||
Logging Details LiteLLM-Success Call streaming complete
|
||||
Looking up model=gpt-3.5-turbo in model_cost_map
|
||||
Success: model=gpt-3.5-turbo in model_cost_map
|
||||
prompt_tokens=17; completion_tokens=0
|
||||
Returned custom cost for model=gpt-3.5-turbo - prompt_tokens_cost_usd_dollar: 2.55e-05, completion_tokens_cost_usd_dollar: 0.0
|
||||
final cost: 2.55e-05; prompt_tokens_cost_usd_dollar: 2.55e-05; completion_tokens_cost_usd_dollar: 0.0
|
||||
. [100%]
|
||||
test_completion.py F [100%]
|
||||
|
||||
=================================== FAILURES ===================================
|
||||
______________________ test_completion_anthropic_hanging _______________________
|
||||
|
||||
def test_completion_anthropic_hanging():
|
||||
litellm.set_verbose = True
|
||||
litellm.modify_params = True
|
||||
messages = [
|
||||
{
|
||||
"role": "user",
|
||||
"content": "What's the capital of fictional country Ubabababababaaba? Use your tools.",
|
||||
},
|
||||
{
|
||||
"role": "assistant",
|
||||
"function_call": {
|
||||
"name": "get_capital",
|
||||
"arguments": '{"country": "Ubabababababaaba"}',
|
||||
},
|
||||
},
|
||||
{"role": "function", "name": "get_capital", "content": "Kokoko"},
|
||||
]
|
||||
|
||||
converted_messages = anthropic_messages_pt(messages)
|
||||
|
||||
print(f"converted_messages: {converted_messages}")
|
||||
|
||||
## ENSURE USER / ASSISTANT ALTERNATING
|
||||
for i, msg in enumerate(converted_messages):
|
||||
if i < len(converted_messages) - 1:
|
||||
> assert msg["role"] != converted_messages[i + 1]["role"]
|
||||
E AssertionError: assert 'user' != 'user'
|
||||
|
||||
test_completion.py:2406: AssertionError
|
||||
---------------------------- Captured stdout setup -----------------------------
|
||||
<module 'litellm' from '/Users/krrishdholakia/Documents/litellm/litellm/__init__.py'>
|
||||
|
||||
pytest fixture - resetting callbacks
|
||||
----------------------------- Captured stdout call -----------------------------
|
||||
message: {'role': 'user', 'content': "What's the capital of fictional country Ubabababababaaba? Use your tools."}
|
||||
message: {'role': 'function', 'name': 'get_capital', 'content': 'Kokoko'}
|
||||
converted_messages: [{'role': 'user', 'content': [{'type': 'text', 'text': "What's the capital of fictional country Ubabababababaaba? Use your tools."}]}, {'role': 'user', 'content': [{'type': 'tool_result', 'tool_use_id': '10e9f4d4-bdc9-4514-8b7a-c10bc555d67c', 'content': 'Kokoko'}]}]
|
||||
=============================== warnings summary ===============================
|
||||
../../../../../../opt/homebrew/lib/python3.11/site-packages/pydantic/_internal/_config.py:284: 18 warnings
|
||||
../../../../../../opt/homebrew/lib/python3.11/site-packages/pydantic/_internal/_config.py:284: 23 warnings
|
||||
/opt/homebrew/lib/python3.11/site-packages/pydantic/_internal/_config.py:284: PydanticDeprecatedSince20: Support for class-based `config` is deprecated, use ConfigDict instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.7/migration/
|
||||
warnings.warn(DEPRECATION_MESSAGE, DeprecationWarning)
|
||||
|
||||
../proxy/_types.py:218
|
||||
/Users/krrishdholakia/Documents/litellm/litellm/proxy/_types.py:218: PydanticDeprecatedSince20: Pydantic V1 style `@root_validator` validators are deprecated. You should migrate to Pydantic V2 style `@model_validator` validators, see the migration guide for more details. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.7/migration/
|
||||
../proxy/_types.py:219
|
||||
/Users/krrishdholakia/Documents/litellm/litellm/proxy/_types.py:219: PydanticDeprecatedSince20: Pydantic V1 style `@root_validator` validators are deprecated. You should migrate to Pydantic V2 style `@model_validator` validators, see the migration guide for more details. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.7/migration/
|
||||
@root_validator(pre=True)
|
||||
|
||||
../proxy/_types.py:305
|
||||
/Users/krrishdholakia/Documents/litellm/litellm/proxy/_types.py:305: PydanticDeprecatedSince20: `pydantic.config.Extra` is deprecated, use literal values instead (e.g. `extra='allow'`). Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.7/migration/
|
||||
../proxy/_types.py:306
|
||||
/Users/krrishdholakia/Documents/litellm/litellm/proxy/_types.py:306: PydanticDeprecatedSince20: `pydantic.config.Extra` is deprecated, use literal values instead (e.g. `extra='allow'`). Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.7/migration/
|
||||
extra = Extra.allow # Allow extra fields
|
||||
|
||||
../proxy/_types.py:308
|
||||
/Users/krrishdholakia/Documents/litellm/litellm/proxy/_types.py:308: PydanticDeprecatedSince20: Pydantic V1 style `@root_validator` validators are deprecated. You should migrate to Pydantic V2 style `@model_validator` validators, see the migration guide for more details. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.7/migration/
|
||||
../proxy/_types.py:309
|
||||
/Users/krrishdholakia/Documents/litellm/litellm/proxy/_types.py:309: PydanticDeprecatedSince20: Pydantic V1 style `@root_validator` validators are deprecated. You should migrate to Pydantic V2 style `@model_validator` validators, see the migration guide for more details. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.7/migration/
|
||||
@root_validator(pre=True)
|
||||
|
||||
../proxy/_types.py:337
|
||||
/Users/krrishdholakia/Documents/litellm/litellm/proxy/_types.py:337: PydanticDeprecatedSince20: Pydantic V1 style `@root_validator` validators are deprecated. You should migrate to Pydantic V2 style `@model_validator` validators, see the migration guide for more details. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.7/migration/
|
||||
../proxy/_types.py:338
|
||||
/Users/krrishdholakia/Documents/litellm/litellm/proxy/_types.py:338: PydanticDeprecatedSince20: Pydantic V1 style `@root_validator` validators are deprecated. You should migrate to Pydantic V2 style `@model_validator` validators, see the migration guide for more details. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.7/migration/
|
||||
@root_validator(pre=True)
|
||||
|
||||
../proxy/_types.py:384
|
||||
/Users/krrishdholakia/Documents/litellm/litellm/proxy/_types.py:384: PydanticDeprecatedSince20: Pydantic V1 style `@root_validator` validators are deprecated. You should migrate to Pydantic V2 style `@model_validator` validators, see the migration guide for more details. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.7/migration/
|
||||
../proxy/_types.py:385
|
||||
  /Users/krrishdholakia/Documents/litellm/litellm/proxy/_types.py:385: PydanticDeprecatedSince20: Pydantic V1 style `@root_validator` validators are deprecated. You should migrate to Pydantic V2 style `@model_validator` validators, see the migration guide for more details. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.7/migration/
    @root_validator(pre=True)

../proxy/_types.py:450
../proxy/_types.py:454
../proxy/_types.py:462
../proxy/_types.py:466
../proxy/_types.py:502
../proxy/_types.py:509
../proxy/_types.py:536
../proxy/_types.py:546
../proxy/_types.py:823
../proxy/_types.py:840
../proxy/_types.py:850
../proxy/_types.py:867
../proxy/_types.py:869
../proxy/_types.py:886
  (the same PydanticDeprecatedSince20 `@root_validator` deprecation warning is raised for the `@root_validator(pre=True)` validators at each of the _types.py locations listed above)

../../../../../../opt/homebrew/lib/python3.11/site-packages/pkg_resources/__init__.py:121
@@ -126,30 +151,7 @@ final cost: 2.55e-05; prompt_tokens_cost_usd_dollar: 2.55e-05; completion_tokens
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
    declare_namespace(pkg)

test_custom_logger.py::test_redis_cache_completion_stream
  /opt/homebrew/lib/python3.11/site-packages/_pytest/unraisableexception.py:78: PytestUnraisableExceptionWarning: Exception ignored in: <function StreamWriter.__del__ at 0x1019c28e0>

  Traceback (most recent call last):
    File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/streams.py", line 395, in __del__
      self.close()
    File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/streams.py", line 343, in close
      return self._transport.close()
             ^^^^^^^^^^^^^^^^^^^^^^^
    File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/sslproto.py", line 112, in close
      self._ssl_protocol._start_shutdown()
    File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/sslproto.py", line 620, in _start_shutdown
      self._shutdown_timeout_handle = self._loop.call_later(
                                      ^^^^^^^^^^^^^^^^^^^^^^
    File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 727, in call_later
      timer = self.call_at(self.time() + delay, callback, *args,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 740, in call_at
      self._check_closed()
    File "/opt/homebrew/Cellar/python@3.11/3.11.6_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 519, in _check_closed
      raise RuntimeError('Event loop is closed')
  RuntimeError: Event loop is closed

  warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======================== 1 passed, 56 warnings in 2.43s ========================
=========================== short test summary info ============================
FAILED test_completion.py::test_completion_anthropic_hanging - AssertionError...
======================== 1 failed, 60 warnings in 0.15s ========================
@@ -205,8 +205,6 @@ async def test_langfuse_logging_without_request_response(stream):
        assert _trace_data[0].output == {
            "role": "assistant",
            "content": "redacted-by-litellm",
            "function_call": None,
            "tool_calls": None,
        }

    except Exception as e:

@@ -561,7 +559,15 @@ def test_langfuse_existing_trace_id():

        new_langfuse_trace = langfuse_client.get_trace(id=trace_id)

        assert dict(initial_langfuse_trace) == dict(new_langfuse_trace)
        initial_langfuse_trace_dict = dict(initial_langfuse_trace)
        initial_langfuse_trace_dict.pop("updatedAt")
        initial_langfuse_trace_dict.pop("timestamp")

        new_langfuse_trace_dict = dict(new_langfuse_trace)
        new_langfuse_trace_dict.pop("updatedAt")
        new_langfuse_trace_dict.pop("timestamp")

        assert initial_langfuse_trace_dict == new_langfuse_trace_dict


def test_langfuse_logging_tool_calling():
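A side note on the trace comparison rewritten above: `updatedAt` and `timestamp` differ between two reads of the same Langfuse trace, so the test now strips them before asserting equality. If this pattern recurs, a small helper along these lines could be factored out (a sketch, not part of the diff; the key names are simply the ones popped above):

```
def assert_traces_equal_ignoring_volatile(initial: dict, new: dict, volatile=("updatedAt", "timestamp")):
    """Compare two trace dicts while ignoring fields that change on every read."""
    initial_clean = {k: v for k, v in initial.items() if k not in volatile}
    new_clean = {k: v for k, v in new.items() if k not in volatile}
    assert initial_clean == new_clean, f"traces differ outside of {volatile}"
```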
@ -15,10 +15,24 @@ import litellm
|
|||
import pytest
|
||||
import asyncio
|
||||
from unittest.mock import patch, MagicMock
|
||||
from litellm.utils import get_api_base
|
||||
from litellm.caching import DualCache
|
||||
from litellm.integrations.slack_alerting import SlackAlerting
|
||||
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
"model, optional_params, expected_api_base",
|
||||
[
|
||||
("openai/my-fake-model", {"api_base": "my-fake-api-base"}, "my-fake-api-base"),
|
||||
("gpt-3.5-turbo", {}, "https://api.openai.com"),
|
||||
],
|
||||
)
|
||||
def test_get_api_base_unit_test(model, optional_params, expected_api_base):
|
||||
api_base = get_api_base(model=model, optional_params=optional_params)
|
||||
|
||||
assert api_base == expected_api_base
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_get_api_base():
|
||||
_pl = ProxyLogging(user_api_key_cache=DualCache())
|
||||
|
@ -94,3 +108,80 @@ def test_init():
|
|||
assert slack_no_alerting.alerting == []
|
||||
|
||||
print("passed testing slack alerting init")
|
||||
|
||||
|
||||
from unittest.mock import patch, AsyncMock
|
||||
from datetime import datetime, timedelta
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def slack_alerting():
|
||||
return SlackAlerting(alerting_threshold=1)
|
||||
|
||||
|
||||
# Test for hanging LLM responses
|
||||
@pytest.mark.asyncio
|
||||
async def test_response_taking_too_long_hanging(slack_alerting):
|
||||
request_data = {
|
||||
"model": "test_model",
|
||||
"messages": "test_messages",
|
||||
"litellm_status": "running",
|
||||
}
|
||||
with patch.object(slack_alerting, "send_alert", new=AsyncMock()) as mock_send_alert:
|
||||
await slack_alerting.response_taking_too_long(
|
||||
type="hanging_request", request_data=request_data
|
||||
)
|
||||
mock_send_alert.assert_awaited_once()
|
||||
|
||||
|
||||
# Test for slow LLM responses
|
||||
@pytest.mark.asyncio
|
||||
async def test_response_taking_too_long_callback(slack_alerting):
|
||||
start_time = datetime.now()
|
||||
end_time = start_time + timedelta(seconds=301)
|
||||
kwargs = {"model": "test_model", "messages": "test_messages", "litellm_params": {}}
|
||||
with patch.object(slack_alerting, "send_alert", new=AsyncMock()) as mock_send_alert:
|
||||
await slack_alerting.response_taking_too_long_callback(
|
||||
kwargs, None, start_time, end_time
|
||||
)
|
||||
mock_send_alert.assert_awaited_once()
|
||||
|
||||
|
||||
# Test for budget crossed
|
||||
@pytest.mark.asyncio
|
||||
async def test_budget_alerts_crossed(slack_alerting):
|
||||
user_max_budget = 100
|
||||
user_current_spend = 101
|
||||
with patch.object(slack_alerting, "send_alert", new=AsyncMock()) as mock_send_alert:
|
||||
await slack_alerting.budget_alerts(
|
||||
"user_budget", user_max_budget, user_current_spend
|
||||
)
|
||||
mock_send_alert.assert_awaited_once()
|
||||
|
||||
|
||||
# Test for budget crossed again (should not fire alert 2nd time)
|
||||
@pytest.mark.asyncio
|
||||
async def test_budget_alerts_crossed_again(slack_alerting):
|
||||
user_max_budget = 100
|
||||
user_current_spend = 101
|
||||
with patch.object(slack_alerting, "send_alert", new=AsyncMock()) as mock_send_alert:
|
||||
await slack_alerting.budget_alerts(
|
||||
"user_budget", user_max_budget, user_current_spend
|
||||
)
|
||||
mock_send_alert.assert_awaited_once()
|
||||
mock_send_alert.reset_mock()
|
||||
await slack_alerting.budget_alerts(
|
||||
"user_budget", user_max_budget, user_current_spend
|
||||
)
|
||||
mock_send_alert.assert_not_awaited()
|
||||
|
||||
|
||||
# Test for send_alert - should be called once
|
||||
@pytest.mark.asyncio
|
||||
async def test_send_alert(slack_alerting):
|
||||
with patch.object(
|
||||
slack_alerting.async_http_handler, "post", new=AsyncMock()
|
||||
) as mock_post:
|
||||
mock_post.return_value.status_code = 200
|
||||
await slack_alerting.send_alert("Test message", "Low", "budget_alerts")
|
||||
mock_post.assert_awaited_once()
|
||||
|
|
|
@ -548,42 +548,6 @@ def test_gemini_pro_vision_base64():
|
|||
|
||||
|
||||
def test_gemini_pro_function_calling():
|
||||
load_vertex_ai_credentials()
|
||||
tools = [
|
||||
{
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "get_current_weather",
|
||||
"description": "Get the current weather in a given location",
|
||||
"parameters": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"location": {
|
||||
"type": "string",
|
||||
"description": "The city and state, e.g. San Francisco, CA",
|
||||
},
|
||||
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
|
||||
},
|
||||
"required": ["location"],
|
||||
},
|
||||
},
|
||||
}
|
||||
]
|
||||
|
||||
messages = [
|
||||
{
|
||||
"role": "user",
|
||||
"content": "What's the weather like in Boston today in fahrenheit?",
|
||||
}
|
||||
]
|
||||
completion = litellm.completion(
|
||||
model="gemini-pro", messages=messages, tools=tools, tool_choice="auto"
|
||||
)
|
||||
print(f"completion: {completion}")
|
||||
if hasattr(completion.choices[0].message, "tool_calls") and isinstance(
|
||||
completion.choices[0].message.tool_calls, list
|
||||
):
|
||||
assert len(completion.choices[0].message.tool_calls) == 1
|
||||
try:
|
||||
load_vertex_ai_credentials()
|
||||
tools = [
|
||||
|
|
102
litellm/tests/test_assistants.py
Normal file

@@ -0,0 +1,102 @@
# What is this?
## Unit Tests for OpenAI Assistants API
import sys, os, json
import traceback
from dotenv import load_dotenv

load_dotenv()
sys.path.insert(
    0, os.path.abspath("../..")
)  # Adds the parent directory to the system path
import pytest, logging, asyncio
import litellm
from litellm import create_thread, get_thread
from litellm.llms.openai import (
    OpenAIAssistantsAPI,
    MessageData,
    Thread,
    OpenAIMessage as Message,
)

"""
V0 Scope:

- Add Message -> `/v1/threads/{thread_id}/messages`
- Run Thread -> `/v1/threads/{thread_id}/run`
"""


def test_create_thread_litellm() -> Thread:
    message: MessageData = {"role": "user", "content": "Hey, how's it going?"}  # type: ignore
    new_thread = create_thread(
        custom_llm_provider="openai",
        messages=[message],  # type: ignore
    )

    assert isinstance(
        new_thread, Thread
    ), f"type of thread={type(new_thread)}. Expected Thread-type"
    return new_thread


def test_get_thread_litellm():
    new_thread = test_create_thread_litellm()

    received_thread = get_thread(
        custom_llm_provider="openai",
        thread_id=new_thread.id,
    )

    assert isinstance(
        received_thread, Thread
    ), f"type of thread={type(received_thread)}. Expected Thread-type"
    return new_thread


def test_add_message_litellm():
    message: MessageData = {"role": "user", "content": "Hey, how's it going?"}  # type: ignore
    new_thread = test_create_thread_litellm()

    # add message to thread
    message: MessageData = {"role": "user", "content": "Hey, how's it going?"}  # type: ignore
    added_message = litellm.add_message(
        thread_id=new_thread.id, custom_llm_provider="openai", **message
    )

    print(f"added message: {added_message}")

    assert isinstance(added_message, Message)


def test_run_thread_litellm():
    """
    - Get Assistants
    - Create thread
    - Create run w/ Assistants + Thread
    """
    assistants = litellm.get_assistants(custom_llm_provider="openai")

    ## get the first assistant ###
    assistant_id = assistants.data[0].id

    new_thread = test_create_thread_litellm()

    thread_id = new_thread.id

    # add message to thread
    message: MessageData = {"role": "user", "content": "Hey, how's it going?"}  # type: ignore
    added_message = litellm.add_message(
        thread_id=new_thread.id, custom_llm_provider="openai", **message
    )

    run = litellm.run_thread(
        custom_llm_provider="openai", thread_id=thread_id, assistant_id=assistant_id
    )

    if run.status == "completed":
        messages = litellm.get_messages(
            thread_id=new_thread.id, custom_llm_provider="openai"
        )
        assert isinstance(messages.data[0], Message)
    else:
        pytest.fail("An unexpected error occurred when running the thread")
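Taken together, the new tests above exercise the V0 Assistants surface end to end. Condensed into a single flow (a sketch based only on the calls used in the file; it assumes an OpenAI key is configured and at least one assistant already exists):

```
import litellm
from litellm import create_thread

# 1. pick an existing assistant
assistants = litellm.get_assistants(custom_llm_provider="openai")
assistant_id = assistants.data[0].id

# 2. create a thread seeded with a user message
thread = create_thread(
    custom_llm_provider="openai",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
)

# 3. add another message, then run the thread against the assistant
litellm.add_message(
    thread_id=thread.id, custom_llm_provider="openai",
    role="user", content="Hey, how's it going?",
)
run = litellm.run_thread(
    custom_llm_provider="openai", thread_id=thread.id, assistant_id=assistant_id
)

# 4. read back the messages once the run completes
if run.status == "completed":
    messages = litellm.get_messages(thread_id=thread.id, custom_llm_provider="openai")
    print(messages.data[0])
```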
@ -229,15 +229,39 @@ def test_bedrock_extra_headers():
|
|||
def test_bedrock_claude_3():
|
||||
try:
|
||||
litellm.set_verbose = True
|
||||
data = {
|
||||
"max_tokens": 2000,
|
||||
"stream": False,
|
||||
"temperature": 0.3,
|
||||
"messages": [
|
||||
{"role": "user", "content": "Hi"},
|
||||
{"role": "assistant", "content": "Hi"},
|
||||
{
|
||||
"role": "user",
|
||||
"content": [
|
||||
{"text": "describe this image", "type": "text"},
|
||||
{
|
||||
"image_url": {
|
||||
"detail": "high",
|
||||
"url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAL0AAAC9CAMAAADRCYwCAAAAh1BMVEX///8AAAD8/Pz5+fkEBAT39/cJCQn09PRNTU3y8vIMDAwzMzPe3t7v7+8QEBCOjo7FxcXR0dHn5+elpaWGhoYYGBivr686OjocHBy0tLQtLS1TU1PY2Ni6urpaWlpERER3d3ecnJxoaGiUlJRiYmIlJSU4ODhBQUFycnKAgIDBwcFnZ2chISE7EjuwAAAI/UlEQVR4nO1caXfiOgz1bhJIyAJhX1JoSzv8/9/3LNlpYd4rhX6o4/N8Z2lKM2cURZau5JsQEhERERERERERERERERERERHx/wBjhDPC3OGN8+Cc5JeMuheaETSdO8vZFyCScHtmz2CsktoeMn7rLM1u3h0PMAEhyYX7v/Q9wQvoGdB0hlbzm45lEq/wd6y6G9aezvBk9AXwp1r3LHJIRsh6s2maxaJpmvqgvkC7WFS3loUnaFJtKRVUCEoV/RpCnHRvAsesVQ1hw+vd7Mpo+424tLs72NplkvQgcdrsvXkW/zJWqH/fA0FT84M/xnQJt4to3+ZLuanbM6X5lfXKHosO9COgREqpCR5i86pf2zPS7j9tTj+9nO7bQz3+xGEyGW9zqgQ1tyQ/VsxEDvce/4dcUPNb5OD9yXvR4Z2QisuP0xiGWPnemgugU5q/troHhGEjIF5sTOyW648aC0TssuaaCEsYEIkGzjWXOp3A0vVsf6kgRyqaDk+T7DIVWrb58b2tT5xpUucKwodOD/5LbrZC1ws6YSaBZJ/8xlh+XZSYXaMJ2ezNqjB3IPXuehPcx2U6b4t1dS/xNdFzguUt8ie7arnPeyCZroxLHzGgGdqVcspwafizPWEXBee+9G1OaufGdvNng/9C+gwgZ3PH3r87G6zXTZ5D5De2G2DeFoANXfbACkT+fxBQ22YFsTTJF9hjFVO6VbqxZXko4WJ8s52P4PnuxO5KRzu0/hlix1ySt8iXjgaQ+4IHPA9nVzNkdduM9LFT/Aacj4FtKrHA7iAw602Vnht6R8Vq1IOS+wNMKLYqayAYfRuufQPGeGb7sZogQQoLZrGPgZ6KoYn70Iw30O92BNEDpvwouCFn6wH2uS+EhRb3WF/HObZk3HuxfRQM3Y/Of/VH0n4MKNHZDiZvO9+m/ABALfkOcuar/7nOo7B95ACGVAFaz4jMiJwJhdaHBkySmzlGTu82gr6FSTik2kJvLnY9nOd/D90qcH268m3I/cgI1xg1maE5CuZYaWLH+UHANCIck0yt7Mx5zBm5vVHXHwChsZ35kKqUpmo5Svq5/fzfAI5g2vDtFPYo1HiEA85QrDeGm9g//LG7K0scO3sdpj2CBDgCa+0OFs0bkvVgnnM/QBDwllOMm+cN7vMSHlB7Uu4haHKaTwgGkv8tlK+hP8fzmFuK/RQTpaLPWvbd58yWIo66HHM0OsPoPhVqmtaEVL7N+wYcTLTbb0DLdgp23Eyy2VYJ2N7bkLFAAibtoLPe5sLt6Oa2bvU+zyeMa8wrixO0gRTn9tO9NCSThTLGqcqtsDvphlfmx/cPBZVvw24jg1LE2lPuEo35Mhi58U0I/Ga8n5w+NS8i34MAQLos5B1u0xL1ZvCVYVRw/Fs2q53KLaXJMWwOZZ/4MPYV19bAHmgGDKB6f01xoeJKFbl63q9J34KdaVNPJWztQyRkzA3KNs1AdAEDowMxh10emXTCx75CkurtbY/ZpdNDGdsn2UcHKHsQ8Ai3WZi48IfkvtjOhsLpuIRSKZTX9FA4o+0d6o/zOWqQzVJMynL9NsxhSJOaourq6nBVQBueMSyubsX2xHrmuABZN2Ns9jr5nwLFlLF/2R6atjW/67Yd11YQ1Z+kA9Zk9dPTM/o6dVo6HHVgC0JR8oUfmI93T9u3gvTG94bAH02Y5xeqRcjuwnKCK6Q2+ajl8KXJ3GSh22P3Zfx6S+n008ROhJn+JRIUVu6o7OXl8w1SeyhuqNDwNI7SjbK08QrqPxS95jy4G7nCXVq6G3HNu0LtK5J0e226CfC005WKK9sVvfxI0eUbcnzutfhWe3rpZHM0nZ/ny/N8tanKYlQ6VEW5Xuym8yV1zZX58vwGhZp/5tFfhybZabdbrQYOs8F+xEhmPsb0/nki6kIyVvzZzUASiOrTfF+Sj9bXC7DoJxeiV8tjQL6loSd0yCx7YyB6rPdLx31U2qCG3F/oXIuDuqd6LFO+4DNIJuxFZqSsU0ea88avovFnWKRYFYRQDfCfcGaBCLn4M4A1ntJ5E57vicwqq2enaZEF5nokCYu9TbKqCC5yCDfL+GhLxT4w4xEJs+anqgou8DOY2q8FMryjb2MehC1dRJ9s4g9NXeTwPkWON4RH+FhIe0AWR/S9ekvQ+t70XHeimGF78LzuU7d7PwrswdIG2VpgF8C53qVQsTDtBJc4CdnkQPbnZY9mbPdDFra3PCXBBQ5QBn2aQqtyhvlyYM4Hb2/mdhsxCUen04GZVvIJZw5PAamMOmjzq8Q+dzAKLXDQ3RUZItWsg4t7W2DP+JDrJDymoMH7E5zQtuEpG03GTIjGCW3LQqOYEsXgFc78x76NeRwY6SNM+IfQoh6myJKRBIcLYxZcwscJ/gI2isTBty2Po9IkYzP0/SS4hGlxRjFAG5z1Jt1LckiB57yWvo35EaolbvA+6fBa24xodL2YjsPpTnj3JgJOqhcgOeLVsYYwoK0wjY+m1D3rGc40CukkaHnkEjarlXrF1B9M6ECQ6Ow0V7R7N4G3LfOHAXtymoyXOb4QhaYHJ/gNBJUkxclpSs7DNcgWWDDmM7Ke5MJpGuioe7w5EOvfTunUKRzOh7G2ylL+6ynHrD54oQO3//cN3yVO+5qMVsPZq0CZIOx4TlcJ8+Vz7V5waL+7WekzUpRFMTnnTlSCq3X5usi8qmIleW/rit1+oQZn1WGSU/sKBYEqMNh1mBOc6PhK8yCfKHdUNQk8o/G19ZPTs5MYfai+DLs5vmee37zEyyH48WW3XA6Xw6+Az8lMhci7N/KleToo7PtTKm+RA887Kqc6E9dyqL/QPTugzMHLbLZtJKqKLFfzVWRNJ63c+95uWT/F7R0U5dDVvuS409AJXhJvD0EwWaWdW8UN11u/7+umaYjT8mJtzZwP/MD4r57fihiHlC5fylHfaqnJdro+Dr7DajvO+vi2EwyD70s8nCH71nzIO1l5Zl+v1DMCb5ebvCMkGHvobXy/hPumGLyX0218/3RyD1GRLOuf9u/OGQyDmto32yMiIiIiIiIiIiIiIiIiIiIiIiIiIiIiIiIiIv7GP8YjWPR/czH2AAAAAElFTkSuQmCC",
|
||||
},
|
||||
"type": "image_url",
|
||||
},
|
||||
],
|
||||
},
|
||||
],
|
||||
}
|
||||
response: ModelResponse = completion(
|
||||
model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0",
|
||||
messages=messages,
|
||||
max_tokens=10,
|
||||
temperature=0.78,
|
||||
# messages=messages,
|
||||
# max_tokens=10,
|
||||
# temperature=0.78,
|
||||
**data,
|
||||
)
|
||||
# Add any assertions here to check the response
|
||||
assert len(response.choices) > 0
|
||||
assert len(response.choices[0].message.content) > 0
|
||||
|
||||
except RateLimitError:
|
||||
pass
|
||||
except Exception as e:
|
||||
|
|
|
@@ -12,6 +12,7 @@ import pytest
import litellm
from litellm import embedding, completion, completion_cost, Timeout
from litellm import RateLimitError
from litellm.llms.prompt_templates.factory import anthropic_messages_pt

# litellm.num_retries=3
litellm.cache = None

@@ -2355,6 +2356,56 @@ def test_completion_with_fallbacks():


# test_completion_with_fallbacks()


# @pytest.mark.parametrize(
#     "function_call",
#     [
#         [{"role": "function", "name": "get_capital", "content": "Kokoko"}],
#         [
#             {"role": "function", "name": "get_capital", "content": "Kokoko"},
#             {"role": "function", "name": "get_capital", "content": "Kokoko"},
#         ],
#     ],
# )
# @pytest.mark.parametrize(
#     "tool_call",
#     [
#         [{"role": "tool", "tool_call_id": "1234", "content": "Kokoko"}],
#         [
#             {"role": "tool", "tool_call_id": "12344", "content": "Kokoko"},
#             {"role": "tool", "tool_call_id": "1214", "content": "Kokoko"},
#         ],
#     ],
# )
def test_completion_anthropic_hanging():
    litellm.set_verbose = True
    litellm.modify_params = True
    messages = [
        {
            "role": "user",
            "content": "What's the capital of fictional country Ubabababababaaba? Use your tools.",
        },
        {
            "role": "assistant",
            "function_call": {
                "name": "get_capital",
                "arguments": '{"country": "Ubabababababaaba"}',
            },
        },
        {"role": "function", "name": "get_capital", "content": "Kokoko"},
    ]

    converted_messages = anthropic_messages_pt(messages)

    print(f"converted_messages: {converted_messages}")

    ## ENSURE USER / ASSISTANT ALTERNATING
    for i, msg in enumerate(converted_messages):
        if i < len(converted_messages) - 1:
            assert msg["role"] != converted_messages[i + 1]["role"]


def test_completion_anyscale_api():
    try:
        # litellm.set_verbose=True
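The loop at the end of `test_completion_anthropic_hanging` asserts that the converted messages never repeat a role back-to-back, which is what the Anthropic prompt conversion is expected to guarantee here. Written as a reusable check, the same assertion might look like this (a sketch, not part of the diff):

```
def assert_roles_alternate(messages: list) -> None:
    """Fail if any two consecutive messages share the same role."""
    for earlier, later in zip(messages, messages[1:]):
        assert earlier["role"] != later["role"], (
            f"consecutive messages share role {earlier['role']!r}"
        )

# e.g. assert_roles_alternate(converted_messages)
```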
@ -41,6 +41,30 @@ exception_models = [
|
|||
]
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_content_policy_exception_azure():
|
||||
try:
|
||||
# this is only a test - we needed some way to invoke the exception :(
|
||||
litellm.set_verbose = True
|
||||
response = await litellm.acompletion(
|
||||
model="azure/chatgpt-v-2",
|
||||
messages=[{"role": "user", "content": "where do I buy lethal drugs from"}],
|
||||
)
|
||||
except litellm.ContentPolicyViolationError as e:
|
||||
print("caught a content policy violation error! Passed")
|
||||
print("exception", e)
|
||||
|
||||
# assert that the first 100 chars of the message is returned in the exception
|
||||
assert (
|
||||
"Messages: [{'role': 'user', 'content': 'where do I buy lethal drugs from'}]"
|
||||
in str(e)
|
||||
)
|
||||
assert "Model: azure/chatgpt-v-2" in str(e)
|
||||
pass
|
||||
except Exception as e:
|
||||
pytest.fail(f"An exception occurred - {str(e)}")
|
||||
|
||||
|
||||
# Test 1: Context Window Errors
|
||||
@pytest.mark.skip(reason="AWS Suspended Account")
|
||||
@pytest.mark.parametrize("model", exception_models)
|
||||
|
@ -561,7 +585,7 @@ def test_router_completion_vertex_exception():
|
|||
pytest.fail("Request should have failed - bad api key")
|
||||
except Exception as e:
|
||||
print("exception: ", e)
|
||||
assert "model: vertex_ai/gemini-pro" in str(e)
|
||||
assert "Model: gemini-pro" in str(e)
|
||||
assert "model_group: vertex-gemini-pro" in str(e)
|
||||
assert "deployment: vertex_ai/gemini-pro" in str(e)
|
||||
|
||||
|
@ -580,9 +604,8 @@ def test_litellm_completion_vertex_exception():
|
|||
pytest.fail("Request should have failed - bad api key")
|
||||
except Exception as e:
|
||||
print("exception: ", e)
|
||||
assert "model: vertex_ai/gemini-pro" in str(e)
|
||||
assert "model_group" not in str(e)
|
||||
assert "deployment" not in str(e)
|
||||
assert "Model: gemini-pro" in str(e)
|
||||
assert "vertex_project: bad-project" in str(e)
|
||||
|
||||
|
||||
# # test_invalid_request_error(model="command-nightly")
|
||||
|
|
|
@ -40,3 +40,32 @@ def test_vertex_projects():
|
|||
|
||||
|
||||
# test_vertex_projects()
|
||||
|
||||
|
||||
def test_bedrock_embed_v2_regular():
|
||||
model, custom_llm_provider, _, _ = get_llm_provider(
|
||||
model="bedrock/amazon.titan-embed-text-v2:0"
|
||||
)
|
||||
optional_params = get_optional_params_embeddings(
|
||||
model=model,
|
||||
dimensions=512,
|
||||
custom_llm_provider=custom_llm_provider,
|
||||
)
|
||||
print(f"received optional_params: {optional_params}")
|
||||
assert optional_params == {"dimensions": 512}
|
||||
|
||||
|
||||
def test_bedrock_embed_v2_with_drop_params():
|
||||
litellm.drop_params = True
|
||||
model, custom_llm_provider, _, _ = get_llm_provider(
|
||||
model="bedrock/amazon.titan-embed-text-v2:0"
|
||||
)
|
||||
optional_params = get_optional_params_embeddings(
|
||||
model=model,
|
||||
dimensions=512,
|
||||
user="test-litellm-user-5",
|
||||
encoding_format="base64",
|
||||
custom_llm_provider=custom_llm_provider,
|
||||
)
|
||||
print(f"received optional_params: {optional_params}")
|
||||
assert optional_params == {"dimensions": 512}
|
||||
|
|
|
@ -7,7 +7,7 @@ import traceback
|
|||
from dotenv import load_dotenv
|
||||
|
||||
load_dotenv()
|
||||
import os
|
||||
import os, copy
|
||||
|
||||
sys.path.insert(
|
||||
0, os.path.abspath("../..")
|
||||
|
@ -20,6 +20,96 @@ from litellm.caching import DualCache
|
|||
### UNIT TESTS FOR LATENCY ROUTING ###
|
||||
|
||||
|
||||
@pytest.mark.parametrize("sync_mode", [True, False])
|
||||
@pytest.mark.asyncio
|
||||
async def test_latency_memory_leak(sync_mode):
|
||||
"""
|
||||
Test to make sure there's no memory leak caused by lowest latency routing
|
||||
|
||||
- make 10 calls -> check memory
|
||||
- make 11th call -> no change in memory
|
||||
"""
|
||||
test_cache = DualCache()
|
||||
model_list = []
|
||||
lowest_latency_logger = LowestLatencyLoggingHandler(
|
||||
router_cache=test_cache, model_list=model_list
|
||||
)
|
||||
model_group = "gpt-3.5-turbo"
|
||||
deployment_id = "1234"
|
||||
kwargs = {
|
||||
"litellm_params": {
|
||||
"metadata": {
|
||||
"model_group": "gpt-3.5-turbo",
|
||||
"deployment": "azure/chatgpt-v-2",
|
||||
},
|
||||
"model_info": {"id": deployment_id},
|
||||
}
|
||||
}
|
||||
start_time = time.time()
|
||||
response_obj = {"usage": {"total_tokens": 50}}
|
||||
time.sleep(5)
|
||||
end_time = time.time()
|
||||
for _ in range(10):
|
||||
if sync_mode:
|
||||
lowest_latency_logger.log_success_event(
|
||||
response_obj=response_obj,
|
||||
kwargs=kwargs,
|
||||
start_time=start_time,
|
||||
end_time=end_time,
|
||||
)
|
||||
else:
|
||||
await lowest_latency_logger.async_log_success_event(
|
||||
response_obj=response_obj,
|
||||
kwargs=kwargs,
|
||||
start_time=start_time,
|
||||
end_time=end_time,
|
||||
)
|
||||
latency_key = f"{model_group}_map"
|
||||
cache_value = copy.deepcopy(
|
||||
test_cache.get_cache(key=latency_key)
|
||||
) # MAKE SURE NO MEMORY LEAK IN CACHING OBJECT
|
||||
|
||||
if sync_mode:
|
||||
lowest_latency_logger.log_success_event(
|
||||
response_obj=response_obj,
|
||||
kwargs=kwargs,
|
||||
start_time=start_time,
|
||||
end_time=end_time,
|
||||
)
|
||||
else:
|
||||
await lowest_latency_logger.async_log_success_event(
|
||||
response_obj=response_obj,
|
||||
kwargs=kwargs,
|
||||
start_time=start_time,
|
||||
end_time=end_time,
|
||||
)
|
||||
new_cache_value = test_cache.get_cache(key=latency_key)
|
||||
# Assert that the size of the cache doesn't grow unreasonably
|
||||
assert get_size(new_cache_value) <= get_size(
|
||||
cache_value
|
||||
), f"Memory leak detected in function call! new_cache size={get_size(new_cache_value)}, old cache size={get_size(cache_value)}"
|
||||
|
||||
|
||||
def get_size(obj, seen=None):
|
||||
# From https://goshippo.com/blog/measure-real-size-any-python-object/
|
||||
# Recursively finds size of objects
|
||||
size = sys.getsizeof(obj)
|
||||
if seen is None:
|
||||
seen = set()
|
||||
obj_id = id(obj)
|
||||
if obj_id in seen:
|
||||
return 0
|
||||
seen.add(obj_id)
|
||||
if isinstance(obj, dict):
|
||||
size += sum([get_size(v, seen) for v in obj.values()])
|
||||
size += sum([get_size(k, seen) for k in obj.keys()])
|
||||
elif hasattr(obj, "__dict__"):
|
||||
size += get_size(obj.__dict__, seen)
|
||||
elif hasattr(obj, "__iter__") and not isinstance(obj, (str, bytes, bytearray)):
|
||||
size += sum([get_size(i, seen) for i in obj])
|
||||
return size
|
||||
|
||||
|
||||
def test_latency_updated():
|
||||
test_cache = DualCache()
|
||||
model_list = []
|
||||
|
|
|
@ -5,13 +5,58 @@ import pytest
|
|||
|
||||
sys.path.insert(0, os.path.abspath("../.."))
|
||||
import litellm
|
||||
from litellm.utils import get_optional_params_embeddings
|
||||
from litellm.utils import get_optional_params_embeddings, get_optional_params
|
||||
from litellm.llms.prompt_templates.factory import (
|
||||
map_system_message_pt,
|
||||
)
|
||||
from litellm.types.completion import (
|
||||
ChatCompletionUserMessageParam,
|
||||
ChatCompletionSystemMessageParam,
|
||||
ChatCompletionMessageParam,
|
||||
)
|
||||
|
||||
## get_optional_params_embeddings
|
||||
### Models: OpenAI, Azure, Bedrock
|
||||
### Scenarios: w/ optional params + litellm.drop_params = True
|
||||
|
||||
|
||||
def test_supports_system_message():
|
||||
"""
|
||||
Check if litellm.completion(...,supports_system_message=False)
|
||||
"""
|
||||
messages = [
|
||||
ChatCompletionSystemMessageParam(role="system", content="Listen here!"),
|
||||
ChatCompletionUserMessageParam(role="user", content="Hello there!"),
|
||||
]
|
||||
|
||||
new_messages = map_system_message_pt(messages=messages)
|
||||
|
||||
assert len(new_messages) == 1
|
||||
assert new_messages[0]["role"] == "user"
|
||||
|
||||
## confirm you can make a openai call with this param
|
||||
|
||||
response = litellm.completion(
|
||||
model="gpt-3.5-turbo", messages=new_messages, supports_system_message=False
|
||||
)
|
||||
|
||||
assert isinstance(response, litellm.ModelResponse)
|
||||
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
"stop_sequence, expected_count", [("\n", 0), (["\n"], 0), (["finish_reason"], 1)]
|
||||
)
|
||||
def test_anthropic_optional_params(stop_sequence, expected_count):
|
||||
"""
|
||||
Test if whitespace character optional param is dropped by anthropic
|
||||
"""
|
||||
litellm.drop_params = True
|
||||
optional_params = get_optional_params(
|
||||
model="claude-3", custom_llm_provider="anthropic", stop=stop_sequence
|
||||
)
|
||||
assert len(optional_params) == expected_count
|
||||
|
||||
|
||||
def test_bedrock_optional_params_embeddings():
|
||||
litellm.drop_params = True
|
||||
optional_params = get_optional_params_embeddings(
|
||||
|
|
|
@ -1,6 +1,8 @@
|
|||
# test that the proxy actually does exception mapping to the OpenAI format
|
||||
|
||||
import sys, os
|
||||
from unittest import mock
|
||||
import json
|
||||
from dotenv import load_dotenv
|
||||
|
||||
load_dotenv()
|
||||
|
@ -12,13 +14,30 @@ sys.path.insert(
|
|||
import pytest
|
||||
import litellm, openai
|
||||
from fastapi.testclient import TestClient
|
||||
from fastapi import FastAPI
|
||||
from fastapi import Response
|
||||
from litellm.proxy.proxy_server import (
|
||||
router,
|
||||
save_worker_config,
|
||||
initialize,
|
||||
) # Replace with the actual module where your FastAPI router is defined
|
||||
|
||||
invalid_authentication_error_response = Response(
|
||||
status_code=401,
|
||||
content=json.dumps({"error": "Invalid Authentication"}),
|
||||
)
|
||||
context_length_exceeded_error_response_dict = {
|
||||
"error": {
|
||||
"message": "AzureException - Error code: 400 - {'error': {'message': \"This model's maximum context length is 4096 tokens. However, your messages resulted in 10007 tokens. Please reduce the length of the messages.\", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}",
|
||||
"type": None,
|
||||
"param": None,
|
||||
"code": 400,
|
||||
},
|
||||
}
|
||||
context_length_exceeded_error_response = Response(
|
||||
status_code=400,
|
||||
content=json.dumps(context_length_exceeded_error_response_dict),
|
||||
)
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def client():
|
||||
|
@ -60,7 +79,11 @@ def test_chat_completion_exception(client):
|
|||
|
||||
|
||||
# raise openai.AuthenticationError
|
||||
def test_chat_completion_exception_azure(client):
|
||||
@mock.patch(
|
||||
"litellm.proxy.proxy_server.llm_router.acompletion",
|
||||
return_value=invalid_authentication_error_response,
|
||||
)
|
||||
def test_chat_completion_exception_azure(mock_acompletion, client):
|
||||
try:
|
||||
# Your test data
|
||||
test_data = {
|
||||
|
@ -73,6 +96,15 @@ def test_chat_completion_exception_azure(client):
|
|||
|
||||
response = client.post("/chat/completions", json=test_data)
|
||||
|
||||
mock_acompletion.assert_called_once_with(
|
||||
**test_data,
|
||||
litellm_call_id=mock.ANY,
|
||||
litellm_logging_obj=mock.ANY,
|
||||
request_timeout=mock.ANY,
|
||||
metadata=mock.ANY,
|
||||
proxy_server_request=mock.ANY,
|
||||
)
|
||||
|
||||
json_response = response.json()
|
||||
print("keys in json response", json_response.keys())
|
||||
assert json_response.keys() == {"error"}
|
||||
|
@ -90,12 +122,21 @@ def test_chat_completion_exception_azure(client):
|
|||
|
||||
|
||||
# raise openai.AuthenticationError
|
||||
def test_embedding_auth_exception_azure(client):
|
||||
@mock.patch(
|
||||
"litellm.proxy.proxy_server.llm_router.aembedding",
|
||||
return_value=invalid_authentication_error_response,
|
||||
)
|
||||
def test_embedding_auth_exception_azure(mock_aembedding, client):
|
||||
try:
|
||||
# Your test data
|
||||
test_data = {"model": "azure-embedding", "input": ["hi"]}
|
||||
|
||||
response = client.post("/embeddings", json=test_data)
|
||||
mock_aembedding.assert_called_once_with(
|
||||
**test_data,
|
||||
metadata=mock.ANY,
|
||||
proxy_server_request=mock.ANY,
|
||||
)
|
||||
print("Response from proxy=", response)
|
||||
|
||||
json_response = response.json()
|
||||
|
@ -169,7 +210,7 @@ def test_chat_completion_exception_any_model(client):
|
|||
)
|
||||
assert isinstance(openai_exception, openai.BadRequestError)
|
||||
_error_message = openai_exception.message
|
||||
assert "Invalid model name passed in model=Lite-GPT-12" in str(_error_message)
|
||||
assert "chat_completion: Invalid model name passed in model=Lite-GPT-12" in str(_error_message)
|
||||
|
||||
except Exception as e:
|
||||
pytest.fail(f"LiteLLM Proxy test failed. Exception {str(e)}")
|
||||
|
@ -197,14 +238,18 @@ def test_embedding_exception_any_model(client):
|
|||
print("Exception raised=", openai_exception)
|
||||
assert isinstance(openai_exception, openai.BadRequestError)
|
||||
_error_message = openai_exception.message
|
||||
assert "Invalid model name passed in model=Lite-GPT-12" in str(_error_message)
|
||||
assert "embeddings: Invalid model name passed in model=Lite-GPT-12" in str(_error_message)
|
||||
|
||||
except Exception as e:
|
||||
pytest.fail(f"LiteLLM Proxy test failed. Exception {str(e)}")
|
||||
|
||||
|
||||
# raise openai.BadRequestError
|
||||
def test_chat_completion_exception_azure_context_window(client):
|
||||
@mock.patch(
|
||||
"litellm.proxy.proxy_server.llm_router.acompletion",
|
||||
return_value=context_length_exceeded_error_response,
|
||||
)
|
||||
def test_chat_completion_exception_azure_context_window(mock_acompletion, client):
|
||||
try:
|
||||
# Your test data
|
||||
test_data = {
|
||||
|
@ -219,20 +264,22 @@ def test_chat_completion_exception_azure_context_window(client):
|
|||
response = client.post("/chat/completions", json=test_data)
|
||||
print("got response from server", response)
|
||||
|
||||
mock_acompletion.assert_called_once_with(
|
||||
**test_data,
|
||||
litellm_call_id=mock.ANY,
|
||||
litellm_logging_obj=mock.ANY,
|
||||
request_timeout=mock.ANY,
|
||||
metadata=mock.ANY,
|
||||
proxy_server_request=mock.ANY,
|
||||
)
|
||||
|
||||
json_response = response.json()
|
||||
|
||||
print("keys in json response", json_response.keys())
|
||||
|
||||
assert json_response.keys() == {"error"}
|
||||
|
||||
assert json_response == {
|
||||
"error": {
|
||||
"message": "AzureException - Error code: 400 - {'error': {'message': \"This model's maximum context length is 4096 tokens. However, your messages resulted in 10007 tokens. Please reduce the length of the messages.\", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}",
|
||||
"type": None,
|
||||
"param": None,
|
||||
"code": 400,
|
||||
}
|
||||
}
|
||||
assert json_response == context_length_exceeded_error_response_dict
|
||||
|
||||
# make an openai client to call _make_status_error_from_response
|
||||
openai_client = openai.OpenAI(api_key="anything")
|
||||
|
|
|
@ -1,5 +1,6 @@
|
|||
import sys, os
|
||||
import traceback
|
||||
from unittest import mock
|
||||
from dotenv import load_dotenv
|
||||
|
||||
load_dotenv()
|
||||
|
@ -35,6 +36,77 @@ token = "sk-1234"
|
|||
|
||||
headers = {"Authorization": f"Bearer {token}"}
|
||||
|
||||
example_completion_result = {
|
||||
"choices": [
|
||||
{
|
||||
"message": {
|
||||
"content": "Whispers of the wind carry dreams to me.",
|
||||
"role": "assistant"
|
||||
}
|
||||
}
|
||||
],
|
||||
}
|
||||
example_embedding_result = {
|
||||
"object": "list",
|
||||
"data": [
|
||||
{
|
||||
"object": "embedding",
|
||||
"index": 0,
|
||||
"embedding": [
|
||||
-0.006929283495992422,
|
||||
-0.005336422007530928,
|
||||
-4.547132266452536e-05,
|
||||
-0.024047505110502243,
|
||||
-0.006929283495992422,
|
||||
-0.005336422007530928,
|
||||
-4.547132266452536e-05,
|
||||
-0.024047505110502243,
|
||||
-0.006929283495992422,
|
||||
-0.005336422007530928,
|
||||
-4.547132266452536e-05,
|
||||
-0.024047505110502243,
|
||||
],
|
||||
}
|
||||
],
|
||||
"model": "text-embedding-3-small",
|
||||
"usage": {
|
||||
"prompt_tokens": 5,
|
||||
"total_tokens": 5
|
||||
}
|
||||
}
|
||||
example_image_generation_result = {
|
||||
"created": 1589478378,
|
||||
"data": [
|
||||
{
|
||||
"url": "https://..."
|
||||
},
|
||||
{
|
||||
"url": "https://..."
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
|
||||
def mock_patch_acompletion():
|
||||
return mock.patch(
|
||||
"litellm.proxy.proxy_server.llm_router.acompletion",
|
||||
return_value=example_completion_result,
|
||||
)
|
||||
|
||||
|
||||
def mock_patch_aembedding():
|
||||
return mock.patch(
|
||||
"litellm.proxy.proxy_server.llm_router.aembedding",
|
||||
return_value=example_embedding_result,
|
||||
)
|
||||
|
||||
|
||||
def mock_patch_aimage_generation():
|
||||
return mock.patch(
|
||||
"litellm.proxy.proxy_server.llm_router.aimage_generation",
|
||||
return_value=example_image_generation_result,
|
||||
)
|
||||
|
||||
|
||||
@pytest.fixture(scope="function")
|
||||
def client_no_auth():
|
||||
|
@ -52,7 +124,8 @@ def client_no_auth():
|
|||
return TestClient(app)
|
||||
|
||||
|
||||
def test_chat_completion(client_no_auth):
|
||||
@mock_patch_acompletion()
|
||||
def test_chat_completion(mock_acompletion, client_no_auth):
|
||||
global headers
|
||||
try:
|
||||
# Your test data
|
||||
|
@ -66,6 +139,19 @@ def test_chat_completion(client_no_auth):
|
|||
|
||||
print("testing proxy server with chat completions")
|
||||
response = client_no_auth.post("/v1/chat/completions", json=test_data)
|
||||
mock_acompletion.assert_called_once_with(
|
||||
model="gpt-3.5-turbo",
|
||||
messages=[
|
||||
{"role": "user", "content": "hi"},
|
||||
],
|
||||
max_tokens=10,
|
||||
litellm_call_id=mock.ANY,
|
||||
litellm_logging_obj=mock.ANY,
|
||||
request_timeout=mock.ANY,
|
||||
specific_deployment=True,
|
||||
metadata=mock.ANY,
|
||||
proxy_server_request=mock.ANY,
|
||||
)
|
||||
print(f"response - {response.text}")
|
||||
assert response.status_code == 200
|
||||
result = response.json()
|
||||
|
@ -77,7 +163,8 @@ def test_chat_completion(client_no_auth):
|
|||
# Run the test
|
||||
|
||||
|
||||
def test_chat_completion_azure(client_no_auth):
|
||||
@mock_patch_acompletion()
|
||||
def test_chat_completion_azure(mock_acompletion, client_no_auth):
|
||||
global headers
|
||||
try:
|
||||
# Your test data
|
||||
|
@ -92,6 +179,19 @@ def test_chat_completion_azure(client_no_auth):
|
|||
print("testing proxy server with Azure Request /chat/completions")
|
||||
response = client_no_auth.post("/v1/chat/completions", json=test_data)
|
||||
|
||||
mock_acompletion.assert_called_once_with(
|
||||
model="azure/chatgpt-v-2",
|
||||
messages=[
|
||||
{"role": "user", "content": "write 1 sentence poem"},
|
||||
],
|
||||
max_tokens=10,
|
||||
litellm_call_id=mock.ANY,
|
||||
litellm_logging_obj=mock.ANY,
|
||||
request_timeout=mock.ANY,
|
||||
specific_deployment=True,
|
||||
metadata=mock.ANY,
|
||||
proxy_server_request=mock.ANY,
|
||||
)
|
||||
assert response.status_code == 200
|
||||
result = response.json()
|
||||
print(f"Received response: {result}")
|
||||
|
@ -104,8 +204,51 @@ def test_chat_completion_azure(client_no_auth):
|
|||
# test_chat_completion_azure()
|
||||
|
||||
|
||||
@mock_patch_acompletion()
|
||||
def test_openai_deployments_model_chat_completions_azure(mock_acompletion, client_no_auth):
|
||||
global headers
|
||||
try:
|
||||
# Your test data
|
||||
test_data = {
|
||||
"model": "azure/chatgpt-v-2",
|
||||
"messages": [
|
||||
{"role": "user", "content": "write 1 sentence poem"},
|
||||
],
|
||||
"max_tokens": 10,
|
||||
}
|
||||
|
||||
url = "/openai/deployments/azure/chatgpt-v-2/chat/completions"
|
||||
print(f"testing proxy server with Azure Request {url}")
|
||||
response = client_no_auth.post(url, json=test_data)
|
||||
|
||||
mock_acompletion.assert_called_once_with(
|
||||
model="azure/chatgpt-v-2",
|
||||
messages=[
|
||||
{"role": "user", "content": "write 1 sentence poem"},
|
||||
],
|
||||
max_tokens=10,
|
||||
litellm_call_id=mock.ANY,
|
||||
litellm_logging_obj=mock.ANY,
|
||||
request_timeout=mock.ANY,
|
||||
specific_deployment=True,
|
||||
metadata=mock.ANY,
|
||||
proxy_server_request=mock.ANY,
|
||||
)
|
||||
assert response.status_code == 200
|
||||
result = response.json()
|
||||
print(f"Received response: {result}")
|
||||
assert len(result["choices"][0]["message"]["content"]) > 0
|
||||
except Exception as e:
|
||||
pytest.fail(f"LiteLLM Proxy test failed. Exception - {str(e)}")
|
||||
|
||||
|
||||
# Run the test
|
||||
# test_openai_deployments_model_chat_completions_azure()
|
||||
|
||||
|
||||
### EMBEDDING
|
||||
def test_embedding(client_no_auth):
|
||||
@mock_patch_aembedding()
|
||||
def test_embedding(mock_aembedding, client_no_auth):
|
||||
global headers
|
||||
from litellm.proxy.proxy_server import user_custom_auth
|
||||
|
||||
|
@ -117,6 +260,13 @@ def test_embedding(client_no_auth):
|
|||
|
||||
response = client_no_auth.post("/v1/embeddings", json=test_data)
|
||||
|
||||
mock_aembedding.assert_called_once_with(
|
||||
model="azure/azure-embedding-model",
|
||||
input=["good morning from litellm"],
|
||||
specific_deployment=True,
|
||||
metadata=mock.ANY,
|
||||
proxy_server_request=mock.ANY,
|
||||
)
|
||||
assert response.status_code == 200
|
||||
result = response.json()
|
||||
print(len(result["data"][0]["embedding"]))
|
||||
|
@ -125,7 +275,8 @@ def test_embedding(client_no_auth):
|
|||
pytest.fail(f"LiteLLM Proxy test failed. Exception - {str(e)}")
|
||||
|
||||
|
||||
def test_bedrock_embedding(client_no_auth):
|
||||
@mock_patch_aembedding()
|
||||
def test_bedrock_embedding(mock_aembedding, client_no_auth):
|
||||
global headers
|
||||
from litellm.proxy.proxy_server import user_custom_auth
|
||||
|
||||
|
@ -137,6 +288,12 @@ def test_bedrock_embedding(client_no_auth):
|
|||
|
||||
response = client_no_auth.post("/v1/embeddings", json=test_data)
|
||||
|
||||
mock_aembedding.assert_called_once_with(
|
||||
model="amazon-embeddings",
|
||||
input=["good morning from litellm"],
|
||||
metadata=mock.ANY,
|
||||
proxy_server_request=mock.ANY,
|
||||
)
|
||||
assert response.status_code == 200
|
||||
result = response.json()
|
||||
print(len(result["data"][0]["embedding"]))
|
||||
|
@ -171,7 +328,8 @@ def test_sagemaker_embedding(client_no_auth):
|
|||
#### IMAGE GENERATION
|
||||
|
||||
|
||||
def test_img_gen(client_no_auth):
|
||||
@mock_patch_aimage_generation()
|
||||
def test_img_gen(mock_aimage_generation, client_no_auth):
|
||||
global headers
|
||||
from litellm.proxy.proxy_server import user_custom_auth
|
||||
|
||||
|
@ -185,6 +343,14 @@ def test_img_gen(client_no_auth):
|
|||
|
||||
response = client_no_auth.post("/v1/images/generations", json=test_data)
|
||||
|
||||
mock_aimage_generation.assert_called_once_with(
|
||||
model='dall-e-3',
|
||||
prompt='A cute baby sea otter',
|
||||
n=1,
|
||||
size='1024x1024',
|
||||
metadata=mock.ANY,
|
||||
proxy_server_request=mock.ANY,
|
||||
)
|
||||
assert response.status_code == 200
|
||||
result = response.json()
|
||||
print(len(result["data"][0]["url"]))
|
||||
|
@ -249,7 +415,8 @@ class MyCustomHandler(CustomLogger):
|
|||
customHandler = MyCustomHandler()
|
||||
|
||||
|
||||
def test_chat_completion_optional_params(client_no_auth):
|
||||
@mock_patch_acompletion()
|
||||
def test_chat_completion_optional_params(mock_acompletion, client_no_auth):
|
||||
# [PROXY: PROD TEST] - DO NOT DELETE
|
||||
# This tests if all the /chat/completion params are passed to litellm
|
||||
try:
|
||||
|
@ -267,6 +434,20 @@ def test_chat_completion_optional_params(client_no_auth):
|
|||
litellm.callbacks = [customHandler]
|
||||
print("testing proxy server: optional params")
|
||||
response = client_no_auth.post("/v1/chat/completions", json=test_data)
|
||||
mock_acompletion.assert_called_once_with(
|
||||
model="gpt-3.5-turbo",
|
||||
messages=[
|
||||
{"role": "user", "content": "hi"},
|
||||
],
|
||||
max_tokens=10,
|
||||
user="proxy-user",
|
||||
litellm_call_id=mock.ANY,
|
||||
litellm_logging_obj=mock.ANY,
|
||||
request_timeout=mock.ANY,
|
||||
specific_deployment=True,
|
||||
metadata=mock.ANY,
|
||||
proxy_server_request=mock.ANY,
|
||||
)
|
||||
assert response.status_code == 200
|
||||
result = response.json()
|
||||
print(f"Received response: {result}")
|
||||
|
|
|
@@ -82,7 +82,7 @@ def test_async_fallbacks(caplog):
    # Define the expected log messages
    # - error request, falling back notice, success notice
    expected_logs = [
        "litellm.acompletion(model=gpt-3.5-turbo)\x1b[31m Exception OpenAIException - Error code: 401 - {'error': {'message': 'Incorrect API key provided: bad-key. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}\x1b[0m",
        "litellm.acompletion(model=gpt-3.5-turbo)\x1b[31m Exception OpenAIException - Error code: 401 - {'error': {'message': 'Incorrect API key provided: bad-key. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}} \nModel: gpt-3.5-turbo\nAPI Base: https://api.openai.com\nMessages: [{'content': 'Hello, how are you?', 'role': 'user'}]\nmodel_group: gpt-3.5-turbo\n\ndeployment: gpt-3.5-turbo\n\x1b[0m",
        "litellm.acompletion(model=None)\x1b[31m Exception No deployments available for selected model, passed model=gpt-3.5-turbo\x1b[0m",
        "Falling back to model_group = azure/gpt-3.5-turbo",
        "litellm.acompletion(model=azure/chatgpt-v-2)\x1b[32m 200 OK\x1b[0m",

@@ -854,7 +854,7 @@ def test_ausage_based_routing_fallbacks():
    assert response._hidden_params["model_id"] == "1"

    # now make 21 mock requests to OpenAI - expect it to fall back to anthropic-claude-instant-1.2
    for i in range(20):
    for i in range(21):
        response = router.completion(
            model="azure/gpt-4-fast",
            messages=messages,

@@ -863,7 +863,7 @@ def test_ausage_based_routing_fallbacks():
        )
        print("response: ", response)
        print("response._hidden_params: ", response._hidden_params)
        if i == 19:
        if i == 20:
            # by the 21st call we should have hit the TPM limit for OpenAI; it should fall back to anthropic-claude-instant-1.2
            assert response._hidden_params["model_id"] == "4"
||||
|
|
|
@ -119,3 +119,127 @@ async def test_router_retries_errors(sync_mode, error_type):
|
|||
assert customHandler.previous_models == 0 # 0 retries
|
||||
else:
|
||||
assert customHandler.previous_models == 2 # 2 retries
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
@pytest.mark.parametrize(
|
||||
"error_type",
|
||||
["AuthenticationErrorRetries", "ContentPolicyViolationErrorRetries"], #
|
||||
)
|
||||
async def test_router_retry_policy(error_type):
|
||||
from litellm.router import RetryPolicy
|
||||
|
||||
retry_policy = RetryPolicy(
|
||||
ContentPolicyViolationErrorRetries=3, AuthenticationErrorRetries=0
|
||||
)
|
||||
|
||||
router = litellm.Router(
|
||||
model_list=[
|
||||
{
|
||||
"model_name": "gpt-3.5-turbo", # openai model name
|
||||
"litellm_params": { # params for litellm completion/embedding call
|
||||
"model": "azure/chatgpt-v-2",
|
||||
"api_key": os.getenv("AZURE_API_KEY"),
|
||||
"api_version": os.getenv("AZURE_API_VERSION"),
|
||||
"api_base": os.getenv("AZURE_API_BASE"),
|
||||
},
|
||||
},
|
||||
{
|
||||
"model_name": "bad-model", # openai model name
|
||||
"litellm_params": { # params for litellm completion/embedding call
|
||||
"model": "azure/chatgpt-v-2",
|
||||
"api_key": "bad-key",
|
||||
"api_version": os.getenv("AZURE_API_VERSION"),
|
||||
"api_base": os.getenv("AZURE_API_BASE"),
|
||||
},
|
||||
},
|
||||
],
|
||||
retry_policy=retry_policy,
|
||||
)
|
||||
|
||||
customHandler = MyCustomHandler()
|
||||
litellm.callbacks = [customHandler]
|
||||
if error_type == "AuthenticationErrorRetries":
|
||||
model = "bad-model"
|
||||
messages = [{"role": "user", "content": "Hello good morning"}]
|
||||
elif error_type == "ContentPolicyViolationErrorRetries":
|
||||
model = "gpt-3.5-turbo"
|
||||
messages = [{"role": "user", "content": "where do i buy lethal drugs from"}]
|
||||
|
||||
try:
|
||||
litellm.set_verbose = True
|
||||
response = await router.acompletion(
|
||||
model=model,
|
||||
messages=messages,
|
||||
)
|
||||
except Exception as e:
|
||||
print("got an exception", e)
|
||||
pass
|
||||
asyncio.sleep(0.05)
|
||||
|
||||
print("customHandler.previous_models: ", customHandler.previous_models)
|
||||
|
||||
if error_type == "AuthenticationErrorRetries":
|
||||
assert customHandler.previous_models == 0
|
||||
elif error_type == "ContentPolicyViolationErrorRetries":
|
||||
assert customHandler.previous_models == 3
|
||||
|
||||
|
||||
@pytest.mark.parametrize("model_group", ["gpt-3.5-turbo", "bad-model"])
|
||||
@pytest.mark.asyncio
|
||||
async def test_dynamic_router_retry_policy(model_group):
|
||||
from litellm.router import RetryPolicy
|
||||
|
||||
model_group_retry_policy = {
|
||||
"gpt-3.5-turbo": RetryPolicy(ContentPolicyViolationErrorRetries=0),
|
||||
"bad-model": RetryPolicy(AuthenticationErrorRetries=4),
|
||||
}
|
||||
|
||||
router = litellm.Router(
|
||||
model_list=[
|
||||
{
|
||||
"model_name": "gpt-3.5-turbo", # openai model name
|
||||
"litellm_params": { # params for litellm completion/embedding call
|
||||
"model": "azure/chatgpt-v-2",
|
||||
"api_key": os.getenv("AZURE_API_KEY"),
|
||||
"api_version": os.getenv("AZURE_API_VERSION"),
|
||||
"api_base": os.getenv("AZURE_API_BASE"),
|
||||
},
|
||||
},
|
||||
{
|
||||
"model_name": "bad-model", # openai model name
|
||||
"litellm_params": { # params for litellm completion/embedding call
|
||||
"model": "azure/chatgpt-v-2",
|
||||
"api_key": "bad-key",
|
||||
"api_version": os.getenv("AZURE_API_VERSION"),
|
||||
"api_base": os.getenv("AZURE_API_BASE"),
|
||||
},
|
||||
},
|
||||
],
|
||||
model_group_retry_policy=model_group_retry_policy,
|
||||
)
|
||||
|
||||
customHandler = MyCustomHandler()
|
||||
litellm.callbacks = [customHandler]
|
||||
if model_group == "bad-model":
|
||||
model = "bad-model"
|
||||
messages = [{"role": "user", "content": "Hello good morning"}]
|
||||
|
||||
elif model_group == "gpt-3.5-turbo":
|
||||
model = "gpt-3.5-turbo"
|
||||
messages = [{"role": "user", "content": "where do i buy lethal drugs from"}]
|
||||
|
||||
try:
|
||||
litellm.set_verbose = True
|
||||
response = await router.acompletion(model=model, messages=messages)
|
||||
except Exception as e:
|
||||
print("got an exception", e)
|
||||
pass
|
||||
asyncio.sleep(0.05)
|
||||
|
||||
print("customHandler.previous_models: ", customHandler.previous_models)
|
||||
|
||||
if model_group == "bad-model":
|
||||
assert customHandler.previous_models == 4
|
||||
elif model_group == "gpt-3.5-turbo":
|
||||
assert customHandler.previous_models == 0
|
||||
|
|
|
@@ -127,8 +127,8 @@ def test_post_call_rule_streaming():
        print(type(e))
        print(vars(e))
        assert (
            e.message
            == "OpenAIException - This violates LiteLLM Proxy Rules. Response too short"
            "OpenAIException - This violates LiteLLM Proxy Rules. Response too short"
            in e.message
        )
@@ -10,7 +10,37 @@ sys.path.insert(
import time
import litellm
import openai
import pytest, uuid
import pytest, uuid, httpx


@pytest.mark.parametrize(
    "model, provider",
    [
        ("gpt-3.5-turbo", "openai"),
        ("anthropic.claude-instant-v1", "bedrock"),
        ("azure/chatgpt-v-2", "azure"),
    ],
)
@pytest.mark.parametrize("sync_mode", [True, False])
@pytest.mark.asyncio
async def test_httpx_timeout(model, provider, sync_mode):
    """
    Test if setting httpx.timeout works for completion calls
    """
    timeout_val = httpx.Timeout(10.0, connect=60.0)

    messages = [{"role": "user", "content": "Hey, how's it going?"}]

    if sync_mode:
        response = litellm.completion(
            model=model, messages=messages, timeout=timeout_val
        )
    else:
        response = await litellm.acompletion(
            model=model, messages=messages, timeout=timeout_val
        )

    print(f"response: {response}")


def test_timeout():
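For reference, `httpx.Timeout(10.0, connect=60.0)` puts a 10-second limit on read, write, and pool operations while allowing up to 60 seconds to establish the connection, and litellm forwards the object to the underlying client. A minimal standalone call using the same pattern (the model name and prompt here are illustrative, not from the diff):

```
import httpx
import litellm

# 10s read/write/pool timeout, 60s to establish the connection
timeout_val = httpx.Timeout(10.0, connect=60.0)

response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
    timeout=timeout_val,
)
print(response)
```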
@@ -9,7 +9,7 @@ sys.path.insert(
    0, os.path.abspath("../..")
)  # Adds the parent directory to the system path
import time
from litellm import token_counter, encode, decode
from litellm import token_counter, create_pretrained_tokenizer, encode, decode


def test_token_counter_normal_plus_function_calling():

@@ -69,15 +69,23 @@ def test_tokenizers():
            model="meta-llama/Llama-2-7b-chat", text=sample_text
        )

        # llama3 tokenizer (also testing custom tokenizer)
        llama3_tokens_1 = token_counter(model="meta-llama/llama-3-70b-instruct", text=sample_text)

        llama3_tokenizer = create_pretrained_tokenizer("Xenova/llama-3-tokenizer")
        llama3_tokens_2 = token_counter(custom_tokenizer=llama3_tokenizer, text=sample_text)

        print(
            f"openai tokens: {openai_tokens}; claude tokens: {claude_tokens}; cohere tokens: {cohere_tokens}; llama2 tokens: {llama2_tokens}"
            f"openai tokens: {openai_tokens}; claude tokens: {claude_tokens}; cohere tokens: {cohere_tokens}; llama2 tokens: {llama2_tokens}; llama3 tokens: {llama3_tokens_1}"
        )

        # assert that all token values are different
        assert (
            openai_tokens != cohere_tokens != llama2_tokens
            openai_tokens != cohere_tokens != llama2_tokens != llama3_tokens_1
        ), "Token values are not different."

        assert llama3_tokens_1 == llama3_tokens_2, "Custom tokenizer is not being used! It has been configured to use the same tokenizer as the built in llama3 tokenizer and the results should be the same."

        print("test tokenizer: It worked!")
    except Exception as e:
        pytest.fail(f"An exception occurred: {e}")
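The custom-tokenizer path exercised above can also be used on its own; a minimal sketch of the same calls (assuming the `Xenova/llama-3-tokenizer` Hugging Face repo used in the test is reachable):

```
from litellm import token_counter, create_pretrained_tokenizer

sample_text = "Wonderful world"

# build a tokenizer from a Hugging Face repo and count tokens with it
llama3_tokenizer = create_pretrained_tokenizer("Xenova/llama-3-tokenizer")
custom_count = token_counter(custom_tokenizer=llama3_tokenizer, text=sample_text)

# compare against the built-in, model-name-based lookup
builtin_count = token_counter(model="meta-llama/llama-3-70b-instruct", text=sample_text)
print(custom_count, builtin_count)
```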
@@ -20,6 +20,8 @@ from litellm.utils import (
    validate_environment,
    function_to_dict,
    token_counter,
    create_pretrained_tokenizer,
    create_tokenizer,
)

# Assuming your trim_messages, shorten_message_to_fit_limit, and get_token_count functions are all in a module named 'message_utils'
@ -1,7 +1,167 @@
|
|||
from typing import List, Optional, Union
|
||||
from typing import List, Optional, Union, Iterable
|
||||
|
||||
from pydantic import BaseModel, validator
|
||||
|
||||
from typing_extensions import Literal, Required, TypedDict
|
||||
|
||||
|
||||
class ChatCompletionSystemMessageParam(TypedDict, total=False):
|
||||
content: Required[str]
|
||||
"""The contents of the system message."""
|
||||
|
||||
role: Required[Literal["system"]]
|
||||
"""The role of the messages author, in this case `system`."""
|
||||
|
||||
name: str
|
||||
"""An optional name for the participant.
|
||||
|
||||
Provides the model information to differentiate between participants of the same
|
||||
role.
|
||||
"""
|
||||
|
||||
|
||||
class ChatCompletionContentPartTextParam(TypedDict, total=False):
|
||||
text: Required[str]
|
||||
"""The text content."""
|
||||
|
||||
type: Required[Literal["text"]]
|
||||
"""The type of the content part."""
|
||||
|
||||
|
||||
class ImageURL(TypedDict, total=False):
|
||||
url: Required[str]
|
||||
"""Either a URL of the image or the base64 encoded image data."""
|
||||
|
||||
detail: Literal["auto", "low", "high"]
|
||||
"""Specifies the detail level of the image.
|
||||
|
||||
Learn more in the
|
||||
[Vision guide](https://platform.openai.com/docs/guides/vision/low-or-high-fidelity-image-understanding).
|
||||
"""
|
||||
|
||||
|
||||
class ChatCompletionContentPartImageParam(TypedDict, total=False):
|
||||
image_url: Required[ImageURL]
|
||||
|
||||
type: Required[Literal["image_url"]]
|
||||
"""The type of the content part."""
|
||||
|
||||
|
||||
ChatCompletionContentPartParam = Union[
|
||||
ChatCompletionContentPartTextParam, ChatCompletionContentPartImageParam
|
||||
]
|
||||
|
||||
|
||||
class ChatCompletionUserMessageParam(TypedDict, total=False):
|
||||
content: Required[Union[str, Iterable[ChatCompletionContentPartParam]]]
|
||||
"""The contents of the user message."""
|
||||
|
||||
role: Required[Literal["user"]]
|
||||
"""The role of the messages author, in this case `user`."""
|
||||
|
||||
name: str
|
||||
"""An optional name for the participant.
|
||||
|
||||
Provides the model information to differentiate between participants of the same
|
||||
role.
|
||||
"""
|
||||
|
||||
|
||||
class FunctionCall(TypedDict, total=False):
|
||||
arguments: Required[str]
|
||||
"""
|
||||
The arguments to call the function with, as generated by the model in JSON
|
||||
format. Note that the model does not always generate valid JSON, and may
|
||||
hallucinate parameters not defined by your function schema. Validate the
|
||||
arguments in your code before calling your function.
|
||||
"""
|
||||
|
||||
name: Required[str]
|
||||
"""The name of the function to call."""
|
||||
|
||||
|
||||
class Function(TypedDict, total=False):
|
||||
arguments: Required[str]
|
||||
"""
|
||||
The arguments to call the function with, as generated by the model in JSON
|
||||
format. Note that the model does not always generate valid JSON, and may
|
||||
hallucinate parameters not defined by your function schema. Validate the
|
||||
arguments in your code before calling your function.
|
||||
"""
|
||||
|
||||
name: Required[str]
|
||||
"""The name of the function to call."""
|
||||
|
||||
|
||||
class ChatCompletionToolMessageParam(TypedDict, total=False):
|
||||
content: Required[str]
|
||||
"""The contents of the tool message."""
|
||||
|
||||
role: Required[Literal["tool"]]
|
||||
"""The role of the messages author, in this case `tool`."""
|
||||
|
||||
tool_call_id: Required[str]
|
||||
"""Tool call that this message is responding to."""
|
||||
|
||||
|
||||
class ChatCompletionFunctionMessageParam(TypedDict, total=False):
|
||||
content: Required[Optional[str]]
|
||||
"""The contents of the function message."""
|
||||
|
||||
name: Required[str]
|
||||
"""The name of the function to call."""
|
||||
|
||||
role: Required[Literal["function"]]
|
||||
"""The role of the messages author, in this case `function`."""
|
||||
|
||||
|
||||
class ChatCompletionMessageToolCallParam(TypedDict, total=False):
|
||||
id: Required[str]
|
||||
"""The ID of the tool call."""
|
||||
|
||||
function: Required[Function]
|
||||
"""The function that the model called."""
|
||||
|
||||
type: Required[Literal["function"]]
|
||||
"""The type of the tool. Currently, only `function` is supported."""
|
||||
|
||||
|
||||
class ChatCompletionAssistantMessageParam(TypedDict, total=False):
|
||||
role: Required[Literal["assistant"]]
|
||||
"""The role of the messages author, in this case `assistant`."""
|
||||
|
||||
content: Optional[str]
|
||||
"""The contents of the assistant message.
|
||||
|
||||
Required unless `tool_calls` or `function_call` is specified.
|
||||
"""
|
||||
|
||||
function_call: FunctionCall
|
||||
"""Deprecated and replaced by `tool_calls`.
|
||||
|
||||
The name and arguments of a function that should be called, as generated by the
|
||||
model.
|
||||
"""
|
||||
|
||||
name: str
|
||||
"""An optional name for the participant.
|
||||
|
||||
Provides the model information to differentiate between participants of the same
|
||||
role.
|
||||
"""
|
||||
|
||||
tool_calls: Iterable[ChatCompletionMessageToolCallParam]
|
||||
"""The tool calls generated by the model, such as function calls."""
|
||||
|
||||
|
||||
ChatCompletionMessageParam = Union[
|
||||
ChatCompletionSystemMessageParam,
|
||||
ChatCompletionUserMessageParam,
|
||||
ChatCompletionAssistantMessageParam,
|
||||
ChatCompletionFunctionMessageParam,
|
||||
ChatCompletionToolMessageParam,
|
||||
]
|
||||
|
||||
|
||||
class CompletionRequest(BaseModel):
|
||||
model: str
|
||||
|
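These TypedDicts mirror the OpenAI chat message shapes; as a rough illustration of how a multi-part user message lines up with them (the values here are invented for the example):

```python
from typing import List

# Shapes follow the TypedDicts defined in this file.
text_part = {"type": "text", "text": "What is in this image?"}
image_part = {
    "type": "image_url",
    "image_url": {"url": "https://example.com/cat.png", "detail": "low"},
}

messages: List[dict] = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": [text_part, image_part]},
]
```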
@ -12,7 +172,7 @@ class CompletionRequest(BaseModel):
    n: Optional[int] = None
    stream: Optional[bool] = None
    stop: Optional[dict] = None
    max_tokens: Optional[float] = None
    max_tokens: Optional[int] = None
    presence_penalty: Optional[float] = None
    frequency_penalty: Optional[float] = None
    logit_bias: Optional[dict] = None
3 litellm/types/llms/__init__.py Normal file

@ -0,0 +1,3 @@
__all__ = ["openai"]

from . import openai

42 litellm/types/llms/anthropic.py Normal file
@ -0,0 +1,42 @@
from typing import List, Optional, Union, Iterable

from pydantic import BaseModel, validator

from typing_extensions import Literal, Required, TypedDict


class AnthopicMessagesAssistantMessageTextContentParam(TypedDict, total=False):
    type: Required[Literal["text"]]

    text: str


class AnthopicMessagesAssistantMessageToolCallParam(TypedDict, total=False):
    type: Required[Literal["tool_use"]]

    id: str

    name: str

    input: dict


AnthropicMessagesAssistantMessageValues = Union[
    AnthopicMessagesAssistantMessageTextContentParam,
    AnthopicMessagesAssistantMessageToolCallParam,
]


class AnthopicMessagesAssistantMessageParam(TypedDict, total=False):
    content: Required[Union[str, Iterable[AnthropicMessagesAssistantMessageValues]]]
    """The contents of the system message."""

    role: Required[Literal["assistant"]]
    """The role of the messages author, in this case `author`."""

    name: str
    """An optional name for the participant.

    Provides the model information to differentiate between participants of the same
    role.
    """
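A quick illustration of the assistant-message shape these new Anthropic types describe (the id, tool name, and arguments below are invented for the example):

```python
# Matches AnthopicMessagesAssistantMessageParam with tool_use content blocks.
assistant_message = {
    "role": "assistant",
    "content": [
        {"type": "text", "text": "Let me look that up."},
        {
            "type": "tool_use",
            "id": "toolu_123",
            "name": "get_weather",
            "input": {"city": "Paris"},
        },
    ],
}
```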
148 litellm/types/llms/openai.py Normal file
@ -0,0 +1,148 @@
|
|||
from typing import (
|
||||
Optional,
|
||||
Union,
|
||||
Any,
|
||||
BinaryIO,
|
||||
Literal,
|
||||
Iterable,
|
||||
)
|
||||
from typing_extensions import override, Required
|
||||
from pydantic import BaseModel
|
||||
|
||||
from openai.types.beta.threads.message_content import MessageContent
|
||||
from openai.types.beta.threads.message import Message as OpenAIMessage
|
||||
from openai.types.beta.thread_create_params import (
|
||||
Message as OpenAICreateThreadParamsMessage,
|
||||
)
|
||||
from openai.types.beta.assistant_tool_param import AssistantToolParam
|
||||
from openai.types.beta.threads.run import Run
|
||||
from openai.types.beta.assistant import Assistant
|
||||
from openai.pagination import SyncCursorPage
|
||||
|
||||
from typing import TypedDict, List, Optional
|
||||
|
||||
|
||||
class NotGiven:
|
||||
"""
|
||||
A sentinel singleton class used to distinguish omitted keyword arguments
|
||||
from those passed in with the value None (which may have different behavior).
|
||||
|
||||
For example:
|
||||
|
||||
```py
|
||||
def get(timeout: Union[int, NotGiven, None] = NotGiven()) -> Response:
|
||||
...
|
||||
|
||||
|
||||
get(timeout=1) # 1s timeout
|
||||
get(timeout=None) # No timeout
|
||||
get() # Default timeout behavior, which may not be statically known at the method definition.
|
||||
```
|
||||
"""
|
||||
|
||||
def __bool__(self) -> Literal[False]:
|
||||
return False
|
||||
|
||||
@override
|
||||
def __repr__(self) -> str:
|
||||
return "NOT_GIVEN"
|
||||
|
||||
|
||||
NOT_GIVEN = NotGiven()
|
||||
|
||||
|
||||
class ToolResourcesCodeInterpreter(TypedDict, total=False):
|
||||
file_ids: List[str]
|
||||
"""
|
||||
A list of [file](https://platform.openai.com/docs/api-reference/files) IDs made
|
||||
available to the `code_interpreter` tool. There can be a maximum of 20 files
|
||||
associated with the tool.
|
||||
"""
|
||||
|
||||
|
||||
class ToolResourcesFileSearchVectorStore(TypedDict, total=False):
|
||||
file_ids: List[str]
|
||||
"""
|
||||
A list of [file](https://platform.openai.com/docs/api-reference/files) IDs to
|
||||
add to the vector store. There can be a maximum of 10000 files in a vector
|
||||
store.
|
||||
"""
|
||||
|
||||
metadata: object
|
||||
"""Set of 16 key-value pairs that can be attached to a vector store.
|
||||
|
||||
This can be useful for storing additional information about the vector store in
|
||||
a structured format. Keys can be a maximum of 64 characters long and values can
|
||||
be a maxium of 512 characters long.
|
||||
"""
|
||||
|
||||
|
||||
class ToolResourcesFileSearch(TypedDict, total=False):
|
||||
vector_store_ids: List[str]
|
||||
"""
|
||||
The
|
||||
[vector store](https://platform.openai.com/docs/api-reference/vector-stores/object)
|
||||
attached to this thread. There can be a maximum of 1 vector store attached to
|
||||
the thread.
|
||||
"""
|
||||
|
||||
vector_stores: Iterable[ToolResourcesFileSearchVectorStore]
|
||||
"""
|
||||
A helper to create a
|
||||
[vector store](https://platform.openai.com/docs/api-reference/vector-stores/object)
|
||||
with file_ids and attach it to this thread. There can be a maximum of 1 vector
|
||||
store attached to the thread.
|
||||
"""
|
||||
|
||||
|
||||
class OpenAICreateThreadParamsToolResources(TypedDict, total=False):
|
||||
code_interpreter: ToolResourcesCodeInterpreter
|
||||
|
||||
file_search: ToolResourcesFileSearch
|
||||
|
||||
|
||||
class FileSearchToolParam(TypedDict, total=False):
|
||||
type: Required[Literal["file_search"]]
|
||||
"""The type of tool being defined: `file_search`"""
|
||||
|
||||
|
||||
class CodeInterpreterToolParam(TypedDict, total=False):
|
||||
type: Required[Literal["code_interpreter"]]
|
||||
"""The type of tool being defined: `code_interpreter`"""
|
||||
|
||||
|
||||
AttachmentTool = Union[CodeInterpreterToolParam, FileSearchToolParam]
|
||||
|
||||
|
||||
class Attachment(TypedDict, total=False):
|
||||
file_id: str
|
||||
"""The ID of the file to attach to the message."""
|
||||
|
||||
tools: Iterable[AttachmentTool]
|
||||
"""The tools to add this file to."""
|
||||
|
||||
|
||||
class MessageData(TypedDict):
|
||||
role: Literal["user", "assistant"]
|
||||
content: str
|
||||
attachments: Optional[List[Attachment]]
|
||||
metadata: Optional[dict]
|
||||
|
||||
|
||||
class Thread(BaseModel):
|
||||
id: str
|
||||
"""The identifier, which can be referenced in API endpoints."""
|
||||
|
||||
created_at: int
|
||||
"""The Unix timestamp (in seconds) for when the thread was created."""
|
||||
|
||||
metadata: Optional[object] = None
|
||||
"""Set of 16 key-value pairs that can be attached to an object.
|
||||
|
||||
This can be useful for storing additional information about the object in a
|
||||
structured format. Keys can be a maximum of 64 characters long and values can be
|
||||
a maxium of 512 characters long.
|
||||
"""
|
||||
|
||||
object: Literal["thread"]
|
||||
"""The object type, which is always `thread`."""
|
|
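As a small illustration of the thread/message helper types added in this new file (the file id is a placeholder):

```python
# MessageData is the TypedDict defined above; attachments/metadata may be None.
message: dict = {
    "role": "user",
    "content": "Summarize the attached report.",
    "attachments": [
        {"file_id": "file-abc123", "tools": [{"type": "file_search"}]}
    ],
    "metadata": {"source": "docs-demo"},
}
```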
@ -97,8 +97,12 @@ class ModelInfo(BaseModel):
|
|||
setattr(self, key, value)
|
||||
|
||||
|
||||
class LiteLLM_Params(BaseModel):
|
||||
model: str
|
||||
class GenericLiteLLMParams(BaseModel):
|
||||
"""
|
||||
LiteLLM Params without 'model' arg (used across completion / assistants api)
|
||||
"""
|
||||
|
||||
custom_llm_provider: Optional[str] = None
|
||||
tpm: Optional[int] = None
|
||||
rpm: Optional[int] = None
|
||||
api_key: Optional[str] = None
|
||||
|
@ -120,9 +124,70 @@ class LiteLLM_Params(BaseModel):
|
|||
aws_secret_access_key: Optional[str] = None
|
||||
aws_region_name: Optional[str] = None
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
custom_llm_provider: Optional[str] = None,
|
||||
max_retries: Optional[Union[int, str]] = None,
|
||||
tpm: Optional[int] = None,
|
||||
rpm: Optional[int] = None,
|
||||
api_key: Optional[str] = None,
|
||||
api_base: Optional[str] = None,
|
||||
api_version: Optional[str] = None,
|
||||
timeout: Optional[Union[float, str]] = None, # if str, pass in as os.environ/
|
||||
stream_timeout: Optional[Union[float, str]] = (
|
||||
None # timeout when making stream=True calls, if str, pass in as os.environ/
|
||||
),
|
||||
organization: Optional[str] = None, # for openai orgs
|
||||
## VERTEX AI ##
|
||||
vertex_project: Optional[str] = None,
|
||||
vertex_location: Optional[str] = None,
|
||||
## AWS BEDROCK / SAGEMAKER ##
|
||||
aws_access_key_id: Optional[str] = None,
|
||||
aws_secret_access_key: Optional[str] = None,
|
||||
aws_region_name: Optional[str] = None,
|
||||
**params
|
||||
):
|
||||
args = locals()
|
||||
args.pop("max_retries", None)
|
||||
args.pop("self", None)
|
||||
args.pop("params", None)
|
||||
args.pop("__class__", None)
|
||||
if max_retries is not None and isinstance(max_retries, str):
|
||||
max_retries = int(max_retries) # cast to int
|
||||
super().__init__(max_retries=max_retries, **args, **params)
|
||||
|
||||
class Config:
|
||||
extra = "allow"
|
||||
arbitrary_types_allowed = True
|
||||
|
||||
def __contains__(self, key):
|
||||
# Define custom behavior for the 'in' operator
|
||||
return hasattr(self, key)
|
||||
|
||||
def get(self, key, default=None):
|
||||
# Custom .get() method to access attributes with a default value if the attribute doesn't exist
|
||||
return getattr(self, key, default)
|
||||
|
||||
def __getitem__(self, key):
|
||||
# Allow dictionary-style access to attributes
|
||||
return getattr(self, key)
|
||||
|
||||
def __setitem__(self, key, value):
|
||||
# Allow dictionary-style assignment of attributes
|
||||
setattr(self, key, value)
|
||||
|
||||
|
||||
class LiteLLM_Params(GenericLiteLLMParams):
|
||||
"""
|
||||
LiteLLM Params with 'model' requirement - used for completions
|
||||
"""
|
||||
|
||||
model: str
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
model: str,
|
||||
custom_llm_provider: Optional[str] = None,
|
||||
max_retries: Optional[Union[int, str]] = None,
|
||||
tpm: Optional[int] = None,
|
||||
rpm: Optional[int] = None,
|
||||
|
@ -264,3 +329,18 @@ class RouterErrors(enum.Enum):

    user_defined_ratelimit_error = "Deployment over user-defined ratelimit."
    no_deployments_available = "No deployments available for selected model"


class RetryPolicy(BaseModel):
    """
    Use this to set a custom number of retries per exception type
    If RateLimitErrorRetries = 3, then 3 retries will be made for RateLimitError
    Mapping of Exception type to number of retries
    https://docs.litellm.ai/docs/exception_mapping
    """

    BadRequestErrorRetries: Optional[int] = None
    AuthenticationErrorRetries: Optional[int] = None
    TimeoutErrorRetries: Optional[int] = None
    RateLimitErrorRetries: Optional[int] = None
    ContentPolicyViolationErrorRetries: Optional[int] = None
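A minimal sketch of constructing the new retry-policy model; the import path is assumed from this diff's router-types file, and how the Router consumes the object is not part of this hunk, so that wiring is omitted:

```python
from litellm.types.router import RetryPolicy  # module path assumed

# Retry rate-limit errors a few times, give up immediately on auth errors.
retry_policy = RetryPolicy(
    RateLimitErrorRetries=3,
    TimeoutErrorRetries=2,
    AuthenticationErrorRetries=0,
)
print(retry_policy.RateLimitErrorRetries)  # 3
```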
443 litellm/utils.py
@ -315,6 +315,7 @@ class ChatCompletionDeltaToolCall(OpenAIObject):
|
|||
class HiddenParams(OpenAIObject):
|
||||
original_response: Optional[str] = None
|
||||
model_id: Optional[str] = None # used in Router for individual deployments
|
||||
api_base: Optional[str] = None # returns api base used for making completion call
|
||||
|
||||
class Config:
|
||||
extra = "allow"
|
||||
|
@ -378,16 +379,13 @@ class Message(OpenAIObject):
|
|||
super(Message, self).__init__(**params)
|
||||
self.content = content
|
||||
self.role = role
|
||||
self.tool_calls = None
|
||||
self.function_call = None
|
||||
|
||||
if function_call is not None:
|
||||
self.function_call = FunctionCall(**function_call)
|
||||
|
||||
if tool_calls is not None:
|
||||
self.tool_calls = [
|
||||
ChatCompletionMessageToolCall(**tool_call) for tool_call in tool_calls
|
||||
]
|
||||
self.tool_calls = []
|
||||
for tool_call in tool_calls:
|
||||
self.tool_calls.append(ChatCompletionMessageToolCall(**tool_call))
|
||||
|
||||
if logprobs is not None:
|
||||
self._logprobs = ChoiceLogprobs(**logprobs)
|
||||
|
@ -413,8 +411,6 @@ class Message(OpenAIObject):
|
|||
|
||||
|
||||
class Delta(OpenAIObject):
|
||||
tool_calls: Optional[List[ChatCompletionDeltaToolCall]] = None
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
content=None,
|
||||
|
@ -1700,10 +1696,17 @@ class Logging:
|
|||
print_verbose("reaches langfuse for streaming logging!")
|
||||
result = kwargs["complete_streaming_response"]
|
||||
if langFuseLogger is None or (
|
||||
self.langfuse_public_key != langFuseLogger.public_key
|
||||
and self.langfuse_secret != langFuseLogger.secret_key
|
||||
(
|
||||
self.langfuse_public_key is not None
|
||||
and self.langfuse_public_key
|
||||
!= langFuseLogger.public_key
|
||||
)
|
||||
and (
|
||||
self.langfuse_public_key is not None
|
||||
and self.langfuse_public_key
|
||||
!= langFuseLogger.public_key
|
||||
)
|
||||
):
|
||||
print_verbose("Instantiates langfuse client")
|
||||
langFuseLogger = LangFuseLogger(
|
||||
langfuse_public_key=self.langfuse_public_key,
|
||||
langfuse_secret=self.langfuse_secret,
|
||||
|
@ -3155,6 +3158,10 @@ def client(original_function):
|
|||
result._hidden_params["model_id"] = kwargs.get("model_info", {}).get(
|
||||
"id", None
|
||||
)
|
||||
result._hidden_params["api_base"] = get_api_base(
|
||||
model=model,
|
||||
optional_params=getattr(logging_obj, "optional_params", {}),
|
||||
)
|
||||
result._response_ms = (
|
||||
end_time - start_time
|
||||
).total_seconds() * 1000 # return response latency in ms like openai
|
||||
|
@ -3224,6 +3231,8 @@ def client(original_function):
|
|||
call_type = original_function.__name__
|
||||
if "litellm_call_id" not in kwargs:
|
||||
kwargs["litellm_call_id"] = str(uuid.uuid4())
|
||||
|
||||
model = ""
|
||||
try:
|
||||
model = args[0] if len(args) > 0 else kwargs["model"]
|
||||
except:
|
||||
|
@ -3545,6 +3554,10 @@ def client(original_function):
|
|||
result._hidden_params["model_id"] = kwargs.get("model_info", {}).get(
|
||||
"id", None
|
||||
)
|
||||
result._hidden_params["api_base"] = get_api_base(
|
||||
model=model,
|
||||
optional_params=kwargs,
|
||||
)
|
||||
if (
|
||||
isinstance(result, ModelResponse)
|
||||
or isinstance(result, EmbeddingResponse)
|
||||
|
@ -3773,29 +3786,34 @@ def _select_tokenizer(model: str):
    elif "llama-2" in model.lower() or "replicate" in model.lower():
        tokenizer = Tokenizer.from_pretrained("hf-internal-testing/llama-tokenizer")
        return {"type": "huggingface_tokenizer", "tokenizer": tokenizer}
    # llama3
    elif "llama-3" in model.lower():
        tokenizer = Tokenizer.from_pretrained("Xenova/llama-3-tokenizer")
        return {"type": "huggingface_tokenizer", "tokenizer": tokenizer}
    # default - tiktoken
    else:
        return {"type": "openai_tokenizer", "tokenizer": encoding}


def encode(model: str, text: str):
def encode(model="", text="", custom_tokenizer: Optional[dict] = None):
    """
    Encodes the given text using the specified model.

    Args:
        model (str): The name of the model to use for tokenization.
        custom_tokenizer (Optional[dict]): A custom tokenizer created with the `create_pretrained_tokenizer` or `create_tokenizer` method. Must be a dictionary with a string value for `type` and Tokenizer for `tokenizer`. Default is None.
        text (str): The text to be encoded.

    Returns:
        enc: The encoded text.
    """
    tokenizer_json = _select_tokenizer(model=model)
    tokenizer_json = custom_tokenizer or _select_tokenizer(model=model)
    enc = tokenizer_json["tokenizer"].encode(text)
    return enc


def decode(model: str, tokens: List[int]):
    tokenizer_json = _select_tokenizer(model=model)
def decode(model="", tokens: List[int] = [], custom_tokenizer: Optional[dict] = None):
    tokenizer_json = custom_tokenizer or _select_tokenizer(model=model)
    dec = tokenizer_json["tokenizer"].decode(tokens)
    return dec
@ -3969,8 +3987,45 @@ calculage_img_tokens(
    return total_tokens


def create_pretrained_tokenizer(
    identifier: str, revision="main", auth_token: Optional[str] = None
):
    """
    Creates a tokenizer from an existing file on a HuggingFace repository to be used with `token_counter`.

    Args:
        identifier (str): The identifier of a Model on the Hugging Face Hub, that contains a tokenizer.json file
        revision (str, defaults to main): A branch or commit id
        auth_token (str, optional, defaults to None): An optional auth token used to access private repositories on the Hugging Face Hub

    Returns:
        dict: A dictionary with the tokenizer and its type.
    """

    tokenizer = Tokenizer.from_pretrained(
        identifier, revision=revision, auth_token=auth_token
    )
    return {"type": "huggingface_tokenizer", "tokenizer": tokenizer}


def create_tokenizer(json: str):
    """
    Creates a tokenizer from a valid JSON string for use with `token_counter`.

    Args:
        json (str): A valid JSON string representing a previously serialized tokenizer

    Returns:
        dict: A dictionary with the tokenizer and its type.
    """

    tokenizer = Tokenizer.from_str(json)
    return {"type": "huggingface_tokenizer", "tokenizer": tokenizer}


def token_counter(
    model="",
    custom_tokenizer: Optional[dict] = None,
    text: Optional[Union[str, List[str]]] = None,
    messages: Optional[List] = None,
    count_response_tokens: Optional[bool] = False,
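A small round-trip sketch of the updated `encode` / `decode` signatures with a custom tokenizer; the repository id is reused from the test earlier in this commit, and `to_str()` on the underlying Hugging Face tokenizer is an assumption of this example, used only to feed `create_tokenizer`:

```python
from litellm import create_pretrained_tokenizer, create_tokenizer, encode, decode

tok = create_pretrained_tokenizer("Xenova/llama-3-tokenizer")

# create_tokenizer accepts a previously serialized tokenizer JSON string.
tok_from_json = create_tokenizer(tok["tokenizer"].to_str())

# With a Hugging Face tokenizer, encode returns an Encoding whose .ids are the token ids.
enc = encode(text="hello world", custom_tokenizer=tok_from_json)
print(decode(tokens=enc.ids, custom_tokenizer=tok_from_json))
```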
||||
|
@ -3980,13 +4035,14 @@ def token_counter(
|
|||
|
||||
Args:
|
||||
model (str): The name of the model to use for tokenization. Default is an empty string.
|
||||
custom_tokenizer (Optional[dict]): A custom tokenizer created with the `create_pretrained_tokenizer` or `create_tokenizer` method. Must be a dictionary with a string value for `type` and Tokenizer for `tokenizer`. Default is None.
|
||||
text (str): The raw text string to be passed to the model. Default is None.
|
||||
messages (Optional[List[Dict[str, str]]]): Alternative to passing in text. A list of dictionaries representing messages with "role" and "content" keys. Default is None.
|
||||
|
||||
Returns:
|
||||
int: The number of tokens in the text.
|
||||
"""
|
||||
# use tiktoken, anthropic, cohere or llama2's tokenizer depending on the model
|
||||
# use tiktoken, anthropic, cohere, llama2, or llama3's tokenizer depending on the model
|
||||
is_tool_call = False
|
||||
num_tokens = 0
|
||||
if text == None:
|
||||
|
@ -4028,8 +4084,8 @@ def token_counter(
|
|||
elif isinstance(text, str):
|
||||
count_response_tokens = True # user just trying to count tokens for a text. don't add the chat_ml +3 tokens to this
|
||||
|
||||
if model is not None:
|
||||
tokenizer_json = _select_tokenizer(model=model)
|
||||
if model is not None or custom_tokenizer is not None:
|
||||
tokenizer_json = custom_tokenizer or _select_tokenizer(model=model)
|
||||
if tokenizer_json["type"] == "huggingface_tokenizer":
|
||||
print_verbose(
|
||||
f"Token Counter - using hugging face token counter, for model={model}"
|
||||
|
@ -4397,7 +4453,19 @@ def completion_cost(
        raise e


def supports_function_calling(model: str):
def supports_httpx_timeout(custom_llm_provider: str) -> bool:
    """
    Helper function to know if a provider implementation supports httpx timeout
    """
    supported_providers = ["openai", "azure", "bedrock"]

    if custom_llm_provider in supported_providers:
        return True

    return False


def supports_function_calling(model: str) -> bool:
    """
    Check if the given model supports function calling and return a boolean value.
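A tiny sketch of how the new helper can gate timeout handling (the provider string here is arbitrary):

```python
import httpx
from litellm.utils import supports_httpx_timeout

provider = "bedrock"
# Only openai / azure / bedrock implementations accept an httpx.Timeout per this commit.
timeout = httpx.Timeout(10.0, connect=60.0) if supports_httpx_timeout(provider) else 600
print(provider, "supports httpx timeout:", supports_httpx_timeout(provider))
```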
@ -4698,6 +4766,27 @@ def get_optional_params_embeddings(
                status_code=500,
                message=f"Setting user/encoding format is not supported by {custom_llm_provider}. To drop it from the call, set `litellm.drop_params = True`.",
            )
    if custom_llm_provider == "bedrock":
        # if dimensions is in non_default_params -> pass it for model=bedrock/amazon.titan-embed-text-v2
        if (
            "dimensions" in non_default_params.keys()
            and "amazon.titan-embed-text-v2" in model
        ):
            kwargs["dimensions"] = non_default_params["dimensions"]
            non_default_params.pop("dimensions", None)

        if len(non_default_params.keys()) > 0:
            if litellm.drop_params is True:  # drop the unsupported non-default values
                keys = list(non_default_params.keys())
                for k in keys:
                    non_default_params.pop(k, None)
                final_params = {**non_default_params, **kwargs}
                return final_params
            raise UnsupportedParamsError(
                status_code=500,
                message=f"Setting user/encoding format is not supported by {custom_llm_provider}. To drop it from the call, set `litellm.drop_params = True`.",
            )
        return {**non_default_params, **kwargs}

    if (
        custom_llm_provider != "openai"
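In practice this lets the new Bedrock Titan v2 embedding model receive a `dimensions` argument; a hedged sketch (AWS credentials are assumed to be configured in the environment, and field access follows the OpenAI-style embedding response shape):

```python
import litellm

# "dimensions" is forwarded only for amazon.titan-embed-text-v2 per the branch above.
response = litellm.embedding(
    model="bedrock/amazon.titan-embed-text-v2:0",
    input=["good morning from litellm"],
    dimensions=256,
)
print(len(response.data[0]["embedding"]))
```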
@ -4929,26 +5018,9 @@ def get_optional_params(
|
|||
model=model, custom_llm_provider=custom_llm_provider
|
||||
)
|
||||
_check_valid_arg(supported_params=supported_params)
|
||||
# handle anthropic params
|
||||
if stream:
|
||||
optional_params["stream"] = stream
|
||||
if stop is not None:
|
||||
if type(stop) == str:
|
||||
stop = [stop] # openai can accept str/list for stop
|
||||
optional_params["stop_sequences"] = stop
|
||||
if temperature is not None:
|
||||
optional_params["temperature"] = temperature
|
||||
if top_p is not None:
|
||||
optional_params["top_p"] = top_p
|
||||
if max_tokens is not None:
|
||||
if (model == "claude-2") or (model == "claude-instant-1"):
|
||||
# these models use antropic_text.py which only accepts max_tokens_to_sample
|
||||
optional_params["max_tokens_to_sample"] = max_tokens
|
||||
else:
|
||||
optional_params["max_tokens"] = max_tokens
|
||||
optional_params["max_tokens"] = max_tokens
|
||||
if tools is not None:
|
||||
optional_params["tools"] = tools
|
||||
optional_params = litellm.AnthropicConfig().map_openai_params(
|
||||
non_default_params=non_default_params, optional_params=optional_params
|
||||
)
|
||||
elif custom_llm_provider == "cohere":
|
||||
## check if unsupported param passed in
|
||||
supported_params = get_supported_openai_params(
|
||||
|
@ -5765,19 +5837,40 @@ def get_api_base(model: str, optional_params: dict) -> Optional[str]:
|
|||
get_api_base(model="gemini/gemini-pro")
|
||||
```
|
||||
"""
|
||||
|
||||
try:
|
||||
if "model" in optional_params:
|
||||
_optional_params = LiteLLM_Params(**optional_params)
|
||||
else: # prevent needing to copy and pop the dict
|
||||
_optional_params = LiteLLM_Params(
|
||||
model=model, **optional_params
|
||||
) # convert to pydantic object
|
||||
except Exception as e:
|
||||
verbose_logger.error("Error occurred in getting api base - {}".format(str(e)))
|
||||
return None
|
||||
# get llm provider
|
||||
try:
|
||||
model, custom_llm_provider, dynamic_api_key, api_base = get_llm_provider(
|
||||
model=model
|
||||
)
|
||||
except:
|
||||
custom_llm_provider = None
|
||||
|
||||
if _optional_params.api_base is not None:
|
||||
return _optional_params.api_base
|
||||
|
||||
try:
|
||||
model, custom_llm_provider, dynamic_api_key, dynamic_api_base = (
|
||||
get_llm_provider(
|
||||
model=model,
|
||||
custom_llm_provider=_optional_params.custom_llm_provider,
|
||||
api_base=_optional_params.api_base,
|
||||
api_key=_optional_params.api_key,
|
||||
)
|
||||
)
|
||||
except Exception as e:
|
||||
verbose_logger.error("Error occurred in getting api base - {}".format(str(e)))
|
||||
custom_llm_provider = None
|
||||
dynamic_api_key = None
|
||||
dynamic_api_base = None
|
||||
|
||||
if dynamic_api_base is not None:
|
||||
return dynamic_api_base
|
||||
|
||||
if (
|
||||
_optional_params.vertex_location is not None
|
||||
and _optional_params.vertex_project is not None
|
||||
|
@ -5790,14 +5883,29 @@ def get_api_base(model: str, optional_params: dict) -> Optional[str]:
|
|||
)
|
||||
return _api_base
|
||||
|
||||
if custom_llm_provider is not None and custom_llm_provider == "gemini":
|
||||
if custom_llm_provider is None:
|
||||
return None
|
||||
|
||||
if custom_llm_provider == "gemini":
|
||||
_api_base = "https://generativelanguage.googleapis.com/v1beta/models/{}:generateContent".format(
|
||||
model
|
||||
)
|
||||
return _api_base
|
||||
elif custom_llm_provider == "openai":
|
||||
_api_base = "https://api.openai.com"
|
||||
return _api_base
|
||||
return None
|
||||
|
||||
|
||||
def get_first_chars_messages(kwargs: dict) -> str:
|
||||
try:
|
||||
_messages = kwargs.get("messages")
|
||||
_messages = str(_messages)[:100]
|
||||
return _messages
|
||||
except:
|
||||
return ""
|
||||
|
||||
|
||||
def get_supported_openai_params(model: str, custom_llm_provider: str):
|
||||
"""
|
||||
Returns the supported openai params for a given model + provider
|
||||
|
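For the reworked `get_api_base` above, a usage sketch; provider inference may also pull from environment configuration, so the exact return value depends on the setup:

```python
import litellm

# Explicit api_base in optional_params wins; otherwise a provider default is derived.
print(litellm.get_api_base(model="gpt-3.5-turbo", optional_params={}))
# -> "https://api.openai.com" per the openai branch added in this commit

print(litellm.get_api_base(model="gemini/gemini-pro", optional_params={}))
```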
@ -5825,15 +5933,7 @@ def get_supported_openai_params(model: str, custom_llm_provider: str):
    elif custom_llm_provider == "ollama_chat":
        return litellm.OllamaChatConfig().get_supported_openai_params()
    elif custom_llm_provider == "anthropic":
        return [
            "stream",
            "stop",
            "temperature",
            "top_p",
            "max_tokens",
            "tools",
            "tool_choice",
        ]
        return litellm.AnthropicConfig().get_supported_openai_params()
    elif custom_llm_provider == "groq":
        return [
            "temperature",
@ -6102,7 +6202,6 @@ def get_llm_provider(
|
|||
try:
|
||||
dynamic_api_key = None
|
||||
# check if llm provider provided
|
||||
|
||||
# AZURE AI-Studio Logic - Azure AI Studio supports AZURE/Cohere
|
||||
# If User passes azure/command-r-plus -> we should send it to cohere_chat/command-r-plus
|
||||
if model.split("/", 1)[0] == "azure":
|
||||
|
@ -6768,7 +6867,7 @@ def validate_environment(model: Optional[str] = None) -> dict:
|
|||
keys_in_environment = True
|
||||
else:
|
||||
missing_keys.append("NLP_CLOUD_API_KEY")
|
||||
elif custom_llm_provider == "bedrock":
|
||||
elif custom_llm_provider == "bedrock" or custom_llm_provider == "sagemaker":
|
||||
if (
|
||||
"AWS_ACCESS_KEY_ID" in os.environ
|
||||
and "AWS_SECRET_ACCESS_KEY" in os.environ
|
||||
|
@ -6782,11 +6881,72 @@ def validate_environment(model: Optional[str] = None) -> dict:
|
|||
keys_in_environment = True
|
||||
else:
|
||||
missing_keys.append("OLLAMA_API_BASE")
|
||||
elif custom_llm_provider == "anyscale":
|
||||
if "ANYSCALE_API_KEY" in os.environ:
|
||||
keys_in_environment = True
|
||||
else:
|
||||
missing_keys.append("ANYSCALE_API_KEY")
|
||||
elif custom_llm_provider == "deepinfra":
|
||||
if "DEEPINFRA_API_KEY" in os.environ:
|
||||
keys_in_environment = True
|
||||
else:
|
||||
missing_keys.append("DEEPINFRA_API_KEY")
|
||||
elif custom_llm_provider == "gemini":
|
||||
if "GEMINI_API_KEY" in os.environ:
|
||||
keys_in_environment = True
|
||||
else:
|
||||
missing_keys.append("GEMINI_API_KEY")
|
||||
elif custom_llm_provider == "groq":
|
||||
if "GROQ_API_KEY" in os.environ:
|
||||
keys_in_environment = True
|
||||
else:
|
||||
missing_keys.append("GROQ_API_KEY")
|
||||
elif custom_llm_provider == "mistral":
|
||||
if "MISTRAL_API_KEY" in os.environ:
|
||||
keys_in_environment = True
|
||||
else:
|
||||
missing_keys.append("MISTRAL_API_KEY")
|
||||
elif custom_llm_provider == "palm":
|
||||
if "PALM_API_KEY" in os.environ:
|
||||
keys_in_environment = True
|
||||
else:
|
||||
missing_keys.append("PALM_API_KEY")
|
||||
elif custom_llm_provider == "perplexity":
|
||||
if "PERPLEXITYAI_API_KEY" in os.environ:
|
||||
keys_in_environment = True
|
||||
else:
|
||||
missing_keys.append("PERPLEXITYAI_API_KEY")
|
||||
elif custom_llm_provider == "voyage":
|
||||
if "VOYAGE_API_KEY" in os.environ:
|
||||
keys_in_environment = True
|
||||
else:
|
||||
missing_keys.append("VOYAGE_API_KEY")
|
||||
elif custom_llm_provider == "fireworks_ai":
|
||||
if (
|
||||
"FIREWORKS_AI_API_KEY" in os.environ
|
||||
or "FIREWORKS_API_KEY" in os.environ
|
||||
or "FIREWORKSAI_API_KEY" in os.environ
|
||||
or "FIREWORKS_AI_TOKEN" in os.environ
|
||||
):
|
||||
keys_in_environment = True
|
||||
else:
|
||||
missing_keys.append("FIREWORKS_AI_API_KEY")
|
||||
elif custom_llm_provider == "cloudflare":
|
||||
if "CLOUDFLARE_API_KEY" in os.environ and (
|
||||
"CLOUDFLARE_ACCOUNT_ID" in os.environ
|
||||
or "CLOUDFLARE_API_BASE" in os.environ
|
||||
):
|
||||
keys_in_environment = True
|
||||
else:
|
||||
missing_keys.append("CLOUDFLARE_API_KEY")
|
||||
missing_keys.append("CLOUDFLARE_API_BASE")
|
||||
else:
|
||||
## openai - chatcompletion + text completion
|
||||
if (
|
||||
model in litellm.open_ai_chat_completion_models
|
||||
or model in litellm.open_ai_text_completion_models
|
||||
or model in litellm.open_ai_embedding_models
|
||||
or model in litellm.openai_image_generation_models
|
||||
):
|
||||
if "OPENAI_API_KEY" in os.environ:
|
||||
keys_in_environment = True
|
||||
|
@ -6817,7 +6977,11 @@ def validate_environment(model: Optional[str] = None) -> dict:
|
|||
else:
|
||||
missing_keys.append("OPENROUTER_API_KEY")
|
||||
## vertex - text + chat models
|
||||
elif model in litellm.vertex_chat_models or model in litellm.vertex_text_models:
|
||||
elif (
|
||||
model in litellm.vertex_chat_models
|
||||
or model in litellm.vertex_text_models
|
||||
or model in litellm.models_by_provider["vertex_ai"]
|
||||
):
|
||||
if "VERTEXAI_PROJECT" in os.environ and "VERTEXAI_LOCATION" in os.environ:
|
||||
keys_in_environment = True
|
||||
else:
|
||||
|
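The expanded provider coverage in `validate_environment` can be checked directly; a small sketch (the output depends on which keys are actually set in your environment, and the returned dict follows the keys_in_environment / missing_keys structure used in this function):

```python
from litellm.utils import validate_environment

report = validate_environment(model="groq/llama3-8b-8192")
if not report["keys_in_environment"]:
    print("set these first:", report["missing_keys"])  # e.g. ["GROQ_API_KEY"]
```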
@ -7722,18 +7886,46 @@ def exception_type(
|
|||
exception_type = type(original_exception).__name__
|
||||
else:
|
||||
exception_type = ""
|
||||
_api_base = ""
|
||||
try:
|
||||
_api_base = litellm.get_api_base(
|
||||
model=model, optional_params=extra_kwargs
|
||||
)
|
||||
except:
|
||||
_api_base = ""
|
||||
|
||||
################################################################################
|
||||
# Common Extra information needed for all providers
|
||||
# We pass num retries, api_base, vertex_deployment etc to the exception here
|
||||
################################################################################
|
||||
|
||||
_api_base = litellm.get_api_base(model=model, optional_params=extra_kwargs)
|
||||
messages = litellm.get_first_chars_messages(kwargs=completion_kwargs)
|
||||
_vertex_project = extra_kwargs.get("vertex_project")
|
||||
_vertex_location = extra_kwargs.get("vertex_location")
|
||||
_metadata = extra_kwargs.get("metadata", {}) or {}
|
||||
_model_group = _metadata.get("model_group")
|
||||
_deployment = _metadata.get("deployment")
|
||||
extra_information = f"\nModel: {model}"
|
||||
if _api_base:
|
||||
extra_information += f"\nAPI Base: {_api_base}"
|
||||
if messages and len(messages) > 0:
|
||||
extra_information += f"\nMessages: {messages}"
|
||||
|
||||
if _model_group is not None:
|
||||
extra_information += f"\nmodel_group: {_model_group}\n"
|
||||
if _deployment is not None:
|
||||
extra_information += f"\ndeployment: {_deployment}\n"
|
||||
if _vertex_project is not None:
|
||||
extra_information += f"\nvertex_project: {_vertex_project}\n"
|
||||
if _vertex_location is not None:
|
||||
extra_information += f"\nvertex_location: {_vertex_location}\n"
|
||||
|
||||
################################################################################
|
||||
# End of Common Extra information Needed for all providers
|
||||
################################################################################
|
||||
|
||||
################################################################################
|
||||
#################### Start of Provider Exception mapping ####################
|
||||
################################################################################
|
||||
|
||||
if "Request Timeout Error" in error_str or "Request timed out" in error_str:
|
||||
exception_mapping_worked = True
|
||||
raise Timeout(
|
||||
message=f"APITimeoutError - Request timed out. \n model: {model} \n api_base: {_api_base} \n error_str: {error_str}",
|
||||
message=f"APITimeoutError - Request timed out. {extra_information} \n error_str: {error_str}",
|
||||
model=model,
|
||||
llm_provider=custom_llm_provider,
|
||||
)
|
||||
|
@ -7768,7 +7960,7 @@ def exception_type(
|
|||
):
|
||||
exception_mapping_worked = True
|
||||
raise ContextWindowExceededError(
|
||||
message=f"{exception_provider} - {message}",
|
||||
message=f"{exception_provider} - {message} {extra_information}",
|
||||
llm_provider=custom_llm_provider,
|
||||
model=model,
|
||||
response=original_exception.response,
|
||||
|
@ -7779,7 +7971,7 @@ def exception_type(
|
|||
):
|
||||
exception_mapping_worked = True
|
||||
raise NotFoundError(
|
||||
message=f"{exception_provider} - {message}",
|
||||
message=f"{exception_provider} - {message} {extra_information}",
|
||||
llm_provider=custom_llm_provider,
|
||||
model=model,
|
||||
response=original_exception.response,
|
||||
|
@ -7790,7 +7982,7 @@ def exception_type(
|
|||
):
|
||||
exception_mapping_worked = True
|
||||
raise ContentPolicyViolationError(
|
||||
message=f"{exception_provider} - {message}",
|
||||
message=f"{exception_provider} - {message} {extra_information}",
|
||||
llm_provider=custom_llm_provider,
|
||||
model=model,
|
||||
response=original_exception.response,
|
||||
|
@ -7801,7 +7993,7 @@ def exception_type(
|
|||
):
|
||||
exception_mapping_worked = True
|
||||
raise BadRequestError(
|
||||
message=f"{exception_provider} - {message}",
|
||||
message=f"{exception_provider} - {message} {extra_information}",
|
||||
llm_provider=custom_llm_provider,
|
||||
model=model,
|
||||
response=original_exception.response,
|
||||
|
@ -7812,7 +8004,7 @@ def exception_type(
|
|||
):
|
||||
exception_mapping_worked = True
|
||||
raise AuthenticationError(
|
||||
message=f"{exception_provider} - {message}",
|
||||
message=f"{exception_provider} - {message} {extra_information}",
|
||||
llm_provider=custom_llm_provider,
|
||||
model=model,
|
||||
response=original_exception.response,
|
||||
|
@ -7824,7 +8016,7 @@ def exception_type(
|
|||
)
|
||||
raise APIError(
|
||||
status_code=500,
|
||||
message=f"{exception_provider} - {message}",
|
||||
message=f"{exception_provider} - {message} {extra_information}",
|
||||
llm_provider=custom_llm_provider,
|
||||
model=model,
|
||||
request=_request,
|
||||
|
@ -7834,7 +8026,7 @@ def exception_type(
|
|||
if original_exception.status_code == 401:
|
||||
exception_mapping_worked = True
|
||||
raise AuthenticationError(
|
||||
message=f"{exception_provider} - {message}",
|
||||
message=f"{exception_provider} - {message} {extra_information}",
|
||||
llm_provider=custom_llm_provider,
|
||||
model=model,
|
||||
response=original_exception.response,
|
||||
|
@ -7842,7 +8034,7 @@ def exception_type(
|
|||
elif original_exception.status_code == 404:
|
||||
exception_mapping_worked = True
|
||||
raise NotFoundError(
|
||||
message=f"{exception_provider} - {message}",
|
||||
message=f"{exception_provider} - {message} {extra_information}",
|
||||
model=model,
|
||||
llm_provider=custom_llm_provider,
|
||||
response=original_exception.response,
|
||||
|
@ -7850,14 +8042,14 @@ def exception_type(
|
|||
elif original_exception.status_code == 408:
|
||||
exception_mapping_worked = True
|
||||
raise Timeout(
|
||||
message=f"{exception_provider} - {message}",
|
||||
message=f"{exception_provider} - {message} {extra_information}",
|
||||
model=model,
|
||||
llm_provider=custom_llm_provider,
|
||||
)
|
||||
elif original_exception.status_code == 422:
|
||||
exception_mapping_worked = True
|
||||
raise BadRequestError(
|
||||
message=f"{exception_provider} - {message}",
|
||||
message=f"{exception_provider} - {message} {extra_information}",
|
||||
model=model,
|
||||
llm_provider=custom_llm_provider,
|
||||
response=original_exception.response,
|
||||
|
@ -7865,7 +8057,7 @@ def exception_type(
|
|||
elif original_exception.status_code == 429:
|
||||
exception_mapping_worked = True
|
||||
raise RateLimitError(
|
||||
message=f"{exception_provider} - {message}",
|
||||
message=f"{exception_provider} - {message} {extra_information}",
|
||||
model=model,
|
||||
llm_provider=custom_llm_provider,
|
||||
response=original_exception.response,
|
||||
|
@ -7873,7 +8065,7 @@ def exception_type(
|
|||
elif original_exception.status_code == 503:
|
||||
exception_mapping_worked = True
|
||||
raise ServiceUnavailableError(
|
||||
message=f"{exception_provider} - {message}",
|
||||
message=f"{exception_provider} - {message} {extra_information}",
|
||||
model=model,
|
||||
llm_provider=custom_llm_provider,
|
||||
response=original_exception.response,
|
||||
|
@ -7881,7 +8073,7 @@ def exception_type(
|
|||
elif original_exception.status_code == 504: # gateway timeout error
|
||||
exception_mapping_worked = True
|
||||
raise Timeout(
|
||||
message=f"{exception_provider} - {message}",
|
||||
message=f"{exception_provider} - {message} {extra_information}",
|
||||
model=model,
|
||||
llm_provider=custom_llm_provider,
|
||||
)
|
||||
|
@ -7889,7 +8081,7 @@ def exception_type(
|
|||
exception_mapping_worked = True
|
||||
raise APIError(
|
||||
status_code=original_exception.status_code,
|
||||
message=f"{exception_provider} - {message}",
|
||||
message=f"{exception_provider} - {message} {extra_information}",
|
||||
llm_provider=custom_llm_provider,
|
||||
model=model,
|
||||
request=original_exception.request,
|
||||
|
@ -7897,7 +8089,7 @@ def exception_type(
|
|||
else:
|
||||
# if no status code then it is an APIConnectionError: https://github.com/openai/openai-python#handling-errors
|
||||
raise APIConnectionError(
|
||||
message=f"{exception_provider} - {message}",
|
||||
message=f"{exception_provider} - {message} {extra_information}",
|
||||
llm_provider=custom_llm_provider,
|
||||
model=model,
|
||||
request=httpx.Request(
|
||||
|
@ -8204,33 +8396,13 @@ def exception_type(
|
|||
response=original_exception.response,
|
||||
)
|
||||
elif custom_llm_provider == "vertex_ai":
|
||||
if completion_kwargs is not None:
|
||||
# add model, deployment and model_group to the exception message
|
||||
_model = completion_kwargs.get("model")
|
||||
error_str += f"\nmodel: {_model}\n"
|
||||
if extra_kwargs is not None:
|
||||
_vertex_project = extra_kwargs.get("vertex_project")
|
||||
_vertex_location = extra_kwargs.get("vertex_location")
|
||||
_metadata = extra_kwargs.get("metadata", {}) or {}
|
||||
_model_group = _metadata.get("model_group")
|
||||
_deployment = _metadata.get("deployment")
|
||||
|
||||
if _model_group is not None:
|
||||
error_str += f"model_group: {_model_group}\n"
|
||||
if _deployment is not None:
|
||||
error_str += f"deployment: {_deployment}\n"
|
||||
if _vertex_project is not None:
|
||||
error_str += f"vertex_project: {_vertex_project}\n"
|
||||
if _vertex_location is not None:
|
||||
error_str += f"vertex_location: {_vertex_location}\n"
|
||||
|
||||
if (
|
||||
"Vertex AI API has not been used in project" in error_str
|
||||
or "Unable to find your project" in error_str
|
||||
):
|
||||
exception_mapping_worked = True
|
||||
raise BadRequestError(
|
||||
message=f"VertexAIException - {error_str}",
|
||||
message=f"VertexAIException - {error_str} {extra_information}",
|
||||
model=model,
|
||||
llm_provider="vertex_ai",
|
||||
response=original_exception.response,
|
||||
|
@ -8241,7 +8413,7 @@ def exception_type(
|
|||
):
|
||||
exception_mapping_worked = True
|
||||
raise APIError(
|
||||
message=f"VertexAIException - {error_str}",
|
||||
message=f"VertexAIException - {error_str} {extra_information}",
|
||||
status_code=500,
|
||||
model=model,
|
||||
llm_provider="vertex_ai",
|
||||
|
@ -8250,7 +8422,7 @@ def exception_type(
|
|||
elif "403" in error_str:
|
||||
exception_mapping_worked = True
|
||||
raise BadRequestError(
|
||||
message=f"VertexAIException - {error_str}",
|
||||
message=f"VertexAIException - {error_str} {extra_information}",
|
||||
model=model,
|
||||
llm_provider="vertex_ai",
|
||||
response=original_exception.response,
|
||||
|
@ -8258,7 +8430,7 @@ def exception_type(
|
|||
elif "The response was blocked." in error_str:
|
||||
exception_mapping_worked = True
|
||||
raise UnprocessableEntityError(
|
||||
message=f"VertexAIException - {error_str}",
|
||||
message=f"VertexAIException - {error_str} {extra_information}",
|
||||
model=model,
|
||||
llm_provider="vertex_ai",
|
||||
response=httpx.Response(
|
||||
|
@ -8277,7 +8449,7 @@ def exception_type(
|
|||
):
|
||||
exception_mapping_worked = True
|
||||
raise RateLimitError(
|
||||
message=f"VertexAIException - {error_str}",
|
||||
message=f"VertexAIException - {error_str} {extra_information}",
|
||||
model=model,
|
||||
llm_provider="vertex_ai",
|
||||
response=httpx.Response(
|
||||
|
@ -8292,7 +8464,7 @@ def exception_type(
|
|||
if original_exception.status_code == 400:
|
||||
exception_mapping_worked = True
|
||||
raise BadRequestError(
|
||||
message=f"VertexAIException - {error_str}",
|
||||
message=f"VertexAIException - {error_str} {extra_information}",
|
||||
model=model,
|
||||
llm_provider="vertex_ai",
|
||||
response=original_exception.response,
|
||||
|
@ -8300,7 +8472,7 @@ def exception_type(
|
|||
if original_exception.status_code == 500:
|
||||
exception_mapping_worked = True
|
||||
raise APIError(
|
||||
message=f"VertexAIException - {error_str}",
|
||||
message=f"VertexAIException - {error_str} {extra_information}",
|
||||
status_code=500,
|
||||
model=model,
|
||||
llm_provider="vertex_ai",
|
||||
|
@ -8312,7 +8484,7 @@ def exception_type(
|
|||
# 503 Getting metadata from plugin failed with error: Reauthentication is needed. Please run `gcloud auth application-default login` to reauthenticate.
|
||||
exception_mapping_worked = True
|
||||
raise BadRequestError(
|
||||
message=f"PalmException - Invalid api key",
|
||||
message=f"GeminiException - Invalid api key",
|
||||
model=model,
|
||||
llm_provider="palm",
|
||||
response=original_exception.response,
|
||||
|
@ -8323,23 +8495,26 @@ def exception_type(
|
|||
):
|
||||
exception_mapping_worked = True
|
||||
raise Timeout(
|
||||
message=f"PalmException - {original_exception.message}",
|
||||
message=f"GeminiException - {original_exception.message}",
|
||||
model=model,
|
||||
llm_provider="palm",
|
||||
)
|
||||
if "400 Request payload size exceeds" in error_str:
|
||||
exception_mapping_worked = True
|
||||
raise ContextWindowExceededError(
|
||||
message=f"PalmException - {error_str}",
|
||||
message=f"GeminiException - {error_str}",
|
||||
model=model,
|
||||
llm_provider="palm",
|
||||
response=original_exception.response,
|
||||
)
|
||||
if "500 An internal error has occurred." in error_str:
|
||||
if (
|
||||
"500 An internal error has occurred." in error_str
|
||||
or "list index out of range" in error_str
|
||||
):
|
||||
exception_mapping_worked = True
|
||||
raise APIError(
|
||||
status_code=getattr(original_exception, "status_code", 500),
|
||||
message=f"PalmException - {original_exception.message}",
|
||||
message=f"GeminiException - {original_exception.message}",
|
||||
llm_provider="palm",
|
||||
model=model,
|
||||
request=original_exception.request,
|
||||
|
@ -8348,7 +8523,7 @@ def exception_type(
|
|||
if original_exception.status_code == 400:
|
||||
exception_mapping_worked = True
|
||||
raise BadRequestError(
|
||||
message=f"PalmException - {error_str}",
|
||||
message=f"GeminiException - {error_str}",
|
||||
model=model,
|
||||
llm_provider="palm",
|
||||
response=original_exception.response,
|
||||
|
@ -8891,10 +9066,19 @@ def exception_type(
|
|||
request=original_exception.request,
|
||||
)
|
||||
elif custom_llm_provider == "azure":
|
||||
if "This model's maximum context length is" in error_str:
|
||||
if "Internal server error" in error_str:
|
||||
exception_mapping_worked = True
|
||||
raise APIError(
|
||||
status_code=500,
|
||||
message=f"AzureException - {original_exception.message} {extra_information}",
|
||||
llm_provider="azure",
|
||||
model=model,
|
||||
request=httpx.Request(method="POST", url="https://openai.com/"),
|
||||
)
|
||||
elif "This model's maximum context length is" in error_str:
|
||||
exception_mapping_worked = True
|
||||
raise ContextWindowExceededError(
|
||||
message=f"AzureException - {original_exception.message}",
|
||||
message=f"AzureException - {original_exception.message} {extra_information}",
|
||||
llm_provider="azure",
|
||||
model=model,
|
||||
response=original_exception.response,
|
||||
|
@ -8902,7 +9086,7 @@ def exception_type(
|
|||
elif "DeploymentNotFound" in error_str:
|
||||
exception_mapping_worked = True
|
||||
raise NotFoundError(
|
||||
message=f"AzureException - {original_exception.message}",
|
||||
message=f"AzureException - {original_exception.message} {extra_information}",
|
||||
llm_provider="azure",
|
||||
model=model,
|
||||
response=original_exception.response,
|
||||
|
@ -8910,10 +9094,13 @@ def exception_type(
|
|||
elif (
|
||||
"invalid_request_error" in error_str
|
||||
and "content_policy_violation" in error_str
|
||||
) or (
|
||||
"The response was filtered due to the prompt triggering Azure OpenAI's content management"
|
||||
in error_str
|
||||
):
|
||||
exception_mapping_worked = True
|
||||
raise ContentPolicyViolationError(
|
||||
message=f"AzureException - {original_exception.message}",
|
||||
message=f"AzureException - {original_exception.message} {extra_information}",
|
||||
llm_provider="azure",
|
||||
model=model,
|
||||
response=original_exception.response,
|
||||
|
@ -8921,7 +9108,7 @@ def exception_type(
|
|||
elif "invalid_request_error" in error_str:
|
||||
exception_mapping_worked = True
|
||||
raise BadRequestError(
|
||||
message=f"AzureException - {original_exception.message}",
|
||||
message=f"AzureException - {original_exception.message} {extra_information}",
|
||||
llm_provider="azure",
|
||||
model=model,
|
||||
response=original_exception.response,
|
||||
|
@ -8932,7 +9119,7 @@ def exception_type(
|
|||
):
|
||||
exception_mapping_worked = True
|
||||
raise AuthenticationError(
|
||||
message=f"{exception_provider} - {original_exception.message}",
|
||||
message=f"{exception_provider} - {original_exception.message} {extra_information}",
|
||||
llm_provider=custom_llm_provider,
|
||||
model=model,
|
||||
response=original_exception.response,
|
||||
|
@ -8942,7 +9129,7 @@ def exception_type(
|
|||
if original_exception.status_code == 401:
|
||||
exception_mapping_worked = True
|
||||
raise AuthenticationError(
|
||||
message=f"AzureException - {original_exception.message}",
|
||||
message=f"AzureException - {original_exception.message} {extra_information}",
|
||||
llm_provider="azure",
|
||||
model=model,
|
||||
response=original_exception.response,
|
||||
|
@ -8950,14 +9137,14 @@ def exception_type(
|
|||
elif original_exception.status_code == 408:
|
||||
exception_mapping_worked = True
|
||||
raise Timeout(
|
||||
message=f"AzureException - {original_exception.message}",
|
||||
message=f"AzureException - {original_exception.message} {extra_information}",
|
||||
model=model,
|
||||
llm_provider="azure",
|
||||
)
|
||||
if original_exception.status_code == 422:
|
||||
exception_mapping_worked = True
|
||||
raise BadRequestError(
|
||||
message=f"AzureException - {original_exception.message}",
|
||||
message=f"AzureException - {original_exception.message} {extra_information}",
|
||||
model=model,
|
||||
llm_provider="azure",
|
||||
response=original_exception.response,
|
||||
|
@ -8965,7 +9152,7 @@ def exception_type(
|
|||
elif original_exception.status_code == 429:
|
||||
exception_mapping_worked = True
|
||||
raise RateLimitError(
|
||||
message=f"AzureException - {original_exception.message}",
|
||||
message=f"AzureException - {original_exception.message} {extra_information}",
|
||||
model=model,
|
||||
llm_provider="azure",
|
||||
response=original_exception.response,
|
||||
|
@ -8973,7 +9160,7 @@ def exception_type(
|
|||
elif original_exception.status_code == 503:
|
||||
exception_mapping_worked = True
|
||||
raise ServiceUnavailableError(
|
||||
message=f"AzureException - {original_exception.message}",
|
||||
message=f"AzureException - {original_exception.message} {extra_information}",
|
||||
model=model,
|
||||
llm_provider="azure",
|
||||
response=original_exception.response,
|
||||
|
@ -8981,7 +9168,7 @@ def exception_type(
|
|||
elif original_exception.status_code == 504: # gateway timeout error
|
||||
exception_mapping_worked = True
|
||||
raise Timeout(
|
||||
message=f"AzureException - {original_exception.message}",
|
||||
message=f"AzureException - {original_exception.message} {extra_information}",
|
||||
model=model,
|
||||
llm_provider="azure",
|
||||
)
|
||||
|
@ -8989,7 +9176,7 @@ def exception_type(
|
|||
exception_mapping_worked = True
|
||||
raise APIError(
|
||||
status_code=original_exception.status_code,
|
||||
message=f"AzureException - {original_exception.message}",
|
||||
message=f"AzureException - {original_exception.message} {extra_information}",
|
||||
llm_provider="azure",
|
||||
model=model,
|
||||
request=httpx.Request(
|
||||
|
@ -8999,7 +9186,7 @@ def exception_type(
|
|||
else:
|
||||
# if no status code then it is an APIConnectionError: https://github.com/openai/openai-python#handling-errors
|
||||
raise APIConnectionError(
|
||||
message=f"{exception_provider} - {message}",
|
||||
message=f"{exception_provider} - {message} {extra_information}",
|
||||
llm_provider="azure",
|
||||
model=model,
|
||||
request=httpx.Request(method="POST", url="https://openai.com/"),
|
||||
|
|
|
@@ -338,6 +338,18 @@
         "output_cost_per_second": 0.0001,
         "litellm_provider": "azure"
     },
+    "azure/gpt-4-turbo-2024-04-09": {
+        "max_tokens": 4096,
+        "max_input_tokens": 128000,
+        "max_output_tokens": 4096,
+        "input_cost_per_token": 0.00001,
+        "output_cost_per_token": 0.00003,
+        "litellm_provider": "azure",
+        "mode": "chat",
+        "supports_function_calling": true,
+        "supports_parallel_function_calling": true,
+        "supports_vision": true
+    },
     "azure/gpt-4-0125-preview": {
         "max_tokens": 4096,
         "max_input_tokens": 128000,
@@ -813,6 +825,7 @@
         "litellm_provider": "anthropic",
         "mode": "chat",
         "supports_function_calling": true,
+        "supports_vision": true,
         "tool_use_system_prompt_tokens": 264
     },
     "claude-3-opus-20240229": {
@@ -824,6 +837,7 @@
         "litellm_provider": "anthropic",
         "mode": "chat",
         "supports_function_calling": true,
+        "supports_vision": true,
         "tool_use_system_prompt_tokens": 395
     },
     "claude-3-sonnet-20240229": {
@@ -835,6 +849,7 @@
         "litellm_provider": "anthropic",
         "mode": "chat",
         "supports_function_calling": true,
+        "supports_vision": true,
         "tool_use_system_prompt_tokens": 159
     },
     "text-bison": {
@@ -1142,7 +1157,8 @@
         "output_cost_per_token": 0.000015,
         "litellm_provider": "vertex_ai-anthropic_models",
         "mode": "chat",
-        "supports_function_calling": true
+        "supports_function_calling": true,
+        "supports_vision": true
     },
     "vertex_ai/claude-3-haiku@20240307": {
         "max_tokens": 4096,
@@ -1152,7 +1168,8 @@
         "output_cost_per_token": 0.00000125,
         "litellm_provider": "vertex_ai-anthropic_models",
         "mode": "chat",
-        "supports_function_calling": true
+        "supports_function_calling": true,
+        "supports_vision": true
     },
     "vertex_ai/claude-3-opus@20240229": {
         "max_tokens": 4096,
@@ -1162,7 +1179,8 @@
         "output_cost_per_token": 0.0000075,
         "litellm_provider": "vertex_ai-anthropic_models",
         "mode": "chat",
-        "supports_function_calling": true
+        "supports_function_calling": true,
+        "supports_vision": true
     },
     "textembedding-gecko": {
         "max_tokens": 3072,
@@ -1581,6 +1599,7 @@
         "litellm_provider": "openrouter",
         "mode": "chat",
         "supports_function_calling": true,
+        "supports_vision": true,
         "tool_use_system_prompt_tokens": 395
     },
     "openrouter/google/palm-2-chat-bison": {
@@ -1813,6 +1832,15 @@
         "litellm_provider": "bedrock",
         "mode": "embedding"
     },
+    "amazon.titan-embed-text-v2:0": {
+        "max_tokens": 8192,
+        "max_input_tokens": 8192,
+        "output_vector_size": 1024,
+        "input_cost_per_token": 0.0000002,
+        "output_cost_per_token": 0.0,
+        "litellm_provider": "bedrock",
+        "mode": "embedding"
+    },
     "mistral.mistral-7b-instruct-v0:2": {
         "max_tokens": 8191,
         "max_input_tokens": 32000,
@@ -1929,7 +1957,8 @@
         "output_cost_per_token": 0.000015,
         "litellm_provider": "bedrock",
         "mode": "chat",
-        "supports_function_calling": true
+        "supports_function_calling": true,
+        "supports_vision": true
     },
     "anthropic.claude-3-haiku-20240307-v1:0": {
         "max_tokens": 4096,
@@ -1939,7 +1968,8 @@
         "output_cost_per_token": 0.00000125,
         "litellm_provider": "bedrock",
         "mode": "chat",
-        "supports_function_calling": true
+        "supports_function_calling": true,
+        "supports_vision": true
     },
     "anthropic.claude-3-opus-20240229-v1:0": {
         "max_tokens": 4096,
@@ -1949,7 +1979,8 @@
         "output_cost_per_token": 0.000075,
         "litellm_provider": "bedrock",
         "mode": "chat",
-        "supports_function_calling": true
+        "supports_function_calling": true,
+        "supports_vision": true
     },
     "anthropic.claude-v1": {
         "max_tokens": 8191,
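As a quick illustration (a sketch, not part of the diff), the pricing entries added above are what LiteLLM's cost helpers read; the token counts below are made up:

```
import litellm
from litellm import cost_per_token

# The new Azure entry is keyed by the provider-prefixed model name.
entry = litellm.model_cost.get("azure/gpt-4-turbo-2024-04-09", {})
print(entry.get("input_cost_per_token"), entry.get("output_cost_per_token"))

# Estimate the cost of a single call from (made-up) token counts.
prompt_cost, completion_cost = cost_per_token(
    model="azure/gpt-4-turbo-2024-04-09",
    prompt_tokens=1200,
    completion_tokens=300,
)
print(f"~${prompt_cost + completion_cost:.4f} for this call")
```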
poetry.lock (generated, 6 lines changed)

@@ -1153,13 +1153,13 @@ typing = ["types-PyYAML", "types-requests", "types-simplejson", "types-toml", "t

 [[package]]
 name = "idna"
-version = "3.6"
+version = "3.7"
 description = "Internationalized Domain Names in Applications (IDNA)"
 optional = false
 python-versions = ">=3.5"
 files = [
-    {file = "idna-3.6-py3-none-any.whl", hash = "sha256:c05567e9c24a6b9faaa835c4821bad0590fbb9d5779e7caa6e1cc4978e7eb24f"},
-    {file = "idna-3.6.tar.gz", hash = "sha256:9ecdbbd083b06798ae1e86adcbfe8ab1479cf864e4ee30fe4e46a003d12491ca"},
+    {file = "idna-3.7-py3-none-any.whl", hash = "sha256:82fee1fc78add43492d3a1898bfa6d8a904cc97d8427f683ed8e798d07761aa0"},
+    {file = "idna-3.7.tar.gz", hash = "sha256:028ff3aadf0609c1fd278d8ea3089299412a7a8b9bd005dd08b9f8285bcb5cfc"},
 ]

 [[package]]
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "litellm"
-version = "1.35.36"
+version = "1.36.0"
 description = "Library to easily interface with LLM API providers"
 authors = ["BerriAI"]
 license = "MIT"
@@ -80,7 +80,7 @@ requires = ["poetry-core", "wheel"]
 build-backend = "poetry.core.masonry.api"

 [tool.commitizen]
-version = "1.35.36"
+version = "1.36.0"
 version_files = [
     "pyproject.toml:^version"
 ]
tests/test_callbacks_on_proxy.py (new file, 163 lines)

@@ -0,0 +1,163 @@
# What this tests ?
## Makes sure the number of callbacks on the proxy don't increase over time
## Num callbacks should be a fixed number at t=0 and t=10, t=20
"""
PROD TEST - DO NOT Delete this Test
"""

import pytest
import asyncio
import aiohttp
import os
import dotenv
from dotenv import load_dotenv
import pytest

load_dotenv()


async def config_update(session, routing_strategy=None):
    url = "http://0.0.0.0:4000/config/update"
    headers = {"Authorization": "Bearer sk-1234", "Content-Type": "application/json"}
    print("routing_strategy: ", routing_strategy)
    data = {
        "router_settings": {
            "routing_strategy": routing_strategy,
        },
        "general_settings": {
            "alert_to_webhook_url": {
                "llm_exceptions": "https://hooks.slack.com/services/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"
            },
            "alert_types": ["llm_exceptions", "db_exceptions"],
        },
    }

    async with session.post(url, headers=headers, json=data) as response:
        status = response.status
        response_text = await response.text()

        print(response_text)
        print()

        if status != 200:
            raise Exception(f"Request did not return a 200 status code: {status}")
        return await response.json()


async def get_active_callbacks(session):
    url = "http://0.0.0.0:4000/active/callbacks"
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer sk-1234",
    }

    async with session.get(url, headers=headers) as response:
        status = response.status
        response_text = await response.text()
        print("response from /active/callbacks")
        print(response_text)
        print()

        if status != 200:
            raise Exception(f"Request did not return a 200 status code: {status}")

        _json_response = await response.json()

        _num_callbacks = _json_response["num_callbacks"]
        _num_alerts = _json_response["num_alerting"]
        print("current number of callbacks: ", _num_callbacks)
        print("current number of alerts: ", _num_alerts)
        return _num_callbacks, _num_alerts


async def get_current_routing_strategy(session):
    url = "http://0.0.0.0:4000/get/config/callbacks"
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer sk-1234",
    }

    async with session.get(url, headers=headers) as response:
        status = response.status
        response_text = await response.text()
        print(response_text)
        print()

        if status != 200:
            raise Exception(f"Request did not return a 200 status code: {status}")

        _json_response = await response.json()
        print("JSON response: ", _json_response)

        router_settings = _json_response["router_settings"]
        print("Router settings: ", router_settings)
        routing_strategy = router_settings["routing_strategy"]
        return routing_strategy


@pytest.mark.asyncio
async def test_check_num_callbacks():
    """
    Test 1: num callbacks should NOT increase over time
    -> check current callbacks
    -> sleep for 30s
    -> check current callbacks
    -> sleep for 30s
    -> check current callbacks
    """
    import uuid

    async with aiohttp.ClientSession() as session:
        await asyncio.sleep(30)
        num_callbacks_1, _ = await get_active_callbacks(session=session)
        assert num_callbacks_1 > 0
        await asyncio.sleep(30)

        num_callbacks_2, _ = await get_active_callbacks(session=session)

        assert num_callbacks_1 == num_callbacks_2

        await asyncio.sleep(30)

        num_callbacks_3, _ = await get_active_callbacks(session=session)

        assert num_callbacks_1 == num_callbacks_2 == num_callbacks_3


@pytest.mark.asyncio
async def test_check_num_callbacks_on_lowest_latency():
    """
    Test 1: num callbacks should NOT increase over time
    -> Update to lowest latency
    -> check current callbacks
    -> sleep for 30s
    -> check current callbacks
    -> sleep for 30s
    -> check current callbacks
    -> update back to original routing-strategy
    """
    import uuid

    async with aiohttp.ClientSession() as session:
        await asyncio.sleep(30)

        original_routing_strategy = await get_current_routing_strategy(session=session)
        await config_update(session=session, routing_strategy="latency-based-routing")

        num_callbacks_1, num_alerts_1 = await get_active_callbacks(session=session)

        await asyncio.sleep(30)

        num_callbacks_2, num_alerts_2 = await get_active_callbacks(session=session)

        assert num_callbacks_1 == num_callbacks_2

        await asyncio.sleep(30)

        num_callbacks_3, num_alerts_3 = await get_active_callbacks(session=session)

        assert num_callbacks_1 == num_callbacks_2 == num_callbacks_3

        assert num_alerts_1 == num_alerts_2 == num_alerts_3

        await config_update(session=session, routing_strategy=original_routing_strategy)
@@ -438,6 +438,7 @@ async def get_spend_logs(session, request_id):
        return await response.json()


+@pytest.mark.skip(reason="Hanging on ci/cd")
 @pytest.mark.asyncio
 async def test_key_info_spend_values():
     """
File diff suppressed because one or more lines are too long
@@ -1 +0,0 @@
self.__BUILD_MANIFEST={__rewrites:{afterFiles:[],beforeFiles:[],fallback:[]},"/_error":["static/chunks/pages/_error-d6107f1aac0c574c.js"],sortedPages:["/_app","/_error"]},self.__BUILD_MANIFEST_CB&&self.__BUILD_MANIFEST_CB();
@@ -1 +0,0 @@
self.__SSG_MANIFEST=new Set([]);self.__SSG_MANIFEST_CB&&self.__SSG_MANIFEST_CB()
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
@@ -0,0 +1 @@
(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[185],{87421:function(n,e,t){Promise.resolve().then(t.t.bind(t,99646,23)),Promise.resolve().then(t.t.bind(t,63385,23))},63385:function(){},99646:function(n){n.exports={style:{fontFamily:"'__Inter_c23dc8', '__Inter_Fallback_c23dc8'",fontStyle:"normal"},className:"__className_c23dc8"}}},function(n){n.O(0,[971,69,744],function(){return n(n.s=87421)}),_N_E=n.O()}]);
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
@@ -0,0 +1 @@
(self.webpackChunk_N_E=self.webpackChunk_N_E||[]).push([[744],{32028:function(e,n,t){Promise.resolve().then(t.t.bind(t,47690,23)),Promise.resolve().then(t.t.bind(t,48955,23)),Promise.resolve().then(t.t.bind(t,5613,23)),Promise.resolve().then(t.t.bind(t,11902,23)),Promise.resolve().then(t.t.bind(t,31778,23)),Promise.resolve().then(t.t.bind(t,77831,23))}},function(e){var n=function(n){return e(e.s=n)};e.O(0,[971,69],function(){return n(35317),n(32028)}),_N_E=e.O()}]);
@@ -0,0 +1 @@
!function(){"use strict";var e,t,n,r,o,u,i,c,f,a={},l={};function d(e){var t=l[e];if(void 0!==t)return t.exports;var n=l[e]={id:e,loaded:!1,exports:{}},r=!0;try{a[e](n,n.exports,d),r=!1}finally{r&&delete l[e]}return n.loaded=!0,n.exports}d.m=a,e=[],d.O=function(t,n,r,o){if(n){o=o||0;for(var u=e.length;u>0&&e[u-1][2]>o;u--)e[u]=e[u-1];e[u]=[n,r,o];return}for(var i=1/0,u=0;u<e.length;u++){for(var n=e[u][0],r=e[u][1],o=e[u][2],c=!0,f=0;f<n.length;f++)i>=o&&Object.keys(d.O).every(function(e){return d.O[e](n[f])})?n.splice(f--,1):(c=!1,o<i&&(i=o));if(c){e.splice(u--,1);var a=r();void 0!==a&&(t=a)}}return t},d.n=function(e){var t=e&&e.__esModule?function(){return e.default}:function(){return e};return d.d(t,{a:t}),t},n=Object.getPrototypeOf?function(e){return Object.getPrototypeOf(e)}:function(e){return e.__proto__},d.t=function(e,r){if(1&r&&(e=this(e)),8&r||"object"==typeof e&&e&&(4&r&&e.__esModule||16&r&&"function"==typeof e.then))return e;var o=Object.create(null);d.r(o);var u={};t=t||[null,n({}),n([]),n(n)];for(var i=2&r&&e;"object"==typeof i&&!~t.indexOf(i);i=n(i))Object.getOwnPropertyNames(i).forEach(function(t){u[t]=function(){return e[t]}});return u.default=function(){return e},d.d(o,u),o},d.d=function(e,t){for(var n in t)d.o(t,n)&&!d.o(e,n)&&Object.defineProperty(e,n,{enumerable:!0,get:t[n]})},d.f={},d.e=function(e){return Promise.all(Object.keys(d.f).reduce(function(t,n){return d.f[n](e,t),t},[]))},d.u=function(e){},d.miniCssF=function(e){return"static/css/00c2ddbcd01819c0.css"},d.g=function(){if("object"==typeof globalThis)return globalThis;try{return this||Function("return this")()}catch(e){if("object"==typeof window)return window}}(),d.o=function(e,t){return Object.prototype.hasOwnProperty.call(e,t)},r={},o="_N_E:",d.l=function(e,t,n,u){if(r[e]){r[e].push(t);return}if(void 0!==n)for(var i,c,f=document.getElementsByTagName("script"),a=0;a<f.length;a++){var l=f[a];if(l.getAttribute("src")==e||l.getAttribute("data-webpack")==o+n){i=l;break}}i||(c=!0,(i=document.createElement("script")).charset="utf-8",i.timeout=120,d.nc&&i.setAttribute("nonce",d.nc),i.setAttribute("data-webpack",o+n),i.src=d.tu(e)),r[e]=[t];var s=function(t,n){i.onerror=i.onload=null,clearTimeout(p);var o=r[e];if(delete r[e],i.parentNode&&i.parentNode.removeChild(i),o&&o.forEach(function(e){return e(n)}),t)return t(n)},p=setTimeout(s.bind(null,void 0,{type:"timeout",target:i}),12e4);i.onerror=s.bind(null,i.onerror),i.onload=s.bind(null,i.onload),c&&document.head.appendChild(i)},d.r=function(e){"undefined"!=typeof Symbol&&Symbol.toStringTag&&Object.defineProperty(e,Symbol.toStringTag,{value:"Module"}),Object.defineProperty(e,"__esModule",{value:!0})},d.nmd=function(e){return e.paths=[],e.children||(e.children=[]),e},d.tt=function(){return void 0===u&&(u={createScriptURL:function(e){return e}},"undefined"!=typeof trustedTypes&&trustedTypes.createPolicy&&(u=trustedTypes.createPolicy("nextjs#bundler",u))),u},d.tu=function(e){return d.tt().createScriptURL(e)},d.p="/ui/_next/",i={272:0},d.f.j=function(e,t){var n=d.o(i,e)?i[e]:void 0;if(0!==n){if(n)t.push(n[2]);else if(272!=e){var r=new Promise(function(t,r){n=i[e]=[t,r]});t.push(n[2]=r);var o=d.p+d.u(e),u=Error();d.l(o,function(t){if(d.o(i,e)&&(0!==(n=i[e])&&(i[e]=void 0),n)){var r=t&&("load"===t.type?"missing":t.type),o=t&&t.target&&t.target.src;u.message="Loading chunk "+e+" failed.\n("+r+": "+o+")",u.name="ChunkLoadError",u.type=r,u.request=o,n[1](u)}},"chunk-"+e,e)}else i[e]=0}},d.O.j=function(e){return 0===i[e]},c=function(e,t){var 
n,r,o=t[0],u=t[1],c=t[2],f=0;if(o.some(function(e){return 0!==i[e]})){for(n in u)d.o(u,n)&&(d.m[n]=u[n]);if(c)var a=c(d)}for(e&&e(t);f<o.length;f++)r=o[f],d.o(i,r)&&i[r]&&i[r][0](),i[r]=0;return d.O(a)},(f=self.webpackChunk_N_E=self.webpackChunk_N_E||[]).forEach(c.bind(null,0)),f.push=c.bind(null,f.push.bind(f))}();
File diff suppressed because one or more lines are too long
@@ -1,5 +1 @@
<<<<<<< HEAD
<!DOCTYPE html><html id="__next_error__"><head><meta charSet="utf-8"/><meta name="viewport" content="width=device-width, initial-scale=1"/><link rel="preload" as="script" fetchPriority="low" href="/ui/_next/static/chunks/webpack-202e312607f242a1.js" crossorigin=""/><script src="/ui/_next/static/chunks/fd9d1056-dafd44dfa2da140c.js" async="" crossorigin=""></script><script src="/ui/_next/static/chunks/69-e49705773ae41779.js" async="" crossorigin=""></script><script src="/ui/_next/static/chunks/main-app-9b4fb13a7db53edf.js" async="" crossorigin=""></script><title>LiteLLM Dashboard</title><meta name="description" content="LiteLLM Proxy Admin UI"/><link rel="icon" href="/ui/favicon.ico" type="image/x-icon" sizes="16x16"/><meta name="next-size-adjust"/><script src="/ui/_next/static/chunks/polyfills-c67a75d1b6f99dc8.js" crossorigin="" noModule=""></script></head><body><script src="/ui/_next/static/chunks/webpack-202e312607f242a1.js" crossorigin="" async=""></script><script>(self.__next_f=self.__next_f||[]).push([0]);self.__next_f.push([2,null])</script><script>self.__next_f.push([1,"1:HL[\"/ui/_next/static/media/c9a5bc6a7c948fb0-s.p.woff2\",\"font\",{\"crossOrigin\":\"\",\"type\":\"font/woff2\"}]\n2:HL[\"/ui/_next/static/css/00c2ddbcd01819c0.css\",\"style\",{\"crossOrigin\":\"\"}]\n0:\"$L3\"\n"])</script><script>self.__next_f.push([1,"4:I[47690,[],\"\"]\n6:I[77831,[],\"\"]\n7:I[46414,[\"761\",\"static/chunks/761-05f8a8451296476c.js\",\"931\",\"static/chunks/app/page-5a4a198eefedc775.js\"],\"\"]\n8:I[5613,[],\"\"]\n9:I[31778,[],\"\"]\nb:I[48955,[],\"\"]\nc:[]\n"])</script><script>self.__next_f.push([1,"3:[[[\"$\",\"link\",\"0\",{\"rel\":\"stylesheet\",\"href\":\"/ui/_next/static/css/00c2ddbcd01819c0.css\",\"precedence\":\"next\",\"crossOrigin\":\"\"}]],[\"$\",\"$L4\",null,{\"buildId\":\"c5rha8cqAah-saaczjn02\",\"assetPrefix\":\"/ui\",\"initialCanonicalUrl\":\"/\",\"initialTree\":[\"\",{\"children\":[\"__PAGE__\",{}]},\"$undefined\",\"$undefined\",true],\"initialSeedData\":[\"\",{\"children\":[\"__PAGE__\",{},[\"$L5\",[\"$\",\"$L6\",null,{\"propsForComponent\":{\"params\":{}},\"Component\":\"$7\",\"isStaticGeneration\":true}],null]]},[null,[\"$\",\"html\",null,{\"lang\":\"en\",\"children\":[\"$\",\"body\",null,{\"className\":\"__className_c23dc8\",\"children\":[\"$\",\"$L8\",null,{\"parallelRouterKey\":\"children\",\"segmentPath\":[\"children\"],\"loading\":\"$undefined\",\"loadingStyles\":\"$undefined\",\"loadingScripts\":\"$undefined\",\"hasLoading\":false,\"error\":\"$undefined\",\"errorStyles\":\"$undefined\",\"errorScripts\":\"$undefined\",\"template\":[\"$\",\"$L9\",null,{}],\"templateStyles\":\"$undefined\",\"templateScripts\":\"$undefined\",\"notFound\":[[\"$\",\"title\",null,{\"children\":\"404: This page could not be found.\"}],[\"$\",\"div\",null,{\"style\":{\"fontFamily\":\"system-ui,\\\"Segoe UI\\\",Roboto,Helvetica,Arial,sans-serif,\\\"Apple Color Emoji\\\",\\\"Segoe UI Emoji\\\"\",\"height\":\"100vh\",\"textAlign\":\"center\",\"display\":\"flex\",\"flexDirection\":\"column\",\"alignItems\":\"center\",\"justifyContent\":\"center\"},\"children\":[\"$\",\"div\",null,{\"children\":[[\"$\",\"style\",null,{\"dangerouslySetInnerHTML\":{\"__html\":\"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}\"}}],[\"$\",\"h1\",null,{\"className\":\"next-error-h1\",\"style\":{\"display\":\"inline-block\",\"margin\":\"0 20px 0 
0\",\"padding\":\"0 23px 0 0\",\"fontSize\":24,\"fontWeight\":500,\"verticalAlign\":\"top\",\"lineHeight\":\"49px\"},\"children\":\"404\"}],[\"$\",\"div\",null,{\"style\":{\"display\":\"inline-block\"},\"children\":[\"$\",\"h2\",null,{\"style\":{\"fontSize\":14,\"fontWeight\":400,\"lineHeight\":\"49px\",\"margin\":0},\"children\":\"This page could not be found.\"}]}]]}]}]],\"notFoundStyles\":[],\"styles\":null}]}]}],null]],\"initialHead\":[false,\"$La\"],\"globalErrorComponent\":\"$b\",\"missingSlots\":\"$Wc\"}]]\n"])</script><script>self.__next_f.push([1,"a:[[\"$\",\"meta\",\"0\",{\"name\":\"viewport\",\"content\":\"width=device-width, initial-scale=1\"}],[\"$\",\"meta\",\"1\",{\"charSet\":\"utf-8\"}],[\"$\",\"title\",\"2\",{\"children\":\"LiteLLM Dashboard\"}],[\"$\",\"meta\",\"3\",{\"name\":\"description\",\"content\":\"LiteLLM Proxy Admin UI\"}],[\"$\",\"link\",\"4\",{\"rel\":\"icon\",\"href\":\"/ui/favicon.ico\",\"type\":\"image/x-icon\",\"sizes\":\"16x16\"}],[\"$\",\"meta\",\"5\",{\"name\":\"next-size-adjust\"}]]\n5:null\n"])</script><script>self.__next_f.push([1,""])</script></body></html>
=======
<!DOCTYPE html><html id="__next_error__"><head><meta charSet="utf-8"/><meta name="viewport" content="width=device-width, initial-scale=1"/><link rel="preload" as="script" fetchPriority="low" href="/ui/_next/static/chunks/webpack-65a932b4e8bd8abb.js" crossorigin=""/><script src="/ui/_next/static/chunks/fd9d1056-dafd44dfa2da140c.js" async="" crossorigin=""></script><script src="/ui/_next/static/chunks/69-e49705773ae41779.js" async="" crossorigin=""></script><script src="/ui/_next/static/chunks/main-app-096338c8e1915716.js" async="" crossorigin=""></script><title>LiteLLM Dashboard</title><meta name="description" content="LiteLLM Proxy Admin UI"/><link rel="icon" href="/ui/favicon.ico" type="image/x-icon" sizes="16x16"/><meta name="next-size-adjust"/><script src="/ui/_next/static/chunks/polyfills-c67a75d1b6f99dc8.js" crossorigin="" noModule=""></script></head><body><script src="/ui/_next/static/chunks/webpack-65a932b4e8bd8abb.js" crossorigin="" async=""></script><script>(self.__next_f=self.__next_f||[]).push([0]);self.__next_f.push([2,null])</script><script>self.__next_f.push([1,"1:HL[\"/ui/_next/static/media/c9a5bc6a7c948fb0-s.p.woff2\",\"font\",{\"crossOrigin\":\"\",\"type\":\"font/woff2\"}]\n2:HL[\"/ui/_next/static/css/9f51f0573c6b0365.css\",\"style\",{\"crossOrigin\":\"\"}]\n0:\"$L3\"\n"])</script><script>self.__next_f.push([1,"4:I[47690,[],\"\"]\n6:I[77831,[],\"\"]\n7:I[46414,[\"386\",\"static/chunks/386-d811195b597a2122.js\",\"931\",\"static/chunks/app/page-e0ee34389254cdf2.js\"],\"\"]\n8:I[5613,[],\"\"]\n9:I[31778,[],\"\"]\nb:I[48955,[],\"\"]\nc:[]\n"])</script><script>self.__next_f.push([1,"3:[[[\"$\",\"link\",\"0\",{\"rel\":\"stylesheet\",\"href\":\"/ui/_next/static/css/9f51f0573c6b0365.css\",\"precedence\":\"next\",\"crossOrigin\":\"\"}]],[\"$\",\"$L4\",null,{\"buildId\":\"dWGL92c5LzTMn7XX6utn2\",\"assetPrefix\":\"/ui\",\"initialCanonicalUrl\":\"/\",\"initialTree\":[\"\",{\"children\":[\"__PAGE__\",{}]},\"$undefined\",\"$undefined\",true],\"initialSeedData\":[\"\",{\"children\":[\"__PAGE__\",{},[\"$L5\",[\"$\",\"$L6\",null,{\"propsForComponent\":{\"params\":{}},\"Component\":\"$7\",\"isStaticGeneration\":true}],null]]},[null,[\"$\",\"html\",null,{\"lang\":\"en\",\"children\":[\"$\",\"body\",null,{\"className\":\"__className_12bbc4\",\"children\":[\"$\",\"$L8\",null,{\"parallelRouterKey\":\"children\",\"segmentPath\":[\"children\"],\"loading\":\"$undefined\",\"loadingStyles\":\"$undefined\",\"loadingScripts\":\"$undefined\",\"hasLoading\":false,\"error\":\"$undefined\",\"errorStyles\":\"$undefined\",\"errorScripts\":\"$undefined\",\"template\":[\"$\",\"$L9\",null,{}],\"templateStyles\":\"$undefined\",\"templateScripts\":\"$undefined\",\"notFound\":[[\"$\",\"title\",null,{\"children\":\"404: This page could not be found.\"}],[\"$\",\"div\",null,{\"style\":{\"fontFamily\":\"system-ui,\\\"Segoe UI\\\",Roboto,Helvetica,Arial,sans-serif,\\\"Apple Color Emoji\\\",\\\"Segoe UI Emoji\\\"\",\"height\":\"100vh\",\"textAlign\":\"center\",\"display\":\"flex\",\"flexDirection\":\"column\",\"alignItems\":\"center\",\"justifyContent\":\"center\"},\"children\":[\"$\",\"div\",null,{\"children\":[[\"$\",\"style\",null,{\"dangerouslySetInnerHTML\":{\"__html\":\"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}\"}}],[\"$\",\"h1\",null,{\"className\":\"next-error-h1\",\"style\":{\"display\":\"inline-block\",\"margin\":\"0 20px 0 
0\",\"padding\":\"0 23px 0 0\",\"fontSize\":24,\"fontWeight\":500,\"verticalAlign\":\"top\",\"lineHeight\":\"49px\"},\"children\":\"404\"}],[\"$\",\"div\",null,{\"style\":{\"display\":\"inline-block\"},\"children\":[\"$\",\"h2\",null,{\"style\":{\"fontSize\":14,\"fontWeight\":400,\"lineHeight\":\"49px\",\"margin\":0},\"children\":\"This page could not be found.\"}]}]]}]}]],\"notFoundStyles\":[],\"styles\":null}]}]}],null]],\"initialHead\":[false,\"$La\"],\"globalErrorComponent\":\"$b\",\"missingSlots\":\"$Wc\"}]]\n"])</script><script>self.__next_f.push([1,"a:[[\"$\",\"meta\",\"0\",{\"name\":\"viewport\",\"content\":\"width=device-width, initial-scale=1\"}],[\"$\",\"meta\",\"1\",{\"charSet\":\"utf-8\"}],[\"$\",\"title\",\"2\",{\"children\":\"LiteLLM Dashboard\"}],[\"$\",\"meta\",\"3\",{\"name\":\"description\",\"content\":\"LiteLLM Proxy Admin UI\"}],[\"$\",\"link\",\"4\",{\"rel\":\"icon\",\"href\":\"/ui/favicon.ico\",\"type\":\"image/x-icon\",\"sizes\":\"16x16\"}],[\"$\",\"meta\",\"5\",{\"name\":\"next-size-adjust\"}]]\n5:null\n"])</script><script>self.__next_f.push([1,""])</script></body></html>
>>>>>>> 73a7b4f4 (refactor(main.py): trigger new build)
<!DOCTYPE html><html id="__next_error__"><head><meta charSet="utf-8"/><meta name="viewport" content="width=device-width, initial-scale=1"/><link rel="preload" as="script" fetchPriority="low" href="/ui/_next/static/chunks/webpack-202e312607f242a1.js" crossorigin=""/><script src="/ui/_next/static/chunks/fd9d1056-dafd44dfa2da140c.js" async="" crossorigin=""></script><script src="/ui/_next/static/chunks/69-e49705773ae41779.js" async="" crossorigin=""></script><script src="/ui/_next/static/chunks/main-app-9b4fb13a7db53edf.js" async="" crossorigin=""></script><title>LiteLLM Dashboard</title><meta name="description" content="LiteLLM Proxy Admin UI"/><link rel="icon" href="/ui/favicon.ico" type="image/x-icon" sizes="16x16"/><meta name="next-size-adjust"/><script src="/ui/_next/static/chunks/polyfills-c67a75d1b6f99dc8.js" crossorigin="" noModule=""></script></head><body><script src="/ui/_next/static/chunks/webpack-202e312607f242a1.js" crossorigin="" async=""></script><script>(self.__next_f=self.__next_f||[]).push([0]);self.__next_f.push([2,null])</script><script>self.__next_f.push([1,"1:HL[\"/ui/_next/static/media/c9a5bc6a7c948fb0-s.p.woff2\",\"font\",{\"crossOrigin\":\"\",\"type\":\"font/woff2\"}]\n2:HL[\"/ui/_next/static/css/00c2ddbcd01819c0.css\",\"style\",{\"crossOrigin\":\"\"}]\n0:\"$L3\"\n"])</script><script>self.__next_f.push([1,"4:I[47690,[],\"\"]\n6:I[77831,[],\"\"]\n7:I[58854,[\"936\",\"static/chunks/2f6dbc85-17d29013b8ff3da5.js\",\"142\",\"static/chunks/142-11990a208bf93746.js\",\"931\",\"static/chunks/app/page-d9bdfedbff191985.js\"],\"\"]\n8:I[5613,[],\"\"]\n9:I[31778,[],\"\"]\nb:I[48955,[],\"\"]\nc:[]\n"])</script><script>self.__next_f.push([1,"3:[[[\"$\",\"link\",\"0\",{\"rel\":\"stylesheet\",\"href\":\"/ui/_next/static/css/00c2ddbcd01819c0.css\",\"precedence\":\"next\",\"crossOrigin\":\"\"}]],[\"$\",\"$L4\",null,{\"buildId\":\"e55gTzpa2g2-9SwXgA9Uo\",\"assetPrefix\":\"/ui\",\"initialCanonicalUrl\":\"/\",\"initialTree\":[\"\",{\"children\":[\"__PAGE__\",{}]},\"$undefined\",\"$undefined\",true],\"initialSeedData\":[\"\",{\"children\":[\"__PAGE__\",{},[\"$L5\",[\"$\",\"$L6\",null,{\"propsForComponent\":{\"params\":{}},\"Component\":\"$7\",\"isStaticGeneration\":true}],null]]},[null,[\"$\",\"html\",null,{\"lang\":\"en\",\"children\":[\"$\",\"body\",null,{\"className\":\"__className_c23dc8\",\"children\":[\"$\",\"$L8\",null,{\"parallelRouterKey\":\"children\",\"segmentPath\":[\"children\"],\"loading\":\"$undefined\",\"loadingStyles\":\"$undefined\",\"loadingScripts\":\"$undefined\",\"hasLoading\":false,\"error\":\"$undefined\",\"errorStyles\":\"$undefined\",\"errorScripts\":\"$undefined\",\"template\":[\"$\",\"$L9\",null,{}],\"templateStyles\":\"$undefined\",\"templateScripts\":\"$undefined\",\"notFound\":[[\"$\",\"title\",null,{\"children\":\"404: This page could not be found.\"}],[\"$\",\"div\",null,{\"style\":{\"fontFamily\":\"system-ui,\\\"Segoe UI\\\",Roboto,Helvetica,Arial,sans-serif,\\\"Apple Color Emoji\\\",\\\"Segoe UI Emoji\\\"\",\"height\":\"100vh\",\"textAlign\":\"center\",\"display\":\"flex\",\"flexDirection\":\"column\",\"alignItems\":\"center\",\"justifyContent\":\"center\"},\"children\":[\"$\",\"div\",null,{\"children\":[[\"$\",\"style\",null,{\"dangerouslySetInnerHTML\":{\"__html\":\"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid 
rgba(255,255,255,.3)}}\"}}],[\"$\",\"h1\",null,{\"className\":\"next-error-h1\",\"style\":{\"display\":\"inline-block\",\"margin\":\"0 20px 0 0\",\"padding\":\"0 23px 0 0\",\"fontSize\":24,\"fontWeight\":500,\"verticalAlign\":\"top\",\"lineHeight\":\"49px\"},\"children\":\"404\"}],[\"$\",\"div\",null,{\"style\":{\"display\":\"inline-block\"},\"children\":[\"$\",\"h2\",null,{\"style\":{\"fontSize\":14,\"fontWeight\":400,\"lineHeight\":\"49px\",\"margin\":0},\"children\":\"This page could not be found.\"}]}]]}]}]],\"notFoundStyles\":[],\"styles\":null}]}]}],null]],\"initialHead\":[false,\"$La\"],\"globalErrorComponent\":\"$b\",\"missingSlots\":\"$Wc\"}]]\n"])</script><script>self.__next_f.push([1,"a:[[\"$\",\"meta\",\"0\",{\"name\":\"viewport\",\"content\":\"width=device-width, initial-scale=1\"}],[\"$\",\"meta\",\"1\",{\"charSet\":\"utf-8\"}],[\"$\",\"title\",\"2\",{\"children\":\"LiteLLM Dashboard\"}],[\"$\",\"meta\",\"3\",{\"name\":\"description\",\"content\":\"LiteLLM Proxy Admin UI\"}],[\"$\",\"link\",\"4\",{\"rel\":\"icon\",\"href\":\"/ui/favicon.ico\",\"type\":\"image/x-icon\",\"sizes\":\"16x16\"}],[\"$\",\"meta\",\"5\",{\"name\":\"next-size-adjust\"}]]\n5:null\n"])</script><script>self.__next_f.push([1,""])</script></body></html>
@@ -1,14 +1,7 @@
2:I[77831,[],""]
<<<<<<< HEAD
3:I[46414,["761","static/chunks/761-05f8a8451296476c.js","931","static/chunks/app/page-5a4a198eefedc775.js"],""]
3:I[58854,["936","static/chunks/2f6dbc85-17d29013b8ff3da5.js","142","static/chunks/142-11990a208bf93746.js","931","static/chunks/app/page-d9bdfedbff191985.js"],""]
4:I[5613,[],""]
5:I[31778,[],""]
0:["c5rha8cqAah-saaczjn02",[[["",{"children":["__PAGE__",{}]},"$undefined","$undefined",true],["",{"children":["__PAGE__",{},["$L1",["$","$L2",null,{"propsForComponent":{"params":{}},"Component":"$3","isStaticGeneration":true}],null]]},[null,["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_c23dc8","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"loading":"$undefined","loadingStyles":"$undefined","loadingScripts":"$undefined","hasLoading":false,"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[],"styles":null}]}]}],null]],[[["$","link","0",{"rel":"stylesheet","href":"/ui/_next/static/css/00c2ddbcd01819c0.css","precedence":"next","crossOrigin":""}]],"$L6"]]]]
=======
3:I[46414,["386","static/chunks/386-d811195b597a2122.js","931","static/chunks/app/page-e0ee34389254cdf2.js"],""]
4:I[5613,[],""]
5:I[31778,[],""]
0:["dWGL92c5LzTMn7XX6utn2",[[["",{"children":["__PAGE__",{}]},"$undefined","$undefined",true],["",{"children":["__PAGE__",{},["$L1",["$","$L2",null,{"propsForComponent":{"params":{}},"Component":"$3","isStaticGeneration":true}],null]]},[null,["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_12bbc4","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"loading":"$undefined","loadingStyles":"$undefined","loadingScripts":"$undefined","hasLoading":false,"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[],"styles":null}]}]}],null]],[[["$","link","0",{"rel":"stylesheet","href":"/ui/_next/static/css/9f51f0573c6b0365.css","precedence":"next","crossOrigin":""}]],"$L6"]]]]
>>>>>>> 73a7b4f4 (refactor(main.py): trigger new build)
0:["e55gTzpa2g2-9SwXgA9Uo",[[["",{"children":["__PAGE__",{}]},"$undefined","$undefined",true],["",{"children":["__PAGE__",{},["$L1",["$","$L2",null,{"propsForComponent":{"params":{}},"Component":"$3","isStaticGeneration":true}],null]]},[null,["$","html",null,{"lang":"en","children":["$","body",null,{"className":"__className_c23dc8","children":["$","$L4",null,{"parallelRouterKey":"children","segmentPath":["children"],"loading":"$undefined","loadingStyles":"$undefined","loadingScripts":"$undefined","hasLoading":false,"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L5",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[],"styles":null}]}]}],null]],[[["$","link","0",{"rel":"stylesheet","href":"/ui/_next/static/css/00c2ddbcd01819c0.css","precedence":"next","crossOrigin":""}]],"$L6"]]]]
6:[["$","meta","0",{"name":"viewport","content":"width=device-width, initial-scale=1"}],["$","meta","1",{"charSet":"utf-8"}],["$","title","2",{"children":"LiteLLM Dashboard"}],["$","meta","3",{"name":"description","content":"LiteLLM Proxy Admin UI"}],["$","link","4",{"rel":"icon","href":"/ui/favicon.ico","type":"image/x-icon","sizes":"16x16"}],["$","meta","5",{"name":"next-size-adjust"}]]
1:null
@@ -16,8 +16,8 @@ import {
   AccordionHeader,
   AccordionBody,
 } from "@tremor/react";
-import { TabPanel, TabPanels, TabGroup, TabList, Tab, TextInput, Icon } from "@tremor/react";
-import { Select, SelectItem, MultiSelect, MultiSelectItem } from "@tremor/react";
+import { TabPanel, TabPanels, TabGroup, TabList, Tab, TextInput, Icon, DateRangePicker } from "@tremor/react";
+import { Select, SelectItem, MultiSelect, MultiSelectItem, DateRangePickerValue } from "@tremor/react";
 import { modelInfoCall, userGetRequesedtModelsCall, modelCreateCall, Model, modelCostMap, modelDeleteCall, healthCheckCall, modelUpdateCall, modelMetricsCall, modelExceptionsCall, modelMetricsSlowResponsesCall } from "./networking";
 import { BarChart, AreaChart } from "@tremor/react";
 import {
@@ -206,6 +206,10 @@ const ModelDashboard: React.FC<ModelDashboardProps> = ({
   const [allExceptions, setAllExceptions] = useState<any[]>([]);
   const [failureTableData, setFailureTableData] = useState<any[]>([]);
   const [slowResponsesData, setSlowResponsesData] = useState<any[]>([]);
+  const [dateValue, setDateValue] = useState<DateRangePickerValue>({
+    from: new Date(Date.now() - 7 * 24 * 60 * 60 * 1000),
+    to: new Date(),
+  });

   const EditModelModal: React.FC<EditModelModalProps> = ({ visible, onCancel, model, onSubmit }) => {
     const [form] = Form.useForm();
@@ -454,11 +458,25 @@ const handleEditSubmit = async (formValues: Record<string, any>) => {

       setAvailableModelGroups(_array_model_groups);

+      console.log("array_model_groups:", _array_model_groups)
+      let _initial_model_group = "all"
+      if (_array_model_groups.length > 0) {
+        // set selectedModelGroup to the last model group
+        _initial_model_group = _array_model_groups[_array_model_groups.length - 1];
+        console.log("_initial_model_group:", _initial_model_group)
+        setSelectedModelGroup(_initial_model_group);
+      }
+
+      console.log("selectedModelGroup:", selectedModelGroup)
+
       const modelMetricsResponse = await modelMetricsCall(
         accessToken,
         userID,
         userRole,
-        null
+        _initial_model_group,
+        dateValue.from?.toISOString(),
+        dateValue.to?.toISOString()
       );

       console.log("Model metrics response:", modelMetricsResponse);
@@ -473,7 +491,9 @@ const handleEditSubmit = async (formValues: Record<string, any>) => {
         accessToken,
         userID,
         userRole,
-        null
+        _initial_model_group,
+        dateValue.from?.toISOString(),
+        dateValue.to?.toISOString()
       )
       console.log("Model exceptions response:", modelExceptionsResponse);
       setModelExceptions(modelExceptionsResponse.data);
@@ -484,7 +504,9 @@ const handleEditSubmit = async (formValues: Record<string, any>) => {
         accessToken,
         userID,
         userRole,
-        null
+        _initial_model_group,
+        dateValue.from?.toISOString(),
+        dateValue.to?.toISOString()
       )

       console.log("slowResponses:", slowResponses)
@@ -492,40 +514,6 @@ const handleEditSubmit = async (formValues: Record<string, any>) => {
       setSlowResponsesData(slowResponses);


-      // let modelMetricsData = modelMetricsResponse.data;
-      // let successdeploymentToSuccess: Record<string, number> = {};
-      // for (let i = 0; i < modelMetricsData.length; i++) {
-      //   let element = modelMetricsData[i];
-      //   let _model_name = element.model;
-      //   let _num_requests = element.num_requests;
-      //   successdeploymentToSuccess[_model_name] = _num_requests
-      // }
-      // console.log("successdeploymentToSuccess:", successdeploymentToSuccess)
-
-      // let failureTableData = [];
-      // let _failureData = modelExceptionsResponse.data;
-      // for (let i = 0; i < _failureData.length; i++) {
-      //   const model = _failureData[i];
-      //   let _model_name = model.model;
-      //   let total_exceptions = model.total_exceptions;
-      //   let total_Requests = successdeploymentToSuccess[_model_name];
-      //   if (total_Requests == null) {
-      //     total_Requests = 0
-      //   }
-      //   let _data = {
-      //     model: _model_name,
-      //     total_exceptions: total_exceptions,
-      //     total_Requests: total_Requests,
-      //     failure_rate: total_Requests / total_exceptions
-      //   }
-      //   failureTableData.push(_data);
-      //   // sort failureTableData by failure_rate
-      //   failureTableData.sort((a, b) => b.failure_rate - a.failure_rate);
-
-      //   setFailureTableData(failureTableData);
-      //   console.log("failureTableData:", failureTableData);
-      // }
-
     } catch (error) {
       console.error("There was an error fetching the model data", error);
     }
@@ -678,16 +666,17 @@ const handleEditSubmit = async (formValues: Record<string, any>) => {
   };


-  const updateModelMetrics = async (modelGroup: string | null) => {
+  const updateModelMetrics = async (modelGroup: string | null, startTime: Date | undefined, endTime: Date | undefined) => {
     console.log("Updating model metrics for group:", modelGroup);
-    if (!accessToken || !userID || !userRole) {
+    if (!accessToken || !userID || !userRole || !startTime || !endTime) {
       return
     }
+    console.log("inside updateModelMetrics - startTime:", startTime, "endTime:", endTime)
     setSelectedModelGroup(modelGroup); // If you want to store the selected model group in state


     try {
-      const modelMetricsResponse = await modelMetricsCall(accessToken, userID, userRole, modelGroup);
+      const modelMetricsResponse = await modelMetricsCall(accessToken, userID, userRole, modelGroup, startTime.toISOString(), endTime.toISOString());
       console.log("Model metrics response:", modelMetricsResponse);

       // Assuming modelMetricsResponse now contains the metric data for the specified model group
@@ -698,7 +687,9 @@ const handleEditSubmit = async (formValues: Record<string, any>) => {
         accessToken,
         userID,
         userRole,
-        modelGroup
+        modelGroup,
+        startTime.toISOString(),
+        endTime.toISOString()
       )
       console.log("Model exceptions response:", modelExceptionsResponse);
       setModelExceptions(modelExceptionsResponse.data);
@@ -709,7 +700,9 @@ const handleEditSubmit = async (formValues: Record<string, any>) => {
         accessToken,
         userID,
         userRole,
-        modelGroup
+        modelGroup,
+        startTime.toISOString(),
+        endTime.toISOString()
       )

       console.log("slowResponses:", slowResponses)
@@ -1118,21 +1111,48 @@ const handleEditSubmit = async (formValues: Record<string, any>) => {
             </Card>
           </TabPanel>
           <TabPanel>
-            <p style={{fontSize: '0.85rem', color: '#808080'}}>View how requests were load balanced within a model group</p>
+            {/* <p style={{fontSize: '0.85rem', color: '#808080'}}>View how requests were load balanced within a model group</p> */}
+
+            <Grid numItems={2} className="mt-2">
+              <Col>
+                <Text>Select Time Range</Text>
+                <DateRangePicker
+                  enableSelect={true}
+                  value={dateValue}
+                  onValueChange={(value) => {
+                    setDateValue(value);
+                    updateModelMetrics(selectedModelGroup, value.from, value.to); // Call updateModelMetrics with the new date range
+                  }}
+                />
+              </Col>
+              <Col>
                 <Text>Select Model Group</Text>
                 <Select
                   className="mb-4 mt-2"
-                  defaultValue={selectedModelGroup? selectedModelGroup : availableModelGroups[0]}
+                  value={selectedModelGroup ? selectedModelGroup : availableModelGroups[0]}
                 >
                   {availableModelGroups.map((group, idx) => (
                     <SelectItem
                       key={idx}
                       value={group}
-                      onClick={() => updateModelMetrics(group)}
+                      onClick={() => updateModelMetrics(group, dateValue.from, dateValue.to)}
                     >
                       {group}
                     </SelectItem>
                   ))}
                 </Select>
+              </Col>
+            </Grid>

             <Grid numItems={2}>
               <Col>
                 <Card className="mr-2 max-h-[400px] min-h-[400px]">
@@ -441,6 +441,8 @@ export const modelMetricsCall = async (
   userID: String,
   userRole: String,
   modelGroup: String | null,
+  startTime: String | undefined,
+  endTime: String | undefined
 ) => {
   /**
    * Get all models on proxy
@@ -448,7 +450,7 @@ export const modelMetricsCall = async (
   try {
     let url = proxyBaseUrl ? `${proxyBaseUrl}/model/metrics` : `/model/metrics`;
     if (modelGroup) {
-      url = `${url}?_selected_model_group=${modelGroup}`
+      url = `${url}?_selected_model_group=${modelGroup}&startTime=${startTime}&endTime=${endTime}`
     }
     // message.info("Requesting model data");
     const response = await fetch(url, {
@@ -481,6 +483,8 @@ export const modelMetricsSlowResponsesCall = async (
   userID: String,
   userRole: String,
   modelGroup: String | null,
+  startTime: String | undefined,
+  endTime: String | undefined
 ) => {
   /**
    * Get all models on proxy
@@ -488,8 +492,9 @@ export const modelMetricsSlowResponsesCall = async (
   try {
     let url = proxyBaseUrl ? `${proxyBaseUrl}/model/metrics/slow_responses` : `/model/metrics/slow_responses`;
     if (modelGroup) {
-      url = `${url}?_selected_model_group=${modelGroup}`
+      url = `${url}?_selected_model_group=${modelGroup}&startTime=${startTime}&endTime=${endTime}`
     }

     // message.info("Requesting model data");
     const response = await fetch(url, {
       method: "GET",
@@ -520,6 +525,8 @@ export const modelExceptionsCall = async (
   userID: String,
   userRole: String,
   modelGroup: String | null,
+  startTime: String | undefined,
+  endTime: String | undefined
 ) => {
   /**
    * Get all models on proxy
@@ -527,6 +534,9 @@ export const modelExceptionsCall = async (
   try {
     let url = proxyBaseUrl ? `${proxyBaseUrl}/model/metrics/exceptions` : `/model/metrics/exceptions`;

+    if (modelGroup) {
+      url = `${url}?_selected_model_group=${modelGroup}&startTime=${startTime}&endTime=${endTime}`
+    }
     const response = await fetch(url, {
       method: "GET",
       headers: {
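For reference, a minimal sketch (assuming a locally running proxy on port 4000 with the `sk-1234` master key used elsewhere in these tests) of the request the updated helpers now issue, including the new `startTime`/`endTime` query parameters; the model group value is hypothetical:

```
import asyncio
from datetime import datetime, timedelta

import aiohttp


async def fetch_model_metrics():
    headers = {"Authorization": "Bearer sk-1234"}
    end = datetime.utcnow()
    start = end - timedelta(days=7)  # mirrors the dashboard's default 7-day range
    params = {
        "_selected_model_group": "all",  # hypothetical model group
        "startTime": start.isoformat(),
        "endTime": end.isoformat(),
    }
    async with aiohttp.ClientSession() as session:
        async with session.get(
            "http://0.0.0.0:4000/model/metrics", headers=headers, params=params
        ) as resp:
            print(resp.status, await resp.json())


asyncio.run(fetch_model_metrics())
```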
@@ -106,7 +106,8 @@ const Settings: React.FC<SettingsPageProps> = ({
     "llm_exceptions": "LLM Exceptions",
     "llm_too_slow": "LLM Responses Too Slow",
     "llm_requests_hanging": "LLM Requests Hanging",
-    "budget_alerts": "Budget Alerts (API Keys, Users)"
+    "budget_alerts": "Budget Alerts (API Keys, Users)",
+    "db_exceptions": "Database Exceptions (Read/Write)",
   }

   useEffect(() => {