LiteLLM Minor Fixes & Improvements (11/13/2024) (#6729)

* fix(utils.py): add logprobs support for together ai Fixes https://github.com/BerriAI/litellm/issues/6724 * feat(pass_through_endpoints/): add anthropic/ pass-through endpoint adds new `anthropic/` pass-through endpoint + refactors docs * feat(spend_management_endpoints.py): allow /global/spend/report to query team + customer id enables seeing spend for a customer in a team * Add integration with MLflow Tracing (#6147) * Add MLflow logger Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> * Streaming handling Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> * lint Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> * address comments and fix issues Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> * address comments and fix issues Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> * Move logger construction code Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> * Add docs Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> * async handlers Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> * new picture Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> --------- Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> * fix(mlflow.py): fix ruff linting errors * ci(config.yml): add mlflow to ci testing * fix: fix test * test: fix test * Litellm key update fix (#6710) * fix(caching): convert arg to equivalent kwargs in llm caching handler prevent unexpected errors * fix(caching_handler.py): don't pass args to caching * fix(caching): remove all *args from caching.py * fix(caching): consistent function signatures + abc method * test(caching_unit_tests.py): add unit tests for llm caching ensures coverage for common caching scenarios across different implementations * refactor(litellm_logging.py): move to using cache key from hidden params instead of regenerating one * fix(router.py): drop redis password requirement * fix(proxy_server.py): fix faulty slack alerting check * fix(langfuse.py): avoid copying functions/thread lock objects in metadata fixes metadata copy error when parent otel span in metadata * test: update test * fix(key_management_endpoints.py): fix /key/update with metadata update * fix(key_management_endpoints.py): fix key_prepare_update helper * fix(key_management_endpoints.py): reset value to none if set in key update * fix: update test ' * Litellm dev 11 11 2024 (#6693) * fix(__init__.py): add 'watsonx_text' as mapped llm api route Fixes https://github.com/BerriAI/litellm/issues/6663 * fix(opentelemetry.py): fix passing parallel tool calls to otel Fixes https://github.com/BerriAI/litellm/issues/6677 * refactor(test_opentelemetry_unit_tests.py): create a base set of unit tests for all logging integrations - test for parallel tool call handling reduces bugs in repo * fix(__init__.py): update provider-model mapping to include all known provider-model mappings Fixes https://github.com/BerriAI/litellm/issues/6669 * feat(anthropic): support passing document in llm api call * docs(anthropic.md): add pdf anthropic call to docs + expose new 'supports_pdf_input' function * fix(factory.py): fix linting error * add clear doc string for GCS bucket logging * Add docs to export logs to Laminar (#6674) * Add docs to export logs to Laminar * minor fix: newline at end of file * place laminar after http and grpc * (Feat) Add langsmith key based logging (#6682) * add langsmith_api_key to StandardCallbackDynamicParams * create a file for langsmith types * langsmith add key / team based logging * add key based logging for langsmith * fix langsmith key based logging * fix linting langsmith * remove NOQA violation * add unit test coverage for all helpers in test langsmith * test_langsmith_key_based_logging * docs langsmith key based logging * run langsmith tests in logging callback tests * fix logging testing * test_langsmith_key_based_logging * test_add_callback_via_key_litellm_pre_call_utils_langsmith * add debug statement langsmith key based logging * test_langsmith_key_based_logging * (fix) OpenAI's optional messages[].name does not work with Mistral API (#6701) * use helper for _transform_messages mistral * add test_message_with_name to base LLMChat test * fix linting * add xAI on Admin UI (#6680) * (docs) add benchmarks on 1K RPS (#6704) * docs litellm proxy benchmarks * docs GCS bucket * doc fix - reduce clutter on logging doc title * (feat) add cost tracking stable diffusion 3 on Bedrock (#6676) * add cost tracking for sd3 * test_image_generation_bedrock * fix get model info for image cost * add cost_calculator for stability 1 models * add unit testing for bedrock image cost calc * test_cost_calculator_with_no_optional_params * add test_cost_calculator_basic * correctly allow size Optional * fix cost_calculator * sd3 unit tests cost calc * fix raise correct error 404 when /key/info is called on non-existent key (#6653) * fix raise correct error on /key/info * add not_found_error error * fix key not found in DB error * use 1 helper for checking token hash * fix error code on key info * fix test key gen prisma * test_generate_and_call_key_info * test fix test_call_with_valid_model_using_all_models * fix key info tests * bump: version 1.52.4 → 1.52.5 * add defaults used for GCS logging * LiteLLM Minor Fixes & Improvements (11/12/2024) (#6705) * fix(caching): convert arg to equivalent kwargs in llm caching handler prevent unexpected errors * fix(caching_handler.py): don't pass args to caching * fix(caching): remove all *args from caching.py * fix(caching): consistent function signatures + abc method * test(caching_unit_tests.py): add unit tests for llm caching ensures coverage for common caching scenarios across different implementations * refactor(litellm_logging.py): move to using cache key from hidden params instead of regenerating one * fix(router.py): drop redis password requirement * fix(proxy_server.py): fix faulty slack alerting check * fix(langfuse.py): avoid copying functions/thread lock objects in metadata fixes metadata copy error when parent otel span in metadata * test: update test * bump: version 1.52.5 → 1.52.6 * (feat) helm hook to sync db schema (#6715) * v0 migration job * fix job * fix migrations job.yml * handle standalone DB on helm hook * fix argo cd annotations * fix db migration helm hook * fix migration job * doc fix Using Http/2 with Hypercorn * (fix proxy redis) Add redis sentinel support (#6154) * add sentinel_password support * add doc for setting redis sentinel password * fix redis sentinel - use sentinel password * Fix: Update gpt-4o costs to that of gpt-4o-2024-08-06 (#6714) Fixes #6713 * (fix) using Anthropic `response_format={"type": "json_object"}` (#6721) * add support for response_format=json anthropic * add test_json_response_format to baseLLM ChatTest * fix test_litellm_anthropic_prompt_caching_tools * fix test_anthropic_function_call_with_no_schema * test test_create_json_tool_call_for_response_format * (feat) Add cost tracking for Azure Dall-e-3 Image Generation + use base class to ensure basic image generation tests pass (#6716) * add BaseImageGenTest * use 1 class for unit testing * add debugging to BaseImageGenTest * TestAzureOpenAIDalle3 * fix response_cost_calculator * test_basic_image_generation * fix img gen basic test * fix _select_model_name_for_cost_calc * fix test_aimage_generation_bedrock_with_optional_params * fix undo changes cost tracking * fix response_cost_calculator * fix test_cost_azure_gpt_35 * fix remove dup test (#6718) * (build) update db helm hook * (build) helm db pre sync hook * (build) helm db sync hook * test: run test_team_logging firdst --------- Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: Dinmukhamed Mailibay <47117969+dinmukhamedm@users.noreply.github.com> Co-authored-by: Kilian Lieret <kilian.lieret@posteo.de> * test: update test * test: skip anthropic overloaded error * test: cleanup test * test: update tests * test: fix test * test: handle gemini overloaded model error * test: handle internal server error * test: handle anthropic overloaded error * test: handle claude instability --------- Signed-off-by: B-Step62 <yuki.watanabe@databricks.com> Co-authored-by: Yuki Watanabe <31463517+B-Step62@users.noreply.github.com> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: Dinmukhamed Mailibay <47117969+dinmukhamedm@users.noreply.github.com> Co-authored-by: Kilian Lieret <kilian.lieret@posteo.de>
2024-11-15 11:18:31 +05:30 · 2024-11-15 11:18:31 +05:30 · 3beecfb0d4
commit 3beecfb0d4
parent 3f8a9167ae
23 changed files with 874 additions and 97 deletions
--- a/.circleci/config.yml
+++ b/.circleci/config.yml
@ -690,6 +690,7 @@ jobs:
            pip install "respx==0.21.1"
            pip install "google-generativeai==0.3.2"
            pip install "google-cloud-aiplatform==1.43.0"
+            pip install "mlflow==2.17.2"
      # Run pytest and generate JUnit XML report
      - run:
          name: Run tests
--- a/README.md
+++ b/README.md
@ -113,7 +113,7 @@ for part in response:

 ## Logging Observability ([Docs](https://docs.litellm.ai/docs/observability/callbacks))

-LiteLLM exposes pre defined callbacks to send data to Lunary, Langfuse, DynamoDB, s3 Buckets, Helicone, Promptlayer, Traceloop, Athina, Slack
+LiteLLM exposes pre defined callbacks to send data to Lunary, Langfuse, DynamoDB, s3 Buckets, Helicone, Promptlayer, Traceloop, Athina, Slack, MLflow

 ```python
 from litellm import completion
--- a/docs/my-website/docs/anthropic_completion.md
+++ b/docs/my-website/docs/anthropic_completion.md
@ -1,54 +0,0 @@
-# [BETA] Anthropic `/v1/messages`
-
-Call 100+ LLMs in the Anthropic format. 
-
-
-1. Setup config.yaml 
-
-```yaml
-model_list:
-  - model_name: my-test-model
-    litellm_params:
-      model: gpt-3.5-turbo
-```
-
-2. Start proxy 
-
-```bash
-litellm --config /path/to/config.yaml
-```
-
-3. Test it! 
-
-```bash
-curl -X POST 'http://0.0.0.0:4000/v1/messages' \
-H 'x-api-key: sk-1234' \
-H 'content-type: application/json' \
-D '{
-    "model": "my-test-model",
-    "max_tokens": 1024,
-    "messages": [
-        {"role": "user", "content": "Hello, world"}
-    ]
-}'
-```
-
-## Test with Anthropic SDK 
-
-```python
-import os
-from anthropic import Anthropic
-
-client = Anthropic(api_key="sk-1234", base_url="http://0.0.0.0:4000") # 👈 CONNECT TO PROXY
-
-message = client.messages.create(
-    messages=[
-        {
-            "role": "user",
-            "content": "Hello, Claude",
-        }
-    ],
-    model="my-test-model", # 👈 set 'model_name'
-)
-print(message.content)
-```
--- a/docs/my-website/docs/observability/mlflow.md
+++ b/docs/my-website/docs/observability/mlflow.md
@ -0,0 +1,108 @@
+# MLflow
+
+## What is MLflow?
+
+**MLflow** is an end-to-end open source MLOps platform for [experiment tracking](https://www.mlflow.org/docs/latest/tracking.html), [model management](https://www.mlflow.org/docs/latest/models.html), [evaluation](https://www.mlflow.org/docs/latest/llms/llm-evaluate/index.html), [observability (tracing)](https://www.mlflow.org/docs/latest/llms/tracing/index.html), and [deployment](https://www.mlflow.org/docs/latest/deployment/index.html). MLflow empowers teams to collaboratively develop and refine LLM applications efficiently.
+
+MLflow’s integration with LiteLLM supports advanced observability compatible with OpenTelemetry.
+
+
+<Image img={require('../../img/mlflow_tracing.png')} />
+
+
+## Getting Started
+
+Install MLflow:
+
+```shell
+pip install mlflow
+```
+
+To enable LiteLLM tracing:
+
+```python
+import mlflow
+
+mlflow.litellm.autolog()
+
+# Alternative, you can set the callback manually in LiteLLM
+# litellm.callbacks = ["mlflow"]
+```
+
+Since MLflow is open-source, no sign-up or API key is needed to log traces!
+
+```
+import litellm
+import os
+
+# Set your LLM provider's API key
+os.environ["OPENAI_API_KEY"] = ""
+
+# Call LiteLLM as usual
+response = litellm.completion(
+    model="gpt-4o-mini",
+    messages=[
+      {"role": "user", "content": "Hi 👋 - i'm openai"}
+    ]
+)
+```
+
+Open the MLflow UI and go to the `Traces` tab to view logged traces:
+
+```bash
+mlflow ui
+```
+
+## Exporting Traces to OpenTelemetry collectors
+
+MLflow traces are compatible with OpenTelemetry. You can export traces to any OpenTelemetry collector (e.g., Jaeger, Zipkin, Datadog, New Relic) by setting the endpoint URL in the environment variables.
+
+```
+# Set the endpoint of the OpenTelemetry Collector
+os.environ["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"] = "http://localhost:4317/v1/traces"
+# Optionally, set the service name to group traces
+os.environ["OTEL_SERVICE_NAME"] = "<your-service-name>"
+```
+
+See [MLflow documentation](https://mlflow.org/docs/latest/llms/tracing/index.html#using-opentelemetry-collector-for-exporting-traces) for more details.
+
+## Combine LiteLLM Trace with Your Application Trace
+
+LiteLLM is often part of larger LLM applications, such as agentic models. MLflow Tracing allows you to instrument custom Python code, which can then be combined with LiteLLM traces.
+
+```python
+import litellm
+import mlflow
+from mlflow.entities import SpanType
+
+# Enable LiteLLM tracing
+mlflow.litellm.autolog()
+
+
+class CustomAgent:
+    # Use @mlflow.trace to instrument Python functions.
+    @mlflow.trace(span_type=SpanType.AGENT)
+    def run(self, query: str):
+        # do something
+
+        while i < self.max_turns:
+            response = litellm.completion(
+                model="gpt-4o-mini",
+                messages=messages,
+            )
+
+            action = self.get_action(response)
+            ...
+
+    @mlflow.trace
+    def get_action(llm_response):
+        ...
+```
+
+This approach generates a unified trace, combining your custom Python code with LiteLLM calls.
+
+
+## Support
+
+* For advanced usage and integrations of tracing, visit the [MLflow Tracing documentation](https://mlflow.org/docs/latest/llms/tracing/index.html).
+* For any question or issue with this integration, please [submit an issue](https://github.com/mlflow/mlflow/issues/new/choose) on our [Github](https://github.com/mlflow/mlflow) repository!
--- a/docs/my-website/docs/pass_through/anthropic_completion.md
+++ b/docs/my-website/docs/pass_through/anthropic_completion.md
@ -0,0 +1,282 @@
+# Anthropic `/v1/messages`
+
+Pass-through endpoints for Anthropic - call provider-specific endpoint, in native format (no translation).
+
+Just replace `https://api.anthropic.com` with `LITELLM_PROXY_BASE_URL/anthropic` 🚀
+
+#### **Example Usage**
+```bash
+curl --request POST \
+  --url http://0.0.0.0:4000/anthropic/v1/messages \
+  --header 'accept: application/json' \
+  --header 'content-type: application/json' \
+  --header "Authorization: bearer sk-anything" \
+  --data '{
+        "model": "claude-3-5-sonnet-20241022",
+        "max_tokens": 1024,
+        "messages": [
+            {"role": "user", "content": "Hello, world"}
+        ]
+    }'
+```
+
+Supports **ALL** Anthropic Endpoints (including streaming).
+
+[**See All Anthropic Endpoints**](https://docs.anthropic.com/en/api/messages)
+
+## Quick Start
+
+Let's call the Anthropic [`/messages` endpoint](https://docs.anthropic.com/en/api/messages)
+
+1. Add Anthropic API Key to your environment 
+
+```bash
+export ANTHROPIC_API_KEY=""
+```
+
+2. Start LiteLLM Proxy 
+
+```bash
+litellm
+
+# RUNNING on http://0.0.0.0:4000
+```
+
+3. Test it! 
+
+Let's call the Anthropic /messages endpoint
+
+```bash
+curl http://0.0.0.0:4000/anthropic/v1/messages \
+     --header "x-api-key: $LITELLM_API_KEY" \
+     --header "anthropic-version: 2023-06-01" \
+     --header "content-type: application/json" \
+     --data \
+    '{
+        "model": "claude-3-5-sonnet-20241022",
+        "max_tokens": 1024,
+        "messages": [
+            {"role": "user", "content": "Hello, world"}
+        ]
+    }'
+```
+
+
+## Examples
+
+Anything after `http://0.0.0.0:4000/anthropic` is treated as a provider-specific route, and handled accordingly.
+
+Key Changes: 
+
+| **Original Endpoint**                                | **Replace With**                  |
+|------------------------------------------------------|-----------------------------------|
+| `https://api.anthropic.com`          | `http://0.0.0.0:4000/anthropic` (LITELLM_PROXY_BASE_URL="http://0.0.0.0:4000")      |
+| `bearer $ANTHROPIC_API_KEY`                                 | `bearer anything` (use `bearer LITELLM_VIRTUAL_KEY` if Virtual Keys are setup on proxy)                    |
+    
+
+### **Example 1: Messages endpoint**
+
+#### LiteLLM Proxy Call 
+
+```bash
+curl --request POST \
+  --url http://0.0.0.0:4000/anthropic/v1/messages \
+  --header "x-api-key: $LITELLM_API_KEY" \
+    --header "anthropic-version: 2023-06-01" \
+    --header "content-type: application/json" \
+  --data '{
+    "model": "claude-3-5-sonnet-20241022",
+    "max_tokens": 1024,
+    "messages": [
+        {"role": "user", "content": "Hello, world"}
+    ]
+  }'
+```
+
+#### Direct Anthropic API Call 
+
+```bash
+curl https://api.anthropic.com/v1/messages \
+     --header "x-api-key: $ANTHROPIC_API_KEY" \
+     --header "anthropic-version: 2023-06-01" \
+     --header "content-type: application/json" \
+     --data \
+    '{
+        "model": "claude-3-5-sonnet-20241022",
+        "max_tokens": 1024,
+        "messages": [
+            {"role": "user", "content": "Hello, world"}
+        ]
+    }'
+```
+
+### **Example 2: Token Counting API**
+
+#### LiteLLM Proxy Call 
+
+```bash
+curl --request POST \
+    --url http://0.0.0.0:4000/anthropic/v1/messages/count_tokens \
+    --header "x-api-key: $LITELLM_API_KEY" \
+    --header "anthropic-version: 2023-06-01" \
+    --header "anthropic-beta: token-counting-2024-11-01" \
+    --header "content-type: application/json" \
+    --data \
+    '{
+        "model": "claude-3-5-sonnet-20241022",
+        "messages": [
+            {"role": "user", "content": "Hello, world"}
+        ]
+    }'
+```
+
+#### Direct Anthropic API Call 
+
+```bash
+curl https://api.anthropic.com/v1/messages/count_tokens \
+     --header "x-api-key: $ANTHROPIC_API_KEY" \
+     --header "anthropic-version: 2023-06-01" \
+     --header "anthropic-beta: token-counting-2024-11-01" \
+     --header "content-type: application/json" \
+     --data \
+'{
+    "model": "claude-3-5-sonnet-20241022",
+    "messages": [
+        {"role": "user", "content": "Hello, world"}
+    ]
+}'
+```
+
+### **Example 3: Batch Messages**
+
+
+#### LiteLLM Proxy Call 
+
+```bash
+curl --request POST \
+    --url http://0.0.0.0:4000/anthropic/v1/messages/batches \
+    --header "x-api-key: $LITELLM_API_KEY" \
+    --header "anthropic-version: 2023-06-01" \
+    --header "anthropic-beta: message-batches-2024-09-24" \
+    --header "content-type: application/json" \
+    --data \
+'{
+    "requests": [
+        {
+            "custom_id": "my-first-request",
+            "params": {
+                "model": "claude-3-5-sonnet-20241022",
+                "max_tokens": 1024,
+                "messages": [
+                    {"role": "user", "content": "Hello, world"}
+                ]
+            }
+        },
+        {
+            "custom_id": "my-second-request",
+            "params": {
+                "model": "claude-3-5-sonnet-20241022",
+                "max_tokens": 1024,
+                "messages": [
+                    {"role": "user", "content": "Hi again, friend"}
+                ]
+            }
+        }
+    ]
+}'
+```
+
+#### Direct Anthropic API Call 
+
+```bash
+curl https://api.anthropic.com/v1/messages/batches \
+     --header "x-api-key: $ANTHROPIC_API_KEY" \
+     --header "anthropic-version: 2023-06-01" \
+     --header "anthropic-beta: message-batches-2024-09-24" \
+     --header "content-type: application/json" \
+     --data \
+'{
+    "requests": [
+        {
+            "custom_id": "my-first-request",
+            "params": {
+                "model": "claude-3-5-sonnet-20241022",
+                "max_tokens": 1024,
+                "messages": [
+                    {"role": "user", "content": "Hello, world"}
+                ]
+            }
+        },
+        {
+            "custom_id": "my-second-request",
+            "params": {
+                "model": "claude-3-5-sonnet-20241022",
+                "max_tokens": 1024,
+                "messages": [
+                    {"role": "user", "content": "Hi again, friend"}
+                ]
+            }
+        }
+    ]
+}'
+```
+
+
+## Advanced - Use with Virtual Keys 
+
+Pre-requisites
+- [Setup proxy with DB](../proxy/virtual_keys.md#setup)
+
+Use this, to avoid giving developers the raw Anthropic API key, but still letting them use Anthropic endpoints.
+
+### Usage
+
+1. Setup environment
+
+```bash
+export DATABASE_URL=""
+export LITELLM_MASTER_KEY=""
+export COHERE_API_KEY=""
+```
+
+```bash
+litellm
+
+# RUNNING on http://0.0.0.0:4000
+```
+
+2. Generate virtual key 
+
+```bash
+curl -X POST 'http://0.0.0.0:4000/key/generate' \
+-H 'Authorization: Bearer sk-1234' \
+-H 'Content-Type: application/json' \
+-d '{}'
+```
+
+Expected Response 
+
+```bash
+{
+    ...
+    "key": "sk-1234ewknldferwedojwojw"
+}
+```
+
+3. Test it! 
+
+
+```bash
+curl --request POST \
+  --url http://0.0.0.0:4000/anthropic/v1/messages \
+  --header 'accept: application/json' \
+  --header 'content-type: application/json' \
+  --header "Authorization: bearer sk-1234ewknldferwedojwojw" \
+  --data '{
+    "model": "claude-3-5-sonnet-20241022",
+    "max_tokens": 1024,
+    "messages": [
+        {"role": "user", "content": "Hello, world"}
+    ]
+  }'
+```
--- a/docs/my-website/img/mlflow_tracing.png
+++ b/docs/my-website/img/mlflow_tracing.png
--- a/docs/my-website/sidebars.js
+++ b/docs/my-website/sidebars.js
@ -65,12 +65,12 @@ const sidebars = {
        },
        {
          type: "category",
-          label: "Use with Provider SDKs",
+          label: "Pass-through Endpoints (Provider-specific)",
          items: [
            "pass_through/vertex_ai",
            "pass_through/google_ai_studio",
            "pass_through/cohere",
-            "anthropic_completion",
+            "pass_through/anthropic_completion",
            "pass_through/bedrock",
            "pass_through/langfuse"
          ],
--- a/litellm/init.py
+++ b/litellm/init.py
@ -57,6 +57,7 @@ _custom_logger_compatible_callbacks_literal = Literal[
    "gcs_bucket",
    "opik",
    "argilla",
+    "mlflow",
 ]
 logged_real_time_event_types: Optional[Union[List[str], Literal["*"]]] = None
 _known_custom_logger_compatible_callbacks: List = list(
--- a/litellm/integrations/mlflow.py
+++ b/litellm/integrations/mlflow.py
@ -0,0 +1,247 @@
+import json
+import threading
+from typing import Optional
+
+from litellm._logging import verbose_logger
+from litellm.integrations.custom_logger import CustomLogger
+
+
+class MlflowLogger(CustomLogger):
+    def __init__(self):
+        from mlflow.tracking import MlflowClient
+
+        self._client = MlflowClient()
+
+        self._stream_id_to_span = {}
+        self._lock = threading.Lock()  # lock for _stream_id_to_span
+
+    def log_success_event(self, kwargs, response_obj, start_time, end_time):
+        self._handle_success(kwargs, response_obj, start_time, end_time)
+
+    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
+        self._handle_success(kwargs, response_obj, start_time, end_time)
+
+    def _handle_success(self, kwargs, response_obj, start_time, end_time):
+        """
+        Log the success event as an MLflow span.
+        Note that this method is called asynchronously in the background thread.
+        """
+        from mlflow.entities import SpanStatusCode
+
+        try:
+            verbose_logger.debug("MLflow logging start for success event")
+
+            if kwargs.get("stream"):
+                self._handle_stream_event(kwargs, response_obj, start_time, end_time)
+            else:
+                span = self._start_span_or_trace(kwargs, start_time)
+                end_time_ns = int(end_time.timestamp() * 1e9)
+                self._end_span_or_trace(
+                    span=span,
+                    outputs=response_obj,
+                    status=SpanStatusCode.OK,
+                    end_time_ns=end_time_ns,
+                )
+        except Exception:
+            verbose_logger.debug("MLflow Logging Error", stack_info=True)
+
+    def log_failure_event(self, kwargs, response_obj, start_time, end_time):
+        self._handle_failure(kwargs, response_obj, start_time, end_time)
+
+    async def async_log_failure_event(self, kwargs, response_obj, start_time, end_time):
+        self._handle_failure(kwargs, response_obj, start_time, end_time)
+
+    def _handle_failure(self, kwargs, response_obj, start_time, end_time):
+        """
+        Log the failure event as an MLflow span.
+        Note that this method is called *synchronously* unlike the success handler.
+        """
+        from mlflow.entities import SpanEvent, SpanStatusCode
+
+        try:
+            span = self._start_span_or_trace(kwargs, start_time)
+
+            end_time_ns = int(end_time.timestamp() * 1e9)
+
+            # Record exception info as event
+            if exception := kwargs.get("exception"):
+                span.add_event(SpanEvent.from_exception(exception))
+
+            self._end_span_or_trace(
+                span=span,
+                outputs=response_obj,
+                status=SpanStatusCode.ERROR,
+                end_time_ns=end_time_ns,
+            )
+
+        except Exception as e:
+            verbose_logger.debug(f"MLflow Logging Error - {e}", stack_info=True)
+
+    def _handle_stream_event(self, kwargs, response_obj, start_time, end_time):
+        """
+        Handle the success event for a streaming response. For streaming calls,
+        log_success_event handle is triggered for every chunk of the stream.
+        We create a single span for the entire stream request as follows:
+
+        1. For the first chunk, start a new span and store it in the map.
+        2. For subsequent chunks, add the chunk as an event to the span.
+        3. For the final chunk, end the span and remove the span from the map.
+        """
+        from mlflow.entities import SpanStatusCode
+
+        litellm_call_id = kwargs.get("litellm_call_id")
+
+        if litellm_call_id not in self._stream_id_to_span:
+            with self._lock:
+                # Check again after acquiring lock
+                if litellm_call_id not in self._stream_id_to_span:
+                    # Start a new span for the first chunk of the stream
+                    span = self._start_span_or_trace(kwargs, start_time)
+                    self._stream_id_to_span[litellm_call_id] = span
+
+        # Add chunk as event to the span
+        span = self._stream_id_to_span[litellm_call_id]
+        self._add_chunk_events(span, response_obj)
+
+        # If this is the final chunk, end the span. The final chunk
+        # has complete_streaming_response that gathers the full response.
+        if final_response := kwargs.get("complete_streaming_response"):
+            end_time_ns = int(end_time.timestamp() * 1e9)
+            self._end_span_or_trace(
+                span=span,
+                outputs=final_response,
+                status=SpanStatusCode.OK,
+                end_time_ns=end_time_ns,
+            )
+
+            # Remove the stream_id from the map
+            with self._lock:
+                self._stream_id_to_span.pop(litellm_call_id)
+
+    def _add_chunk_events(self, span, response_obj):
+        from mlflow.entities import SpanEvent
+
+        try:
+            for choice in response_obj.choices:
+                span.add_event(
+                    SpanEvent(
+                        name="streaming_chunk",
+                        attributes={"delta": json.dumps(choice.delta.model_dump())},
+                    )
+                )
+        except Exception:
+            verbose_logger.debug("Error adding chunk events to span", stack_info=True)
+
+    def _construct_input(self, kwargs):
+        """Construct span inputs with optional parameters"""
+        inputs = {"messages": kwargs.get("messages")}
+        for key in ["functions", "tools", "stream", "tool_choice", "user"]:
+            if value := kwargs.get("optional_params", {}).pop(key, None):
+                inputs[key] = value
+        return inputs
+
+    def _extract_attributes(self, kwargs):
+        """
+        Extract span attributes from kwargs.
+
+        With the latest version of litellm, the standard_logging_object contains
+        canonical information for logging. If it is not present, we extract
+        subset of attributes from other kwargs.
+        """
+        attributes = {
+            "litellm_call_id": kwargs.get("litellm_call_id"),
+            "call_type": kwargs.get("call_type"),
+            "model": kwargs.get("model"),
+        }
+        standard_obj = kwargs.get("standard_logging_object")
+        if standard_obj:
+            attributes.update(
+                {
+                    "api_base": standard_obj.get("api_base"),
+                    "cache_hit": standard_obj.get("cache_hit"),
+                    "usage": {
+                        "completion_tokens": standard_obj.get("completion_tokens"),
+                        "prompt_tokens": standard_obj.get("prompt_tokens"),
+                        "total_tokens": standard_obj.get("total_tokens"),
+                    },
+                    "raw_llm_response": standard_obj.get("response"),
+                    "response_cost": standard_obj.get("response_cost"),
+                    "saved_cache_cost": standard_obj.get("saved_cache_cost"),
+                }
+            )
+        else:
+            litellm_params = kwargs.get("litellm_params", {})
+            attributes.update(
+                {
+                    "model": kwargs.get("model"),
+                    "cache_hit": kwargs.get("cache_hit"),
+                    "custom_llm_provider": kwargs.get("custom_llm_provider"),
+                    "api_base": litellm_params.get("api_base"),
+                    "response_cost": kwargs.get("response_cost"),
+                }
+            )
+        return attributes
+
+    def _get_span_type(self, call_type: Optional[str]) -> str:
+        from mlflow.entities import SpanType
+
+        if call_type in ["completion", "acompletion"]:
+            return SpanType.LLM
+        elif call_type == "embeddings":
+            return SpanType.EMBEDDING
+        else:
+            return SpanType.LLM
+
+    def _start_span_or_trace(self, kwargs, start_time):
+        """
+        Start an MLflow span or a trace.
+
+        If there is an active span, we start a new span as a child of
+        that span. Otherwise, we start a new trace.
+        """
+        import mlflow
+
+        call_type = kwargs.get("call_type", "completion")
+        span_name = f"litellm-{call_type}"
+        span_type = self._get_span_type(call_type)
+        start_time_ns = int(start_time.timestamp() * 1e9)
+
+        inputs = self._construct_input(kwargs)
+        attributes = self._extract_attributes(kwargs)
+
+        if active_span := mlflow.get_current_active_span():  # type: ignore
+            return self._client.start_span(
+                name=span_name,
+                request_id=active_span.request_id,
+                parent_id=active_span.span_id,
+                span_type=span_type,
+                inputs=inputs,
+                attributes=attributes,
+                start_time_ns=start_time_ns,
+            )
+        else:
+            return self._client.start_trace(
+                name=span_name,
+                span_type=span_type,
+                inputs=inputs,
+                attributes=attributes,
+                start_time_ns=start_time_ns,
+            )
+
+    def _end_span_or_trace(self, span, outputs, end_time_ns, status):
+        """End an MLflow span or a trace."""
+        if span.parent_id is None:
+            self._client.end_trace(
+                request_id=span.request_id,
+                outputs=outputs,
+                status=status,
+                end_time_ns=end_time_ns,
+            )
+        else:
+            self._client.end_span(
+                request_id=span.request_id,
+                span_id=span.span_id,
+                outputs=outputs,
+                status=status,
+                end_time_ns=end_time_ns,
+            )
--- a/litellm/litellm_core_utils/get_supported_openai_params.py
+++ b/litellm/litellm_core_utils/get_supported_openai_params.py
@ -161,17 +161,7 @@ def get_supported_openai_params(  # noqa: PLR0915
    elif custom_llm_provider == "huggingface":
        return litellm.HuggingfaceConfig().get_supported_openai_params()
    elif custom_llm_provider == "together_ai":
-        return [
-            "stream",
-            "temperature",
-            "max_tokens",
-            "top_p",
-            "stop",
-            "frequency_penalty",
-            "tools",
-            "tool_choice",
-            "response_format",
-        ]
+        return litellm.TogetherAIConfig().get_supported_openai_params(model=model)
    elif custom_llm_provider == "ai21":
        return [
            "stream",
--- a/litellm/litellm_core_utils/litellm_logging.py
+++ b/litellm/litellm_core_utils/litellm_logging.py
@ -28,6 +28,7 @@ from litellm.caching.caching_handler import LLMCachingHandler
 from litellm.cost_calculator import _select_model_name_for_cost_calc
 from litellm.integrations.custom_guardrail import CustomGuardrail
 from litellm.integrations.custom_logger import CustomLogger
+from litellm.integrations.mlflow import MlflowLogger
 from litellm.litellm_core_utils.redact_messages import (
    redact_message_input_output_from_custom_logger,
    redact_message_input_output_from_logging,
@ -563,6 +564,7 @@ class Logging:
                            message=f"Model Call Details pre-call: {details_to_log}",
                            level="info",
                        )
+
                    elif isinstance(callback, CustomLogger):  # custom logger class
                        callback.log_pre_api_call(
                            model=self.model,
@ -1258,6 +1260,7 @@ class Logging:
                                end_time=end_time,
                                print_verbose=print_verbose,
                            )
+
                    if (
                        callback == "openmeter"
                        and self.model_call_details.get("litellm_params", {}).get(
@ -2347,6 +2350,14 @@ def _init_custom_logger_compatible_class(  # noqa: PLR0915
        _in_memory_loggers.append(_otel_logger)
        return _otel_logger  # type: ignore

+    elif logging_integration == "mlflow":
+        for callback in _in_memory_loggers:
+            if isinstance(callback, MlflowLogger):
+                return callback  # type: ignore
+
+        _mlflow_logger = MlflowLogger()
+        _in_memory_loggers.append(_mlflow_logger)
+        return _mlflow_logger  # type: ignore

 def get_custom_logger_compatible_class(
    logging_integration: litellm._custom_logger_compatible_callbacks_literal,
@ -2448,6 +2459,12 @@ def get_custom_logger_compatible_class(
                and callback.callback_name == "langtrace"
            ):
                return callback
+
+    elif logging_integration == "mlflow":
+        for callback in _in_memory_loggers:
+            if isinstance(callback, MlflowLogger):
+                return callback
+
    return None


--- a/litellm/llms/together_ai/chat.py
+++ b/litellm/llms/together_ai/chat.py
@ -6,8 +6,8 @@ Calls done in OpenAI/openai.py as TogetherAI is openai-compatible.
 Docs: https://docs.together.ai/reference/completions-1
 """

-from ..OpenAI.openai import OpenAIConfig
+from ..OpenAI.chat.gpt_transformation import OpenAIGPTConfig


-class TogetherAIConfig(OpenAIConfig):
+class TogetherAIConfig(OpenAIGPTConfig):
    pass
--- a/litellm/proxy/proxy_server.py
+++ b/litellm/proxy/proxy_server.py
@ -1069,7 +1069,7 @@ async def update_cache(  # noqa: PLR0915
    end_user_id: Optional[str],
    team_id: Optional[str],
    response_cost: Optional[float],
-    parent_otel_span: Optional[Span],
+    parent_otel_span: Optional[Span],  # type: ignore
 ):
    """
    Use this to update the cache with new user spend.
@ -5657,6 +5657,13 @@ async def anthropic_response(  # noqa: PLR0915
    request: Request,
    user_api_key_dict: UserAPIKeyAuth = Depends(user_api_key_auth),
 ):
+    """
+    This is a BETA endpoint that calls 100+ LLMs in the anthropic format.
+
+    To do a simple pass-through for anthropic, do `{PROXY_BASE_URL}/anthropic/v1/messages`
+
+    Docs - https://docs.litellm.ai/docs/anthropic_completion
+    """
    from litellm import adapter_completion
    from litellm.adapters.anthropic_adapter import anthropic_adapter

--- a/litellm/proxy/spend_tracking/spend_management_endpoints.py
+++ b/litellm/proxy/spend_tracking/spend_management_endpoints.py
@ -9,6 +9,9 @@ import litellm
 from litellm._logging import verbose_proxy_logger
 from litellm.proxy._types import *
 from litellm.proxy.auth.user_api_key_auth import user_api_key_auth
+from litellm.proxy.spend_tracking.spend_tracking_utils import (
+    get_spend_by_team_and_customer,
+)

 router = APIRouter()

@ -932,6 +935,14 @@ async def get_global_spend_report(
        default=None,
        description="View spend for a specific internal_user_id. Example internal_user_id='1234",
    ),
+    team_id: Optional[str] = fastapi.Query(
+        default=None,
+        description="View spend for a specific team_id. Example team_id='1234",
+    ),
+    customer_id: Optional[str] = fastapi.Query(
+        default=None,
+        description="View spend for a specific customer_id. Example customer_id='1234. Can be used in conjunction with team_id as well.",
+    ),
 ):
    """
    Get Daily Spend per Team, based on specific startTime and endTime. Per team, view usage by each key, model
@ -1074,8 +1085,12 @@ async def get_global_spend_report(
                return []

            return db_response
-
+        elif team_id is not None and customer_id is not None:
+            return await get_spend_by_team_and_customer(
+                start_date_obj, end_date_obj, team_id, customer_id, prisma_client
+            )
        if group_by == "team":
+
            # first get data from spend logs -> SpendByModelApiKey
            # then read data from "SpendByModelApiKey" to format the response obj
            sql_query = """
@ -1305,7 +1320,6 @@ async def global_get_all_tag_names():
    "/global/spend/tags",
    tags=["Budget & Spend Tracking"],
    dependencies=[Depends(user_api_key_auth)],
-    include_in_schema=False,
    responses={
        200: {"model": List[LiteLLM_SpendLogs]},
    },
--- a/litellm/proxy/spend_tracking/spend_tracking_utils.py
+++ b/litellm/proxy/spend_tracking/spend_tracking_utils.py
@ -1,7 +1,9 @@
+import datetime
 import json
 import os
 import secrets
 import traceback
+from datetime import datetime as dt
 from typing import Optional

 from pydantic import BaseModel
@ -9,7 +11,7 @@ from pydantic import BaseModel
 import litellm
 from litellm._logging import verbose_proxy_logger
 from litellm.proxy._types import SpendLogsMetadata, SpendLogsPayload
-from litellm.proxy.utils import hash_token
+from litellm.proxy.utils import PrismaClient, hash_token


 def _is_master_key(api_key: str, _master_key: Optional[str]) -> bool:
@ -163,3 +165,79 @@ def get_logging_payload(
            "Error creating spendlogs object - {}".format(str(e))
        )
        raise e
+
+
+async def get_spend_by_team_and_customer(
+    start_date: dt,
+    end_date: dt,
+    team_id: str,
+    customer_id: str,
+    prisma_client: PrismaClient,
+):
+    sql_query = """
+    WITH SpendByModelApiKey AS (
+        SELECT
+            date_trunc('day', sl."startTime") AS group_by_day,
+            COALESCE(tt.team_alias, 'Unassigned Team') AS team_name,
+            sl.end_user AS customer,
+            sl.model,
+            sl.api_key,
+            SUM(sl.spend) AS model_api_spend,
+            SUM(sl.total_tokens) AS model_api_tokens
+        FROM 
+            "LiteLLM_SpendLogs" sl
+        LEFT JOIN 
+            "LiteLLM_TeamTable" tt 
+        ON 
+            sl.team_id = tt.team_id
+        WHERE
+            sl."startTime" BETWEEN $1::date AND $2::date
+            AND sl.team_id = $3
+            AND sl.end_user = $4
+        GROUP BY
+            date_trunc('day', sl."startTime"),
+            tt.team_alias,
+            sl.end_user,
+            sl.model,
+            sl.api_key
+    )
+        SELECT
+            group_by_day,
+            jsonb_agg(jsonb_build_object(
+                'team_name', team_name,
+                'customer', customer,
+                'total_spend', total_spend,
+                'metadata', metadata
+            )) AS teams_customers
+        FROM (
+            SELECT
+                group_by_day,
+                team_name,
+                customer,
+                SUM(model_api_spend) AS total_spend,
+                jsonb_agg(jsonb_build_object(
+                    'model', model,
+                    'api_key', api_key,
+                    'spend', model_api_spend,
+                    'total_tokens', model_api_tokens
+                )) AS metadata
+            FROM 
+                SpendByModelApiKey
+            GROUP BY
+                group_by_day,
+                team_name,
+                customer
+        ) AS aggregated
+        GROUP BY
+            group_by_day
+        ORDER BY
+            group_by_day;
+    """
+
+    db_response = await prisma_client.db.query_raw(
+        sql_query, start_date, end_date, team_id, customer_id
+    )
+    if db_response is None:
+        return []
+
+    return db_response
--- a/litellm/proxy/vertex_ai_endpoints/google_ai_studio_endpoints.py
+++ b/litellm/proxy/vertex_ai_endpoints/google_ai_studio_endpoints.py
@ -155,6 +155,51 @@ async def cohere_proxy_route(
    return received_value


+@router.api_route(
+    "/anthropic/{endpoint:path}", methods=["GET", "POST", "PUT", "DELETE"]
+)
+async def anthropic_proxy_route(
+    endpoint: str,
+    request: Request,
+    fastapi_response: Response,
+    user_api_key_dict: UserAPIKeyAuth = Depends(user_api_key_auth),
+):
+    base_target_url = "https://api.anthropic.com"
+    encoded_endpoint = httpx.URL(endpoint).path
+
+    # Ensure endpoint starts with '/' for proper URL construction
+    if not encoded_endpoint.startswith("/"):
+        encoded_endpoint = "/" + encoded_endpoint
+
+    # Construct the full target URL using httpx
+    base_url = httpx.URL(base_target_url)
+    updated_url = base_url.copy_with(path=encoded_endpoint)
+
+    # Add or update query parameters
+    anthropic_api_key = litellm.utils.get_secret(secret_name="ANTHROPIC_API_KEY")
+
+    ## check for streaming
+    is_streaming_request = False
+    if "stream" in str(updated_url):
+        is_streaming_request = True
+
+    ## CREATE PASS-THROUGH
+    endpoint_func = create_pass_through_route(
+        endpoint=endpoint,
+        target=str(updated_url),
+        custom_headers={"x-api-key": "{}".format(anthropic_api_key)},
+        _forward_headers=True,
+    )  # dynamically construct pass-through endpoint based on incoming path
+    received_value = await endpoint_func(
+        request,
+        fastapi_response,
+        user_api_key_dict,
+        stream=is_streaming_request,  # type: ignore
+    )
+
+    return received_value
+
+
@router.api_route("/bedrock/{endpoint:path}", methods=["GET", "POST", "PUT", "DELETE"])
 async def bedrock_proxy_route(
    endpoint: str,
--- a/litellm/tests/test_mlflow.py
+++ b/litellm/tests/test_mlflow.py
@ -0,0 +1,29 @@
+import pytest
+
+import litellm
+
+
+def test_mlflow_logging():
+    litellm.success_callback = ["mlflow"]
+    litellm.failure_callback = ["mlflow"]
+
+    litellm.completion(
+        model="gpt-4o-mini",
+        messages=[{"role": "user", "content": "what llm are u"}],
+        max_tokens=10,
+        temperature=0.2,
+        user="test-user",
+    )
+
+@pytest.mark.asyncio()
+async def test_async_mlflow_logging():
+    litellm.success_callback = ["mlflow"]
+    litellm.failure_callback = ["mlflow"]
+
+    await litellm.acompletion(
+        model="gpt-4o-mini",
+        messages=[{"role": "user", "content": "hi test from local arize"}],
+        mock_response="hello",
+        temperature=0.1,
+        user="OTEL_USER",
+    )
--- a/litellm/utils.py
+++ b/litellm/utils.py
@ -2903,24 +2903,16 @@ def get_optional_params(  # noqa: PLR0915
        )
        _check_valid_arg(supported_params=supported_params)

-        if stream:
-            optional_params["stream"] = stream
-        if temperature is not None:
-            optional_params["temperature"] = temperature
-        if top_p is not None:
-            optional_params["top_p"] = top_p
-        if max_tokens is not None:
-            optional_params["max_tokens"] = max_tokens
-        if frequency_penalty is not None:
-            optional_params["frequency_penalty"] = frequency_penalty
-        if stop is not None:
-            optional_params["stop"] = stop
-        if tools is not None:
-            optional_params["tools"] = tools
-        if tool_choice is not None:
-            optional_params["tool_choice"] = tool_choice
-        if response_format is not None:
-            optional_params["response_format"] = response_format
+        optional_params = litellm.TogetherAIConfig().map_openai_params(
+            non_default_params=non_default_params,
+            optional_params=optional_params,
+            model=model,
+            drop_params=(
+                drop_params
+                if drop_params is not None and isinstance(drop_params, bool)
+                else False
+            ),
+        )
    elif custom_llm_provider == "ai21":
        ## check if unsupported param passed in
        supported_params = get_supported_openai_params(
--- a/tests/llm_translation/test_optional_params.py
+++ b/tests/llm_translation/test_optional_params.py
@ -923,6 +923,14 @@ def test_watsonx_text_top_k():
    assert optional_params["top_k"] == 10


+
+def test_together_ai_model_params():
+    optional_params = get_optional_params(
+        model="together_ai", custom_llm_provider="together_ai", logprobs=1
+    )
+    print(optional_params)
+    assert optional_params["logprobs"] == 1
+
 def test_forward_user_param():
    from litellm.utils import get_supported_openai_params, get_optional_params

--- a/tests/local_testing/test_completion.py
+++ b/tests/local_testing/test_completion.py
@ -406,8 +406,13 @@ def test_completion_claude_3_empty_response():
            "content": "I was hoping we could chat a bit",
        },
    ]
-    response = litellm.completion(model="claude-3-opus-20240229", messages=messages)
-    print(response)
+    try:
+        response = litellm.completion(model="claude-3-opus-20240229", messages=messages)
+        print(response)
+    except litellm.InternalServerError as e:
+        pytest.skip(f"InternalServerError - {str(e)}")
+    except Exception as e:
+        pytest.fail(f"Error occurred: {e}")


 def test_completion_claude_3():
@ -434,6 +439,8 @@ def test_completion_claude_3():
        )
        # Add any assertions, here to check response args
        print(response)
+    except litellm.InternalServerError as e:
+        pytest.skip(f"InternalServerError - {str(e)}")
    except Exception as e:
        pytest.fail(f"Error occurred: {e}")

@ -917,6 +924,9 @@ def test_completion_base64(model):
    except litellm.ServiceUnavailableError as e:
        print("got service unavailable error: ", e)
        pass
+    except litellm.InternalServerError as e:
+        print("got internal server error: ", e)
+        pass
    except Exception as e:
        if "500 Internal error encountered.'" in str(e):
            pass
@ -1055,7 +1065,6 @@ def test_completion_mistral_api():
        cost = litellm.completion_cost(completion_response=response)
        print("cost to make mistral completion=", cost)
        assert cost > 0.0
-        assert response.model == "mistral/mistral-tiny"
    except Exception as e:
        pytest.fail(f"Error occurred: {e}")

--- a/tests/local_testing/test_streaming.py
+++ b/tests/local_testing/test_streaming.py
@ -3333,8 +3333,8 @@ async def test_acompletion_function_call_with_streaming(model):
                validate_final_streaming_function_calling_chunk(chunk=chunk)
            idx += 1
        # raise Exception("it worked! ")
-    except litellm.InternalServerError:
-        pass
+    except litellm.InternalServerError as e:
+        pytest.skip(f"InternalServerError - {str(e)}")
    except litellm.ServiceUnavailableError:
        pass
    except Exception as e:
--- a/tests/logging_callback_tests/test_otel_logging.py
+++ b/tests/logging_callback_tests/test_otel_logging.py
@ -144,6 +144,7 @@ def validate_raw_gen_ai_request_openai_streaming(span):
    "model",
    ["anthropic/claude-3-opus-20240229"],
 )
+@pytest.mark.flaky(retries=6, delay=2)
 def test_completion_claude_3_function_call_with_otel(model):
    litellm.set_verbose = True

--- a/tests/logging_callback_tests/test_unit_tests_init_callbacks.py
+++ b/tests/logging_callback_tests/test_unit_tests_init_callbacks.py
@ -31,6 +31,7 @@ from litellm.integrations.datadog.datadog_llm_obs import DataDogLLMObsLogger
 from litellm.integrations.gcs_bucket.gcs_bucket import GCSBucketLogger
 from litellm.integrations.opik.opik import OpikLogger
 from litellm.integrations.opentelemetry import OpenTelemetry
+from litellm.integrations.mlflow import MlflowLogger
 from litellm.integrations.argilla import ArgillaLogger
 from litellm.proxy.hooks.dynamic_rate_limiter import _PROXY_DynamicRateLimitHandler
 from unittest.mock import patch
@ -59,6 +60,7 @@ callback_class_str_to_classType = {
    "logfire": OpenTelemetry,
    "arize": OpenTelemetry,
    "langtrace": OpenTelemetry,
+    "mlflow": MlflowLogger,
 }

 expected_env_vars = {