forked from phoenix/litellm-mirror

Compare commits: main...litellm_de (19 commits)

Commits:
1dcbfda202, fbba9b464c, 2995f3adc2, 50fe6a639d, b92700cc19, ffbdaf868f, dc4235e7f1,
74c2177e5e, 7e05fc846d, 0c4b28225f, 3ee871b7a9, e8faafea42, 0bc7bdd550, a79211c270,
3b1b397160, 82f405adcb, 1e097bbfbe, b873b16f36, b2f1e47104
23 changed files with 874 additions and 97 deletions
@ -690,6 +690,7 @@ jobs:
     pip install "respx==0.21.1"
     pip install "google-generativeai==0.3.2"
     pip install "google-cloud-aiplatform==1.43.0"
+    pip install "mlflow==2.17.2"
     # Run pytest and generate JUnit XML report
 - run:
     name: Run tests
@ -113,7 +113,7 @@ for part in response:

 ## Logging Observability ([Docs](https://docs.litellm.ai/docs/observability/callbacks))

-LiteLLM exposes pre defined callbacks to send data to Lunary, Langfuse, DynamoDB, s3 Buckets, Helicone, Promptlayer, Traceloop, Athina, Slack
+LiteLLM exposes pre defined callbacks to send data to Lunary, Langfuse, DynamoDB, s3 Buckets, Helicone, Promptlayer, Traceloop, Athina, Slack, MLflow

 ```python
 from litellm import completion
@ -1,54 +0,0 @@ (file deleted)

# [BETA] Anthropic `/v1/messages`

Call 100+ LLMs in the Anthropic format.

1. Setup config.yaml

```yaml
model_list:
  - model_name: my-test-model
    litellm_params:
      model: gpt-3.5-turbo
```

2. Start proxy

```bash
litellm --config /path/to/config.yaml
```

3. Test it!

```bash
curl -X POST 'http://0.0.0.0:4000/v1/messages' \
-H 'x-api-key: sk-1234' \
-H 'content-type: application/json' \
-d '{
    "model": "my-test-model",
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "Hello, world"}
    ]
}'
```

## Test with Anthropic SDK

```python
import os
from anthropic import Anthropic

client = Anthropic(api_key="sk-1234", base_url="http://0.0.0.0:4000") # 👈 CONNECT TO PROXY

message = client.messages.create(
    messages=[
        {
            "role": "user",
            "content": "Hello, Claude",
        }
    ],
    model="my-test-model", # 👈 set 'model_name'
)
print(message.content)
```
108  docs/my-website/docs/observability/mlflow.md  (new file)
@ -0,0 +1,108 @@

# MLflow

## What is MLflow?

**MLflow** is an end-to-end open source MLOps platform for [experiment tracking](https://www.mlflow.org/docs/latest/tracking.html), [model management](https://www.mlflow.org/docs/latest/models.html), [evaluation](https://www.mlflow.org/docs/latest/llms/llm-evaluate/index.html), [observability (tracing)](https://www.mlflow.org/docs/latest/llms/tracing/index.html), and [deployment](https://www.mlflow.org/docs/latest/deployment/index.html). MLflow empowers teams to collaboratively develop and refine LLM applications efficiently.

MLflow’s integration with LiteLLM supports advanced observability compatible with OpenTelemetry.

<Image img={require('../../img/mlflow_tracing.png')} />

## Getting Started

Install MLflow:

```shell
pip install mlflow
```

To enable LiteLLM tracing:

```python
import mlflow

mlflow.litellm.autolog()

# Alternatively, you can set the callback manually in LiteLLM
# litellm.callbacks = ["mlflow"]
```

Since MLflow is open-source, no sign-up or API key is needed to log traces!

```python
import litellm
import os

# Set your LLM provider's API key
os.environ["OPENAI_API_KEY"] = ""

# Call LiteLLM as usual
response = litellm.completion(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Hi 👋 - i'm openai"}
    ]
)
```

Open the MLflow UI and go to the `Traces` tab to view logged traces:

```bash
mlflow ui
```

## Exporting Traces to OpenTelemetry collectors

MLflow traces are compatible with OpenTelemetry. You can export traces to any OpenTelemetry collector (e.g., Jaeger, Zipkin, Datadog, New Relic) by setting the endpoint URL in the environment variables.

```python
# Set the endpoint of the OpenTelemetry Collector
os.environ["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"] = "http://localhost:4317/v1/traces"
# Optionally, set the service name to group traces
os.environ["OTEL_SERVICE_NAME"] = "<your-service-name>"
```

See [MLflow documentation](https://mlflow.org/docs/latest/llms/tracing/index.html#using-opentelemetry-collector-for-exporting-traces) for more details.

## Combine LiteLLM Trace with Your Application Trace

LiteLLM is often part of larger LLM applications, such as agentic models. MLflow Tracing allows you to instrument custom Python code, which can then be combined with LiteLLM traces.

```python
import litellm
import mlflow
from mlflow.entities import SpanType

# Enable LiteLLM tracing
mlflow.litellm.autolog()


class CustomAgent:
    # Use @mlflow.trace to instrument Python functions.
    @mlflow.trace(span_type=SpanType.AGENT)
    def run(self, query: str):
        # do something

        while i < self.max_turns:
            response = litellm.completion(
                model="gpt-4o-mini",
                messages=messages,
            )

            action = self.get_action(response)
            ...

    @mlflow.trace
    def get_action(llm_response):
        ...
```

This approach generates a unified trace, combining your custom Python code with LiteLLM calls.

## Support

* For advanced usage and integrations of tracing, visit the [MLflow Tracing documentation](https://mlflow.org/docs/latest/llms/tracing/index.html).
* For any question or issue with this integration, please [submit an issue](https://github.com/mlflow/mlflow/issues/new/choose) on our [Github](https://github.com/mlflow/mlflow) repository!
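For orientation, a minimal end-to-end sketch combining the pieces the new doc describes (illustrative only, not part of the file above): autologging, one LiteLLM call, and the optional OTLP export. The experiment name and collector endpoint are assumptions, and the OTLP path likely also needs `opentelemetry-exporter-otlp` installed per the MLflow docs linked in the page.

```python
# Illustrative sketch; experiment name and collector endpoint are assumptions.
import os

import litellm
import mlflow

# Optional: export spans to an OTel collector instead of the MLflow backend.
# os.environ["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"] = "http://localhost:4317/v1/traces"
# os.environ["OTEL_SERVICE_NAME"] = "litellm-demo"

os.environ["OPENAI_API_KEY"] = os.environ.get("OPENAI_API_KEY", "")

mlflow.set_experiment("litellm-tracing-demo")  # traces are grouped under this experiment
mlflow.litellm.autolog()                       # hook LiteLLM calls into MLflow Tracing

response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)
```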
282  docs/my-website/docs/pass_through/anthropic_completion.md  (new file)
@ -0,0 +1,282 @@

# Anthropic `/v1/messages`

Pass-through endpoints for Anthropic - call the provider-specific endpoint in its native format (no translation).

Just replace `https://api.anthropic.com` with `LITELLM_PROXY_BASE_URL/anthropic` 🚀

#### **Example Usage**

```bash
curl --request POST \
  --url http://0.0.0.0:4000/anthropic/v1/messages \
  --header 'accept: application/json' \
  --header 'content-type: application/json' \
  --header "Authorization: bearer sk-anything" \
  --data '{
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Hello, world"}
    ]
  }'
```

Supports **ALL** Anthropic Endpoints (including streaming).

[**See All Anthropic Endpoints**](https://docs.anthropic.com/en/api/messages)

## Quick Start

Let's call the Anthropic [`/messages` endpoint](https://docs.anthropic.com/en/api/messages)

1. Add Anthropic API Key to your environment

```bash
export ANTHROPIC_API_KEY=""
```

2. Start LiteLLM Proxy

```bash
litellm

# RUNNING on http://0.0.0.0:4000
```

3. Test it!

Let's call the Anthropic /messages endpoint

```bash
curl http://0.0.0.0:4000/anthropic/v1/messages \
  --header "x-api-key: $LITELLM_API_KEY" \
  --header "anthropic-version: 2023-06-01" \
  --header "content-type: application/json" \
  --data \
  '{
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Hello, world"}
    ]
  }'
```

## Examples

Anything after `http://0.0.0.0:4000/anthropic` is treated as a provider-specific route, and handled accordingly.

Key Changes:

| **Original Endpoint** | **Replace With** |
|------------------------------------------------------|-----------------------------------|
| `https://api.anthropic.com` | `http://0.0.0.0:4000/anthropic` (LITELLM_PROXY_BASE_URL="http://0.0.0.0:4000") |
| `bearer $ANTHROPIC_API_KEY` | `bearer anything` (use `bearer LITELLM_VIRTUAL_KEY` if Virtual Keys are set up on the proxy) |

### **Example 1: Messages endpoint**

#### LiteLLM Proxy Call

```bash
curl --request POST \
  --url http://0.0.0.0:4000/anthropic/v1/messages \
  --header "x-api-key: $LITELLM_API_KEY" \
  --header "anthropic-version: 2023-06-01" \
  --header "content-type: application/json" \
  --data '{
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Hello, world"}
    ]
  }'
```

#### Direct Anthropic API Call

```bash
curl https://api.anthropic.com/v1/messages \
  --header "x-api-key: $ANTHROPIC_API_KEY" \
  --header "anthropic-version: 2023-06-01" \
  --header "content-type: application/json" \
  --data \
  '{
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Hello, world"}
    ]
  }'
```

### **Example 2: Token Counting API**

#### LiteLLM Proxy Call

```bash
curl --request POST \
  --url http://0.0.0.0:4000/anthropic/v1/messages/count_tokens \
  --header "x-api-key: $LITELLM_API_KEY" \
  --header "anthropic-version: 2023-06-01" \
  --header "anthropic-beta: token-counting-2024-11-01" \
  --header "content-type: application/json" \
  --data \
  '{
    "model": "claude-3-5-sonnet-20241022",
    "messages": [
      {"role": "user", "content": "Hello, world"}
    ]
  }'
```

#### Direct Anthropic API Call

```bash
curl https://api.anthropic.com/v1/messages/count_tokens \
  --header "x-api-key: $ANTHROPIC_API_KEY" \
  --header "anthropic-version: 2023-06-01" \
  --header "anthropic-beta: token-counting-2024-11-01" \
  --header "content-type: application/json" \
  --data \
  '{
    "model": "claude-3-5-sonnet-20241022",
    "messages": [
      {"role": "user", "content": "Hello, world"}
    ]
  }'
```

### **Example 3: Batch Messages**

#### LiteLLM Proxy Call

```bash
curl --request POST \
  --url http://0.0.0.0:4000/anthropic/v1/messages/batches \
  --header "x-api-key: $LITELLM_API_KEY" \
  --header "anthropic-version: 2023-06-01" \
  --header "anthropic-beta: message-batches-2024-09-24" \
  --header "content-type: application/json" \
  --data \
  '{
    "requests": [
      {
        "custom_id": "my-first-request",
        "params": {
          "model": "claude-3-5-sonnet-20241022",
          "max_tokens": 1024,
          "messages": [
            {"role": "user", "content": "Hello, world"}
          ]
        }
      },
      {
        "custom_id": "my-second-request",
        "params": {
          "model": "claude-3-5-sonnet-20241022",
          "max_tokens": 1024,
          "messages": [
            {"role": "user", "content": "Hi again, friend"}
          ]
        }
      }
    ]
  }'
```

#### Direct Anthropic API Call

```bash
curl https://api.anthropic.com/v1/messages/batches \
  --header "x-api-key: $ANTHROPIC_API_KEY" \
  --header "anthropic-version: 2023-06-01" \
  --header "anthropic-beta: message-batches-2024-09-24" \
  --header "content-type: application/json" \
  --data \
  '{
    "requests": [
      {
        "custom_id": "my-first-request",
        "params": {
          "model": "claude-3-5-sonnet-20241022",
          "max_tokens": 1024,
          "messages": [
            {"role": "user", "content": "Hello, world"}
          ]
        }
      },
      {
        "custom_id": "my-second-request",
        "params": {
          "model": "claude-3-5-sonnet-20241022",
          "max_tokens": 1024,
          "messages": [
            {"role": "user", "content": "Hi again, friend"}
          ]
        }
      }
    ]
  }'
```

## Advanced - Use with Virtual Keys

Pre-requisites
- [Setup proxy with DB](../proxy/virtual_keys.md#setup)

Use this to avoid giving developers the raw Anthropic API key, while still letting them use Anthropic endpoints.

### Usage

1. Setup environment

```bash
export DATABASE_URL=""
export LITELLM_MASTER_KEY=""
export ANTHROPIC_API_KEY=""
```

```bash
litellm

# RUNNING on http://0.0.0.0:4000
```

2. Generate virtual key

```bash
curl -X POST 'http://0.0.0.0:4000/key/generate' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{}'
```

Expected Response

```bash
{
  ...
  "key": "sk-1234ewknldferwedojwojw"
}
```

3. Test it!

```bash
curl --request POST \
  --url http://0.0.0.0:4000/anthropic/v1/messages \
  --header 'accept: application/json' \
  --header 'content-type: application/json' \
  --header "Authorization: bearer sk-1234ewknldferwedojwojw" \
  --data '{
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Hello, world"}
    ]
  }'
```
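For readers following along (illustrative sketch, not part of the file above), the Quick Start curl translates directly into Python; the key and port are the same placeholder values used throughout the doc.

```python
# Sketch of the Quick Start call via Python; header names mirror the curl above.
import os

import requests

resp = requests.post(
    "http://0.0.0.0:4000/anthropic/v1/messages",
    headers={
        "x-api-key": os.environ.get("LITELLM_API_KEY", "sk-1234"),  # virtual or master key
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
    json={
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": "Hello, world"}],
    },
)
print(resp.json())
```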
BIN  docs/my-website/img/mlflow_tracing.png  (new binary file, 361 KiB; not shown)
@ -65,12 +65,12 @@ const sidebars = {
     },
     {
       type: "category",
-      label: "Use with Provider SDKs",
+      label: "Pass-through Endpoints (Provider-specific)",
       items: [
         "pass_through/vertex_ai",
         "pass_through/google_ai_studio",
         "pass_through/cohere",
-        "anthropic_completion",
+        "pass_through/anthropic_completion",
         "pass_through/bedrock",
         "pass_through/langfuse"
       ],
@ -57,6 +57,7 @@ _custom_logger_compatible_callbacks_literal = Literal[
     "gcs_bucket",
     "opik",
     "argilla",
+    "mlflow",
 ]
 logged_real_time_event_types: Optional[Union[List[str], Literal["*"]]] = None
 _known_custom_logger_compatible_callbacks: List = list(
247  litellm/integrations/mlflow.py  (new file)
@ -0,0 +1,247 @@

import json
import threading
from typing import Optional

from litellm._logging import verbose_logger
from litellm.integrations.custom_logger import CustomLogger


class MlflowLogger(CustomLogger):
    def __init__(self):
        from mlflow.tracking import MlflowClient

        self._client = MlflowClient()

        self._stream_id_to_span = {}
        self._lock = threading.Lock()  # lock for _stream_id_to_span

    def log_success_event(self, kwargs, response_obj, start_time, end_time):
        self._handle_success(kwargs, response_obj, start_time, end_time)

    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        self._handle_success(kwargs, response_obj, start_time, end_time)

    def _handle_success(self, kwargs, response_obj, start_time, end_time):
        """
        Log the success event as an MLflow span.
        Note that this method is called asynchronously in the background thread.
        """
        from mlflow.entities import SpanStatusCode

        try:
            verbose_logger.debug("MLflow logging start for success event")

            if kwargs.get("stream"):
                self._handle_stream_event(kwargs, response_obj, start_time, end_time)
            else:
                span = self._start_span_or_trace(kwargs, start_time)
                end_time_ns = int(end_time.timestamp() * 1e9)
                self._end_span_or_trace(
                    span=span,
                    outputs=response_obj,
                    status=SpanStatusCode.OK,
                    end_time_ns=end_time_ns,
                )
        except Exception:
            verbose_logger.debug("MLflow Logging Error", stack_info=True)

    def log_failure_event(self, kwargs, response_obj, start_time, end_time):
        self._handle_failure(kwargs, response_obj, start_time, end_time)

    async def async_log_failure_event(self, kwargs, response_obj, start_time, end_time):
        self._handle_failure(kwargs, response_obj, start_time, end_time)

    def _handle_failure(self, kwargs, response_obj, start_time, end_time):
        """
        Log the failure event as an MLflow span.
        Note that this method is called *synchronously* unlike the success handler.
        """
        from mlflow.entities import SpanEvent, SpanStatusCode

        try:
            span = self._start_span_or_trace(kwargs, start_time)

            end_time_ns = int(end_time.timestamp() * 1e9)

            # Record exception info as event
            if exception := kwargs.get("exception"):
                span.add_event(SpanEvent.from_exception(exception))

            self._end_span_or_trace(
                span=span,
                outputs=response_obj,
                status=SpanStatusCode.ERROR,
                end_time_ns=end_time_ns,
            )

        except Exception as e:
            verbose_logger.debug(f"MLflow Logging Error - {e}", stack_info=True)

    def _handle_stream_event(self, kwargs, response_obj, start_time, end_time):
        """
        Handle the success event for a streaming response. For streaming calls,
        log_success_event handle is triggered for every chunk of the stream.
        We create a single span for the entire stream request as follows:

        1. For the first chunk, start a new span and store it in the map.
        2. For subsequent chunks, add the chunk as an event to the span.
        3. For the final chunk, end the span and remove the span from the map.
        """
        from mlflow.entities import SpanStatusCode

        litellm_call_id = kwargs.get("litellm_call_id")

        if litellm_call_id not in self._stream_id_to_span:
            with self._lock:
                # Check again after acquiring lock
                if litellm_call_id not in self._stream_id_to_span:
                    # Start a new span for the first chunk of the stream
                    span = self._start_span_or_trace(kwargs, start_time)
                    self._stream_id_to_span[litellm_call_id] = span

        # Add chunk as event to the span
        span = self._stream_id_to_span[litellm_call_id]
        self._add_chunk_events(span, response_obj)

        # If this is the final chunk, end the span. The final chunk
        # has complete_streaming_response that gathers the full response.
        if final_response := kwargs.get("complete_streaming_response"):
            end_time_ns = int(end_time.timestamp() * 1e9)
            self._end_span_or_trace(
                span=span,
                outputs=final_response,
                status=SpanStatusCode.OK,
                end_time_ns=end_time_ns,
            )

            # Remove the stream_id from the map
            with self._lock:
                self._stream_id_to_span.pop(litellm_call_id)

    def _add_chunk_events(self, span, response_obj):
        from mlflow.entities import SpanEvent

        try:
            for choice in response_obj.choices:
                span.add_event(
                    SpanEvent(
                        name="streaming_chunk",
                        attributes={"delta": json.dumps(choice.delta.model_dump())},
                    )
                )
        except Exception:
            verbose_logger.debug("Error adding chunk events to span", stack_info=True)

    def _construct_input(self, kwargs):
        """Construct span inputs with optional parameters"""
        inputs = {"messages": kwargs.get("messages")}
        for key in ["functions", "tools", "stream", "tool_choice", "user"]:
            if value := kwargs.get("optional_params", {}).pop(key, None):
                inputs[key] = value
        return inputs

    def _extract_attributes(self, kwargs):
        """
        Extract span attributes from kwargs.

        With the latest version of litellm, the standard_logging_object contains
        canonical information for logging. If it is not present, we extract
        subset of attributes from other kwargs.
        """
        attributes = {
            "litellm_call_id": kwargs.get("litellm_call_id"),
            "call_type": kwargs.get("call_type"),
            "model": kwargs.get("model"),
        }
        standard_obj = kwargs.get("standard_logging_object")
        if standard_obj:
            attributes.update(
                {
                    "api_base": standard_obj.get("api_base"),
                    "cache_hit": standard_obj.get("cache_hit"),
                    "usage": {
                        "completion_tokens": standard_obj.get("completion_tokens"),
                        "prompt_tokens": standard_obj.get("prompt_tokens"),
                        "total_tokens": standard_obj.get("total_tokens"),
                    },
                    "raw_llm_response": standard_obj.get("response"),
                    "response_cost": standard_obj.get("response_cost"),
                    "saved_cache_cost": standard_obj.get("saved_cache_cost"),
                }
            )
        else:
            litellm_params = kwargs.get("litellm_params", {})
            attributes.update(
                {
                    "model": kwargs.get("model"),
                    "cache_hit": kwargs.get("cache_hit"),
                    "custom_llm_provider": kwargs.get("custom_llm_provider"),
                    "api_base": litellm_params.get("api_base"),
                    "response_cost": kwargs.get("response_cost"),
                }
            )
        return attributes

    def _get_span_type(self, call_type: Optional[str]) -> str:
        from mlflow.entities import SpanType

        if call_type in ["completion", "acompletion"]:
            return SpanType.LLM
        elif call_type == "embeddings":
            return SpanType.EMBEDDING
        else:
            return SpanType.LLM

    def _start_span_or_trace(self, kwargs, start_time):
        """
        Start an MLflow span or a trace.

        If there is an active span, we start a new span as a child of
        that span. Otherwise, we start a new trace.
        """
        import mlflow

        call_type = kwargs.get("call_type", "completion")
        span_name = f"litellm-{call_type}"
        span_type = self._get_span_type(call_type)
        start_time_ns = int(start_time.timestamp() * 1e9)

        inputs = self._construct_input(kwargs)
        attributes = self._extract_attributes(kwargs)

        if active_span := mlflow.get_current_active_span():  # type: ignore
            return self._client.start_span(
                name=span_name,
                request_id=active_span.request_id,
                parent_id=active_span.span_id,
                span_type=span_type,
                inputs=inputs,
                attributes=attributes,
                start_time_ns=start_time_ns,
            )
        else:
            return self._client.start_trace(
                name=span_name,
                span_type=span_type,
                inputs=inputs,
                attributes=attributes,
                start_time_ns=start_time_ns,
            )

    def _end_span_or_trace(self, span, outputs, end_time_ns, status):
        """End an MLflow span or a trace."""
        if span.parent_id is None:
            self._client.end_trace(
                request_id=span.request_id,
                outputs=outputs,
                status=status,
                end_time_ns=end_time_ns,
            )
        else:
            self._client.end_span(
                request_id=span.request_id,
                span_id=span.span_id,
                outputs=outputs,
                status=status,
                end_time_ns=end_time_ns,
            )
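To make the streaming path above concrete (illustrative sketch, not part of the diff): with the logger registered, every chunk is attached to one span as a `streaming_chunk` event, and the span is closed when LiteLLM passes `complete_streaming_response` on the final chunk.

```python
# Sketch: register the logger and run a streaming call.
import litellm
from litellm.integrations.mlflow import MlflowLogger

litellm.callbacks = [MlflowLogger()]   # equivalent to litellm.callbacks = ["mlflow"]

stream = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a haiku about tracing"}],
    stream=True,
)
for chunk in stream:   # each chunk triggers _handle_stream_event()
    pass               # the final chunk ends the span with the full response
```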
@ -161,17 +161,7 @@ def get_supported_openai_params(  # noqa: PLR0915
     elif custom_llm_provider == "huggingface":
         return litellm.HuggingfaceConfig().get_supported_openai_params()
     elif custom_llm_provider == "together_ai":
-        return [
-            "stream",
-            "temperature",
-            "max_tokens",
-            "top_p",
-            "stop",
-            "frequency_penalty",
-            "tools",
-            "tool_choice",
-            "response_format",
-        ]
+        return litellm.TogetherAIConfig().get_supported_openai_params(model=model)
     elif custom_llm_provider == "ai21":
         return [
             "stream",
@ -28,6 +28,7 @@ from litellm.caching.caching_handler import LLMCachingHandler
 from litellm.cost_calculator import _select_model_name_for_cost_calc
 from litellm.integrations.custom_guardrail import CustomGuardrail
 from litellm.integrations.custom_logger import CustomLogger
+from litellm.integrations.mlflow import MlflowLogger
 from litellm.litellm_core_utils.redact_messages import (
     redact_message_input_output_from_custom_logger,
     redact_message_input_output_from_logging,

@ -563,6 +564,7 @@ class Logging:
                     message=f"Model Call Details pre-call: {details_to_log}",
                     level="info",
                 )
+
                 elif isinstance(callback, CustomLogger):  # custom logger class
                     callback.log_pre_api_call(
                         model=self.model,

@ -1258,6 +1260,7 @@ class Logging:
                     end_time=end_time,
                     print_verbose=print_verbose,
                 )
+
                 if (
                     callback == "openmeter"
                     and self.model_call_details.get("litellm_params", {}).get(

@ -2347,6 +2350,14 @@ def _init_custom_logger_compatible_class(  # noqa: PLR0915
         _in_memory_loggers.append(_otel_logger)
         return _otel_logger  # type: ignore

+    elif logging_integration == "mlflow":
+        for callback in _in_memory_loggers:
+            if isinstance(callback, MlflowLogger):
+                return callback  # type: ignore
+
+        _mlflow_logger = MlflowLogger()
+        _in_memory_loggers.append(_mlflow_logger)
+        return _mlflow_logger  # type: ignore

 def get_custom_logger_compatible_class(
     logging_integration: litellm._custom_logger_compatible_callbacks_literal,

@ -2448,6 +2459,12 @@ def get_custom_logger_compatible_class(
             and callback.callback_name == "langtrace"
         ):
             return callback
+
+    elif logging_integration == "mlflow":
+        for callback in _in_memory_loggers:
+            if isinstance(callback, MlflowLogger):
+                return callback
+
     return None
@ -6,8 +6,8 @@ Calls done in OpenAI/openai.py as TogetherAI is openai-compatible.
 Docs: https://docs.together.ai/reference/completions-1
 """

-from ..OpenAI.openai import OpenAIConfig
+from ..OpenAI.chat.gpt_transformation import OpenAIGPTConfig


-class TogetherAIConfig(OpenAIConfig):
+class TogetherAIConfig(OpenAIGPTConfig):
     pass
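A quick way to see the effect of this change (sketch, not part of the diff; the model name is a placeholder): supported-parameter lookup for Together AI now delegates to the OpenAI-compatible config, which is what the `test_together_ai_model_params` test added later in this diff asserts for `logprobs`.

```python
# Sketch: Together AI parameter support now comes from the OpenAI-compatible config.
from litellm.utils import get_supported_openai_params

params = get_supported_openai_params(
    model="meta-llama/Llama-3-8b-chat-hf",  # placeholder model name
    custom_llm_provider="together_ai",
)
print("logprobs" in params)  # expected: True (see test_together_ai_model_params below)
```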
@ -1069,7 +1069,7 @@ async def update_cache(  # noqa: PLR0915
     end_user_id: Optional[str],
     team_id: Optional[str],
     response_cost: Optional[float],
-    parent_otel_span: Optional[Span],
+    parent_otel_span: Optional[Span],  # type: ignore
 ):
     """
     Use this to update the cache with new user spend.

@ -5657,6 +5657,13 @@ async def anthropic_response(  # noqa: PLR0915
     request: Request,
     user_api_key_dict: UserAPIKeyAuth = Depends(user_api_key_auth),
 ):
+    """
+    This is a BETA endpoint that calls 100+ LLMs in the anthropic format.
+
+    To do a simple pass-through for anthropic, do `{PROXY_BASE_URL}/anthropic/v1/messages`
+
+    Docs - https://docs.litellm.ai/docs/anthropic_completion
+    """
     from litellm import adapter_completion
     from litellm.adapters.anthropic_adapter import anthropic_adapter
@ -9,6 +9,9 @@ import litellm
 from litellm._logging import verbose_proxy_logger
 from litellm.proxy._types import *
 from litellm.proxy.auth.user_api_key_auth import user_api_key_auth
+from litellm.proxy.spend_tracking.spend_tracking_utils import (
+    get_spend_by_team_and_customer,
+)

 router = APIRouter()

@ -932,6 +935,14 @@ async def get_global_spend_report(
         default=None,
         description="View spend for a specific internal_user_id. Example internal_user_id='1234",
     ),
+    team_id: Optional[str] = fastapi.Query(
+        default=None,
+        description="View spend for a specific team_id. Example team_id='1234",
+    ),
+    customer_id: Optional[str] = fastapi.Query(
+        default=None,
+        description="View spend for a specific customer_id. Example customer_id='1234. Can be used in conjunction with team_id as well.",
+    ),
 ):
     """
     Get Daily Spend per Team, based on specific startTime and endTime. Per team, view usage by each key, model

@ -1074,8 +1085,12 @@ async def get_global_spend_report(
             return []

         return db_response
+    elif team_id is not None and customer_id is not None:
+        return await get_spend_by_team_and_customer(
+            start_date_obj, end_date_obj, team_id, customer_id, prisma_client
+        )
     if group_by == "team":

         # first get data from spend logs -> SpendByModelApiKey
         # then read data from "SpendByModelApiKey" to format the response obj
         sql_query = """

@ -1305,7 +1320,6 @@ async def global_get_all_tag_names():
     "/global/spend/tags",
     tags=["Budget & Spend Tracking"],
     dependencies=[Depends(user_api_key_auth)],
-    include_in_schema=False,
     responses={
         200: {"model": List[LiteLLM_SpendLogs]},
     },
@ -1,7 +1,9 @@
+import datetime
 import json
 import os
 import secrets
 import traceback
+from datetime import datetime as dt
 from typing import Optional

 from pydantic import BaseModel

@ -9,7 +11,7 @@ from pydantic import BaseModel
 import litellm
 from litellm._logging import verbose_proxy_logger
 from litellm.proxy._types import SpendLogsMetadata, SpendLogsPayload
-from litellm.proxy.utils import hash_token
+from litellm.proxy.utils import PrismaClient, hash_token


 def _is_master_key(api_key: str, _master_key: Optional[str]) -> bool:

@ -163,3 +165,79 @@ def get_logging_payload(
             "Error creating spendlogs object - {}".format(str(e))
         )
         raise e
+
+
+async def get_spend_by_team_and_customer(
+    start_date: dt,
+    end_date: dt,
+    team_id: str,
+    customer_id: str,
+    prisma_client: PrismaClient,
+):
+    sql_query = """
+    WITH SpendByModelApiKey AS (
+        SELECT
+            date_trunc('day', sl."startTime") AS group_by_day,
+            COALESCE(tt.team_alias, 'Unassigned Team') AS team_name,
+            sl.end_user AS customer,
+            sl.model,
+            sl.api_key,
+            SUM(sl.spend) AS model_api_spend,
+            SUM(sl.total_tokens) AS model_api_tokens
+        FROM
+            "LiteLLM_SpendLogs" sl
+        LEFT JOIN
+            "LiteLLM_TeamTable" tt
+        ON
+            sl.team_id = tt.team_id
+        WHERE
+            sl."startTime" BETWEEN $1::date AND $2::date
+            AND sl.team_id = $3
+            AND sl.end_user = $4
+        GROUP BY
+            date_trunc('day', sl."startTime"),
+            tt.team_alias,
+            sl.end_user,
+            sl.model,
+            sl.api_key
+    )
+    SELECT
+        group_by_day,
+        jsonb_agg(jsonb_build_object(
+            'team_name', team_name,
+            'customer', customer,
+            'total_spend', total_spend,
+            'metadata', metadata
+        )) AS teams_customers
+    FROM (
+        SELECT
+            group_by_day,
+            team_name,
+            customer,
+            SUM(model_api_spend) AS total_spend,
+            jsonb_agg(jsonb_build_object(
+                'model', model,
+                'api_key', api_key,
+                'spend', model_api_spend,
+                'total_tokens', model_api_tokens
+            )) AS metadata
+        FROM
+            SpendByModelApiKey
+        GROUP BY
+            group_by_day,
+            team_name,
+            customer
+    ) AS aggregated
+    GROUP BY
+        group_by_day
+    ORDER BY
+        group_by_day;
+    """
+
+    db_response = await prisma_client.db.query_raw(
+        sql_query, start_date, end_date, team_id, customer_id
+    )
+    if db_response is None:
+        return []
+
+    return db_response
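For context (illustrative sketch, not part of the diff): combined with the endpoint change shown earlier, the new branch can be exercised roughly like this. The route path, date parameter names, and master-key auth are assumptions based on the existing `/global/spend/report` endpoint.

```python
# Sketch: daily spend for one team + customer over a date range (assumed route and params).
import requests

resp = requests.get(
    "http://0.0.0.0:4000/global/spend/report",
    headers={"Authorization": "Bearer sk-1234"},
    params={
        "start_date": "2024-11-01",
        "end_date": "2024-11-30",
        "team_id": "my-team",
        "customer_id": "customer-1234",
    },
)
print(resp.json())  # daily rows grouped by team/customer, per the SQL above
```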
@ -155,6 +155,51 @@ async def cohere_proxy_route(
     return received_value


+@router.api_route(
+    "/anthropic/{endpoint:path}", methods=["GET", "POST", "PUT", "DELETE"]
+)
+async def anthropic_proxy_route(
+    endpoint: str,
+    request: Request,
+    fastapi_response: Response,
+    user_api_key_dict: UserAPIKeyAuth = Depends(user_api_key_auth),
+):
+    base_target_url = "https://api.anthropic.com"
+    encoded_endpoint = httpx.URL(endpoint).path
+
+    # Ensure endpoint starts with '/' for proper URL construction
+    if not encoded_endpoint.startswith("/"):
+        encoded_endpoint = "/" + encoded_endpoint
+
+    # Construct the full target URL using httpx
+    base_url = httpx.URL(base_target_url)
+    updated_url = base_url.copy_with(path=encoded_endpoint)
+
+    # Add or update query parameters
+    anthropic_api_key = litellm.utils.get_secret(secret_name="ANTHROPIC_API_KEY")
+
+    ## check for streaming
+    is_streaming_request = False
+    if "stream" in str(updated_url):
+        is_streaming_request = True
+
+    ## CREATE PASS-THROUGH
+    endpoint_func = create_pass_through_route(
+        endpoint=endpoint,
+        target=str(updated_url),
+        custom_headers={"x-api-key": "{}".format(anthropic_api_key)},
+        _forward_headers=True,
+    )  # dynamically construct pass-through endpoint based on incoming path
+    received_value = await endpoint_func(
+        request,
+        fastapi_response,
+        user_api_key_dict,
+        stream=is_streaming_request,  # type: ignore
+    )
+
+    return received_value
+
+
 @router.api_route("/bedrock/{endpoint:path}", methods=["GET", "POST", "PUT", "DELETE"])
 async def bedrock_proxy_route(
     endpoint: str,
29  litellm/tests/test_mlflow.py  (new file)
@ -0,0 +1,29 @@

import pytest

import litellm


def test_mlflow_logging():
    litellm.success_callback = ["mlflow"]
    litellm.failure_callback = ["mlflow"]

    litellm.completion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "what llm are u"}],
        max_tokens=10,
        temperature=0.2,
        user="test-user",
    )


@pytest.mark.asyncio()
async def test_async_mlflow_logging():
    litellm.success_callback = ["mlflow"]
    litellm.failure_callback = ["mlflow"]

    await litellm.acompletion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "hi test from local arize"}],
        mock_response="hello",
        temperature=0.1,
        user="OTEL_USER",
    )
@ -2903,24 +2903,16 @@ def get_optional_params(  # noqa: PLR0915
         )
         _check_valid_arg(supported_params=supported_params)

-        if stream:
-            optional_params["stream"] = stream
-        if temperature is not None:
-            optional_params["temperature"] = temperature
-        if top_p is not None:
-            optional_params["top_p"] = top_p
-        if max_tokens is not None:
-            optional_params["max_tokens"] = max_tokens
-        if frequency_penalty is not None:
-            optional_params["frequency_penalty"] = frequency_penalty
-        if stop is not None:
-            optional_params["stop"] = stop
-        if tools is not None:
-            optional_params["tools"] = tools
-        if tool_choice is not None:
-            optional_params["tool_choice"] = tool_choice
-        if response_format is not None:
-            optional_params["response_format"] = response_format
+        optional_params = litellm.TogetherAIConfig().map_openai_params(
+            non_default_params=non_default_params,
+            optional_params=optional_params,
+            model=model,
+            drop_params=(
+                drop_params
+                if drop_params is not None and isinstance(drop_params, bool)
+                else False
+            ),
+        )
     elif custom_llm_provider == "ai21":
         ## check if unsupported param passed in
         supported_params = get_supported_openai_params(
@ -923,6 +923,14 @@ def test_watsonx_text_top_k():
     assert optional_params["top_k"] == 10


+def test_together_ai_model_params():
+    optional_params = get_optional_params(
+        model="together_ai", custom_llm_provider="together_ai", logprobs=1
+    )
+    print(optional_params)
+    assert optional_params["logprobs"] == 1
+
+
 def test_forward_user_param():
     from litellm.utils import get_supported_openai_params, get_optional_params
@ -406,8 +406,13 @@ def test_completion_claude_3_empty_response():
             "content": "I was hoping we could chat a bit",
         },
     ]
-    response = litellm.completion(model="claude-3-opus-20240229", messages=messages)
-    print(response)
+    try:
+        response = litellm.completion(model="claude-3-opus-20240229", messages=messages)
+        print(response)
+    except litellm.InternalServerError as e:
+        pytest.skip(f"InternalServerError - {str(e)}")
+    except Exception as e:
+        pytest.fail(f"Error occurred: {e}")


 def test_completion_claude_3():

@ -434,6 +439,8 @@ def test_completion_claude_3():
         )
         # Add any assertions, here to check response args
         print(response)
+    except litellm.InternalServerError as e:
+        pytest.skip(f"InternalServerError - {str(e)}")
     except Exception as e:
         pytest.fail(f"Error occurred: {e}")

@ -917,6 +924,9 @@ def test_completion_base64(model):
     except litellm.ServiceUnavailableError as e:
         print("got service unavailable error: ", e)
         pass
+    except litellm.InternalServerError as e:
+        print("got internal server error: ", e)
+        pass
     except Exception as e:
         if "500 Internal error encountered.'" in str(e):
             pass

@ -1055,7 +1065,6 @@ def test_completion_mistral_api():
         cost = litellm.completion_cost(completion_response=response)
         print("cost to make mistral completion=", cost)
         assert cost > 0.0
-        assert response.model == "mistral/mistral-tiny"
     except Exception as e:
         pytest.fail(f"Error occurred: {e}")
@ -3333,8 +3333,8 @@ async def test_acompletion_function_call_with_streaming(model):
             validate_final_streaming_function_calling_chunk(chunk=chunk)
             idx += 1
         # raise Exception("it worked! ")
-    except litellm.InternalServerError:
-        pass
+    except litellm.InternalServerError as e:
+        pytest.skip(f"InternalServerError - {str(e)}")
     except litellm.ServiceUnavailableError:
         pass
     except Exception as e:
@ -144,6 +144,7 @@ def validate_raw_gen_ai_request_openai_streaming(span):
     "model",
     ["anthropic/claude-3-opus-20240229"],
 )
+@pytest.mark.flaky(retries=6, delay=2)
 def test_completion_claude_3_function_call_with_otel(model):
     litellm.set_verbose = True
@ -31,6 +31,7 @@ from litellm.integrations.datadog.datadog_llm_obs import DataDogLLMObsLogger
 from litellm.integrations.gcs_bucket.gcs_bucket import GCSBucketLogger
 from litellm.integrations.opik.opik import OpikLogger
 from litellm.integrations.opentelemetry import OpenTelemetry
+from litellm.integrations.mlflow import MlflowLogger
 from litellm.integrations.argilla import ArgillaLogger
 from litellm.proxy.hooks.dynamic_rate_limiter import _PROXY_DynamicRateLimitHandler
 from unittest.mock import patch

@ -59,6 +60,7 @@ callback_class_str_to_classType = {
     "logfire": OpenTelemetry,
     "arize": OpenTelemetry,
     "langtrace": OpenTelemetry,
+    "mlflow": MlflowLogger,
 }

 expected_env_vars = {