diff --git a/docs/my-website/docs/proxy/logging.md b/docs/my-website/docs/proxy/logging.md index c2f583366..8a5a06a8e 100644 --- a/docs/my-website/docs/proxy/logging.md +++ b/docs/my-website/docs/proxy/logging.md @@ -1,28 +1,21 @@ +# 🪢 Logging + +Log Proxy input, output, and exceptions using: + +- Langfuse +- OpenTelemetry +- Custom Callbacks +- DataDog +- DynamoDB +- s3 Bucket +- etc. + import Image from '@theme/IdealImage'; import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; - -# 🪢 Logging - Langfuse, OpenTelemetry, Custom Callbacks, DataDog, s3 Bucket, Sentry, Athina, Azure Content-Safety - -Log Proxy Input, Output, Exceptions using Langfuse, OpenTelemetry, Custom Callbacks, DataDog, DynamoDB, s3 Bucket - -## Table of Contents - -- [Logging to Langfuse](#logging-proxy-inputoutput---langfuse) -- [Logging with OpenTelemetry (OpenTelemetry)](#logging-proxy-inputoutput-in-opentelemetry-format) -- [Async Custom Callbacks](#custom-callback-class-async) -- [Async Custom Callback APIs](#custom-callback-apis-async) -- [Logging to Galileo](#logging-llm-io-to-galileo) -- [Logging to OpenMeter](#logging-proxy-inputoutput---langfuse) -- [Logging to s3 Buckets](#logging-proxy-inputoutput---s3-buckets) -- [Logging to DataDog](#logging-proxy-inputoutput---datadog) -- [Logging to DynamoDB](#logging-proxy-inputoutput---dynamodb) -- [Logging to Sentry](#logging-proxy-inputoutput---sentry) -- [Logging to Athina](#logging-proxy-inputoutput-athina) -- [(BETA) Moderation with Azure Content-Safety](#moderation-with-azure-content-safety) - ## Logging Proxy Input/Output - Langfuse + We will use the `--config` to set `litellm.success_callback = ["langfuse"]` this will log all successfull LLM calls to langfuse. Make sure to set `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` in your environment **Step 1** Install langfuse @@ -32,6 +25,7 @@ pip install langfuse>=2.0.0 ``` **Step 2**: Create a `config.yaml` file and set `litellm_settings`: `success_callback` + ```yaml model_list: - model_name: gpt-3.5-turbo @@ -42,6 +36,7 @@ litellm_settings: ``` **Step 3**: Set required env variables for logging to langfuse + ```shell export LANGFUSE_PUBLIC_KEY="pk_kk" export LANGFUSE_SECRET_KEY="sk_ss" @@ -52,11 +47,13 @@ export LANGFUSE_HOST="https://xxx.langfuse.com" **Step 4**: Start the proxy, make a test request Start proxy + ```shell litellm --config config.yaml --debug ``` Test Request + ``` litellm --test ``` @@ -67,7 +64,6 @@ Expected output on Langfuse ### Logging Metadata to Langfuse - @@ -93,6 +89,7 @@ curl --location 'http://0.0.0.0:4000/chat/completions' \ } }' ``` + @@ -126,6 +123,7 @@ response = client.chat.completions.create( print(response) ``` + @@ -168,7 +166,6 @@ print(response) - ### Team based Logging to Langfuse **Example:** @@ -257,6 +254,7 @@ curl --location 'http://0.0.0.0:4000/chat/completions' \ } }' ``` + @@ -287,6 +285,7 @@ response = client.chat.completions.create( print(response) ``` + @@ -332,7 +331,6 @@ You will see `raw_request` in your Langfuse Metadata. 
This is the RAW CURL comma - ## Logging Proxy Input/Output in OpenTelemetry format :::info @@ -348,10 +346,8 @@ OTEL_SERVICE_NAME=` # default="litellm" - - **Step 1:** Set callbacks and env vars Add the following to your env @@ -367,7 +363,6 @@ litellm_settings: callbacks: ["otel"] ``` - **Step 2**: Start the proxy, make a test request Start proxy @@ -427,7 +422,6 @@ This is the Span from OTEL Logging - #### Quick Start - Log to Honeycomb @@ -449,7 +443,6 @@ litellm_settings: callbacks: ["otel"] ``` - **Step 2**: Start the proxy, make a test request Start proxy @@ -474,10 +467,8 @@ curl --location 'http://0.0.0.0:4000/chat/completions' \ }' ``` - - #### Quick Start - Log to OTEL Collector @@ -499,7 +490,6 @@ litellm_settings: callbacks: ["otel"] ``` - **Step 2**: Start the proxy, make a test request Start proxy @@ -526,7 +516,6 @@ curl --location 'http://0.0.0.0:4000/chat/completions' \ - #### Quick Start - Log to OTEL GRPC Collector @@ -548,7 +537,6 @@ litellm_settings: callbacks: ["otel"] ``` - **Step 2**: Start the proxy, make a test request Start proxy @@ -573,7 +561,6 @@ curl --location 'http://0.0.0.0:4000/chat/completions' \ }' ``` - @@ -596,7 +583,6 @@ environment_variables: TRACELOOP_API_KEY: "XXXXX" ``` - **Step 3**: Start the proxy, make a test request Start proxy @@ -632,11 +618,15 @@ curl --location 'http://0.0.0.0:4000/chat/completions' \ ❓ Use this when you want to **pass information about the incoming request in a distributed tracing system** ✅ Key change: Pass the **`traceparent` header** in your requests. [Read more about traceparent headers here](https://uptrace.dev/opentelemetry/opentelemetry-traceparent.html#what-is-traceparent-header) + ```curl traceparent: 00-80e1afed08e019fc1110464cfa66635c-7a085853722dc6d2-01 ``` + Example Usage + 1. Make Request to LiteLLM Proxy with `traceparent` header + ```python import openai import uuid @@ -660,7 +650,6 @@ response = client.chat.completions.create( ) print(response) - ``` ```shell @@ -674,12 +663,12 @@ Search for Trace=`80e1afed08e019fc1110464cfa66635c` on your OTEL Collector - - ## Custom Callback Class [Async] + Use this when you want to run custom callbacks in `python` #### Step 1 - Create your custom `litellm` callback class + We use `litellm.integrations.custom_logger` for this, **more details about litellm custom callbacks [here](https://docs.litellm.ai/docs/observability/custom_callback)** Define your custom callback class in a python file. @@ -782,16 +771,17 @@ proxy_handler_instance = MyCustomHandler() ``` #### Step 2 - Pass your custom callback class in `config.yaml` + We pass the custom callback class defined in **Step1** to the config.yaml. Set `callbacks` to `python_filename.logger_instance_name` In the config below, we pass + - python_filename: `custom_callbacks.py` - logger_instance_name: `proxy_handler_instance`. 
This is defined in Step 1 `callbacks: custom_callbacks.proxy_handler_instance` - ```yaml model_list: - model_name: gpt-3.5-turbo @@ -804,6 +794,7 @@ litellm_settings: ``` #### Step 3 - Start proxy + test request + ```shell litellm --config proxy_config.yaml ``` @@ -825,6 +816,7 @@ curl --location 'http://0.0.0.0:4000/chat/completions' \ ``` #### Resulting Log on Proxy + ```shell On Success Model: gpt-3.5-turbo, @@ -877,7 +869,6 @@ class MyCustomHandler(CustomLogger): "max_tokens": 10 } } - ``` #### Logging `model_info` set in config.yaml @@ -895,11 +886,13 @@ class MyCustomHandler(CustomLogger): ``` **Expected Output** + ```json {'mode': 'embedding', 'input_cost_per_token': 0.002} ``` ### Logging responses from proxy + Both `/chat/completions` and `/embeddings` responses are available as `response_obj` **Note: for `/chat/completions`, both `stream=True` and `non stream` responses are available as `response_obj`** @@ -913,6 +906,7 @@ class MyCustomHandler(CustomLogger): ``` **Expected Output /chat/completion [for both `stream` and `non-stream` responses]** + ```json ModelResponse( id='chatcmpl-8Tfu8GoMElwOZuj2JlHBhNHG01PPo', @@ -939,6 +933,7 @@ ModelResponse( ``` **Expected Output /embeddings** + ```json { 'model': 'ada', @@ -958,7 +953,6 @@ ModelResponse( } ``` - ## Custom Callback APIs [Async] :::info @@ -968,10 +962,12 @@ This is an Enterprise only feature [Get Started with Enterprise here](https://gi ::: Use this if you: + - Want to use custom callbacks written in a non Python programming language - Want your callbacks to run on a different microservice #### Step 1. Create your generic logging API endpoint + Set up a generic API endpoint that can receive data in JSON format. The data will be included within a "data" field. Your server should support the following Request format: @@ -1034,11 +1030,8 @@ async def log_event(request: Request): if __name__ == "__main__": import uvicorn uvicorn.run(app, host="127.0.0.1", port=4000) - - ``` - #### Step 2. Set your `GENERIC_LOGGER_ENDPOINT` to the endpoint + route we should send callback logs to ```shell @@ -1048,6 +1041,7 @@ os.environ["GENERIC_LOGGER_ENDPOINT"] = "http://localhost:4000/log-event" #### Step 3. Create a `config.yaml` file and set `litellm_settings`: `success_callback` = ["generic"] Example litellm proxy config.yaml + ```yaml model_list: - model_name: gpt-3.5-turbo @@ -1059,8 +1053,8 @@ litellm_settings: Start the LiteLLM Proxy and make a test request to verify the logs reached your callback API - ## Logging LLM IO to Galileo + [BETA] Log LLM I/O on [www.rungalileo.io](https://www.rungalileo.io/) @@ -1083,6 +1077,7 @@ export GALILEO_PASSWORD="" ### Quick Start 1. Add to Config.yaml + ```yaml model_list: - litellm_params: @@ -1118,7 +1113,6 @@ curl --location 'http://0.0.0.0:4000/chat/completions' \ ' ``` - 🎉 That's it - Expect to see your Logs on your Galileo Dashboard ## Logging Proxy Cost + Usage - OpenMeter @@ -1136,6 +1130,7 @@ export OPENMETER_API_KEY="" ### Quick Start 1. 
Add to Config.yaml + ```yaml model_list: - litellm_params: @@ -1171,13 +1166,14 @@ curl --location 'http://0.0.0.0:4000/chat/completions' \ ' ``` - ## Logging Proxy Input/Output - DataDog + We will use the `--config` to set `litellm.success_callback = ["datadog"]` this will log all successfull LLM calls to DataDog **Step 1**: Create a `config.yaml` file and set `litellm_settings`: `success_callback` + ```yaml model_list: - model_name: gpt-3.5-turbo @@ -1197,6 +1193,7 @@ DD_SITE="us5.datadoghq.com" # your datadog base url **Step 3**: Start the proxy, make a test request Start proxy + ```shell litellm --config config.yaml --debug ``` @@ -1224,10 +1221,10 @@ Expected output on Datadog - ## Logging Proxy Input/Output - s3 Buckets We will use the `--config` to set + - `litellm.success_callback = ["s3"]` This will log all successfull LLM calls to s3 Bucket @@ -1241,6 +1238,7 @@ AWS_REGION_NAME = "" ``` **Step 2**: Create a `config.yaml` file and set `litellm_settings`: `success_callback` + ```yaml model_list: - model_name: gpt-3.5-turbo @@ -1260,11 +1258,13 @@ litellm_settings: **Step 3**: Start the proxy, make a test request Start proxy + ```shell litellm --config config.yaml --debug ``` Test Request + ```shell curl --location 'http://0.0.0.0:4000/chat/completions' \ --header 'Content-Type: application/json' \ @@ -1284,6 +1284,7 @@ Your logs should be available on the specified s3 Bucket ## Logging Proxy Input/Output - DynamoDB We will use the `--config` to set + - `litellm.success_callback = ["dynamodb"]` - `litellm.dynamodb_table_name = "your-table-name"` @@ -1298,6 +1299,7 @@ AWS_REGION_NAME = "" ``` **Step 2**: Create a `config.yaml` file and set `litellm_settings`: `success_callback` + ```yaml model_list: - model_name: gpt-3.5-turbo @@ -1311,11 +1313,13 @@ litellm_settings: **Step 3**: Start the proxy, make a test request Start proxy + ```shell litellm --config config.yaml --debug ``` Test Request + ```shell curl --location 'http://0.0.0.0:4000/chat/completions' \ --header 'Content-Type: application/json' \ @@ -1403,19 +1407,18 @@ Your logs should be available on DynamoDB } ``` - - - ## Logging Proxy Input/Output - Sentry If api calls fail (llm/database) you can log those to Sentry: **Step 1** Install Sentry + ```shell pip install --upgrade sentry-sdk ``` **Step 2**: Save your Sentry_DSN and add `litellm_settings`: `failure_callback` + ```shell export SENTRY_DSN="your-sentry-dsn" ``` @@ -1435,11 +1438,13 @@ general_settings: **Step 3**: Start the proxy, make a test request Start proxy + ```shell litellm --config config.yaml --debug ``` Test Request + ``` litellm --test ``` @@ -1457,6 +1462,7 @@ ATHINA_API_KEY = "your-athina-api-key" ``` **Step 2**: Create a `config.yaml` file and set `litellm_settings`: `success_callback` + ```yaml model_list: - model_name: gpt-3.5-turbo @@ -1469,11 +1475,13 @@ litellm_settings: **Step 3**: Start the proxy, make a test request Start proxy + ```shell litellm --config config.yaml --debug ``` Test Request + ``` curl --location 'http://0.0.0.0:4000/chat/completions' \ --header 'Content-Type: application/json' \ @@ -1505,6 +1513,7 @@ AZURE_CONTENT_SAFETY_KEY = "" ``` **Step 2**: Create a `config.yaml` file and set `litellm_settings`: `success_callback` + ```yaml model_list: - model_name: gpt-3.5-turbo @@ -1520,11 +1529,13 @@ litellm_settings: **Step 3**: Start the proxy, make a test request Start proxy + ```shell litellm --config config.yaml --debug ``` Test Request + ``` curl --location 'http://0.0.0.0:4000/chat/completions' \ --header 'Content-Type: 
application/json' \
@@ -1540,7 +1551,8 @@ curl --location 'http://0.0.0.0:4000/chat/completions' \
 ```
 An HTTP 400 error will be returned if the content is detected with a value greater than the threshold set in the `config.yaml`.
-The details of the response will describe :
+The details of the response will describe:
+
 - The `source`: input text or LLM-generated text
 - The `category`: the category of the content that triggered the moderation
 - The `severity`: the severity, from 0 to 10
diff --git a/litellm/proxy/proxy_server.py b/litellm/proxy/proxy_server.py
index d3fa658f9..0e074e990 100644
--- a/litellm/proxy/proxy_server.py
+++ b/litellm/proxy/proxy_server.py
@@ -2796,7 +2796,7 @@ async def chat_completion(
     ## LOGGING OBJECT ## - initialize logging object for logging success/failure events for call
     ## IMPORTANT Note: - initialize this before running pre-call checks. Ensures we log rejected requests to langfuse.
-    data["litellm_call_id"] = str(uuid.uuid4())
+    data["litellm_call_id"] = request.headers.get("x-litellm-call-id", str(uuid.uuid4()))
     logging_obj, data = litellm.utils.function_setup(
         original_function="acompletion",
         rules_obj=litellm.utils.Rules(),
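
With the `proxy_server.py` change above, the proxy reads an optional `x-litellm-call-id` request header and falls back to a generated UUID when it is absent. A minimal client-side sketch of supplying that header through the OpenAI SDK — the base URL, API key, and model name are placeholder values matching the other examples in this doc:

```python
import uuid

import openai

# Placeholder proxy endpoint and key, as in the examples above.
client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

# Generate a call ID on the client so it can be correlated with proxy-side logs.
call_id = str(uuid.uuid4())

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "what llm are you"}],
    # Picked up by the proxy as litellm_call_id (falls back to a new UUID if omitted).
    extra_headers={"x-litellm-call-id": call_id},
)

print(call_id)
print(response)
```

The same `call_id` should then appear as `litellm_call_id` in whichever success/failure callback is configured, letting callers correlate their own request logs with the proxy's.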