+LiteLLM MCP Architecture: Use MCP tools with all LiteLLM supported models
+
+
+#### How it works
+
+LiteLLM exposes the following MCP endpoints:
+
+- `/mcp/tools/list` - List all available tools
+- `/mcp/tools/call` - Call a specific tool with the provided arguments
+
+When MCP clients connect to LiteLLM they can follow this workflow (a sketch of steps 3-4 is shown further below):
+
+1. Connect to the LiteLLM MCP server
+2. List all available tools on LiteLLM
+3. Client makes an LLM API request, passing the available tools
+4. LLM API returns which tools to call and with what arguments
+5. MCP client makes MCP tool calls to LiteLLM
+6. LiteLLM makes the tool calls to the appropriate MCP server
+7. LiteLLM returns the tool call results to the MCP client
+
+#### Usage
+
+#### 1. Define your tools under `mcp_servers` in your config.yaml file
+
+LiteLLM allows you to define your tools in the `mcp_servers` section of your config.yaml file. All tools listed here will be available to MCP clients (when they connect to LiteLLM and call `list_tools`).
+
+```yaml title="config.yaml" showLineNumbers
+model_list:
+  - model_name: gpt-4o
+    litellm_params:
+      model: openai/gpt-4o
+      api_key: sk-xxxxxxx
+
+mcp_servers:
+  {
+    "zapier_mcp": {
+      "url": "https://actions.zapier.com/mcp/sk-akxxxxx/sse"
+    },
+    "fetch": {
+      "url": "http://localhost:8000/sse"
+    }
+  }
+```
+
+
+#### 2. Start LiteLLM Gateway
+
+Each instance writes updates to redis
+
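For illustration, a minimal sketch of steps 3-4 of the MCP workflow above, using the OpenAI SDK pointed at a locally running LiteLLM Gateway. The gateway URL, virtual key, and the `fetch` tool schema are assumptions for the sketch; in practice the tool definitions would come from `/mcp/tools/list`.

```python
# Sketch of steps 3-4: ask the LLM (via the LiteLLM Gateway) which tool to call.
# Assumes the gateway from the config above is running on localhost:4000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")  # illustrative virtual key

# Stand-in for a tool definition discovered via /mcp/tools/list (schema is assumed)
tools = [
    {
        "type": "function",
        "function": {
            "name": "fetch",
            "description": "Fetch the contents of a URL",
            "parameters": {
                "type": "object",
                "properties": {"url": {"type": "string"}},
                "required": ["url"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Fetch https://docs.litellm.ai and summarize it"}],
    tools=tools,
)

# Step 4: the LLM returns which tool(s) to call and with what arguments.
print(response.choices[0].message.tool_calls)
```

The MCP client would then execute the returned tool calls through LiteLLM (steps 5-7) and feed the results back to the model.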
+
+
+### Stage 2. A single instance flushes the redis queue to the DB
+
+A single instance will acquire a lock on the DB and flush all elements in the redis queue to the DB.
+
+- 1 instance will attempt to acquire the lock for the DB update job
+- The status of the lock is stored in redis
+- If the instance acquires the lock to write to DB
+  - It will read all updates from redis
+  - Aggregate all updates into 1 transaction
+  - Write updates to DB
+  - Release the lock
+- Note: Only 1 instance can acquire the lock at a time; this limits the number of instances that can write to the DB at once
+
+
+A single instance flushes the redis queue to the DB
+
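The locking pattern described above can be sketched in a few lines. This is not LiteLLM's actual implementation; the Redis key names, timeout, and aggregation logic below are assumptions for illustration only.

```python
# Sketch of the "single instance flushes the queue" pattern described above.
import json
import redis

r = redis.Redis(host="localhost", port=6379)

LOCK_KEY = "spend_update_db_lock"   # assumed key name
QUEUE_KEY = "spend_update_queue"    # assumed key name
POD_ID = "pod-1"

def aggregate(updates):
    # Sum spend per entity (illustrative aggregation).
    totals = {}
    for u in updates:
        totals[u["key"]] = totals.get(u["key"], 0.0) + u["spend"]
    return totals

def write_to_db(aggregated):
    # Placeholder for a single DB transaction writing the aggregated spend.
    print("writing to DB:", aggregated)

def flush_queue_to_db():
    # Try to acquire the lock; only one instance succeeds (NX = set if not exists).
    acquired = r.set(LOCK_KEY, POD_ID, nx=True, ex=60)
    if not acquired:
        return  # another instance holds the lock

    try:
        # Read all queued updates and aggregate them into one transaction.
        updates = [json.loads(item) for item in r.lrange(QUEUE_KEY, 0, -1)]
        r.delete(QUEUE_KEY)
        write_to_db(aggregate(updates))
    finally:
        # Release the lock so another instance can flush the next batch.
        r.delete(LOCK_KEY)
```

Because the lock is created with `NX`, only one instance can hold it at a time, which is what bounds the number of concurrent writers to the DB.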
+
+
+## Usage
+
+### Required components
+
+- Redis
+- Postgres
+
+### Setup on LiteLLM config
+
+You can enable the redis buffer by setting `use_redis_transaction_buffer: true` in the `general_settings` section of your `proxy_config.yaml` file.
+
+Note: This setup requires LiteLLM to be connected to a redis instance.
+
+```yaml showLineNumbers title="litellm proxy_config.yaml"
+general_settings:
+  use_redis_transaction_buffer: true
+
+litellm_settings:
+  cache: True
+  cache_params:
+    type: redis
+    supported_call_types: [] # Optional: Set cache for proxy, but not on the actual llm api call
+```
+
+## Monitoring
+
+LiteLLM emits the following prometheus metrics to monitor the health/status of the in-memory buffer and redis buffer.
+
+
+| Metric Name | Description | Storage Type |
+|-------------|-------------|--------------|
+| `litellm_pod_lock_manager_size` | Indicates which pod has the lock to write updates to the database. | Redis |
+| `litellm_in_memory_daily_spend_update_queue_size` | Number of items in the in-memory daily spend update queue. These are the aggregate spend logs for each user. | In-Memory |
+| `litellm_redis_daily_spend_update_queue_size` | Number of items in the Redis daily spend update queue. These are the aggregate spend logs for each user. | Redis |
+| `litellm_in_memory_spend_update_queue_size` | In-memory aggregate spend values for keys, users, teams, team members, etc. | In-Memory |
+| `litellm_redis_spend_update_queue_size` | Redis aggregate spend values for keys, users, teams, etc. | Redis |
diff --git a/docs/my-website/docs/proxy/db_info.md b/docs/my-website/docs/proxy/db_info.md
index 1b87aa1e54..946089bf14 100644
--- a/docs/my-website/docs/proxy/db_info.md
+++ b/docs/my-website/docs/proxy/db_info.md
@@ -46,18 +46,17 @@ You can see the full DB Schema [here](https://github.com/BerriAI/litellm/blob/ma
 | Table Name | Description | Row Insert Frequency |
 |------------|-------------|---------------------|
-| LiteLLM_SpendLogs | Detailed logs of all API requests. Records token usage, spend, and timing information. Tracks which models and keys were used. | **High - every LLM API request** |
-| LiteLLM_ErrorLogs | Captures failed requests and errors. Stores exception details and request information. Helps with debugging and monitoring. | **Medium - on errors only** |
+| LiteLLM_SpendLogs | Detailed logs of all API requests. Records token usage, spend, and timing information. Tracks which models and keys were used. | **High - every LLM API request - Success or Failure** |
 | LiteLLM_AuditLog | Tracks changes to system configuration. Records who made changes and what was modified. Maintains history of updates to teams, users, and models. | **Off by default**, **High - when enabled** |
 
-## Disable `LiteLLM_SpendLogs` & `LiteLLM_ErrorLogs`
+## Disable `LiteLLM_SpendLogs`
 
 You can disable spend_logs and error_logs by setting `disable_spend_logs` and `disable_error_logs` to `True` in the `general_settings` section of your proxy_config.yaml file.
 
 ```yaml
 general_settings:
   disable_spend_logs: True # Disable writing spend logs to DB
-  disable_error_logs: True # Disable writing error logs to DB
+  disable_error_logs: True # Only disable writing error logs to DB, regular spend logs will still be written unless `disable_spend_logs: True`
 ```
 
 ### What is the impact of disabling these logs?
diff --git a/docs/my-website/docs/proxy/guardrails/aim_security.md b/docs/my-website/docs/proxy/guardrails/aim_security.md
index 3de933c0b7..d76c4e0c1c 100644
--- a/docs/my-website/docs/proxy/guardrails/aim_security.md
+++ b/docs/my-website/docs/proxy/guardrails/aim_security.md
@@ -23,6 +23,12 @@ In the newly created guard's page, you can find a reference to the prompt policy
 
 You can decide which detections will be enabled, and set the threshold for each detection.
 
+:::info
+When using LiteLLM with virtual keys, key-specific policies can be set directly in Aim's guards page by specifying the virtual key alias when creating the guard.
+
+Only the aliases of your virtual keys (and not the actual key secrets) will be sent to Aim.
+:::
+
 ### 3. Add Aim Guardrail on your LiteLLM config.yaml
 
 Define your guardrails under the `guardrails` section
@@ -37,7 +43,7 @@ guardrails:
   - guardrail_name: aim-protected-app
     litellm_params:
       guardrail: aim
-      mode: pre_call # 'during_call' is also available
+      mode: [pre_call, post_call] # 'during_call' is also available
       api_key: os.environ/AIM_API_KEY
       api_base: os.environ/AIM_API_BASE # Optional, use only when using a self-hosted Aim Outpost
 ```
@@ -134,7 +140,7 @@ The above request should not be blocked, and you should receive a regular LLM re
 
-# Advanced
+## Advanced
 
 Aim Guard provides user-specific Guardrail policies, enabling you to apply tailored policies to individual users. To utilize this feature, include the end-user's email in the request payload by setting the `x-aim-user-email` header of your request.
 
diff --git a/docs/my-website/docs/proxy/guardrails/custom_guardrail.md b/docs/my-website/docs/proxy/guardrails/custom_guardrail.md
index 50deac511f..657ccab68e 100644
--- a/docs/my-website/docs/proxy/guardrails/custom_guardrail.md
+++ b/docs/my-website/docs/proxy/guardrails/custom_guardrail.md
@@ -10,10 +10,12 @@ Use this if you want to write code to run a custom guardrail
 
 ### 1. Write a `CustomGuardrail` Class
 
-A CustomGuardrail has 3 methods to enforce guardrails
+A CustomGuardrail has 4 methods to enforce guardrails
 - `async_pre_call_hook` - (Optional) modify input or reject request before making LLM API call
 - `async_moderation_hook` - (Optional) reject request, runs while making LLM API call (helps to lower latency)
 - `async_post_call_success_hook` - (Optional) apply guardrail on input/output, runs after making LLM API call
+- `async_post_call_streaming_iterator_hook` - (Optional) pass the entire stream to the guardrail
+
 
 **[See detailed spec of methods here](#customguardrail-methods)**
 
@@ -128,6 +130,23 @@ class myCustomGuardrail(CustomGuardrail):
         ):
             raise ValueError("Guardrail failed Coffee Detected")
 
+    async def async_post_call_streaming_iterator_hook(
+        self,
+        user_api_key_dict: UserAPIKeyAuth,
+        response: Any,
+        request_data: dict,
+    ) -> AsyncGenerator[ModelResponseStream, None]:
+        """
+        Passes the entire stream to the guardrail
+
+        This is useful for guardrails that need to see the entire response, such as PII masking.
+
+        See Aim guardrail implementation for an example - https://github.com/BerriAI/litellm/blob/d0e022cfacb8e9ebc5409bb652059b6fd97b45c0/litellm/proxy/guardrails/guardrail_hooks/aim.py#L168
+
+        Triggered by mode: 'post_call'
+        """
+        async for item in response:
+            yield item
 ```
diff --git a/docs/my-website/docs/prompt_injection.md b/docs/my-website/docs/proxy/guardrails/prompt_injection.md
similarity index 100%
rename from docs/my-website/docs/prompt_injection.md
rename to docs/my-website/docs/proxy/guardrails/prompt_injection.md
diff --git a/docs/my-website/docs/proxy/guardrails/quick_start.md b/docs/my-website/docs/proxy/guardrails/quick_start.md
index 6744dc6578..aeac507e0a 100644
--- a/docs/my-website/docs/proxy/guardrails/quick_start.md
+++ b/docs/my-website/docs/proxy/guardrails/quick_start.md
@@ -17,6 +17,14 @@ model_list:
       api_key: os.environ/OPENAI_API_KEY
 
 guardrails:
+  - guardrail_name: general-guard
+    litellm_params:
+      guardrail: aim
+      mode: [pre_call, post_call]
+      api_key: os.environ/AIM_API_KEY
+      api_base: os.environ/AIM_API_BASE
+      default_on: true # Optional
+
   - guardrail_name: "aporia-pre-guard"
     litellm_params:
       guardrail: aporia # supported values: "aporia", "lakera"
@@ -45,6 +53,7 @@ guardrails:
 - `pre_call` Run **before** LLM call, on **input**
 - `post_call` Run **after** LLM call, on **input & output**
 - `during_call` Run **during** LLM call, on **input** Same as `pre_call` but runs in parallel with the LLM call. Response not returned until guardrail check completes
+- A list of the above values to run multiple modes, e.g. `mode: [pre_call, post_call]`
 
 ## 2. Start LiteLLM Gateway
 
@@ -569,4 +578,4 @@ guardrails: Union[
 
 class DynamicGuardrailParams:
     extra_body: Dict[str, Any] # Additional parameters for the guardrail
-```
\ No newline at end of file
+```
diff --git a/docs/my-website/docs/proxy/image_handling.md b/docs/my-website/docs/proxy/image_handling.md
new file mode 100644
index 0000000000..300ab0bc38
--- /dev/null
+++ b/docs/my-website/docs/proxy/image_handling.md
@@ -0,0 +1,21 @@
+import Image from '@theme/IdealImage';
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+# Image URL Handling
+
+Expose and use MCP servers through LiteLLM
+
+
+## UI view total usage after 1M+ logs
+
+This release brings the ability to view total usage analytics even after exceeding 1M+ logs in your database. We've implemented a scalable architecture that stores only aggregate usage data, resulting in significantly more efficient queries and reduced database CPU utilization.
+
+
+View total usage after 1M+ logs
+
+
+
+- How this works:
+  - We now aggregate usage data into a dedicated DailyUserSpend table, significantly reducing query load and CPU usage even beyond 1M+ logs.
+
+- Daily Spend Breakdown API:
+
+  - Retrieve granular daily usage data (by model, provider, and API key) with a single endpoint.
+    Example Request:
+
+    ```shell title="Daily Spend Breakdown API" showLineNumbers
+    curl -L -X GET 'http://localhost:4000/user/daily/activity?start_date=2025-03-20&end_date=2025-03-27' \
+    -H 'Authorization: Bearer sk-...'
+    ```
+
+    ```json title="Daily Spend Breakdown API Response" showLineNumbers
+    {
+      "results": [
+        {
+          "date": "2025-03-27",
+          "metrics": {
+            "spend": 0.0177072,
+            "prompt_tokens": 111,
+            "completion_tokens": 1711,
+            "total_tokens": 1822,
+            "api_requests": 11
+          },
+          "breakdown": {
+            "models": {
+              "gpt-4o-mini": {
+                "spend": 1.095e-05,
+                "prompt_tokens": 37,
+                "completion_tokens": 9,
+                "total_tokens": 46,
+                "api_requests": 1
+              }
+            },
+            "providers": { "openai": { ... }, "azure_ai": { ... } },
+            "api_keys": { "3126b6eaf1...": { ... } }
+          }
+        }
+      ],
+      "metadata": {
+        "total_spend": 0.7274667,
+        "total_prompt_tokens": 280990,
+        "total_completion_tokens": 376674,
+        "total_api_requests": 14
+      }
+    }
+    ```
+
+
+
+## New Models / Updated Models
+- Support for Vertex AI gemini-2.0-flash-lite & Google AI Studio gemini-2.0-flash-lite [PR](https://github.com/BerriAI/litellm/pull/9523)
+- Support for Vertex AI Fine-Tuned LLMs [PR](https://github.com/BerriAI/litellm/pull/9542)
+- Nova Canvas image generation support [PR](https://github.com/BerriAI/litellm/pull/9525)
+- OpenAI gpt-4o-transcribe support [PR](https://github.com/BerriAI/litellm/pull/9517)
+- Added new Vertex AI text embedding model [PR](https://github.com/BerriAI/litellm/pull/9476)
+
+## LLM Translation
+- OpenAI Web Search Tool Call Support [PR](https://github.com/BerriAI/litellm/pull/9465)
+- Vertex AI topLogprobs support [PR](https://github.com/BerriAI/litellm/pull/9518)
+- Support for sending images and video to Vertex AI multimodal embedding [Doc](https://docs.litellm.ai/docs/providers/vertex#multi-modal-embeddings)
+- Support litellm.api_base for Vertex AI + Gemini across completion, embedding, image_generation [PR](https://github.com/BerriAI/litellm/pull/9516)
+- Bug fix for returning `response_cost` when using litellm python SDK with LiteLLM Proxy [PR](https://github.com/BerriAI/litellm/commit/6fd18651d129d606182ff4b980e95768fc43ca3d)
+- Support for `max_completion_tokens` on Mistral API [PR](https://github.com/BerriAI/litellm/pull/9606)
+- Refactored Vertex AI passthrough routes - fixes unpredictable behaviour with auto-setting default_vertex_region on router model add [PR](https://github.com/BerriAI/litellm/pull/9467)
+
+## Spend Tracking Improvements
+- Log 'api_base' on spend logs [PR](https://github.com/BerriAI/litellm/pull/9509)
+- Support for Gemini audio token cost tracking [PR](https://github.com/BerriAI/litellm/pull/9535)
+- Fixed OpenAI audio input token cost tracking [PR](https://github.com/BerriAI/litellm/pull/9535)
+
+## UI
+
+### Model Management
+- Allowed team admins to add/update/delete models on UI [PR](https://github.com/BerriAI/litellm/pull/9572)
+- Added rendering of supports_web_search on model hub [PR](https://github.com/BerriAI/litellm/pull/9469)
+
+### Request Logs
+- Show API base and model ID on request logs [PR](https://github.com/BerriAI/litellm/pull/9572)
+- Allow viewing keyinfo on request logs [PR](https://github.com/BerriAI/litellm/pull/9568)
+
+### Usage Tab
+- Added Daily User Spend Aggregate view -
+  allows the UI Usage tab to work at > 1M rows [PR](https://github.com/BerriAI/litellm/pull/9538)
+- Connected UI to "LiteLLM_DailyUserSpend" spend table [PR](https://github.com/BerriAI/litellm/pull/9603)
+
+## Logging Integrations
+- Fixed StandardLoggingPayload for GCS Pub Sub Logging Integration [PR](https://github.com/BerriAI/litellm/pull/9508)
+- Track `litellm_model_name` on `StandardLoggingPayload` [Docs](https://docs.litellm.ai/docs/proxy/logging_spec#standardlogginghiddenparams)
+
+## Performance / Reliability Improvements
+- LiteLLM Redis semantic caching implementation [PR](https://github.com/BerriAI/litellm/pull/9356)
+- Gracefully handle exceptions when the DB is having an outage [PR](https://github.com/BerriAI/litellm/pull/9533)
+- Allow pods to start up and pass /health/readiness when allow_requests_on_db_unavailable: True and the DB is down [PR](https://github.com/BerriAI/litellm/pull/9569)
+
+
+## General Improvements
+- Support for exposing MCP tools on litellm proxy [PR](https://github.com/BerriAI/litellm/pull/9426)
+- Support discovering Gemini, Anthropic, xAI models by calling their /v1/model endpoint [PR](https://github.com/BerriAI/litellm/pull/9530)
+- Fixed route check for non-proxy admins on JWT auth [PR](https://github.com/BerriAI/litellm/pull/9454)
+- Added baseline Prisma database migrations [PR](https://github.com/BerriAI/litellm/pull/9565)
+- View all wildcard models on /model/info [PR](https://github.com/BerriAI/litellm/pull/9572)
+
+
+## Security
+- Bumped next from 14.2.21 to 14.2.25 in UI dashboard [PR](https://github.com/BerriAI/litellm/pull/9458)
+
+## Complete Git Diff
+
+[Here's the complete git diff](https://github.com/BerriAI/litellm/compare/v1.63.14-stable.patch1...v1.65.0-stable)
diff --git a/docs/my-website/release_notes/v1.65.0/index.md b/docs/my-website/release_notes/v1.65.0/index.md
new file mode 100644
index 0000000000..84276c997d
--- /dev/null
+++ b/docs/my-website/release_notes/v1.65.0/index.md
@@ -0,0 +1,34 @@
+---
+title: v1.65.0 - Team Model Add - update
+slug: v1.65.0
+date: 2025-03-28T10:00:00
+authors:
+  - name: Krrish Dholakia
+    title: CEO, LiteLLM
+    url: https://www.linkedin.com/in/krish-d/
+    image_url: https://media.licdn.com/dms/image/v2/D4D03AQGrlsJ3aqpHmQ/profile-displayphoto-shrink_400_400/B4DZSAzgP7HYAg-/0/1737327772964?e=1749686400&v=beta&t=Hkl3U8Ps0VtvNxX0BNNq24b4dtX5wQaPFp6oiKCIHD8
+  - name: Ishaan Jaffer
+    title: CTO, LiteLLM
+    url: https://www.linkedin.com/in/reffajnaahsi/
+    image_url: https://pbs.twimg.com/profile_images/1613813310264340481/lz54oEiB_400x400.jpg
+tags: [management endpoints, team models, ui]
+hide_table_of_contents: false
+---
+
+import Image from '@theme/IdealImage';
+
+v1.65.0 updates the `/model/new` endpoint to prevent non-team admins from creating team models.
+
+This means that only proxy admins or team admins can create team models.
+
+## Additional Changes
+
+- Allows team admins to call `/model/update` to update team models.
+- Allows team admins to call `/model/delete` to delete team models.
+- Introduces new `user_models_only` param to `/v2/model/info` - only return models added by this user.
+
+
+These changes enable team admins to add and manage models for their team on the LiteLLM UI + API.
+
+
+The request you would send to LiteLLM /chat/completions endpoint.
+How LiteLLM transforms your request for the specified provider.
+Note: Sensitive headers are not shown.
+{transformedRequest}
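As a small illustration of the new `user_models_only` param mentioned in the v1.65.0 notes above (a sketch only - it assumes the param is passed as a query parameter and that the gateway runs on localhost:4000 with a placeholder key):

```python
# Sketch: list only the models added by the calling user via /v2/model/info.
import requests

resp = requests.get(
    "http://localhost:4000/v2/model/info",
    params={"user_models_only": "true"},          # param name from the release notes; format assumed
    headers={"Authorization": "Bearer sk-1234"},  # illustrative virtual key
)
print(resp.json())
```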