LiteLLM Minor Fixes & Improvements (11/23/2024) (#6870)

* feat(pass_through_endpoints/): support logging anthropic/gemini pass through calls to langfuse/s3/etc. * fix(utils.py): allow disabling end user cost tracking with new param Allows proxy admin to disable cost tracking for end user - keeps prometheus metrics small * docs(configs.md): add disable_end_user_cost_tracking reference to docs * feat(key_management_endpoints.py): add support for restricting access to `/key/generate` by team/proxy level role Enables admin to restrict key creation, and assign team admins to handle distributing keys * test(test_key_management.py): add unit testing for personal / team key restriction checks * docs: add docs on restricting key creation * docs(finetuned_models.md): add new guide on calling finetuned models * docs(input.md): cleanup anthropic supported params Closes https://github.com/BerriAI/litellm/issues/6856 * test(test_embedding.py): add test for passing extra headers via embedding * feat(cohere/embed): pass client to async embedding * feat(rerank.py): add `/v1/rerank` if missing for cohere base url Closes https://github.com/BerriAI/litellm/issues/6844 * fix(main.py): pass extra_headers param to openai Fixes https://github.com/BerriAI/litellm/issues/6836 * fix(litellm_logging.py): don't disable global callbacks when dynamic callbacks are set Fixes issue where global callbacks - e.g. prometheus were overriden when langfuse was set dynamically * fix(handler.py): fix linting error * fix: fix typing * build: add conftest to proxy_admin_ui_tests/ * test: fix test * fix: fix linting errors * test: fix test * fix: fix pass through testing
2024-11-23 15:17:40 +05:30 · 2024-11-23 15:17:40 +05:30 · 7e9d8b58f6
commit 7e9d8b58f6
parent d81ae45827
35 changed files with 871 additions and 248 deletions
--- a/docs/my-website/docs/completion/input.md
+++ b/docs/my-website/docs/completion/input.md
@ -41,7 +41,7 @@ Use `litellm.get_supported_openai_params()` for an updated list of params for ea

 | Provider | temperature | max_completion_tokens | max_tokens | top_p | stream | stream_options | stop | n | presence_penalty | frequency_penalty | functions | function_call | logit_bias | user | response_format | seed | tools | tool_choice | logprobs | top_logprobs | extra_headers |
 |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
-|Anthropic| ✅ | ✅ | ✅ |✅ | ✅ | ✅ | ✅ |  |  |   |  |  |  |✅ | ✅ | ✅ | ✅ | ✅ |  |  | ✅ |
+|Anthropic| ✅ | ✅ | ✅ |✅ | ✅ | ✅ | ✅ |  |  |   |  |  |  |✅ | ✅ | | ✅ | ✅ |  |  | ✅ |
 |OpenAI| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |✅ | ✅ | ✅ | ✅ |✅ | ✅ | ✅ | ✅ | ✅ |
 |Azure OpenAI| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |✅ | ✅ | ✅ | ✅ |✅ | ✅ |  |  | ✅ |
 |Replicate | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |  |   |  |   |
--- a/docs/my-website/docs/guides/finetuned_models.md
+++ b/docs/my-website/docs/guides/finetuned_models.md
@ -0,0 +1,74 @@
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+
+# Calling Finetuned Models
+
+## OpenAI
+
+
+| Model Name                | Function Call                                                          |
+|---------------------------|-----------------------------------------------------------------|
+| fine tuned `gpt-4-0613`    | `response = completion(model="ft:gpt-4-0613", messages=messages)`     |
+| fine tuned `gpt-4o-2024-05-13` | `response = completion(model="ft:gpt-4o-2024-05-13", messages=messages)` |
+| fine tuned `gpt-3.5-turbo-0125` | `response = completion(model="ft:gpt-3.5-turbo-0125", messages=messages)` |
+| fine tuned `gpt-3.5-turbo-1106` | `response = completion(model="ft:gpt-3.5-turbo-1106", messages=messages)` |
+| fine tuned `gpt-3.5-turbo-0613` | `response = completion(model="ft:gpt-3.5-turbo-0613", messages=messages)` |
+
+
+## Vertex AI
+
+Fine tuned models on vertex have a numerical model/endpoint id. 
+
+<Tabs>
+<TabItem value="sdk" label="SDK">
+
+```python
+from litellm import completion
+import os
+
+## set ENV variables
+os.environ["VERTEXAI_PROJECT"] = "hardy-device-38811"
+os.environ["VERTEXAI_LOCATION"] = "us-central1"
+
+response = completion(
+  model="vertex_ai/<your-finetuned-model>",  # e.g. vertex_ai/4965075652664360960
+  messages=[{ "content": "Hello, how are you?","role": "user"}],
+  base_model="vertex_ai/gemini-1.5-pro" # the base model - used for routing
+)
+```
+
+</TabItem>
+<TabItem value="proxy" label="PROXY">
+
+1. Add Vertex Credentials to your env 
+
+```bash
+!gcloud auth application-default login
+```
+
+2. Setup config.yaml 
+
+```yaml
+- model_name: finetuned-gemini
+  litellm_params:
+    model: vertex_ai/<ENDPOINT_ID>
+    vertex_project: <PROJECT_ID>
+    vertex_location: <LOCATION>
+  model_info:
+    base_model: vertex_ai/gemini-1.5-pro # IMPORTANT
+```
+
+3. Test it! 
+
+```bash
+curl --location 'https://0.0.0.0:4000/v1/chat/completions' \
+--header 'Content-Type: application/json' \
+--header 'Authorization: <LITELLM_KEY>' \
+--data '{"model": "finetuned-gemini" ,"messages":[{"role": "user", "content":[{"type": "text", "text": "hi"}]}]}'
+```
+
+</TabItem>
+</Tabs>
+
+
--- a/docs/my-website/docs/proxy/configs.md
+++ b/docs/my-website/docs/proxy/configs.md
@ -754,6 +754,8 @@ general_settings:
 | cache_params.s3_endpoint_url | string | Optional - The endpoint URL for the S3 bucket. |
 | cache_params.supported_call_types | array of strings | The types of calls to cache. [Further docs](./caching) |
 | cache_params.mode | string | The mode of the cache. [Further docs](./caching) |
+| disable_end_user_cost_tracking | boolean | If true, turns off end user cost tracking on prometheus metrics + litellm spend logs table on proxy. |
+| key_generation_settings | object | Restricts who can generate keys. [Further docs](./virtual_keys.md#restricting-key-generation) |

 ### general_settings - Reference

--- a/docs/my-website/docs/proxy/self_serve.md
+++ b/docs/my-website/docs/proxy/self_serve.md
@ -217,4 +217,10 @@ litellm_settings:
    max_parallel_requests: 1000 # (Optional[int], optional): Max number of requests that can be made in parallel. Defaults to None.
    tpm_limit: 1000 #(Optional[int], optional): Tpm limit. Defaults to None.
    rpm_limit: 1000 #(Optional[int], optional): Rpm limit. Defaults to None.
-```
+
+  key_generation_settings: # Restricts who can generate keys. [Further docs](./virtual_keys.md#restricting-key-generation)
+    team_key_generation:
+      allowed_team_member_roles: ["admin"]
+    personal_key_generation: # maps to 'Default Team' on UI 
+      allowed_user_roles: ["proxy_admin"]
+```
--- a/docs/my-website/docs/proxy/virtual_keys.md
+++ b/docs/my-website/docs/proxy/virtual_keys.md
@ -811,6 +811,75 @@ litellm_settings:
    team_id: "core-infra"
 ```

+### Restricting Key Generation
+
+Use this to control who can generate keys. Useful when letting others create keys on the UI. 
+
+```yaml
+litellm_settings:
+  key_generation_settings:
+    team_key_generation:
+      allowed_team_member_roles: ["admin"]
+    personal_key_generation: # maps to 'Default Team' on UI 
+      allowed_user_roles: ["proxy_admin"]
+```
+
+#### Spec 
+
+```python
+class TeamUIKeyGenerationConfig(TypedDict):
+    allowed_team_member_roles: List[str]
+
+
+class PersonalUIKeyGenerationConfig(TypedDict):
+    allowed_user_roles: List[LitellmUserRoles] 
+
+
+class StandardKeyGenerationConfig(TypedDict, total=False):
+    team_key_generation: TeamUIKeyGenerationConfig
+    personal_key_generation: PersonalUIKeyGenerationConfig
+
+
+class LitellmUserRoles(str, enum.Enum):
+    """
+    Admin Roles:
+    PROXY_ADMIN: admin over the platform
+    PROXY_ADMIN_VIEW_ONLY: can login, view all own keys, view all spend
+    ORG_ADMIN: admin over a specific organization, can create teams, users only within their organization
+
+    Internal User Roles:
+    INTERNAL_USER: can login, view/create/delete their own keys, view their spend
+    INTERNAL_USER_VIEW_ONLY: can login, view their own keys, view their own spend
+
+
+    Team Roles:
+    TEAM: used for JWT auth
+
+
+    Customer Roles:
+    CUSTOMER: External users -> these are customers
+
+    """
+
+    # Admin Roles
+    PROXY_ADMIN = "proxy_admin"
+    PROXY_ADMIN_VIEW_ONLY = "proxy_admin_viewer"
+
+    # Organization admins
+    ORG_ADMIN = "org_admin"
+
+    # Internal User Roles
+    INTERNAL_USER = "internal_user"
+    INTERNAL_USER_VIEW_ONLY = "internal_user_viewer"
+
+    # Team Roles
+    TEAM = "team"
+
+    # Customer Roles - External users of proxy
+    CUSTOMER = "customer"
+```
+
+
 ## **Next Steps - Set Budgets, Rate Limits per Virtual Key**

 [Follow this doc to set budgets, rate limiters per virtual key with LiteLLM](users)
--- a/docs/my-website/sidebars.js
+++ b/docs/my-website/sidebars.js
@ -199,6 +199,31 @@ const sidebars = {
        
      ],
    },
+    {
+      type: "category",
+      label: "Guides",
+      items: [
+        "exception_mapping",
+        "completion/provider_specific_params",
+        "guides/finetuned_models",
+        "completion/audio",
+        "completion/vision",
+        "completion/json_mode",
+        "completion/prompt_caching",
+        "completion/predict_outputs",
+        "completion/prefix",
+        "completion/drop_params",
+        "completion/prompt_formatting",
+        "completion/stream",
+        "completion/message_trimming",
+        "completion/function_call",
+        "completion/model_alias",
+        "completion/batching",
+        "completion/mock_requests",
+        "completion/reliable_completions",
+        
+      ]
+    },
    {
      type: "category",
      label: "Supported Endpoints",
@ -214,25 +239,8 @@ const sidebars = {
          },
          items: [
            "completion/input",
-            "completion/provider_specific_params",
-            "completion/json_mode",
-            "completion/prompt_caching",
-            "completion/audio",
-            "completion/vision",
-            "completion/predict_outputs",
-            "completion/prefix",
-            "completion/drop_params",
-            "completion/prompt_formatting",
            "completion/output",
            "completion/usage",
-            "exception_mapping",
-            "completion/stream",
-            "completion/message_trimming",
-            "completion/function_call",
-            "completion/model_alias",
-            "completion/batching",
-            "completion/mock_requests",
-            "completion/reliable_completions",
          ],
        },
        "embedding/supported_embedding",