LiteLLM Minor Fixes & Improvements (11/23/2024) (#6870)
* feat(pass_through_endpoints/): support logging anthropic/gemini pass through calls to langfuse/s3/etc.
* fix(utils.py): allow disabling end user cost tracking with new param. Allows proxy admin to disable cost tracking for end users, keeping prometheus metrics small.
* docs(configs.md): add disable_end_user_cost_tracking reference to docs
* feat(key_management_endpoints.py): add support for restricting access to `/key/generate` by team/proxy level role. Enables admins to restrict key creation and assign team admins to handle distributing keys.
* test(test_key_management.py): add unit testing for personal / team key restriction checks
* docs: add docs on restricting key creation
* docs(finetuned_models.md): add new guide on calling finetuned models
* docs(input.md): clean up Anthropic supported params. Closes https://github.com/BerriAI/litellm/issues/6856
* test(test_embedding.py): add test for passing extra headers via embedding
* feat(cohere/embed): pass client to async embedding
* feat(rerank.py): add `/v1/rerank` if missing for cohere base url. Closes https://github.com/BerriAI/litellm/issues/6844
* fix(main.py): pass extra_headers param to openai. Fixes https://github.com/BerriAI/litellm/issues/6836
* fix(litellm_logging.py): don't disable global callbacks when dynamic callbacks are set. Fixes issue where global callbacks (e.g. prometheus) were overridden when langfuse was set dynamically.
* fix(handler.py): fix linting error
* fix: fix typing
* build: add conftest to proxy_admin_ui_tests/
* test: fix test
* fix: fix linting errors
* test: fix test
* fix: fix pass through testing
parent d81ae45827 · commit 7e9d8b58f6
35 changed files with 871 additions and 248 deletions

docs/my-website/docs/completion/input.md

@@ -41,7 +41,7 @@ Use `litellm.get_supported_openai_params()` for an updated list of params for ea

| Provider | temperature | max_completion_tokens | max_tokens | top_p | stream | stream_options | stop | n | presence_penalty | frequency_penalty | functions | function_call | logit_bias | user | response_format | seed | tools | tool_choice | logprobs | top_logprobs | extra_headers |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
-|Anthropic| ✅ | ✅ | ✅ |✅ | ✅ | ✅ | ✅ | | | | | | |✅ | ✅ | ✅ | ✅ | ✅ | | | ✅ |
+|Anthropic| ✅ | ✅ | ✅ |✅ | ✅ | ✅ | ✅ | | | | | | |✅ | ✅ | | ✅ | ✅ | | | ✅ |
|OpenAI| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |✅ | ✅ | ✅ | ✅ |✅ | ✅ | ✅ | ✅ | ✅ |
|Azure OpenAI| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |✅ | ✅ | ✅ | ✅ |✅ | ✅ | | | ✅ |
|Replicate | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | | | | | |
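
As a quick cross-check of the row change above, a minimal sketch using `litellm.get_supported_openai_params()` (the function named in the hunk context); it assumes a current `litellm` install where the function takes `model` and `custom_llm_provider`:

```python
from litellm import get_supported_openai_params

# the diff above drops `seed` from Anthropic's supported-params row
params = get_supported_openai_params(
    model="claude-3-sonnet-20240229", custom_llm_provider="anthropic"
)
print("seed" in params)
```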

docs/my-website/docs/guides/finetuned_models.md (new file, 74 lines)

@@ -0,0 +1,74 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Calling Finetuned Models

## OpenAI

| Model Name | Function Call |
|---------------------------|-----------------------------------------------------------------|
| fine-tuned `gpt-4-0613` | `response = completion(model="ft:gpt-4-0613", messages=messages)` |
| fine-tuned `gpt-4o-2024-05-13` | `response = completion(model="ft:gpt-4o-2024-05-13", messages=messages)` |
| fine-tuned `gpt-3.5-turbo-0125` | `response = completion(model="ft:gpt-3.5-turbo-0125", messages=messages)` |
| fine-tuned `gpt-3.5-turbo-1106` | `response = completion(model="ft:gpt-3.5-turbo-1106", messages=messages)` |
| fine-tuned `gpt-3.5-turbo-0613` | `response = completion(model="ft:gpt-3.5-turbo-0613", messages=messages)` |
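
Putting a row from the table above into a complete script, as a minimal sketch (assumes `OPENAI_API_KEY` is set; a real fine-tuned model id typically carries an org/job suffix, e.g. `ft:gpt-3.5-turbo-0125:my-org::abc123`):

```python
import os

from litellm import completion

os.environ["OPENAI_API_KEY"] = "sk-..."  # assumption: replace with your key

messages = [{"role": "user", "content": "Hello, how are you?"}]

# `ft:` prefix + base model name, per the table above
response = completion(model="ft:gpt-3.5-turbo-0125", messages=messages)
print(response.choices[0].message.content)
```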

## Vertex AI

Fine-tuned models on Vertex AI are referenced by a numerical model/endpoint ID.

<Tabs>
<TabItem value="sdk" label="SDK">

```python
from litellm import completion
import os

## set ENV variables
os.environ["VERTEXAI_PROJECT"] = "hardy-device-38811"
os.environ["VERTEXAI_LOCATION"] = "us-central1"

response = completion(
    model="vertex_ai/<your-finetuned-model>",  # e.g. vertex_ai/4965075652664360960
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    base_model="vertex_ai/gemini-1.5-pro",  # the base model, used for routing
)
```

</TabItem>
<TabItem value="proxy" label="PROXY">

1. Add Vertex credentials to your env

```bash
gcloud auth application-default login
```

2. Set up config.yaml

```yaml
- model_name: finetuned-gemini
  litellm_params:
    model: vertex_ai/<ENDPOINT_ID>
    vertex_project: <PROJECT_ID>
    vertex_location: <LOCATION>
  model_info:
    base_model: vertex_ai/gemini-1.5-pro # IMPORTANT
```

3. Test it!

```bash
curl --location 'https://0.0.0.0:4000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: <LITELLM_KEY>' \
--data '{"model": "finetuned-gemini" ,"messages":[{"role": "user", "content":[{"type": "text", "text": "hi"}]}]}'
```
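
Equivalently, a minimal sketch that points the standard OpenAI Python client at the proxy (assumes the proxy listens on `http://0.0.0.0:4000` and that `sk-1234` stands in for your LiteLLM key):

```python
import openai

# route the standard OpenAI client through the LiteLLM proxy
client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="finetuned-gemini",  # the model_name from config.yaml above
    messages=[{"role": "user", "content": "hi"}],
)
print(response.choices[0].message.content)
```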

</TabItem>
</Tabs>

docs/my-website/docs/proxy/configs.md

@@ -754,6 +754,8 @@ general_settings:

| cache_params.s3_endpoint_url | string | Optional - The endpoint URL for the S3 bucket. |
| cache_params.supported_call_types | array of strings | The types of calls to cache. [Further docs](./caching) |
| cache_params.mode | string | The mode of the cache. [Further docs](./caching) |
+| disable_end_user_cost_tracking | boolean | If true, turns off end user cost tracking on prometheus metrics + litellm spend logs table on proxy. |
+| key_generation_settings | object | Restricts who can generate keys. [Further docs](./virtual_keys.md#restricting-key-generation) |
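
A minimal sketch of how these two new settings could sit in a proxy `config.yaml`, assuming (as the surrounding reference table and the virtual-keys docs below suggest) they live under `litellm_settings`:

```yaml
litellm_settings:
  disable_end_user_cost_tracking: true # drop end-user cost tracking from prometheus metrics + spend logs
  key_generation_settings:
    personal_key_generation:
      allowed_user_roles: ["proxy_admin"]
```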

### general_settings - Reference
@@ -217,4 +217,10 @@ litellm_settings:
  max_parallel_requests: 1000 # (Optional[int], optional): Max number of requests that can be made in parallel. Defaults to None.
  tpm_limit: 1000 # (Optional[int], optional): Tpm limit. Defaults to None.
  rpm_limit: 1000 # (Optional[int], optional): Rpm limit. Defaults to None.
```

```yaml
key_generation_settings: # Restricts who can generate keys. [Further docs](./virtual_keys.md#restricting-key-generation)
  team_key_generation:
    allowed_team_member_roles: ["admin"]
  personal_key_generation: # maps to 'Default Team' on UI
    allowed_user_roles: ["proxy_admin"]
```

docs/my-website/docs/proxy/virtual_keys.md

@@ -811,6 +811,75 @@ litellm_settings:
  team_id: "core-infra"
```

### Restricting Key Generation

Use this to control who can generate keys. Useful when letting others create keys on the UI.

```yaml
litellm_settings:
  key_generation_settings:
    team_key_generation:
      allowed_team_member_roles: ["admin"]
    personal_key_generation: # maps to 'Default Team' on UI
      allowed_user_roles: ["proxy_admin"]
```

#### Spec

```python
class TeamUIKeyGenerationConfig(TypedDict):
    allowed_team_member_roles: List[str]


class PersonalUIKeyGenerationConfig(TypedDict):
    allowed_user_roles: List[LitellmUserRoles]


class StandardKeyGenerationConfig(TypedDict, total=False):
    team_key_generation: TeamUIKeyGenerationConfig
    personal_key_generation: PersonalUIKeyGenerationConfig


class LitellmUserRoles(str, enum.Enum):
    """
    Admin Roles:
    PROXY_ADMIN: admin over the platform
    PROXY_ADMIN_VIEW_ONLY: can login, view all own keys, view all spend
    ORG_ADMIN: admin over a specific organization, can create teams, users only within their organization

    Internal User Roles:
    INTERNAL_USER: can login, view/create/delete their own keys, view their spend
    INTERNAL_USER_VIEW_ONLY: can login, view their own keys, view their own spend

    Team Roles:
    TEAM: used for JWT auth

    Customer Roles:
    CUSTOMER: External users -> these are customers
    """

    # Admin Roles
    PROXY_ADMIN = "proxy_admin"
    PROXY_ADMIN_VIEW_ONLY = "proxy_admin_viewer"

    # Organization admins
    ORG_ADMIN = "org_admin"

    # Internal User Roles
    INTERNAL_USER = "internal_user"
    INTERNAL_USER_VIEW_ONLY = "internal_user_viewer"

    # Team Roles
    TEAM = "team"

    # Customer Roles - External users of proxy
    CUSTOMER = "customer"
```
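
To make the gating concrete, here is an illustrative sketch (not LiteLLM's actual implementation) of how a proxy could consult this config before honoring a team-scoped `/key/generate` call; `can_generate_team_key` is a hypothetical helper:

```python
from typing import List, Optional, TypedDict


class TeamUIKeyGenerationConfig(TypedDict):
    allowed_team_member_roles: List[str]


class StandardKeyGenerationConfig(TypedDict, total=False):
    team_key_generation: TeamUIKeyGenerationConfig


def can_generate_team_key(
    team_member_role: str, config: Optional[StandardKeyGenerationConfig]
) -> bool:
    # no config -> no restriction, any team member may generate keys
    if not config or "team_key_generation" not in config:
        return True
    return team_member_role in config["team_key_generation"]["allowed_team_member_roles"]


config: StandardKeyGenerationConfig = {
    "team_key_generation": {"allowed_team_member_roles": ["admin"]}
}
assert can_generate_team_key("admin", config)
assert not can_generate_team_key("user", config)
```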

## **Next Steps - Set Budgets, Rate Limits per Virtual Key**

[Follow this doc to set budgets, rate limiters per virtual key with LiteLLM](users)

docs/my-website/sidebars.js

@@ -199,6 +199,31 @@ const sidebars = {
      ],
    },
+   {
+     type: "category",
+     label: "Guides",
+     items: [
+       "exception_mapping",
+       "completion/provider_specific_params",
+       "guides/finetuned_models",
+       "completion/audio",
+       "completion/vision",
+       "completion/json_mode",
+       "completion/prompt_caching",
+       "completion/predict_outputs",
+       "completion/prefix",
+       "completion/drop_params",
+       "completion/prompt_formatting",
+       "completion/stream",
+       "completion/message_trimming",
+       "completion/function_call",
+       "completion/model_alias",
+       "completion/batching",
+       "completion/mock_requests",
+       "completion/reliable_completions",
+
+     ]
+   },
    {
      type: "category",
      label: "Supported Endpoints",
@@ -214,25 +239,8 @@ const sidebars = {
      },
      items: [
        "completion/input",
-       "completion/provider_specific_params",
-       "completion/json_mode",
-       "completion/prompt_caching",
-       "completion/audio",
-       "completion/vision",
-       "completion/predict_outputs",
-       "completion/prefix",
-       "completion/drop_params",
-       "completion/prompt_formatting",
        "completion/output",
        "completion/usage",
-       "exception_mapping",
-       "completion/stream",
-       "completion/message_trimming",
-       "completion/function_call",
-       "completion/model_alias",
-       "completion/batching",
-       "completion/mock_requests",
-       "completion/reliable_completions",
      ],
    },
    "embedding/supported_embedding",