LiteLLM Minor Fixes & Improvements (11/26/2024) (#6913)
* docs(config_settings.md): document all router_settings
* ci(config.yml): add router_settings doc test to ci/cd
* test: debug test on ci/cd
* test: debug ci/cd test
* test: fix test
* fix(team_endpoints.py): skip invalid team object. don't fail `/team/list` call. Causes downstream errors if ui just fails to load team list
* test(base_llm_unit_tests.py): add 'response_format={"type": "text"}' test to base_llm_unit_tests. Adds complete coverage for all 'response_format' values to ci/cd
* feat(router.py): support wildcard routes in `get_router_model_info()`. Addresses https://github.com/BerriAI/litellm/issues/6914
* build(model_prices_and_context_window.json): add tpm/rpm limits for all gemini models. Allows for ratelimit tracking for gemini models even with wildcard routing enabled. Addresses https://github.com/BerriAI/litellm/issues/6914
* feat(router.py): add tpm/rpm tracking on success/failure to global_router. Addresses https://github.com/BerriAI/litellm/issues/6914
* feat(router.py): support wildcard routes on router.get_model_group_usage()
* fix(router.py): fix linting error
* fix(router.py): implement get_remaining_tokens_and_requests. Addresses https://github.com/BerriAI/litellm/issues/6914
* fix(router.py): fix linting errors
* test: fix test
* test: fix tests
* docs(config_settings.md): add missing dd env vars to docs
* fix(router.py): check if hidden params is dict
This commit is contained in: parent 5d13302e6b, commit 2d2931a215
22 changed files with 878 additions and 131 deletions
@@ -279,7 +279,31 @@ router_settings:

| retry_policy | object | Specifies the number of retries for different types of exceptions. [More information here](reliability) |
| allowed_fails | integer | The number of failures allowed before cooling down a model. [More information here](reliability) |
| allowed_fails_policy | object | Specifies the number of allowed failures for different error types before cooling down a deployment. [More information here](reliability) |
| default_max_parallel_requests | Optional[int] | The default maximum number of parallel requests for a deployment. |
| default_priority | Optional[int] | The default priority for a request. Only for '.scheduler_acompletion()'. Default is None. |
| polling_interval | Optional[float] | Frequency of polling the queue. Only for '.scheduler_acompletion()'. Default is 3ms. |
| max_fallbacks | Optional[int] | The maximum number of fallbacks to try before exiting the call. Defaults to 5. |
| default_litellm_params | Optional[dict] | The default litellm parameters to add to all requests (e.g. `temperature`, `max_tokens`). |
| timeout | Optional[float] | The default timeout for a request. |
| debug_level | Literal["DEBUG", "INFO"] | The debug level for the logging library in the router. Defaults to "INFO". |
| client_ttl | int | Time-to-live for cached clients in seconds. Defaults to 3600. |
| cache_kwargs | dict | Additional keyword arguments for the cache initialization. |
| routing_strategy_args | dict | Additional keyword arguments for the routing strategy - e.g. the default TTL for lowest-latency routing. |
| model_group_alias | dict | Model group alias mapping. E.g. `{"claude-3-haiku": "claude-3-haiku-20240229"}` |
| num_retries | int | Number of retries for a request. Defaults to 3. |
| default_fallbacks | Optional[List[str]] | Fallbacks to try if no model group-specific fallbacks are defined. |
| caching_groups | Optional[List[tuple]] | List of model groups for caching across model groups. Defaults to None. E.g. caching_groups=[("openai-gpt-3.5-turbo", "azure-gpt-3.5-turbo")] |
| alerting_config | AlertingConfig | [SDK-only arg] Slack alerting configuration. Defaults to None. [Further Docs](../routing.md#alerting-) |
| assistants_config | AssistantsConfig | Set on proxy via `assistant_settings`. [Further docs](../assistants.md) |
| set_verbose | boolean | [DEPRECATED PARAM - see debug docs](./debugging.md) If true, sets the logging level to verbose. |
| retry_after | int | Time to wait before retrying a request, in seconds. Defaults to 0. If `x-retry-after` is received from the LLM API, this value is overridden. |
| provider_budget_config | ProviderBudgetConfig | Provider budget configuration. Use this to set llm_provider budget limits, e.g. $100/day for OpenAI and $100/day for Azure. Defaults to None. [Further Docs](./provider_budget_routing.md) |
| enable_pre_call_checks | boolean | If true, checks if a call is within the model's context window before making the call. [More information here](reliability) |
| model_group_retry_policy | Dict[str, RetryPolicy] | [SDK-only arg] Set retry policy for model groups. |
| context_window_fallbacks | List[Dict[str, List[str]]] | Fallback models for context window violations. |
| redis_url | str | URL for Redis server. **Known performance issue with Redis URL.** |
| cache_responses | boolean | Flag to enable caching LLM responses, if cache set under `router_settings`. If true, caches responses. Defaults to False. |
| router_general_settings | RouterGeneralSettings | [SDK-Only] Router general settings - contains optimizations like 'async_only_mode'. [Docs](../routing.md#router-general-settings) |
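
Several of the rows above are marked as SDK-only `Router()` arguments, and most of the rest use the same names as the constructor's keyword arguments. As a rough sketch only, assuming the names in the table are accepted as-is by `Router()`:

```python
# Minimal sketch - assumes num_retries, timeout, allowed_fails, retry_after and
# default_fallbacks map one-to-one to Router() keyword arguments, as the table suggests.
from litellm import Router

router = Router(
    model_list=[
        {"model_name": "gpt-4o", "litellm_params": {"model": "gpt-4o"}},
        {"model_name": "my-fallback", "litellm_params": {"model": "gpt-3.5-turbo"}},
    ],
    num_retries=3,                       # retries per request
    timeout=30,                          # default request timeout, in seconds
    allowed_fails=3,                     # failures before a deployment is cooled down
    retry_after=5,                       # seconds to wait before retrying
    default_fallbacks=["my-fallback"],   # used when no model group-specific fallbacks exist
)
```
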
### environment variables - Reference
@@ -335,6 +359,8 @@ router_settings:

| DD_SITE | Site URL for Datadog (e.g., datadoghq.com) |
| DD_SOURCE | Source identifier for Datadog logs |
| DD_ENV | Environment identifier for Datadog logs. Only supported for `datadog_llm_observability` callback |
| DD_SERVICE | Service identifier for Datadog logs. Defaults to "litellm-server" |
| DD_VERSION | Version identifier for Datadog logs. Defaults to "unknown" |
| DEBUG_OTEL | Enable debug mode for OpenTelemetry |
| DIRECT_URL | Direct URL for service endpoint |
| DISABLE_ADMIN_UI | Toggle to disable the admin UI |
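
For illustration only, a sketch of providing the Datadog variables from Python; the values below are placeholders, and in most deployments these would be set in the shell or container environment instead:

```python
# Illustrative sketch - placeholder values; normally these live in the
# shell/container environment rather than in application code.
import os

datadog_env = {
    "DD_SITE": "datadoghq.com",      # Datadog site URL
    "DD_SOURCE": "litellm",           # source identifier for logs
    "DD_ENV": "production",           # only used by the datadog_llm_observability callback
    "DD_SERVICE": "litellm-server",   # defaults to "litellm-server" if unset
    "DD_VERSION": "1.0.0",            # defaults to "unknown" if unset
}
for key, value in datadog_env.items():
    os.environ.setdefault(key, value)  # keep any values already set in the environment
```
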
@@ -357,77 +357,6 @@ curl --location 'http://0.0.0.0:4000/v1/model/info' \
--data ''
```

### Provider specific wildcard routing
**Proxy all models from a provider**

Use this if you want to **proxy all models from a specific provider without defining them on the config.yaml**

**Step 1** - define provider specific routing on config.yaml

```yaml
model_list:
  # provider specific wildcard routing
  - model_name: "anthropic/*"
    litellm_params:
      model: "anthropic/*"
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: "groq/*"
    litellm_params:
      model: "groq/*"
      api_key: os.environ/GROQ_API_KEY
  - model_name: "fo::*:static::*" # all requests matching this pattern will be routed to this deployment, example: model="fo::hi::static::hi" will be routed to deployment: "openai/fo::*:static::*"
    litellm_params:
      model: "openai/fo::*:static::*"
      api_key: os.environ/OPENAI_API_KEY
```

Step 2 - Run litellm proxy

```shell
$ litellm --config /path/to/config.yaml
```

Step 3 Test it

Test with `anthropic/` - all models with `anthropic/` prefix will get routed to `anthropic/*`
```shell
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "anthropic/claude-3-sonnet-20240229",
    "messages": [
      {"role": "user", "content": "Hello, Claude!"}
    ]
  }'
```

Test with `groq/` - all models with `groq/` prefix will get routed to `groq/*`
```shell
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "groq/llama3-8b-8192",
    "messages": [
      {"role": "user", "content": "Hello, Claude!"}
    ]
  }'
```

Test with `fo::*::static::*` - all requests matching this pattern will be routed to `openai/fo::*:static::*`
```shell
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "fo::hi::static::hi",
    "messages": [
      {"role": "user", "content": "Hello, Claude!"}
    ]
  }'
```

### Load Balancing

:::info

@@ -1891,3 +1891,22 @@ router = Router(
    debug_level="DEBUG" # defaults to INFO
)
```

## Router General Settings

### Usage

```python
router = Router(model_list=..., router_general_settings=RouterGeneralSettings(async_only_mode=True))
```

### Spec
```python
class RouterGeneralSettings(BaseModel):
    async_only_mode: bool = Field(
        default=False
    )  # only initialize async clients; useful for reducing memory usage
    pass_through_all_models: bool = Field(
        default=False
    )  # if a model outside the llm_router model list is passed, pass the request through to litellm.acompletion/embedding
```

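As an illustrative sketch of `pass_through_all_models` in use; the `litellm.types.router` import path is an assumption and may differ across litellm versions:

```python
# Hedged sketch - assumes RouterGeneralSettings is importable from litellm.types.router.
import asyncio
from litellm import Router
from litellm.types.router import RouterGeneralSettings

router = Router(
    model_list=[{"model_name": "gpt-4o", "litellm_params": {"model": "gpt-4o"}}],
    router_general_settings=RouterGeneralSettings(pass_through_all_models=True),
)

async def main():
    # "claude-3-haiku-20240307" is not in model_list; with pass_through_all_models=True
    # the request should be forwarded to litellm.acompletion instead of raising an error.
    resp = await router.acompletion(
        model="claude-3-haiku-20240307",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(resp)

asyncio.run(main())
```
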
docs/my-website/docs/wildcard_routing.md (new file, 140 lines)
@@ -0,0 +1,140 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Provider specific Wildcard routing

**Proxy all models from a provider**

Use this if you want to **proxy all models from a specific provider without defining them on the config.yaml**

## Step 1. Define provider specific routing

<Tabs>
<TabItem value="sdk" label="SDK">

```python
import os
from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "anthropic/*",
            "litellm_params": {
                "model": "anthropic/*",
                "api_key": os.environ["ANTHROPIC_API_KEY"]
            }
        },
        {
            "model_name": "groq/*",
            "litellm_params": {
                "model": "groq/*",
                "api_key": os.environ["GROQ_API_KEY"]
            }
        },
        {
            "model_name": "fo::*:static::*",  # all requests matching this pattern will be routed to this deployment, example: model="fo::hi::static::hi" will be routed to deployment: "openai/fo::*:static::*"
            "litellm_params": {
                "model": "openai/fo::*:static::*",
                "api_key": os.environ["OPENAI_API_KEY"]
            }
        }
    ]
)
```

</TabItem>
<TabItem value="proxy" label="PROXY">

**Step 1** - define provider specific routing on config.yaml
```yaml
model_list:
  # provider specific wildcard routing
  - model_name: "anthropic/*"
    litellm_params:
      model: "anthropic/*"
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: "groq/*"
    litellm_params:
      model: "groq/*"
      api_key: os.environ/GROQ_API_KEY
  - model_name: "fo::*:static::*" # all requests matching this pattern will be routed to this deployment, example: model="fo::hi::static::hi" will be routed to deployment: "openai/fo::*:static::*"
    litellm_params:
      model: "openai/fo::*:static::*"
      api_key: os.environ/OPENAI_API_KEY
```
</TabItem>
</Tabs>

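The comments above describe glob-style matching for patterns like `fo::*:static::*`. As a rough illustration of those semantics only (this is not LiteLLM's internal matcher), standard `fnmatch` globbing shows which model names fall under each pattern:

```python
# Illustration of the wildcard semantics described above using plain glob matching;
# LiteLLM's actual route matching may be implemented differently.
from fnmatch import fnmatch

patterns = ["anthropic/*", "groq/*", "fo::*:static::*"]
models = [
    "anthropic/claude-3-sonnet-20240229",
    "groq/llama3-8b-8192",
    "fo::hi::static::hi",
]

for model in models:
    matches = [p for p in patterns if fnmatch(model, p)]
    print(f"{model} -> {matches}")
# anthropic/claude-3-sonnet-20240229 -> ['anthropic/*']
# groq/llama3-8b-8192 -> ['groq/*']
# fo::hi::static::hi -> ['fo::*:static::*']
```
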
## [PROXY-Only] Step 2 - Run litellm proxy

```shell
$ litellm --config /path/to/config.yaml
```

## Step 3 - Test it

<Tabs>
<TabItem value="sdk" label="SDK">

```python
from litellm import Router

router = Router(model_list=...)

# Test with `anthropic/` - all models with `anthropic/` prefix will get routed to `anthropic/*`
resp = router.completion(model="anthropic/claude-3-sonnet-20240229", messages=[{"role": "user", "content": "Hello, Claude!"}])
print(resp)

# Test with `groq/` - all models with `groq/` prefix will get routed to `groq/*`
resp = router.completion(model="groq/llama3-8b-8192", messages=[{"role": "user", "content": "Hello, Groq!"}])
print(resp)

# Test with `fo::*::static::*` - all requests matching this pattern will be routed to `openai/fo::*:static::*`
resp = router.completion(model="fo::hi::static::hi", messages=[{"role": "user", "content": "Hello, Claude!"}])
print(resp)
```

</TabItem>
<TabItem value="proxy" label="PROXY">

Test with `anthropic/` - all models with `anthropic/` prefix will get routed to `anthropic/*`
```bash
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "anthropic/claude-3-sonnet-20240229",
    "messages": [
      {"role": "user", "content": "Hello, Claude!"}
    ]
  }'
```

Test with `groq/` - all models with `groq/` prefix will get routed to `groq/*`
```shell
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "groq/llama3-8b-8192",
    "messages": [
      {"role": "user", "content": "Hello, Claude!"}
    ]
  }'
```

Test with `fo::*::static::*` - all requests matching this pattern will be routed to `openai/fo::*:static::*`
```shell
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "fo::hi::static::hi",
    "messages": [
      {"role": "user", "content": "Hello, Claude!"}
    ]
  }'
```

</TabItem>
</Tabs>

@@ -277,7 +277,7 @@ const sidebars = {
        description: "Learn how to load balance, route, and set fallbacks for your LLM requests",
        slug: "/routing-load-balancing",
      },
-     items: ["routing", "scheduler", "proxy/load_balancing", "proxy/reliability", "proxy/tag_routing", "proxy/provider_budget_routing", "proxy/team_based_routing", "proxy/customer_routing"],
+     items: ["routing", "scheduler", "proxy/load_balancing", "proxy/reliability", "proxy/tag_routing", "proxy/provider_budget_routing", "proxy/team_based_routing", "proxy/customer_routing", "wildcard_routing"],
      },
      {
        type: "category",