Commit graph

95 commits

Krrish Dholakia
9860a4e819 docs(routing.md): cleanup docs
2025-03-14 14:34:15 -07:00
Ishaan Jaff
207f41cbea docs fix router default settings 2025-03-05 08:29:21 -08:00
Krrish Dholakia
d0413ec96b docs(routing.md): add section on weighted deployments 2025-02-17 17:02:06 -08:00
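The weighted-deployments entry above refers to sending traffic to deployments in proportion to a configured weight. A minimal sketch of that idea in plain Python (hypothetical field names, not litellm's actual implementation):

```python
import random

def pick_weighted(deployments):
    """Pick a deployment with probability proportional to its 'weight'."""
    weights = [d.get("weight", 1) for d in deployments]
    return random.choices(deployments, weights=weights, k=1)[0]

deployments = [
    {"id": "azure-gpt4", "weight": 9},   # should receive ~90% of traffic
    {"id": "openai-gpt4", "weight": 1},  # should receive ~10% of traffic
]
counts = {"azure-gpt4": 0, "openai-gpt4": 0}
for _ in range(10_000):
    counts[pick_weighted(deployments)["id"]] += 1
```

Over many requests, the traffic split converges on the 9:1 weight ratio.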
Krish Dholakia
70a9ea99f2
Control fallback prompts client-side (#7334)
* feat(router.py): support passing model-specific messages in fallbacks

* docs(routing.md): separate router timeouts into separate doc

allow for 1 fallbacks doc (across proxy/router)

* docs(routing.md): cleanup router docs

* docs(reliability.md): cleanup docs

* docs(reliability.md): cleaned up fallback doc

just have 1 doc across sdk/proxy

simplifies docs

* docs(reliability.md): add setting model-specific fallback prompts

* fix: fix linting errors

* test: skip test causing openai rate limit errors

* test: fix test

* test: run vertex test first to catch error
2024-12-20 19:09:53 -08:00
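The commit above adds model-specific messages in fallbacks: each fallback entry can override the prompt sent to that fallback model, and fallbacks are tried in order. A minimal sketch of that flow (hypothetical entry shape, for illustration only):

```python
def complete_with_fallbacks(model, messages, fallbacks, call_fn):
    """Try the primary model, then each fallback in order.

    `fallbacks` maps a model name to an ordered list of entries; an
    entry may carry its own `messages` override for that fallback model.
    """
    attempts = [{"model": model, "messages": messages}]
    for fb in fallbacks.get(model, []):
        attempts.append({
            "model": fb["model"],
            "messages": fb.get("messages", messages),  # model-specific prompt
        })
    last_err = None
    for attempt in attempts:
        try:
            return call_fn(attempt["model"], attempt["messages"])
        except Exception as err:
            last_err = err
    raise last_err

def fake_call(model, messages):
    # Stand-in for a real completion call; primary model always fails.
    if model == "gpt-4":
        raise RuntimeError("rate limited")
    return (model, messages)

fallbacks = {
    "gpt-4": [{"model": "claude-3",
               "messages": [{"role": "user", "content": "shorter prompt"}]}],
}
result = complete_with_fallbacks(
    "gpt-4", [{"role": "user", "content": "hi"}], fallbacks, fake_call
)
```

Here the primary call fails, so the request lands on `claude-3` with the fallback-specific prompt.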
Krish Dholakia
0c0498dd60
Litellm dev 12 07 2024 (#7086)
* fix(main.py): support passing max retries to azure/openai embedding integrations

Fixes https://github.com/BerriAI/litellm/issues/7003

* feat(team_endpoints.py): allow updating team model aliases

Closes https://github.com/BerriAI/litellm/issues/6956

* feat(router.py): allow specifying model id as fallback - skips any cooldown check

Allows a default model to be checked if all models are in cooldown

s/o @micahjsmith

* docs(reliability.md): add fallback to specific model to docs

* fix(utils.py): new 'is_prompt_caching_valid_prompt' helper util

Allows user to identify if messages/tools have prompt caching

Related issue: https://github.com/BerriAI/litellm/issues/6784

* feat(router.py): store model id for prompt caching valid prompt

Allows routing to that model id on subsequent requests

* fix(router.py): only cache if prompt is valid prompt caching prompt

prevents storing unnecessary items in cache

* feat(router.py): support routing prompt caching enabled models to previous deployments

Closes https://github.com/BerriAI/litellm/issues/6784

* test: fix linting errors

* feat(databricks/): convert basemodel to dict and exclude none values

allow passing pydantic message to databricks

* fix(utils.py): ensure all chat completion messages are dict

* (feat) Track `custom_llm_provider` in LiteLLMSpendLogs (#7081)

* add custom_llm_provider to SpendLogsPayload

* add custom_llm_provider to SpendLogs

* add custom llm provider to SpendLogs payload

* test_spend_logs_payload

* Add MLflow to the side bar (#7031)

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>

* (bug fix) SpendLogs update DB catch all possible DB errors for retrying  (#7082)

* catch DB_CONNECTION_ERROR_TYPES

* fix DB retry mechanism for SpendLog updates

* use DB_CONNECTION_ERROR_TYPES in auth checks

* fix exp back off for writing SpendLogs

* use _raise_failed_update_spend_exception to ensure errors print as NON blocking

* test_update_spend_logs_multiple_batches_with_failure

* (Feat) Add StructuredOutputs support for Fireworks.AI (#7085)

* fix model cost map fireworks ai "supports_response_schema": true,

* fix supports_response_schema

* fix map openai params fireworks ai

* test_map_response_format

* test_map_response_format

* added deepinfra/Meta-Llama-3.1-405B-Instruct (#7084)

* bump: version 1.53.9 → 1.54.0

* fix deepinfra

* litellm db fixes LiteLLM_UserTable (#7089)

* ci/cd queue new release

* fix llama-3.3-70b-versatile

* refactor - use consistent file naming convention `AI21/` -> `ai21`  (#7090)

* fix refactor - use consistent file naming convention

* ci/cd run again

* fix naming structure

* fix use consistent naming (#7092)

---------

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Yuki Watanabe <31463517+B-Step62@users.noreply.github.com>
Co-authored-by: ali sayyah <ali.sayyah2@gmail.com>
2024-12-08 00:30:33 -08:00
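Several commits in the entry above deal with prompt-caching-aware routing: remember which deployment served a caching-eligible prompt, and route repeats of that prompt back to the same deployment so the provider-side cache can hit. A minimal sketch of the idea (the fingerprinting scheme here is an assumption, not litellm's actual cache key):

```python
import hashlib
import json

_prompt_cache_routes = {}  # prompt fingerprint -> deployment id

def _fingerprint(messages):
    """Stable hash of the message payload used as a cache-routing key."""
    payload = json.dumps(messages, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def record_deployment(messages, deployment_id):
    """Remember which deployment served a prompt-caching-eligible prompt."""
    _prompt_cache_routes[_fingerprint(messages)] = deployment_id

def preferred_deployment(messages):
    """Route a repeated prompt back to the deployment holding its cache."""
    return _prompt_cache_routes.get(_fingerprint(messages))

msgs = [{"role": "user", "content": "long cached prefix..."}]
record_deployment(msgs, "azure-eastus-1")
```

A later commit in the same entry notes the complementary fix: only store the route when the prompt actually qualifies for prompt caching, to avoid filling the cache with useless entries.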
Krish Dholakia
2d2931a215
LiteLLM Minor Fixes & Improvements (11/26/2024) (#6913)
* docs(config_settings.md): document all router_settings

* ci(config.yml): add router_settings doc test to ci/cd

* test: debug test on ci/cd

* test: debug ci/cd test

* test: fix test

* fix(team_endpoints.py): skip invalid team object. don't fail `/team/list` call

Causes downstream errors if ui just fails to load team list

* test(base_llm_unit_tests.py): add 'response_format={"type": "text"}' test to base_llm_unit_tests

adds complete coverage for all 'response_format' values to ci/cd

* feat(router.py): support wildcard routes in `get_router_model_info()`

Addresses https://github.com/BerriAI/litellm/issues/6914

* build(model_prices_and_context_window.json): add tpm/rpm limits for all gemini models

Allows for ratelimit tracking for gemini models even with wildcard routing enabled

Addresses https://github.com/BerriAI/litellm/issues/6914

* feat(router.py): add tpm/rpm tracking on success/failure to global_router

Addresses https://github.com/BerriAI/litellm/issues/6914

* feat(router.py): support wildcard routes on router.get_model_group_usage()

* fix(router.py): fix linting error

* fix(router.py): implement get_remaining_tokens_and_requests

Addresses https://github.com/BerriAI/litellm/issues/6914

* fix(router.py): fix linting errors

* test: fix test

* test: fix tests

* docs(config_settings.md): add missing dd env vars to docs

* fix(router.py): check if hidden params is dict
2024-11-28 00:01:38 +05:30
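The entry above adds tpm/rpm tracking on success/failure plus a `get_remaining_tokens_and_requests`-style lookup, enabling rate-limit accounting even for wildcard model groups. A minimal sketch of that bookkeeping (class and field names are illustrative, not litellm's internals):

```python
from collections import defaultdict

class UsageTracker:
    """Track per-model-group usage against tpm/rpm limits."""

    def __init__(self, limits):
        self.limits = limits  # group -> {"tpm": ..., "rpm": ...}
        self.used = defaultdict(lambda: {"tokens": 0, "requests": 0})

    def log_success(self, group, tokens):
        self.used[group]["tokens"] += tokens
        self.used[group]["requests"] += 1

    def remaining(self, group):
        limit = self.limits[group]
        used = self.used[group]
        return {
            "remaining_tokens": limit["tpm"] - used["tokens"],
            "remaining_requests": limit["rpm"] - used["requests"],
        }

# A wildcard group can carry limits just like a concrete model name.
tracker = UsageTracker({"gemini/*": {"tpm": 1_000_000, "rpm": 1_000}})
tracker.log_success("gemini/*", tokens=25_000)
```

In a real counter the usage would reset each minute (tpm/rpm are per-minute limits); that windowing is omitted here for brevity.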
Emmanuel Ferdman
9cf3dcbbf3
Update routing references (#6758)
* Update routing references

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>

* Update routing references

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>

---------

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2024-11-16 08:28:44 -08:00
Ishaan Jaff
bbf4db79c1 docs - show correct rpm -> tpm conversion for Azure 2024-09-27 17:18:55 -07:00
Krish Dholakia
7ed6938a3f
LiteLLM Minor Fixes & Improvements (09/20/2024) (#5807)
* fix(vertex_llm_base.py): Handle api_base = ""

Fixes https://github.com/BerriAI/litellm/issues/5798

* fix(o1_transformation.py): handle stream_options not being supported

https://github.com/BerriAI/litellm/issues/5803

* docs(routing.md): fix docs

Closes https://github.com/BerriAI/litellm/issues/5808

* perf(internal_user_endpoints.py): reduce db calls for getting team_alias for a key

Use the list gotten earlier in `/user/info` endpoint

Reduces UI keys tab load time to 800ms (prev. 28s+)

* feat(proxy_server.py): support CONFIG_FILE_PATH as env var

Closes https://github.com/BerriAI/litellm/issues/5744

* feat(get_llm_provider_logic.py): add `litellm_proxy/` as a known openai-compatible route

simplifies calling litellm proxy

Reduces confusion when calling models on litellm proxy from litellm sdk

* docs(litellm_proxy.md): cleanup docs

* fix(internal_user_endpoints.py): fix pydantic obj

* test(test_key_generate_prisma.py): fix test
2024-09-20 20:21:32 -07:00
Krish Dholakia
98c34a7e27
LiteLLM Minor Fixes and Improvements (11/09/2024) (#5634)
* fix(caching.py): set ttl for async_increment cache

fixes issue where ttl for redis client was not being set on increment_cache

Fixes https://github.com/BerriAI/litellm/issues/5609

* fix(caching.py): fix increment cache w/ ttl for sync increment cache on redis

Fixes https://github.com/BerriAI/litellm/issues/5609

* fix(router.py): support adding retry policy + allowed fails policy via config.yaml

* fix(router.py): don't cooldown single deployments

No point, as there's no other deployment to loadbalance with.

* fix(user_api_key_auth.py): support setting allowed email domains on jwt tokens

Closes https://github.com/BerriAI/litellm/issues/5605

* docs(token_auth.md): add user upsert + allowed email domain to jwt auth docs

* fix(litellm_pre_call_utils.py): fix dynamic key logging when team id is set

Fixes issue where key logging would not be set if team metadata was not none

* fix(secret_managers/main.py): load environment variables correctly

Fixes issue where os.environ/ was not being loaded correctly

* test(test_router.py): fix test

* feat(spend_tracking_utils.py): support logging additional usage params - e.g. prompt caching values for deepseek

* test: fix tests

* test: fix test

* test: fix test

* test: fix test

* test: fix test
2024-09-11 22:36:06 -07:00
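The "don't cooldown single deployments" fix above captures a simple invariant: cooling down a failing deployment only makes sense when there is another deployment to load-balance onto. A minimal sketch of cooldown tracking with that guard (hypothetical class, not litellm's implementation):

```python
import time

class CooldownTracker:
    """Cool down failing deployments, but never the only one in a group."""

    def __init__(self, cooldown_time=30.0):
        self.cooldown_time = cooldown_time
        self._cooled = {}  # deployment id -> cooldown expiry timestamp

    def maybe_cooldown(self, deployment_id, group_deployments):
        # No point cooling down: there's nothing else to route to.
        if len(group_deployments) <= 1:
            return False
        self._cooled[deployment_id] = time.monotonic() + self.cooldown_time
        return True

    def available(self, group_deployments):
        now = time.monotonic()
        return [d for d in group_deployments if self._cooled.get(d, 0) <= now]

tracker = CooldownTracker(cooldown_time=30.0)
solo = tracker.maybe_cooldown("only-deploy", ["only-deploy"])
multi = tracker.maybe_cooldown("deploy-a", ["deploy-a", "deploy-b"])
```

With one deployment the cooldown is refused; with two, `deploy-a` is benched and requests flow to `deploy-b` until the window expires.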
Krrish Dholakia
f5905e1000 docs(routing.md): add proxy loadbalancing tutorial 2024-09-03 07:38:19 -07:00
Krrish Dholakia
7f1531006c docs(routing.md): add weight-based shuffling to docs 2024-08-30 08:24:12 -07:00
Ishaan Jaff
bd7bf7f6b0 fix endpoint name on router 2024-08-16 12:46:43 -07:00
Ishaan Jaff
5bc8b59b11 docs - use consistent name for LiteLLM proxy server 2024-08-03 12:49:35 -07:00
Krrish Dholakia
051dfee421 docs(routing.md): add docs on how to disable cooldowns 2024-07-01 15:05:38 -07:00
Krrish Dholakia
8fbc34e7e9 docs(routing.md): add dynamic cooldowns to docs 2024-06-25 17:01:58 -07:00
Krrish Dholakia
86cb5aa031 docs(routing.md): add quickstart 2024-06-24 22:25:39 -07:00
Ishaan Jaff
91d9d59717 docs - routing 2024-06-20 14:32:52 -07:00
Krrish Dholakia
dca23aaa3e docs(routing.md): improve docs 2024-06-14 21:55:27 -07:00
Krrish Dholakia
a7a63f8392 docs(routing.md): update routing fallback docs with proxy examples 2024-06-14 21:53:45 -07:00
Krrish Dholakia
31a7328279 docs(routing.md): add content policy fallbacks to docs 2024-06-14 21:47:55 -07:00
Ishaan Jaff
9f0ae21ef5 docs - AllowedFailsPolicy 2024-06-01 17:56:57 -07:00
Krrish Dholakia
22b9933096 docs(routing.md): add default_fallbacks to routing.md docs 2024-05-13 21:28:15 -07:00
Ishaan Jaff
3a838934c9 docs - cooldown deployment 2024-05-13 12:50:59 -07:00
Ishaan Jaff
dac8c644fd docs - router show cooldown_time 2024-05-13 12:49:51 -07:00
Krrish Dholakia
0c87bb5adf docs(reliability.md): add region based routing to proxy + sdk docs 2024-05-11 11:34:12 -07:00
Krrish Dholakia
67b4aa28bd docs(routing.md): make clear lowest cost routing is async 2024-05-07 21:34:18 -07:00
Ishaan Jaff
d46544d2bc docs setup alerting on router 2024-05-07 18:26:45 -07:00
Ishaan Jaff
d5f93048cc docs - lowest cost routing 2024-05-07 13:15:30 -07:00
Ishaan Jaff
4c909194c7 docs - lowest-latency routing 2024-05-07 12:43:44 -07:00
Ishaan Jaff
bbf5d79069 docs - set retry policy 2024-05-04 17:52:01 -07:00
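The retry-policy entry above concerns configuring different retry counts per error type (for example, retry rate limits but not auth failures). A minimal sketch of that shape (class and names are illustrative, not litellm's actual `RetryPolicy`):

```python
class RetryPolicy:
    """Map exception type names to how many retries they deserve."""

    def __init__(self, **retries_by_error):
        self.retries_by_error = retries_by_error

    def num_retries(self, err):
        # Unknown error types default to no retries.
        return self.retries_by_error.get(type(err).__name__, 0)

class RateLimitError(Exception):
    """Transient: worth retrying."""

class AuthenticationError(Exception):
    """Permanent: retrying won't help."""

policy = RetryPolicy(RateLimitError=4, AuthenticationError=0)
```

The point of the per-type split is that transient errors (rate limits, timeouts) benefit from retries while deterministic failures (bad credentials, invalid requests) just waste time.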
Krrish Dholakia
6a2ddc2791 docs(routing.md): add docs on lowest latency routing buffer 2024-04-30 22:41:50 -07:00
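The lowest-latency-routing buffer above addresses a pitfall of always picking the single fastest deployment: all traffic piles onto one host. A buffer widens the candidate set to anything within a fraction of the best latency. A minimal sketch (hypothetical, not litellm's implementation):

```python
import random

def pick_lowest_latency(latencies, buffer=0.1):
    """Pick randomly among deployments whose average latency is within
    `buffer` (fractional) of the fastest, to spread load."""
    fastest = min(latencies.values())
    candidates = [d for d, avg in latencies.items()
                  if avg <= fastest * (1 + buffer)]
    return random.choice(candidates)

# deploy-a and deploy-b are near-tied; deploy-c is far slower.
latencies = {"deploy-a": 0.42, "deploy-b": 0.45, "deploy-c": 1.8}
```

With a 10% buffer, both near-tied deployments share traffic while the slow one is excluded.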
Krrish Dholakia
cef2d95bb4 docs(routing.md): add max parallel requests to router docs 2024-04-29 15:37:48 -07:00
Ishaan Jaff
fa83e2da06 docs - fix routing 2024-04-25 13:43:51 -07:00
Krrish Dholakia
c5d880b6fd docs(routing.md): add simple shuffle async support to docs 2024-04-20 15:02:22 -07:00
Krrish Dholakia
d999acd20d docs(routing.md): reorder routing strategies 2024-04-10 22:29:24 -07:00
Krrish Dholakia
83a7a9f0b7 docs(routing.md): add calling via proxy tutorial to router docs 2024-04-10 22:24:29 -07:00
Krrish Dholakia
9f517b2907 docs(routing.md): add async usage based routing to docs 2024-04-10 21:51:36 -07:00
CLARKBENHAM
e96d97d9e5 remove formating changes 2024-04-08 21:31:21 -07:00
CLARKBENHAM
6e20bb13b2 Revert "doc pre_call_check: enables router rate limits for concurrent calls"
This reverts commit 886c859519.
2024-04-08 21:27:38 -07:00
CLARKBENHAM
886c859519 doc pre_call_check: enables router rate limits for concurrent calls 2024-04-08 21:20:59 -07:00
Krrish Dholakia
a917fadf45 docs(routing.md): refactor docs to show how to use pre-call checks and fallback across model groups 2024-04-01 11:21:27 -07:00
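The pre-call-checks entry above describes filtering out deployments that cannot serve a request before routing, e.g. because the prompt exceeds their context window. A minimal sketch of that filter (field names are assumptions for illustration):

```python
def pre_call_filter(deployments, prompt_tokens):
    """Drop deployments whose context window can't fit the prompt,
    before any routing strategy runs."""
    eligible = [d for d in deployments
                if d["max_input_tokens"] >= prompt_tokens]
    if not eligible:
        raise ValueError("no deployment can fit this prompt")
    return eligible

deployments = [
    {"id": "gpt-3.5-turbo", "max_input_tokens": 16_385},
    {"id": "gpt-4-turbo", "max_input_tokens": 128_000},
]
eligible = pre_call_filter(deployments, prompt_tokens=40_000)
```

Combined with fallbacks across model groups (the refactor this commit documents), an oversized prompt is steered to a larger-context group instead of failing outright.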
Krrish Dholakia
9e9de7f6e2 docs(routing.md): add fallbacks being done in order 2024-03-24 12:13:19 -07:00
Krrish Dholakia
1c60fd0e78 docs(routing.md): add url 2024-03-23 20:03:42 -07:00
Krrish Dholakia
7c74ea8b77 docs(routing.md): add proxy example to pre-call checks in routing docs 2024-03-23 20:00:50 -07:00
Krrish Dholakia
e8e7964025 docs(routing.md): add pre-call checks to docs 2024-03-23 19:10:34 -07:00
Krrish Dholakia
47424b8c90 docs(routing.md): fix routing example on docs 2024-03-11 22:17:04 -07:00
ishaan-jaff
e46980c56c (docs) using litellm router 2024-03-11 21:18:10 -07:00
ishaan-jaff
2b2e62477b (docs) use correct base model for cost 2024-02-20 21:00:43 -08:00
ishaan-jaff
950c753429 (docs) on callbacks tracking api_key, base etc 2024-01-27 08:31:50 -08:00