Commit graph

3423 commits

Author SHA1 Message Date
Ishaan Jaff
005f2fa1aa
docs - batches cost tracking (#7422) 2024-12-25 20:13:26 -08:00
Krish Dholakia
760328b6ad
Litellm dev 12 25 2025 p2 (#7420)
* test: add new test image embedding to base llm unit tests

Addresses https://github.com/BerriAI/litellm/issues/6515

* fix(bedrock/embed/multimodal-embeddings): strip data prefix from image urls for bedrock multimodal embeddings

Fix https://github.com/BerriAI/litellm/issues/6515

* feat: initial commit for fireworks ai audio transcription support

Relevant issue: https://github.com/BerriAI/litellm/issues/7134

* test: initial fireworks ai test

* feat(fireworks_ai/): implemented fireworks ai audio transcription config

* fix(utils.py): register fireworks ai audio transcription config, in config manager

* fix(utils.py): add fireworks ai param translation to 'get_optional_params_transcription'

* refactor(fireworks_ai/): define text completion route with model name handling

moves model name handling to specific fireworks routes, as required by their api

* refactor(fireworks_ai/chat): define transform_Request - allows fixing model if accounts/ is missing

* fix: fix linting errors

* fix: fix linting errors

* fix: fix linting errors

* fix: fix linting errors

* fix(handler.py): fix linting errors

* fix(main.py): fix tgai text completion route

* refactor(together_ai/completion): refactors together ai text completion route to just use provider transform request

* refactor: move test_fine_tuning_api out of local_testing

reduces local testing ci/cd time
2024-12-25 18:35:34 -08:00
Ishaan Jaff
157810fcbf
fix docs warning (#7419) 2024-12-25 16:42:14 -08:00
Ishaan Jaff
0ce5f9fe58
(feat) Support Dynamic Params for guardrails (#7415)
* update CustomGuardrail

* unit test custom guardrails

* add dynamic params for aporia

* add dynamic params to bedrock guard

* add dynamic params for all guardrails

* fix linting

* fix should_run_guardrail

* _validate_premium_user

* update guardrail doc

* doc update

* update code q

* should_run_guardrail
2024-12-25 16:07:29 -08:00
Ishaan Jaff
77fa751639 update docs base docker 2024-12-25 15:51:19 -08:00
Ishaan Jaff
c6ca835046 docs files api
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 11s
2024-12-24 20:46:43 -08:00
Ishaan Jaff
47e12802df
(feat) /batches Add support for using /batches endpoints in OAI format (#7402)
* run azure testing on ci/cd

* update docs on azure batches endpoints

* add input azure.jsonl

* refactor - use separate file for batches endpoints

* fixes for passing custom llm provider to /batch endpoints

* pass custom llm provider to files endpoints

* update azure batches doc

* add info for azure batches api

* update batches endpoints

* use simple helper for raising proxy exception

* update config.yml

* fix imports

* update tests

* use existing settings

* update env var used

* update configs

* update config.yml

* update ft testing
2024-12-24 16:58:05 -08:00
Krish Dholakia
c3edfc2c92
LiteLLM Minor Fixes & Improvements (12/23/2024) - p3 (#7394)
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 35s
* build(model_prices_and_context_window.json): add gemini-1.5-flash context caching

* fix(context_caching/transformation.py): just use last identified cache point

Fixes https://github.com/BerriAI/litellm/issues/6738

* fix(context_caching/transformation.py): pick first contiguous block - handles system message error from google

Fixes https://github.com/BerriAI/litellm/issues/6738

* fix(vertex_ai/gemini/): track context caching tokens

* refactor(gemini/): place transformation.py inside `chat/` folder

make it easy for user to know we support the equivalent endpoint

* fix: fix import

* refactor(vertex_ai/): move vertex_ai cost calc inside vertex_ai/ folder

make it easier to see cost calculation logic

* fix: fix linting errors

* fix: fix circular import

* feat(gemini/cost_calculator.py): support gemini context caching cost calculation

generifies anthropic's cost calculation function and uses it across anthropic + gemini

* build(model_prices_and_context_window.json): add cost tracking for gemini-1.5-flash-002 w/ context caching

Closes https://github.com/BerriAI/litellm/issues/6891

* docs(gemini.md): add gemini context caching architecture diagram

make it easier for user to understand how context caching works

* docs(gemini.md): link to relevant gemini context caching code

* docs(gemini/context_caching): add readme in github, make it easy for dev to know context caching is supported + where to go for code

* fix(llm_cost_calc/utils.py): handle gemini 128k token diff cost calc scenario

* fix(deepseek/cost_calculator.py): support deepseek context caching cost calculation

* test: fix test
2024-12-23 22:02:52 -08:00
Ishaan Jaff
442d309bcd update release notes 2024-12-23 21:48:33 -08:00
Ishaan Jaff
f4a2143b82 update release notes 2024-12-23 21:43:47 -08:00
Ishaan Jaff
1f466ec9cf release notes 2024-12-23 21:38:56 -08:00
Ishaan Jaff
957fbc6dfb docs batches 2024-12-23 21:24:06 -08:00
Ishaan Jaff
43077a88d5 docs add files to supported endpoints 2024-12-23 20:51:34 -08:00
Krish Dholakia
48316520f4
LiteLLM Minor Fixes & Improvements (12/23/2024) - P2 (#7386)
* fix(main.py): support 'mock_timeout=true' param

allows mock requests on proxy to have a time delay, for testing

* fix(main.py): ensure mock timeouts raise litellm.Timeout error

triggers retry/fallbacks

* fix: fix fallback + mock timeout testing

* fix(router.py): always return remaining tpm/rpm limits, if limits are known

allows for rate limit headers to be guaranteed

* docs(timeout.md): add docs on mock timeout = true

* fix(main.py): fix linting errors

* test: fix test
2024-12-23 17:41:27 -08:00
Krish Dholakia
db59e08958
Litellm dev 12 23 2024 p1 (#7383)
* feat(guardrails_endpoint.py): new `/guardrails/list` endpoint

Allow users to view what the available guardrails are

* docs: document new `/guardrails/list` endpoint

* docs(enterprise.md): update docs

* fix(openai/transcription/handler.py): support cost tracking on vtt + srt formats

* fix(openai/transcriptions/handler.py): default to 'verbose_json' response format if 'text' or 'json' response_format received. ensures 'duration' param is received for all audio transcription requests

* fix: fix linting errors

* fix: remove unused import
2024-12-23 16:33:31 -08:00
Krish Dholakia
3671829e39
Complete 'requests' library removal (#7350)
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 12s
* refactor: initial commit moving watsonx_text to base_llm_http_handler + clarifying new provider directory structure

* refactor(watsonx/completion/handler.py): move to using base llm http handler

removes 'requests' library usage

* fix(watsonx_text/transformation.py): fix result transformation

migrates to transformation.py, for usage with base llm http handler

* fix(streaming_handler.py): migrate watsonx streaming to transformation.py

ensures streaming works with base llm http handler

* fix(streaming_handler.py): fix streaming linting errors and remove watsonx conditional logic

* fix(watsonx/): fix chat route post completion route refactor

* refactor(watsonx/embed): refactor watsonx to use base llm http handler for embedding calls as well

* refactor(base.py): remove requests library usage from litellm

* build(pyproject.toml): remove requests library usage

* fix: fix linting errors

* fix: fix linting errors

* fix(types/utils.py): fix validation errors for modelresponsestream

* fix(replicate/handler.py): fix linting errors

* fix(litellm_logging.py): handle modelresponsestream object

* fix(streaming_handler.py): fix modelresponsestream args

* fix: remove unused imports

* test: fix test

* fix: fix test

* test: fix test

* test: fix tests

* test: fix test

* test: fix patch target

* test: fix test
2024-12-22 07:21:25 -08:00
Ishaan Jaff
8b1ea40e7b add img to release notes
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 36s
2024-12-21 21:24:16 -08:00
Ishaan Jaff
bb801ad3d9 update release notes 2024-12-21 21:20:13 -08:00
Krish Dholakia
1aed988c11
Litellm docs update (#7365)
* docs: fix release notes url + fix urlk

* docs(index.md): add link to github releases

* docs(index.md): add linkedin social url to release notes
2024-12-21 21:09:50 -08:00
Ishaan Jaff
31f5dcf8e5 docs 2024-12-21 20:52:39 -08:00
Ishaan Jaff
291229ee46 docs add 1.55.8 changelog 2024-12-21 20:51:39 -08:00
Ishaan Jaff
5a8f67c171 release notes v1.55.8 2024-12-21 20:31:54 -08:00
Ishaan Jaff
a3e732de39
(chore) - enforce model budgets on virtual keys as enterprise feature (#7353)
* docs - enforce model budget as enterprise feature

* docs link to correct place
2024-12-21 14:18:53 -08:00
Ishaan Jaff
6107f9f3f3
[Bug fix ]: Triton /infer handler incompatible with batch responses (#7337)
* migrate triton to base llm http handler

* clean up triton handler.py

* use transform functions for triton

* add TritonConfig

* get openai params for triton

* use triton embedding config

* test_completion_triton_generate_api

* test_completion_triton_infer_api

* fix TritonConfig doc string

* use TritonResponseIterator

* fix triton embeddings

* docs triton chat usage
2024-12-20 20:59:40 -08:00
Krish Dholakia
70a9ea99f2
Controll fallback prompts client-side (#7334)
* feat(router.py): support passing model-specific messages in fallbacks

* docs(routing.md): separate router timeouts into separate doc

allow for 1 fallbacks doc (across proxy/router)

* docs(routing.md): cleanup router docs

* docs(reliability.md): cleanup docs

* docs(reliability.md): cleaned up fallback doc

just have 1 doc across sdk/proxy

simplifies docs

* docs(reliability.md): add setting model-specific fallback prompts

* fix: fix linting errors

* test: skip test causing openai rate limit errros

* test: fix test

* test: run vertex test first to catch error
2024-12-20 19:09:53 -08:00
jravi-fireworks
f8cf11f6d5
Fix LiteLLM documentation (#7333)
Co-authored-by: Jetashree Ravi <jetashreeravi@Jetashrees-MBP.attlocal.net>
2024-12-20 15:04:23 -08:00
Krish Dholakia
27a4d08604
Litellm dev 2024 12 19 p3 (#7322)
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 13s
* fix(utils.py): remove unsupported optional params (if drop_params=True) before passing into map openai params

Fixes https://github.com/BerriAI/litellm/issues/7242

* test: new test for langfuse prompt management hook

Addresses https://github.com/BerriAI/litellm/issues/3893#issuecomment-2549080296

* feat(main.py): add 'get_chat_completion_prompt' customlogger hook

allows for langfuse prompt management

Addresses https://github.com/BerriAI/litellm/issues/3893#issuecomment-2549080296

* feat(langfuse_prompt_management.py): working e2e langfuse prompt management

works with `langfuse/` route

* feat(main.py): initial tracing for dynamic langfuse params

allows admin to specify langfuse keys by model in model_list

* feat(main.py): support passing langfuse credentials dynamically

* fix(langfuse_prompt_management.py): create langfuse client based on dynamic callback params

allows dynamic langfuse params to work

* fix: fix linting errors

* docs(prompt_management.md): refactor docs for sdk + proxy prompt management tutorial

* docs(prompt_management.md): cleanup doc

* docs: cleanup topnav

* docs(prompt_management.md): update docs to be easier to use

* fix: remove unused imports

* docs(prompt_management.md): add architectural overview doc

* fix(litellm_logging.py): fix dynamic param passing

* fix(langfuse_prompt_management.py): fix linting errors

* fix: fix linting errors

* fix: use typing_extensions for typealias to ensure python3.8 compatibility

* test: use stream_options in test to account for tiktoken diff

* fix: improve import error message, and check run test earlier
2024-12-20 13:30:16 -08:00
Krrish Dholakia
834067c570 docs: refactor admin ui docs 2024-12-20 10:20:07 -08:00
Ishaan Jaff
35076212ef docs infinity rerank api docs 2024-12-19 18:51:55 -08:00
Ishaan Jaff
3a6dba8853 docs base rerank config 2024-12-19 18:33:32 -08:00
Ishaan Jaff
236561cdb8 docs add ref prs 2024-12-19 18:31:38 -08:00
Ishaan Jaff
617ac63d14
(feat) add infinity rerank models (#7321)
* Support Infinity Reranker (custom reranking models) (#7247)

* Support Infinity Reranker

* Clean code

* Included transformation.py

* Clean code

* Added Infinity reranker test

* Clean code

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>

* transform_rerank_response

* update handler.py

* infinity rerank updates

* ci/cd run again

* add infinity unit tests

* docs add instruction on how to add a new provider for rerank

---------

Co-authored-by: Hao Shan <53949959+haoshan98@users.noreply.github.com>
2024-12-19 18:30:28 -08:00
Ishaan Jaff
6261ec3599
(feat proxy) v2 - model max budgets (#7302)
* clean up unused code

* add _PROXY_VirtualKeyModelMaxBudgetLimiter

* adjust type imports

* working _PROXY_VirtualKeyModelMaxBudgetLimiter

* fix user_api_key_model_max_budget

* fix user_api_key_model_max_budget

* update naming

* update naming

* fix changes to RouterBudgetLimiting

* test_call_with_key_over_model_budget

* test_call_with_key_over_model_budget

* handle _get_request_model_budget_config

* e2e test for test_call_with_key_over_model_budget

* clean up test

* run ci/cd again

* add validate_model_max_budget

* docs fix

* update doc

* add e2e testing for _PROXY_VirtualKeyModelMaxBudgetLimiter

* test_unit_test_max_model_budget_limiter.py
2024-12-18 19:42:46 -08:00
Krish Dholakia
5253f639cd
fix(health.md): add rerank model health check information (#7295)
* fix(health.md): add rerank model health check information

* build(model_prices_and_context_window.json): add gemini 2.0 for google ai studio - pricing + commercial rate limits

* build(model_prices_and_context_window.json): add gemini-2.0 supports audio output = true

* docs(team_model_add.md): clarify allowing teams to add models is an enterprise feature

* fix(o1_transformation.py): add support for 'n', 'response_format' and 'stop' params for o1 and 'stream_options' param for o1-mini

* build(model_prices_and_context_window.json): add 'supports_system_message' to supporting openai models

needed as o1-preview, and o1-mini models don't support 'system message

* fix(o1_transformation.py): translate system message based on if o1 model supports it

* fix(o1_transformation.py): return 'stream' param support if o1-mini/o1-preview

o1 currently doesn't support streaming, but the other model versions do

Fixes https://github.com/BerriAI/litellm/issues/7292

* fix(o1_transformation.py): return tool calling/response_format in supported params if model map says so

Fixes https://github.com/BerriAI/litellm/issues/7292

* fix: fix linting errors

* fix: update '_transform_messages'

* fix(o1_transformation.py): fix provider passed for supported param checks

* test(base_llm_unit_tests.py): skip test if api takes >5s to respond

* fix(utils.py): return false in 'supports_factory' if can't find value

* fix(o1_transformation.py): always return stream + stream_options as supported params + handle stream options being passed in for azure o1

* feat(openai.py): support stream faking natively in openai handler

Allows o1 calls to be faked for just the "o1" model, allows native streaming for o1-mini, o1-preview

 Fixes https://github.com/BerriAI/litellm/issues/7292

* fix(openai.py): use inference param instead of original optional param
2024-12-18 19:18:10 -08:00
Krish Dholakia
6a45ee1ef7
fix(hosted_vllm/transformation.py): return fake api key, if none give… (#7301)
* fix(hosted_vllm/transformation.py): return fake api key, if none give. Prevents httpx error

Fixes https://github.com/BerriAI/litellm/issues/7291

* test: fix test

* fix(main.py): add hosted_vllm/ support for embeddings endpoint

Closes https://github.com/BerriAI/litellm/issues/7290

* docs(vllm.md): add docs on vllm embeddings usage

* fix(__init__.py): fix sambanova model test

* fix(base_llm_unit_tests.py): skip pydantic obj test if model takes >5s to respond
2024-12-18 18:41:53 -08:00
Ishaan Jaff
246e3bafc8
(feat - proxy) Add status_code to litellm_proxy_total_requests_metric_total (#7293)
* fix _select_model_name_for_cost_calc docstring

* add STATUS_CODE  to prometheus

* test prometheus unit tests

* test_prometheus_unit_tests.py

* update Proxy Level Tracking Metrics docs

* fix test_proxy_failure_metrics

* fix test_proxy_failure_metrics
2024-12-18 15:55:02 -08:00
Krish Dholakia
2f08341a08
Litellm dev readd prompt caching (#7299)
* fix(router.py): re-add saving model id on prompt caching valid successful deployment

* fix(router.py): introduce optional pre_call_checks

isolate prompt caching logic in a separate file

* fix(prompt_caching_deployment_check.py): fix import

* fix(router.py): new 'async_filter_deployments' event hook

allows custom logger to filter deployments returned to routing strategy

* feat(prompt_caching_deployment_check.py): initial working commit of prompt caching based routing

* fix(cooldown_callbacks.py): fix linting error

* fix(budget_limiter.py): move budget logger to async_filter_deployment hook

* test: add unit test

* test(test_router_helper_utils.py): add unit testing

* fix(budget_limiter.py): fix linting errors

* docs(config_settings.md): add 'optional_pre_call_checks' to router_settings param docs
2024-12-18 15:13:49 -08:00
Ishaan Jaff
523beedb4c update docs 2024-12-18 11:16:30 -08:00
Ishaan Jaff
9c30387e66 tag budgets fixes 2024-12-18 10:28:37 -08:00
Ishaan Jaff
f3c546b79e
(feat) proxy Azure Blob Storage - Add support for AZURE_STORAGE_ACCOUNT_KEY Auth (#7280)
* add upload_to_azure_data_lake_with_azure_account_key

* async_upload_payload_to_azure_blob_storage

* docs add AZURE_STORAGE_ACCOUNT_KEY

* add azure-storage-file-datalake
2024-12-17 17:35:45 -08:00
Krish Dholakia
cd5bdfcb7a
docs(input.md): document 'extra_headers' param support (#7268)
* docs(input.md): document 'extra_headers' param support

* fix: #7239 to move Nova topK parameter to `additionalModelRequestFields` (#7240)

Co-authored-by: Ryan Hoium <rhoium>

---------

Co-authored-by: ryanh-ai <3118399+ryanh-ai@users.noreply.github.com>
2024-12-17 07:19:14 -08:00
Ishaan Jaff
f3b13a9af3
(feat) Add Bedrock knowledge base pass through endpoints (#7267)
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 56s
* bugfix: Proxy Routing for Bedrock Knowledgebase URLs are incorrect (#7097)

* Fixing routing bug where bedrock knowledgebase urls were being generated incorrectly

* Preparing for PR

* Preparing for PR

* Preparing for PR

---------

Co-authored-by: Luke Birk <lb0737@att.com>

* fix _is_bedrock_agent_runtime_route

* docs - Query Knowledge Base

* test_is_bedrock_agent_runtime_route

* fix bedrock_proxy_route

---------

Co-authored-by: LBirk <2731718+LBirk@users.noreply.github.com>
Co-authored-by: Luke Birk <lb0737@att.com>
2024-12-16 22:19:34 -08:00
Ishaan Jaff
3c984ed60e
(feat) Add Azure Blob Storage Logging Integration (#7265)
* add path to http handler

* AzureBlobStorageLogger

* test_azure_blob_storage

* use constants for Azure storage

* use helper get_azure_ad_token_from_entrata_id

* azure blob storage support

* get_azure_ad_token_from_azure_storage

* fix import

* azure logging

* docs azure storage

* add docs on azure blobs

* add premium user check

* add azure_storage  as identified logging callback

* async_upload_payload_to_azure_blob_storage

* docs azure storage

* callback_class_str_to_classType
2024-12-16 22:18:22 -08:00
Ishaan Jaff
d1124a736c docs add response format on main pages
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 46s
2024-12-16 08:41:12 -08:00
Ishaan Jaff
5a4b6cd9f5 docs update 2024-12-16 08:06:06 -08:00
Ishaan Jaff
7103198805
(feat) Add Tag-based budgets on litellm router / proxy (#7236)
All checks were successful
Read Version from pyproject.toml / read-version (push) Successful in 46s
* add BudgetConfig

* add _get_tags_from_request_kwargs

* test_tag_budgets_e2e_test_expect_to_fail

* add a check for request tags

* fix _async_get_cache_keys_for_router_budget_limiting

* fix test

* fix _sync_in_memory_spend_with_redis

* _async_get_cache_keys_for_router_budget_limiting

* fix _init_tag_budgets

* fix type casting

* docs show error for tag budget limit hit

* fix _get_tags_from_request_kwargs

* fix undo change
2024-12-14 17:28:36 -08:00
Krish Dholakia
ec36353b41
fix(main.py): fix retries being multiplied when using openai sdk (#7221)
* fix(main.py): fix retries being multiplied when using openai sdk

Closes https://github.com/BerriAI/litellm/pull/7130

* docs(prompt_management.md): add langfuse prompt management doc

* feat(team_endpoints.py): allow teams to add their own models

Enables teams to call their own finetuned models via the proxy

* test: add better enforcement check testing for `/model/new` now that teams can add their own models

* docs(team_model_add.md): tutorial for allowing teams to add their own models

* test: fix test
2024-12-14 11:56:55 -08:00
Ishaan Jaff
163529b40b
(feat - Router / Proxy ) Allow setting budget limits per LLM deployment (#7220)
* fix test_deployment_budget_limits_e2e_test

* refactor async_log_success_event to track spend for provider + deployment

* fix format

* rename class to RouterBudgetLimiting

* rename func

* rename types used for budgets

* add new types for deployment budgets

* add budget limits for deployments

* fix checking budgets set for provider

* update file names

* fix linting error

* _track_provider_remaining_budget_prometheus

* async_filter_deployments

* fix model list passed to router

* update error

* test_deployment_budgets_e2e_test_expect_to_fail

* fix test case

* run deployment budget limits
2024-12-13 19:15:51 -08:00
Krish Dholakia
b150faff90
Litellm dev 12 13 2024 p1 (#7219)
* fix(litellm_logging.py): pass user metadata to langsmith on sdk calls

* fix(litellm_logging.py): pass nested user metadata to logging integration - e.g. langsmith

* fix(exception_mapping_utils.py): catch and clarify watsonx `/text/chat` endpoint not supported error message.

Closes https://github.com/BerriAI/litellm/issues/7213

* fix(watsonx/common_utils.py): accept new 'WATSONX_IAM_URL' env var

allows user to use local watsonx

Fixes https://github.com/BerriAI/litellm/issues/4991

* fix(litellm_logging.py): cleanup unused function

* test: skip bad ibm test
2024-12-13 19:01:28 -08:00
Ishaan Jaff
621c713400
(docs) Document StandardLoggingPayload Spec (#7201)
* add slp spec to docs

* docs slp

* test slp enforcement
2024-12-12 14:00:42 -08:00