litellm-mirror

mirror of https://github.com/BerriAI/litellm.git synced 2025-04-27 11:43:54 +00:00

Author	SHA1	Message	Date
Ishaan Jaff	62753eea69	✨ (Feat) Log Guardrails run, guardrail response on logging integrations (#7445 ) * add guardrail_information to SLP * use standard_logging_guardrail_information * track StandardLoggingGuardrailInformation * use log_guardrail_information * use log_guardrail_information * docs guardrails * docs guardrails * update quick start * fix presidio logging for sync functions * update Guardrail type * enforce add_standard_logging_guardrail_information_to_request_data * update gd docs	2024-12-27 15:01:56 -08:00
Krrish Dholakia	37f998171b	docs(index.md): new release notes	2024-12-26 22:01:29 -08:00
Krish Dholakia	9d82ff4793	Litellm dev 12 26 2024 p3 (#7434 ) * build(model_prices_and_context_window.json): update groq models to specify 'supports_vision' parameter Closes https://github.com/BerriAI/litellm/issues/7433 * docs(groq.md): add groq vision example to docs Closes https://github.com/BerriAI/litellm/issues/7433 * fix(prometheus.py): refactor self.litellm_proxy_failed_requests_metric to use label factory * feat(prometheus.py): new 'litellm_proxy_failed_requests_by_tag_metric' allows tracking failed requests by tag on proxy * fix(prometheus.py): fix exception logging * feat(prometheus.py): add new 'litellm_request_total_latency_by_tag_metric' enables tracking latency by use-case * feat(prometheus.py): add new llm api latency by tag metric * feat(prometheus.py): new litellm_deployment_latency_per_output_token_by_tag metric allows tracking deployment latency by tag * fix(prometheus.py): refactor 'litellm_requests_metric' to use enum values + label factory * feat(prometheus.py): new litellm_proxy_total_requests_by_tag metric allows tracking total requests by tag * feat(prometheus.py): new metric litellm_deployment_successful_fallbacks_by_tag allows tracking deployment fallbacks by tag * fix(prometheus.py): new 'litellm_deployment_failed_fallbacks_by_tag' metric allows tracking failed fallbacks on deployment by custom tag * test: fix test * test: rename test to run earlier * test: skip flaky test	2024-12-26 21:21:16 -08:00
Krish Dholakia	539f166166	Support budget/rate limit tiers for keys (#7429 ) * feat(proxy/utils.py): get associated litellm budget from db in combined_view for key allows user to create rate limit tiers and associate those to keys * feat(proxy/_types.py): update the value of key-level tpm/rpm/model max budget metrics with the associated budget table values if set allows rate limit tiers to be easily applied to keys * docs(rate_limit_tiers.md): add doc on setting rate limit / budget tiers make feature discoverable * feat(key_management_endpoints.py): return litellm_budget_table value in key generate make it easy for user to know associated budget on key creation * fix(key_management_endpoints.py): document 'budget_id' param in `/key/generate` * docs(key_management_endpoints.py): document budget_id usage * refactor(budget_management_endpoints.py): refactor budget endpoints into separate file - makes it easier to run documentation testing against it * docs(test_api_docs.py): add budget endpoints to ci/cd doc test + add missing param info to docs * fix(customer_endpoints.py): use new pydantic obj name * docs(user_management_heirarchy.md): add simple doc explaining teams/keys/org/users on litellm * Litellm dev 12 26 2024 p2 (#7432) * (Feat) Add logging for `POST v1/fine_tuning/jobs` (#7426) * init commit ft jobs logging * add ft logging * add logging for FineTuningJob * simple FT Job create test * (docs) - show all supported Azure OpenAI endpoints in overview (#7428) * azure batches * update doc * docs azure endpoints * docs endpoints on azure * docs azure batches api * docs azure batches api * fix(key_management_endpoints.py): fix key update to actually work * test(test_key_management.py): add e2e test asserting ui key update call works * fix: proxy/_types - fix linting erros * test: update test --------- Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> * fix: test * fix(parallel_request_limiter.py): enforce tpm/rpm limits on key from tiers * fix: fix linting errors * test: fix test * fix: remove unused import * test: update test * docs(customer_endpoints.py): document new model_max_budget param * test: specify unique key alias * docs(budget_management_endpoints.py): document new model_max_budget param * test: fix test * test: fix tests --------- Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>	2024-12-26 19:05:27 -08:00
Ishaan Jaff	12c4e7e695	docs guardrail params (#7430 )	2024-12-26 11:08:47 -08:00
Ishaan Jaff	c1d6c35aef	(docs) - show all supported Azure OpenAI endpoints in overview (#7428 ) * azure batches * update doc * docs azure endpoints * docs endpoints on azure * docs azure batches api * docs azure batches api	2024-12-26 09:01:41 -08:00
Krrish Dholakia	2dcde8ce2b	docs: cleanup doc	2024-12-25 21:23:42 -08:00
Krrish Dholakia	506b6f6517	docs(fireworks_ai.md): add audio transcription to fireworks ai doc	2024-12-25 21:22:51 -08:00
Ishaan Jaff	005f2fa1aa	docs - batches cost tracking (#7422 )	2024-12-25 20:13:26 -08:00
Krish Dholakia	760328b6ad	Litellm dev 12 25 2025 p2 (#7420 ) * test: add new test image embedding to base llm unit tests Addresses https://github.com/BerriAI/litellm/issues/6515 * fix(bedrock/embed/multimodal-embeddings): strip data prefix from image urls for bedrock multimodal embeddings Fix https://github.com/BerriAI/litellm/issues/6515 * feat: initial commit for fireworks ai audio transcription support Relevant issue: https://github.com/BerriAI/litellm/issues/7134 * test: initial fireworks ai test * feat(fireworks_ai/): implemented fireworks ai audio transcription config * fix(utils.py): register fireworks ai audio transcription config, in config manager * fix(utils.py): add fireworks ai param translation to 'get_optional_params_transcription' * refactor(fireworks_ai/): define text completion route with model name handling moves model name handling to specific fireworks routes, as required by their api * refactor(fireworks_ai/chat): define transform_Request - allows fixing model if accounts/ is missing * fix: fix linting errors * fix: fix linting errors * fix: fix linting errors * fix: fix linting errors * fix(handler.py): fix linting errors * fix(main.py): fix tgai text completion route * refactor(together_ai/completion): refactors together ai text completion route to just use provider transform request * refactor: move test_fine_tuning_api out of local_testing reduces local testing ci/cd time	2024-12-25 18:35:34 -08:00
Ishaan Jaff	157810fcbf	fix docs warning (#7419 )	2024-12-25 16:42:14 -08:00
Ishaan Jaff	0ce5f9fe58	(feat) Support Dynamic Params for `guardrails` (#7415 ) * update CustomGuardrail * unit test custom guardrails * add dynamic params for aporia * add dynamic params to bedrock guard * add dynamic params for all guardrails * fix linting * fix should_run_guardrail * _validate_premium_user * update guardrail doc * doc update * update code q * should_run_guardrail	2024-12-25 16:07:29 -08:00
Ishaan Jaff	77fa751639	update docs base docker	2024-12-25 15:51:19 -08:00
Ishaan Jaff	c6ca835046	docs files api	2024-12-24 20:46:43 -08:00
Ishaan Jaff	47e12802df	(feat) `/batches` Add support for using `/batches` endpoints in OAI format (#7402 ) * run azure testing on ci/cd * update docs on azure batches endpoints * add input azure.jsonl * refactor - use separate file for batches endpoints * fixes for passing custom llm provider to /batch endpoints * pass custom llm provider to files endpoints * update azure batches doc * add info for azure batches api * update batches endpoints * use simple helper for raising proxy exception * update config.yml * fix imports * update tests * use existing settings * update env var used * update configs * update config.yml * update ft testing	2024-12-24 16:58:05 -08:00
Krish Dholakia	c3edfc2c92	LiteLLM Minor Fixes & Improvements (12/23/2024) - p3 (#7394 ) * build(model_prices_and_context_window.json): add gemini-1.5-flash context caching * fix(context_caching/transformation.py): just use last identified cache point Fixes https://github.com/BerriAI/litellm/issues/6738 * fix(context_caching/transformation.py): pick first contiguous block - handles system message error from google Fixes https://github.com/BerriAI/litellm/issues/6738 * fix(vertex_ai/gemini/): track context caching tokens * refactor(gemini/): place transformation.py inside `chat/` folder make it easy for user to know we support the equivalent endpoint * fix: fix import * refactor(vertex_ai/): move vertex_ai cost calc inside vertex_ai/ folder make it easier to see cost calculation logic * fix: fix linting errors * fix: fix circular import * feat(gemini/cost_calculator.py): support gemini context caching cost calculation generifies anthropic's cost calculation function and uses it across anthropic + gemini * build(model_prices_and_context_window.json): add cost tracking for gemini-1.5-flash-002 w/ context caching Closes https://github.com/BerriAI/litellm/issues/6891 * docs(gemini.md): add gemini context caching architecture diagram make it easier for user to understand how context caching works * docs(gemini.md): link to relevant gemini context caching code * docs(gemini/context_caching): add readme in github, make it easy for dev to know context caching is supported + where to go for code * fix(llm_cost_calc/utils.py): handle gemini 128k token diff cost calc scenario * fix(deepseek/cost_calculator.py): support deepseek context caching cost calculation * test: fix test	2024-12-23 22:02:52 -08:00
Ishaan Jaff	442d309bcd	update release notes	2024-12-23 21:48:33 -08:00
Ishaan Jaff	f4a2143b82	update release notes	2024-12-23 21:43:47 -08:00
Ishaan Jaff	1f466ec9cf	release notes	2024-12-23 21:38:56 -08:00
Ishaan Jaff	957fbc6dfb	docs batches	2024-12-23 21:24:06 -08:00
Ishaan Jaff	43077a88d5	docs add files to supported endpoints	2024-12-23 20:51:34 -08:00
Krish Dholakia	48316520f4	LiteLLM Minor Fixes & Improvements (12/23/2024) - P2 (#7386 ) * fix(main.py): support 'mock_timeout=true' param allows mock requests on proxy to have a time delay, for testing * fix(main.py): ensure mock timeouts raise litellm.Timeout error triggers retry/fallbacks * fix: fix fallback + mock timeout testing * fix(router.py): always return remaining tpm/rpm limits, if limits are known allows for rate limit headers to be guaranteed * docs(timeout.md): add docs on mock timeout = true * fix(main.py): fix linting errors * test: fix test	2024-12-23 17:41:27 -08:00
Krish Dholakia	db59e08958	Litellm dev 12 23 2024 p1 (#7383 ) * feat(guardrails_endpoint.py): new `/guardrails/list` endpoint Allow users to view what the available guardrails are * docs: document new `/guardrails/list` endpoint * docs(enterprise.md): update docs * fix(openai/transcription/handler.py): support cost tracking on vtt + srt formats * fix(openai/transcriptions/handler.py): default to 'verbose_json' response format if 'text' or 'json' response_format received. ensures 'duration' param is received for all audio transcription requests * fix: fix linting errors * fix: remove unused import	2024-12-23 16:33:31 -08:00
Krish Dholakia	3671829e39	Complete 'requests' library removal (#7350 ) * refactor: initial commit moving watsonx_text to base_llm_http_handler + clarifying new provider directory structure * refactor(watsonx/completion/handler.py): move to using base llm http handler removes 'requests' library usage * fix(watsonx_text/transformation.py): fix result transformation migrates to transformation.py, for usage with base llm http handler * fix(streaming_handler.py): migrate watsonx streaming to transformation.py ensures streaming works with base llm http handler * fix(streaming_handler.py): fix streaming linting errors and remove watsonx conditional logic * fix(watsonx/): fix chat route post completion route refactor * refactor(watsonx/embed): refactor watsonx to use base llm http handler for embedding calls as well * refactor(base.py): remove requests library usage from litellm * build(pyproject.toml): remove requests library usage * fix: fix linting errors * fix: fix linting errors * fix(types/utils.py): fix validation errors for modelresponsestream * fix(replicate/handler.py): fix linting errors * fix(litellm_logging.py): handle modelresponsestream object * fix(streaming_handler.py): fix modelresponsestream args * fix: remove unused imports * test: fix test * fix: fix test * test: fix test * test: fix tests * test: fix test * test: fix patch target * test: fix test	2024-12-22 07:21:25 -08:00
Ishaan Jaff	8b1ea40e7b	add img to release notes	2024-12-21 21:24:16 -08:00
Ishaan Jaff	bb801ad3d9	update release notes	2024-12-21 21:20:13 -08:00
Krish Dholakia	1aed988c11	Litellm docs update (#7365 ) * docs: fix release notes url + fix urlk * docs(index.md): add link to github releases * docs(index.md): add linkedin social url to release notes	2024-12-21 21:09:50 -08:00
Ishaan Jaff	31f5dcf8e5	docs	2024-12-21 20:52:39 -08:00
Ishaan Jaff	291229ee46	docs add 1.55.8 changelog	2024-12-21 20:51:39 -08:00
Ishaan Jaff	5a8f67c171	release notes v1.55.8	2024-12-21 20:31:54 -08:00
Ishaan Jaff	a3e732de39	(chore) - enforce model budgets on virtual keys as enterprise feature (#7353 ) * docs - enforce model budget as enterprise feature * docs link to correct place	2024-12-21 14:18:53 -08:00
Ishaan Jaff	6107f9f3f3	[Bug fix ]: Triton /infer handler incompatible with batch responses (#7337 ) * migrate triton to base llm http handler * clean up triton handler.py * use transform functions for triton * add TritonConfig * get openai params for triton * use triton embedding config * test_completion_triton_generate_api * test_completion_triton_infer_api * fix TritonConfig doc string * use TritonResponseIterator * fix triton embeddings * docs triton chat usage	2024-12-20 20:59:40 -08:00
Krish Dholakia	70a9ea99f2	Controll fallback prompts client-side (#7334 ) * feat(router.py): support passing model-specific messages in fallbacks * docs(routing.md): separate router timeouts into separate doc allow for 1 fallbacks doc (across proxy/router) * docs(routing.md): cleanup router docs * docs(reliability.md): cleanup docs * docs(reliability.md): cleaned up fallback doc just have 1 doc across sdk/proxy simplifies docs * docs(reliability.md): add setting model-specific fallback prompts * fix: fix linting errors * test: skip test causing openai rate limit errros * test: fix test * test: run vertex test first to catch error	2024-12-20 19:09:53 -08:00
jravi-fireworks	f8cf11f6d5	Fix LiteLLM documentation (#7333 ) Co-authored-by: Jetashree Ravi <jetashreeravi@Jetashrees-MBP.attlocal.net>	2024-12-20 15:04:23 -08:00
Krish Dholakia	27a4d08604	Litellm dev 2024 12 19 p3 (#7322 ) * fix(utils.py): remove unsupported optional params (if drop_params=True) before passing into map openai params Fixes https://github.com/BerriAI/litellm/issues/7242 * test: new test for langfuse prompt management hook Addresses https://github.com/BerriAI/litellm/issues/3893#issuecomment-2549080296 * feat(main.py): add 'get_chat_completion_prompt' customlogger hook allows for langfuse prompt management Addresses https://github.com/BerriAI/litellm/issues/3893#issuecomment-2549080296 * feat(langfuse_prompt_management.py): working e2e langfuse prompt management works with `langfuse/` route * feat(main.py): initial tracing for dynamic langfuse params allows admin to specify langfuse keys by model in model_list * feat(main.py): support passing langfuse credentials dynamically * fix(langfuse_prompt_management.py): create langfuse client based on dynamic callback params allows dynamic langfuse params to work * fix: fix linting errors * docs(prompt_management.md): refactor docs for sdk + proxy prompt management tutorial * docs(prompt_management.md): cleanup doc * docs: cleanup topnav * docs(prompt_management.md): update docs to be easier to use * fix: remove unused imports * docs(prompt_management.md): add architectural overview doc * fix(litellm_logging.py): fix dynamic param passing * fix(langfuse_prompt_management.py): fix linting errors * fix: fix linting errors * fix: use typing_extensions for typealias to ensure python3.8 compatibility * test: use stream_options in test to account for tiktoken diff * fix: improve import error message, and check run test earlier	2024-12-20 13:30:16 -08:00
Krrish Dholakia	834067c570	docs: refactor admin ui docs	2024-12-20 10:20:07 -08:00
Ishaan Jaff	35076212ef	docs infinity rerank api docs	2024-12-19 18:51:55 -08:00
Ishaan Jaff	3a6dba8853	docs base rerank config	2024-12-19 18:33:32 -08:00
Ishaan Jaff	236561cdb8	docs add ref prs	2024-12-19 18:31:38 -08:00
Ishaan Jaff	617ac63d14	(feat) add infinity rerank models (#7321 ) * Support Infinity Reranker (custom reranking models) (#7247) * Support Infinity Reranker * Clean code * Included transformation.py * Clean code * Added Infinity reranker test * Clean code --------- Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> * transform_rerank_response * update handler.py * infinity rerank updates * ci/cd run again * add infinity unit tests * docs add instruction on how to add a new provider for rerank --------- Co-authored-by: Hao Shan <53949959+haoshan98@users.noreply.github.com>	2024-12-19 18:30:28 -08:00
Ishaan Jaff	6261ec3599	(feat proxy) v2 - model max budgets (#7302 ) * clean up unused code * add _PROXY_VirtualKeyModelMaxBudgetLimiter * adjust type imports * working _PROXY_VirtualKeyModelMaxBudgetLimiter * fix user_api_key_model_max_budget * fix user_api_key_model_max_budget * update naming * update naming * fix changes to RouterBudgetLimiting * test_call_with_key_over_model_budget * test_call_with_key_over_model_budget * handle _get_request_model_budget_config * e2e test for test_call_with_key_over_model_budget * clean up test * run ci/cd again * add validate_model_max_budget * docs fix * update doc * add e2e testing for _PROXY_VirtualKeyModelMaxBudgetLimiter * test_unit_test_max_model_budget_limiter.py	2024-12-18 19:42:46 -08:00
Krish Dholakia	5253f639cd	fix(health.md): add rerank model health check information (#7295 ) * fix(health.md): add rerank model health check information * build(model_prices_and_context_window.json): add gemini 2.0 for google ai studio - pricing + commercial rate limits * build(model_prices_and_context_window.json): add gemini-2.0 supports audio output = true * docs(team_model_add.md): clarify allowing teams to add models is an enterprise feature * fix(o1_transformation.py): add support for 'n', 'response_format' and 'stop' params for o1 and 'stream_options' param for o1-mini * build(model_prices_and_context_window.json): add 'supports_system_message' to supporting openai models needed as o1-preview, and o1-mini models don't support 'system message * fix(o1_transformation.py): translate system message based on if o1 model supports it * fix(o1_transformation.py): return 'stream' param support if o1-mini/o1-preview o1 currently doesn't support streaming, but the other model versions do Fixes https://github.com/BerriAI/litellm/issues/7292 * fix(o1_transformation.py): return tool calling/response_format in supported params if model map says so Fixes https://github.com/BerriAI/litellm/issues/7292 * fix: fix linting errors * fix: update '_transform_messages' * fix(o1_transformation.py): fix provider passed for supported param checks * test(base_llm_unit_tests.py): skip test if api takes >5s to respond * fix(utils.py): return false in 'supports_factory' if can't find value * fix(o1_transformation.py): always return stream + stream_options as supported params + handle stream options being passed in for azure o1 * feat(openai.py): support stream faking natively in openai handler Allows o1 calls to be faked for just the "o1" model, allows native streaming for o1-mini, o1-preview Fixes https://github.com/BerriAI/litellm/issues/7292 * fix(openai.py): use inference param instead of original optional param	2024-12-18 19:18:10 -08:00
Krish Dholakia	6a45ee1ef7	fix(hosted_vllm/transformation.py): return fake api key, if none give… (#7301 ) * fix(hosted_vllm/transformation.py): return fake api key, if none give. Prevents httpx error Fixes https://github.com/BerriAI/litellm/issues/7291 * test: fix test * fix(main.py): add hosted_vllm/ support for embeddings endpoint Closes https://github.com/BerriAI/litellm/issues/7290 * docs(vllm.md): add docs on vllm embeddings usage * fix(__init__.py): fix sambanova model test * fix(base_llm_unit_tests.py): skip pydantic obj test if model takes >5s to respond	2024-12-18 18:41:53 -08:00
Ishaan Jaff	246e3bafc8	(feat - proxy) Add `status_code` to `litellm_proxy_total_requests_metric_total` (#7293 ) * fix _select_model_name_for_cost_calc docstring * add STATUS_CODE to prometheus * test prometheus unit tests * test_prometheus_unit_tests.py * update Proxy Level Tracking Metrics docs * fix test_proxy_failure_metrics * fix test_proxy_failure_metrics	2024-12-18 15:55:02 -08:00
Krish Dholakia	2f08341a08	Litellm dev readd prompt caching (#7299 ) * fix(router.py): re-add saving model id on prompt caching valid successful deployment * fix(router.py): introduce optional pre_call_checks isolate prompt caching logic in a separate file * fix(prompt_caching_deployment_check.py): fix import * fix(router.py): new 'async_filter_deployments' event hook allows custom logger to filter deployments returned to routing strategy * feat(prompt_caching_deployment_check.py): initial working commit of prompt caching based routing * fix(cooldown_callbacks.py): fix linting error * fix(budget_limiter.py): move budget logger to async_filter_deployment hook * test: add unit test * test(test_router_helper_utils.py): add unit testing * fix(budget_limiter.py): fix linting errors * docs(config_settings.md): add 'optional_pre_call_checks' to router_settings param docs	2024-12-18 15:13:49 -08:00
Ishaan Jaff	523beedb4c	update docs	2024-12-18 11:16:30 -08:00
Ishaan Jaff	9c30387e66	tag budgets fixes	2024-12-18 10:28:37 -08:00
Ishaan Jaff	f3c546b79e	(feat) proxy Azure Blob Storage - Add support for `AZURE_STORAGE_ACCOUNT_KEY` Auth (#7280 ) * add upload_to_azure_data_lake_with_azure_account_key * async_upload_payload_to_azure_blob_storage * docs add AZURE_STORAGE_ACCOUNT_KEY * add azure-storage-file-datalake	2024-12-17 17:35:45 -08:00
Krish Dholakia	cd5bdfcb7a	docs(input.md): document 'extra_headers' param support (#7268 ) * docs(input.md): document 'extra_headers' param support * fix: #7239 to move Nova topK parameter to `additionalModelRequestFields` (#7240) Co-authored-by: Ryan Hoium <rhoium> --------- Co-authored-by: ryanh-ai <3118399+ryanh-ai@users.noreply.github.com>	2024-12-17 07:19:14 -08:00
Ishaan Jaff	f3b13a9af3	(feat) Add Bedrock knowledge base pass through endpoints (#7267 ) * bugfix: Proxy Routing for Bedrock Knowledgebase URLs are incorrect (#7097) * Fixing routing bug where bedrock knowledgebase urls were being generated incorrectly * Preparing for PR * Preparing for PR * Preparing for PR --------- Co-authored-by: Luke Birk <lb0737@att.com> * fix _is_bedrock_agent_runtime_route * docs - Query Knowledge Base * test_is_bedrock_agent_runtime_route * fix bedrock_proxy_route --------- Co-authored-by: LBirk <2731718+LBirk@users.noreply.github.com> Co-authored-by: Luke Birk <lb0737@att.com>	2024-12-16 22:19:34 -08:00

3381 commits