Commit graph

1901 commits

Krish Dholakia
6005450c8f
LiteLLM Minor Fixes & Improvements (10/09/2024) (#6139)
* fix(utils.py): don't return 'none' response headers

Fixes https://github.com/BerriAI/litellm/issues/6123

* fix(vertex_and_google_ai_studio_gemini.py): support parsing out additional properties and strict value for tool calls

Fixes https://github.com/BerriAI/litellm/issues/6136

* fix(cost_calculator.py): set default character value to none

Fixes https://github.com/BerriAI/litellm/issues/6133#issuecomment-2403290196

* fix(google.py): fix cost per token / cost per char conversion

Fixes https://github.com/BerriAI/litellm/issues/6133#issuecomment-2403370287

* build(model_prices_and_context_window.json): update gemini pricing

Fixes https://github.com/BerriAI/litellm/issues/6133

* build(model_prices_and_context_window.json): update gemini pricing

* fix(litellm_logging.py): fix streaming caching logging when 'turn_off_message_logging' enabled

Stores unredacted response in cache

* build(model_prices_and_context_window.json): update gemini-1.5-flash pricing

* fix(cost_calculator.py): fix default prompt_character count logic

Fixes error in gemini cost calculation

* fix(cost_calculator.py): fix cost calc for tts models
2024-10-10 00:42:11 -07:00
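
The Gemini fixes in the entry above center on LiteLLM's character-vs-token pricing conversion. A minimal sketch of checking spend after a Gemini call with the public `litellm.completion_cost()` helper, assuming a valid `GEMINI_API_KEY` is set (the model name is illustrative):

```python
import litellm

# Any Gemini call; pricing comes from model_prices_and_context_window.json.
response = litellm.completion(
    model="gemini/gemini-1.5-flash",
    messages=[{"role": "user", "content": "Hello!"}],
)

# For Gemini, per-character prices in the pricing map are converted to a
# per-token figure internally before the total is computed.
cost = litellm.completion_cost(completion_response=response)
print(f"request cost: ${cost:.8f}")
```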
Krish Dholakia
9695c1af10
LiteLLM Minor Fixes & Improvements (10/08/2024) (#6119)
* refactor(cost_calculator.py): move error line to debug - https://github.com/BerriAI/litellm/issues/5683#issuecomment-2398599498

* fix(migrate-hidden-params-to-read-from-standard-logging-payload): Fixes https://github.com/BerriAI/litellm/issues/5546#issuecomment-2399994026

* fix(types/utils.py): mark weight as a litellm param

Fixes https://github.com/BerriAI/litellm/issues/5781

* feat(internal_user_endpoints.py): fix /user/info + show user max budget as default max budget

Fixes https://github.com/BerriAI/litellm/issues/6117

* feat: support returning team member budget in `/user/info`

Sets user max budget in team as max budget on ui

  Closes https://github.com/BerriAI/litellm/issues/6117

* bug fix for optional parameter passing to replicate (#6067)

Signed-off-by: Mandana Vaziri <mvaziri@us.ibm.com>

* fix(o1_transformation.py): handle o1 temperature=0

o1 doesn't support temp=0, allow admin to drop this param

* test: fix test

---------

Signed-off-by: Mandana Vaziri <mvaziri@us.ibm.com>
Co-authored-by: Mandana Vaziri <mvaziri@us.ibm.com>
2024-10-08 21:57:03 -07:00
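
A sketch of the o1 temperature handling from the entry above: o1 rejects `temperature=0`, so `drop_params=True` asks LiteLLM to drop unsupported params instead of forwarding them (assumes `OPENAI_API_KEY` is set):

```python
import litellm

response = litellm.completion(
    model="o1-preview",
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
    temperature=0,     # unsupported by o1
    drop_params=True,  # drop it rather than raise or forward it
)
print(response.choices[0].message.content)
```

The same behavior can be enabled globally with `litellm.drop_params = True`.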
Krish Dholakia
6729c9ca7f
LiteLLM Minor Fixes & Improvements (10/07/2024) (#6101)
* fix(utils.py): support dropping temperature param for azure o1 models

* fix(main.py): handle azure o1 streaming requests

o1 doesn't support streaming, fake it to ensure code works as expected

* feat(utils.py): expose `hosted_vllm/` endpoint, with tool handling for vllm

Fixes https://github.com/BerriAI/litellm/issues/6088

* refactor(internal_user_endpoints.py): cleanup unused params + update docstring

Closes https://github.com/BerriAI/litellm/issues/6100

* fix(main.py): expose custom image generation api support

Fixes https://github.com/BerriAI/litellm/issues/6097

* fix: fix linting errors

* docs(custom_llm_server.md): add docs on custom api for image gen calls

* fix(types/utils.py): handle dict type

* fix(types/utils.py): fix linting errors
2024-10-07 22:17:22 -07:00
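
The `hosted_vllm/` route exposed in the entry above points LiteLLM at any self-hosted vLLM server's OpenAI-compatible endpoint. A hedged sketch; the `api_base` URL and model name are placeholders for your own deployment:

```python
import litellm

response = litellm.completion(
    model="hosted_vllm/meta-llama/Llama-3.1-8B-Instruct",  # your served model
    api_base="http://localhost:8000/v1",                   # your vLLM server
    messages=[{"role": "user", "content": "Summarize vLLM in one line."}],
)
print(response.choices[0].message.content)
```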
Krish Dholakia
49d8b2be46
fix(utils.py): fix pydantic obj to schema creation for vertex en… (#6071)
* fix(utils.py): fix pydantic obj to schema creation for vertex endpoints

Fixes https://github.com/BerriAI/litellm/issues/6027

* test(test_completion.py): skip test - avoid hitting gemini rate limits

* fix(common_utils.py): fix ruff linting error
2024-10-06 00:25:55 -04:00
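
The fix above concerns converting a Pydantic model into a Vertex/Gemini-compatible JSON schema. A sketch of the calling pattern it repairs, assuming `GEMINI_API_KEY` is set (model name illustrative):

```python
from pydantic import BaseModel
import litellm

class CalendarEvent(BaseModel):
    name: str
    day: str
    participants: list[str]

response = litellm.completion(
    model="gemini/gemini-1.5-pro",
    messages=[{"role": "user", "content": "Alice and Bob meet Friday for standup."}],
    # LiteLLM converts the Pydantic class into a Gemini response schema.
    response_format=CalendarEvent,
)
print(response.choices[0].message.content)  # JSON matching CalendarEvent
```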
Krish Dholakia
f2c0a31e3c
LiteLLM Minor Fixes & Improvements (10/05/2024) (#6083)
* docs(prompt_caching.md): add prompt caching cost calc example to docs

* docs(prompt_caching.md): add proxy examples to docs

* feat(utils.py): expose new helper `supports_prompt_caching()` to check if a model supports prompt caching

* docs(prompt_caching.md): add docs on checking model support for prompt caching

* build: fix invalid json
2024-10-05 18:59:11 -04:00
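
The `supports_prompt_caching()` helper introduced above can be exercised directly; it consults LiteLLM's model metadata map, so results depend on the shipped pricing file:

```python
import litellm

for model in ["anthropic/claude-3-5-sonnet-20240620", "gpt-3.5-turbo"]:
    print(model, "->", litellm.supports_prompt_caching(model=model))
```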
Krish Dholakia
fac3b2ee42
Add pyright to ci/cd + Fix remaining type-checking errors (#6082)
* fix: fix type-checking errors

* fix: fix additional type-checking errors

* fix: additional type-checking error fixes

* fix: fix additional type-checking errors

* fix: additional type-check fixes

* fix: fix all type-checking errors + add pyright to ci/cd

* fix: fix incorrect import

* ci(config.yml): use mypy on ci/cd

* fix: fix type-checking errors in utils.py

* fix: fix all type-checking errors on main.py

* fix: fix mypy linting errors

* fix(anthropic/cost_calculator.py): fix linting errors

* fix: fix mypy linting errors

* fix: fix linting errors
2024-10-05 17:04:00 -04:00
Ishaan Jaff
7267852511 linting error fix 2024-10-05 15:03:39 +05:30
Ishaan Jaff
3682f661d8
(feat) add cost tracking for OpenAI prompt caching (#6055)
* add cache_read_input_token_cost for prompt caching models

* add prompt caching for latest models

* add openai cost calculator

* add openai prompt caching test

* fix lint check

* add note on how usage._cache_read_input_tokens is used

* fix cost calc whisper openai

* use output_cost_per_second

* add input_cost_per_second
2024-10-05 14:20:15 +05:30
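
The cost split behind `cache_read_input_token_cost` above, as back-of-the-envelope arithmetic (all prices here are illustrative placeholders, not the shipped pricing-map values):

```python
cache_read_input_token_cost = 1.25e-06  # $/cached prompt token (example)
input_cost_per_token = 2.50e-06         # $/uncached prompt token (example)
output_cost_per_token = 1.00e-05        # $/completion token (example)

prompt_tokens, cached_tokens, completion_tokens = 2048, 1024, 256

cost = (
    (prompt_tokens - cached_tokens) * input_cost_per_token
    + cached_tokens * cache_read_input_token_cost  # discounted cache reads
    + completion_tokens * output_cost_per_token
)
print(f"${cost:.6f}")  # $0.006400
```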
Ishaan Jaff
670ecda4e2
(fixes) gcs bucket key based logging (#6044)
* fixes for gcs bucket logging

* fix StandardCallbackDynamicParams

* fix - gcs logging when payload is not serializable

* add test_add_callback_via_key_litellm_pre_call_utils_gcs_bucket

* working success callbacks

* linting fixes

* fix linting error

* add type hints to functions

* fixes for dynamic success and failure logging

* fix for test_async_chat_openai_stream
2024-10-04 11:56:10 +05:30
Krish Dholakia
09f0c09ba4
fix(utils.py): return openai streaming prompt caching tokens (#6051)
* fix(utils.py): return openai streaming prompt caching tokens

Closes https://github.com/BerriAI/litellm/issues/6038

* fix(main.py): fix error in finish_reason updates
2024-10-03 22:20:13 -04:00
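
To observe the streaming prompt-caching tokens returned by the fix above, request usage on the stream; per OpenAI semantics, only the final chunk carries a usage block (assumes `OPENAI_API_KEY` is set):

```python
import litellm

stream = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "hi"}],
    stream=True,
    stream_options={"include_usage": True},  # ask for usage on the stream
)

usage = None
for chunk in stream:
    if getattr(chunk, "usage", None):  # only the last chunk has usage
        usage = chunk.usage

if usage and getattr(usage, "prompt_tokens_details", None):
    print("cached prompt tokens:", usage.prompt_tokens_details.cached_tokens)
```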
Ishaan Jaff
d92696a303
(feat) add nvidia nim embeddings (#6032)
* nvidia nim support embedding config

* add nvidia config in init

* nvidia nim embeddings

* docs nvidia nim embeddings

* docs embeddings on nvidia nim

* fix llm translation test
2024-10-03 17:12:14 +05:30
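
A sketch of the NVIDIA NIM embeddings support added above; the model name is one of NVIDIA's retrieval embedding models and stands in for whatever your NIM endpoint serves (assumes `NVIDIA_NIM_API_KEY` is configured):

```python
import litellm

response = litellm.embedding(
    model="nvidia_nim/nvidia/nv-embedqa-e5-v5",
    input=["What is the capital of France?"],
)
print(len(response.data[0]["embedding"]))  # embedding dimension
```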
Ishaan Jaff
21e05a0f3e
(feat proxy) add key based logging for GCS bucket (#6031)
* init litellm langfuse / gcs credentials in litellm logging obj

* add gcs key based test

* rename vars

* save standard_callback_dynamic_params in model call details

* add working gcs bucket key based logging

* test_basic_gcs_logging_per_request

* linting fix

* add doc on gcs bucket team based logging
2024-10-03 15:24:31 +05:30
Krish Dholakia
14165d3648
LiteLLM Minor Fixes & Improvements (10/02/2024) (#6023)
* feat(together_ai/completion): handle together ai completion calls

* fix: handle list of int / list of list of int for text completion calls

* fix(utils.py): check if base model in bedrock converse model list

Fixes https://github.com/BerriAI/litellm/issues/6003

* test(test_optional_params.py): add unit tests for bedrock optional param mapping

Fixes https://github.com/BerriAI/litellm/issues/6003

* feat(utils.py): enable passing dummy tool call for anthropic/bedrock calls if tool_use blocks exist

Fixes https://github.com/BerriAI/litellm/issues/5388

* fixed an issue with tool use of claude models with anthropic and bedrock (#6013)

* fix(utils.py): handle empty schema for anthropic/bedrock

Fixes https://github.com/BerriAI/litellm/issues/6012

* fix: fix linting errors

* fix: fix linting errors

* fix: fix linting errors

* fix(proxy_cli.py): fix import route for app + health checks path (#6026)

* (testing): Enable testing us.anthropic.claude-3-haiku-20240307-v1:0. (#6018)

* fix(proxy_cli.py): fix import route for app + health checks gettysburg.wav

Fixes https://github.com/BerriAI/litellm/issues/5999

---------

Co-authored-by: David Manouchehri <david.manouchehri@ai.moda>

---------

Co-authored-by: Ved Patwardhan <54766411+vedpatwardhan@users.noreply.github.com>
Co-authored-by: David Manouchehri <david.manouchehri@ai.moda>
2024-10-02 22:00:28 -04:00
Krish Dholakia
d57be47b0f
Litellm ruff linting enforcement (#5992)
* ci(config.yml): add a 'check_code_quality' step

Addresses https://github.com/BerriAI/litellm/issues/5991

* ci(config.yml): check why circle ci doesn't pick up this test

* ci(config.yml): fix to run 'check_code_quality' tests

* fix(__init__.py): fix unprotected import

* fix(__init__.py): don't remove unused imports

* build(ruff.toml): update ruff.toml to ignore unused imports

* fix: ruff + pyright - fix linting + type-checking errors

* fix: fix linting errors

* fix(lago.py): fix module init error

* fix: fix linting errors

* ci(config.yml): cd into correct dir for checks

* fix(proxy_server.py): fix linting error

* fix(utils.py): fix bare except

causes ruff linting errors

* fix: ruff - fix remaining linting errors

* fix(clickhouse.py): use standard logging object

* fix(__init__.py): fix unprotected import

* fix: ruff - fix linting errors

* fix: fix linting errors

* ci(config.yml): cleanup code qa step (formatting handled in local_testing)

* fix(_health_endpoints.py): fix ruff linting errors

* ci(config.yml): just use ruff in check_code_quality pipeline for now

* build(custom_guardrail.py): include missing file

* style(embedding_handler.py): fix ruff check
2024-10-01 19:44:20 -04:00
Krrish Dholakia
5fbcdd8b11 fix(utils.py): fix updating hidden params 2024-09-28 21:08:15 -07:00
Krrish Dholakia
b0eff0b84f fix(router.py): handle setting response headers during retries 2024-09-28 21:08:15 -07:00
Krrish Dholakia
d64e971d8c fix(azure): return response headers for sync embedding calls 2024-09-28 21:08:15 -07:00
Krrish Dholakia
55d7bc7f32 fix(utils.py): guarantee openai-compatible headers always exist in response
Fixes https://github.com/BerriAI/litellm/issues/5957
2024-09-28 21:08:15 -07:00
Krrish Dholakia
498e14ba59 fix(return-openai-compatible-headers): v0 is openai, azure, anthropic
Fixes https://github.com/BerriAI/litellm/issues/5957
2024-09-28 21:08:15 -07:00
Krish Dholakia
0b30e212da
LiteLLM Minor Fixes & Improvements (09/27/2024) (#5938)
* fix(langfuse.py): prevent double logging requester metadata

Fixes https://github.com/BerriAI/litellm/issues/5935

* build(model_prices_and_context_window.json): add mistral pixtral cost tracking

Closes https://github.com/BerriAI/litellm/issues/5837

* handle streaming for azure ai studio error

* [Perf Proxy] parallel request limiter - use one cache update call (#5932)

* fix parallel request limiter - use one cache update call

* ci/cd run again

* run ci/cd again

* use docker username password

* fix config.yml

* fix config

* fix config

* fix config.yml

* ci/cd run again

* use correct typing for batch set cache

* fix async_set_cache_pipeline

* fix only check user id tpm / rpm limits when limits set

* fix test_openai_azure_embedding_with_oidc_and_cf

* fix(groq/chat/transformation.py): Fixes https://github.com/BerriAI/litellm/issues/5839

* feat(anthropic/chat.py): return 'retry-after' headers from anthropic

Fixes https://github.com/BerriAI/litellm/issues/4387

* feat: raise validation error if message has tool calls without passing `tools` param for anthropic/bedrock

Closes https://github.com/BerriAI/litellm/issues/5747

* [Feature]#5940, add max_workers parameter for the batch_completion (#5947)

* handle streaming for azure ai studio error

* bump: version 1.48.2 → 1.48.3

* docs(data_security.md): add legal/compliance faq's

Make it easier for companies to use litellm

* docs: resolve imports

* [Feature]#5940, add max_workers parameter for the batch_completion method

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: josearangos <josearangos@Joses-MacBook-Pro.local>

* fix(converse_transformation.py): fix default message value

* fix(utils.py): fix get_model_info to handle finetuned models

Fixes issue for standard logging payloads, where model_map_value was null for finetuned openai models

* fix(litellm_pre_call_utils.py): add debug statement for data sent after updating with team/key callbacks

* fix: fix linting errors

* fix(anthropic/chat/handler.py): fix cache creation input tokens

* fix(exception_mapping_utils.py): fix missing imports

* fix(anthropic/chat/handler.py): fix usage block translation

* test: fix test

* test: fix tests

* style(types/utils.py): trigger new build

* test: fix test

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Jose Alberto Arango Sanchez <jose.arangos@udea.edu.co>
Co-authored-by: josearangos <josearangos@Joses-MacBook-Pro.local>
2024-09-27 22:52:57 -07:00
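
Among the changes folded into the entry above is #5947's `max_workers` parameter for `batch_completion`, which fans prompts out over a thread pool. A hedged sketch (assumes `OPENAI_API_KEY` is set):

```python
import litellm

responses = litellm.batch_completion(
    model="gpt-4o-mini",
    messages=[
        [{"role": "user", "content": f"Say the number {i}"}] for i in range(8)
    ],
    max_workers=4,  # cap concurrent worker threads
)
print(len(responses))  # one response per message list
```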
Krish Dholakia
bd17424c4b
LiteLLM Minor Fixes & Improvements (09/26/2024) (#5925) (#5937)
* LiteLLM Minor Fixes & Improvements (09/26/2024)  (#5925)

* fix(litellm_logging.py): don't initialize prometheus_logger if non premium user

Prevents bad error messages in logs

Fixes https://github.com/BerriAI/litellm/issues/5897

* Add Support for Custom Providers in Vision and Function Call Utils (#5688)

* Add Support for Custom Providers in Vision and Function Call Utils Lookup

* Remove parallel function call due to missing model info param

* Add Unit Tests for Vision and Function Call Changes

* fix-#5920: set header value to string to fix "'int' object has no att… (#5922)

* LiteLLM Minor Fixes & Improvements (09/24/2024) (#5880)

* LiteLLM Minor Fixes & Improvements (09/23/2024)  (#5842)

* feat(auth_utils.py): enable admin to allow client-side credentials to be passed

Makes it easier for devs to experiment with finetuned fireworks ai models

* feat(router.py): allow setting configurable_clientside_auth_params for a model

Closes https://github.com/BerriAI/litellm/issues/5843

* build(model_prices_and_context_window.json): fix anthropic claude-3-5-sonnet max output token limit

Fixes https://github.com/BerriAI/litellm/issues/5850

* fix(azure_ai/): support content list for azure ai

Fixes https://github.com/BerriAI/litellm/issues/4237

* fix(litellm_logging.py): always set saved_cache_cost

Set to 0 by default

* fix(fireworks_ai/cost_calculator.py): add fireworks ai default pricing

handles calling 405b+ size models

* fix(slack_alerting.py): fix error alerting for failed spend tracking

Fixes regression with slack alerting error monitoring

* fix(vertex_and_google_ai_studio_gemini.py): handle gemini no candidates in streaming chunk error

* docs(bedrock.md): add llama3-1 models

* test: fix tests

* fix(azure_ai/chat): fix transformation for azure ai calls

* feat(azure_ai/embed): Add azure ai embeddings support

Closes https://github.com/BerriAI/litellm/issues/5861

* fix(azure_ai/embed): enable async embedding

* feat(azure_ai/embed): support azure ai multimodal embeddings

* fix(azure_ai/embed): support async multi modal embeddings

* feat(together_ai/embed): support together ai embedding calls

* feat(rerank/main.py): log source documents for rerank endpoints to langfuse

improves rerank endpoint logging

* fix(langfuse.py): support logging `/audio/speech` input to langfuse

* test(test_embedding.py): fix test

* test(test_completion_cost.py): fix helper util

* fix-#5920: set header value to string to fix "'int' object has no attribute 'encode'"

---------

Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>

* Revert "fix-#5920: set header value to string to fix "'int' object has no att…" (#5926)

This reverts commit a554ae2695.

* build(model_prices_and_context_window.json): add azure ai cohere rerank model pricing

Enables cost tracking for azure ai cohere rerank models

* fix(litellm_logging.py): fix debug log to be clearer

Closes https://github.com/BerriAI/litellm/issues/5909

* test(test_utils.py): fix test name

* fix(azure_ai/cost_calculator.py): support cost tracking for azure ai rerank models

* fix(azure_ai): fix azure ai base model cost tracking for rerank endpoints

* fix(converse_handler.py): support new llama 3-2 models

Fixes https://github.com/BerriAI/litellm/issues/5901

* fix(litellm_logging.py): ensure response is redacted for standard message logging

Fixes https://github.com/BerriAI/litellm/issues/5890#issuecomment-2378242360

* fix(cost_calculator.py): use 'get_model_info' for cohere rerank cost calculation

allows user to set custom cost for model

* fix(config.yml): fix docker hub auth

* build(config.yml): add docker auth to all tests

* fix(db/create_views.py): fix linting error

* fix(main.py): fix circular import

* fix(azure_ai/__init__.py): fix circular import

* fix(main.py): fix import

* fix: fix linting errors

* test: fix test

* fix(proxy_server.py): pass premium user value on startup

used for prometheus init

---------

Co-authored-by: Cole Murray <colemurray.cs@gmail.com>
Co-authored-by: bravomark <62681807+bravomark@users.noreply.github.com>

* handle streaming for azure ai studio error

* [Perf Proxy] parallel request limiter - use one cache update call (#5932)

* fix parallel request limiter - use one cache update call

* ci/cd run again

* run ci/cd again

* use docker username password

* fix config.yml

* fix config

* fix config

* fix config.yml

* ci/cd run again

* use correct typing for batch set cache

* fix async_set_cache_pipeline

* fix only check user id tpm / rpm limits when limits set

* fix test_openai_azure_embedding_with_oidc_and_cf

* test: fix test

* test(test_rerank.py): fix test

---------

Co-authored-by: Cole Murray <colemurray.cs@gmail.com>
Co-authored-by: bravomark <62681807+bravomark@users.noreply.github.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
2024-09-27 17:54:13 -07:00
Krish Dholakia
a1d9e96b31
LiteLLM Minor Fixes & Improvements (09/25/2024) (#5893)
* fix(langfuse.py): support new langfuse prompt_chat class init params

* fix(langfuse.py): handle new init values on prompt chat + prompt text templates

fixes error caused during langfuse logging

* docs(openai_compatible.md): clarify `openai/` handles correct routing for `/v1/completions` route

Fixes https://github.com/BerriAI/litellm/issues/5876

* fix(utils.py): handle unmapped gemini model optional param translation

Fixes https://github.com/BerriAI/litellm/issues/5888

* fix(o1_transformation.py): fix o-1 validation, to not raise error if temperature=1

Fixes https://github.com/BerriAI/litellm/issues/5884

* fix(prisma_client.py): refresh iam token

Fixes https://github.com/BerriAI/litellm/issues/5896

* fix: pass drop params where required

* fix(utils.py): pass drop_params correctly

* fix(types/vertex_ai.py): fix generation config

* test(test_max_completion_tokens.py): fix test

* fix(vertex_and_google_ai_studio_gemini.py): fix map openai params
2024-09-26 16:41:44 -07:00
Krish Dholakia
16c0307eab
LiteLLM Minor Fixes & Improvements (09/24/2024) (#5880)
* LiteLLM Minor Fixes & Improvements (09/23/2024)  (#5842)

* feat(auth_utils.py): enable admin to allow client-side credentials to be passed

Makes it easier for devs to experiment with finetuned fireworks ai models

* feat(router.py): allow setting configurable_clientside_auth_params for a model

Closes https://github.com/BerriAI/litellm/issues/5843

* build(model_prices_and_context_window.json): fix anthropic claude-3-5-sonnet max output token limit

Fixes https://github.com/BerriAI/litellm/issues/5850

* fix(azure_ai/): support content list for azure ai

Fixes https://github.com/BerriAI/litellm/issues/4237

* fix(litellm_logging.py): always set saved_cache_cost

Set to 0 by default

* fix(fireworks_ai/cost_calculator.py): add fireworks ai default pricing

handles calling 405b+ size models

* fix(slack_alerting.py): fix error alerting for failed spend tracking

Fixes regression with slack alerting error monitoring

* fix(vertex_and_google_ai_studio_gemini.py): handle gemini no candidates in streaming chunk error

* docs(bedrock.md): add llama3-1 models

* test: fix tests

* fix(azure_ai/chat): fix transformation for azure ai calls

* feat(azure_ai/embed): Add azure ai embeddings support

Closes https://github.com/BerriAI/litellm/issues/5861

* fix(azure_ai/embed): enable async embedding

* feat(azure_ai/embed): support azure ai multimodal embeddings

* fix(azure_ai/embed): support async multi modal embeddings

* feat(together_ai/embed): support together ai embedding calls

* feat(rerank/main.py): log source documents for rerank endpoints to langfuse

improves rerank endpoint logging

* fix(langfuse.py): support logging `/audio/speech` input to langfuse

* test(test_embedding.py): fix test

* test(test_completion_cost.py): fix helper util
2024-09-25 22:11:57 -07:00
Krish Dholakia
8039b95aaf
LiteLLM Minor Fixes & Improvements (09/21/2024) (#5819)
* fix(router.py): fix error message

* Litellm disable keys (#5814)

* build(schema.prisma): allow blocking/unblocking keys

Fixes https://github.com/BerriAI/litellm/issues/5328

* fix(key_management_endpoints.py): fix pop

* feat(auth_checks.py): allow admin to enable/disable virtual keys

Closes https://github.com/BerriAI/litellm/issues/5328

* docs(vertex.md): add auth section for vertex ai

Addresses - https://github.com/BerriAI/litellm/issues/5768#issuecomment-2365284223

* build(model_prices_and_context_window.json): show which models support prompt_caching

Closes https://github.com/BerriAI/litellm/issues/5776

* fix(router.py): allow setting default priority for requests

* fix(router.py): add 'retry-after' header for concurrent request limit errors

Fixes https://github.com/BerriAI/litellm/issues/5783

* fix(router.py): correctly raise and use retry-after header from azure+openai

Fixes https://github.com/BerriAI/litellm/issues/5783

* fix(user_api_key_auth.py): fix valid token being none

* fix(auth_checks.py): fix model dump for cache management object

* fix(user_api_key_auth.py): pass prisma_client to obj

* test(test_otel.py): update test for new key check

* test: fix test
2024-09-21 18:51:53 -07:00
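
The key blocking added in "Litellm disable keys" above is exposed through the proxy's management API. A hedged sketch; the endpoint paths and payload shape are inferred from the commit bullets, and the URL and keys are placeholders:

```python
import requests

PROXY = "http://localhost:4000"
HEADERS = {"Authorization": "Bearer sk-admin-key"}  # proxy admin key

# Block a virtual key, then unblock it.
requests.post(f"{PROXY}/key/block", headers=HEADERS, json={"key": "sk-user-key"})
requests.post(f"{PROXY}/key/unblock", headers=HEADERS, json={"key": "sk-user-key"})
```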
Ishaan Jaff
b987d99486 handle hf rate limit error 2024-09-21 18:00:18 -07:00
Ishaan Jaff
1d630b61ad
[Feat] Add fireworks AI embedding (#5812)
* add fireworks embedding models

* add fireworks ai

* fireworks ai embeddings support

* is_fireworks_embedding_model

* working fireworks embeddings

* fix health check * models

* fix embedding get optional params

* fix linting errors

* fix pick_cheapest_chat_model_from_llm_provider

* add fireworks ai litellm provider

* docs fireworks embedding models

* fixes for when azure ad token is passed
2024-09-20 22:23:28 -07:00
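
A sketch of the Fireworks AI embeddings support above; assumes `FIREWORKS_AI_API_KEY` is set and the model name matches one available to your account:

```python
import litellm

response = litellm.embedding(
    model="fireworks_ai/nomic-ai/nomic-embed-text-v1.5",
    input=["hello from litellm"],
)
print(len(response.data[0]["embedding"]))
```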
Krish Dholakia
3933fba41f
LiteLLM Minor Fixes & Improvements (09/19/2024) (#5793)
* fix(model_prices_and_context_window.json): add cost tracking for more vertex llama3.1 model

8b and 70b models

* fix(proxy/utils.py): handle data being none on pre-call hooks

* fix(proxy/): create views on initial proxy startup

fixes base case, where user starts proxy for first time

 Fixes https://github.com/BerriAI/litellm/issues/5756

* build(config.yml): fix vertex version for test

* feat(ui/): support enabling/disabling slack alerting

Allows admin to turn on/off slack alerting through ui

* feat(rerank/main.py): support langfuse logging

* fix(proxy/utils.py): fix linting errors

* fix(langfuse.py): log clean metadata

* test(tests): replace deprecated openai model
2024-09-20 08:19:52 -07:00
Krish Dholakia
d46660ea0f
LiteLLM Minor Fixes & Improvements (09/18/2024) (#5772)
* fix(proxy_server.py): fix azure key vault logic to not require client id/secret

* feat(cost_calculator.py): support fireworks ai cost tracking

* build(docker-compose.yml): add lines for mounting config.yaml to docker compose

Closes https://github.com/BerriAI/litellm/issues/5739

* fix(input.md): update docs to clarify litellm supports content as a list of dictionaries

Fixes https://github.com/BerriAI/litellm/issues/5755

* fix(input.md): update input.md to include all message values

* fix(image_handling.py): follow image url redirects

Fixes https://github.com/BerriAI/litellm/issues/5763

* fix(router.py): Fix model key/base leak in error message

Fixes https://github.com/BerriAI/litellm/issues/5762

* fix(http_handler.py): fix linting error

* fix(azure.py): fix logging to show azure_ad_token being used

Fixes https://github.com/BerriAI/litellm/issues/5767

* fix(_redis.py): add redis sentinel support

Closes https://github.com/BerriAI/litellm/issues/4381

* feat(_redis.py): add redis sentinel support

Closes https://github.com/BerriAI/litellm/issues/4381

* test(test_completion_cost.py): fix test

* Databricks Integration: Integrate Databricks SDK as optional mechanism for fetching API base and token, if unspecified (#5746)

* LiteLLM Minor Fixes & Improvements (09/16/2024)  (#5723)

* coverage (#5713)

Signed-off-by: dbczumar <corey.zumar@databricks.com>

* Move (#5714)

Signed-off-by: dbczumar <corey.zumar@databricks.com>

* fix(litellm_logging.py): fix logging client re-init (#5710)

Fixes https://github.com/BerriAI/litellm/issues/5695

* fix(presidio.py): Fix logging_hook response and add support for additional presidio variables in guardrails config

Fixes https://github.com/BerriAI/litellm/issues/5682

* feat(o1_handler.py): fake streaming for openai o1 models

Fixes https://github.com/BerriAI/litellm/issues/5694

* docs: deprecated traceloop integration in favor of native otel (#5249)

* fix: fix linting errors

* fix: fix linting errors

* fix(main.py): fix o1 import

---------

Signed-off-by: dbczumar <corey.zumar@databricks.com>
Co-authored-by: Corey Zumar <39497902+dbczumar@users.noreply.github.com>
Co-authored-by: Nir Gazit <nirga@users.noreply.github.com>

* feat(spend_management_endpoints.py): expose `/global/spend/refresh` endpoint for updating material view (#5730)

* feat(spend_management_endpoints.py): expose `/global/spend/refresh` endpoint for updating material view

Supports having `MonthlyGlobalSpend` view be a material view, and exposes an endpoint to refresh it

* fix(custom_logger.py): reset calltype

* fix: fix linting errors

* fix: fix linting error

* fix

Signed-off-by: dbczumar <corey.zumar@databricks.com>

* fix: fix import

* Fix

Signed-off-by: dbczumar <corey.zumar@databricks.com>

* fix

Signed-off-by: dbczumar <corey.zumar@databricks.com>

* DB test

Signed-off-by: dbczumar <corey.zumar@databricks.com>

* Coverage

Signed-off-by: dbczumar <corey.zumar@databricks.com>

* progress

Signed-off-by: dbczumar <corey.zumar@databricks.com>

* fix

Signed-off-by: dbczumar <corey.zumar@databricks.com>

* fix

Signed-off-by: dbczumar <corey.zumar@databricks.com>

* fix

Signed-off-by: dbczumar <corey.zumar@databricks.com>

* fix test name

Signed-off-by: dbczumar <corey.zumar@databricks.com>

---------

Signed-off-by: dbczumar <corey.zumar@databricks.com>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: Nir Gazit <nirga@users.noreply.github.com>

* test: fix test

* test(test_databricks.py): fix test

* fix(databricks/chat.py): handle custom endpoint (e.g. sagemaker)

* Apply code scanning fix for clear-text logging of sensitive information

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* fix(__init__.py): fix known fireworks ai models

---------

Signed-off-by: dbczumar <corey.zumar@databricks.com>
Co-authored-by: Corey Zumar <39497902+dbczumar@users.noreply.github.com>
Co-authored-by: Nir Gazit <nirga@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2024-09-19 13:25:29 -07:00
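
What the Redis Sentinel support in the entry above buys you, illustrated with redis-py's own `Sentinel` API (LiteLLM wires this up from its cache config; the hostnames and the `mymaster` service name are placeholders):

```python
from redis.sentinel import Sentinel

sentinel = Sentinel([("sentinel-1", 26379), ("sentinel-2", 26379)])
master = sentinel.master_for("mymaster")   # writes go to the elected master
replica = sentinel.slave_for("mymaster")   # reads may be served by replicas

master.set("litellm:healthcheck", "ok")
print(replica.get("litellm:healthcheck"))
```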
Krish Dholakia
dd602753c0
Litellm fix router testing (#5748)
* test: fix testing - azure changed content policy error logic

* test: fix tests to use mock responses

* test(test_image_generation.py): handle api instability

* test(test_image_generation.py): handle azure api instability

* fix(utils.py): fix unbounded variable error

* fix(utils.py): fix unbounded variable error

* test: refactor test to use mock response

* test: mark flaky azure tests
2024-09-17 18:02:23 -07:00
Krish Dholakia
234185ec13
LiteLLM Minor Fixes & Improvements (09/16/2024) (#5723) (#5731)
* LiteLLM Minor Fixes & Improvements (09/16/2024)  (#5723)

* coverage (#5713)

Signed-off-by: dbczumar <corey.zumar@databricks.com>

* Move (#5714)

Signed-off-by: dbczumar <corey.zumar@databricks.com>

* fix(litellm_logging.py): fix logging client re-init (#5710)

Fixes https://github.com/BerriAI/litellm/issues/5695

* fix(presidio.py): Fix logging_hook response and add support for additional presidio variables in guardrails config

Fixes https://github.com/BerriAI/litellm/issues/5682

* feat(o1_handler.py): fake streaming for openai o1 models

Fixes https://github.com/BerriAI/litellm/issues/5694

* docs: deprecated traceloop integration in favor of native otel (#5249)

* fix: fix linting errors

* fix: fix linting errors

* fix(main.py): fix o1 import

---------

Signed-off-by: dbczumar <corey.zumar@databricks.com>
Co-authored-by: Corey Zumar <39497902+dbczumar@users.noreply.github.com>
Co-authored-by: Nir Gazit <nirga@users.noreply.github.com>

* feat(spend_management_endpoints.py): expose `/global/spend/refresh` endpoint for updating material view (#5730)

* feat(spend_management_endpoints.py): expose `/global/spend/refresh` endpoint for updating material view

Supports having `MonthlyGlobalSpend` view be a material view, and exposes an endpoint to refresh it

* fix(custom_logger.py): reset calltype

* fix: fix linting errors

* fix: fix linting error

* fix: fix import

* test(test_databricks.py): fix databricks tests

---------

Signed-off-by: dbczumar <corey.zumar@databricks.com>
Co-authored-by: Corey Zumar <39497902+dbczumar@users.noreply.github.com>
Co-authored-by: Nir Gazit <nirga@users.noreply.github.com>
2024-09-17 08:05:52 -07:00
Krish Dholakia
da77706c26
Litellm stable dev (#5711)
* feat(aws_base_llm.py): prevents recreating boto3 credentials during high traffic

Leads to 100ms perf boost in local testing

* fix(base_aws_llm.py): fix credential caching check to see if token is set

* refactor(bedrock/chat): separate converse api and invoke api + isolate converse api transformation logic

Make it easier to see how requests are transformed for /converse

* fix: fix imports

* fix(bedrock/embed): fix reordering of headers

* fix(base_aws_llm.py): fix get credential logic

* fix(converse_handler.py): fix ai21 streaming response
2024-09-14 23:22:59 -07:00
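
The boto3 credential caching in the entry above (the ~100ms win) follows a generic memoize-with-TTL pattern. A sketch under stated assumptions; the TTL and cache key are illustrative, not LiteLLM's exact implementation:

```python
import time
import boto3

_CRED_CACHE = {}  # (region, profile) -> (credentials, expiry_ts)

def get_cached_credentials(region, profile=None, ttl=3600):
    key = (region, profile)
    cached = _CRED_CACHE.get(key)
    if cached and cached[1] > time.time():
        return cached[0]  # reuse instead of re-resolving on every request
    session = boto3.Session(region_name=region, profile_name=profile)
    creds = session.get_credentials()
    _CRED_CACHE[key] = (creds, time.time() + ttl)
    return creds
```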
Ishaan Jaff
c8d15544c8
[Fix] Router cooldown logic - use % thresholds instead of allowed fails to cooldown deployments (#5698)
* move cooldown logic to its own helper

* add new track deployment metrics folder

* increment success, fails for deployment in current minute

* fix cooldown logic

* fix test_aaarouter_dynamic_cooldown_message_retry_time

* fix test_single_deployment_no_cooldowns_test_prod_mock_completion_calls

* clean up get from deployment test

* fix _async_get_healthy_deployments

* add mock InternalServerError

* test deployment failing 25% requests

* add test_high_traffic_cooldowns_one_bad_deployment

* fix vertex load test

* add test for rate limit error models in cool down

* change default cooldown time

* fix cooldown message time

* fix cooldown on 429 error

* fix doc string for _should_cooldown_deployment

* fix sync cooldown logic router
2024-09-14 18:01:19 -07:00
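
The percent-threshold cooldown rule described above, reduced to its arithmetic: a deployment cools down when its failure rate in the current minute crosses a threshold, not after a fixed count of allowed fails. Numbers are illustrative:

```python
def should_cooldown(fails: int, successes: int, threshold: float = 0.25) -> bool:
    total = fails + successes
    if total == 0:
        return False
    return (fails / total) >= threshold

print(should_cooldown(fails=1, successes=99))   # False: 1% failure rate
print(should_cooldown(fails=25, successes=75))  # True: 25% failure rate
```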
Ishaan Jaff
85acdb9193
[Feat] Add max_completion_tokens param (#5691)
* add max_completion_tokens

* add max_completion_tokens

* add max_completion_tokens support for OpenAI models

* add max_completion_tokens param

* add max_completion_tokens for bedrock converse models

* add test for converse maxTokens

* fix openai o1 param mapping test

* move test optional params

* add max_completion_tokens for anthropic api

* fix conftest

* add max_completion tokens for vertex ai partner models

* add max_completion_tokens for fireworks ai

* add max_completion_tokens for hf rest api

* add test for param mapping

* add param mapping for vertex, gemini + testing

* predibase is the most unstable and unusable llm api in prod, can't handle our ci/cd

* add max_completion_tokens to openai supported params

* fix fireworks ai param mapping
2024-09-14 14:57:01 -07:00
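
The `max_completion_tokens` parameter added above is accepted at the top level and mapped per provider (e.g. to `maxTokens` for Bedrock Converse, per the bullets). A sketch assuming `OPENAI_API_KEY` is set:

```python
import litellm

response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a haiku about proxies."}],
    max_completion_tokens=64,  # newer OpenAI name; mapped for other providers
)
print(response.choices[0].message.content)
```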
Krish Dholakia
60709a0753
LiteLLM Minor Fixes and Improvements (09/13/2024) (#5689)
* refactor: cleanup unused variables + fix pyright errors

* feat(health_check.py): Closes https://github.com/BerriAI/litellm/issues/5686

* fix(o1_reasoning.py): add stricter check for o-1 reasoning model

* refactor(mistral/): make it easier to see mistral transformation logic

* fix(openai.py): fix openai o-1 model param mapping

Fixes https://github.com/BerriAI/litellm/issues/5685

* feat(main.py): infer finetuned gemini model from base model

Fixes https://github.com/BerriAI/litellm/issues/5678

* docs(vertex.md): update docs to call finetuned gemini models

* feat(proxy_server.py): allow admin to hide proxy model aliases

Closes https://github.com/BerriAI/litellm/issues/5692

* docs(load_balancing.md): add docs on hiding alias models from proxy config

* fix(base.py): don't raise notimplemented error

* fix(user_api_key_auth.py): fix model max budget check

* fix(router.py): fix elif

* fix(user_api_key_auth.py): don't set team_id to empty str

* fix(team_endpoints.py): fix response type

* test(test_completion.py): handle predibase error

* test(test_proxy_server.py): fix test

* fix(o1_transformation.py): fix max_completion_token mapping

* test(test_image_generation.py): mark flaky test
2024-09-14 10:02:55 -07:00
Ishaan Jaff
64a828b455
Merge pull request #5573 from BerriAI/litellm_add_rerank_spend_tracking
[Feat] Add cost tracking for cohere rerank
2024-09-06 18:11:30 -07:00
Krish Dholakia
72e961af3c
LiteLLM Minor Fixes and Improvements (09/06/2024) (#5567)
* fix(utils.py): return citations for perplexity streaming

Fixes https://github.com/BerriAI/litellm/issues/5535

* fix(anthropic/chat.py): support fallbacks for anthropic streaming (#5542)

* fix(anthropic/chat.py): support fallbacks for anthropic streaming

Fixes https://github.com/BerriAI/litellm/issues/5512

* fix(anthropic/chat.py): use module level http client if none given (prevents early client closure)

* fix: fix linting errors

* fix(http_handler.py): fix raise_for_status error handling

* test: retry flaky test

* fix otel type

* fix(bedrock/embed): fix error raising

* test(test_openai_batches_and_files.py): skip azure batches test (for now) quota exceeded

* fix(test_router.py): skip azure batch route test (for now) - hit batch quota limits

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>

* All `model_group_alias` should show up in `/models`, `/model/info` , `/model_group/info` (#5539)

* fix(router.py): support returning model_alias model names in `/v1/models`

* fix(proxy_server.py): support returning model alias'es on `/model/info`

* feat(router.py): support returning model group alias for `/model_group/info`

* fix(proxy_server.py): fix linting errors

* fix(proxy_server.py): fix linting errors

* build(model_prices_and_context_window.json): add amazon titan text premier pricing information

Closes https://github.com/BerriAI/litellm/issues/5560

* feat(litellm_logging.py): log standard logging response object for pass through endpoints. Allows bedrock /invoke agent calls to be correctly logged to langfuse + s3

* fix(success_handler.py): fix linting error

* fix(success_handler.py): fix linting errors

* fix(team_endpoints.py): Allows admin to update team member budgets

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
2024-09-06 17:16:24 -07:00
Ishaan Jaff
1852e1cd9a basic cohere rerank logging 2024-09-06 15:18:16 -07:00
Ishaan Jaff
795db8672a fix log /audio to langfuse 2024-09-05 16:26:37 -07:00
Ishaan Jaff
25887c1846 fix import error 2024-09-05 10:09:44 -07:00
Ishaan Jaff
81ee1653af use correct type hints for audio transcriptions 2024-09-05 09:12:27 -07:00
Krish Dholakia
1e7e538261
LiteLLM Minor fixes + improvements (09/04/2024) (#5505)
* Minor IAM AWS OIDC Improvements (#5246)

* AWS IAM: Temporary tokens are valid across all regions after being issued, so it is wasteful to request one for each region.

* AWS IAM: Include an inline policy, to help reduce misuse of overly permissive IAM roles.

* (test_bedrock_completion.py): Ensure we are testing cross AWS region OIDC flow.

* fix(router.py): log rejected requests

Fixes https://github.com/BerriAI/litellm/issues/5498

* refactor: don't use verbose_logger.exception, if exception is raised

User might already have handling for this. But alerting systems in prod will raise this as an unhandled error.

* fix(datadog.py): support setting datadog source as an env var

Fixes https://github.com/BerriAI/litellm/issues/5508

* docs(logging.md): add dd_source to datadog docs

* fix(proxy_server.py): expose `/customer/list` endpoint for showing all customers

* (bedrock): Fix usage with Cloudflare AI Gateway, and proxies in general. (#5509)

* feat(anthropic.py): support 'cache_control' param for content when it is a string

* Revert "(bedrock): Fix usage with Cloudflare AI Gateway, and proxies in gener…" (#5519)

This reverts commit 3fac0349c2.

* refactor: ci/cd run again

---------

Co-authored-by: David Manouchehri <david.manouchehri@ai.moda>
2024-09-04 22:16:55 -07:00
Ishaan Jaff
9ed4dad5ac test get llm provider 2024-09-04 09:22:19 -07:00
Krish Dholakia
be3c7b401e
LiteLLM Minor fixes + improvements (09/03/2024) (#5488)
* fix(internal_user_endpoints.py): set budget_reset_at for /user/update

* fix(vertex_and_google_ai_studio_gemini.py): handle accumulated json

Fixes https://github.com/BerriAI/litellm/issues/5479

* fix(vertex_ai_and_gemini.py): fix assistant message function call when content is not None

Fixes https://github.com/BerriAI/litellm/issues/5490

* fix(proxy_server.py): generic state uuid for okta sso

* fix(lago.py): improve debug logs

Debugging for https://github.com/BerriAI/litellm/issues/5477

* docs(bedrock.md): add bedrock cross-region inferencing to docs

* fix(azure.py): return azure response headers on aembedding call

* feat(azure.py): return azure response headers for `/audio/transcription`

* fix(types/utils.py): standardize deepseek / anthropic prompt caching usage information

Closes https://github.com/BerriAI/litellm/issues/5285

* docs(usage.md): add docs on litellm usage object

* test(test_completion.py): mark flaky test
2024-09-03 21:21:34 -07:00
Ishaan Jaff
b0178a85cf refactor get_secret 2024-09-03 10:42:12 -07:00
Krish Dholakia
9f3fa29624
feat(router.py): Support Loadbalancing batch azure api endpoints (#5469)
* feat(router.py): initial commit for loadbalancing azure batch api endpoints

Closes https://github.com/BerriAI/litellm/issues/5396

* fix(router.py): working `router.acreate_file()`

* feat(router.py): working router.acreate_batch endpoint

* feat(router.py): expose router.aretrieve_batch function

Make it easy for user to retrieve the batch information

* feat(router.py): support 'router.alist_batches' endpoint

Adds support for getting all batches across all endpoints

* feat(router.py): working loadbalancing on `/v1/files`

* feat(proxy_server.py): working loadbalancing on `/v1/batches`

* feat(proxy_server.py): working loadbalancing on Retrieve + List batch
2024-09-02 21:32:55 -07:00
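
A hedged sketch of the load-balanced batch methods named above (`acreate_file`, `acreate_batch`, `aretrieve_batch`); the model-list entries are placeholders and the parameter names are inferred from the commit bullets plus the OpenAI batches API, so treat the exact signatures as assumptions:

```python
import asyncio
from litellm import Router

router = Router(model_list=[{
    "model_name": "azure-batch",
    "litellm_params": {"model": "azure/gpt-4o", "api_key": "...", "api_base": "..."},
}])

async def main():
    file_obj = await router.acreate_file(
        model="azure-batch", file=open("batch_input.jsonl", "rb"), purpose="batch"
    )
    batch = await router.acreate_batch(
        model="azure-batch",
        input_file_id=file_obj.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )
    print(await router.aretrieve_batch(model="azure-batch", batch_id=batch.id))

asyncio.run(main())
```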
Ishaan Jaff
c1adb0b7f2
Merge branch 'main' into litellm_track_imagen_spend_logs 2024-09-02 21:21:15 -07:00
Ishaan Jaff
9fcab392a4 fix get llm provider for imagen 2024-09-02 17:49:36 -07:00
Ishaan Jaff
666dd6aca2
Merge pull request #5478 from BerriAI/litellm_Add_ai21
[Feat] Add AI21 /chat API
2024-09-02 16:20:37 -07:00
Krish Dholakia
f9e6507cd1
LiteLLM Minor Fixes + Improvements (#5474)
* feat(proxy/_types.py): add lago billing to callbacks ui

Closes https://github.com/BerriAI/litellm/issues/5472

* fix(anthropic.py): return anthropic prompt caching information

Fixes https://github.com/BerriAI/litellm/issues/5364

* feat(bedrock/chat.py): support 'json_schema' for bedrock models

Closes https://github.com/BerriAI/litellm/issues/5434

* fix(bedrock/embed/embeddings.py): support async embeddings for amazon titan models

* fix: linting fixes

* fix: handle key errors

* fix(bedrock/chat.py): fix bedrock ai21 streaming object

* feat(bedrock/embed): support bedrock embedding optional params

* fix(databricks.py): fix usage chunk

* fix(internal_user_endpoints.py): apply internal user defaults, if user role updated

Fixes issue where user update wouldn't apply defaults

* feat(slack_alerting.py): provide multiple slack channels for a given alert type

multiple channels might be interested in receiving an alert for a given type

* docs(alerting.md): add multiple channel alerting to docs
2024-09-02 14:29:57 -07:00
Ishaan Jaff
f28e94827c docs update ai21 doc 2024-09-02 12:26:53 -07:00