* fix move s3 to use customLogger
* add basic s3 logging test
* add s3 to custom logger compatible
* use batch logger for s3
* s3 set flush interval and batch size
* fix s3 logging
* add notes on s3 logging
* fix s3 type errors
* add test for sync logging on s3
* feat(azure/realtime): initial working commit for proxy azure openai realtime endpoint support
Adds support for passing /v1/realtime calls via litellm proxy
* feat(realtime_api/main.py): abstraction for handling openai realtime api calls
* feat(router.py): add `arealtime()` endpoint in router for realtime api calls
Allows using `model_list` in proxy for realtime as well
* fix: make realtime api a private function
Structure might change based on feedback. Make that clear to users.
* build(requirements.txt): add websockets to the requirements.txt
* feat(openai/realtime): add openai /v1/realtime api support
* nvidia nim support embedding config
* add nvidia config in init
* nvidia nim embeddings
* docs nvidia nim embeddings
* docs embeddings on nvidia nim
* fix llm translation test
* use vertex llm as base class for embeddings
* use correct vertex class in main.py
* set_headers in vertex llm base
* add types for vertex embedding requests
* add embedding handler for vertex
* use async mode for vertex embedding tests
* use vertexAI textEmbeddingConfig
* fix linting
* add sync and async mode testing for vertex ai embeddings
* LiteLLM Minor Fixes & Improvements (09/23/2024) (#5842)
* feat(auth_utils.py): enable admin to allow client-side credentials to be passed
Makes it easier for devs to experiment with finetuned fireworks ai models
* feat(router.py): allow setting configurable_clientside_auth_params for a model
Closes https://github.com/BerriAI/litellm/issues/5843
* build(model_prices_and_context_window.json): fix anthropic claude-3-5-sonnet max output token limit
Fixes https://github.com/BerriAI/litellm/issues/5850
* fix(azure_ai/): support content list for azure ai
Fixes https://github.com/BerriAI/litellm/issues/4237
* fix(litellm_logging.py): always set saved_cache_cost
Set to 0 by default
* fix(fireworks_ai/cost_calculator.py): add fireworks ai default pricing
handles calling 405b+ size models
* fix(slack_alerting.py): fix error alerting for failed spend tracking
Fixes regression with slack alerting error monitoring
* fix(vertex_and_google_ai_studio_gemini.py): handle gemini no candidates in streaming chunk error
* docs(bedrock.md): add llama3-1 models
* test: fix tests
* fix(azure_ai/chat): fix transformation for azure ai calls
* feat(azure_ai/embed): Add azure ai embeddings support
Closes https://github.com/BerriAI/litellm/issues/5861
* fix(azure_ai/embed): enable async embedding
* feat(azure_ai/embed): support azure ai multimodal embeddings
* fix(azure_ai/embed): support async multi modal embeddings
* feat(together_ai/embed): support together ai embedding calls
* feat(rerank/main.py): log source documents for rerank endpoints to langfuse
improves rerank endpoint logging
* fix(langfuse.py): support logging `/audio/speech` input to langfuse
* test(test_embedding.py): fix test
* test(test_completion_cost.py): fix helper util
* use /user/list endpoint on admin ui
* sso insert user with role when user does not exist
* add sso sign in test
* linting fix
* rename self serve doc
* add doc for self serve flow
* test - sso sign in default values
* add test for /user/list endpoint
* fix(vertex_llm_base.py): Handle api_base = ""
Fixes https://github.com/BerriAI/litellm/issues/5798
* fix(o1_transformation.py): handle stream_options not being supported
https://github.com/BerriAI/litellm/issues/5803
* docs(routing.md): fix docs
Closes https://github.com/BerriAI/litellm/issues/5808
* perf(internal_user_endpoints.py): reduce db calls for getting team_alias for a key
Reuses the list fetched earlier in the `/user/info` endpoint
Reduces ui keys tab load time to 800ms (prev. 28s+)
* feat(proxy_server.py): support CONFIG_FILE_PATH as env var
Closes https://github.com/BerriAI/litellm/issues/5744
* feat(get_llm_provider_logic.py): add `litellm_proxy/` as a known openai-compatible route
simplifies calling litellm proxy
Reduces confusion when calling models on litellm proxy from litellm sdk
* docs(litellm_proxy.md): cleanup docs
* fix(internal_user_endpoints.py): fix pydantic obj
* test(test_key_generate_prisma.py): fix test
* feat(aws_base_llm.py): prevents recreating boto3 credentials during high traffic
Leads to 100ms perf boost in local testing
* fix(base_aws_llm.py): fix credential caching check to see if token is set
* refactor(bedrock/chat): separate converse api and invoke api + isolate converse api transformation logic
Make it easier to see how requests are transformed for /converse
* fix: fix imports
* fix(bedrock/embed): fix reordering of headers
* fix(base_aws_llm.py): fix get credential logic
* fix(converse_handler.py): fix ai21 streaming response
* fix(main.py): pass default azure api version as alternative in completion call
Fixes api error caused by the api version
Closes https://github.com/BerriAI/litellm/issues/5584
* Fixed gemini-1.5-flash pricing (#5590)
* add /key/list endpoint
* bump: version 1.44.21 → 1.44.22
* docs architecture
* Fixed gemini-1.5-flash pricing
---------
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
* fix(bedrock/chat.py): fix converse api stop sequence param mapping
Fixes https://github.com/BerriAI/litellm/issues/5592
* fix(databricks/cost_calculator.py): handle databricks model name changes
Fixes https://github.com/BerriAI/litellm/issues/5597
* fix(azure.py): support azure api version 2024-08-01-preview
Closes https://github.com/BerriAI/litellm/issues/5377
* fix(proxy/_types.py): allow dev keys to call cohere /rerank endpoint
Fixes issue where only admin could call rerank endpoint
* fix(azure.py): check if model is gpt-4o
* fix(proxy/_types.py): support /v1/rerank on non-admin routes as well
* fix(cost_calculator.py): fix split on `/` logic in cost calculator
---------
Co-authored-by: F1bos <44951186+F1bos@users.noreply.github.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
* fix(vertex_ai): Fixes issue where multimodal message without text was failing vertex calls
Fixes https://github.com/BerriAI/litellm/issues/5515
* fix(azure.py): move to using httphandler for oidc token calls
Fixes issue where ssl certificates weren't being picked up as expected
Closes https://github.com/BerriAI/litellm/issues/5522
* feat: allow admin to set a default_max_internal_user_budget in config, and allow setting more specific values as env vars
* fix(proxy_server.py): fix read for max_internal_user_budget
* build(model_prices_and_context_window.json): add regional gpt-4o-2024-08-06 pricing
Closes https://github.com/BerriAI/litellm/issues/5540
* test: skip re-test
* feat(router.py): initial commit for loadbalancing azure batch api endpoints
Closes https://github.com/BerriAI/litellm/issues/5396
* fix(router.py): working `router.acreate_file()`
* feat(router.py): working router.acreate_batch endpoint
* feat(router.py): expose router.aretrieve_batch function
Makes it easy for users to retrieve batch information
* feat(router.py): support 'router.alist_batches' endpoint
Adds support for getting all batches across all endpoints
* feat(router.py): working loadbalancing on `/v1/files`
* feat(proxy_server.py): working loadbalancing on `/v1/batches`
* feat(proxy_server.py): working loadbalancing on Retrieve + List batch
* feat(proxy/_types.py): add lago billing to callbacks ui
Closes https://github.com/BerriAI/litellm/issues/5472
* fix(anthropic.py): return anthropic prompt caching information
Fixes https://github.com/BerriAI/litellm/issues/5364
* feat(bedrock/chat.py): support 'json_schema' for bedrock models
Closes https://github.com/BerriAI/litellm/issues/5434
* fix(bedrock/embed/embeddings.py): support async embeddings for amazon titan models
* fix: linting fixes
* fix: handle key errors
* fix(bedrock/chat.py): fix bedrock ai21 streaming object
* feat(bedrock/embed): support bedrock embedding optional params
* fix(databricks.py): fix usage chunk
* fix(internal_user_endpoints.py): apply internal user defaults, if user role updated
Fixes issue where user update wouldn't apply defaults
* feat(slack_alerting.py): provide multiple slack channels for a given alert type
multiple channels might be interested in receiving an alert for a given type
* docs(alerting.md): add multiple channel alerting to docs
* Azure Service Principal with Secret authentication workflow. (#5131)
* Implement Azure Service Principal with Secret authentication workflow.
* Use `ClientSecretCredential` instead of `DefaultAzureCredential`.
* Move imports into the function.
* Add type hint for `azure_ad_token_provider`.
* Add unit test for router initialization and sample completion using Azure Service Principal with Secret authentication workflow.
* Add unit test for router initialization with neither API key nor using Azure Service Principal with Secret authentication workflow.
* fix(client_initialization_utils.py): fix typing + overrides
* test: fix linting errors
* fix(client_initialization_utils.py): fix client init azure ad token logic
* fix(router_client_initialization.py): add flag check for reading azure ad token from environment
* test(test_streaming.py): skip end of life bedrock model
* test(test_router_client_init.py): add correct flag to test
---------
Co-authored-by: kzych-inpost <142029278+kzych-inpost@users.noreply.github.com>
* refactor(bedrock): initial commit to refactor bedrock to a folder
Improve code readability + maintainability
* refactor: more refactor work
* fix: fix imports
* feat(bedrock/embeddings.py): support translating embedding into amazon embedding formats
* fix: fix linting errors
* test: skip test on end of life model
* fix(cohere/embed.py): fix linting error
* fix(cohere/embed.py): fix typing
* fix(cohere/embed.py): fix post-call logging for cohere embedding call
* test(test_embeddings.py): fix error message assertion in test
* fix(utils.py): support 'drop_params' for embedding requests
Fixes https://github.com/BerriAI/litellm/issues/5444
* feat(anthropic/cost_calculation.py): Support calculating cost for prompt caching on anthropic
* feat(types/utils.py): allows us to migrate to openai's equivalent, once that comes out
* fix: fix linting errors
* test: mark flaky test