* fix(vertex_ai/gemini/transformation.py): handle 'http://' in gemini process url
* refactor(router.py): refactor '_prompt_management_factory' to use logging obj get_chat_completion logic
deduplicates code
* fix(litellm_logging.py): update 'get_chat_completion_prompt' to update logging object messages
* docs(prompt_management.md): update prompt management to be in beta
given feedback - this still needs to be revised (e.g. passing in user message, not ignoring)
* refactor(prompt_management_base.py): introduce base class for prompt management
allows consistent behaviour across prompt management integrations
* feat(prompt_management_base.py): support adding client message to template message + refactor langfuse prompt management to use prompt management base
* fix(litellm_logging.py): log prompt id + prompt variables to langfuse if set
allows tracking what prompt was used for what purpose
* feat(litellm_logging.py): log prompt management metadata in standard logging payload + use in langfuse
allows logging prompt id / prompt variables to langfuse
* test: fix test
* fix(router.py): cleanup unused imports
* fix: fix linting error
* fix: fix trace param typing
* fix: fix linting errors
* fix: fix code qa check
* test(test_utils.py): initial test for valid models
Addresses https://github.com/BerriAI/litellm/issues/7525
* fix: test
* feat(fireworks_ai/transformation.py): support retrieving valid models from fireworks ai endpoint
* refactor(fireworks_ai/): support checking model info on `/v1/models` route
* docs(set_keys.md): update docs to clarify check llm provider api usage
* fix(watsonx/common_utils.py): support 'WATSONX_ZENAPIKEY' for iam auth
* fix(watsonx): read in watsonx token from env var
* fix: fix linting errors
* fix(utils.py): fix provider config check
* style: cleanup unused imports
* test(azure_openai_o1.py): initial commit with testing for azure openai o1 preview model
* fix(base_llm_unit_tests.py): handle azure o1 preview response format tests
skip as o1 on azure doesn't support tool calling yet
* fix: initial commit of azure o1 handler using openai caller
simplifies calling + allows fake streaming logic alr. implemented for openai to just work
* feat(azure/o1_handler.py): fake o1 streaming for azure o1 models
azure does not currently support streaming for o1
* feat(o1_transformation.py): support overriding 'should_fake_stream' on azure/o1 via 'supports_native_streaming' param on model info
enables user to toggle on when azure allows o1 streaming without needing to bump versions
* style(router.py): remove 'give feedback/get help' messaging when router is used
Prevents noisy messaging
Closes https://github.com/BerriAI/litellm/issues/5942
* fix(types/utils.py): handle none logprobs
Fixes https://github.com/BerriAI/litellm/issues/328
* fix(exception_mapping_utils.py): fix error str unbound error
* refactor(azure_ai/): move to openai_like chat completion handler
allows for easy swapping of api base url's (e.g. ai.services.com)
Fixes https://github.com/BerriAI/litellm/issues/7275
* refactor(azure_ai/): move to base llm http handler
* fix(azure_ai/): handle differing api endpoints
* fix(azure_ai/): make sure all unit tests are passing
* fix: fix linting errors
* fix: fix linting errors
* fix: fix linting error
* fix: fix linting errors
* fix(azure_ai/transformation.py): handle extra body param
* fix(azure_ai/transformation.py): fix max retries param handling
* fix: fix test
* test(test_azure_o1.py): fix test
* fix(llm_http_handler.py): support handling azure ai unprocessable entity error
* fix(llm_http_handler.py): handle sync invalid param error for azure ai
* fix(azure_ai/): streaming support with base_llm_http_handler
* fix(llm_http_handler.py): working sync stream calls with unprocessable entity handling for azure ai
* fix: fix linting errors
* fix(llm_http_handler.py): fix linting error
* fix(azure_ai/): handle cohere tool call invalid index param error
* docs(sidebar.js): docs for support model access groups for wildcard routes
* feat(key_management_endpoints.py): add check if user is premium_user when adding model access group for wildcard route
* refactor(docs/): make control model access a root-level doc in proxy sidebar
easier to discover how to control model access on litellm
* docs: more cleanup
* feat(fireworks_ai/): add document inlining support
Enables user to call non-vision models with images/pdfs/etc.
* test(test_fireworks_ai_translation.py): add unit testing for fireworks ai transform inline helper util
* docs(docs/): add document inlining details to fireworks ai docs
* feat(fireworks_ai/): allow user to dynamically disable auto add transform inline
allows client-side disabling of this feature for proxy users
* feat(fireworks_ai/): return 'supports_vision' and 'supports_pdf_input' true on all fireworks ai models
now true as fireworks ai supports document inlining
* test: fix tests
* fix(router.py): add unit testing for _is_model_access_group_for_wildcard_route
* fix(proxy_track_cost_callback.py): log to db if only end user param given
* fix: allows for jwt-auth based end user id spend tracking to work
* fix(utils.py): fix 'get_end_user_id_for_cost_tracking' to use 'user_api_key_end_user_id'
more stable - works with jwt-auth based end user tracking as well
* test(test_jwt.py): add e2e unit test to confirm end user cost tracking works for spend logs
* test: update test to use end_user api key hash param
* fix(langfuse.py): support end user cost tracking via jwt auth + langfuse
logs end user to langfuse if decoded from jwt token
* fix: fix linting errors
* test: fix test
* test: fix test
* fix: fix end user id extraction
* fix: run test earlier
* fix(key_management_endpoints.py): override metadata field value on update
allow user to override tags
* feat(__init__.py): expose new disable_end_user_cost_tracking_prometheus_only metric
allow disabling end user cost tracking on prometheus - fixes cardinality issue
* fix(litellm_pre_call_utils.py): add key/team level enforced params
Fixes https://github.com/BerriAI/litellm/issues/6652
* fix(key_management_endpoints.py): allow user to pass in `enforced_params` as a top level param on /key/generate and /key/update
* docs(enterprise.md): add docs on enforcing required params for llm requests
* Add support of Galadriel API (#7005)
* fix(router.py): robust retry after handling
set retry after time to 0 if >0 healthy deployments. handle base case = 1 deployment
* test(test_router.py): fix test
* feat(bedrock/): add support for 'nova' models
also adds explicit 'converse/' route for simpler routing
* fix: fix 'supports_pdf_input'
return if model supports pdf input on get_model_info
* feat(converse_transformation.py): support bedrock pdf input
* docs(document_understanding.md): add document understanding to docs
* fix(litellm_pre_call_utils.py): fix linting error
* fix(init.py): fix passing of bedrock converse models
* feat(bedrock/converse): support 'response_format={"type": "json_object"}'
* fix(converse_handler.py): fix linting error
* fix(base_llm_unit_tests.py): fix test
* fix: fix test
* test: fix test
* test: fix test
* test: remove duplicate test
---------
Co-authored-by: h4n0 <4738254+h4n0@users.noreply.github.com>
* feat(pass_through_endpoints/): support logging anthropic/gemini pass through calls to langfuse/s3/etc.
* fix(utils.py): allow disabling end user cost tracking with new param
Allows proxy admin to disable cost tracking for end user - keeps prometheus metrics small
* docs(configs.md): add disable_end_user_cost_tracking reference to docs
* feat(key_management_endpoints.py): add support for restricting access to `/key/generate` by team/proxy level role
Enables admin to restrict key creation, and assign team admins to handle distributing keys
* test(test_key_management.py): add unit testing for personal / team key restriction checks
* docs: add docs on restricting key creation
* docs(finetuned_models.md): add new guide on calling finetuned models
* docs(input.md): cleanup anthropic supported params
Closes https://github.com/BerriAI/litellm/issues/6856
* test(test_embedding.py): add test for passing extra headers via embedding
* feat(cohere/embed): pass client to async embedding
* feat(rerank.py): add `/v1/rerank` if missing for cohere base url
Closes https://github.com/BerriAI/litellm/issues/6844
* fix(main.py): pass extra_headers param to openai
Fixes https://github.com/BerriAI/litellm/issues/6836
* fix(litellm_logging.py): don't disable global callbacks when dynamic callbacks are set
Fixes issue where global callbacks - e.g. prometheus were overriden when langfuse was set dynamically
* fix(handler.py): fix linting error
* fix: fix typing
* build: add conftest to proxy_admin_ui_tests/
* test: fix test
* fix: fix linting errors
* test: fix test
* fix: fix pass through testing
* Fix Vertex AI function calling invoke: use JSON format instead of protobuf text format. (#6702)
* test: test tool_call conversion when arguments is empty dict
Fixes https://github.com/BerriAI/litellm/issues/6833
* fix(openai_like/handler.py): return more descriptive error message
Fixes https://github.com/BerriAI/litellm/issues/6812
* test: skip overloaded model
* docs(anthropic.md): update anthropic docs to show how to route to any new model
* feat(groq/): fake stream when 'response_format' param is passed
Groq doesn't support streaming when response_format is set
* feat(groq/): add response_format support for groq
Closes https://github.com/BerriAI/litellm/issues/6845
* fix(o1_handler.py): remove fake streaming for o1
Closes https://github.com/BerriAI/litellm/issues/6801
* build(model_prices_and_context_window.json): add groq llama3.2b model pricing
Closes https://github.com/BerriAI/litellm/issues/6807
* fix(utils.py): fix handling ollama response format param
Fixes https://github.com/BerriAI/litellm/issues/6848#issuecomment-2491215485
* docs(sidebars.js): refactor chat endpoint placement
* fix: fix linting errors
* test: fix test
* test: fix test
* fix(openai_like/handler): handle max retries
* fix(streaming_handler.py): fix streaming check for openai-compatible providers
* test: update test
* test: correctly handle model is overloaded error
* test: update test
* test: fix test
* test: mark flaky test
---------
Co-authored-by: Guowang Li <Guowang@users.noreply.github.com>
* fix(__init__.py): add 'watsonx_text' as mapped llm api route
Fixes https://github.com/BerriAI/litellm/issues/6663
* fix(opentelemetry.py): fix passing parallel tool calls to otel
Fixes https://github.com/BerriAI/litellm/issues/6677
* refactor(test_opentelemetry_unit_tests.py): create a base set of unit tests for all logging integrations - test for parallel tool call handling
reduces bugs in repo
* fix(__init__.py): update provider-model mapping to include all known provider-model mappings
Fixes https://github.com/BerriAI/litellm/issues/6669
* feat(anthropic): support passing document in llm api call
* docs(anthropic.md): add pdf anthropic call to docs + expose new 'supports_pdf_input' function
* fix(factory.py): fix linting error
* feat: initial commit for watsonx chat endpoint support
Closes https://github.com/BerriAI/litellm/issues/6562
* feat(watsonx/chat/handler.py): support tool calling for watsonx
Closes https://github.com/BerriAI/litellm/issues/6562
* fix(streaming_utils.py): return empty chunk instead of failing if streaming value is invalid dict
ensures streaming works for ibm watsonx
* fix(openai_like/chat/handler.py): ensure asynchttphandler is passed correctly for openai like calls
* fix: ensure exception mapping works well for watsonx calls
* fix(openai_like/chat/handler.py): handle async streaming correctly
* feat(main.py): Make it clear when a user is passing an invalid message
add validation for user content message
Closes https://github.com/BerriAI/litellm/issues/6565
* fix: cleanup
* fix(utils.py): loosen validation check, to just make sure content types are valid
make litellm robust to future content updates
* fix: fix linting erro
* fix: fix linting errors
* fix(utils.py): make validation check more flexible
* test: handle langfuse list index out of range error
* Litellm dev 11 02 2024 (#6561)
* fix(dual_cache.py): update in-memory check for redis batch get cache
Fixes latency delay for async_batch_redis_cache
* fix(service_logger.py): fix race condition causing otel service logging to be overwritten if service_callbacks set
* feat(user_api_key_auth.py): add parent otel component for auth
allows us to isolate how much latency is added by auth checks
* perf(parallel_request_limiter.py): move async_set_cache_pipeline (from max parallel request limiter) out of execution path (background task)
reduces latency by 200ms
* feat(user_api_key_auth.py): have user api key auth object return user tpm/rpm limits - reduces redis calls in downstream task (parallel_request_limiter)
Reduces latency by 400-800ms
* fix(parallel_request_limiter.py): use batch get cache to reduce user/key/team usage object calls
reduces latency by 50-100ms
* fix: fix linting error
* fix(_service_logger.py): fix import
* fix(user_api_key_auth.py): fix service logging
* fix(dual_cache.py): don't pass 'self'
* fix: fix python3.8 error
* fix: fix init]
* bump: version 1.51.4 → 1.51.5
* build(deps): bump cookie and express in /docs/my-website (#6566)
Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.
Updates `cookie` from 0.6.0 to 0.7.1
- [Release notes](https://github.com/jshttp/cookie/releases)
- [Commits](https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.1)
Updates `express` from 4.20.0 to 4.21.1
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.1/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.20.0...4.21.1)
---
updated-dependencies:
- dependency-name: cookie
dependency-type: indirect
- dependency-name: express
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* docs(virtual_keys.md): update Dockerfile reference (#6554)
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
* (proxy fix) - call connect on prisma client when running setup (#6534)
* critical fix - call connect on prisma client when running setup
* fix test_proxy_server_prisma_setup
* fix test_proxy_server_prisma_setup
* Add 3.5 haiku (#6588)
* feat: add claude-3-5-haiku-20241022 entries
* feat: add claude-3-5-haiku-20241022 and vertex_ai/claude-3-5-haiku@20241022 models
* add missing entries, remove vision
* remove image token costs
* Litellm perf improvements 3 (#6573)
* perf: move writing key to cache, to background task
* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils
adds 200ms on calls with pgdb connected
* fix(litellm_pre_call_utils.py'): rename call_type to actual call used
* perf(proxy_server.py): remove db logic from _get_config_from_file
was causing db calls to occur on every llm request, if team_id was set on key
* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db
reduces latency/call by ~100ms
* fix(proxy_server.py): minor fix on existing_settings not incl alerting
* fix(exception_mapping_utils.py): map databricks exception string
* fix(auth_checks.py): fix auth check logic
* test: correctly mark flaky test
* fix(utils.py): handle auth token error for tokenizers.from_pretrained
* build: fix map
* build: fix map
* build: fix json for model map
* Litellm dev 11 02 2024 (#6561)
* fix(dual_cache.py): update in-memory check for redis batch get cache
Fixes latency delay for async_batch_redis_cache
* fix(service_logger.py): fix race condition causing otel service logging to be overwritten if service_callbacks set
* feat(user_api_key_auth.py): add parent otel component for auth
allows us to isolate how much latency is added by auth checks
* perf(parallel_request_limiter.py): move async_set_cache_pipeline (from max parallel request limiter) out of execution path (background task)
reduces latency by 200ms
* feat(user_api_key_auth.py): have user api key auth object return user tpm/rpm limits - reduces redis calls in downstream task (parallel_request_limiter)
Reduces latency by 400-800ms
* fix(parallel_request_limiter.py): use batch get cache to reduce user/key/team usage object calls
reduces latency by 50-100ms
* fix: fix linting error
* fix(_service_logger.py): fix import
* fix(user_api_key_auth.py): fix service logging
* fix(dual_cache.py): don't pass 'self'
* fix: fix python3.8 error
* fix: fix init]
* Litellm perf improvements 3 (#6573)
* perf: move writing key to cache, to background task
* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils
adds 200ms on calls with pgdb connected
* fix(litellm_pre_call_utils.py'): rename call_type to actual call used
* perf(proxy_server.py): remove db logic from _get_config_from_file
was causing db calls to occur on every llm request, if team_id was set on key
* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db
reduces latency/call by ~100ms
* fix(proxy_server.py): minor fix on existing_settings not incl alerting
* fix(exception_mapping_utils.py): map databricks exception string
* fix(auth_checks.py): fix auth check logic
* test: correctly mark flaky test
* fix(utils.py): handle auth token error for tokenizers.from_pretrained
* fix ImageObject conversion (#6584)
* (fix) litellm.text_completion raises a non-blocking error on simple usage (#6546)
* unit test test_huggingface_text_completion_logprobs
* fix return TextCompletionHandler convert_chat_to_text_completion
* fix hf rest api
* fix test_huggingface_text_completion_logprobs
* fix linting errors
* fix importLiteLLMResponseObjectHandler
* fix test for LiteLLMResponseObjectHandler
* fix test text completion
* fix allow using 15 seconds for premium license check
* testing fix bedrock deprecated cohere.command-text-v14
* (feat) add `Predicted Outputs` for OpenAI (#6594)
* bump openai to openai==1.54.0
* add 'prediction' param
* testing fix bedrock deprecated cohere.command-text-v14
* test test_openai_prediction_param.py
* test_openai_prediction_param_with_caching
* doc Predicted Outputs
* doc Predicted Output
* (fix) Vertex Improve Performance when using `image_url` (#6593)
* fix transformation vertex
* test test_process_gemini_image
* test_image_completion_request
* testing fix - bedrock has deprecated cohere.command-text-v14
* fix vertex pdf
* bump: version 1.51.5 → 1.52.0
* fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check (#6577)
* fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check
* fix(lowest_tpm_rpm_v2.py): return headers in correct format
* test: update test
* build(deps): bump cookie and express in /docs/my-website (#6566)
Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.
Updates `cookie` from 0.6.0 to 0.7.1
- [Release notes](https://github.com/jshttp/cookie/releases)
- [Commits](https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.1)
Updates `express` from 4.20.0 to 4.21.1
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.1/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.20.0...4.21.1)
---
updated-dependencies:
- dependency-name: cookie
dependency-type: indirect
- dependency-name: express
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* docs(virtual_keys.md): update Dockerfile reference (#6554)
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
* (proxy fix) - call connect on prisma client when running setup (#6534)
* critical fix - call connect on prisma client when running setup
* fix test_proxy_server_prisma_setup
* fix test_proxy_server_prisma_setup
* Add 3.5 haiku (#6588)
* feat: add claude-3-5-haiku-20241022 entries
* feat: add claude-3-5-haiku-20241022 and vertex_ai/claude-3-5-haiku@20241022 models
* add missing entries, remove vision
* remove image token costs
* Litellm perf improvements 3 (#6573)
* perf: move writing key to cache, to background task
* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils
adds 200ms on calls with pgdb connected
* fix(litellm_pre_call_utils.py'): rename call_type to actual call used
* perf(proxy_server.py): remove db logic from _get_config_from_file
was causing db calls to occur on every llm request, if team_id was set on key
* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db
reduces latency/call by ~100ms
* fix(proxy_server.py): minor fix on existing_settings not incl alerting
* fix(exception_mapping_utils.py): map databricks exception string
* fix(auth_checks.py): fix auth check logic
* test: correctly mark flaky test
* fix(utils.py): handle auth token error for tokenizers.from_pretrained
* build: fix map
* build: fix map
* build: fix json for model map
* test: remove eol model
* fix(proxy_server.py): fix db config loading logic
* fix(proxy_server.py): fix order of config / db updates, to ensure fields not overwritten
* test: skip test if required env var is missing
* test: fix test
---------
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
* test: mark flaky test
* test: handle anthropic api instability
* test: update test
* test: bump num retries on langfuse tests - their api is quite bad
---------
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
* fix(utils.py): support passing dynamic api base to validate_environment
Returns True if just api base is required and api base is passed
* fix(litellm_pre_call_utils.py): feature flag sending client headers to llm api
Fixes https://github.com/BerriAI/litellm/issues/6410
* fix(anthropic/chat/transformation.py): return correct error message
* fix(http_handler.py): add error response text in places where we expect it
* fix(factory.py): handle base case of no non-system messages to bedrock
Fixes https://github.com/BerriAI/litellm/issues/6411
* feat(cohere/embed): Support cohere image embeddings
Closes https://github.com/BerriAI/litellm/issues/6413
* fix(__init__.py): fix linting error
* docs(supported_embedding.md): add image embedding example to docs
* feat(cohere/embed): use cohere embedding returned usage for cost calc
* build(model_prices_and_context_window.json): add embed-english-v3.0 details (image cost + 'supports_image_input' flag)
* fix(cohere_transformation.py): fix linting error
* test(test_proxy_server.py): cleanup test
* test: cleanup test
* fix: fix linting errors
* feat(custom_logger.py): expose new `async_dataset_hook` for modifying/rejecting argilla items before logging
Allows user more control on what gets logged to argilla for annotations
* feat(google_ai_studio_endpoints.py): add new `/azure/*` pass through route
enables pass-through for azure provider
* feat(utils.py): support checking ollama `/api/show` endpoint for retrieving ollama model info
Fixes https://github.com/BerriAI/litellm/issues/6322
* fix(user_api_key_auth.py): add `/key/delete` to an allowed_ui_routes
Fixes https://github.com/BerriAI/litellm/issues/6236
* fix(user_api_key_auth.py): remove type ignore
* fix(user_api_key_auth.py): route ui vs. api token checks differently
Fixes https://github.com/BerriAI/litellm/issues/6238
* feat(internal_user_endpoints.py): support setting models as a default internal user param
Closes https://github.com/BerriAI/litellm/issues/6239
* fix(user_api_key_auth.py): fix exception string
* fix(user_api_key_auth.py): fix error string
* fix: fix test