* fix base aws llm
* fix auth with aws role
* test aws base llm
* fix base aws llm init
* run ci/cd again
* fix get_credentials
* ci/cd run again
* _auth_with_aws_role
* feat(langfuse.py): log the used prompt when prompt management used
* test: fix test
* docs(self_serve.md): add doc on restricting personal key creation on ui
* feat(s3.py): support s3 logging with team alias prefixes (if available)
New preview feature
* fix(main.py): remove old if block - simplify to just await if coroutine returned
fixes lm_studio async embedding error
* fix(langfuse.py): handle get prompt check
* fix(vertex_ai/gemini/transformation.py): handle 'http://' in gemini process url
* refactor(router.py): refactor '_prompt_management_factory' to use logging obj get_chat_completion logic
deduplicates code
* fix(litellm_logging.py): update 'get_chat_completion_prompt' to update logging object messages
* docs(prompt_management.md): update prompt management to be in beta
based on feedback, this still needs to be revised (e.g. passing in the user message instead of ignoring it)
* refactor(prompt_management_base.py): introduce base class for prompt management
allows consistent behaviour across prompt management integrations
* feat(prompt_management_base.py): support adding client message to template message + refactor langfuse prompt management to use prompt management base
* fix(litellm_logging.py): log prompt id + prompt variables to langfuse if set
allows tracking what prompt was used for what purpose
* feat(litellm_logging.py): log prompt management metadata in standard logging payload + use in langfuse
allows logging prompt id / prompt variables to langfuse
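A minimal sketch of how the prompt management flow described above can be exercised, with Langfuse logging enabled so the prompt id / variables appear in the trace. The `langfuse/` model prefix, prompt id, and template variables below are placeholders, not a definitive API reference.

```python
import os
import litellm

# Assumed setup: the same Langfuse project is used for prompt management and logging.
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."   # placeholder
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."   # placeholder

# Log successes to Langfuse so prompt id / prompt variables show up in the trace.
litellm.success_callback = ["langfuse"]

response = litellm.completion(
    model="langfuse/gpt-3.5-turbo",                # assumed provider prefix for prompt management
    prompt_id="my-langfuse-prompt",                # hypothetical prompt id
    prompt_variables={"customer_name": "Acme"},    # hypothetical template variables
    messages=[{"role": "user", "content": "client message appended to the template"}],
)
print(response.choices[0].message.content)
```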
* test: fix test
* fix(router.py): cleanup unused imports
* fix: fix linting error
* fix: fix trace param typing
* fix: fix linting errors
* fix: fix code qa check
* fix(custom_logger.py): expose new 'async_get_chat_completion_prompt' event hook
* fix(custom_logger.py, langfuse_prompt_management.py): remove 'headers' from the custom logger 'async_get_chat_completion_prompt' and 'get_chat_completion_prompt' event hooks
* feat(router.py): expose new function for prompt management based routing
* feat(router.py): partial working router prompt factory logic
allows a load-balanced model group to be used as the model name with a Langfuse prompt management call
* feat(router.py): fix prompt management with load balanced model group
* feat(langfuse_prompt_management.py): support reading in openai params from langfuse
enables users to define optional params in Langfuse instead of in client code
* test(test_Router.py): add unit test for router based langfuse prompt management
* fix: fix linting errors
* refactor(prometheus.py): refactor to remove `_tag` metrics and incorporate in regular metrics
* fix(prometheus.py): handle label values not set in enum values
* feat(prometheus.py): working e2e custom metadata labels
* docs(prometheus.md): update docs to clarify how custom metrics would work
* test(test_prometheus_unit_tests.py): fix test
* test: add unit testing
* fix(prometheus.py): refactor litellm_input_tokens_metric to use label factory
makes adding new metrics easier
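For context, a generic sketch of the label-factory idea these refactors point at: build the full set of label values once, then let each metric pick only the labels it declares. This is an illustration of the pattern with hypothetical helper names (`build_label_values`, `labels_for`), not LiteLLM's actual implementation.

```python
from prometheus_client import Counter

INPUT_TOKEN_LABELS = ["end_user", "hashed_api_key", "team", "model", "requested_model"]

input_tokens = Counter(
    "litellm_input_tokens", "Input tokens per request", labelnames=INPUT_TOKEN_LABELS
)

# Hypothetical factory: compute every label value once per request...
def build_label_values(**kwargs) -> dict:
    values = {name: "" for name in INPUT_TOKEN_LABELS}
    values.update({k: v for k, v in kwargs.items() if v is not None})
    return values

# ...and let each metric select only the labels it declares.
def labels_for(label_names: list, values: dict) -> dict:
    return {name: values.get(name, "") for name in label_names}

values = build_label_values(model="gpt-4o", requested_model="gpt-4o", team="eng")
input_tokens.labels(**labels_for(INPUT_TOKEN_LABELS, values)).inc(42)
```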
* feat(prometheus.py): add 'request_model' to 'litellm_input_tokens_metric'
* refactor(prometheus.py): refactor 'litellm_output_tokens_metric' to use label factory
makes adding new metrics easier
* feat(prometheus.py): emit requested model in 'litellm_output_tokens_metric'
* feat(prometheus.py): support tracking success events with custom metrics
* refactor(prometheus.py): refactor '_set_latency_metrics' to just use the initially created enum values dictionary
reduces scope for missing values
* feat(prometheus.py): refactor all tags to support custom metadata tags
enables metadata tags to be used across metrics for e2e tracking
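The by-tag metrics below group on request tags; a rough client-side sketch of attaching tags via request metadata when calling through the proxy (the proxy URL, key, and tag values are placeholders):

```python
import openai

# Placeholders: a LiteLLM proxy running at this URL, authenticated with this key.
client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "hello"}],
    # tags travel in the request metadata; tag-scoped metrics group on these values
    extra_body={"metadata": {"tags": ["jobID:214590", "taskName:page_classification"]}},
)
```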
* fix(prometheus.py): fix requested model on success event enum_values
* test: fix test
* test: fix test
* test: handle filenotfound error
* docs(prometheus.md): add new values to prometheus
* docs(prometheus.md): document adding custom metrics on prometheus
* bump: version 1.56.5 → 1.56.6
* fix(langfuse_prompt_management.py): migrate dynamic logging to langfuse custom logger compatible class
* fix(langfuse_prompt_management.py): support failure callback logging to langfuse as well
* feat(proxy_server.py): support setting custom tokenizer on config.yaml
Allows customizing the tokenizer used by `/utils/token_counter`
* fix(proxy_server.py): fix linting errors
* test: skip if file not found
* style: cleanup unused import
* docs(configs.md): add docs on setting custom tokenizer
* feat(main.py): mock_response() - support 'litellm.ContextWindowExceededError' in mock response
enables quicker router/fallback/proxy debugging of context window errors
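As described above, the exception name is passed as the mock response; a small sketch of using it to exercise context-window handling locally without calling a real provider:

```python
import litellm

try:
    litellm.completion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "hey"}],
        # raises the named exception instead of returning a completion -
        # handy for testing router fallbacks / proxy error handling
        mock_response="litellm.ContextWindowExceededError",
    )
except litellm.ContextWindowExceededError as e:
    print("got expected error:", type(e).__name__)
```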
* feat(exception_mapping_utils.py): extract special litellm errors from error str if calling `litellm_proxy/` as provider
Closes https://github.com/BerriAI/litellm/issues/7259
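A hedged sketch of what that mapping enables from the client side, assuming a LiteLLM proxy running locally (URL, key, and model are placeholders): a context-window failure surfaced by the proxy is raised as the specific exception class rather than a generic API error.

```python
import litellm

try:
    litellm.completion(
        model="litellm_proxy/gpt-3.5-turbo",   # call through the proxy as a provider
        api_base="http://0.0.0.0:4000",        # placeholder proxy URL
        api_key="sk-1234",                     # placeholder proxy key
        messages=[{"role": "user", "content": "a very long input ..."}],
    )
except litellm.ContextWindowExceededError:
    # the proxy's error string is mapped back to the specific litellm exception,
    # so callers can trigger fallbacks / trim input instead of catching a generic APIError
    print("context window exceeded")
```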
* fix(user_api_key_auth.py): specify that the 'Received Proxy Server Request' span is of span kind server
Closes https://github.com/BerriAI/litellm/issues/7298
* build(model_prices_and_context_window.json): update groq models to specify 'supports_vision' parameter
Closes https://github.com/BerriAI/litellm/issues/7433
* docs(groq.md): add groq vision example to docs
Closes https://github.com/BerriAI/litellm/issues/7433
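Roughly the shape of the vision example added to the Groq docs; the model name is a placeholder for whichever Groq model reports `supports_vision`.

```python
import os
import litellm

os.environ["GROQ_API_KEY"] = "gsk_..."  # placeholder

response = litellm.completion(
    model="groq/llama-3.2-11b-vision-preview",  # placeholder vision-capable Groq model
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```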
* fix(prometheus.py): refactor self.litellm_proxy_failed_requests_metric to use label factory
* feat(prometheus.py): new 'litellm_proxy_failed_requests_by_tag_metric'
allows tracking failed requests by tag on proxy
* fix(prometheus.py): fix exception logging
* feat(prometheus.py): add new 'litellm_request_total_latency_by_tag_metric'
enables tracking latency by use-case
* feat(prometheus.py): add new llm api latency by tag metric
* feat(prometheus.py): new litellm_deployment_latency_per_output_token_by_tag metric
allows tracking deployment latency by tag
* fix(prometheus.py): refactor 'litellm_requests_metric' to use enum values + label factory
* feat(prometheus.py): new litellm_proxy_total_requests_by_tag metric
allows tracking total requests by tag
* feat(prometheus.py): new metric litellm_deployment_successful_fallbacks_by_tag
allows tracking deployment fallbacks by tag
* fix(prometheus.py): new 'litellm_deployment_failed_fallbacks_by_tag' metric
allows tracking failed fallbacks on deployment by custom tag
* test: fix test
* test: rename test to run earlier
* test: skip flaky test
* feat(proxy/utils.py): get associated litellm budget from db in combined_view for key
allows users to create rate limit tiers and associate them with keys
* feat(proxy/_types.py): update the value of key-level tpm/rpm/model max budget metrics with the associated budget table values if set
allows rate limit tiers to be easily applied to keys
* docs(rate_limit_tiers.md): add doc on setting rate limit / budget tiers
makes the feature discoverable
* feat(key_management_endpoints.py): return litellm_budget_table value in key generate
makes it easy for the user to see the associated budget at key creation
* fix(key_management_endpoints.py): document 'budget_id' param in `/key/generate`
* docs(key_management_endpoints.py): document budget_id usage
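A hedged end-to-end sketch of the rate-limit-tier flow these entries describe: create a budget that acts as a tier, then attach it to a key via `budget_id`. The endpoint paths, field names, and values are assumptions for illustration - check the generated API docs for the exact schema.

```python
import requests

PROXY = "http://0.0.0.0:4000"                   # placeholder proxy URL
HEADERS = {"Authorization": "Bearer sk-1234"}   # placeholder admin key

# 1. Create a budget acting as a rate-limit tier (assumed endpoint + fields).
requests.post(
    f"{PROXY}/budget/new",
    headers=HEADERS,
    json={"budget_id": "free-tier", "tpm_limit": 1000, "rpm_limit": 10},
)

# 2. Generate a key attached to that tier via budget_id.
key = requests.post(
    f"{PROXY}/key/generate",
    headers=HEADERS,
    json={"budget_id": "free-tier", "key_alias": "customer-a"},
).json()

# The response should now include the associated litellm_budget_table values,
# so callers can see the tier's limits at key-creation time.
print(key)
```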
* refactor(budget_management_endpoints.py): refactor budget endpoints into separate file - makes it easier to run documentation testing against it
* docs(test_api_docs.py): add budget endpoints to ci/cd doc test + add missing param info to docs
* fix(customer_endpoints.py): use new pydantic obj name
* docs(user_management_heirarchy.md): add simple doc explaining teams/keys/org/users on litellm
* Litellm dev 12 26 2024 p2 (#7432)
* (Feat) Add logging for `POST v1/fine_tuning/jobs` (#7426)
* init commit ft jobs logging
* add ft logging
* add logging for FineTuningJob
* simple FT Job create test
* (docs) - show all supported Azure OpenAI endpoints in overview (#7428)
* azure batches
* update doc
* docs azure endpoints
* docs endpoints on azure
* docs azure batches api
* docs azure batches api
* fix(key_management_endpoints.py): fix key update to actually work
* test(test_key_management.py): add e2e test asserting ui key update call works
* fix: proxy/_types - fix linting errors
* test: update test
---------
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
* fix: test
* fix(parallel_request_limiter.py): enforce tpm/rpm limits on key from tiers
* fix: fix linting errors
* test: fix test
* fix: remove unused import
* test: update test
* docs(customer_endpoints.py): document new model_max_budget param
* test: specify unique key alias
* docs(budget_management_endpoints.py): document new model_max_budget param
* test: fix test
* test: fix tests
---------
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
* refactor(prometheus.py): refactor to use a factory method for setting label values
allows enforcing end-user id disabling on Prometheus end-to-end
* fix: fix linting error
* fix(prometheus.py): ensure label factory drops end-user value if disabled by user
* fix(prometheus.py): specify service_type in end user tracking get
* test: fix test
* test: add unit test for prometheus factory
* test: improve test (cover flag not set scenario)
* test(test_prometheus.py): e2e test covering if 'end_user_id' shows up in testing if disabled
scrapes the `/metrics` endpoint and scans text to check if id appears in emitted metrics
* fix(prometheus.py): stringify status code before logging it
* fix(utils.py): default custom_llm_provider=None for 'supports_response_schema'
Closes https://github.com/BerriAI/litellm/issues/7397
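A small usage sketch of the helper that fix touches; with the default in place, the provider can be inferred from the model name (model names here are illustrative).

```python
import litellm

# provider inferred from the model prefix (custom_llm_provider defaults to None)
print(litellm.supports_response_schema(model="gemini/gemini-1.5-pro"))

# or passed explicitly
print(litellm.supports_response_schema(model="gemini-1.5-pro", custom_llm_provider="vertex_ai"))
```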
* refactor(langfuse/): call langfuse logger inside customlogger compatible langfuse class, refactor langfuse logger to use verbose_logger.debug instead of print_verbose
* refactor(litellm_pre_call_utils.py): move config based team callbacks inside dynamic team callback logic
enables simpler unit testing for config-based team callbacks
* fix(proxy/_types.py): handle teamcallbackmetadata - none values
drops None values if present; if all values are None, uses a default dict to avoid downstream errors
* test(test_proxy_utils.py): add unit test preventing future issues - asserts team_id in config state not popped off across calls
Fixes https://github.com/BerriAI/litellm/issues/6787
* fix(langfuse_prompt_management.py): add success + failure logging event support
* fix: fix linting error
* test: fix test
* test: fix test
* test: override o1 prompt caching - openai currently not working
* test: fix test
* fix(invoke_handler.py): fix mock response iterator to handle tool calling
returns the tool call if one is returned in the model response
* fix(prometheus.py): add new 'tokens_by_tag' metric on prometheus
allows tracking 'token usage' by task
* feat(prometheus.py): add input + output token tracking by tag
* feat(prometheus.py): add tag based deployment failure tracking
allows admin to track failure by use-case
* fix(prometheus.py): support streaming end user litellm_proxy_total_requests_metric tracking
* fix(prometheus.py): add 'requested_model' and 'end_user_id' to 'litellm_request_total_latency_metric_bucket'
enables latency tracking by end user + requested model
* fix(prometheus.py): add end user, user and requested model metrics to 'litellm_llm_api_latency_metric'
* test: update prometheus unit tests
* test(test_prometheus.py): update tests
* test(test_prometheus.py): fix test
* test: reorder test
* fix(proxy_track_cost_callback.py): log to db if only end user param given
* fix: allow jwt-auth based end user id spend tracking to work
* fix(utils.py): fix 'get_end_user_id_for_cost_tracking' to use 'user_api_key_end_user_id'
more stable - works with jwt-auth based end user tracking as well
* test(test_jwt.py): add e2e unit test to confirm end user cost tracking works for spend logs
* test: update test to use end_user api key hash param
* fix(langfuse.py): support end user cost tracking via jwt auth + langfuse
logs end user to langfuse if decoded from jwt token
* fix: fix linting errors
* test: fix test
* test: fix test
* fix: fix end user id extraction
* fix: run test earlier
* add unit test for test_datadog_static_methods
* docs dd vars
* test_datadog_payload_environment_variables
* test_datadog_static_methods
* docs env vars
* fix table