* test(test_utils.py): initial test for valid models
Addresses https://github.com/BerriAI/litellm/issues/7525
* fix: test
* feat(fireworks_ai/transformation.py): support retrieving valid models from fireworks ai endpoint
* refactor(fireworks_ai/): support checking model info on `/v1/models` route
* docs(set_keys.md): update docs to clarify check llm provider api usage
* fix(watsonx/common_utils.py): support 'WATSONX_ZENAPIKEY' for iam auth
* fix(watsonx): read in watsonx token from env var
* fix: fix linting errors
* fix(utils.py): fix provider config check
* style: cleanup unused imports
* fix(redact_messages.py): fix redact messages for non-model response input to be dictionary
fixes issue with otel logging when message redaction is enabled
* fix(proxy_server.py): fix langfuse key leak in exception string
* test: fix test
* test: fix test
* test: fix tests
* test(azure_openai_o1.py): initial commit with testing for azure openai o1 preview model
* fix(base_llm_unit_tests.py): handle azure o1 preview response format tests
skip as o1 on azure doesn't support tool calling yet
* fix: initial commit of azure o1 handler using openai caller
simplifies calling + allows fake streaming logic already implemented for openai to just work
* feat(azure/o1_handler.py): fake o1 streaming for azure o1 models
azure does not currently support streaming for o1
* feat(o1_transformation.py): support overriding 'should_fake_stream' on azure/o1 via 'supports_native_streaming' param on model info
enables user to toggle on when azure allows o1 streaming without needing to bump versions
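
A minimal sketch of the fake-streaming idea described above, assuming the full response is already available as a string. The function names `should_fake_stream` and `fake_stream` and the chunk size are illustrative, not the actual handler code; only the 'supports_native_streaming' key comes from the commit message.

```python
from typing import Iterator, Optional


def should_fake_stream(model_info: Optional[dict]) -> bool:
    """Fake the stream unless model info explicitly opts in to native streaming."""
    # 'supports_native_streaming' is the override named in the commit message;
    # treating a missing key as "not supported" is an assumption of this sketch.
    if model_info and model_info.get("supports_native_streaming") is True:
        return False
    return True


def fake_stream(full_text: str, chunk_size: int = 8) -> Iterator[str]:
    """Yield an already-complete response in small pieces, like a stream."""
    for start in range(0, len(full_text), chunk_size):
        yield full_text[start:start + chunk_size]
```

Joining the yielded chunks gives back the original text, so callers written against a streaming interface keep working while the provider lacks native streaming.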
* style(router.py): remove 'give feedback/get help' messaging when router is used
Prevents noisy messaging
Closes https://github.com/BerriAI/litellm/issues/5942
* fix(types/utils.py): handle none logprobs
Fixes https://github.com/BerriAI/litellm/issues/328
* fix(exception_mapping_utils.py): fix error str unbound error
* refactor(azure_ai/): move to openai_like chat completion handler
allows for easy swapping of api base urls (e.g. ai.services.com)
Fixes https://github.com/BerriAI/litellm/issues/7275
* refactor(azure_ai/): move to base llm http handler
* fix(azure_ai/): handle differing api endpoints
* fix(azure_ai/): make sure all unit tests are passing
* fix: fix linting errors
* fix: fix linting errors
* fix: fix linting error
* fix: fix linting errors
* fix(azure_ai/transformation.py): handle extra body param
* fix(azure_ai/transformation.py): fix max retries param handling
* fix: fix test
* test(test_azure_o1.py): fix test
* fix(llm_http_handler.py): support handling azure ai unprocessable entity error
* fix(llm_http_handler.py): handle sync invalid param error for azure ai
* fix(azure_ai/): streaming support with base_llm_http_handler
* fix(llm_http_handler.py): working sync stream calls with unprocessable entity handling for azure ai
* fix: fix linting errors
* fix(llm_http_handler.py): fix linting error
* fix(azure_ai/): handle cohere tool call invalid index param error
* fix(internal_user_endpoints.py): fix team list sort - handle team_alias being set + None
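
One way to sort a team list where 'team_alias' may be set or None, shown as a hedged sketch. The helper name `sort_teams_by_alias` and the dict shape are assumptions for illustration; the actual endpoint may use different objects.

```python
from typing import List


def sort_teams_by_alias(teams: List[dict]) -> List[dict]:
    """Sort teams by alias, treating a missing or None alias as an empty string."""
    # Comparing None with str raises TypeError in Python 3, so None is mapped to "".
    return sorted(teams, key=lambda team: team.get("team_alias") or "")
```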
* fix(key_management_endpoints.py): allow team admin to create key for member via admin ui
Fixes https://github.com/BerriAI/litellm/issues/7482
* fix(proxy_server.py): allow querying info on specific model group via `/model_group/info`
allows client-side user to get model info from proxy
* fix(proxy_server.py): add docstring on `/model_group/info` showing how to filter by model name
* test(test_proxy_utils.py): add unit test for returning model group info filtered
* fix(proxy_server.py): fix query param
* fix(test_Get_model_info.py): handle no whitelisted bedrock models
* fix(langfuse_prompt_management.py): migrate dynamic logging to langfuse custom logger compatible class
* fix(langfuse_prompt_management.py): support failure callback logging to langfuse as well
* feat(proxy_server.py): support setting custom tokenizer on config.yaml
Allows customizing value for `/utils/token_counter`
* fix(proxy_server.py): fix linting errors
* test: skip if file not found
* style: cleanup unused import
* docs(configs.md): add docs on setting custom tokenizer
* test: fix azure o1 test
* test: fix tests
* fix: fix test
* refactor(utils.py): migrate amazon titan config to base config
* refactor(utils.py): refactor bedrock meta invoke model translation to use base config
* refactor(utils.py): move bedrock ai21 to base config
* refactor(utils.py): move bedrock cohere to base config
* refactor(utils.py): move bedrock mistral to use base config
* refactor(utils.py): move all provider optional param translations to using a config
* docs(clientside_auth.md): clarify how to pass vertex region to litellm proxy
* fix(utils.py): handle scenario where custom llm provider is none / empty
* fix: fix get config
* test(test_otel_load_tests.py): widen perf margin
* fix(utils.py): fix get provider config check to handle custom llm's
* fix(utils.py): fix check
* docs(sidebar.js): docs for support model access groups for wildcard routes
* feat(key_management_endpoints.py): add check if user is premium_user when adding model access group for wildcard route
* refactor(docs/): make control model access a root-level doc in proxy sidebar
easier to discover how to control model access on litellm
* docs: more cleanup
* feat(fireworks_ai/): add document inlining support
Enables user to call non-vision models with images/pdfs/etc.
* test(test_fireworks_ai_translation.py): add unit testing for fireworks ai transform inline helper util
* docs(docs/): add document inlining details to fireworks ai docs
* feat(fireworks_ai/): allow user to dynamically disable auto add transform inline
allows client-side disabling of this feature for proxy users
* feat(fireworks_ai/): return 'supports_vision' and 'supports_pdf_input' true on all fireworks ai models
now true as fireworks ai supports document inlining
* test: fix tests
* fix(router.py): add unit testing for _is_model_access_group_for_wildcard_route
* fix(azure_ai/transformation.py): route ai.services.azure calls to the azure provider route
requires token to be passed in as 'api-key'
Closes https://github.com/BerriAI/litellm/issues/7275
* fix(key_management_endpoints.py): enforce user is member of team, if team_id set and team_id exists in team table
* fix(key_management_endpoints.py): handle assigned_user_id = none
* feat(create_key_button.tsx): allow assigning keys to other users
allows proxy admin to easily assign other people keys
* build(create_key_button.tsx): fix error message display
don't swallow the error message for key creation failure
* build(create_key_button.tsx): allow proxy admin to edit team id
* build(create_key_button.tsx): allow proxy admin to assign keys to other users
* build(edit_user.tsx): clarify how 'user budgets' are applied
* test: remove dup test
* fix(key_management_endpoints.py): don't raise error if team not in db
* test: fix test
* feat(main.py): mock_response() - support 'litellm.ContextWindowExceededError' in mock response
enables quicker router/fallback/proxy debug on context window errors
* feat(exception_mapping_utils.py): extract special litellm errors from error str if calling `litellm_proxy/` as provider
Closes https://github.com/BerriAI/litellm/issues/7259
* fix(user_api_key_auth.py): specify 'Received Proxy Server Request' is span kind server
Closes https://github.com/BerriAI/litellm/issues/7298
* fix(model_dashboard.tsx): support setting model_info params - e.g. mode on ui
Closes https://github.com/BerriAI/litellm/issues/5270
* fix(lowest_tpm_rpm_v2.py): deployment rpm over limit check
fixes selection error when getting potential deployments below known tpm/rpm limit
Fixes https://github.com/BerriAI/litellm/issues/7395
* fix(test_tpm_rpm_routing_v2.py): add unit test for https://github.com/BerriAI/litellm/issues/7395
* fix(lowest_tpm_rpm_v2.py): fix tpm key name in dict post rpm update
* test: rename test to run earlier
* test: skip flaky test
* build(model_prices_and_context_window.json): update groq models to specify 'supports_vision' parameter
Closes https://github.com/BerriAI/litellm/issues/7433
* docs(groq.md): add groq vision example to docs
Closes https://github.com/BerriAI/litellm/issues/7433
* fix(prometheus.py): refactor self.litellm_proxy_failed_requests_metric to use label factory
* feat(prometheus.py): new 'litellm_proxy_failed_requests_by_tag_metric'
allows tracking failed requests by tag on proxy
* fix(prometheus.py): fix exception logging
* feat(prometheus.py): add new 'litellm_request_total_latency_by_tag_metric'
enables tracking latency by use-case
* feat(prometheus.py): add new llm api latency by tag metric
* feat(prometheus.py): new litellm_deployment_latency_per_output_token_by_tag metric
allows tracking deployment latency by tag
* fix(prometheus.py): refactor 'litellm_requests_metric' to use enum values + label factory
* feat(prometheus.py): new litellm_proxy_total_requests_by_tag metric
allows tracking total requests by tag
* feat(prometheus.py): new metric litellm_deployment_successful_fallbacks_by_tag
allows tracking deployment fallbacks by tag
* fix(prometheus.py): new 'litellm_deployment_failed_fallbacks_by_tag' metric
allows tracking failed fallbacks on deployment by custom tag
* test: fix test
* test: rename test to run earlier
* test: skip flaky test
* feat(proxy/utils.py): get associated litellm budget from db in combined_view for key
allows user to create rate limit tiers and associate those to keys
* feat(proxy/_types.py): update the value of key-level tpm/rpm/model max budget metrics with the associated budget table values if set
allows rate limit tiers to be easily applied to keys
* docs(rate_limit_tiers.md): add doc on setting rate limit / budget tiers
make feature discoverable
* feat(key_management_endpoints.py): return litellm_budget_table value in key generate
make it easy for user to know associated budget on key creation
* fix(key_management_endpoints.py): document 'budget_id' param in `/key/generate`
* docs(key_management_endpoints.py): document budget_id usage
* refactor(budget_management_endpoints.py): refactor budget endpoints into separate file - makes it easier to run documentation testing against it
* docs(test_api_docs.py): add budget endpoints to ci/cd doc test + add missing param info to docs
* fix(customer_endpoints.py): use new pydantic obj name
* docs(user_management_heirarchy.md): add simple doc explaining teams/keys/org/users on litellm
* Litellm dev 12 26 2024 p2 (#7432)
* (Feat) Add logging for `POST v1/fine_tuning/jobs` (#7426)
* init commit ft jobs logging
* add ft logging
* add logging for FineTuningJob
* simple FT Job create test
* (docs) - show all supported Azure OpenAI endpoints in overview (#7428)
* azure batches
* update doc
* docs azure endpoints
* docs endpoints on azure
* docs azure batches api
* docs azure batches api
* fix(key_management_endpoints.py): fix key update to actually work
* test(test_key_management.py): add e2e test asserting ui key update call works
* fix: proxy/_types - fix linting errors
* test: update test
---------
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
* fix: test
* fix(parallel_request_limiter.py): enforce tpm/rpm limits on key from tiers
* fix: fix linting errors
* test: fix test
* fix: remove unused import
* test: update test
* docs(customer_endpoints.py): document new model_max_budget param
* test: specify unique key alias
* docs(budget_management_endpoints.py): document new model_max_budget param
* test: fix test
* test: fix tests
---------
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
* test: add new test image embedding to base llm unit tests
Addresses https://github.com/BerriAI/litellm/issues/6515
* fix(bedrock/embed/multimodal-embeddings): strip data prefix from image urls for bedrock multimodal embeddings
Fix https://github.com/BerriAI/litellm/issues/6515
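
A small sketch of stripping the data prefix from an image URL, assuming the input is either a `data:<mime>;base64,<payload>` URL or an already-bare base64 string. The helper name `strip_data_prefix` is hypothetical and is not taken from the bedrock embedding code.

```python
def strip_data_prefix(image: str) -> str:
    """Return the bare payload of a data URL; leave any other string unchanged."""
    # A data URL separates its header from its payload with the first comma.
    if image.startswith("data:") and "," in image:
        return image.split(",", 1)[1]
    return image
```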
* feat: initial commit for fireworks ai audio transcription support
Relevant issue: https://github.com/BerriAI/litellm/issues/7134
* test: initial fireworks ai test
* feat(fireworks_ai/): implemented fireworks ai audio transcription config
* fix(utils.py): register fireworks ai audio transcription config, in config manager
* fix(utils.py): add fireworks ai param translation to 'get_optional_params_transcription'
* refactor(fireworks_ai/): define text completion route with model name handling
moves model name handling to specific fireworks routes, as required by their api
* refactor(fireworks_ai/chat): define transform_Request - allows fixing model if accounts/ is missing
* fix: fix linting errors
* fix: fix linting errors
* fix: fix linting errors
* fix: fix linting errors
* fix(handler.py): fix linting errors
* fix(main.py): fix tgai text completion route
* refactor(together_ai/completion): refactors together ai text completion route to just use provider transform request
* refactor: move test_fine_tuning_api out of local_testing
reduces local testing ci/cd time
* fix(invoke_handler.py): fix mock response iterator to handle tool calling
returns tool call if returned by model response
* fix(prometheus.py): add new 'tokens_by_tag' metric on prometheus
allows tracking 'token usage' by task
* feat(prometheus.py): add input + output token tracking by tag
* feat(prometheus.py): add tag based deployment failure tracking
allows admin to track failure by use-case
* fix(prometheus.py): support streaming end user litellm_proxy_total_requests_metric tracking
* fix(prometheus.py): add 'requested_model' and 'end_user_id' to 'litellm_request_total_latency_metric_bucket'
enables latency tracking by end user + requested model
* fix(prometheus.py): add end user, user and requested model metrics to 'litellm_llm_api_latency_metric'
* test: update prometheus unit tests
* test(test_prometheus.py): update tests
* test(test_prometheus.py): fix test
* test: reorder test
* build(model_prices_and_context_window.json): add gemini-1.5-flash context caching
* fix(context_caching/transformation.py): just use last identified cache point
Fixes https://github.com/BerriAI/litellm/issues/6738
* fix(context_caching/transformation.py): pick first contiguous block - handles system message error from google
Fixes https://github.com/BerriAI/litellm/issues/6738
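
The "first contiguous block" selection can be sketched as below. This is an illustration only: it assumes each message is a dict and that a message-level 'cache_control' key marks it for caching, and the function name `split_first_cached_block` is hypothetical.

```python
from typing import List, Tuple


def split_first_cached_block(messages: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Split messages into (first contiguous marked block, everything else)."""
    cached: List[dict] = []
    rest: List[dict] = []
    block_closed = False
    for message in messages:
        is_marked = message.get("cache_control") is not None
        if is_marked and not block_closed:
            cached.append(message)
        else:
            # The first unmarked message after a marked run closes the block;
            # later marked messages are sent as regular messages.
            if cached:
                block_closed = True
            rest.append(message)
    return cached, rest
```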
* fix(vertex_ai/gemini/): track context caching tokens
* refactor(gemini/): place transformation.py inside `chat/` folder
make it easy for user to know we support the equivalent endpoint
* fix: fix import
* refactor(vertex_ai/): move vertex_ai cost calc inside vertex_ai/ folder
make it easier to see cost calculation logic
* fix: fix linting errors
* fix: fix circular import
* feat(gemini/cost_calculator.py): support gemini context caching cost calculation
generifies anthropic's cost calculation function and uses it across anthropic + gemini
* build(model_prices_and_context_window.json): add cost tracking for gemini-1.5-flash-002 w/ context caching
Closes https://github.com/BerriAI/litellm/issues/6891
* docs(gemini.md): add gemini context caching architecture diagram
make it easier for user to understand how context caching works
* docs(gemini.md): link to relevant gemini context caching code
* docs(gemini/context_caching): add readme in github, make it easy for dev to know context caching is supported + where to go for code
* fix(llm_cost_calc/utils.py): handle gemini 128k token diff cost calc scenario
* fix(deepseek/cost_calculator.py): support deepseek context caching cost calculation
* test: fix test
* fix(main.py): support 'mock_timeout=true' param
allows mock requests on proxy to have a time delay, for testing
* fix(main.py): ensure mock timeouts raise litellm.Timeout error
triggers retry/fallbacks
* fix: fix fallback + mock timeout testing
* fix(router.py): always return remaining tpm/rpm limits, if limits are known
allows for rate limit headers to be guaranteed
* docs(timeout.md): add docs on mock timeout = true
* fix(main.py): fix linting errors
* test: fix test
* fix(proxy_server.py): enforce team id based model add only works if enterprise user
* fix(auth_checks.py): enforce common_checks can only be imported by user_api_key_auth.py
* fix(auth_checks.py): insert not premium user error message on failed common checks run
* fix(utils.py): e2e azure tts cost tracking working
moves tts response obj to include hidden params (allows for litellm call id, etc. to be sent in response headers); fixes spend_tracking_utils logging payload to account for non-base model use-case
Fixes https://github.com/BerriAI/litellm/issues/7223
* fix: fix linting errors
* build(model_prices_and_context_window.json): add bedrock llama 3.3
Closes https://github.com/BerriAI/litellm/issues/7329
* fix(openai.py): fix return type for sync openai httpx response
* test: update test
* fix(spend_tracking_utils.py): fix if check
* fix(spend_tracking_utils.py): fix if check
* test: improve debugging for test
* fix: fix import
* fix(proxy_track_cost_callback.py): log to db if only end user param given
* fix: allows for jwt-auth based end user id spend tracking to work
* fix(utils.py): fix 'get_end_user_id_for_cost_tracking' to use 'user_api_key_end_user_id'
more stable - works with jwt-auth based end user tracking as well
* test(test_jwt.py): add e2e unit test to confirm end user cost tracking works for spend logs
* test: update test to use end_user api key hash param
* fix(langfuse.py): support end user cost tracking via jwt auth + langfuse
logs end user to langfuse if decoded from jwt token
* fix: fix linting errors
* test: fix test
* test: fix test
* fix: fix end user id extraction
* fix: run test earlier
* feat(router.py): support passing model-specific messages in fallbacks
* docs(routing.md): separate router timeouts into separate doc
allow for 1 fallbacks doc (across proxy/router)
* docs(routing.md): cleanup router docs
* docs(reliability.md): cleanup docs
* docs(reliability.md): cleaned up fallback doc
just have 1 doc across sdk/proxy
simplifies docs
* docs(reliability.md): add setting model-specific fallback prompts
* fix: fix linting errors
* test: skip test causing openai rate limit errors
* test: fix test
* test: run vertex test first to catch error
* fix(proxy_server.py): only update k,v pair if v is not empty/null
Fixes https://github.com/BerriAI/litellm/issues/6787
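
A hedged sketch of "only update k,v pair if v is not empty/null", assuming plain dicts. The helper name `merge_non_empty` and the exact set of values treated as empty (None, empty string, empty dict, empty list) are assumptions of this example, not the proxy's actual logic.

```python
def merge_non_empty(existing: dict, updates: dict) -> dict:
    """Return a copy of `existing` with only the non-empty values from `updates` applied."""
    merged = dict(existing)
    for key, value in updates.items():
        # Skip null/empty values so they cannot overwrite stored settings.
        if value is None or value == "" or value == {} or value == []:
            continue
        merged[key] = value
    return merged
```

Values such as `0` and `False` are still applied, since they are valid settings rather than empty ones.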
* test(test_router.py): cleanup duplicate calls
* test: add new test stream options drop params test
* test: update optional params / stream options test to test for vertex ai mistral route specifically
Addresses https://github.com/BerriAI/litellm/issues/7309
* fix(proxy_server.py): fix linting errors
* fix: fix linting errors