litellm-mirror

mirror of https://github.com/BerriAI/litellm.git synced 2025-04-26 11:14:04 +00:00

Author	SHA1	Message	Date
Ishaan Jaff	b6d1ab6152	test_completion_mistral_api_mistral_large_function_call	2025-01-16 22:27:48 -08:00
Ishaan Jaff	2177bdc836	(datadog llm observability) - fixes + improvements for using `datadog llm observability` logging integration (#7824 ) * dd llm obs fixes * _ensure_string_content * fix _get_dd_llm_obs_payload_metadata	2025-01-16 22:02:24 -08:00
Krish Dholakia	000d3152a8	Litellm dev 01 14 2025 p1 (#7771 ) * First-class Aim Guardrails support (#7738) * initial aim support * add tests * docs(langsmith_integration.md): cleanup * style: cleanup unused imports --------- Co-authored-by: Tomer Bin <117278227+hxtomer@users.noreply.github.com>	2025-01-14 16:18:21 -08:00
Krish Dholakia	8ee79dd5d9	[BETA] Add OpenAI `/images/variations` + Topaz API support (#7700 ) * feat(main.py): initial commit for `/image/variations` endpoint support * refactor(base_llm/): introduce new base llm base config for image variation endpoints * refactor(openai/image_variations/transformation.py): implement openai image variation transformation handler * fix: test * feat(openai/): working openai `/image/variation` endpoint calls via sdk * feat(topaz/): topaz sync image variation call support Addresses https://github.com/BerriAI/litellm/issues/7593 ' * fix(topaz/transformation.py): fix linting errors * fix(openai/image_variations/handler.py): fix passing json data * fix(main.py): image_variation/ support async image variation route - `aimage_variation` * fix(test_get_model_info.py): fix test * fix: cleanup unused imports * feat(openai/): add async `/image/variations` endpoint support * feat(topaz/): support async `/image/variations` calls * fix: test * fix(utils.py): fix get_model_info_helper for no model info w/ provider config handles situation where model info is not known but provider config exists * test(test_router_fallbacks.py): mark flaky test * fix: fix unused imports * test: bump otel load test perf threshold - accounts for current load tests hitting same server	2025-01-11 23:27:46 -08:00
Krish Dholakia	953c021aa7	Litellm dev 01 10 2025 p3 (#7682 ) * feat(langfuse.py): log the used prompt when prompt management used * test: fix test * docs(self_serve.md): add doc on restricting personal key creation on ui * feat(s3.py): support s3 logging with team alias prefixes (if available) New preview feature * fix(main.py): remove old if block - simplify to just await if coroutine returned fixes lm_studio async embedding error * fix(langfuse.py): handle get prompt check	2025-01-10 21:56:42 -08:00
Ishaan Jaff	9e2b1101c0	(litellm sdk - perf improvement) - use O(1) set lookups for checking llm providers / models (#7672 ) * fix get model info logic to use O(1) lookups * perf - use O(1) lookup for get llm provider	2025-01-10 14:16:30 -08:00
Krish Dholakia	75c3ddfc9e	fix(vertex_ai/gemini/transformation.py): handle 'http://' in gemini p… (#7660 ) * fix(vertex_ai/gemini/transformation.py): handle 'http://' in gemini process url * refactor(router.py): refactor '_prompt_management_factory' to use logging obj get_chat_completion logic deduplicates code * fix(litellm_logging.py): update 'get_chat_completion_prompt' to update logging object messages * docs(prompt_management.md): update prompt management to be in beta given feedback - this still needs to be revised (e.g. passing in user message, not ignoring) * refactor(prompt_management_base.py): introduce base class for prompt management allows consistent behaviour across prompt management integrations * feat(prompt_management_base.py): support adding client message to template message + refactor langfuse prompt management to use prompt management base * fix(litellm_logging.py): log prompt id + prompt variables to langfuse if set allows tracking what prompt was used for what purpose * feat(litellm_logging.py): log prompt management metadata in standard logging payload + use in langfuse allows logging prompt id / prompt variables to langfuse * test: fix test * fix(router.py): cleanup unused imports * fix: fix linting error * fix: fix trace param typing * fix: fix linting errors * fix: fix code qa check	2025-01-10 07:31:59 -08:00
Krish Dholakia	6d8cfeaf14	LiteLLM Minor Fixes & Improvements (01/08/2025) - p2 (#7643 ) * fix(streaming_chunk_builder_utils.py): add test for groq tool calling + streaming + combine chunks Addresses https://github.com/BerriAI/litellm/issues/7621 * fix(streaming_utils.py): fix modelresponseiterator for openai like chunk parser ensures chunk parser uses the correct tool call id when translating the chunk Fixes https://github.com/BerriAI/litellm/issues/7621 * build(model_hub.tsx): display cost pricing on model hub * build(model_hub.tsx): show cost per token pricing + complete model information * fix(types/utils.py): fix usage object handling	2025-01-08 19:45:19 -08:00
Ishaan Jaff	81d1826c25	[Feature]: - allow print alert log to console (#7534 ) * update send_to_webhook * test_print_alerting_payload_warning * add alerting_args spec * test_alerting.py	2025-01-03 17:48:13 -08:00
Krish Dholakia	796913fb30	Fix langfuse prompt management on proxy (#7535 ) * fix(types/utils.py): support langfuse + humanloop routes on llm router * fix(main.py): remove acompletion elif block just await if coroutine returned	2025-01-03 12:42:37 -08:00
Ishaan Jaff	3a454ee2ce	(perf) use `aiohttp` for `custom_openai` (#7514 ) * use aiohttp handler * BaseLLMAIOHTTPHandler * use CustomOpenAIChatConfig * CustomOpenAIChatConfig * CustomOpenAIChatConfig * fix linting * AiohttpOpenAIChatConfig * fix order * aiohttp_openai	2025-01-02 22:15:17 -08:00
Krish Dholakia	02ff7b0a8a	Litellm dev 01 01 2025 p1 (#7498 ) * refactor(prometheus.py): refactor to remove `_tag` metrics and incorporate in regular metrics * fix(prometheus.py): handle label values not set in enum values * feat(prometheus.py): working e2e custom metadata labels * docs(prometheus.md): update docs to clarify how custom metrics would work * test(test_prometheus_unit_tests.py): fix test * test: add unit testing	2025-01-01 18:59:28 -08:00
Krish Dholakia	b0f570ee16	Litellm dev 12 30 2024 p2 (#7495 ) * test(azure_openai_o1.py): initial commit with testing for azure openai o1 preview model * fix(base_llm_unit_tests.py): handle azure o1 preview response format tests skip as o1 on azure doesn't support tool calling yet * fix: initial commit of azure o1 handler using openai caller simplifies calling + allows fake streaming logic alr. implemented for openai to just work * feat(azure/o1_handler.py): fake o1 streaming for azure o1 models azure does not currently support streaming for o1 * feat(o1_transformation.py): support overriding 'should_fake_stream' on azure/o1 via 'supports_native_streaming' param on model info enables user to toggle on when azure allows o1 streaming without needing to bump versions * style(router.py): remove 'give feedback/get help' messaging when router is used Prevents noisy messaging Closes https://github.com/BerriAI/litellm/issues/5942 * fix(types/utils.py): handle none logprobs Fixes https://github.com/BerriAI/litellm/issues/328 * fix(exception_mapping_utils.py): fix error str unbound error * refactor(azure_ai/): move to openai_like chat completion handler allows for easy swapping of api base url's (e.g. ai.services.com) Fixes https://github.com/BerriAI/litellm/issues/7275 * refactor(azure_ai/): move to base llm http handler * fix(azure_ai/): handle differing api endpoints * fix(azure_ai/): make sure all unit tests are passing * fix: fix linting errors * fix: fix linting errors * fix: fix linting error * fix: fix linting errors * fix(azure_ai/transformation.py): handle extra body param * fix(azure_ai/transformation.py): fix max retries param handling * fix: fix test * test(test_azure_o1.py): fix test * fix(llm_http_handler.py): support handling azure ai unprocessable entity error * fix(llm_http_handler.py): handle sync invalid param error for azure ai * fix(azure_ai/): streaming support with base_llm_http_handler * fix(llm_http_handler.py): working sync stream calls with unprocessable entity handling for azure ai * fix: fix linting errors * fix(llm_http_handler.py): fix linting error * fix(azure_ai/): handle cohere tool call invalid index param error	2025-01-01 18:57:29 -08:00
Ishaan Jaff	0cbecbe185	(Feat) - LiteLLM Use `UsernamePasswordCredential` for Azure OpenAI (#7496 ) * add get_azure_ad_token_from_username_password * docs azure use username / password for auth * update doc * get_azure_ad_token_from_username_password * test test_get_azure_ad_token_from_username_password	2025-01-01 14:11:27 -08:00
Ishaan Jaff	0b4d529af8	(feat) POST `/fine_tuning/jobs` support passing vertex specific hyper params (#7490 ) * update convert_openai_request_to_vertex * test_create_vertex_fine_tune_jobs_mocked * fix order of methods * update LiteLLMFineTuningJobCreate * update OpenAIFineTuningHyperparameters * update vertex hyper params in response * _transform_openai_hyperparameters_to_vertex_hyperparameters * supervised_tuning_spec["hyperParameters"] fix * fix mapping for ft params testing * docs fine tuning apis * fix test_convert_basic_openai_request_to_vertex_request * update hyperparams for create fine tuning * fix linting * test_create_vertex_fine_tune_jobs_mocked_with_hyperparameters * run ci/cd again * test_convert_basic_openai_request_to_vertex_request	2025-01-01 07:44:48 -08:00
Krish Dholakia	c46c1e6ea0	Prometheus - custom metrics support + other improvements (#7489 ) * fix(prometheus.py): refactor litellm_input_tokens_metric to use label factory makes adding new metrics easier * feat(prometheus.py): add 'request_model' to 'litellm_input_tokens_metric' * refactor(prometheus.py): refactor 'litellm_output_tokens_metric' to use label factory makes adding new metrics easier * feat(prometheus.py): emit requested model in 'litellm_output_tokens_metric' * feat(prometheus.py): support tracking success events with custom metrics * refactor(prometheus.py): refactor '_set_latency_metrics' to just use the initially created enum values dictionary reduces scope for missing values * feat(prometheus.py): refactor all tags to support custom metadata tags enables metadata tags to be used across for e2e tracking * fix(prometheus.py): fix requested model on success event enum_values * test: fix test * test: fix test * test: handle filenotfound error * docs(prometheus.md): add new values to prometheus * docs(prometheus.md): document adding custom metrics on prometheus * bump: version 1.56.5 → 1.56.6	2025-01-01 07:41:50 -08:00
Ishaan Jaff	a39cac313c	(Feat) - Add PagerDuty Alerting Integration (#7478 ) * define basic types * fix verbose_logger.exception statement * fix basic alerting * test pager duty alerting * test_pagerduty_alerting_high_failure_rate * PagerDutyAlerting * async_log_failure_event * use pre_call_hook * add _request_is_completed helper util * update AlertingConfig * rename PagerDutyInternalEvent * _send_alert_if_thresholds_crossed * use pagerduty as _custom_logger_compatible_callbacks_literal * fix slack alerting imports * fix imports in slack alerting * PagerDutyAlerting * fix _load_alerting_settings * test_pagerduty_hanging_request_alerting * working pager duty alerting * fix linting * doc pager duty alerting * update hanging_response_handler * fix import location * update failure_threshold * update async_pre_call_hook * docs pagerduty * test - callback_class_str_to_classType * fix linting errors * fix linting + testing error * PagerDutyAlerting * test_pagerduty_hanging_request_alerting * fix unused imports * docs pager duty * @pytest.mark.flaky(retries=6, delay=2) * test_model_info_bedrock_converse_enforcement	2025-01-01 07:12:51 -08:00
Krish Dholakia	03fa654b97	Litellm dev 12 31 2024 p1 (#7488 ) * fix(internal_user_endpoints.py): fix team list sort - handle team_alias being set + None * fix(key_management_endpoints.py): allow team admin to create key for member via admin ui Fixes https://github.com/BerriAI/litellm/issues/7482 * fix(proxy_server.py): allow querying info on specific model group via `/model_group/info` allows client-side user to get model info from proxy * fix(proxy_server.py): add docstring on `/model_group/info` showing how to filter by model name * test(test_proxy_utils.py): add unit test for returning model group info filtered * fix(proxy_server.py): fix query param * fix(test_Get_model_info.py): handle no whitelisted bedrock modells	2024-12-31 23:21:51 -08:00
Krish Dholakia	39a11ad272	Fix team-based logging to langfuse + allow custom tokenizer on `/token_counter` endpoint (#7493 ) * fix(langfuse_prompt_management.py): migrate dynamic logging to langfuse custom logger compatible class * fix(langfuse_prompt_management.py): support failure callback logging to langfuse as well * feat(proxy_server.py): support setting custom tokenizer on config.yaml Allows customizing value for `/utils/token_counter` * fix(proxy_server.py): fix linting errors * test: skip if file not found * style: cleanup unused import * docs(configs.md): add docs on setting custom tokenizer	2024-12-31 23:18:41 -08:00
Krish Dholakia	77c13df55d	HumanLoop integration for Prompt Management (#7479 ) * feat(humanloop.py): initial commit for humanloop prompt management integration Closes https://github.com/BerriAI/litellm/issues/213 * feat(humanloop.py): working e2e humanloop prompt management integration Closes https://github.com/BerriAI/litellm/issues/213 * fix(humanloop.py): fix linting errors * fix: fix linting erro * fix: fix test * test: handle filenotfound error	2024-12-30 22:26:03 -08:00
Krish Dholakia	0178e75cd9	Litellm dev 12 30 2024 p1 (#7480 ) * test(azure_openai_o1.py): initial commit with testing for azure openai o1 preview model * fix(base_llm_unit_tests.py): handle azure o1 preview response format tests skip as o1 on azure doesn't support tool calling yet * fix: initial commit of azure o1 handler using openai caller simplifies calling + allows fake streaming logic alr. implemented for openai to just work * feat(azure/o1_handler.py): fake o1 streaming for azure o1 models azure does not currently support streaming for o1 * feat(o1_transformation.py): support overriding 'should_fake_stream' on azure/o1 via 'supports_native_streaming' param on model info enables user to toggle on when azure allows o1 streaming without needing to bump versions * style(router.py): remove 'give feedback/get help' messaging when router is used Prevents noisy messaging Closes https://github.com/BerriAI/litellm/issues/5942 * test: fix azure o1 test * test: fix tests * fix: fix test	2024-12-30 21:52:52 -08:00
Ishaan Jaff	eac201598b	test_rerank_response_assertions (#7476 )	2024-12-30 10:12:56 -08:00
Krish Dholakia	9150722a00	Litellm dev 12 28 2024 p2 (#7458 ) * docs(sidebar.js): docs for support model access groups for wildcard routes * feat(key_management_endpoints.py): add check if user is premium_user when adding model access group for wildcard route * refactor(docs/): make control model access a root-level doc in proxy sidebar easier to discover how to control model access on litellm * docs: more cleanup * feat(fireworks_ai/): add document inlining support Enables user to call non-vision models with images/pdfs/etc. * test(test_fireworks_ai_translation.py): add unit testing for fireworks ai transform inline helper util * docs(docs/): add document inlining details to fireworks ai docs * feat(fireworks_ai/): allow user to dynamically disable auto add transform inline allows client-side disabling of this feature for proxy users * feat(fireworks_ai/): return 'supports_vision' and 'supports_pdf_input' true on all fireworks ai models now true as fireworks ai supports document inlining * test: fix tests * fix(router.py): add unit testing for _is_model_access_group_for_wildcard_route	2024-12-28 19:38:06 -08:00
Krish Dholakia	ebc28b1921	Litellm dev 12 28 2024 p3 (#7464 ) * feat(deepgram/): initial e2e support for deepgram stt Uses deepgram's `/listen` endpoint to transcribe speech to text Closes https://github.com/BerriAI/litellm/issues/4875 * fix: fix linting errors * test: fix test	2024-12-28 19:18:58 -08:00
Ishaan Jaff	6ec5ed8b3c	✨ (Feat) Log Guardrails run, guardrail response on logging integrations (#7445 ) * add guardrail_information to SLP * use standard_logging_guardrail_information * track StandardLoggingGuardrailInformation * use log_guardrail_information * use log_guardrail_information * docs guardrails * docs guardrails * update quick start * fix presidio logging for sync functions * update Guardrail type * enforce add_standard_logging_guardrail_information_to_request_data * update gd docs	2024-12-27 15:01:56 -08:00
Ishaan Jaff	e65cc581b3	(feat) `/guardrails/list` show guardrail info params (#7442 ) * add GuardrailInfoResponse * add list_guardrails * test_get_guardrails_list_response	2024-12-27 14:35:00 -08:00
Krish Dholakia	f30260343b	Litellm dev 12 26 2024 p3 (#7434 ) * build(model_prices_and_context_window.json): update groq models to specify 'supports_vision' parameter Closes https://github.com/BerriAI/litellm/issues/7433 * docs(groq.md): add groq vision example to docs Closes https://github.com/BerriAI/litellm/issues/7433 * fix(prometheus.py): refactor self.litellm_proxy_failed_requests_metric to use label factory * feat(prometheus.py): new 'litellm_proxy_failed_requests_by_tag_metric' allows tracking failed requests by tag on proxy * fix(prometheus.py): fix exception logging * feat(prometheus.py): add new 'litellm_request_total_latency_by_tag_metric' enables tracking latency by use-case * feat(prometheus.py): add new llm api latency by tag metric * feat(prometheus.py): new litellm_deployment_latency_per_output_token_by_tag metric allows tracking deployment latency by tag * fix(prometheus.py): refactor 'litellm_requests_metric' to use enum values + label factory * feat(prometheus.py): new litellm_proxy_total_requests_by_tag metric allows tracking total requests by tag * feat(prometheus.py): new metric litellm_deployment_successful_fallbacks_by_tag allows tracking deployment fallbacks by tag * fix(prometheus.py): new 'litellm_deployment_failed_fallbacks_by_tag' metric allows tracking failed fallbacks on deployment by custom tag * test: fix test * test: rename test to run earlier * test: skip flaky test	2024-12-26 21:21:16 -08:00
Krish Dholakia	d6a2beb342	Support budget/rate limit tiers for keys (#7429 ) * feat(proxy/utils.py): get associated litellm budget from db in combined_view for key allows user to create rate limit tiers and associate those to keys * feat(proxy/_types.py): update the value of key-level tpm/rpm/model max budget metrics with the associated budget table values if set allows rate limit tiers to be easily applied to keys * docs(rate_limit_tiers.md): add doc on setting rate limit / budget tiers make feature discoverable * feat(key_management_endpoints.py): return litellm_budget_table value in key generate make it easy for user to know associated budget on key creation * fix(key_management_endpoints.py): document 'budget_id' param in `/key/generate` * docs(key_management_endpoints.py): document budget_id usage * refactor(budget_management_endpoints.py): refactor budget endpoints into separate file - makes it easier to run documentation testing against it * docs(test_api_docs.py): add budget endpoints to ci/cd doc test + add missing param info to docs * fix(customer_endpoints.py): use new pydantic obj name * docs(user_management_heirarchy.md): add simple doc explaining teams/keys/org/users on litellm * Litellm dev 12 26 2024 p2 (#7432) * (Feat) Add logging for `POST v1/fine_tuning/jobs` (#7426) * init commit ft jobs logging * add ft logging * add logging for FineTuningJob * simple FT Job create test * (docs) - show all supported Azure OpenAI endpoints in overview (#7428) * azure batches * update doc * docs azure endpoints * docs endpoints on azure * docs azure batches api * docs azure batches api * fix(key_management_endpoints.py): fix key update to actually work * test(test_key_management.py): add e2e test asserting ui key update call works * fix: proxy/_types - fix linting erros * test: update test --------- Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> * fix: test * fix(parallel_request_limiter.py): enforce tpm/rpm limits on key from tiers * fix: fix linting errors * test: fix test * fix: remove unused import * test: update test * docs(customer_endpoints.py): document new model_max_budget param * test: specify unique key alias * docs(budget_management_endpoints.py): document new model_max_budget param * test: fix test * test: fix tests --------- Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>	2024-12-26 19:05:27 -08:00
Ishaan Jaff	aff0f974bc	(Feat) Add logging for `POST v1/fine_tuning/jobs` (#7426 ) * init commit ft jobs logging * add ft logging * add logging for FineTuningJob * simple FT Job create test	2024-12-26 08:58:47 -08:00
Krish Dholakia	8567342bd4	Litellm dev 12 25 2024 p3 (#7421 ) * refactor(prometheus.py): refactor to use a factory method for setting label values allows for enforcing end user id disabling on prometheus e2e * fix: fix linting error * fix(prometheus.py): ensure label factory drops end-user value if disabled by user * fix(prometheus.py): specify service_type in end user tracking get * test: fix test * test: add unit test for prometheus factory * test: improve test (cover flag not set scenario) * test(test_prometheus.py): e2e test covering if 'end_user_id' shows up in testing if disabled scrapes the `/metrics` endpoint and scans text to check if id appears in emitted metrics * fix(prometheus.py): stringify status code before logging it	2024-12-25 18:54:24 -08:00
Krish Dholakia	7af1f8a0c7	Litellm dev 12 25 2025 p2 (#7420 ) * test: add new test image embedding to base llm unit tests Addresses https://github.com/BerriAI/litellm/issues/6515 * fix(bedrock/embed/multimodal-embeddings): strip data prefix from image urls for bedrock multimodal embeddings Fix https://github.com/BerriAI/litellm/issues/6515 * feat: initial commit for fireworks ai audio transcription support Relevant issue: https://github.com/BerriAI/litellm/issues/7134 * test: initial fireworks ai test * feat(fireworks_ai/): implemented fireworks ai audio transcription config * fix(utils.py): register fireworks ai audio transcription config, in config manager * fix(utils.py): add fireworks ai param translation to 'get_optional_params_transcription' * refactor(fireworks_ai/): define text completion route with model name handling moves model name handling to specific fireworks routes, as required by their api * refactor(fireworks_ai/chat): define transform_Request - allows fixing model if accounts/ is missing * fix: fix linting errors * fix: fix linting errors * fix: fix linting errors * fix: fix linting errors * fix(handler.py): fix linting errors * fix(main.py): fix tgai text completion route * refactor(together_ai/completion): refactors together ai text completion route to just use provider transform request * refactor: move test_fine_tuning_api out of local_testing reduces local testing ci/cd time	2024-12-25 18:35:34 -08:00
Ishaan Jaff	5612103ea3	(feat) Support Dynamic Params for `guardrails` (#7415 ) * update CustomGuardrail * unit test custom guardrails * add dynamic params for aporia * add dynamic params to bedrock guard * add dynamic params for all guardrails * fix linting * fix should_run_guardrail * _validate_premium_user * update guardrail doc * doc update * update code q * should_run_guardrail	2024-12-25 16:07:29 -08:00
Krish Dholakia	f929a1f309	Litellm dev 12 24 2024 p4 (#7407 ) * fix(invoke_handler.py): fix mock response iterator to handle tool calling returns tool call if returned by model response * fix(prometheus.py): add new 'tokens_by_tag' metric on prometheus allows tracking 'token usage' by task * feat(prometheus.py): add input + output token tracking by tag * feat(prometheus.py): add tag based deployment failure tracking allows admin to track failure by use-case	2024-12-24 20:24:06 -08:00
Ishaan Jaff	e98f1d16fd	(feat) `/batches` - track `user_api_key_alias`, `user_api_key_team_alias` etc for /batch requests (#7401 ) * run azure testing on ci/cd * update docs on azure batches endpoints * add input azure.jsonl * refactor - use separate file for batches endpoints * fixes for passing custom llm provider to /batch endpoints * pass custom llm provider to files endpoints * update azure batches doc * add info for azure batches api * update batches endpoints * use simple helper for raising proxy exception * update config.yml * fix imports * add type hints to get_litellm_params * update get_litellm_params * update get_litellm_params * update get slp * QOL - stop double logging a create batch operations on custom loggers * re use slp from og event * _create_standard_logging_object_for_completed_batch * fix linting errors * reduce num changes in PR * update BATCH_STATUS_POLL_MAX_ATTEMPTS	2024-12-24 17:44:28 -08:00
Krish Dholakia	7403d7b046	Add 'end_user', 'user' and 'requested_model' on more prometheus metrics (#7399 ) * fix(prometheus.py): support streaming end user litellm_proxy_total_requests_metric tracking * fix(prometheus.py): add 'requested_model' and 'end_user_id' to 'litellm_request_total_latency_metric_bucket' enables latency tracking by end user + requested model * fix(prometheus.py): add end user, user and requested model metrics to 'litellm_llm_api_latency_metric' * test: update prometheus unit tests * test(test_prometheus.py): update tests * test(test_prometheus.py): fix test * test: reorder test	2024-12-24 14:08:30 -08:00
Krish Dholakia	8fe1356406	LiteLLM Minor Fixes & Improvements (12/23/2024) - p3 (#7394 ) * build(model_prices_and_context_window.json): add gemini-1.5-flash context caching * fix(context_caching/transformation.py): just use last identified cache point Fixes https://github.com/BerriAI/litellm/issues/6738 * fix(context_caching/transformation.py): pick first contiguous block - handles system message error from google Fixes https://github.com/BerriAI/litellm/issues/6738 * fix(vertex_ai/gemini/): track context caching tokens * refactor(gemini/): place transformation.py inside `chat/` folder make it easy for user to know we support the equivalent endpoint * fix: fix import * refactor(vertex_ai/): move vertex_ai cost calc inside vertex_ai/ folder make it easier to see cost calculation logic * fix: fix linting errors * fix: fix circular import * feat(gemini/cost_calculator.py): support gemini context caching cost calculation generifies anthropic's cost calculation function and uses it across anthropic + gemini * build(model_prices_and_context_window.json): add cost tracking for gemini-1.5-flash-002 w/ context caching Closes https://github.com/BerriAI/litellm/issues/6891 * docs(gemini.md): add gemini context caching architecture diagram make it easier for user to understand how context caching works * docs(gemini.md): link to relevant gemini context caching code * docs(gemini/context_caching): add readme in github, make it easy for dev to know context caching is supported + where to go for code * fix(llm_cost_calc/utils.py): handle gemini 128k token diff cost calc scenario * fix(deepseek/cost_calculator.py): support deepseek context caching cost calculation * test: fix test	2024-12-23 22:02:52 -08:00
Ishaan Jaff	9d66976162	(feat) Add basic logging support for `/batches` endpoints (#7381 ) * add basic logging for create`batch` * add create_batch as a call type * add basic dd logging for batches * basic batch creation logging on DD	2024-12-23 17:45:03 -08:00
Krish Dholakia	51f9f75c85	LiteLLM Minor Fixes & Improvements (12/23/2024) - P2 (#7386 ) * fix(main.py): support 'mock_timeout=true' param allows mock requests on proxy to have a time delay, for testing * fix(main.py): ensure mock timeouts raise litellm.Timeout error triggers retry/fallbacks * fix: fix fallback + mock timeout testing * fix(router.py): always return remaining tpm/rpm limits, if limits are known allows for rate limit headers to be guaranteed * docs(timeout.md): add docs on mock timeout = true * fix(main.py): fix linting errors * test: fix test	2024-12-23 17:41:27 -08:00
Krish Dholakia	71f659d26b	Complete 'requests' library removal (#7350 ) * refactor: initial commit moving watsonx_text to base_llm_http_handler + clarifying new provider directory structure * refactor(watsonx/completion/handler.py): move to using base llm http handler removes 'requests' library usage * fix(watsonx_text/transformation.py): fix result transformation migrates to transformation.py, for usage with base llm http handler * fix(streaming_handler.py): migrate watsonx streaming to transformation.py ensures streaming works with base llm http handler * fix(streaming_handler.py): fix streaming linting errors and remove watsonx conditional logic * fix(watsonx/): fix chat route post completion route refactor * refactor(watsonx/embed): refactor watsonx to use base llm http handler for embedding calls as well * refactor(base.py): remove requests library usage from litellm * build(pyproject.toml): remove requests library usage * fix: fix linting errors * fix: fix linting errors * fix(types/utils.py): fix validation errors for modelresponsestream * fix(replicate/handler.py): fix linting errors * fix(litellm_logging.py): handle modelresponsestream object * fix(streaming_handler.py): fix modelresponsestream args * fix: remove unused imports * test: fix test * fix: fix test * test: fix test * test: fix tests * test: fix test * test: fix patch target * test: fix test	2024-12-22 07:21:25 -08:00
Krish Dholakia	4f3ddebf81	Litellm dev 2024 12 20 p1 (#7335 ) * fix(utils.py): e2e azure tts cost tracking working moves tts response obj to include hidden params (allows for litellm call id, etc. to be sent in response headers) ; fixes spend_Tracking_utils logging payload to account for non-base model use-case Fixes https://github.com/BerriAI/litellm/issues/7223 * fix: fix linting errors * build(model_prices_and_context_window.json): add bedrock llama 3.3 Closes https://github.com/BerriAI/litellm/issues/7329 * fix(openai.py): fix return type for sync openai httpx response * test: update test * fix(spend_tracking_utils.py): fix if check * fix(spend_tracking_utils.py): fix if check * test: improve debugging for test * fix: fix import	2024-12-20 21:22:31 -08:00
Krish Dholakia	61b4c41c3c	Litellm dev 12 20 2024 p3 (#7339 ) * fix(proxy_track_cost_callback.py): log to db if only end user param given * fix: allows for jwt-auth based end user id spend tracking to work * fix(utils.py): fix 'get_end_user_id_for_cost_tracking' to use 'user_api_key_end_user_id' more stable - works with jwt-auth based end user tracking as well * test(test_jwt.py): add e2e unit test to confirm end user cost tracking works for spend logs * test: update test to use end_user api key hash param * fix(langfuse.py): support end user cost tracking via jwt auth + langfuse logs end user to langfuse if decoded from jwt token * fix: fix linting errors * test: fix test * test: fix test * fix: fix end user id extraction * fix: run test earlier	2024-12-20 21:13:32 -08:00
Krish Dholakia	b026230b0a	Litellm dev 2024 12 19 p3 (#7322 ) * fix(utils.py): remove unsupported optional params (if drop_params=True) before passing into map openai params Fixes https://github.com/BerriAI/litellm/issues/7242 * test: new test for langfuse prompt management hook Addresses https://github.com/BerriAI/litellm/issues/3893#issuecomment-2549080296 * feat(main.py): add 'get_chat_completion_prompt' customlogger hook allows for langfuse prompt management Addresses https://github.com/BerriAI/litellm/issues/3893#issuecomment-2549080296 * feat(langfuse_prompt_management.py): working e2e langfuse prompt management works with `langfuse/` route * feat(main.py): initial tracing for dynamic langfuse params allows admin to specify langfuse keys by model in model_list * feat(main.py): support passing langfuse credentials dynamically * fix(langfuse_prompt_management.py): create langfuse client based on dynamic callback params allows dynamic langfuse params to work * fix: fix linting errors * docs(prompt_management.md): refactor docs for sdk + proxy prompt management tutorial * docs(prompt_management.md): cleanup doc * docs: cleanup topnav * docs(prompt_management.md): update docs to be easier to use * fix: remove unused imports * docs(prompt_management.md): add architectural overview doc * fix(litellm_logging.py): fix dynamic param passing * fix(langfuse_prompt_management.py): fix linting errors * fix: fix linting errors * fix: use typing_extensions for typealias to ensure python3.8 compatibility * test: use stream_options in test to account for tiktoken diff * fix: improve import error message, and check run test earlier	2024-12-20 13:30:16 -08:00
Ishaan Jaff	6641e75e0c	(feat) add infinity rerank models (#7321 ) * Support Infinity Reranker (custom reranking models) (#7247) * Support Infinity Reranker * Clean code * Included transformation.py * Clean code * Added Infinity reranker test * Clean code --------- Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> * transform_rerank_response * update handler.py * infinity rerank updates * ci/cd run again * add infinity unit tests * docs add instruction on how to add a new provider for rerank --------- Co-authored-by: Hao Shan <53949959+haoshan98@users.noreply.github.com>	2024-12-19 18:30:28 -08:00
Ishaan Jaff	b0738fd439	(code refactor) - Add `BaseRerankConfig`. Use `BaseRerankConfig` for `cohere/rerank` and `azure_ai/rerank` (#7319 ) * add base rerank config * working sync cohere rerank * update rerank types * update base rerank config * remove old rerank * add new cohere handler.py * add cohere rerank transform * add get_provider_rerank_config * add rerank to base llm http handler * add rerank utils * add arerank to llm http handler.py * add AzureAIRerankConfig * updates rerank config * update test rerank * fix unused imports * update get_provider_rerank_config * test_basic_rerank_caching * fix unused import * test rerank	2024-12-19 17:03:34 -08:00
Ishaan Jaff	6220e17ebf	(feat proxy) v2 - model max budgets (#7302 ) * clean up unused code * add _PROXY_VirtualKeyModelMaxBudgetLimiter * adjust type imports * working _PROXY_VirtualKeyModelMaxBudgetLimiter * fix user_api_key_model_max_budget * fix user_api_key_model_max_budget * update naming * update naming * fix changes to RouterBudgetLimiting * test_call_with_key_over_model_budget * test_call_with_key_over_model_budget * handle _get_request_model_budget_config * e2e test for test_call_with_key_over_model_budget * clean up test * run ci/cd again * add validate_model_max_budget * docs fix * update doc * add e2e testing for _PROXY_VirtualKeyModelMaxBudgetLimiter * test_unit_test_max_model_budget_limiter.py	2024-12-18 19:42:46 -08:00
Krish Dholakia	1a4910f6c0	fix(health.md): add rerank model health check information (#7295 ) * fix(health.md): add rerank model health check information * build(model_prices_and_context_window.json): add gemini 2.0 for google ai studio - pricing + commercial rate limits * build(model_prices_and_context_window.json): add gemini-2.0 supports audio output = true * docs(team_model_add.md): clarify allowing teams to add models is an enterprise feature * fix(o1_transformation.py): add support for 'n', 'response_format' and 'stop' params for o1 and 'stream_options' param for o1-mini * build(model_prices_and_context_window.json): add 'supports_system_message' to supporting openai models needed as o1-preview, and o1-mini models don't support 'system message * fix(o1_transformation.py): translate system message based on if o1 model supports it * fix(o1_transformation.py): return 'stream' param support if o1-mini/o1-preview o1 currently doesn't support streaming, but the other model versions do Fixes https://github.com/BerriAI/litellm/issues/7292 * fix(o1_transformation.py): return tool calling/response_format in supported params if model map says so Fixes https://github.com/BerriAI/litellm/issues/7292 * fix: fix linting errors * fix: update '_transform_messages' * fix(o1_transformation.py): fix provider passed for supported param checks * test(base_llm_unit_tests.py): skip test if api takes >5s to respond * fix(utils.py): return false in 'supports_factory' if can't find value * fix(o1_transformation.py): always return stream + stream_options as supported params + handle stream options being passed in for azure o1 * feat(openai.py): support stream faking natively in openai handler Allows o1 calls to be faked for just the "o1" model, allows native streaming for o1-mini, o1-preview Fixes https://github.com/BerriAI/litellm/issues/7292 * fix(openai.py): use inference param instead of original optional param	2024-12-18 19:18:10 -08:00
Ishaan Jaff	70883bc1b8	(feat - proxy) Add `status_code` to `litellm_proxy_total_requests_metric_total` (#7293 ) * fix _select_model_name_for_cost_calc docstring * add STATUS_CODE to prometheus * test prometheus unit tests * test_prometheus_unit_tests.py * update Proxy Level Tracking Metrics docs * fix test_proxy_failure_metrics * fix test_proxy_failure_metrics	2024-12-18 15:55:02 -08:00
Krish Dholakia	050499ec8f	Litellm dev readd prompt caching (#7299 ) * fix(router.py): re-add saving model id on prompt caching valid successful deployment * fix(router.py): introduce optional pre_call_checks isolate prompt caching logic in a separate file * fix(prompt_caching_deployment_check.py): fix import * fix(router.py): new 'async_filter_deployments' event hook allows custom logger to filter deployments returned to routing strategy * feat(prompt_caching_deployment_check.py): initial working commit of prompt caching based routing * fix(cooldown_callbacks.py): fix linting error * fix(budget_limiter.py): move budget logger to async_filter_deployment hook * test: add unit test * test(test_router_helper_utils.py): add unit testing * fix(budget_limiter.py): fix linting errors * docs(config_settings.md): add 'optional_pre_call_checks' to router_settings param docs	2024-12-18 15:13:49 -08:00
Krish Dholakia	f966e279a6	LiteLLM Minor Fixes & Improvements (12/16/2024) - p1 (#7263 ) * fix(factory.py): skip empty text blocks for bedrock user messages Fixes https://github.com/BerriAI/litellm/issues/7169 * Add support for Gemini 2.0 GoogleSearch tool (#7257) * Add support for google_search tool in gemini 2.0 * Add/modify tests * Fix grounding check * Remove 2.0 grounding test; exclude experimental model in VERTEX_MODELS_TO_NOT_TEST * Swap order of tools * DFix formatting * fix(get_api_base.py): return api base in streaming response Fixes https://github.com/BerriAI/litellm/issues/7249 Closes https://github.com/BerriAI/litellm/pull/7250 * fix(cost_calculator.py): only set base model to model if not none Fixes https://github.com/BerriAI/litellm/issues/7223 * fix(cost_calculator.py): enforce stricter order when picking model for cost calculation * fix(cost_calculator.py): fix '_select_model_name_for_cost_calc' to return model name with region name prefix if provided * fix(utils.py): fix 'get_model_info()' to handle edge case where model name starts with custom llm provider AND custom llm provider is given * fix(cost_calculator.py): handle `custom_llm_provider-` scenario * fix(cost_calculator.py): e2e working tts cost tracking ensures initial message is passed in, to cost calculator * fix(factory.py): suppress linting errors * fix(cost_calculator.py): strip llm provider from model name after selecting cost calc model * fix(litellm_logging.py): store initial request in 'input' field + accept base_model to be passed in litellm_params directly * test: handle none env var value in flaky test * fix(litellm_logging.py): fix linting errors --------- Co-authored-by: Sam B <samlingx@gmail.com>	2024-12-17 15:33:36 -08:00
Ishaan Jaff	2459f9735d	(feat) Add Tag-based budgets on litellm router / proxy (#7236 ) * add BudgetConfig * add _get_tags_from_request_kwargs * test_tag_budgets_e2e_test_expect_to_fail * add a check for request tags * fix _async_get_cache_keys_for_router_budget_limiting * fix test * fix _sync_in_memory_spend_with_redis * _async_get_cache_keys_for_router_budget_limiting * fix _init_tag_budgets * fix type casting * docs show error for tag budget limit hit * fix _get_tags_from_request_kwargs * fix undo change	2024-12-14 17:28:36 -08:00

1 2 3 4 5 ...

432 commits