Commit graph

80 commits

Author SHA1 Message Date
Ishaan Jaff
104e4cb1bc
[Feat] Add infinity embedding support (contributor pr) (#10196)
* Feature - infinity support for #8764 (#10009)

* Added support for infinity embeddings

* Added test cases

* Fixed tests and api base

* Updated docs and tests

* Removed unused import

* Updated signature

* Added support for infinity embeddings

* Added test cases

* Fixed tests and api base

* Updated docs and tests

* Removed unused import

* Updated signature

* Updated validate params

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>

* fix InfinityEmbeddingConfig

---------

Co-authored-by: Prathamesh Saraf <pratamesh1867@gmail.com>
2025-04-21 20:01:29 -07:00
Krish Dholakia
36308a31be
Gemini-2.5-flash - support reasoning cost calc + return reasoning content (#10141)
* build(model_prices_and_context_window.json): add vertex ai gemini-2.5-flash pricing

* build(model_prices_and_context_window.json): add gemini reasoning token pricing

* fix(vertex_and_google_ai_studio_gemini.py): support counting thinking tokens for gemini

allows accurate cost calc

* fix(utils.py): add reasoning token cost calc to generic cost calc

ensures gemini-2.5-flash cost calculation is accurate

* build(model_prices_and_context_window.json): mark gemini-2.5-flash as 'supports_reasoning'

* feat(gemini/): support 'thinking' + 'reasoning_effort' params + new unit tests

allow controlling thinking effort for gemini-2.5-flash models

* test: update unit testing

* feat(vertex_and_google_ai_studio_gemini.py): return reasoning content if given in gemini response

* test: update model name

* fix: fix ruff check

* test(test_spend_management_endpoints.py): update tests to be less sensitive to new keys / updates to usage object

* fix(vertex_and_google_ai_studio_gemini.py): fix translation
2025-04-19 09:20:52 -07:00
Krish Dholakia
2ed593e052
Updated cohere v2 passthrough (#9997)
* Add cohere `/v2/chat` pass-through cost tracking support (#8235)

* feat(cohere_passthrough_handler.py): initial working commit with cohere passthrough cost tracking

* fix(v2_transformation.py): support cohere /v2/chat endpoint

* fix: fix linting errors

* fix: fix import

* fix(v2_transformation.py): fix linting error

* test: handle openai exception change
2025-04-14 19:51:01 -07:00
Krrish Dholakia
aa2489d74f build(.gitignore): update gitignore
2025-03-29 11:37:00 -07:00
NickGrab
b72fbdde74
Merge branch 'main' into litellm_8864-feature-vertex-anyOf-support 2025-03-28 10:25:04 -07:00
Nicholas Grabar
f68cc26f15 8864 Add support for anyOf union type while handling null fields 2025-03-25 22:37:28 -07:00
Tyler Hutcherson
7864cd1f76 update redisvl dependency 2025-03-24 08:42:11 -04:00
Krrish Dholakia
665fdfc788 feat(prisma_client.py): initial commit add prisma migration support to proxy 2025-03-19 14:26:59 -07:00
Krish Dholakia
88eedb22b9
vertex ai anthropic thinking param support (#8853)
* fix(vertex_llm_base.py): handle credentials passed in as dictionary

* fix(router.py): support vertex credentials as json dict

* test(test_vertex.py): allows easier testing

mock anthropic thinking response for vertex ai

* test(vertex_ai_partner_models/): don't remove "@" from model

breaks anthropic cost calculation

* test: move testing

* fix: fix linting error

* fix: fix linting error

* fix(vertex_ai_partner_models/main.py): split @ for codestral model

* test: fix test

* fix: fix stripping "@" on mistral models

* fix: fix test

* test: fix test
2025-02-26 21:37:18 -08:00
Krish Dholakia
b4e5c0de69
Improve rpm check on keys (#8301)
* fix(parallel_request_limiter.py): initial commit that solves the rpm limit check on keys

Fixes https://github.com/BerriAI/litellm/issues/6938

* fix(parallel_request_limiter.py): simpler approach - just increment RPM in pre call hook instead of on success

* fix(parallel_request_limiter.py): pass testing

* fix: fix linting error

* fix(parallel_request_limiter.py): fix parallel request check for keys
2025-02-05 20:23:08 -08:00
Krrish Dholakia
f70e8adc10 build: update .gitignore 2025-02-04 21:43:09 -08:00
Krish Dholakia
c17342ac5b
fix(openai/): allows 'reasoning_effort' param to be passed correctly (#8227)
* fix(openai/): allows 'reasoning_effort' param to be passed correctly

Fixes https://github.com/BerriAI/litellm/issues/8217

* test: update test to handle gemini token counter change

* fix(factory.py): fix bedrock http:// handling

* test: fix test

* test: update testing for new openai sdk
2025-02-03 22:39:10 -08:00
Krish Dholakia
c8494abdea
test(base_llm_unit_tests.py): add test to ensure drop params is respe… (#8224)
* test(base_llm_unit_tests.py): add test to ensure drop params is respected

* fix(types/prometheus.py): use typing_extensions for python3.8 compatibility

* build: add cherry picked commits
2025-02-03 16:04:44 -08:00
Ishaan Jaff
7f2742334c
(UI) - allow assigning wildcard models to a team / key (#8041)
* fix message.error

* fix add return_wildcard_routes

* ui edit modelAvailableCall

* fetchAvailableModelsForTeamOrKey

* ui set all models for a team

* ui define common helpers

* edit create key button

* fix viewing model display names

* fix editing team models

* update gitignore

* add jest testing for ui

* Revert "add jest testing for ui"

This reverts commit 98f9a3ebfd.
2025-01-27 18:06:22 -08:00
Krish Dholakia
744b7709ca
Litellm dev 01 23 2025 p2 (#7962)
* fix(ui/): revert user team key view

* fix(view_key_table.tsx): fix default team view - show all personal keys

* fix(navbar.tsx): fix custom logo

Fixes https://github.com/BerriAI/litellm/issues/7895

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
2025-01-23 21:02:15 -08:00
Ishaan Jaff
fd5cd422f0
Revert "Remove UI build output" (#7861) 2025-01-18 12:54:25 -08:00
yujonglee
eb91ceb8d5 remove ui build 2025-01-18 16:55:45 +09:00
Ishaan Jaff
02875d4ae8
(fix) aiohttp_openai/ route - get to 1K RPS on single instance (#7539)
* ClientSession

* re use client_session

* _init_client_session

* fix aiohttp
2025-01-03 15:12:17 -08:00
Krish Dholakia
3671829e39
Complete 'requests' library removal (#7350)
* refactor: initial commit moving watsonx_text to base_llm_http_handler + clarifying new provider directory structure

* refactor(watsonx/completion/handler.py): move to using base llm http handler

removes 'requests' library usage

* fix(watsonx_text/transformation.py): fix result transformation

migrates to transformation.py, for usage with base llm http handler

* fix(streaming_handler.py): migrate watsonx streaming to transformation.py

ensures streaming works with base llm http handler

* fix(streaming_handler.py): fix streaming linting errors and remove watsonx conditional logic

* fix(watsonx/): fix chat route post completion route refactor

* refactor(watsonx/embed): refactor watsonx to use base llm http handler for embedding calls as well

* refactor(base.py): remove requests library usage from litellm

* build(pyproject.toml): remove requests library usage

* fix: fix linting errors

* fix: fix linting errors

* fix(types/utils.py): fix validation errors for modelresponsestream

* fix(replicate/handler.py): fix linting errors

* fix(litellm_logging.py): handle modelresponsestream object

* fix(streaming_handler.py): fix modelresponsestream args

* fix: remove unused imports

* test: fix test

* fix: fix test

* test: fix test

* test: fix tests

* test: fix test

* test: fix patch target

* test: fix test
2024-12-22 07:21:25 -08:00
Krish Dholakia
404bf2974b
Litellm dev 2024 12 20 p1 (#7335)
* fix(utils.py): e2e azure tts cost tracking working

moves tts response obj to include hidden params (allows for litellm call id, etc. to be sent in response headers); fixes spend_tracking_utils logging payload to account for non-base model use-case

Fixes https://github.com/BerriAI/litellm/issues/7223

* fix: fix linting errors

* build(model_prices_and_context_window.json): add bedrock llama 3.3

Closes https://github.com/BerriAI/litellm/issues/7329

* fix(openai.py): fix return type for sync openai httpx response

* test: update test

* fix(spend_tracking_utils.py): fix if check

* fix(spend_tracking_utils.py): fix if check

* test: improve debugging for test

* fix: fix import
2024-12-20 21:22:31 -08:00
Krish Dholakia
f79365df6e
LiteLLM Minor Fixes & Improvements (10/30/2024) (#6519)
* refactor: move gemini translation logic inside the transformation.py file

easier to isolate the gemini translation logic

* fix(gemini-transformation): support multiple tool calls in message body

Merges https://github.com/BerriAI/litellm/pull/6487/files

* test(test_vertex.py): add remaining tests from https://github.com/BerriAI/litellm/pull/6487

* fix(gemini-transformation): return tool calls for multiple tool calls

* fix: support passing logprobs param for vertex + gemini

* feat(vertex_ai): add logprobs support for gemini calls

* fix(anthropic/chat/transformation.py): fix disable parallel tool use flag

* fix: fix linting error

* fix(_logging.py): log stacktrace information in json logs

Closes https://github.com/BerriAI/litellm/issues/6497

* fix(utils.py): fix mem leak for async stream + completion

Uses a global executor pool instead of creating a new thread on each request

Fixes https://github.com/BerriAI/litellm/issues/6404

* fix(factory.py): handle tool call + content in assistant message for bedrock

* fix: fix import

* fix(factory.py): maintain support for content as a str in assistant response

* fix: fix import

* test: cleanup test

* fix(vertex_and_google_ai_studio/): return none for content if no str value

* test: retry flaky tests

* (UI) Fix viewing members, keys in a team + added testing (#6514)

* fix listing teams on ui

* LiteLLM Minor Fixes & Improvements (10/28/2024) (#6475)

* fix(anthropic/chat/transformation.py): support anthropic disable_parallel_tool_use param

Fixes https://github.com/BerriAI/litellm/issues/6456

* feat(anthropic/chat/transformation.py): support anthropic computer tool use

Closes https://github.com/BerriAI/litellm/issues/6427

* fix(vertex_ai/common_utils.py): parse out '$schema' when calling vertex ai

Fixes issue when trying to call vertex from vercel sdk

* fix(main.py): add 'extra_headers' support for azure on all translation endpoints

Fixes https://github.com/BerriAI/litellm/issues/6465

* fix: fix linting errors

* fix(transformation.py): handle no beta headers for anthropic

* test: cleanup test

* fix: fix linting error

* fix: fix linting errors

* fix: fix linting errors

* fix(transformation.py): handle dummy tool call

* fix(main.py): fix linting error

* fix(azure.py): pass required param

* LiteLLM Minor Fixes & Improvements (10/24/2024) (#6441)

* fix(azure.py): handle /openai/deployment in azure api base

* fix(factory.py): fix faulty anthropic tool result translation check

Fixes https://github.com/BerriAI/litellm/issues/6422

* fix(gpt_transformation.py): add support for parallel_tool_calls to azure

Fixes https://github.com/BerriAI/litellm/issues/6440

* fix(factory.py): support anthropic prompt caching for tool results

* fix(vertex_ai/common_utils): don't pop non-null required field

Fixes https://github.com/BerriAI/litellm/issues/6426

* feat(vertex_ai.py): support code_execution tool call for vertex ai + gemini

Closes https://github.com/BerriAI/litellm/issues/6434

* build(model_prices_and_context_window.json): Add 'supports_assistant_prefill' for bedrock claude-3-5-sonnet v2 models

Closes https://github.com/BerriAI/litellm/issues/6437

* fix(types/utils.py): fix linting

* test: update test to include required fields

* test: fix test

* test: handle flaky test

* test: remove e2e test - hitting gemini rate limits

* Litellm dev 10 26 2024 (#6472)

* docs(exception_mapping.md): add missing exception types

Fixes https://github.com/Aider-AI/aider/issues/2120#issuecomment-2438971183

* fix(main.py): register custom model pricing with specific key

Ensure custom model pricing is registered to the specific model+provider key combination

* test: make testing more robust for custom pricing

* fix(redis_cache.py): instrument otel logging for sync redis calls

ensures complete coverage for all redis cache calls

* (Testing) Add unit testing for DualCache - ensure in memory cache is used when expected (#6471)

* test test_dual_cache_get_set

* unit testing for dual cache

* fix async_set_cache_sadd

* test_dual_cache_local_only

* redis otel tracing + async support for latency routing (#6452)

* docs(exception_mapping.md): add missing exception types

Fixes https://github.com/Aider-AI/aider/issues/2120#issuecomment-2438971183

* fix(main.py): register custom model pricing with specific key

Ensure custom model pricing is registered to the specific model+provider key combination

* test: make testing more robust for custom pricing

* fix(redis_cache.py): instrument otel logging for sync redis calls

ensures complete coverage for all redis cache calls

* refactor: pass parent_otel_span for redis caching calls in router

allows for more observability into what calls are causing latency issues

* test: update tests with new params

* refactor: ensure e2e otel tracing for router

* refactor(router.py): add more otel tracing across router

catch all latency issues for router requests

* fix: fix linting error

* fix(router.py): fix linting error

* fix: fix test

* test: fix tests

* fix(dual_cache.py): pass ttl to redis cache

* fix: fix param

* fix(dual_cache.py): set default value for parent_otel_span

* fix(transformation.py): support 'response_format' for anthropic calls

* fix(transformation.py): check for cache_control inside 'function' block

* fix: fix linting error

* fix: fix linting errors

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>

---------

Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>

* ui new build

* Add retry strat (#6520)

Signed-off-by: dbczumar <corey.zumar@databricks.com>

* (fix) slack alerting - don't spam the failed cost tracking alert for the same model (#6543)

* fix use failing_model as cache key for failed_tracking_alert

* fix use standard logging payload for getting response cost

* fix kwargs.get("response_cost")

* fix getting response cost

* (feat) add XAI ChatCompletion Support (#6373)

* init commit for XAI

* add full logic for xai chat completion

* test_completion_xai

* docs xAI

* add xai/grok-beta

* test_xai_chat_config_get_openai_compatible_provider_info

* test_xai_chat_config_map_openai_params

* add xai streaming test

---------

Signed-off-by: dbczumar <corey.zumar@databricks.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Corey Zumar <39497902+dbczumar@users.noreply.github.com>
2024-11-02 00:44:32 +05:30
Krish Dholakia
e4dcd6f745
fix(navbar.tsx): only show 'get enterprise license' if user is not already a premium user (#5568) 2024-09-06 17:14:30 -07:00
Krrish Dholakia
3630896fde fix(team_endpoints.py): fix check 2024-07-16 22:05:48 -07:00
Krrish Dholakia
a4bea47a2d fix(router.py): log rejected router requests to langfuse
Fixes issue where rejected requests weren't being logged
2024-06-24 17:52:01 -07:00
Krrish Dholakia
9704e1cde0 test(test_alangfuse.py): fix test to flush in order 2024-06-20 10:22:15 -07:00
Krrish Dholakia
e66b3d264f fix(factory.py): handle bedrock claude image url's 2024-06-07 10:04:03 -07:00
Krrish Dholakia
52a2f5150c fix(utils.py): fix cost calculation for openai-compatible streaming object 2024-06-04 10:36:25 -07:00
Krrish Dholakia
3d18897d69 feat(router.py): enable filtering model group by 'allowed_model_region' 2024-05-08 22:10:17 -07:00
Krrish Dholakia
db666b01e5 feat(proxy_server.py): add CRUD endpoints for 'end_user' management
allow admin to specify region + default models for end users
2024-05-08 18:50:36 -07:00
Krish Dholakia
303e0c6226
Revert "* feat(factory.py): add support for merging consecutive messages of one role when separated with empty message of another role" 2024-05-07 21:42:18 -07:00
nkvch
389530efb4 * chore(.gitignore): add 'venv' to the list of ignored files/directories
* fix(test_completion.py): fix import order and remove unused imports
* feat(test_completion.py): add test for empty assistant message in completion_claude_3_empty_message()
2024-05-07 12:51:30 +02:00
Ishaan Jaff
df8e33739d
Revert "Add return_exceptions to litellm.batch_completion" 2024-05-04 13:01:17 -07:00
ffreemt
a7ec1772b1 Add litellm\tests\test_batch_completion_return_exceptions.py 2024-05-03 11:28:38 +08:00
Krrish Dholakia
160acc085a fix(router.py): fix default retry logic 2024-04-25 11:57:27 -07:00
Krrish Dholakia
bae6f41017 build(add-fallbacks-on-UI): allows admin to add fallbacks on the UI 2024-04-24 15:40:02 -07:00
Krrish Dholakia
76bd667b12 fix(caching.py): fix async batch redis get request 2024-04-10 21:37:45 -07:00
Krrish Dholakia
f4d34673d0 fix: update gitignore 2024-04-10 20:56:39 -07:00
Ishaan Jaff
9b7db7ea47 gitignore updates 2024-04-08 18:59:23 -07:00
Ishaan Jaff
1a96f52795 gitignore kub yaml 2024-04-08 18:30:28 -07:00
Daniel Fjeldstad
255dbdedff
Update .gitignore
Includes /node_modules
2024-04-02 21:31:57 +02:00
Debdut Chakraborty
7a1b3ca30d feat(helm-chart): redis as cache managed by chart 2024-03-10 01:53:28 +05:30
Lunik
880213d4a6
Refresh Helm chart structure
Signed-off-by: Lunik <lunik@tiwabbit.fr>
2024-02-16 22:59:14 +01:00
Krrish Dholakia
77fe71ee08 fix(utils.py): support together ai function calling 2024-02-05 15:30:44 -08:00
Shaun Maher
fb4c228c39 Merge upstream .gitignore changes 2024-01-30 09:38:54 +11:00
Shaun Maher
4d0e369e7a Authored a Helm chart for LiteLLM. Added GitHub workflows/actions to build and push the helm chart to the ghcr.io OCI registry. 2024-01-25 11:53:59 +11:00
ishaan-jaff
0c25d2e157 (chore) gitignore 2024-01-15 20:42:30 -08:00
Krrish Dholakia
a7245dba07 build(Dockerfile): fixes the build time setup 2024-01-06 16:41:37 +05:30
ishaan-jaff
357c6c56bd Revert "build(Dockerfile): move prisma build to dockerfile"
This reverts commit 2741835605.
2024-01-06 09:51:44 +05:30
Krrish Dholakia
2741835605 build(Dockerfile): move prisma build to dockerfile
Seems to solve - https://github.com/BerriAI/litellm/issues/1321
2024-01-05 19:03:41 +05:30
Krrish Dholakia
2070a785a4 feat(utils.py): support google kms for secret management
https://github.com/BerriAI/litellm/issues/1235
2023-12-26 15:39:40 +05:30