Krish Dholakia
e9aa492af3
LiteLLM Minor Fixes & Improvement (11/14/2024) ( #6730 )
...
* fix(ollama.py): fix get model info request
Fixes https://github.com/BerriAI/litellm/issues/6703
* feat(anthropic/chat/transformation.py): support passing user id to anthropic via openai 'user' param
* docs(anthropic.md): document all supported openai params for anthropic
* test: fix tests
* fix: fix tests
* feat(jina_ai/): add rerank support
Closes https://github.com/BerriAI/litellm/issues/6691
* test: handle service unavailable error
* fix(handler.py): refactor together ai rerank call
* test: update test to handle overloaded error
* test: fix test
* Litellm router trace (#6742 )
* feat(router.py): add trace_id to parent functions - allows tracking retry/fallbacks
* feat(router.py): log trace id across retry/fallback logic
allows grouping llm logs for the same request
* test: fix tests
* fix: fix test
* fix(transformation.py): only set non-none stop_sequences
* Litellm router disable fallbacks (#6743 )
* bump: version 1.52.6 → 1.52.7
* feat(router.py): enable dynamically disabling fallbacks
Allows for enabling/disabling fallbacks per key
* feat(litellm_pre_call_utils.py): support setting 'disable_fallbacks' on litellm key
* test: fix test
* fix(exception_mapping_utils.py): map 'model is overloaded' to internal server error
* test: handle gemini error
* test: fix test
* fix: new run
2024-11-15 01:02:54 +05:30
Krish Dholakia
3a6ba0b955
Litellm perf improvements 3 ( #6573 )
...
* perf: move writing key to cache, to background task
* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils
adds 200ms on calls with pgdb connected
* fix(litellm_pre_call_utils.py): rename call_type to actual call used
* perf(proxy_server.py): remove db logic from _get_config_from_file
was causing db calls to occur on every llm request, if team_id was set on key
* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db
reduces latency/call by ~100ms
* fix(proxy_server.py): minor fix on existing_settings not incl alerting
* fix(exception_mapping_utils.py): map databricks exception string
* fix(auth_checks.py): fix auth check logic
* test: correctly mark flaky test
* fix(utils.py): handle auth token error for tokenizers.from_pretrained
2024-11-05 03:51:26 +05:30
Krish Dholakia
c03e5da41f
LiteLLM Minor Fixes & Improvements (10/24/2024) ( #6421 )
...
* fix(utils.py): support passing dynamic api base to validate_environment
Returns True if just api base is required and api base is passed
* fix(litellm_pre_call_utils.py): feature flag sending client headers to llm api
Fixes https://github.com/BerriAI/litellm/issues/6410
* fix(anthropic/chat/transformation.py): return correct error message
* fix(http_handler.py): add error response text in places where we expect it
* fix(factory.py): handle base case of no non-system messages to bedrock
Fixes https://github.com/BerriAI/litellm/issues/6411
* feat(cohere/embed): Support cohere image embeddings
Closes https://github.com/BerriAI/litellm/issues/6413
* fix(__init__.py): fix linting error
* docs(supported_embedding.md): add image embedding example to docs
* feat(cohere/embed): use cohere embedding returned usage for cost calc
* build(model_prices_and_context_window.json): add embed-english-v3.0 details (image cost + 'supports_image_input' flag)
* fix(cohere_transformation.py): fix linting error
* test(test_proxy_server.py): cleanup test
* test: cleanup test
* fix: fix linting errors
2024-10-25 15:55:56 -07:00
Krish Dholakia
9fccf829b1
feat(litellm_pre_call_utils.py): support 'add_user_information_to_llm… ( #6390 )
...
* feat(litellm_pre_call_utils.py): support 'add_user_information_to_llm_headers' param
enables passing user info to backend llm (user request for custom vllm server)
* fix(litellm_logging.py): fix linting error
2024-10-24 22:03:16 -07:00
Ishaan Jaff
610974b4fc
(code quality) add ruff check PLR0915 for too-many-statements ( #6309 )
...
* ruff add PLR0915
* add noqa for PLR0915
* fix noqa
* add # noqa: PLR0915
* # noqa: PLR0915
* # noqa: PLR0915
* # noqa: PLR0915
* add # noqa: PLR0915
* # noqa: PLR0915
* # noqa: PLR0915
* # noqa: PLR0915
* # noqa: PLR0915
2024-10-18 15:36:49 +05:30
Krish Dholakia
38a9a106d2
LiteLLM Minor Fixes & Improvements (10/16/2024) ( #6265 )
...
* fix(caching_handler.py): handle positional arguments in add cache logic
Fixes https://github.com/BerriAI/litellm/issues/6264
* feat(litellm_pre_call_utils.py): allow forwarding openai org id to backend client
https://github.com/BerriAI/litellm/issues/6237
* docs(configs.md): add 'forward_openai_org_id' to docs
* fix(proxy_server.py): return model info if user_model is set
Fixes https://github.com/BerriAI/litellm/issues/6233
* fix(hosted_vllm/chat/transformation.py): don't set tools unless non-none
* fix(openai.py): improve debug log for openai 'str' error
Addresses https://github.com/BerriAI/litellm/issues/6272
* fix(proxy_server.py): fix linting error
* fix(proxy_server.py): fix linting errors
* test: skip WIP test
* docs(openai.md): add docs on passing openai org id from client to openai
2024-10-16 22:16:23 -07:00
Krish Dholakia
54ebdbf7ce
LiteLLM Minor Fixes & Improvements (10/15/2024) ( #6242 )
...
* feat(litellm_pre_call_utils.py): support forwarding request headers to backend llm api
* fix(litellm_pre_call_utils.py): handle custom litellm key header
* test(router_code_coverage.py): check if all router functions are dire… (#6186 )
* test(router_code_coverage.py): check if all router functions are directly tested
prevent regressions
* docs(configs.md): document all environment variables (#6185 )
* docs: make it easier to find anthropic/openai prompt caching doc
* added codecov yml (#6207 )
* fix codecov.yaml
* run ci/cd again
* (refactor) caching use LLMCachingHandler for async_get_cache and set_cache (#6208 )
* use folder for caching
* fix importing caching
* fix clickhouse pyright
* fix linting
* fix correctly pass kwargs and args
* fix test case for embedding
* fix linting
* fix embedding caching logic
* fix refactor handle utils.py
* fix test_embedding_caching_azure_individual_items_reordered
* (feat) prometheus have well defined latency buckets (#6211 )
* fix prometheus have well defined latency buckets
* use a well-defined latency bucket
* use types file for prometheus logging
* add test for LATENCY_BUCKETS
* fix prom testing
* fix config.yml
* (refactor caching) use LLMCachingHandler for caching streaming responses (#6210 )
* use folder for caching
* fix importing caching
* fix clickhouse pyright
* fix linting
* fix correctly pass kwargs and args
* fix test case for embedding
* fix linting
* fix embedding caching logic
* fix refactor handle utils.py
* refactor async set stream cache
* fix linting
* bump (#6187 )
* update code cov yaml
* fix config.yml
* add caching component to code cov
* fix config.yml ci/cd
* add coverage for proxy auth
* (refactor caching) use common `_retrieve_from_cache` helper (#6212 )
* use folder for caching
* fix importing caching
* fix clickhouse pyright
* fix linting
* fix correctly pass kwargs and args
* fix test case for embedding
* fix linting
* fix embedding caching logic
* fix refactor handle utils.py
* refactor async set stream cache
* fix linting
* refactor - use _retrieve_from_cache
* refactor use _convert_cached_result_to_model_response
* fix linting errors
* bump: version 1.49.2 → 1.49.3
* fix code cov components
* test(test_router_helpers.py): add router component unit tests
* test: add additional router tests
* test: add more router testing
* test: add more router testing + more mock functions
* ci(router_code_coverage.py): fix check
---------
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: yujonglee <yujonglee.dev@gmail.com>
* bump: version 1.49.3 → 1.49.4
* (refactor) use helper function `_assemble_complete_response_from_streaming_chunks` to assemble complete responses in caching and logging callbacks (#6220 )
* (refactor) use _assemble_complete_response_from_streaming_chunks
* add unit test for test_assemble_complete_response_from_streaming_chunks_1
* fix assemble complete_streaming_response
* config add logging_testing
* add logging_coverage in codecov
* test test_assemble_complete_response_from_streaming_chunks_3
* add unit tests for _assemble_complete_response_from_streaming_chunks
* fix remove unused / junk function
* add test for streaming_chunks when error assembling
* (refactor) OTEL - use safe_set_attribute for setting attributes (#6226 )
* otel - use safe_set_attribute for setting attributes
* fix OTEL only use safe_set_attribute
* (fix) prompt caching cost calculation OpenAI, Azure OpenAI (#6231 )
* fix prompt caching cost calculation
* fix testing for prompt cache cost calc
* fix(allowed_model_region): allow us as allowed region (#6234 )
* fix(allowed_model_region): allow us as allowed region
---------
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: yujonglee <yujonglee.dev@gmail.com>
* fix(litellm_pre_call_utils.py): support 'us' region routing + fix header forwarding to filter on `x-` headers
* docs(customer_routing.md): fix region-based routing example
* feat(azure.py): handle empty arguments function call - azure
Closes https://github.com/BerriAI/litellm/issues/6241
* feat(guardrails_ai.py): support guardrails ai integration
Adds support for on-prem guardrails via guardrails ai
* fix(proxy/utils.py): prevent sql injection attack
Fixes https://huntr.com/bounties/a4f6d357-5b44-4e00-9cac-f1cc351211d2
* fix: fix linting errors
* fix(litellm_pre_call_utils.py): don't log litellm api key in proxy server request headers
* fix(litellm_pre_call_utils.py): don't forward stainless headers
* docs(guardrails_ai.md): add guardrails ai quick start to docs
* test: handle flaky test
---------
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: yujonglee <yujonglee.dev@gmail.com>
Co-authored-by: Marcus Elwin <marcus@elwin.com>
2024-10-16 07:32:06 -07:00
Krish Dholakia
0b30e212da
LiteLLM Minor Fixes & Improvements (09/27/2024) ( #5938 )
...
* fix(langfuse.py): prevent double logging requester metadata
Fixes https://github.com/BerriAI/litellm/issues/5935
* build(model_prices_and_context_window.json): add mistral pixtral cost tracking
Closes https://github.com/BerriAI/litellm/issues/5837
* handle streaming for azure ai studio error
* [Perf Proxy] parallel request limiter - use one cache update call (#5932 )
* fix parallel request limiter - use one cache update call
* ci/cd run again
* run ci/cd again
* use docker username password
* fix config.yml
* fix config
* fix config
* fix config.yml
* ci/cd run again
* use correct typing for batch set cache
* fix async_set_cache_pipeline
* fix only check user id tpm / rpm limits when limits set
* fix test_openai_azure_embedding_with_oidc_and_cf
* fix(groq/chat/transformation.py): Fixes https://github.com/BerriAI/litellm/issues/5839
* feat(anthropic/chat.py): return 'retry-after' headers from anthropic
Fixes https://github.com/BerriAI/litellm/issues/4387
* feat: raise validation error if message has tool calls without passing `tools` param for anthropic/bedrock
Closes https://github.com/BerriAI/litellm/issues/5747
* [Feature]#5940, add max_workers parameter for the batch_completion (#5947 )
* handle streaming for azure ai studio error
* bump: version 1.48.2 → 1.48.3
* docs(data_security.md): add legal/compliance faq's
Make it easier for companies to use litellm
* docs: resolve imports
* [Feature]#5940, add max_workers parameter for the batch_completion method
---------
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: josearangos <josearangos@Joses-MacBook-Pro.local>
* fix(converse_transformation.py): fix default message value
* fix(utils.py): fix get_model_info to handle finetuned models
Fixes issue for standard logging payloads, where model_map_value was null for finetuned openai models
* fix(litellm_pre_call_utils.py): add debug statement for data sent after updating with team/key callbacks
* fix: fix linting errors
* fix(anthropic/chat/handler.py): fix cache creation input tokens
* fix(exception_mapping_utils.py): fix missing imports
* fix(anthropic/chat/handler.py): fix usage block translation
* test: fix test
* test: fix tests
* style(types/utils.py): trigger new build
* test: fix test
---------
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Jose Alberto Arango Sanchez <jose.arangos@udea.edu.co>
Co-authored-by: josearangos <josearangos@Joses-MacBook-Pro.local>
2024-09-27 22:52:57 -07:00
Ishaan Jaff
036fce8f18
[Fix] Tag Based Routing not working with wildcard routing ( #5805 )
...
* allow using tag routing for free
* only enforce tags for teams / keys
2024-09-20 14:05:56 -07:00
Ishaan Jaff
7f4dfe434a
[Fix] o1-mini causes pydantic warnings on reasoning_tokens ( #5754 )
...
* add requester_metadata in standard logging payload
* log requester_metadata in metadata
* use StandardLoggingPayload for logging
* docs StandardLoggingPayload
* fix import
* include standard logging object in failure
* add test for requester metadata
* handle completion_tokens_details
* add test for completion_tokens_details
2024-09-17 20:23:14 -07:00
Krish Dholakia
98c34a7e27
LiteLLM Minor Fixes and Improvements (11/09/2024) ( #5634 )
...
* fix(caching.py): set ttl for async_increment cache
fixes issue where ttl for redis client was not being set on increment_cache
Fixes https://github.com/BerriAI/litellm/issues/5609
* fix(caching.py): fix increment cache w/ ttl for sync increment cache on redis
Fixes https://github.com/BerriAI/litellm/issues/5609
* fix(router.py): support adding retry policy + allowed fails policy via config.yaml
* fix(router.py): don't cooldown single deployments
No point, as there's no other deployment to loadbalance with.
* fix(user_api_key_auth.py): support setting allowed email domains on jwt tokens
Closes https://github.com/BerriAI/litellm/issues/5605
* docs(token_auth.md): add user upsert + allowed email domain to jwt auth docs
* fix(litellm_pre_call_utils.py): fix dynamic key logging when team id is set
Fixes issue where key logging would not be set if team metadata was not none
* fix(secret_managers/main.py): load environment variables correctly
Fixes issue where os.environ/ was not being loaded correctly
* test(test_router.py): fix test
* feat(spend_tracking_utils.py): support logging additional usage params - e.g. prompt caching values for deepseek
* test: fix tests
* test: fix test
* test: fix test
* test: fix test
* test: fix test
2024-09-11 22:36:06 -07:00
Ishaan Jaff
bbdcc75c60
fix log failures for key based logging
2024-09-09 16:33:06 -07:00
Ishaan Jaff
9b5164b38d
fix allow setting language per call to presidio
2024-09-04 12:46:59 -07:00
Krrish Dholakia
afb00a27cb
fix(litellm_pre_call_utils.py): don't override k-v pair sent in spend_logs_metadata by user
2024-08-23 07:10:18 -07:00
Krrish Dholakia
7aec6f0f2a
fix(litellm_pre_call_utils.py): handle dynamic keys via api correctly
2024-08-21 13:37:21 -07:00
Krrish Dholakia
04fc0bd7b3
feat(litellm_pre_call_utils.py): support passing tags/spend logs metadata from keys/team metadata to request
2024-08-21 08:13:36 -07:00
Ishaan Jaff
cea7b73015
enforce guardrails per API Key as enterprise
2024-08-20 17:34:28 -07:00
Krrish Dholakia
5f565806bd
fix(litellm_pre_call_utils.py): only pass api_version if set
2024-08-20 16:00:46 -07:00
Krish Dholakia
409306b266
Merge branch 'main' into litellm_fix_azure_api_version
2024-08-20 11:40:53 -07:00
Ishaan Jaff
aceab2669f
test guardrails with API Key
2024-08-20 08:40:00 -07:00
Krrish Dholakia
a85a932e25
fix(litellm_pre_call_utils.py): handle no query params in request
2024-08-19 21:09:03 -07:00
Ishaan Jaff
eb9da06033
feat - guardrails v2
2024-08-19 21:03:37 -07:00
Krrish Dholakia
2aba1f17cc
feat(langfuse_endpoints.py): support team based logging for langfuse pass-through endpoints
2024-08-19 21:03:37 -07:00
Ishaan Jaff
c7b3978655
Merge pull request #5288 from BerriAI/litellm_aporia_refactor
...
[Feat] V2 aporia guardrails litellm
2024-08-19 20:41:45 -07:00
Ishaan Jaff
8cd1963c11
feat - guardrails v2
2024-08-19 18:24:20 -07:00
Krrish Dholakia
f9640d8a58
feat(langfuse_endpoints.py): support team based logging for langfuse pass-through endpoints
2024-08-19 17:58:39 -07:00
Krrish Dholakia
49416e121c
feat(azure.py): support dynamic api versions
...
Closes https://github.com/BerriAI/litellm/issues/5228
2024-08-19 12:17:43 -07:00
Ishaan Jaff
6cb3675a06
fix using prompt caching on proxy
2024-08-15 20:12:11 -07:00
Krish Dholakia
22243c6571
Merge pull request #5176 from BerriAI/litellm_key_logging
...
Allow specifying langfuse project for logging in key metadata
2024-08-14 12:55:07 -07:00
Ishaan Jaff
0c6c350c23
feat log use_x_forwarded_for
2024-08-13 15:22:54 -07:00
Krrish Dholakia
93a1335e46
fix(litellm_pre_call_utils.py): support routing to logging project by api key
2024-08-12 21:21:40 -07:00
Ishaan Jaff
70c836623d
use litellm.forward_traceparent_to_llm_provider
2024-08-01 09:05:13 -07:00
Ishaan Jaff
56ce7e892d
fix batches inserting metadata
2024-07-26 18:08:54 -07:00
Ishaan Jaff
a71b60d005
Pass litellm proxy specific metadata
2024-07-23 15:31:30 -07:00
Ishaan Jaff
24ae0119d1
add debug logging for team callback settings
2024-07-23 08:41:05 -07:00
Ishaan Jaff
dcd8f7ebf2
control team callbacks using API
2024-07-22 18:29:21 -07:00
Ishaan Jaff
502b739b33
add tags to metadata
2024-07-18 21:55:53 -07:00
Ishaan Jaff
f3e0a89597
check if using tag based routing
2024-07-18 20:10:45 -07:00
Ishaan Jaff
b6e60d481e
fix remove previous code on free/paid tier
2024-07-18 19:24:13 -07:00
Ishaan Jaff
fda5578263
feat - enterprise
2024-07-18 17:15:47 -07:00
Ishaan Jaff
3dfeee03d0
fix pre call utils on embedding
2024-07-17 18:29:34 -07:00
Ishaan Jaff
12f207b499
feat - support /create assistants endpoint
2024-07-09 10:03:47 -07:00
Ishaan Jaff
626c630eaf
track user_ip address per request
2024-07-08 09:00:08 -07:00
Krrish Dholakia
20e39d6acc
fix(utils.py): cleanup 'additionalProperties=False' for tool calling with zod
...
Fixes issue with zod passing in additionalProperties=False, causing vertex ai / gemini calls to fail
2024-07-06 17:27:37 -07:00
Ishaan Jaff
3bcf9dd9fb
Merge branch 'main' into litellm_fix_in_mem_usage
2024-06-27 21:12:06 -07:00
Ishaan Jaff
413877d1c6
fix pre call utils adding extra headers
2024-06-27 21:03:36 -07:00
Ishaan Jaff
b9bc16590d
forward otel traceparent in request headers
2024-06-27 20:20:46 -07:00
Ishaan Jaff
b16b846711
forward otel traceparent in request headers
2024-06-26 12:31:28 -07:00
Ishaan Jaff
5e2af8236a
fix - thread create endpoints
2024-06-18 07:54:47 -07:00
Ishaan Jaff
9b340fb2f8
feat - add remaining budget for key on prometheus
2024-06-13 14:37:02 -07:00