forked from phoenix/litellm-mirror
* fix(pattern_matching_router.py): update model name using correct function
* fix(langfuse.py): metadata deepcopy can cause unhandled error (#6563)
Co-authored-by: seva <seva@inita.com>
* fix(stream_chunk_builder_utils.py): correctly set prompt tokens + log correct streaming usage
Closes https://github.com/BerriAI/litellm/issues/6488
* build(deps): bump cookie and express in /docs/my-website (#6566)
Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.
Updates `cookie` from 0.6.0 to 0.7.1
- [Release notes](https://github.com/jshttp/cookie/releases)
- [Commits](https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.1)
Updates `express` from 4.20.0 to 4.21.1
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.1/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.20.0...4.21.1)
---
updated-dependencies:
- dependency-name: cookie
dependency-type: indirect
- dependency-name: express
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* docs(virtual_keys.md): update Dockerfile reference (#6554)
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
* (proxy fix) - call connect on prisma client when running setup (#6534)
* critical fix - call connect on prisma client when running setup
* fix test_proxy_server_prisma_setup
* fix test_proxy_server_prisma_setup
* Add 3.5 haiku (#6588)
* feat: add claude-3-5-haiku-20241022 entries
* feat: add claude-3-5-haiku-20241022 and vertex_ai/claude-3-5-haiku@20241022 models
* add missing entries, remove vision
* remove image token costs
* Litellm perf improvements 3 (#6573)
* perf: move writing key to cache, to background task
* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils
adds 200ms on calls with pgdb connected
* fix(litellm_pre_call_utils.py): rename call_type to actual call used
* perf(proxy_server.py): remove db logic from _get_config_from_file
was causing db calls to occur on every llm request, if team_id was set on key
* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db
reduces latency/call by ~100ms
* fix(proxy_server.py): minor fix on existing_settings not incl alerting
* fix(exception_mapping_utils.py): map databricks exception string
* fix(auth_checks.py): fix auth check logic
* test: correctly mark flaky test
* fix(utils.py): handle auth token error for tokenizers.from_pretrained
* build: fix map
* build: fix map
* build: fix json for model map
* fix ImageObject conversion (#6584)
* (fix) litellm.text_completion raises a non-blocking error on simple usage (#6546)
* unit test test_huggingface_text_completion_logprobs
* fix return TextCompletionHandler convert_chat_to_text_completion
* fix hf rest api
* fix test_huggingface_text_completion_logprobs
* fix linting errors
* fix importLiteLLMResponseObjectHandler
* fix test for LiteLLMResponseObjectHandler
* fix test text completion
* fix allow using 15 seconds for premium license check
* testing fix bedrock deprecated cohere.command-text-v14
* (feat) add `Predicted Outputs` for OpenAI (#6594)
* bump openai to openai==1.54.0
* add 'prediction' param
* testing fix bedrock deprecated cohere.command-text-v14
* test test_openai_prediction_param.py
* test_openai_prediction_param_with_caching
* doc Predicted Outputs
* doc Predicted Output
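For context, a hedged sketch of what passing the new `prediction` param through `litellm.completion` might look like. The value follows the `{"type": "content", ...}` shape OpenAI documents for Predicted Outputs; the model name and message content below are placeholders, not values from this changelog:

```python
# Placeholder request showing where the 'prediction' param sits; the call
# itself is commented out since it needs a live API key.
kwargs = {
    "model": "gpt-4o",  # placeholder model name
    "messages": [{"role": "user", "content": "Rename the function to add_numbers"}],
    "prediction": {
        "type": "content",  # shape documented by OpenAI for Predicted Outputs
        "content": "def add(a, b):\n    return a + b\n",
    },
}
# response = litellm.completion(**kwargs)
```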
* (fix) Vertex Improve Performance when using `image_url` (#6593)
* fix transformation vertex
* test test_process_gemini_image
* test_image_completion_request
* testing fix - bedrock has deprecated cohere.command-text-v14
* fix vertex pdf
* bump: version 1.51.5 → 1.52.0
* fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check (#6577)
* fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check
* fix(lowest_tpm_rpm_v2.py): return headers in correct format
* test: update test
* build(deps): bump cookie and express in /docs/my-website (#6566)
Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.
Updates `cookie` from 0.6.0 to 0.7.1
- [Release notes](https://github.com/jshttp/cookie/releases)
- [Commits](https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.1)
Updates `express` from 4.20.0 to 4.21.1
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.1/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.20.0...4.21.1)
---
updated-dependencies:
- dependency-name: cookie
dependency-type: indirect
- dependency-name: express
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* docs(virtual_keys.md): update Dockerfile reference (#6554)
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
* (proxy fix) - call connect on prisma client when running setup (#6534)
* critical fix - call connect on prisma client when running setup
* fix test_proxy_server_prisma_setup
* fix test_proxy_server_prisma_setup
* Add 3.5 haiku (#6588)
* feat: add claude-3-5-haiku-20241022 entries
* feat: add claude-3-5-haiku-20241022 and vertex_ai/claude-3-5-haiku@20241022 models
* add missing entries, remove vision
* remove image token costs
* Litellm perf improvements 3 (#6573)
* perf: move writing key to cache, to background task
* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils
adds 200ms on calls with pgdb connected
* fix(litellm_pre_call_utils.py): rename call_type to actual call used
* perf(proxy_server.py): remove db logic from _get_config_from_file
was causing db calls to occur on every llm request, if team_id was set on key
* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db
reduces latency/call by ~100ms
* fix(proxy_server.py): minor fix on existing_settings not incl alerting
* fix(exception_mapping_utils.py): map databricks exception string
* fix(auth_checks.py): fix auth check logic
* test: correctly mark flaky test
* fix(utils.py): handle auth token error for tokenizers.from_pretrained
* build: fix map
* build: fix map
* build: fix json for model map
* test: remove eol model
* fix(proxy_server.py): fix db config loading logic
* fix(proxy_server.py): fix order of config / db updates, to ensure fields not overwritten
* test: skip test if required env var is missing
* test: fix test
---------
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
* test: mark flaky test
* test: handle anthropic api instability
* test(test_proxy_utils.py): add testing for db config update logic
* Update setuptools in docker and fastapi to latest version, in order to upgrade starlette version (#6597)
* build(deps): bump cookie and express in /docs/my-website (#6566)
Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.
Updates `cookie` from 0.6.0 to 0.7.1
- [Release notes](https://github.com/jshttp/cookie/releases)
- [Commits](https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.1)
Updates `express` from 4.20.0 to 4.21.1
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.1/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.20.0...4.21.1)
---
updated-dependencies:
- dependency-name: cookie
dependency-type: indirect
- dependency-name: express
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* docs(virtual_keys.md): update Dockerfile reference (#6554)
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
* (proxy fix) - call connect on prisma client when running setup (#6534)
* critical fix - call connect on prisma client when running setup
* fix test_proxy_server_prisma_setup
* fix test_proxy_server_prisma_setup
* Add 3.5 haiku (#6588)
* feat: add claude-3-5-haiku-20241022 entries
* feat: add claude-3-5-haiku-20241022 and vertex_ai/claude-3-5-haiku@20241022 models
* add missing entries, remove vision
* remove image token costs
* Litellm perf improvements 3 (#6573)
* perf: move writing key to cache, to background task
* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils
adds 200ms on calls with pgdb connected
* fix(litellm_pre_call_utils.py): rename call_type to actual call used
* perf(proxy_server.py): remove db logic from _get_config_from_file
was causing db calls to occur on every llm request, if team_id was set on key
* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db
reduces latency/call by ~100ms
* fix(proxy_server.py): minor fix on existing_settings not incl alerting
* fix(exception_mapping_utils.py): map databricks exception string
* fix(auth_checks.py): fix auth check logic
* test: correctly mark flaky test
* fix(utils.py): handle auth token error for tokenizers.from_pretrained
* build: fix map
* build: fix map
* build: fix json for model map
* fix ImageObject conversion (#6584)
* (fix) litellm.text_completion raises a non-blocking error on simple usage (#6546)
* unit test test_huggingface_text_completion_logprobs
* fix return TextCompletionHandler convert_chat_to_text_completion
* fix hf rest api
* fix test_huggingface_text_completion_logprobs
* fix linting errors
* fix importLiteLLMResponseObjectHandler
* fix test for LiteLLMResponseObjectHandler
* fix test text completion
* fix allow using 15 seconds for premium license check
* testing fix bedrock deprecated cohere.command-text-v14
* (feat) add `Predicted Outputs` for OpenAI (#6594)
* bump openai to openai==1.54.0
* add 'prediction' param
* testing fix bedrock deprecated cohere.command-text-v14
* test test_openai_prediction_param.py
* test_openai_prediction_param_with_caching
* doc Predicted Outputs
* doc Predicted Output
* (fix) Vertex Improve Performance when using `image_url` (#6593)
* fix transformation vertex
* test test_process_gemini_image
* test_image_completion_request
* testing fix - bedrock has deprecated cohere.command-text-v14
* fix vertex pdf
* bump: version 1.51.5 → 1.52.0
* Update setuptools in docker and fastapi to latest version, in order to upgrade starlette version
---------
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: Jacob Hagstedt <wcgs@novonordisk.com>
* fix(langfuse.py): fix linting errors
* fix: fix linting errors
* fix: fix casting error
* fix: fix typing error
* fix: add more tests
* fix(utils.py): fix return_processed_chunk_logic
* Revert "Update setuptools in docker and fastapi to latest verison, in order t…" (#6615)
This reverts commit 1a7f7bdfb7.
* docs fix clarify team_id on team based logging
* doc fix team based logging with langfuse
* fix flake8 checks
* test: bump sleep time
* refactor: replace claude-instant-1.2 with haiku in testing
* fix(proxy_server.py): move to using sl payload in track_cost_callback
* fix(proxy_server.py): fix linting errors
* fix(proxy_server.py): fallback to kwargs(response_cost) if given
* test: remove claude-instant-1 from tests
* test: fix claude test
* docs fix clarify team_id on team based logging
* doc fix team based logging with langfuse
* build: remove lint.yml
---------
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Vsevolod Karvetskiy <56288164+karvetskiy@users.noreply.github.com>
Co-authored-by: seva <seva@inita.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
Co-authored-by: Jacob Hagstedt P Suorra <Jacobh2@users.noreply.github.com>
Co-authored-by: Jacob Hagstedt <wcgs@novonordisk.com>
"""
|
|
Class to handle llm wildcard routing and regex pattern matching
|
|
"""
|
|
|
|
import copy
|
|
import re
|
|
from re import Match
|
|
from typing import Dict, List, Optional
|
|
|
|
from litellm import get_llm_provider
|
|
from litellm._logging import verbose_router_logger
|
|
|
|
|
|
class PatternMatchRouter:
|
|
"""
|
|
Class to handle llm wildcard routing and regex pattern matching
|
|
|
|
doc: https://docs.litellm.ai/docs/proxy/configs#provider-specific-wildcard-routing
|
|
|
|
This class will store a mapping for regex pattern: List[Deployments]
|
|
"""
|
|
|
|
def __init__(self):
|
|
self.patterns: Dict[str, List] = {}
|
|
|
|
def add_pattern(self, pattern: str, llm_deployment: Dict):
|
|
"""
|
|
Add a regex pattern and the corresponding llm deployments to the patterns
|
|
|
|
Args:
|
|
pattern: str
|
|
llm_deployment: str or List[str]
|
|
"""
|
|
# Convert the pattern to a regex
|
|
regex = self._pattern_to_regex(pattern)
|
|
if regex not in self.patterns:
|
|
self.patterns[regex] = []
|
|
self.patterns[regex].append(llm_deployment)
|
|
|
|
def _pattern_to_regex(self, pattern: str) -> str:
|
|
"""
|
|
Convert a wildcard pattern to a regex pattern
|
|
|
|
example:
|
|
pattern: openai/*
|
|
regex: openai/.*
|
|
|
|
pattern: openai/fo::*::static::*
|
|
regex: openai/fo::.*::static::.*
|
|
|
|
Args:
|
|
pattern: str
|
|
|
|
Returns:
|
|
str: regex pattern
|
|
"""
|
|
# # Replace '*' with '.*' for regex matching
|
|
# regex = pattern.replace("*", ".*")
|
|
# # Escape other special characters
|
|
# regex = re.escape(regex).replace(r"\.\*", ".*")
|
|
# return f"^{regex}$"
|
|
return re.escape(pattern).replace(r"\*", "(.*)")
|
|
|
|
def _return_pattern_matched_deployments(
|
|
self, matched_pattern: Match, deployments: List[Dict]
|
|
) -> List[Dict]:
|
|
new_deployments = []
|
|
for deployment in deployments:
|
|
new_deployment = copy.deepcopy(deployment)
|
|
new_deployment["litellm_params"]["model"] = (
|
|
PatternMatchRouter.set_deployment_model_name(
|
|
matched_pattern=matched_pattern,
|
|
litellm_deployment_litellm_model=deployment["litellm_params"][
|
|
"model"
|
|
],
|
|
)
|
|
)
|
|
new_deployments.append(new_deployment)
|
|
|
|
return new_deployments
|
|
|
|
def route(self, request: Optional[str]) -> Optional[List[Dict]]:
|
|
"""
|
|
Route a requested model to the corresponding llm deployments based on the regex pattern
|
|
|
|
loop through all the patterns and find the matching pattern
|
|
if a pattern is found, return the corresponding llm deployments
|
|
if no pattern is found, return None
|
|
|
|
Args:
|
|
request: Optional[str]
|
|
|
|
Returns:
|
|
Optional[List[Deployment]]: llm deployments
|
|
"""
|
|
try:
|
|
if request is None:
|
|
return None
|
|
for pattern, llm_deployments in self.patterns.items():
|
|
pattern_match = re.match(pattern, request)
|
|
if pattern_match:
|
|
return self._return_pattern_matched_deployments(
|
|
matched_pattern=pattern_match, deployments=llm_deployments
|
|
)
|
|
except Exception as e:
|
|
verbose_router_logger.debug(f"Error in PatternMatchRouter.route: {str(e)}")
|
|
|
|
return None # No matching pattern found
|
|
|
|
@staticmethod
|
|
def set_deployment_model_name(
|
|
matched_pattern: Match,
|
|
litellm_deployment_litellm_model: str,
|
|
) -> str:
|
|
"""
|
|
Set the model name for the matched pattern llm deployment
|
|
|
|
E.g.:
|
|
|
|
model_name: llmengine/* (can be any regex pattern or wildcard pattern)
|
|
litellm_params:
|
|
model: openai/*
|
|
|
|
if model_name = "llmengine/foo" -> model = "openai/foo"
|
|
"""
|
|
|
|
## BASE CASE: if the deployment model name does not contain a wildcard, return the deployment model name
|
|
if "*" not in litellm_deployment_litellm_model:
|
|
return litellm_deployment_litellm_model
|
|
|
|
wildcard_count = litellm_deployment_litellm_model.count("*")
|
|
|
|
# Extract all dynamic segments from the request
|
|
dynamic_segments = matched_pattern.groups()
|
|
|
|
if len(dynamic_segments) > wildcard_count:
|
|
raise ValueError(
|
|
f"More wildcards in the deployment model name than the pattern. Wildcard count: {wildcard_count}, dynamic segments count: {len(dynamic_segments)}"
|
|
)
|
|
|
|
# Replace the corresponding wildcards in the litellm model pattern with extracted segments
|
|
for segment in dynamic_segments:
|
|
litellm_deployment_litellm_model = litellm_deployment_litellm_model.replace(
|
|
"*", segment, 1
|
|
)
|
|
|
|
return litellm_deployment_litellm_model
|
|
|
|
def get_pattern(
|
|
self, model: str, custom_llm_provider: Optional[str] = None
|
|
) -> Optional[List[Dict]]:
|
|
"""
|
|
Check if a pattern exists for the given model and custom llm provider
|
|
|
|
Args:
|
|
model: str
|
|
custom_llm_provider: Optional[str]
|
|
|
|
Returns:
|
|
bool: True if pattern exists, False otherwise
|
|
"""
|
|
if custom_llm_provider is None:
|
|
try:
|
|
(
|
|
_,
|
|
custom_llm_provider,
|
|
_,
|
|
_,
|
|
) = get_llm_provider(model=model)
|
|
except Exception:
|
|
# get_llm_provider raises exception when provider is unknown
|
|
pass
|
|
return self.route(model) or self.route(f"{custom_llm_provider}/{model}")
|
|
|
|
def get_deployments_by_pattern(
|
|
self, model: str, custom_llm_provider: Optional[str] = None
|
|
) -> List[Dict]:
|
|
"""
|
|
Get the deployments by pattern
|
|
|
|
Args:
|
|
model: str
|
|
custom_llm_provider: Optional[str]
|
|
|
|
Returns:
|
|
List[Dict]: llm deployments matching the pattern
|
|
"""
|
|
pattern_match = self.get_pattern(model, custom_llm_provider)
|
|
if pattern_match:
|
|
return pattern_match
|
|
return []
|
|
|
|
|
|
# Example usage:
|
|
# router = PatternRouter()
|
|
# router.add_pattern('openai/*', [Deployment(), Deployment()])
|
|
# router.add_pattern('openai/fo::*::static::*', Deployment())
|
|
# print(router.route('openai/gpt-4')) # Output: [Deployment(), Deployment()]
|
|
# print(router.route('openai/fo::hi::static::hi')) # Output: [Deployment()]
|
|
# print(router.route('something/else')) # Output: None
|
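To make the wildcard-routing behavior concrete, here is a minimal, self-contained sketch of the two core transformations in this file — wildcard-to-regex conversion and model-name substitution. Plain functions stand in for the class methods, so no litellm install is assumed:

```python
import re


def pattern_to_regex(pattern: str) -> str:
    # Escape regex metacharacters, then turn each escaped '*' into a capture group,
    # mirroring PatternMatchRouter._pattern_to_regex
    return re.escape(pattern).replace(r"\*", "(.*)")


def set_model_name(match: re.Match, deployment_model: str) -> str:
    # Substitute each captured segment for a '*' in the deployment model,
    # left to right, mirroring PatternMatchRouter.set_deployment_model_name
    if "*" not in deployment_model:
        return deployment_model
    for segment in match.groups():
        deployment_model = deployment_model.replace("*", segment, 1)
    return deployment_model


# Route the request "llmengine/foo" against the wildcard alias "llmengine/*"
regex = pattern_to_regex("llmengine/*")
m = re.match(regex, "llmengine/foo")
assert m is not None
print(set_model_name(m, "openai/*"))  # -> openai/foo
```

The same mechanics handle multi-wildcard aliases such as `openai/fo::*::static::*`: each `(.*)` group captures one dynamic segment, and each segment replaces one `*` in the deployment's model string, in order.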