Mirror of https://github.com/BerriAI/litellm.git (synced 2025-04-25 18:54:30 +00:00)
LiteLLM Minor Fixes & Improvements (11/04/2024) (#6572)
* feat: initial commit for watsonx chat endpoint support Closes https://github.com/BerriAI/litellm/issues/6562
* feat(watsonx/chat/handler.py): support tool calling for watsonx Closes https://github.com/BerriAI/litellm/issues/6562
* fix(streaming_utils.py): return empty chunk instead of failing if streaming value is invalid dict ensures streaming works for ibm watsonx
* fix(openai_like/chat/handler.py): ensure asynchttphandler is passed correctly for openai like calls
* fix: ensure exception mapping works well for watsonx calls
* fix(openai_like/chat/handler.py): handle async streaming correctly
* feat(main.py): Make it clear when a user is passing an invalid message add validation for user content message Closes https://github.com/BerriAI/litellm/issues/6565
* fix: cleanup
* fix(utils.py): loosen validation check, to just make sure content types are valid make litellm robust to future content updates
* fix: fix linting erro
* fix: fix linting errors
* fix(utils.py): make validation check more flexible
* test: handle langfuse list index out of range error
* Litellm dev 11 02 2024 (#6561)
* fix(dual_cache.py): update in-memory check for redis batch get cache Fixes latency delay for async_batch_redis_cache
* fix(service_logger.py): fix race condition causing otel service logging to be overwritten if service_callbacks set
* feat(user_api_key_auth.py): add parent otel component for auth allows us to isolate how much latency is added by auth checks
* perf(parallel_request_limiter.py): move async_set_cache_pipeline (from max parallel request limiter) out of execution path (background task) reduces latency by 200ms
* feat(user_api_key_auth.py): have user api key auth object return user tpm/rpm limits - reduces redis calls in downstream task (parallel_request_limiter) Reduces latency by 400-800ms
* fix(parallel_request_limiter.py): use batch get cache to reduce user/key/team usage object calls reduces latency by 50-100ms
* fix: fix linting error
* fix(_service_logger.py): fix import
* fix(user_api_key_auth.py): fix service logging
* fix(dual_cache.py): don't pass 'self'
* fix: fix python3.8 error
* fix: fix init]
* bump: version 1.51.4 → 1.51.5
* build(deps): bump cookie and express in /docs/my-website (#6566)
  Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.
  Updates `cookie` from 0.6.0 to 0.7.1
  - [Release notes](https://github.com/jshttp/cookie/releases)
  - [Commits](https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.1)
  Updates `express` from 4.20.0 to 4.21.1
  - [Release notes](https://github.com/expressjs/express/releases)
  - [Changelog](https://github.com/expressjs/express/blob/4.21.1/History.md)
  - [Commits](https://github.com/expressjs/express/compare/4.20.0...4.21.1)
  ---
  updated-dependencies:
  - dependency-name: cookie
    dependency-type: indirect
  - dependency-name: express
    dependency-type: indirect
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* docs(virtual_keys.md): update Dockerfile reference (#6554) Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
* (proxy fix) - call connect on prisma client when running setup (#6534)
* critical fix - call connect on prisma client when running setup
* fix test_proxy_server_prisma_setup
* fix test_proxy_server_prisma_setup
* Add 3.5 haiku (#6588)
* feat: add claude-3-5-haiku-20241022 entries
* feat: add claude-3-5-haiku-20241022 and vertex_ai/claude-3-5-haiku@20241022 models
* add missing entries, remove vision
* remove image token costs
* Litellm perf improvements 3 (#6573)
* perf: move writing key to cache, to background task
* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils adds 200ms on calls with pgdb connected
* fix(litellm_pre_call_utils.py'): rename call_type to actual call used
* perf(proxy_server.py): remove db logic from _get_config_from_file was causing db calls to occur on every llm request, if team_id was set on key
* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db reduces latency/call by ~100ms
* fix(proxy_server.py): minor fix on existing_settings not incl alerting
* fix(exception_mapping_utils.py): map databricks exception string
* fix(auth_checks.py): fix auth check logic
* test: correctly mark flaky test
* fix(utils.py): handle auth token error for tokenizers.from_pretrained
* build: fix map
* build: fix map
* build: fix json for model map
* Litellm dev 11 02 2024 (#6561)
* fix(dual_cache.py): update in-memory check for redis batch get cache Fixes latency delay for async_batch_redis_cache
* fix(service_logger.py): fix race condition causing otel service logging to be overwritten if service_callbacks set
* feat(user_api_key_auth.py): add parent otel component for auth allows us to isolate how much latency is added by auth checks
* perf(parallel_request_limiter.py): move async_set_cache_pipeline (from max parallel request limiter) out of execution path (background task) reduces latency by 200ms
* feat(user_api_key_auth.py): have user api key auth object return user tpm/rpm limits - reduces redis calls in downstream task (parallel_request_limiter) Reduces latency by 400-800ms
* fix(parallel_request_limiter.py): use batch get cache to reduce user/key/team usage object calls reduces latency by 50-100ms
* fix: fix linting error
* fix(_service_logger.py): fix import
* fix(user_api_key_auth.py): fix service logging
* fix(dual_cache.py): don't pass 'self'
* fix: fix python3.8 error
* fix: fix init]
* Litellm perf improvements 3 (#6573)
* perf: move writing key to cache, to background task
* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils adds 200ms on calls with pgdb connected
* fix(litellm_pre_call_utils.py'): rename call_type to actual call used
* perf(proxy_server.py): remove db logic from _get_config_from_file was causing db calls to occur on every llm request, if team_id was set on key
* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db reduces latency/call by ~100ms
* fix(proxy_server.py): minor fix on existing_settings not incl alerting
* fix(exception_mapping_utils.py): map databricks exception string
* fix(auth_checks.py): fix auth check logic
* test: correctly mark flaky test
* fix(utils.py): handle auth token error for tokenizers.from_pretrained
* fix ImageObject conversion (#6584)
* (fix) litellm.text_completion raises a non-blocking error on simple usage (#6546)
* unit test test_huggingface_text_completion_logprobs
* fix return TextCompletionHandler convert_chat_to_text_completion
* fix hf rest api
* fix test_huggingface_text_completion_logprobs
* fix linting errors
* fix importLiteLLMResponseObjectHandler
* fix test for LiteLLMResponseObjectHandler
* fix test text completion
* fix allow using 15 seconds for premium license check
* testing fix bedrock deprecated cohere.command-text-v14
* (feat) add `Predicted Outputs` for OpenAI (#6594) (see the usage sketch after this message)
* bump openai to openai==1.54.0
* add 'prediction' param
* testing fix bedrock deprecated cohere.command-text-v14
* test test_openai_prediction_param.py
* test_openai_prediction_param_with_caching
* doc Predicted Outputs
* doc Predicted Output
* (fix) Vertex Improve Performance when using `image_url` (#6593)
* fix transformation vertex
* test test_process_gemini_image
* test_image_completion_request
* testing fix - bedrock has deprecated cohere.command-text-v14
* fix vertex pdf
* bump: version 1.51.5 → 1.52.0
* fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check (#6577)
* fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check
* fix(lowest_tpm_rpm_v2.py): return headers in correct format
* test: update test
* build(deps): bump cookie and express in /docs/my-website (#6566)
  Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.
  Updates `cookie` from 0.6.0 to 0.7.1
  - [Release notes](https://github.com/jshttp/cookie/releases)
  - [Commits](https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.1)
  Updates `express` from 4.20.0 to 4.21.1
  - [Release notes](https://github.com/expressjs/express/releases)
  - [Changelog](https://github.com/expressjs/express/blob/4.21.1/History.md)
  - [Commits](https://github.com/expressjs/express/compare/4.20.0...4.21.1)
  ---
  updated-dependencies:
  - dependency-name: cookie
    dependency-type: indirect
  - dependency-name: express
    dependency-type: indirect
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* docs(virtual_keys.md): update Dockerfile reference (#6554) Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
* (proxy fix) - call connect on prisma client when running setup (#6534)
* critical fix - call connect on prisma client when running setup
* fix test_proxy_server_prisma_setup
* fix test_proxy_server_prisma_setup
* Add 3.5 haiku (#6588)
* feat: add claude-3-5-haiku-20241022 entries
* feat: add claude-3-5-haiku-20241022 and vertex_ai/claude-3-5-haiku@20241022 models
* add missing entries, remove vision
* remove image token costs
* Litellm perf improvements 3 (#6573)
* perf: move writing key to cache, to background task
* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils adds 200ms on calls with pgdb connected
* fix(litellm_pre_call_utils.py'): rename call_type to actual call used
* perf(proxy_server.py): remove db logic from _get_config_from_file was causing db calls to occur on every llm request, if team_id was set on key
* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db reduces latency/call by ~100ms
* fix(proxy_server.py): minor fix on existing_settings not incl alerting
* fix(exception_mapping_utils.py): map databricks exception string
* fix(auth_checks.py): fix auth check logic
* test: correctly mark flaky test
* fix(utils.py): handle auth token error for tokenizers.from_pretrained
* build: fix map
* build: fix map
* build: fix json for model map
* test: remove eol model
* fix(proxy_server.py): fix db config loading logic
* fix(proxy_server.py): fix order of config / db updates, to ensure fields not overwritten
* test: skip test if required env var is missing
* test: fix test
---------
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
* test: mark flaky test
* test: handle anthropic api instability
* test: update test
* test: bump num retries on langfuse tests - their api is quite bad
---------
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
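The `Predicted Outputs` entry (#6594) above only names the new `prediction` parameter. A minimal usage sketch, assuming an OpenAI key is configured, that the chosen model supports Predicted Outputs, and that the payload keeps OpenAI's documented {"type": "content", "content": ...} shape while being passed through litellm.completion unchanged (model name and prompt below are illustrative, not taken from this commit):

    # Hedged sketch for the `prediction` param added in #6594.
    import litellm

    existing_code = "def add(a, b):\n    return a + b\n"

    response = litellm.completion(
        model="gpt-4o-mini",  # assumed: any OpenAI model with Predicted Outputs support
        messages=[
            {"role": "user", "content": "Rename the function to sum_two and return the whole file."},
            {"role": "user", "content": existing_code},
        ],
        prediction={"type": "content", "content": existing_code},
    )
    print(response.choices[0].message.content)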
parent 4ed7f96360
commit 741da7e182
24 changed files with 1510 additions and 554 deletions
litellm/utils.py (399 changes)
@@ -69,6 +69,9 @@ from litellm.litellm_core_utils.get_llm_provider_logic import (
    _is_non_openai_azure_model,
    get_llm_provider,
)
from litellm.litellm_core_utils.get_supported_openai_params import (
    get_supported_openai_params,
)
from litellm.litellm_core_utils.llm_request_utils import _ensure_extra_body_is_safe
from litellm.litellm_core_utils.llm_response_utils.convert_dict_to_response import (
    LiteLLMResponseObjectHandler,
@@ -962,9 +965,10 @@ def client(original_function): # noqa: PLR0915
            result._hidden_params["additional_headers"] = process_response_headers(
                result._hidden_params.get("additional_headers") or {}
            )  # GUARANTEE OPENAI HEADERS IN RESPONSE
            result._response_ms = (
                end_time - start_time
            ).total_seconds() * 1000  # return response latency in ms like openai
            if result is not None:
                result._response_ms = (
                    end_time - start_time
                ).total_seconds() * 1000  # return response latency in ms like openai
            return result
        except Exception as e:
            call_type = original_function.__name__
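The hunk above wraps the latency bookkeeping in an `if result is not None` guard, so `_response_ms` is only attached when a response object actually came back. A small sketch of reading it, assuming a provider key is configured; `_response_ms` is an internal attribute and may change between versions:

    # Hedged sketch: reads the latency field set by the client() wrapper above.
    import litellm

    response = litellm.completion(
        model="gpt-4o-mini",  # assumed: any configured provider/model
        messages=[{"role": "user", "content": "ping"}],
    )
    latency_ms = getattr(response, "_response_ms", None)
    if latency_ms is not None:
        print(f"round trip took ~{latency_ms:.0f} ms")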
@@ -3622,43 +3626,30 @@ def get_optional_params( # noqa: PLR0915
            model=model, custom_llm_provider=custom_llm_provider
        )
        _check_valid_arg(supported_params=supported_params)
        if max_tokens is not None:
            optional_params["max_new_tokens"] = max_tokens
        if stream:
            optional_params["stream"] = stream
        if temperature is not None:
            optional_params["temperature"] = temperature
        if top_p is not None:
            optional_params["top_p"] = top_p
        if frequency_penalty is not None:
            optional_params["repetition_penalty"] = frequency_penalty
        if seed is not None:
            optional_params["random_seed"] = seed
        if stop is not None:
            optional_params["stop_sequences"] = stop

        # WatsonX-only parameters
        extra_body = {}
        if "decoding_method" in passed_params:
            extra_body["decoding_method"] = passed_params.pop("decoding_method")
        if "min_tokens" in passed_params or "min_new_tokens" in passed_params:
            extra_body["min_new_tokens"] = passed_params.pop(
                "min_tokens", passed_params.pop("min_new_tokens")
            )
        if "top_k" in passed_params:
            extra_body["top_k"] = passed_params.pop("top_k")
        if "truncate_input_tokens" in passed_params:
            extra_body["truncate_input_tokens"] = passed_params.pop(
                "truncate_input_tokens"
            )
        if "length_penalty" in passed_params:
            extra_body["length_penalty"] = passed_params.pop("length_penalty")
        if "time_limit" in passed_params:
            extra_body["time_limit"] = passed_params.pop("time_limit")
        if "return_options" in passed_params:
            extra_body["return_options"] = passed_params.pop("return_options")
        optional_params["extra_body"] = (
            extra_body  # openai client supports `extra_body` param
        optional_params = litellm.IBMWatsonXChatConfig().map_openai_params(
            non_default_params=non_default_params,
            optional_params=optional_params,
            model=model,
            drop_params=(
                drop_params
                if drop_params is not None and isinstance(drop_params, bool)
                else False
            ),
        )
        # WatsonX-text param check
        for param in passed_params.keys():
            if litellm.IBMWatsonXAIConfig().is_watsonx_text_param(param):
                raise ValueError(
                    f"LiteLLM now defaults to Watsonx's `/text/chat` endpoint. Please use the `watsonx_text` provider instead, to call the `/text/generation` endpoint. Param: {param}"
                )
    elif custom_llm_provider == "watsonx_text":
        supported_params = get_supported_openai_params(
            model=model, custom_llm_provider=custom_llm_provider
        )
        _check_valid_arg(supported_params=supported_params)
        optional_params = litellm.IBMWatsonXAIConfig().map_openai_params(
            non_default_params=non_default_params,
            optional_params=optional_params,
        )
    elif custom_llm_provider == "openai":
        supported_params = get_supported_openai_params(
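With the rewrite above, OpenAI-style params for the `watsonx` provider are mapped by `IBMWatsonXChatConfig().map_openai_params`, and leftover text-generation params hit the ValueError that points callers at `watsonx_text`. A hedged sketch of both behaviours, assuming WatsonX credentials are set in the environment and that the model id below exists in the caller's project:

    # Sketch only: model id and parameter values are illustrative.
    import litellm

    # Chat route: supported OpenAI params are mapped for /text/chat.
    litellm.completion(
        model="watsonx/ibm/granite-13b-chat-v2",
        messages=[{"role": "user", "content": "hello"}],
        max_tokens=64,
        temperature=0.2,
    )

    # A watsonx text-generation-only param on the chat route now fails fast.
    try:
        litellm.completion(
            model="watsonx/ibm/granite-13b-chat-v2",
            messages=[{"role": "user", "content": "hello"}],
            decoding_method="greedy",  # triggers the is_watsonx_text_param() check above
        )
    except Exception as err:  # litellm may surface the ValueError wrapped in its own exception types
        print(err)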
@@ -4160,290 +4151,6 @@ def get_first_chars_messages(kwargs: dict) -> str:
    return ""


def get_supported_openai_params( # noqa: PLR0915
    model: str,
    custom_llm_provider: Optional[str] = None,
    request_type: Literal["chat_completion", "embeddings"] = "chat_completion",
) -> Optional[list]:
    """
    Returns the supported openai params for a given model + provider

    Example:
    ```
    get_supported_openai_params(model="anthropic.claude-3", custom_llm_provider="bedrock")
    ```

    Returns:
    - List if custom_llm_provider is mapped
    - None if unmapped
    """
    if not custom_llm_provider:
        try:
            custom_llm_provider = litellm.get_llm_provider(model=model)[1]
        except BadRequestError:
            return None
    if custom_llm_provider == "bedrock":
        return litellm.AmazonConverseConfig().get_supported_openai_params(model=model)
    elif custom_llm_provider == "ollama":
        return litellm.OllamaConfig().get_supported_openai_params()
    elif custom_llm_provider == "ollama_chat":
        return litellm.OllamaChatConfig().get_supported_openai_params()
    elif custom_llm_provider == "anthropic":
        return litellm.AnthropicConfig().get_supported_openai_params()
    elif custom_llm_provider == "fireworks_ai":
        if request_type == "embeddings":
            return litellm.FireworksAIEmbeddingConfig().get_supported_openai_params(
                model=model
            )
        else:
            return litellm.FireworksAIConfig().get_supported_openai_params()
    elif custom_llm_provider == "nvidia_nim":
        if request_type == "chat_completion":
            return litellm.nvidiaNimConfig.get_supported_openai_params(model=model)
        elif request_type == "embeddings":
            return litellm.nvidiaNimEmbeddingConfig.get_supported_openai_params()
    elif custom_llm_provider == "cerebras":
        return litellm.CerebrasConfig().get_supported_openai_params(model=model)
    elif custom_llm_provider == "xai":
        return litellm.XAIChatConfig().get_supported_openai_params(model=model)
    elif custom_llm_provider == "ai21_chat":
        return litellm.AI21ChatConfig().get_supported_openai_params(model=model)
    elif custom_llm_provider == "volcengine":
        return litellm.VolcEngineConfig().get_supported_openai_params(model=model)
    elif custom_llm_provider == "groq":
        return litellm.GroqChatConfig().get_supported_openai_params(model=model)
    elif custom_llm_provider == "hosted_vllm":
        return litellm.HostedVLLMChatConfig().get_supported_openai_params(model=model)
    elif custom_llm_provider == "deepseek":
        return [
            # https://platform.deepseek.com/api-docs/api/create-chat-completion
            "frequency_penalty",
            "max_tokens",
            "presence_penalty",
            "response_format",
            "stop",
            "stream",
            "temperature",
            "top_p",
            "logprobs",
            "top_logprobs",
            "tools",
            "tool_choice",
        ]
    elif custom_llm_provider == "cohere":
        return [
            "stream",
            "temperature",
            "max_tokens",
            "logit_bias",
            "top_p",
            "frequency_penalty",
            "presence_penalty",
            "stop",
            "n",
            "extra_headers",
        ]
    elif custom_llm_provider == "cohere_chat":
        return [
            "stream",
            "temperature",
            "max_tokens",
            "top_p",
            "frequency_penalty",
            "presence_penalty",
            "stop",
            "n",
            "tools",
            "tool_choice",
            "seed",
            "extra_headers",
        ]
    elif custom_llm_provider == "maritalk":
        return [
            "stream",
            "temperature",
            "max_tokens",
            "top_p",
            "presence_penalty",
            "stop",
        ]
    elif custom_llm_provider == "openai":
        return litellm.OpenAIConfig().get_supported_openai_params(model=model)
    elif custom_llm_provider == "azure":
        if litellm.AzureOpenAIO1Config().is_o1_model(model=model):
            return litellm.AzureOpenAIO1Config().get_supported_openai_params(
                model=model
            )
        else:
            return litellm.AzureOpenAIConfig().get_supported_openai_params()
    elif custom_llm_provider == "openrouter":
        return [
            "temperature",
            "top_p",
            "frequency_penalty",
            "presence_penalty",
            "repetition_penalty",
            "seed",
            "max_tokens",
            "logit_bias",
            "logprobs",
            "top_logprobs",
            "response_format",
            "stop",
            "tools",
            "tool_choice",
        ]
    elif custom_llm_provider == "mistral" or custom_llm_provider == "codestral":
        # mistal and codestral api have the exact same params
        if request_type == "chat_completion":
            return litellm.MistralConfig().get_supported_openai_params()
        elif request_type == "embeddings":
            return litellm.MistralEmbeddingConfig().get_supported_openai_params()
    elif custom_llm_provider == "text-completion-codestral":
        return litellm.MistralTextCompletionConfig().get_supported_openai_params()
    elif custom_llm_provider == "replicate":
        return [
            "stream",
            "temperature",
            "max_tokens",
            "top_p",
            "stop",
            "seed",
            "tools",
            "tool_choice",
            "functions",
            "function_call",
        ]
    elif custom_llm_provider == "huggingface":
        return litellm.HuggingfaceConfig().get_supported_openai_params()
    elif custom_llm_provider == "together_ai":
        return [
            "stream",
            "temperature",
            "max_tokens",
            "top_p",
            "stop",
            "frequency_penalty",
            "tools",
            "tool_choice",
            "response_format",
        ]
    elif custom_llm_provider == "ai21":
        return [
            "stream",
            "n",
            "temperature",
            "max_tokens",
            "top_p",
            "stop",
            "frequency_penalty",
            "presence_penalty",
        ]
    elif custom_llm_provider == "databricks":
        if request_type == "chat_completion":
            return litellm.DatabricksConfig().get_supported_openai_params()
        elif request_type == "embeddings":
            return litellm.DatabricksEmbeddingConfig().get_supported_openai_params()
    elif custom_llm_provider == "palm" or custom_llm_provider == "gemini":
        return litellm.GoogleAIStudioGeminiConfig().get_supported_openai_params()
    elif custom_llm_provider == "vertex_ai":
        if request_type == "chat_completion":
            if model.startswith("meta/"):
                return litellm.VertexAILlama3Config().get_supported_openai_params()
            if model.startswith("mistral"):
                return litellm.MistralConfig().get_supported_openai_params()
            if model.startswith("codestral"):
                return (
                    litellm.MistralTextCompletionConfig().get_supported_openai_params()
                )
            if model.startswith("claude"):
                return litellm.VertexAIAnthropicConfig().get_supported_openai_params()
            return litellm.VertexAIConfig().get_supported_openai_params()
        elif request_type == "embeddings":
            return litellm.VertexAITextEmbeddingConfig().get_supported_openai_params()
    elif custom_llm_provider == "vertex_ai_beta":
        if request_type == "chat_completion":
            return litellm.VertexGeminiConfig().get_supported_openai_params()
        elif request_type == "embeddings":
            return litellm.VertexAITextEmbeddingConfig().get_supported_openai_params()
    elif custom_llm_provider == "sagemaker":
        return ["stream", "temperature", "max_tokens", "top_p", "stop", "n"]
    elif custom_llm_provider == "aleph_alpha":
        return [
            "max_tokens",
            "stream",
            "top_p",
            "temperature",
            "presence_penalty",
            "frequency_penalty",
            "n",
            "stop",
        ]
    elif custom_llm_provider == "cloudflare":
        return ["max_tokens", "stream"]
    elif custom_llm_provider == "nlp_cloud":
        return [
            "max_tokens",
            "stream",
            "temperature",
            "top_p",
            "presence_penalty",
            "frequency_penalty",
            "n",
            "stop",
        ]
    elif custom_llm_provider == "petals":
        return ["max_tokens", "temperature", "top_p", "stream"]
    elif custom_llm_provider == "deepinfra":
        return litellm.DeepInfraConfig().get_supported_openai_params()
    elif custom_llm_provider == "perplexity":
        return [
            "temperature",
            "top_p",
            "stream",
            "max_tokens",
            "presence_penalty",
            "frequency_penalty",
        ]
    elif custom_llm_provider == "anyscale":
        return [
            "temperature",
            "top_p",
            "stream",
            "max_tokens",
            "stop",
            "frequency_penalty",
            "presence_penalty",
        ]
    elif custom_llm_provider == "watsonx":
        return litellm.IBMWatsonXAIConfig().get_supported_openai_params()
    elif custom_llm_provider == "custom_openai" or "text-completion-openai":
        return [
            "functions",
            "function_call",
            "temperature",
            "top_p",
            "n",
            "stream",
            "stream_options",
            "stop",
            "max_tokens",
            "presence_penalty",
            "frequency_penalty",
            "logit_bias",
            "user",
            "response_format",
            "seed",
            "tools",
            "tool_choice",
            "max_retries",
            "logprobs",
            "top_logprobs",
            "extra_headers",
        ]
    return None


def _count_characters(text: str) -> int:
    # Remove white spaces and count characters
    filtered_text = "".join(char for char in text if not char.isspace())
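The block above is the old in-module copy of `get_supported_openai_params`; per the import added in the first hunk, the implementation now lives in `litellm.litellm_core_utils.get_supported_openai_params`. The call pattern from its docstring still applies; a short sketch, assuming the helper remains exported from the top-level `litellm` package:

    # Sketch based on the docstring example above.
    from litellm import get_supported_openai_params

    params = get_supported_openai_params(
        model="anthropic.claude-3", custom_llm_provider="bedrock"
    )
    if params is None:
        print("provider is unmapped")
    else:
        print("supported OpenAI params:", params)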
@@ -8640,3 +8347,47 @@ def add_dummy_tool(custom_llm_provider: str) -> List[ChatCompletionToolParam]:
            ),
        )
    ]


from litellm.types.llms.openai import (
    ChatCompletionAudioObject,
    ChatCompletionImageObject,
    ChatCompletionTextObject,
    ChatCompletionUserMessage,
    OpenAIMessageContent,
    ValidUserMessageContentTypes,
)


def validate_chat_completion_user_messages(messages: List[AllMessageValues]):
    """
    Ensures all user messages are valid OpenAI chat completion messages.

    Args:
        messages: List of message dictionaries
        message_content_type: Type to validate content against

    Returns:
        List[dict]: The validated messages

    Raises:
        ValueError: If any message is invalid
    """
    for idx, m in enumerate(messages):
        try:
            if m["role"] == "user":
                user_content = m.get("content")
                if user_content is not None:
                    if isinstance(user_content, str):
                        continue
                    elif isinstance(user_content, list):
                        for item in user_content:
                            if isinstance(item, dict):
                                if item.get("type") not in ValidUserMessageContentTypes:
                                    raise Exception("invalid content type")
        except Exception:
            raise Exception(
                f"Invalid user message={m} at index {idx}. Please ensure all user messages are valid OpenAI chat completion messages."
            )

    return messages
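A short sketch of how the new validator behaves, assuming it can be imported straight from `litellm.utils`; the invalid item uses a content `type` that is presumably not in `ValidUserMessageContentTypes` (the specific value is hypothetical):

    # Sketch: exercises validate_chat_completion_user_messages as defined above.
    from litellm.utils import validate_chat_completion_user_messages

    valid = [
        {"role": "user", "content": "plain string content passes"},
        {"role": "user", "content": [{"type": "text", "text": "typed content is checked"}]},
    ]
    validate_chat_completion_user_messages(valid)  # returns the messages unchanged

    invalid = [{"role": "user", "content": [{"type": "video_url", "video_url": "..."}]}]
    try:
        validate_chat_completion_user_messages(invalid)
    except Exception as err:
        print(err)  # "Invalid user message=... at index 0. ..."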