forked from phoenix/litellm-mirror
Merge branch 'BerriAI:main' into main
This commit is contained in:
commit
de7fe98556
27 changed files with 850 additions and 346 deletions
47
.github/pull_request_template.md
vendored
Normal file
47
.github/pull_request_template.md
vendored
Normal file
|
@ -0,0 +1,47 @@
|
|||
<!-- This is just examples. You can remove all items if you want. -->
|
||||
<!-- Please remove all comments. -->
|
||||
|
||||
## Title
|
||||
|
||||
<!-- e.g. "Implement user authentication feature" -->
|
||||
|
||||
## Relevant issues
|
||||
|
||||
<!-- e.g. "Fixes #000" -->
|
||||
|
||||
## Type
|
||||
|
||||
<!-- Select the type of Pull Request -->
|
||||
<!-- Keep only the necessary ones -->
|
||||
|
||||
🆕 New Feature
|
||||
🐛 Bug Fix
|
||||
🧹 Refactoring
|
||||
📖 Documentation
|
||||
💻 Development Environment
|
||||
🚄 Infrastructure
|
||||
✅ Test
|
||||
|
||||
## Changes
|
||||
|
||||
<!-- List of changes -->
|
||||
|
||||
## Testing
|
||||
|
||||
<!-- Test procedure -->
|
||||
|
||||
## Notes
|
||||
|
||||
<!-- Test results -->
|
||||
|
||||
<!-- Points to note for the reviewer, consultation content, concerns -->
|
||||
|
||||
## Pre-Submission Checklist (optional but appreciated):
|
||||
|
||||
- [ ] I have included relevant documentation updates (stored in /docs/my-website)
|
||||
|
||||
## OS Tests (optional but appreciated):
|
||||
|
||||
- [ ] Tested on Windows
|
||||
- [ ] Tested on MacOS
|
||||
- [ ] Tested on Linux
|
|
@ -248,7 +248,7 @@ Step 2: Navigate into the project, and install dependencies:
|
|||
|
||||
```
|
||||
cd litellm
|
||||
poetry install
|
||||
poetry install -E extra_proxy -E proxy
|
||||
```
|
||||
|
||||
Step 3: Test your change:
|
||||
|
|
|
@ -1,7 +1,7 @@
|
|||
# Completion Token Usage & Cost
|
||||
By default LiteLLM returns token usage in all completion requests ([See here](https://litellm.readthedocs.io/en/latest/output/))
|
||||
|
||||
However, we also expose 5 helper functions + **[NEW]** an API to calculate token usage across providers:
|
||||
However, we also expose some helper functions + **[NEW]** an API to calculate token usage across providers:
|
||||
|
||||
- `encode`: This encodes the text passed in, using the model-specific tokenizer. [**Jump to code**](#1-encode)
|
||||
|
||||
|
@ -9,17 +9,19 @@ However, we also expose 5 helper functions + **[NEW]** an API to calculate token
|
|||
|
||||
- `token_counter`: This returns the number of tokens for a given input - it uses the tokenizer based on the model, and defaults to tiktoken if no model-specific tokenizer is available. [**Jump to code**](#3-token_counter)
|
||||
|
||||
- `cost_per_token`: This returns the cost (in USD) for prompt (input) and completion (output) tokens. Uses the live list from `api.litellm.ai`. [**Jump to code**](#4-cost_per_token)
|
||||
- `create_pretrained_tokenizer` and `create_tokenizer`: LiteLLM provides default tokenizer support for OpenAI, Cohere, Anthropic, Llama2, and Llama3 models. If you are using a different model, you can create a custom tokenizer and pass it as `custom_tokenizer` to the `encode`, `decode`, and `token_counter` methods. [**Jump to code**](#4-create_pretrained_tokenizer-and-create_tokenizer)
|
||||
|
||||
- `completion_cost`: This returns the overall cost (in USD) for a given LLM API Call. It combines `token_counter` and `cost_per_token` to return the cost for that query (counting both cost of input and output). [**Jump to code**](#5-completion_cost)
|
||||
- `cost_per_token`: This returns the cost (in USD) for prompt (input) and completion (output) tokens. Uses the live list from `api.litellm.ai`. [**Jump to code**](#5-cost_per_token)
|
||||
|
||||
- `get_max_tokens`: This returns the maximum number of tokens allowed for the given model. [**Jump to code**](#6-get_max_tokens)
|
||||
- `completion_cost`: This returns the overall cost (in USD) for a given LLM API Call. It combines `token_counter` and `cost_per_token` to return the cost for that query (counting both cost of input and output). [**Jump to code**](#6-completion_cost)
|
||||
|
||||
- `model_cost`: This returns a dictionary for all models, with their max_tokens, input_cost_per_token and output_cost_per_token. It uses the `api.litellm.ai` call shown below. [**Jump to code**](#7-model_cost)
|
||||
- `get_max_tokens`: This returns the maximum number of tokens allowed for the given model. [**Jump to code**](#7-get_max_tokens)
|
||||
|
||||
- `register_model`: This registers new / overrides existing models (and their pricing details) in the model cost dictionary. [**Jump to code**](#8-register_model)
|
||||
- `model_cost`: This returns a dictionary for all models, with their max_tokens, input_cost_per_token and output_cost_per_token. It uses the `api.litellm.ai` call shown below. [**Jump to code**](#8-model_cost)
|
||||
|
||||
- `api.litellm.ai`: Live token + price count across [all supported models](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json). [**Jump to code**](#9-apilitellmai)
|
||||
- `register_model`: This registers new / overrides existing models (and their pricing details) in the model cost dictionary. [**Jump to code**](#9-register_model)
|
||||
|
||||
- `api.litellm.ai`: Live token + price count across [all supported models](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json). [**Jump to code**](#10-apilitellmai)
|
||||
|
||||
📣 This is a community maintained list. Contributions are welcome! ❤️
|
||||
|
||||
|
@ -60,7 +62,24 @@ messages = [{"user": "role", "content": "Hey, how's it going"}]
|
|||
print(token_counter(model="gpt-3.5-turbo", messages=messages))
|
||||
```
|
||||
|
||||
### 4. `cost_per_token`
|
||||
### 4. `create_pretrained_tokenizer` and `create_tokenizer`
|
||||
|
||||
```python
|
||||
from litellm import create_pretrained_tokenizer, create_tokenizer
|
||||
|
||||
# get tokenizer from huggingface repo
|
||||
custom_tokenizer_1 = create_pretrained_tokenizer("Xenova/llama-3-tokenizer")
|
||||
|
||||
# use tokenizer from json file
|
||||
with open("tokenizer.json") as f:
|
||||
json_data = json.load(f)
|
||||
|
||||
json_str = json.dumps(json_data)
|
||||
|
||||
custom_tokenizer_2 = create_tokenizer(json_str)
|
||||
```
|
||||
|
||||
### 5. `cost_per_token`
|
||||
|
||||
```python
|
||||
from litellm import cost_per_token
|
||||
|
@ -72,7 +91,7 @@ prompt_tokens_cost_usd_dollar, completion_tokens_cost_usd_dollar = cost_per_toke
|
|||
print(prompt_tokens_cost_usd_dollar, completion_tokens_cost_usd_dollar)
|
||||
```
|
||||
|
||||
### 5. `completion_cost`
|
||||
### 6. `completion_cost`
|
||||
|
||||
* Input: Accepts a `litellm.completion()` response **OR** prompt + completion strings
|
||||
* Output: Returns a `float` of cost for the `completion` call
|
||||
|
@ -99,7 +118,7 @@ cost = completion_cost(model="bedrock/anthropic.claude-v2", prompt="Hey!", compl
|
|||
formatted_string = f"${float(cost):.10f}"
|
||||
print(formatted_string)
|
||||
```
|
||||
### 6. `get_max_tokens`
|
||||
### 7. `get_max_tokens`
|
||||
|
||||
Input: Accepts a model name - e.g., gpt-3.5-turbo (to get a complete list, call litellm.model_list).
|
||||
Output: Returns the maximum number of tokens allowed for the given model
|
||||
|
@ -112,7 +131,7 @@ model = "gpt-3.5-turbo"
|
|||
print(get_max_tokens(model)) # Output: 4097
|
||||
```
|
||||
|
||||
### 7. `model_cost`
|
||||
### 8. `model_cost`
|
||||
|
||||
* Output: Returns a dict object containing the max_tokens, input_cost_per_token, output_cost_per_token for all models on [community-maintained list](https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json)
|
||||
|
||||
|
@ -122,7 +141,7 @@ from litellm import model_cost
|
|||
print(model_cost) # {'gpt-3.5-turbo': {'max_tokens': 4000, 'input_cost_per_token': 1.5e-06, 'output_cost_per_token': 2e-06}, ...}
|
||||
```
|
||||
|
||||
### 8. `register_model`
|
||||
### 9. `register_model`
|
||||
|
||||
* Input: Provide EITHER a model cost dictionary or a url to a hosted json blob
|
||||
* Output: Returns updated model_cost dictionary + updates litellm.model_cost with model details.
|
||||
|
@ -157,5 +176,3 @@ export LITELLM_LOCAL_MODEL_COST_MAP="True"
|
|||
```
|
||||
|
||||
Note: this means you will need to upgrade to get updated pricing, and newer models.
|
||||
|
||||
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
# Greenscale Tutorial
|
||||
# Greenscale - Track LLM Spend and Responsible Usage
|
||||
|
||||
[Greenscale](https://greenscale.ai/) is a production monitoring platform for your LLM-powered app that provides you granular key insights into your GenAI spending and responsible usage. Greenscale only captures metadata to minimize the exposure risk of personally identifiable information (PII).
|
||||
|
||||
|
|
|
@ -178,6 +178,7 @@ const sidebars = {
|
|||
"observability/traceloop_integration",
|
||||
"observability/athina_integration",
|
||||
"observability/lunary_integration",
|
||||
"observability/greenscale_integration",
|
||||
"observability/helicone_integration",
|
||||
"observability/supabase_integration",
|
||||
`observability/telemetry`,
|
||||
|
|
8
litellm-js/spend-logs/package-lock.json
generated
8
litellm-js/spend-logs/package-lock.json
generated
|
@ -5,7 +5,7 @@
|
|||
"packages": {
|
||||
"": {
|
||||
"dependencies": {
|
||||
"@hono/node-server": "^1.9.0",
|
||||
"@hono/node-server": "^1.10.1",
|
||||
"hono": "^4.2.7"
|
||||
},
|
||||
"devDependencies": {
|
||||
|
@ -382,9 +382,9 @@
|
|||
}
|
||||
},
|
||||
"node_modules/@hono/node-server": {
|
||||
"version": "1.9.0",
|
||||
"resolved": "https://registry.npmjs.org/@hono/node-server/-/node-server-1.9.0.tgz",
|
||||
"integrity": "sha512-oJjk7WXBlENeHhWiMqSyxPIZ3Kmf5ZYxqdlcSIXyN8Rn50bNJsPl99G4POBS03Jxh56FdfRJ0SEnC8mAVIiavQ==",
|
||||
"version": "1.10.1",
|
||||
"resolved": "https://registry.npmjs.org/@hono/node-server/-/node-server-1.10.1.tgz",
|
||||
"integrity": "sha512-5BKW25JH5PQKPDkTcIgv3yNUPtOAbnnjFFgWvIxxAY/B/ZNeYjjWoAeDmqhIiCgOAJ3Tauuw+0G+VainhuZRYQ==",
|
||||
"engines": {
|
||||
"node": ">=18.14.1"
|
||||
}
|
||||
|
|
|
@ -3,7 +3,7 @@
|
|||
"dev": "tsx watch src/index.ts"
|
||||
},
|
||||
"dependencies": {
|
||||
"@hono/node-server": "^1.9.0",
|
||||
"@hono/node-server": "^1.10.1",
|
||||
"hono": "^4.2.7"
|
||||
},
|
||||
"devDependencies": {
|
||||
|
|
|
@ -542,7 +542,11 @@ models_by_provider: dict = {
|
|||
"together_ai": together_ai_models,
|
||||
"baseten": baseten_models,
|
||||
"openrouter": openrouter_models,
|
||||
"vertex_ai": vertex_chat_models + vertex_text_models,
|
||||
"vertex_ai": vertex_chat_models
|
||||
+ vertex_text_models
|
||||
+ vertex_anthropic_models
|
||||
+ vertex_vision_models
|
||||
+ vertex_language_models,
|
||||
"ai21": ai21_models,
|
||||
"bedrock": bedrock_models,
|
||||
"petals": petals_models,
|
||||
|
@ -609,6 +613,8 @@ from .utils import (
|
|||
get_optional_params,
|
||||
modify_integration,
|
||||
token_counter,
|
||||
create_pretrained_tokenizer,
|
||||
create_tokenizer,
|
||||
cost_per_token,
|
||||
completion_cost,
|
||||
supports_function_calling,
|
||||
|
|
|
@ -38,7 +38,7 @@ class OpenMeterLogger(CustomLogger):
|
|||
in the environment
|
||||
"""
|
||||
missing_keys = []
|
||||
if litellm.get_secret("OPENMETER_API_KEY", None) is None:
|
||||
if os.getenv("OPENMETER_API_KEY", None) is None:
|
||||
missing_keys.append("OPENMETER_API_KEY")
|
||||
|
||||
if len(missing_keys) > 0:
|
||||
|
@ -71,15 +71,13 @@ class OpenMeterLogger(CustomLogger):
|
|||
}
|
||||
|
||||
def log_success_event(self, kwargs, response_obj, start_time, end_time):
|
||||
_url = litellm.get_secret(
|
||||
"OPENMETER_API_ENDPOINT", default_value="https://openmeter.cloud"
|
||||
)
|
||||
_url = os.getenv("OPENMETER_API_ENDPOINT", "https://openmeter.cloud")
|
||||
if _url.endswith("/"):
|
||||
_url += "api/v1/events"
|
||||
else:
|
||||
_url += "/api/v1/events"
|
||||
|
||||
api_key = litellm.get_secret("OPENMETER_API_KEY")
|
||||
api_key = os.getenv("OPENMETER_API_KEY")
|
||||
|
||||
_data = self._common_logic(kwargs=kwargs, response_obj=response_obj)
|
||||
self.sync_http_handler.post(
|
||||
|
@ -92,15 +90,13 @@ class OpenMeterLogger(CustomLogger):
|
|||
)
|
||||
|
||||
async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
|
||||
_url = litellm.get_secret(
|
||||
"OPENMETER_API_ENDPOINT", default_value="https://openmeter.cloud"
|
||||
)
|
||||
_url = os.getenv("OPENMETER_API_ENDPOINT", "https://openmeter.cloud")
|
||||
if _url.endswith("/"):
|
||||
_url += "api/v1/events"
|
||||
else:
|
||||
_url += "/api/v1/events"
|
||||
|
||||
api_key = litellm.get_secret("OPENMETER_API_KEY")
|
||||
api_key = os.getenv("OPENMETER_API_KEY")
|
||||
|
||||
_data = self._common_logic(kwargs=kwargs, response_obj=response_obj)
|
||||
_headers = {
|
||||
|
@ -117,7 +113,6 @@ class OpenMeterLogger(CustomLogger):
|
|||
|
||||
response.raise_for_status()
|
||||
except Exception as e:
|
||||
print(f"\nAn Exception Occurred - {str(e)}")
|
||||
if hasattr(response, "text"):
|
||||
print(f"\nError Message: {response.text}")
|
||||
litellm.print_verbose(f"\nError Message: {response.text}")
|
||||
raise e
|
||||
|
|
|
@ -48,19 +48,6 @@ class SlackAlerting:
|
|||
self.internal_usage_cache = DualCache()
|
||||
self.async_http_handler = AsyncHTTPHandler()
|
||||
self.alert_to_webhook_url = alert_to_webhook_url
|
||||
self.langfuse_logger = None
|
||||
|
||||
try:
|
||||
from litellm.integrations.langfuse import LangFuseLogger
|
||||
|
||||
self.langfuse_logger = LangFuseLogger(
|
||||
os.getenv("LANGFUSE_PUBLIC_KEY"),
|
||||
os.getenv("LANGFUSE_SECRET_KEY"),
|
||||
flush_interval=1,
|
||||
)
|
||||
except:
|
||||
pass
|
||||
|
||||
pass
|
||||
|
||||
def update_values(
|
||||
|
@ -110,62 +97,8 @@ class SlackAlerting:
|
|||
start_time: Optional[datetime.datetime] = None,
|
||||
end_time: Optional[datetime.datetime] = None,
|
||||
):
|
||||
import uuid
|
||||
|
||||
# For now: do nothing as we're debugging why this is not working as expected
|
||||
if request_data is not None:
|
||||
trace_id = request_data.get("metadata", {}).get(
|
||||
"trace_id", None
|
||||
) # get langfuse trace id
|
||||
if trace_id is None:
|
||||
trace_id = "litellm-alert-trace-" + str(uuid.uuid4())
|
||||
request_data["metadata"]["trace_id"] = trace_id
|
||||
elif kwargs is not None:
|
||||
_litellm_params = kwargs.get("litellm_params", {})
|
||||
trace_id = _litellm_params.get("metadata", {}).get(
|
||||
"trace_id", None
|
||||
) # get langfuse trace id
|
||||
if trace_id is None:
|
||||
trace_id = "litellm-alert-trace-" + str(uuid.uuid4())
|
||||
_litellm_params["metadata"]["trace_id"] = trace_id
|
||||
|
||||
# Log hanging request as an error on langfuse
|
||||
if type == "hanging_request":
|
||||
if self.langfuse_logger is not None:
|
||||
_logging_kwargs = copy.deepcopy(request_data)
|
||||
if _logging_kwargs is None:
|
||||
_logging_kwargs = {}
|
||||
_logging_kwargs["litellm_params"] = {}
|
||||
request_data = request_data or {}
|
||||
_logging_kwargs["litellm_params"]["metadata"] = request_data.get(
|
||||
"metadata", {}
|
||||
)
|
||||
# log to langfuse in a separate thread
|
||||
import threading
|
||||
|
||||
threading.Thread(
|
||||
target=self.langfuse_logger.log_event,
|
||||
args=(
|
||||
_logging_kwargs,
|
||||
None,
|
||||
start_time,
|
||||
end_time,
|
||||
None,
|
||||
print,
|
||||
"ERROR",
|
||||
"Requests is hanging",
|
||||
),
|
||||
).start()
|
||||
|
||||
_langfuse_host = os.environ.get("LANGFUSE_HOST", "https://cloud.langfuse.com")
|
||||
_langfuse_project_id = os.environ.get("LANGFUSE_PROJECT_ID")
|
||||
|
||||
# langfuse urls look like: https://us.cloud.langfuse.com/project/************/traces/litellm-alert-trace-ididi9dk-09292-************
|
||||
|
||||
_langfuse_url = (
|
||||
f"{_langfuse_host}/project/{_langfuse_project_id}/traces/{trace_id}"
|
||||
)
|
||||
request_info += f"\n🪢 Langfuse Trace: {_langfuse_url}"
|
||||
# do nothing for now
|
||||
pass
|
||||
return request_info
|
||||
|
||||
def _response_taking_too_long_callback(
|
||||
|
@ -242,10 +175,6 @@ class SlackAlerting:
|
|||
request_info = f"\nRequest Model: `{model}`\nAPI Base: `{api_base}`\nMessages: `{messages}`"
|
||||
slow_message = f"`Responses are slow - {round(time_difference_float,2)}s response time > Alerting threshold: {self.alerting_threshold}s`"
|
||||
if time_difference_float > self.alerting_threshold:
|
||||
if "langfuse" in litellm.success_callback:
|
||||
request_info = self._add_langfuse_trace_id_to_alert(
|
||||
request_info=request_info, kwargs=kwargs, type="slow_response"
|
||||
)
|
||||
# add deployment latencies to alert
|
||||
if (
|
||||
kwargs is not None
|
||||
|
|
|
@ -34,6 +34,8 @@ from litellm.utils import (
|
|||
async_mock_completion_streaming_obj,
|
||||
convert_to_model_response_object,
|
||||
token_counter,
|
||||
create_pretrained_tokenizer,
|
||||
create_tokenizer,
|
||||
Usage,
|
||||
get_optional_params_embeddings,
|
||||
get_optional_params_image_gen,
|
||||
|
|
|
@ -338,6 +338,18 @@
|
|||
"output_cost_per_second": 0.0001,
|
||||
"litellm_provider": "azure"
|
||||
},
|
||||
"azure/gpt-4-turbo-2024-04-09": {
|
||||
"max_tokens": 4096,
|
||||
"max_input_tokens": 128000,
|
||||
"max_output_tokens": 4096,
|
||||
"input_cost_per_token": 0.00001,
|
||||
"output_cost_per_token": 0.00003,
|
||||
"litellm_provider": "azure",
|
||||
"mode": "chat",
|
||||
"supports_function_calling": true,
|
||||
"supports_parallel_function_calling": true,
|
||||
"supports_vision": true
|
||||
},
|
||||
"azure/gpt-4-0125-preview": {
|
||||
"max_tokens": 4096,
|
||||
"max_input_tokens": 128000,
|
||||
|
@ -813,6 +825,7 @@
|
|||
"litellm_provider": "anthropic",
|
||||
"mode": "chat",
|
||||
"supports_function_calling": true,
|
||||
"supports_vision": true,
|
||||
"tool_use_system_prompt_tokens": 264
|
||||
},
|
||||
"claude-3-opus-20240229": {
|
||||
|
@ -824,6 +837,7 @@
|
|||
"litellm_provider": "anthropic",
|
||||
"mode": "chat",
|
||||
"supports_function_calling": true,
|
||||
"supports_vision": true,
|
||||
"tool_use_system_prompt_tokens": 395
|
||||
},
|
||||
"claude-3-sonnet-20240229": {
|
||||
|
@ -835,6 +849,7 @@
|
|||
"litellm_provider": "anthropic",
|
||||
"mode": "chat",
|
||||
"supports_function_calling": true,
|
||||
"supports_vision": true,
|
||||
"tool_use_system_prompt_tokens": 159
|
||||
},
|
||||
"text-bison": {
|
||||
|
@ -1142,7 +1157,8 @@
|
|||
"output_cost_per_token": 0.000015,
|
||||
"litellm_provider": "vertex_ai-anthropic_models",
|
||||
"mode": "chat",
|
||||
"supports_function_calling": true
|
||||
"supports_function_calling": true,
|
||||
"supports_vision": true
|
||||
},
|
||||
"vertex_ai/claude-3-haiku@20240307": {
|
||||
"max_tokens": 4096,
|
||||
|
@ -1152,7 +1168,8 @@
|
|||
"output_cost_per_token": 0.00000125,
|
||||
"litellm_provider": "vertex_ai-anthropic_models",
|
||||
"mode": "chat",
|
||||
"supports_function_calling": true
|
||||
"supports_function_calling": true,
|
||||
"supports_vision": true
|
||||
},
|
||||
"vertex_ai/claude-3-opus@20240229": {
|
||||
"max_tokens": 4096,
|
||||
|
@ -1162,7 +1179,8 @@
|
|||
"output_cost_per_token": 0.0000075,
|
||||
"litellm_provider": "vertex_ai-anthropic_models",
|
||||
"mode": "chat",
|
||||
"supports_function_calling": true
|
||||
"supports_function_calling": true,
|
||||
"supports_vision": true
|
||||
},
|
||||
"textembedding-gecko": {
|
||||
"max_tokens": 3072,
|
||||
|
@ -1581,6 +1599,7 @@
|
|||
"litellm_provider": "openrouter",
|
||||
"mode": "chat",
|
||||
"supports_function_calling": true,
|
||||
"supports_vision": true,
|
||||
"tool_use_system_prompt_tokens": 395
|
||||
},
|
||||
"openrouter/google/palm-2-chat-bison": {
|
||||
|
@ -1929,7 +1948,8 @@
|
|||
"output_cost_per_token": 0.000015,
|
||||
"litellm_provider": "bedrock",
|
||||
"mode": "chat",
|
||||
"supports_function_calling": true
|
||||
"supports_function_calling": true,
|
||||
"supports_vision": true
|
||||
},
|
||||
"anthropic.claude-3-haiku-20240307-v1:0": {
|
||||
"max_tokens": 4096,
|
||||
|
@ -1939,7 +1959,8 @@
|
|||
"output_cost_per_token": 0.00000125,
|
||||
"litellm_provider": "bedrock",
|
||||
"mode": "chat",
|
||||
"supports_function_calling": true
|
||||
"supports_function_calling": true,
|
||||
"supports_vision": true
|
||||
},
|
||||
"anthropic.claude-3-opus-20240229-v1:0": {
|
||||
"max_tokens": 4096,
|
||||
|
@ -1949,7 +1970,8 @@
|
|||
"output_cost_per_token": 0.000075,
|
||||
"litellm_provider": "bedrock",
|
||||
"mode": "chat",
|
||||
"supports_function_calling": true
|
||||
"supports_function_calling": true,
|
||||
"supports_vision": true
|
||||
},
|
||||
"anthropic.claude-v1": {
|
||||
"max_tokens": 8191,
|
||||
|
|
|
@ -11,5 +11,12 @@ router_settings:
|
|||
redis_password: os.environ/REDIS_PASSWORD
|
||||
redis_port: os.environ/REDIS_PORT
|
||||
|
||||
router_settings:
|
||||
routing_strategy: "latency-based-routing"
|
||||
|
||||
litellm_settings:
|
||||
success_callback: ["openmeter"]
|
||||
|
||||
general_settings:
|
||||
alerting: ["slack"]
|
||||
alert_types: ["llm_exceptions"]
|
|
@ -3446,172 +3446,6 @@ def model_list(
|
|||
)
|
||||
|
||||
|
||||
@router.post(
|
||||
"/v1/completions", dependencies=[Depends(user_api_key_auth)], tags=["completions"]
|
||||
)
|
||||
@router.post(
|
||||
"/completions", dependencies=[Depends(user_api_key_auth)], tags=["completions"]
|
||||
)
|
||||
@router.post(
|
||||
"/engines/{model:path}/completions",
|
||||
dependencies=[Depends(user_api_key_auth)],
|
||||
tags=["completions"],
|
||||
)
|
||||
@router.post(
|
||||
"/openai/deployments/{model:path}/completions",
|
||||
dependencies=[Depends(user_api_key_auth)],
|
||||
tags=["completions"],
|
||||
)
|
||||
async def completion(
|
||||
request: Request,
|
||||
fastapi_response: Response,
|
||||
model: Optional[str] = None,
|
||||
user_api_key_dict: UserAPIKeyAuth = Depends(user_api_key_auth),
|
||||
):
|
||||
global user_temperature, user_request_timeout, user_max_tokens, user_api_base
|
||||
try:
|
||||
body = await request.body()
|
||||
body_str = body.decode()
|
||||
try:
|
||||
data = ast.literal_eval(body_str)
|
||||
except:
|
||||
data = json.loads(body_str)
|
||||
|
||||
data["user"] = data.get("user", user_api_key_dict.user_id)
|
||||
data["model"] = (
|
||||
general_settings.get("completion_model", None) # server default
|
||||
or user_model # model name passed via cli args
|
||||
or model # for azure deployments
|
||||
or data["model"] # default passed in http request
|
||||
)
|
||||
if user_model:
|
||||
data["model"] = user_model
|
||||
if "metadata" not in data:
|
||||
data["metadata"] = {}
|
||||
data["metadata"]["user_api_key"] = user_api_key_dict.api_key
|
||||
data["metadata"]["user_api_key_metadata"] = user_api_key_dict.metadata
|
||||
data["metadata"]["user_api_key_alias"] = getattr(
|
||||
user_api_key_dict, "key_alias", None
|
||||
)
|
||||
data["metadata"]["user_api_key_user_id"] = user_api_key_dict.user_id
|
||||
data["metadata"]["user_api_key_team_id"] = getattr(
|
||||
user_api_key_dict, "team_id", None
|
||||
)
|
||||
data["metadata"]["user_api_key_team_alias"] = getattr(
|
||||
user_api_key_dict, "team_alias", None
|
||||
)
|
||||
_headers = dict(request.headers)
|
||||
_headers.pop(
|
||||
"authorization", None
|
||||
) # do not store the original `sk-..` api key in the db
|
||||
data["metadata"]["headers"] = _headers
|
||||
data["metadata"]["endpoint"] = str(request.url)
|
||||
|
||||
# override with user settings, these are params passed via cli
|
||||
if user_temperature:
|
||||
data["temperature"] = user_temperature
|
||||
if user_request_timeout:
|
||||
data["request_timeout"] = user_request_timeout
|
||||
if user_max_tokens:
|
||||
data["max_tokens"] = user_max_tokens
|
||||
if user_api_base:
|
||||
data["api_base"] = user_api_base
|
||||
|
||||
### MODEL ALIAS MAPPING ###
|
||||
# check if model name in model alias map
|
||||
# get the actual model name
|
||||
if data["model"] in litellm.model_alias_map:
|
||||
data["model"] = litellm.model_alias_map[data["model"]]
|
||||
|
||||
### CALL HOOKS ### - modify incoming data before calling the model
|
||||
data = await proxy_logging_obj.pre_call_hook(
|
||||
user_api_key_dict=user_api_key_dict, data=data, call_type="completion"
|
||||
)
|
||||
|
||||
### ROUTE THE REQUESTs ###
|
||||
router_model_names = llm_router.model_names if llm_router is not None else []
|
||||
# skip router if user passed their key
|
||||
if "api_key" in data:
|
||||
response = await litellm.atext_completion(**data)
|
||||
elif (
|
||||
llm_router is not None and data["model"] in router_model_names
|
||||
): # model in router model list
|
||||
response = await llm_router.atext_completion(**data)
|
||||
elif (
|
||||
llm_router is not None
|
||||
and llm_router.model_group_alias is not None
|
||||
and data["model"] in llm_router.model_group_alias
|
||||
): # model set in model_group_alias
|
||||
response = await llm_router.atext_completion(**data)
|
||||
elif (
|
||||
llm_router is not None and data["model"] in llm_router.deployment_names
|
||||
): # model in router deployments, calling a specific deployment on the router
|
||||
response = await llm_router.atext_completion(
|
||||
**data, specific_deployment=True
|
||||
)
|
||||
elif (
|
||||
llm_router is not None
|
||||
and data["model"] not in router_model_names
|
||||
and llm_router.default_deployment is not None
|
||||
): # model in router deployments, calling a specific deployment on the router
|
||||
response = await llm_router.atext_completion(**data)
|
||||
elif user_model is not None: # `litellm --model <your-model-name>`
|
||||
response = await litellm.atext_completion(**data)
|
||||
else:
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_400_BAD_REQUEST,
|
||||
detail={
|
||||
"error": "Invalid model name passed in model="
|
||||
+ data.get("model", "")
|
||||
},
|
||||
)
|
||||
|
||||
if hasattr(response, "_hidden_params"):
|
||||
model_id = response._hidden_params.get("model_id", None) or ""
|
||||
original_response = (
|
||||
response._hidden_params.get("original_response", None) or ""
|
||||
)
|
||||
else:
|
||||
model_id = ""
|
||||
original_response = ""
|
||||
|
||||
verbose_proxy_logger.debug("final response: %s", response)
|
||||
if (
|
||||
"stream" in data and data["stream"] == True
|
||||
): # use generate_responses to stream responses
|
||||
custom_headers = {
|
||||
"x-litellm-model-id": model_id,
|
||||
}
|
||||
selected_data_generator = select_data_generator(
|
||||
response=response, user_api_key_dict=user_api_key_dict
|
||||
)
|
||||
|
||||
return StreamingResponse(
|
||||
selected_data_generator,
|
||||
media_type="text/event-stream",
|
||||
headers=custom_headers,
|
||||
)
|
||||
|
||||
fastapi_response.headers["x-litellm-model-id"] = model_id
|
||||
return response
|
||||
except Exception as e:
|
||||
data["litellm_status"] = "fail" # used for alerting
|
||||
verbose_proxy_logger.debug("EXCEPTION RAISED IN PROXY MAIN.PY")
|
||||
verbose_proxy_logger.debug(
|
||||
"\033[1;31mAn error occurred: %s\n\n Debug this by setting `--debug`, e.g. `litellm --model gpt-3.5-turbo --debug`",
|
||||
e,
|
||||
)
|
||||
traceback.print_exc()
|
||||
error_traceback = traceback.format_exc()
|
||||
error_msg = f"{str(e)}"
|
||||
raise ProxyException(
|
||||
message=getattr(e, "message", error_msg),
|
||||
type=getattr(e, "type", "None"),
|
||||
param=getattr(e, "param", "None"),
|
||||
code=getattr(e, "status_code", 500),
|
||||
)
|
||||
|
||||
|
||||
@router.post(
|
||||
"/v1/chat/completions",
|
||||
dependencies=[Depends(user_api_key_auth)],
|
||||
|
@ -3810,7 +3644,7 @@ async def chat_completion(
|
|||
raise HTTPException(
|
||||
status_code=status.HTTP_400_BAD_REQUEST,
|
||||
detail={
|
||||
"error": "Invalid model name passed in model="
|
||||
"error": "chat_completion: Invalid model name passed in model="
|
||||
+ data.get("model", "")
|
||||
},
|
||||
)
|
||||
|
@ -3884,6 +3718,172 @@ async def chat_completion(
|
|||
)
|
||||
|
||||
|
||||
@router.post(
|
||||
"/v1/completions", dependencies=[Depends(user_api_key_auth)], tags=["completions"]
|
||||
)
|
||||
@router.post(
|
||||
"/completions", dependencies=[Depends(user_api_key_auth)], tags=["completions"]
|
||||
)
|
||||
@router.post(
|
||||
"/engines/{model:path}/completions",
|
||||
dependencies=[Depends(user_api_key_auth)],
|
||||
tags=["completions"],
|
||||
)
|
||||
@router.post(
|
||||
"/openai/deployments/{model:path}/completions",
|
||||
dependencies=[Depends(user_api_key_auth)],
|
||||
tags=["completions"],
|
||||
)
|
||||
async def completion(
|
||||
request: Request,
|
||||
fastapi_response: Response,
|
||||
model: Optional[str] = None,
|
||||
user_api_key_dict: UserAPIKeyAuth = Depends(user_api_key_auth),
|
||||
):
|
||||
global user_temperature, user_request_timeout, user_max_tokens, user_api_base
|
||||
try:
|
||||
body = await request.body()
|
||||
body_str = body.decode()
|
||||
try:
|
||||
data = ast.literal_eval(body_str)
|
||||
except:
|
||||
data = json.loads(body_str)
|
||||
|
||||
data["user"] = data.get("user", user_api_key_dict.user_id)
|
||||
data["model"] = (
|
||||
general_settings.get("completion_model", None) # server default
|
||||
or user_model # model name passed via cli args
|
||||
or model # for azure deployments
|
||||
or data["model"] # default passed in http request
|
||||
)
|
||||
if user_model:
|
||||
data["model"] = user_model
|
||||
if "metadata" not in data:
|
||||
data["metadata"] = {}
|
||||
data["metadata"]["user_api_key"] = user_api_key_dict.api_key
|
||||
data["metadata"]["user_api_key_metadata"] = user_api_key_dict.metadata
|
||||
data["metadata"]["user_api_key_alias"] = getattr(
|
||||
user_api_key_dict, "key_alias", None
|
||||
)
|
||||
data["metadata"]["user_api_key_user_id"] = user_api_key_dict.user_id
|
||||
data["metadata"]["user_api_key_team_id"] = getattr(
|
||||
user_api_key_dict, "team_id", None
|
||||
)
|
||||
data["metadata"]["user_api_key_team_alias"] = getattr(
|
||||
user_api_key_dict, "team_alias", None
|
||||
)
|
||||
_headers = dict(request.headers)
|
||||
_headers.pop(
|
||||
"authorization", None
|
||||
) # do not store the original `sk-..` api key in the db
|
||||
data["metadata"]["headers"] = _headers
|
||||
data["metadata"]["endpoint"] = str(request.url)
|
||||
|
||||
# override with user settings, these are params passed via cli
|
||||
if user_temperature:
|
||||
data["temperature"] = user_temperature
|
||||
if user_request_timeout:
|
||||
data["request_timeout"] = user_request_timeout
|
||||
if user_max_tokens:
|
||||
data["max_tokens"] = user_max_tokens
|
||||
if user_api_base:
|
||||
data["api_base"] = user_api_base
|
||||
|
||||
### MODEL ALIAS MAPPING ###
|
||||
# check if model name in model alias map
|
||||
# get the actual model name
|
||||
if data["model"] in litellm.model_alias_map:
|
||||
data["model"] = litellm.model_alias_map[data["model"]]
|
||||
|
||||
### CALL HOOKS ### - modify incoming data before calling the model
|
||||
data = await proxy_logging_obj.pre_call_hook(
|
||||
user_api_key_dict=user_api_key_dict, data=data, call_type="completion"
|
||||
)
|
||||
|
||||
### ROUTE THE REQUESTs ###
|
||||
router_model_names = llm_router.model_names if llm_router is not None else []
|
||||
# skip router if user passed their key
|
||||
if "api_key" in data:
|
||||
response = await litellm.atext_completion(**data)
|
||||
elif (
|
||||
llm_router is not None and data["model"] in router_model_names
|
||||
): # model in router model list
|
||||
response = await llm_router.atext_completion(**data)
|
||||
elif (
|
||||
llm_router is not None
|
||||
and llm_router.model_group_alias is not None
|
||||
and data["model"] in llm_router.model_group_alias
|
||||
): # model set in model_group_alias
|
||||
response = await llm_router.atext_completion(**data)
|
||||
elif (
|
||||
llm_router is not None and data["model"] in llm_router.deployment_names
|
||||
): # model in router deployments, calling a specific deployment on the router
|
||||
response = await llm_router.atext_completion(
|
||||
**data, specific_deployment=True
|
||||
)
|
||||
elif (
|
||||
llm_router is not None
|
||||
and data["model"] not in router_model_names
|
||||
and llm_router.default_deployment is not None
|
||||
): # model in router deployments, calling a specific deployment on the router
|
||||
response = await llm_router.atext_completion(**data)
|
||||
elif user_model is not None: # `litellm --model <your-model-name>`
|
||||
response = await litellm.atext_completion(**data)
|
||||
else:
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_400_BAD_REQUEST,
|
||||
detail={
|
||||
"error": "completion: Invalid model name passed in model="
|
||||
+ data.get("model", "")
|
||||
},
|
||||
)
|
||||
|
||||
if hasattr(response, "_hidden_params"):
|
||||
model_id = response._hidden_params.get("model_id", None) or ""
|
||||
original_response = (
|
||||
response._hidden_params.get("original_response", None) or ""
|
||||
)
|
||||
else:
|
||||
model_id = ""
|
||||
original_response = ""
|
||||
|
||||
verbose_proxy_logger.debug("final response: %s", response)
|
||||
if (
|
||||
"stream" in data and data["stream"] == True
|
||||
): # use generate_responses to stream responses
|
||||
custom_headers = {
|
||||
"x-litellm-model-id": model_id,
|
||||
}
|
||||
selected_data_generator = select_data_generator(
|
||||
response=response, user_api_key_dict=user_api_key_dict
|
||||
)
|
||||
|
||||
return StreamingResponse(
|
||||
selected_data_generator,
|
||||
media_type="text/event-stream",
|
||||
headers=custom_headers,
|
||||
)
|
||||
|
||||
fastapi_response.headers["x-litellm-model-id"] = model_id
|
||||
return response
|
||||
except Exception as e:
|
||||
data["litellm_status"] = "fail" # used for alerting
|
||||
verbose_proxy_logger.debug("EXCEPTION RAISED IN PROXY MAIN.PY")
|
||||
verbose_proxy_logger.debug(
|
||||
"\033[1;31mAn error occurred: %s\n\n Debug this by setting `--debug`, e.g. `litellm --model gpt-3.5-turbo --debug`",
|
||||
e,
|
||||
)
|
||||
traceback.print_exc()
|
||||
error_traceback = traceback.format_exc()
|
||||
error_msg = f"{str(e)}"
|
||||
raise ProxyException(
|
||||
message=getattr(e, "message", error_msg),
|
||||
type=getattr(e, "type", "None"),
|
||||
param=getattr(e, "param", "None"),
|
||||
code=getattr(e, "status_code", 500),
|
||||
)
|
||||
|
||||
|
||||
@router.post(
|
||||
"/v1/embeddings",
|
||||
dependencies=[Depends(user_api_key_auth)],
|
||||
|
@ -4041,7 +4041,7 @@ async def embeddings(
|
|||
raise HTTPException(
|
||||
status_code=status.HTTP_400_BAD_REQUEST,
|
||||
detail={
|
||||
"error": "Invalid model name passed in model="
|
||||
"error": "embeddings: Invalid model name passed in model="
|
||||
+ data.get("model", "")
|
||||
},
|
||||
)
|
||||
|
@ -4197,7 +4197,7 @@ async def image_generation(
|
|||
raise HTTPException(
|
||||
status_code=status.HTTP_400_BAD_REQUEST,
|
||||
detail={
|
||||
"error": "Invalid model name passed in model="
|
||||
"error": "image_generation: Invalid model name passed in model="
|
||||
+ data.get("model", "")
|
||||
},
|
||||
)
|
||||
|
@ -4372,7 +4372,7 @@ async def audio_transcriptions(
|
|||
raise HTTPException(
|
||||
status_code=status.HTTP_400_BAD_REQUEST,
|
||||
detail={
|
||||
"error": "Invalid model name passed in model="
|
||||
"error": "audio_transcriptions: Invalid model name passed in model="
|
||||
+ data.get("model", "")
|
||||
},
|
||||
)
|
||||
|
@ -4538,7 +4538,7 @@ async def moderations(
|
|||
raise HTTPException(
|
||||
status_code=status.HTTP_400_BAD_REQUEST,
|
||||
detail={
|
||||
"error": "Invalid model name passed in model="
|
||||
"error": "moderations: Invalid model name passed in model="
|
||||
+ data.get("model", "")
|
||||
},
|
||||
)
|
||||
|
|
|
@ -387,8 +387,14 @@ class ProxyLogging:
|
|||
"""
|
||||
|
||||
### ALERTING ###
|
||||
if "llm_exceptions" not in self.alert_types:
|
||||
return
|
||||
if "llm_exceptions" in self.alert_types and not isinstance(
|
||||
original_exception, HTTPException
|
||||
):
|
||||
"""
|
||||
Just alert on LLM API exceptions. Do not alert on user errors
|
||||
|
||||
Related issue - https://github.com/BerriAI/litellm/issues/3395
|
||||
"""
|
||||
asyncio.create_task(
|
||||
self.alerting_handler(
|
||||
message=f"LLM API call failed: {str(original_exception)}",
|
||||
|
@ -679,8 +685,8 @@ class PrismaClient:
|
|||
@backoff.on_exception(
|
||||
backoff.expo,
|
||||
Exception, # base exception to catch for the backoff
|
||||
max_tries=3, # maximum number of retries
|
||||
max_time=10, # maximum total time to retry for
|
||||
max_tries=1, # maximum number of retries
|
||||
max_time=2, # maximum total time to retry for
|
||||
on_backoff=on_backoff, # specifying the function to call on backoff
|
||||
)
|
||||
async def get_generic_data(
|
||||
|
@ -718,7 +724,8 @@ class PrismaClient:
|
|||
import traceback
|
||||
|
||||
error_msg = f"LiteLLM Prisma Client Exception get_generic_data: {str(e)}"
|
||||
print_verbose(error_msg)
|
||||
verbose_proxy_logger.error(error_msg)
|
||||
error_msg = error_msg + "\nException Type: {}".format(type(e))
|
||||
error_traceback = error_msg + "\n" + traceback.format_exc()
|
||||
end_time = time.time()
|
||||
_duration = end_time - start_time
|
||||
|
|
|
@ -2590,6 +2590,16 @@ class Router:
|
|||
return model
|
||||
return None
|
||||
|
||||
def get_model_info(self, id: str) -> Optional[dict]:
|
||||
"""
|
||||
For a given model id, return the model info
|
||||
"""
|
||||
for model in self.model_list:
|
||||
if "model_info" in model and "id" in model["model_info"]:
|
||||
if id == model["model_info"]["id"]:
|
||||
return model
|
||||
return None
|
||||
|
||||
def get_model_ids(self):
|
||||
ids = []
|
||||
for model in self.model_list:
|
||||
|
@ -2904,15 +2914,10 @@ class Router:
|
|||
m for m in self.model_list if m["litellm_params"]["model"] == model
|
||||
]
|
||||
|
||||
verbose_router_logger.debug(
|
||||
f"initial list of deployments: {healthy_deployments}"
|
||||
)
|
||||
litellm.print_verbose(f"initial list of deployments: {healthy_deployments}")
|
||||
|
||||
verbose_router_logger.debug(
|
||||
f"healthy deployments: length {len(healthy_deployments)} {healthy_deployments}"
|
||||
)
|
||||
if len(healthy_deployments) == 0:
|
||||
raise ValueError(f"No healthy deployment available, passed model={model}")
|
||||
raise ValueError(f"No healthy deployment available, passed model={model}. ")
|
||||
if litellm.model_alias_map and model in litellm.model_alias_map:
|
||||
model = litellm.model_alias_map[
|
||||
model
|
||||
|
|
|
@ -79,10 +79,12 @@ class LowestTPMLoggingHandler_v2(CustomLogger):
|
|||
model=deployment.get("litellm_params", {}).get("model"),
|
||||
response=httpx.Response(
|
||||
status_code=429,
|
||||
content="{} rpm limit={}. current usage={}".format(
|
||||
content="{} rpm limit={}. current usage={}. id={}, model_group={}. Get the model info by calling 'router.get_model_info(id)".format(
|
||||
RouterErrors.user_defined_ratelimit_error.value,
|
||||
deployment_rpm,
|
||||
local_result,
|
||||
model_id,
|
||||
deployment.get("model_name", ""),
|
||||
),
|
||||
request=httpx.Request(method="tpm_rpm_limits", url="https://github.com/BerriAI/litellm"), # type: ignore
|
||||
),
|
||||
|
|
|
@ -0,0 +1,88 @@
|
|||
int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
|
||||
Traceback (most recent call last):
|
||||
File "/opt/homebrew/lib/python3.11/site-packages/langfuse/client.py", line 778, in generation
|
||||
"usage": _convert_usage_input(usage) if usage is not None else None,
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
File "/opt/homebrew/lib/python3.11/site-packages/langfuse/utils.py", line 77, in _convert_usage_input
|
||||
"totalCost": extract_by_priority(usage, ["totalCost", "total_cost"]),
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
File "/opt/homebrew/lib/python3.11/site-packages/langfuse/utils.py", line 32, in extract_by_priority
|
||||
return int(usage[key])
|
||||
^^^^^^^^^^^^^^^
|
||||
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
|
||||
int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
|
||||
Traceback (most recent call last):
|
||||
File "/opt/homebrew/lib/python3.11/site-packages/langfuse/client.py", line 778, in generation
|
||||
"usage": _convert_usage_input(usage) if usage is not None else None,
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
File "/opt/homebrew/lib/python3.11/site-packages/langfuse/utils.py", line 77, in _convert_usage_input
|
||||
"totalCost": extract_by_priority(usage, ["totalCost", "total_cost"]),
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
File "/opt/homebrew/lib/python3.11/site-packages/langfuse/utils.py", line 32, in extract_by_priority
|
||||
return int(usage[key])
|
||||
^^^^^^^^^^^^^^^
|
||||
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
|
||||
int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
|
||||
Traceback (most recent call last):
|
||||
File "/opt/homebrew/lib/python3.11/site-packages/langfuse/client.py", line 778, in generation
|
||||
"usage": _convert_usage_input(usage) if usage is not None else None,
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
File "/opt/homebrew/lib/python3.11/site-packages/langfuse/utils.py", line 77, in _convert_usage_input
|
||||
"totalCost": extract_by_priority(usage, ["totalCost", "total_cost"]),
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
File "/opt/homebrew/lib/python3.11/site-packages/langfuse/utils.py", line 32, in extract_by_priority
|
||||
return int(usage[key])
|
||||
^^^^^^^^^^^^^^^
|
||||
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
|
||||
int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
|
||||
Traceback (most recent call last):
|
||||
File "/opt/homebrew/lib/python3.11/site-packages/langfuse/client.py", line 778, in generation
|
||||
"usage": _convert_usage_input(usage) if usage is not None else None,
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
File "/opt/homebrew/lib/python3.11/site-packages/langfuse/utils.py", line 77, in _convert_usage_input
|
||||
"totalCost": extract_by_priority(usage, ["totalCost", "total_cost"]),
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
File "/opt/homebrew/lib/python3.11/site-packages/langfuse/utils.py", line 32, in extract_by_priority
|
||||
return int(usage[key])
|
||||
^^^^^^^^^^^^^^^
|
||||
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
|
||||
int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
|
||||
Traceback (most recent call last):
|
||||
File "/opt/homebrew/lib/python3.11/site-packages/langfuse/client.py", line 778, in generation
|
||||
"usage": _convert_usage_input(usage) if usage is not None else None,
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
File "/opt/homebrew/lib/python3.11/site-packages/langfuse/utils.py", line 77, in _convert_usage_input
|
||||
"totalCost": extract_by_priority(usage, ["totalCost", "total_cost"]),
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
File "/opt/homebrew/lib/python3.11/site-packages/langfuse/utils.py", line 32, in extract_by_priority
|
||||
return int(usage[key])
|
||||
^^^^^^^^^^^^^^^
|
||||
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
|
||||
consumer is running...
|
||||
Getting observations... None, None, None, None, litellm-test-98e1cc75-bef8-4280-a2b9-e08633b81acd, None, GENERATION
|
||||
consumer is running...
|
||||
Getting observations... None, None, None, None, litellm-test-532d2bc8-f8d6-42fd-8f78-416bae79925d, None, GENERATION
|
||||
joining 1 consumer threads
|
||||
consumer thread 0 joined
|
||||
joining 1 consumer threads
|
||||
consumer thread 0 joined
|
||||
joining 1 consumer threads
|
||||
consumer thread 0 joined
|
||||
joining 1 consumer threads
|
||||
consumer thread 0 joined
|
||||
joining 1 consumer threads
|
||||
consumer thread 0 joined
|
||||
joining 1 consumer threads
|
||||
consumer thread 0 joined
|
||||
joining 1 consumer threads
|
||||
consumer thread 0 joined
|
||||
joining 1 consumer threads
|
||||
consumer thread 0 joined
|
||||
joining 1 consumer threads
|
||||
consumer thread 0 joined
|
||||
joining 1 consumer threads
|
||||
consumer thread 0 joined
|
||||
joining 1 consumer threads
|
||||
consumer thread 0 joined
|
||||
joining 1 consumer threads
|
||||
consumer thread 0 joined
|
|
@ -205,8 +205,6 @@ async def test_langfuse_logging_without_request_response(stream):
|
|||
assert _trace_data[0].output == {
|
||||
"role": "assistant",
|
||||
"content": "redacted-by-litellm",
|
||||
"function_call": None,
|
||||
"tool_calls": None,
|
||||
}
|
||||
|
||||
except Exception as e:
|
||||
|
|
|
@ -3,7 +3,7 @@
|
|||
|
||||
import sys
|
||||
import os
|
||||
import io, asyncio
|
||||
import io, asyncio, httpx
|
||||
from datetime import datetime, timedelta
|
||||
|
||||
# import logging
|
||||
|
@ -17,6 +17,61 @@ import asyncio
|
|||
from unittest.mock import patch, MagicMock
|
||||
from litellm.caching import DualCache
|
||||
from litellm.integrations.slack_alerting import SlackAlerting
|
||||
from litellm.proxy._types import UserAPIKeyAuth
|
||||
from litellm.proxy.proxy_server import HTTPException
|
||||
|
||||
|
||||
@pytest.mark.parametrize("exception_type", ["llm-exception", "non-llm-exception"])
|
||||
@pytest.mark.asyncio
|
||||
async def test_slack_alerting_llm_exceptions(exception_type, monkeypatch):
|
||||
"""
|
||||
Test if non-llm exception -> No request
|
||||
Test if llm exception -> Request triggered
|
||||
"""
|
||||
_pl = ProxyLogging(user_api_key_cache=DualCache())
|
||||
_pl.update_values(
|
||||
alerting=["slack"],
|
||||
alerting_threshold=100,
|
||||
redis_cache=None,
|
||||
alert_types=["llm_exceptions"],
|
||||
)
|
||||
|
||||
async def mock_alerting_handler(message, level, alert_type):
|
||||
global exception_type
|
||||
|
||||
if exception_type == "llm-exception":
|
||||
pass
|
||||
elif exception_type == "non-llm-exception":
|
||||
pytest.fail("Function should not have been called")
|
||||
|
||||
monkeypatch.setattr(_pl, "alerting_handler", mock_alerting_handler)
|
||||
|
||||
if exception_type == "llm-exception":
|
||||
await _pl.post_call_failure_hook(
|
||||
original_exception=litellm.APIError(
|
||||
status_code=500,
|
||||
message="This is a test exception",
|
||||
llm_provider="openai",
|
||||
model="gpt-3.5-turbo",
|
||||
request=httpx.Request(
|
||||
method="completion", url="https://github.com/BerriAI/litellm"
|
||||
),
|
||||
),
|
||||
user_api_key_dict=UserAPIKeyAuth(),
|
||||
)
|
||||
|
||||
await asyncio.sleep(2)
|
||||
|
||||
elif exception_type == "non-llm-exception":
|
||||
await _pl.post_call_failure_hook(
|
||||
original_exception=HTTPException(
|
||||
status_code=400,
|
||||
detail={"error": "this is a test exception"},
|
||||
),
|
||||
user_api_key_dict=UserAPIKeyAuth(),
|
||||
)
|
||||
|
||||
await asyncio.sleep(2)
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
|
|
|
@ -169,7 +169,7 @@ def test_chat_completion_exception_any_model(client):
|
|||
)
|
||||
assert isinstance(openai_exception, openai.BadRequestError)
|
||||
_error_message = openai_exception.message
|
||||
assert "Invalid model name passed in model=Lite-GPT-12" in str(_error_message)
|
||||
assert "chat_completion: Invalid model name passed in model=Lite-GPT-12" in str(_error_message)
|
||||
|
||||
except Exception as e:
|
||||
pytest.fail(f"LiteLLM Proxy test failed. Exception {str(e)}")
|
||||
|
@ -197,7 +197,7 @@ def test_embedding_exception_any_model(client):
|
|||
print("Exception raised=", openai_exception)
|
||||
assert isinstance(openai_exception, openai.BadRequestError)
|
||||
_error_message = openai_exception.message
|
||||
assert "Invalid model name passed in model=Lite-GPT-12" in str(_error_message)
|
||||
assert "embeddings: Invalid model name passed in model=Lite-GPT-12" in str(_error_message)
|
||||
|
||||
except Exception as e:
|
||||
pytest.fail(f"LiteLLM Proxy test failed. Exception {str(e)}")
|
||||
|
|
|
@ -1,5 +1,6 @@
|
|||
import sys, os
|
||||
import traceback
|
||||
from unittest import mock
|
||||
from dotenv import load_dotenv
|
||||
|
||||
load_dotenv()
|
||||
|
@ -35,6 +36,77 @@ token = "sk-1234"
|
|||
|
||||
headers = {"Authorization": f"Bearer {token}"}
|
||||
|
||||
example_completion_result = {
|
||||
"choices": [
|
||||
{
|
||||
"message": {
|
||||
"content": "Whispers of the wind carry dreams to me.",
|
||||
"role": "assistant"
|
||||
}
|
||||
}
|
||||
],
|
||||
}
|
||||
example_embedding_result = {
|
||||
"object": "list",
|
||||
"data": [
|
||||
{
|
||||
"object": "embedding",
|
||||
"index": 0,
|
||||
"embedding": [
|
||||
-0.006929283495992422,
|
||||
-0.005336422007530928,
|
||||
-4.547132266452536e-05,
|
||||
-0.024047505110502243,
|
||||
-0.006929283495992422,
|
||||
-0.005336422007530928,
|
||||
-4.547132266452536e-05,
|
||||
-0.024047505110502243,
|
||||
-0.006929283495992422,
|
||||
-0.005336422007530928,
|
||||
-4.547132266452536e-05,
|
||||
-0.024047505110502243,
|
||||
],
|
||||
}
|
||||
],
|
||||
"model": "text-embedding-3-small",
|
||||
"usage": {
|
||||
"prompt_tokens": 5,
|
||||
"total_tokens": 5
|
||||
}
|
||||
}
|
||||
example_image_generation_result = {
|
||||
"created": 1589478378,
|
||||
"data": [
|
||||
{
|
||||
"url": "https://..."
|
||||
},
|
||||
{
|
||||
"url": "https://..."
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
|
||||
def mock_patch_acompletion():
|
||||
return mock.patch(
|
||||
"litellm.proxy.proxy_server.llm_router.acompletion",
|
||||
return_value=example_completion_result,
|
||||
)
|
||||
|
||||
|
||||
def mock_patch_aembedding():
|
||||
return mock.patch(
|
||||
"litellm.proxy.proxy_server.llm_router.aembedding",
|
||||
return_value=example_embedding_result,
|
||||
)
|
||||
|
||||
|
||||
def mock_patch_aimage_generation():
|
||||
return mock.patch(
|
||||
"litellm.proxy.proxy_server.llm_router.aimage_generation",
|
||||
return_value=example_image_generation_result,
|
||||
)
|
||||
|
||||
|
||||
@pytest.fixture(scope="function")
|
||||
def client_no_auth():
|
||||
|
@ -52,7 +124,8 @@ def client_no_auth():
|
|||
return TestClient(app)
|
||||
|
||||
|
||||
def test_chat_completion(client_no_auth):
|
||||
@mock_patch_acompletion()
|
||||
def test_chat_completion(mock_acompletion, client_no_auth):
|
||||
global headers
|
||||
try:
|
||||
# Your test data
|
||||
|
@ -66,6 +139,19 @@ def test_chat_completion(client_no_auth):
|
|||
|
||||
print("testing proxy server with chat completions")
|
||||
response = client_no_auth.post("/v1/chat/completions", json=test_data)
|
||||
mock_acompletion.assert_called_once_with(
|
||||
model="gpt-3.5-turbo",
|
||||
messages=[
|
||||
{"role": "user", "content": "hi"},
|
||||
],
|
||||
max_tokens=10,
|
||||
litellm_call_id=mock.ANY,
|
||||
litellm_logging_obj=mock.ANY,
|
||||
request_timeout=mock.ANY,
|
||||
specific_deployment=True,
|
||||
metadata=mock.ANY,
|
||||
proxy_server_request=mock.ANY,
|
||||
)
|
||||
print(f"response - {response.text}")
|
||||
assert response.status_code == 200
|
||||
result = response.json()
|
||||
|
@ -77,7 +163,8 @@ def test_chat_completion(client_no_auth):
|
|||
# Run the test
|
||||
|
||||
|
||||
def test_chat_completion_azure(client_no_auth):
|
||||
@mock_patch_acompletion()
|
||||
def test_chat_completion_azure(mock_acompletion, client_no_auth):
|
||||
global headers
|
||||
try:
|
||||
# Your test data
|
||||
|
@ -92,6 +179,19 @@ def test_chat_completion_azure(client_no_auth):
|
|||
print("testing proxy server with Azure Request /chat/completions")
|
||||
response = client_no_auth.post("/v1/chat/completions", json=test_data)
|
||||
|
||||
mock_acompletion.assert_called_once_with(
|
||||
model="azure/chatgpt-v-2",
|
||||
messages=[
|
||||
{"role": "user", "content": "write 1 sentence poem"},
|
||||
],
|
||||
max_tokens=10,
|
||||
litellm_call_id=mock.ANY,
|
||||
litellm_logging_obj=mock.ANY,
|
||||
request_timeout=mock.ANY,
|
||||
specific_deployment=True,
|
||||
metadata=mock.ANY,
|
||||
proxy_server_request=mock.ANY,
|
||||
)
|
||||
assert response.status_code == 200
|
||||
result = response.json()
|
||||
print(f"Received response: {result}")
|
||||
|
@ -104,8 +204,51 @@ def test_chat_completion_azure(client_no_auth):
|
|||
# test_chat_completion_azure()
|
||||
|
||||
|
||||
@mock_patch_acompletion()
|
||||
def test_openai_deployments_model_chat_completions_azure(mock_acompletion, client_no_auth):
|
||||
global headers
|
||||
try:
|
||||
# Your test data
|
||||
test_data = {
|
||||
"model": "azure/chatgpt-v-2",
|
||||
"messages": [
|
||||
{"role": "user", "content": "write 1 sentence poem"},
|
||||
],
|
||||
"max_tokens": 10,
|
||||
}
|
||||
|
||||
url = "/openai/deployments/azure/chatgpt-v-2/chat/completions"
|
||||
print(f"testing proxy server with Azure Request {url}")
|
||||
response = client_no_auth.post(url, json=test_data)
|
||||
|
||||
mock_acompletion.assert_called_once_with(
|
||||
model="azure/chatgpt-v-2",
|
||||
messages=[
|
||||
{"role": "user", "content": "write 1 sentence poem"},
|
||||
],
|
||||
max_tokens=10,
|
||||
litellm_call_id=mock.ANY,
|
||||
litellm_logging_obj=mock.ANY,
|
||||
request_timeout=mock.ANY,
|
||||
specific_deployment=True,
|
||||
metadata=mock.ANY,
|
||||
proxy_server_request=mock.ANY,
|
||||
)
|
||||
assert response.status_code == 200
|
||||
result = response.json()
|
||||
print(f"Received response: {result}")
|
||||
assert len(result["choices"][0]["message"]["content"]) > 0
|
||||
except Exception as e:
|
||||
pytest.fail(f"LiteLLM Proxy test failed. Exception - {str(e)}")
|
||||
|
||||
|
||||
# Run the test
|
||||
# test_openai_deployments_model_chat_completions_azure()
|
||||
|
||||
|
||||
### EMBEDDING
|
||||
def test_embedding(client_no_auth):
|
||||
@mock_patch_aembedding()
|
||||
def test_embedding(mock_aembedding, client_no_auth):
|
||||
global headers
|
||||
from litellm.proxy.proxy_server import user_custom_auth
|
||||
|
||||
|
@ -117,6 +260,13 @@ def test_embedding(client_no_auth):
|
|||
|
||||
response = client_no_auth.post("/v1/embeddings", json=test_data)
|
||||
|
||||
mock_aembedding.assert_called_once_with(
|
||||
model="azure/azure-embedding-model",
|
||||
input=["good morning from litellm"],
|
||||
specific_deployment=True,
|
||||
metadata=mock.ANY,
|
||||
proxy_server_request=mock.ANY,
|
||||
)
|
||||
assert response.status_code == 200
|
||||
result = response.json()
|
||||
print(len(result["data"][0]["embedding"]))
|
||||
|
@ -125,7 +275,8 @@ def test_embedding(client_no_auth):
|
|||
pytest.fail(f"LiteLLM Proxy test failed. Exception - {str(e)}")
|
||||
|
||||
|
||||
def test_bedrock_embedding(client_no_auth):
|
||||
@mock_patch_aembedding()
|
||||
def test_bedrock_embedding(mock_aembedding, client_no_auth):
|
||||
global headers
|
||||
from litellm.proxy.proxy_server import user_custom_auth
|
||||
|
||||
|
@ -137,6 +288,12 @@ def test_bedrock_embedding(client_no_auth):
|
|||
|
||||
response = client_no_auth.post("/v1/embeddings", json=test_data)
|
||||
|
||||
mock_aembedding.assert_called_once_with(
|
||||
model="amazon-embeddings",
|
||||
input=["good morning from litellm"],
|
||||
metadata=mock.ANY,
|
||||
proxy_server_request=mock.ANY,
|
||||
)
|
||||
assert response.status_code == 200
|
||||
result = response.json()
|
||||
print(len(result["data"][0]["embedding"]))
|
||||
|
@ -171,7 +328,8 @@ def test_sagemaker_embedding(client_no_auth):
|
|||
#### IMAGE GENERATION
|
||||
|
||||
|
||||
def test_img_gen(client_no_auth):
|
||||
@mock_patch_aimage_generation()
|
||||
def test_img_gen(mock_aimage_generation, client_no_auth):
|
||||
global headers
|
||||
from litellm.proxy.proxy_server import user_custom_auth
|
||||
|
||||
|
@ -185,6 +343,14 @@ def test_img_gen(client_no_auth):
|
|||
|
||||
response = client_no_auth.post("/v1/images/generations", json=test_data)
|
||||
|
||||
mock_aimage_generation.assert_called_once_with(
|
||||
model='dall-e-3',
|
||||
prompt='A cute baby sea otter',
|
||||
n=1,
|
||||
size='1024x1024',
|
||||
metadata=mock.ANY,
|
||||
proxy_server_request=mock.ANY,
|
||||
)
|
||||
assert response.status_code == 200
|
||||
result = response.json()
|
||||
print(len(result["data"][0]["url"]))
|
||||
|
@@ -249,7 +415,8 @@ class MyCustomHandler(CustomLogger):
customHandler = MyCustomHandler()


def test_chat_completion_optional_params(client_no_auth):
@mock_patch_acompletion()
def test_chat_completion_optional_params(mock_acompletion, client_no_auth):
    # [PROXY: PROD TEST] - DO NOT DELETE
    # This tests if all the /chat/completion params are passed to litellm
    try:

@@ -267,6 +434,20 @@ def test_chat_completion_optional_params(client_no_auth):
        litellm.callbacks = [customHandler]
        print("testing proxy server: optional params")
        response = client_no_auth.post("/v1/chat/completions", json=test_data)
        mock_acompletion.assert_called_once_with(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "user", "content": "hi"},
            ],
            max_tokens=10,
            user="proxy-user",
            litellm_call_id=mock.ANY,
            litellm_logging_obj=mock.ANY,
            request_timeout=mock.ANY,
            specific_deployment=True,
            metadata=mock.ANY,
            proxy_server_request=mock.ANY,
        )
        assert response.status_code == 200
        result = response.json()
        print(f"Received response: {result}")

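Note: these proxy tests now assert on the mocked router call rather than only on the HTTP response. A minimal sketch of how a helper such as `mock_patch_acompletion` could be built is shown below; the patch target `litellm.proxy.proxy_server.llm_router.acompletion` and the return payload are assumptions for illustration, not necessarily what the repository uses.

```python
# Hypothetical sketch of a mock-patch helper for the proxy tests.
# The patch target string is an assumption; the real helper may patch a
# different attribute on the proxy server.
from unittest import mock


def mock_patch_acompletion():
    async def _mock_acompletion(*args, **kwargs):
        # Return a minimal chat-completion-shaped payload so the proxy
        # can still build a normal HTTP response around it.
        return {
            "id": "chatcmpl-mock",
            "choices": [
                {"message": {"role": "assistant", "content": "mocked response"}}
            ],
        }

    return mock.patch(
        "litellm.proxy.proxy_server.llm_router.acompletion",
        side_effect=_mock_acompletion,
    )
```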
@@ -9,7 +9,7 @@ sys.path.insert(
    0, os.path.abspath("../..")
) # Adds the parent directory to the system path
import time
from litellm import token_counter, encode, decode
from litellm import token_counter, create_pretrained_tokenizer, encode, decode


def test_token_counter_normal_plus_function_calling():

@@ -69,15 +69,23 @@ def test_tokenizers():
            model="meta-llama/Llama-2-7b-chat", text=sample_text
        )

        # llama3 tokenizer (also testing custom tokenizer)
        llama3_tokens_1 = token_counter(model="meta-llama/llama-3-70b-instruct", text=sample_text)

        llama3_tokenizer = create_pretrained_tokenizer("Xenova/llama-3-tokenizer")
        llama3_tokens_2 = token_counter(custom_tokenizer=llama3_tokenizer, text=sample_text)

        print(
            f"openai tokens: {openai_tokens}; claude tokens: {claude_tokens}; cohere tokens: {cohere_tokens}; llama2 tokens: {llama2_tokens}"
            f"openai tokens: {openai_tokens}; claude tokens: {claude_tokens}; cohere tokens: {cohere_tokens}; llama2 tokens: {llama2_tokens}; llama3 tokens: {llama3_tokens_1}"
        )

        # assert that all token values are different
        assert (
            openai_tokens != cohere_tokens != llama2_tokens
            openai_tokens != cohere_tokens != llama2_tokens != llama3_tokens_1
        ), "Token values are not different."

        assert llama3_tokens_1 == llama3_tokens_2, "Custom tokenizer is not being used! It has been configured to use the same tokenizer as the built-in llama3 tokenizer, so the results should be the same."

        print("test tokenizer: It worked!")
    except Exception as e:
        pytest.fail(f"An exception occurred: {e}")

@@ -20,6 +20,8 @@ from litellm.utils import (
    validate_environment,
    function_to_dict,
    token_counter,
    create_pretrained_tokenizer,
    create_tokenizer,
)

# Assuming your trim_messages, shorten_message_to_fit_limit, and get_token_count functions are all in a module named 'message_utils'

150 litellm/utils.py

@@ -378,16 +378,13 @@ class Message(OpenAIObject):
        super(Message, self).__init__(**params)
        self.content = content
        self.role = role
        self.tool_calls = None
        self.function_call = None

        if function_call is not None:
            self.function_call = FunctionCall(**function_call)

        if tool_calls is not None:
            self.tool_calls = [
                ChatCompletionMessageToolCall(**tool_call) for tool_call in tool_calls
            ]
            self.tool_calls = []
            for tool_call in tool_calls:
                self.tool_calls.append(ChatCompletionMessageToolCall(**tool_call))

        if logprobs is not None:
            self._logprobs = ChoiceLogprobs(**logprobs)

@@ -413,8 +410,6 @@

class Delta(OpenAIObject):
    tool_calls: Optional[List[ChatCompletionDeltaToolCall]] = None

    def __init__(
        self,
        content=None,

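For context, a minimal sketch of constructing a `Message` with tool calls, which both the old list comprehension and the new explicit loop handle identically; the tool-call dict below is only an example payload shaped like an OpenAI tool call, not taken from the repository's tests.

```python
# Illustrative only: build a Message carrying one tool call and read it back.
from litellm.utils import Message

msg = Message(
    content=None,
    role="assistant",
    tool_calls=[
        {
            "id": "call_abc123",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
        }
    ],
)
# Each dict is wrapped into a ChatCompletionMessageToolCall by the loop above.
print(msg.tool_calls[0].function.name)  # expected: "get_weather"
```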
@@ -1700,10 +1695,17 @@ class Logging:
                        print_verbose("reaches langfuse for streaming logging!")
                        result = kwargs["complete_streaming_response"]
                    if langFuseLogger is None or (
                        self.langfuse_public_key != langFuseLogger.public_key
                        and self.langfuse_secret != langFuseLogger.secret_key
                        (
                            self.langfuse_public_key is not None
                            and self.langfuse_public_key
                            != langFuseLogger.public_key
                        )
                        and (
                            self.langfuse_secret is not None
                            and self.langfuse_secret
                            != langFuseLogger.secret_key
                        )
                    ):
                        print_verbose("Instantiates langfuse client")
                        langFuseLogger = LangFuseLogger(
                            langfuse_public_key=self.langfuse_public_key,
                            langfuse_secret=self.langfuse_secret,

@@ -3773,29 +3775,34 @@ def _select_tokenizer(model: str):
    elif "llama-2" in model.lower() or "replicate" in model.lower():
        tokenizer = Tokenizer.from_pretrained("hf-internal-testing/llama-tokenizer")
        return {"type": "huggingface_tokenizer", "tokenizer": tokenizer}
    # llama3
    elif "llama-3" in model.lower():
        tokenizer = Tokenizer.from_pretrained("Xenova/llama-3-tokenizer")
        return {"type": "huggingface_tokenizer", "tokenizer": tokenizer}
    # default - tiktoken
    else:
        return {"type": "openai_tokenizer", "tokenizer": encoding}


def encode(model: str, text: str):
def encode(model="", text="", custom_tokenizer: Optional[dict] = None):
    """
    Encodes the given text using the specified model.

    Args:
        model (str): The name of the model to use for tokenization.
        custom_tokenizer (Optional[dict]): A custom tokenizer created with the `create_pretrained_tokenizer` or `create_tokenizer` method. Must be a dictionary with a string value for `type` and Tokenizer for `tokenizer`. Default is None.
        text (str): The text to be encoded.

    Returns:
        enc: The encoded text.
    """
    tokenizer_json = _select_tokenizer(model=model)
    tokenizer_json = custom_tokenizer or _select_tokenizer(model=model)
    enc = tokenizer_json["tokenizer"].encode(text)
    return enc


def decode(model: str, tokens: List[int]):
    tokenizer_json = _select_tokenizer(model=model)
def decode(model="", tokens: List[int] = [], custom_tokenizer: Optional[dict] = None):
    tokenizer_json = custom_tokenizer or _select_tokenizer(model=model)
    dec = tokenizer_json["tokenizer"].decode(tokens)
    return dec

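A short usage sketch of the updated `encode`/`decode` signatures with a custom tokenizer; it reuses the `Xenova/llama-3-tokenizer` repo id from the diff, and assumes (as is the case for the `tokenizers` library) that a Hugging Face tokenizer's `encode()` returns an `Encoding` whose `.ids` holds the integer token ids.

```python
# Illustrative round trip through encode/decode with a custom tokenizer dict.
from litellm import create_pretrained_tokenizer, encode, decode

llama3_tokenizer = create_pretrained_tokenizer("Xenova/llama-3-tokenizer")

encoding = encode(text="good morning from litellm", custom_tokenizer=llama3_tokenizer)
restored = decode(tokens=encoding.ids, custom_tokenizer=llama3_tokenizer)
print(len(encoding.ids), restored)
```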
@@ -3969,8 +3976,45 @@ def calculage_img_tokens(
    return total_tokens


def create_pretrained_tokenizer(
    identifier: str,
    revision="main",
    auth_token: Optional[str] = None
):
    """
    Creates a tokenizer from an existing file on a HuggingFace repository to be used with `token_counter`.

    Args:
        identifier (str): The identifier of a Model on the Hugging Face Hub, that contains a tokenizer.json file
        revision (str, defaults to main): A branch or commit id
        auth_token (str, optional, defaults to None): An optional auth token used to access private repositories on the Hugging Face Hub

    Returns:
        dict: A dictionary with the tokenizer and its type.
    """

    tokenizer = Tokenizer.from_pretrained(identifier, revision=revision, auth_token=auth_token)
    return {"type": "huggingface_tokenizer", "tokenizer": tokenizer}


def create_tokenizer(json: str):
    """
    Creates a tokenizer from a valid JSON string for use with `token_counter`.

    Args:
        json (str): A valid JSON string representing a previously serialized tokenizer

    Returns:
        dict: A dictionary with the tokenizer and its type.
    """

    tokenizer = Tokenizer.from_str(json)
    return {"type": "huggingface_tokenizer", "tokenizer": tokenizer}


def token_counter(
    model="",
    custom_tokenizer: Optional[dict] = None,
    text: Optional[Union[str, List[str]]] = None,
    messages: Optional[List] = None,
    count_response_tokens: Optional[bool] = False,

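A brief sketch of using the two new constructors together with `token_counter`. The serialize-then-reload step uses `Tokenizer.to_str()` from the `tokenizers` package as the source of the JSON string that `create_tokenizer` expects; that choice is an assumption for illustration.

```python
# Illustrative only: count tokens with a Hub tokenizer, then rebuild the same
# tokenizer from its JSON serialization and confirm the counts match.
from litellm import create_pretrained_tokenizer, create_tokenizer, token_counter

hub_tokenizer = create_pretrained_tokenizer("Xenova/llama-3-tokenizer")
n1 = token_counter(custom_tokenizer=hub_tokenizer, text="good morning from litellm")

# Tokenizer.to_str() comes from the `tokenizers` library (assumed here).
tokenizer_json = hub_tokenizer["tokenizer"].to_str()
json_tokenizer = create_tokenizer(tokenizer_json)
n2 = token_counter(custom_tokenizer=json_tokenizer, text="good morning from litellm")

assert n1 == n2
```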
@@ -3980,13 +4024,14 @@ def token_counter(

    Args:
        model (str): The name of the model to use for tokenization. Default is an empty string.
        custom_tokenizer (Optional[dict]): A custom tokenizer created with the `create_pretrained_tokenizer` or `create_tokenizer` method. Must be a dictionary with a string value for `type` and Tokenizer for `tokenizer`. Default is None.
        text (str): The raw text string to be passed to the model. Default is None.
        messages (Optional[List[Dict[str, str]]]): Alternative to passing in text. A list of dictionaries representing messages with "role" and "content" keys. Default is None.

    Returns:
        int: The number of tokens in the text.
    """
    # use tiktoken, anthropic, cohere or llama2's tokenizer depending on the model
    # use tiktoken, anthropic, cohere, llama2, or llama3's tokenizer depending on the model
    is_tool_call = False
    num_tokens = 0
    if text == None:

@@ -4028,8 +4073,8 @@ def token_counter(
    elif isinstance(text, str):
        count_response_tokens = True # user just trying to count tokens for a text. don't add the chat_ml +3 tokens to this

    if model is not None:
        tokenizer_json = _select_tokenizer(model=model)
    if model is not None or custom_tokenizer is not None:
        tokenizer_json = custom_tokenizer or _select_tokenizer(model=model)
        if tokenizer_json["type"] == "huggingface_tokenizer":
            print_verbose(
                f"Token Counter - using hugging face token counter, for model={model}"

@@ -6768,7 +6813,7 @@ def validate_environment(model: Optional[str] = None) -> dict:
                keys_in_environment = True
            else:
                missing_keys.append("NLP_CLOUD_API_KEY")
        elif custom_llm_provider == "bedrock":
        elif custom_llm_provider == "bedrock" or custom_llm_provider == "sagemaker":
            if (
                "AWS_ACCESS_KEY_ID" in os.environ
                and "AWS_SECRET_ACCESS_KEY" in os.environ

@@ -6782,11 +6827,72 @@ def validate_environment(model: Optional[str] = None) -> dict:
                keys_in_environment = True
            else:
                missing_keys.append("OLLAMA_API_BASE")
        elif custom_llm_provider == "anyscale":
            if "ANYSCALE_API_KEY" in os.environ:
                keys_in_environment = True
            else:
                missing_keys.append("ANYSCALE_API_KEY")
        elif custom_llm_provider == "deepinfra":
            if "DEEPINFRA_API_KEY" in os.environ:
                keys_in_environment = True
            else:
                missing_keys.append("DEEPINFRA_API_KEY")
        elif custom_llm_provider == "gemini":
            if "GEMINI_API_KEY" in os.environ:
                keys_in_environment = True
            else:
                missing_keys.append("GEMINI_API_KEY")
        elif custom_llm_provider == "groq":
            if "GROQ_API_KEY" in os.environ:
                keys_in_environment = True
            else:
                missing_keys.append("GROQ_API_KEY")
        elif custom_llm_provider == "mistral":
            if "MISTRAL_API_KEY" in os.environ:
                keys_in_environment = True
            else:
                missing_keys.append("MISTRAL_API_KEY")
        elif custom_llm_provider == "palm":
            if "PALM_API_KEY" in os.environ:
                keys_in_environment = True
            else:
                missing_keys.append("PALM_API_KEY")
        elif custom_llm_provider == "perplexity":
            if "PERPLEXITYAI_API_KEY" in os.environ:
                keys_in_environment = True
            else:
                missing_keys.append("PERPLEXITYAI_API_KEY")
        elif custom_llm_provider == "voyage":
            if "VOYAGE_API_KEY" in os.environ:
                keys_in_environment = True
            else:
                missing_keys.append("VOYAGE_API_KEY")
        elif custom_llm_provider == "fireworks_ai":
            if (
                "FIREWORKS_AI_API_KEY" in os.environ
                or "FIREWORKS_API_KEY" in os.environ
                or "FIREWORKSAI_API_KEY" in os.environ
                or "FIREWORKS_AI_TOKEN" in os.environ
            ):
                keys_in_environment = True
            else:
                missing_keys.append("FIREWORKS_AI_API_KEY")
        elif custom_llm_provider == "cloudflare":
            if "CLOUDFLARE_API_KEY" in os.environ and (
                "CLOUDFLARE_ACCOUNT_ID" in os.environ
                or "CLOUDFLARE_API_BASE" in os.environ
            ):
                keys_in_environment = True
            else:
                missing_keys.append("CLOUDFLARE_API_KEY")
                missing_keys.append("CLOUDFLARE_API_BASE")
    else:
        ## openai - chatcompletion + text completion
        if (
            model in litellm.open_ai_chat_completion_models
            or model in litellm.open_ai_text_completion_models
            or model in litellm.open_ai_embedding_models
            or model in litellm.openai_image_generation_models
        ):
            if "OPENAI_API_KEY" in os.environ:
                keys_in_environment = True

@@ -6817,7 +6923,11 @@ def validate_environment(model: Optional[str] = None) -> dict:
            else:
                missing_keys.append("OPENROUTER_API_KEY")
        ## vertex - text + chat models
        elif model in litellm.vertex_chat_models or model in litellm.vertex_text_models:
        elif (
            model in litellm.vertex_chat_models
            or model in litellm.vertex_text_models
            or model in litellm.models_by_provider["vertex_ai"]
        ):
            if "VERTEXAI_PROJECT" in os.environ and "VERTEXAI_LOCATION" in os.environ:
                keys_in_environment = True
            else:

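To illustrate the expanded provider coverage, a hedged sketch of calling `validate_environment` for one of the newly handled providers; the return shape (`keys_in_environment` / `missing_keys`) is taken from the code above, and the model name is just an example.

```python
# Illustrative check: with no GROQ_API_KEY set, the groq branch added above
# should report the key as missing.
import os
from litellm.utils import validate_environment

os.environ.pop("GROQ_API_KEY", None)
result = validate_environment(model="groq/llama3-8b-8192")
print(result)  # expected shape: {"keys_in_environment": False, "missing_keys": ["GROQ_API_KEY"]}
```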
@@ -338,6 +338,18 @@
        "output_cost_per_second": 0.0001,
        "litellm_provider": "azure"
    },
    "azure/gpt-4-turbo-2024-04-09": {
        "max_tokens": 4096,
        "max_input_tokens": 128000,
        "max_output_tokens": 4096,
        "input_cost_per_token": 0.00001,
        "output_cost_per_token": 0.00003,
        "litellm_provider": "azure",
        "mode": "chat",
        "supports_function_calling": true,
        "supports_parallel_function_calling": true,
        "supports_vision": true
    },
    "azure/gpt-4-0125-preview": {
        "max_tokens": 4096,
        "max_input_tokens": 128000,

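As a quick sanity check on the new pricing entry, a hedged sketch using the `cost_per_token` helper described in the docs; the token counts are arbitrary, and the tuple order (prompt cost, then completion cost) is assumed to match that helper's usual return value.

```python
# Illustrative cost check for the azure/gpt-4-turbo-2024-04-09 entry added above:
# 1,000 prompt tokens at $0.00001/token and 500 completion tokens at $0.00003/token.
from litellm import cost_per_token

prompt_cost, completion_cost = cost_per_token(
    model="azure/gpt-4-turbo-2024-04-09",
    prompt_tokens=1000,
    completion_tokens=500,
)
print(prompt_cost, completion_cost)  # roughly 0.01 and 0.015 given the table above
```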
@@ -813,6 +825,7 @@
        "litellm_provider": "anthropic",
        "mode": "chat",
        "supports_function_calling": true,
        "supports_vision": true,
        "tool_use_system_prompt_tokens": 264
    },
    "claude-3-opus-20240229": {

@@ -824,6 +837,7 @@
        "litellm_provider": "anthropic",
        "mode": "chat",
        "supports_function_calling": true,
        "supports_vision": true,
        "tool_use_system_prompt_tokens": 395
    },
    "claude-3-sonnet-20240229": {

@@ -835,6 +849,7 @@
        "litellm_provider": "anthropic",
        "mode": "chat",
        "supports_function_calling": true,
        "supports_vision": true,
        "tool_use_system_prompt_tokens": 159
    },
    "text-bison": {

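These capability flags are the kind of metadata litellm's capability helpers read. Assuming a `supports_vision` helper exists alongside `supports_function_calling` (an assumption here, not confirmed by this diff), checking the updated entries might look like this:

```python
# Illustrative capability check driven by the supports_vision flags added above.
# litellm.supports_vision is assumed to exist and to read this model map.
import litellm

for model in ["claude-3-opus-20240229", "claude-3-sonnet-20240229"]:
    print(model, litellm.supports_vision(model=model))  # expected: True once the map is updated
```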
@@ -1142,7 +1157,8 @@
        "output_cost_per_token": 0.000015,
        "litellm_provider": "vertex_ai-anthropic_models",
        "mode": "chat",
        "supports_function_calling": true
        "supports_function_calling": true,
        "supports_vision": true
    },
    "vertex_ai/claude-3-haiku@20240307": {
        "max_tokens": 4096,

@@ -1152,7 +1168,8 @@
        "output_cost_per_token": 0.00000125,
        "litellm_provider": "vertex_ai-anthropic_models",
        "mode": "chat",
        "supports_function_calling": true
        "supports_function_calling": true,
        "supports_vision": true
    },
    "vertex_ai/claude-3-opus@20240229": {
        "max_tokens": 4096,

@@ -1162,7 +1179,8 @@
        "output_cost_per_token": 0.0000075,
        "litellm_provider": "vertex_ai-anthropic_models",
        "mode": "chat",
        "supports_function_calling": true
        "supports_function_calling": true,
        "supports_vision": true
    },
    "textembedding-gecko": {
        "max_tokens": 3072,

@@ -1581,6 +1599,7 @@
        "litellm_provider": "openrouter",
        "mode": "chat",
        "supports_function_calling": true,
        "supports_vision": true,
        "tool_use_system_prompt_tokens": 395
    },
    "openrouter/google/palm-2-chat-bison": {

@@ -1929,7 +1948,8 @@
        "output_cost_per_token": 0.000015,
        "litellm_provider": "bedrock",
        "mode": "chat",
        "supports_function_calling": true
        "supports_function_calling": true,
        "supports_vision": true
    },
    "anthropic.claude-3-haiku-20240307-v1:0": {
        "max_tokens": 4096,

@@ -1939,7 +1959,8 @@
        "output_cost_per_token": 0.00000125,
        "litellm_provider": "bedrock",
        "mode": "chat",
        "supports_function_calling": true
        "supports_function_calling": true,
        "supports_vision": true
    },
    "anthropic.claude-3-opus-20240229-v1:0": {
        "max_tokens": 4096,

@@ -1949,7 +1970,8 @@
        "output_cost_per_token": 0.000075,
        "litellm_provider": "bedrock",
        "mode": "chat",
        "supports_function_calling": true
        "supports_function_calling": true,
        "supports_vision": true
    },
    "anthropic.claude-v1": {
        "max_tokens": 8191,

6 poetry.lock (generated)

@@ -1153,13 +1153,13 @@ typing = ["types-PyYAML", "types-requests", "types-simplejson", "types-toml", "t

[[package]]
name = "idna"
version = "3.6"
version = "3.7"
description = "Internationalized Domain Names in Applications (IDNA)"
optional = false
python-versions = ">=3.5"
files = [
    {file = "idna-3.6-py3-none-any.whl", hash = "sha256:c05567e9c24a6b9faaa835c4821bad0590fbb9d5779e7caa6e1cc4978e7eb24f"},
    {file = "idna-3.6.tar.gz", hash = "sha256:9ecdbbd083b06798ae1e86adcbfe8ab1479cf864e4ee30fe4e46a003d12491ca"},
    {file = "idna-3.7-py3-none-any.whl", hash = "sha256:82fee1fc78add43492d3a1898bfa6d8a904cc97d8427f683ed8e798d07761aa0"},
    {file = "idna-3.7.tar.gz", hash = "sha256:028ff3aadf0609c1fd278d8ea3089299412a7a8b9bd005dd08b9f8285bcb5cfc"},
]

[[package]]