forked from phoenix/litellm-mirror
* fix(pattern_matching_router.py): update model name using correct function
* fix(langfuse.py): metadata deepcopy can cause unhandled error (#6563)
Co-authored-by: seva <seva@inita.com>
* fix(stream_chunk_builder_utils.py): correctly set prompt tokens + log correct streaming usage
Closes https://github.com/BerriAI/litellm/issues/6488
* build(deps): bump cookie and express in /docs/my-website (#6566)
Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.
Updates `cookie` from 0.6.0 to 0.7.1
- [Release notes](https://github.com/jshttp/cookie/releases)
- [Commits](https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.1)
Updates `express` from 4.20.0 to 4.21.1
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.1/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.20.0...4.21.1)
---
updated-dependencies:
- dependency-name: cookie
dependency-type: indirect
- dependency-name: express
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* docs(virtual_keys.md): update Dockerfile reference (#6554)
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
* (proxy fix) - call connect on prisma client when running setup (#6534)
* critical fix - call connect on prisma client when running setup
* fix test_proxy_server_prisma_setup
* fix test_proxy_server_prisma_setup
* Add 3.5 haiku (#6588)
* feat: add claude-3-5-haiku-20241022 entries
* feat: add claude-3-5-haiku-20241022 and vertex_ai/claude-3-5-haiku@20241022 models
* add missing entries, remove vision
* remove image token costs
* Litellm perf improvements 3 (#6573)
* perf: move writing key to cache, to background task
* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils
adds 200ms on calls with pgdb connected
* fix(litellm_pre_call_utils.py): rename call_type to actual call used
* perf(proxy_server.py): remove db logic from _get_config_from_file
was causing db calls to occur on every llm request, if team_id was set on key
* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db
reduces latency/call by ~100ms
* fix(proxy_server.py): minor fix on existing_settings not incl alerting
* fix(exception_mapping_utils.py): map databricks exception string
* fix(auth_checks.py): fix auth check logic
* test: correctly mark flaky test
* fix(utils.py): handle auth token error for tokenizers.from_pretrained
* build: fix map
* build: fix map
* build: fix json for model map
* fix ImageObject conversion (#6584)
* (fix) litellm.text_completion raises a non-blocking error on simple usage (#6546)
* unit test test_huggingface_text_completion_logprobs
* fix return TextCompletionHandler convert_chat_to_text_completion
* fix hf rest api
* fix test_huggingface_text_completion_logprobs
* fix linting errors
* fix importLiteLLMResponseObjectHandler
* fix test for LiteLLMResponseObjectHandler
* fix test text completion
* fix allow using 15 seconds for premium license check
* testing fix bedrock deprecated cohere.command-text-v14
* (feat) add `Predicted Outputs` for OpenAI (#6594)
* bump openai to openai==1.54.0
* add 'prediction' param
* testing fix bedrock deprecated cohere.command-text-v14
* test test_openai_prediction_param.py
* test_openai_prediction_param_with_caching
* doc Predicted Outputs
* doc Predicted Output
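A minimal usage sketch of the new `prediction` param through litellm.completion (the model name and code content below are illustrative, not taken from this changelog):

    import litellm

    existing_code = "class User:\n    id: int\n    username: str"

    response = litellm.completion(
        model="gpt-4o-mini",  # illustrative choice of an OpenAI model
        messages=[
            {"role": "user", "content": "Rename `username` to `email` in this class:\n" + existing_code}
        ],
        # Predicted Outputs: the text the model is expected to largely reproduce
        prediction={"type": "content", "content": existing_code},
    )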
* (fix) Vertex Improve Performance when using `image_url` (#6593)
* fix transformation vertex
* test test_process_gemini_image
* test_image_completion_request
* testing fix - bedrock has deprecated cohere.command-text-v14
* fix vertex pdf
* bump: version 1.51.5 → 1.52.0
* fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check (#6577)
* fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check
* fix(lowest_tpm_rpm_v2.py): return headers in correct format
* test: update test
* build(deps): bump cookie and express in /docs/my-website (#6566)
Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.
Updates `cookie` from 0.6.0 to 0.7.1
- [Release notes](https://github.com/jshttp/cookie/releases)
- [Commits](https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.1)
Updates `express` from 4.20.0 to 4.21.1
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.1/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.20.0...4.21.1)
---
updated-dependencies:
- dependency-name: cookie
dependency-type: indirect
- dependency-name: express
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* docs(virtual_keys.md): update Dockerfile reference (#6554)
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
* (proxy fix) - call connect on prisma client when running setup (#6534)
* critical fix - call connect on prisma client when running setup
* fix test_proxy_server_prisma_setup
* fix test_proxy_server_prisma_setup
* Add 3.5 haiku (#6588)
* feat: add claude-3-5-haiku-20241022 entries
* feat: add claude-3-5-haiku-20241022 and vertex_ai/claude-3-5-haiku@20241022 models
* add missing entries, remove vision
* remove image token costs
* Litellm perf improvements 3 (#6573)
* perf: move writing key to cache, to background task
* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils
adds 200ms on calls with pgdb connected
* fix(litellm_pre_call_utils.py): rename call_type to actual call used
* perf(proxy_server.py): remove db logic from _get_config_from_file
was causing db calls to occur on every llm request, if team_id was set on key
* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db
reduces latency/call by ~100ms
* fix(proxy_server.py): minor fix on existing_settings not incl alerting
* fix(exception_mapping_utils.py): map databricks exception string
* fix(auth_checks.py): fix auth check logic
* test: correctly mark flaky test
* fix(utils.py): handle auth token error for tokenizers.from_pretrained
* build: fix map
* build: fix map
* build: fix json for model map
* test: remove eol model
* fix(proxy_server.py): fix db config loading logic
* fix(proxy_server.py): fix order of config / db updates, to ensure fields not overwritten
* test: skip test if required env var is missing
* test: fix test
---------
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
* test: mark flaky test
* test: handle anthropic api instability
* test(test_proxy_utils.py): add testing for db config update logic
* Update setuptools in docker and fastapi to latest version, in order to upgrade starlette version (#6597)
* build(deps): bump cookie and express in /docs/my-website (#6566)
Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.
Updates `cookie` from 0.6.0 to 0.7.1
- [Release notes](https://github.com/jshttp/cookie/releases)
- [Commits](https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.1)
Updates `express` from 4.20.0 to 4.21.1
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.1/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.20.0...4.21.1)
---
updated-dependencies:
- dependency-name: cookie
dependency-type: indirect
- dependency-name: express
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* docs(virtual_keys.md): update Dockerfile reference (#6554)
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
* (proxy fix) - call connect on prisma client when running setup (#6534)
* critical fix - call connect on prisma client when running setup
* fix test_proxy_server_prisma_setup
* fix test_proxy_server_prisma_setup
* Add 3.5 haiku (#6588)
* feat: add claude-3-5-haiku-20241022 entries
* feat: add claude-3-5-haiku-20241022 and vertex_ai/claude-3-5-haiku@20241022 models
* add missing entries, remove vision
* remove image token costs
* Litellm perf improvements 3 (#6573)
* perf: move writing key to cache, to background task
* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils
adds 200ms on calls with pgdb connected
* fix(litellm_pre_call_utils.py): rename call_type to actual call used
* perf(proxy_server.py): remove db logic from _get_config_from_file
was causing db calls to occur on every llm request, if team_id was set on key
* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db
reduces latency/call by ~100ms
* fix(proxy_server.py): minor fix on existing_settings not incl alerting
* fix(exception_mapping_utils.py): map databricks exception string
* fix(auth_checks.py): fix auth check logic
* test: correctly mark flaky test
* fix(utils.py): handle auth token error for tokenizers.from_pretrained
* build: fix map
* build: fix map
* build: fix json for model map
* fix ImageObject conversion (#6584)
* (fix) litellm.text_completion raises a non-blocking error on simple usage (#6546)
* unit test test_huggingface_text_completion_logprobs
* fix return TextCompletionHandler convert_chat_to_text_completion
* fix hf rest api
* fix test_huggingface_text_completion_logprobs
* fix linting errors
* fix importLiteLLMResponseObjectHandler
* fix test for LiteLLMResponseObjectHandler
* fix test text completion
* fix allow using 15 seconds for premium license check
* testing fix bedrock deprecated cohere.command-text-v14
* (feat) add `Predicted Outputs` for OpenAI (#6594)
* bump openai to openai==1.54.0
* add 'prediction' param
* testing fix bedrock deprecated cohere.command-text-v14
* test test_openai_prediction_param.py
* test_openai_prediction_param_with_caching
* doc Predicted Outputs
* doc Predicted Output
* (fix) Vertex Improve Performance when using `image_url` (#6593)
* fix transformation vertex
* test test_process_gemini_image
* test_image_completion_request
* testing fix - bedrock has deprecated cohere.command-text-v14
* fix vertex pdf
* bump: version 1.51.5 → 1.52.0
* Update setuptools in docker and fastapi to latest version, in order to upgrade starlette version
---------
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: Jacob Hagstedt <wcgs@novonordisk.com>
* fix(langfuse.py): fix linting errors
* fix: fix linting errors
* fix: fix casting error
* fix: fix typing error
* fix: add more tests
* fix(utils.py): fix return_processed_chunk_logic
* Revert "Update setuptools in docker and fastapi to latest verison, in order t…" (#6615)
This reverts commit 1a7f7bdfb7.
* docs fix clarify team_id on team based logging
* doc fix team based logging with langfuse
* fix flake8 checks
* test: bump sleep time
* refactor: replace claude-instant-1.2 with haiku in testing
* fix(proxy_server.py): move to using sl payload in track_cost_callback
* fix(proxy_server.py): fix linting errors
* fix(proxy_server.py): fallback to kwargs(response_cost) if given
* test: remove claude-instant-1 from tests
* test: fix claude test
* docs fix clarify team_id on team based logging
* doc fix team based logging with langfuse
* build: remove lint.yml
---------
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Vsevolod Karvetskiy <56288164+karvetskiy@users.noreply.github.com>
Co-authored-by: seva <seva@inita.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
Co-authored-by: Jacob Hagstedt P Suorra <Jacobh2@users.noreply.github.com>
Co-authored-by: Jacob Hagstedt <wcgs@novonordisk.com>
#### What this tests ####
# This tests litellm.token_counter() function

import traceback
import os
import sys
import time
from unittest.mock import MagicMock

import pytest

sys.path.insert(
    0, os.path.abspath("../..")
)  # Adds the parent directory to the system path
from unittest.mock import AsyncMock, MagicMock, patch

import litellm
from litellm import (
    create_pretrained_tokenizer,
    decode,
    encode,
    get_modified_max_tokens,
    token_counter,
)
from large_text import text
from messages_with_counts import (
    MESSAGES_TEXT,
    MESSAGES_WITH_IMAGES,
    MESSAGES_WITH_TOOLS,
)
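# NOTE (summary inferred from the calls in this file, not the full upstream
# signature): token_counter() is exercised below with combinations of
#   - model=...                     (e.g. "gpt-3.5-turbo", "claude-3-5-haiku-20241022")
#   - messages=[...] or text="..."  (chat messages or a raw string)
#   - tools=... / tool_choice=...   (to include function/tool definitions in the count)
#   - custom_tokenizer=...          (from create_pretrained_tokenizer())
# and returns an int token count.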


def test_token_counter_normal_plus_function_calling():
    try:
        messages = [
            {"role": "system", "content": "System prompt"},
            {"role": "user", "content": "content1"},
            {"role": "assistant", "content": "content2"},
            {"role": "user", "content": "conten3"},
            {
                "role": "assistant",
                "content": None,
                "tool_calls": [
                    {
                        "id": "call_E0lOb1h6qtmflUyok4L06TgY",
                        "function": {
                            "arguments": '{"query":"search query","domain":"google.ca","gl":"ca","hl":"en"}',
                            "name": "SearchInternet",
                        },
                        "type": "function",
                    }
                ],
            },
            {
                "tool_call_id": "call_E0lOb1h6qtmflUyok4L06TgY",
                "role": "tool",
                "name": "SearchInternet",
                "content": "tool content",
            },
        ]
        tokens = token_counter(model="gpt-3.5-turbo", messages=messages)
        print(f"tokens: {tokens}")
    except Exception as e:
        pytest.fail(f"An exception occurred - {str(e)}")


# test_token_counter_normal_plus_function_calling()


@pytest.mark.parametrize(
    "message_count_pair",
    MESSAGES_TEXT,
)
def test_token_counter_textonly(message_count_pair):
    counted_tokens = token_counter(
        model="gpt-35-turbo", messages=[message_count_pair["message"]]
    )
    assert counted_tokens == message_count_pair["count"]


@pytest.mark.parametrize(
    "message_count_pair",
    MESSAGES_WITH_IMAGES,
)
def test_token_counter_with_images(message_count_pair):
    counted_tokens = token_counter(
        model="gpt-4o", messages=[message_count_pair["message"]]
    )
    assert counted_tokens == message_count_pair["count"]


@pytest.mark.parametrize(
    "message_count_pair",
    MESSAGES_WITH_TOOLS,
)
def test_token_counter_with_tools(message_count_pair):
    counted_tokens = token_counter(
        model="gpt-35-turbo",
        messages=[message_count_pair["system_message"]],
        tools=message_count_pair["tools"],
        tool_choice=message_count_pair["tool_choice"],
    )
    expected_tokens = message_count_pair["count"]
    diff = counted_tokens - expected_tokens
    assert (
        diff >= 0 and diff <= 3
    ), f"Expected {expected_tokens} tokens, got {counted_tokens}. Counted tokens is only allowed to be off by 3 in the over-counting direction."


def test_tokenizers():
    try:
        ### test the openai, claude, cohere and llama2 tokenizers.
        ### The tokenizer value should be different for all
        sample_text = "Hellö World, this is my input string! My name is ishaan CTO"

        # openai tokenizer
        openai_tokens = token_counter(model="gpt-3.5-turbo", text=sample_text)

        # claude tokenizer
        claude_tokens = token_counter(
            model="claude-3-5-haiku-20241022", text=sample_text
        )

        # cohere tokenizer
        cohere_tokens = token_counter(model="command-nightly", text=sample_text)

        # llama2 tokenizer
        llama2_tokens = token_counter(
            model="meta-llama/Llama-2-7b-chat", text=sample_text
        )

        # llama3 tokenizer (also testing custom tokenizer)
        llama3_tokens_1 = token_counter(
            model="meta-llama/llama-3-70b-instruct", text=sample_text
        )

        llama3_tokenizer = create_pretrained_tokenizer("Xenova/llama-3-tokenizer")
        llama3_tokens_2 = token_counter(
            custom_tokenizer=llama3_tokenizer, text=sample_text
        )

        print(
            f"openai tokens: {openai_tokens}; claude tokens: {claude_tokens}; cohere tokens: {cohere_tokens}; llama2 tokens: {llama2_tokens}; llama3 tokens: {llama3_tokens_1}"
        )

        # assert that all token values are different
        assert (
            openai_tokens != llama2_tokens != llama3_tokens_1
        ), "Token values are not different."

        assert (
            llama3_tokens_1 == llama3_tokens_2
        ), "Custom tokenizer is not being used! It has been configured to use the same tokenizer as the built in llama3 tokenizer and the results should be the same."

        print("test tokenizer: It worked!")
    except Exception as e:
        pytest.fail(f"An exception occurred: {e}")


# test_tokenizers()


def test_encoding_and_decoding():
    try:
        sample_text = "Hellö World, this is my input string!"
        # openai encoding + decoding
        openai_tokens = encode(model="gpt-3.5-turbo", text=sample_text)
        openai_text = decode(model="gpt-3.5-turbo", tokens=openai_tokens)

        assert openai_text == sample_text

        # claude encoding + decoding
        claude_tokens = encode(model="claude-3-5-haiku-20241022", text=sample_text)

        claude_text = decode(model="claude-3-5-haiku-20241022", tokens=claude_tokens)

        assert claude_text == sample_text

        # cohere encoding + decoding
        cohere_tokens = encode(model="command-nightly", text=sample_text)
        cohere_text = decode(model="command-nightly", tokens=cohere_tokens)

        assert cohere_text == sample_text

        # llama2 encoding + decoding
        llama2_tokens = encode(model="meta-llama/Llama-2-7b-chat", text=sample_text)
        llama2_text = decode(
            model="meta-llama/Llama-2-7b-chat", tokens=llama2_tokens.ids
        )

        assert llama2_text == sample_text
    except Exception as e:
        pytest.fail(f"An exception occurred: {e}\n{traceback.format_exc()}")


# test_encoding_and_decoding()


def test_gpt_vision_token_counting():
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What’s in this image?"},
                {
                    "type": "image_url",
                    "image_url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                },
            ],
        }
    ]
    tokens = token_counter(model="gpt-4-vision-preview", messages=messages)
    print(f"tokens: {tokens}")


# test_gpt_vision_token_counting()


@pytest.mark.parametrize(
    "model",
    [
        "gpt-4-vision-preview",
        "gpt-4o",
        "claude-3-opus-20240229",
        "command-nightly",
        "mistral/mistral-tiny",
    ],
)
def test_load_test_token_counter(model):
    """
    Token count a large prompt 10 times.

    Assert total time taken is < 10s.
    """
    import tiktoken

    messages = [{"role": "user", "content": text}] * 10

    start_time = time.time()
    for _ in range(10):
        _ = token_counter(model=model, messages=messages)
        # enc.encode("".join(m["content"] for m in messages))

    end_time = time.time()

    total_time = end_time - start_time
    print("model={}, total test time={}".format(model, total_time))
    assert total_time < 10, f"Total encoding time > 10s, {total_time}"


def test_openai_token_with_image_and_text():
    model = "gpt-4o"
    full_request = {
        "model": "gpt-4o",
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "json",
                    "parameters": {
                        "type": "object",
                        "required": ["clause"],
                        "properties": {"clause": {"type": "string"}},
                    },
                    "description": "Respond with a JSON object.",
                },
            }
        ],
        "logprobs": False,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "text": "\n Just some long text, long long text, and you know it will be longer than 7 tokens definetly.",
                        "type": "text",
                    }
                ],
            }
        ],
        "tool_choice": {"type": "function", "function": {"name": "json"}},
        "exclude_models": [],
        "disable_fallback": False,
        "exclude_providers": [],
    }
    messages = full_request.get("messages", [])

    token_count = token_counter(model=model, messages=messages)
    print(token_count)
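# Worked example for the parametrization below (a reading of the expected values,
# assuming the "command" model's max tokens is 4096 as noted in the inline comments):
#   input_tokens=4000, user_max_tokens=256 -> 4000 + 256 > 4096, so the value is
#   trimmed to the remainder 4096 - 4000 = 96; with user_max_tokens=10 the request
#   already fits (4000 + 10 <= 4096), so 10 is returned unchanged.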


@pytest.mark.parametrize(
    "model, base_model, input_tokens, user_max_tokens, expected_value",
    [
        ("random-model", "random-model", 1024, 1024, 1024),
        ("command", "command", 1000000, None, None),  # model max = 4096
        ("command", "command", 4000, 256, 96),  # model max = 4096
        ("command", "command", 4000, 10, 10),  # model max = 4096
        ("gpt-3.5-turbo", "gpt-3.5-turbo", 4000, 5000, 4096),  # model max output = 4096
    ],
)
def test_get_modified_max_tokens(
    model, base_model, input_tokens, user_max_tokens, expected_value
):
    """
    - Test when max_output is not known => expect user_max_tokens
    - Test when max_output == max_input,
        - input > max_output, no max_tokens => expect None
        - input + max_tokens > max_output => expect remainder
        - input + max_tokens < max_output => expect max_tokens
    - Test when max_tokens > max_output => expect max_output
    """
    args = locals()
    import litellm

    litellm.token_counter = MagicMock()

    def _mock_token_counter(*args, **kwargs):
        return input_tokens

    litellm.token_counter.side_effect = _mock_token_counter
    print(f"_mock_token_counter: {_mock_token_counter()}")
    messages = [{"role": "user", "content": "Hello world!"}]

    calculated_value = get_modified_max_tokens(
        model=model,
        base_model=base_model,
        messages=messages,
        user_max_tokens=user_max_tokens,
        buffer_perc=0,
        buffer_num=0,
    )

    if expected_value is None:
        assert calculated_value is None
    else:
        assert (
            calculated_value == expected_value
        ), "Got={}, Expected={}, Params={}".format(
            calculated_value, expected_value, args
        )


def test_empty_tools():
    messages = [{"role": "user", "content": "hey, how's it going?", "tool_calls": None}]

    result = token_counter(
        messages=messages,
    )

    print(result)


def test_gpt_4o_token_counter():
    with patch.object(
        litellm.utils, "openai_token_counter", new=MagicMock()
    ) as mock_client:
        token_counter(
            model="gpt-4o-2024-05-13", messages=[{"role": "user", "content": "Hey!"}]
        )

        mock_client.assert_called()


@pytest.mark.parametrize(
    "img_url",
    [
        "https://blog.purpureus.net/assets/blog/personal_key_rotation/simplified-asset-graph.jpg",
"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAL0AAAC9CAMAAADRCYwCAAAAh1BMVEX///8AAAD8/Pz5+fkEBAT39/cJCQn09PRNTU3y8vIMDAwzMzPe3t7v7+8QEBCOjo7FxcXR0dHn5+elpaWGhoYYGBivr686OjocHBy0tLQtLS1TU1PY2Ni6urpaWlpERER3d3ecnJxoaGiUlJRiYmIlJSU4ODhBQUFycnKAgIDBwcFnZ2chISE7EjuwAAAI/UlEQVR4nO1caXfiOgz1bhJIyAJhX1JoSzv8/9/3LNlpYd4rhX6o4/N8Z2lKM2cURZau5JsQEhERERERERERERERERERERHx/wBjhDPC3OGN8+Cc5JeMuheaETSdO8vZFyCScHtmz2CsktoeMn7rLM1u3h0PMAEhyYX7v/Q9wQvoGdB0hlbzm45lEq/wd6y6G9aezvBk9AXwp1r3LHJIRsh6s2maxaJpmvqgvkC7WFS3loUnaFJtKRVUCEoV/RpCnHRvAsesVQ1hw+vd7Mpo+424tLs72NplkvQgcdrsvXkW/zJWqH/fA0FT84M/xnQJt4to3+ZLuanbM6X5lfXKHosO9COgREqpCR5i86pf2zPS7j9tTj+9nO7bQz3+xGEyGW9zqgQ1tyQ/VsxEDvce/4dcUPNb5OD9yXvR4Z2QisuP0xiGWPnemgugU5q/troHhGEjIF5sTOyW648aC0TssuaaCEsYEIkGzjWXOp3A0vVsf6kgRyqaDk+T7DIVWrb58b2tT5xpUucKwodOD/5LbrZC1ws6YSaBZJ/8xlh+XZSYXaMJ2ezNqjB3IPXuehPcx2U6b4t1dS/xNdFzguUt8ie7arnPeyCZroxLHzGgGdqVcspwafizPWEXBee+9G1OaufGdvNng/9C+gwgZ3PH3r87G6zXTZ5D5De2G2DeFoANXfbACkT+fxBQ22YFsTTJF9hjFVO6VbqxZXko4WJ8s52P4PnuxO5KRzu0/hlix1ySt8iXjgaQ+4IHPA9nVzNkdduM9LFT/Aacj4FtKrHA7iAw602Vnht6R8Vq1IOS+wNMKLYqayAYfRuufQPGeGb7sZogQQoLZrGPgZ6KoYn70Iw30O92BNEDpvwouCFn6wH2uS+EhRb3WF/HObZk3HuxfRQM3Y/Of/VH0n4MKNHZDiZvO9+m/ABALfkOcuar/7nOo7B95ACGVAFaz4jMiJwJhdaHBkySmzlGTu82gr6FSTik2kJvLnY9nOd/D90qcH268m3I/cgI1xg1maE5CuZYaWLH+UHANCIck0yt7Mx5zBm5vVHXHwChsZ35kKqUpmo5Svq5/fzfAI5g2vDtFPYo1HiEA85QrDeGm9g//LG7K0scO3sdpj2CBDgCa+0OFs0bkvVgnnM/QBDwllOMm+cN7vMSHlB7Uu4haHKaTwgGkv8tlK+hP8fzmFuK/RQTpaLPWvbd58yWIo66HHM0OsPoPhVqmtaEVL7N+wYcTLTbb0DLdgp23Eyy2VYJ2N7bkLFAAibtoLPe5sLt6Oa2bvU+zyeMa8wrixO0gRTn9tO9NCSThTLGqcqtsDvphlfmx/cPBZVvw24jg1LE2lPuEo35Mhi58U0I/Ga8n5w+NS8i34MAQLos5B1u0xL1ZvCVYVRw/Fs2q53KLaXJMWwOZZ/4MPYV19bAHmgGDKB6f01xoeJKFbl63q9J34KdaVNPJWztQyRkzA3KNs1AdAEDowMxh10emXTCx75CkurtbY/ZpdNDGdsn2UcHKHsQ8Ai3WZi48IfkvtjOhsLpuIRSKZTX9FA4o+0d6o/zOWqQzVJMynL9NsxhSJOaourq6nBVQBueMSyubsX2xHrmuABZN2Ns9jr5nwLFlLF/2R6atjW/67Yd11YQ1Z+kA9Zk9dPTM/o6dVo6HHVgC0JR8oUfmI93T9u3gvTG94bAH02Y5xeqRcjuwnKCK6Q2+ajl8KXJ3GSh22P3Zfx6S+n008ROhJn+JRIUVu6o7OXl8w1SeyhuqNDwNI7SjbK08QrqPxS95jy4G7nCXVq6G3HNu0LtK5J0e226CfC005WKK9sVvfxI0eUbcnzutfhWe3rpZHM0nZ/ny/N8tanKYlQ6VEW5Xuym8yV1zZX58vwGhZp/5tFfhybZabdbrQYOs8F+xEhmPsb0/nki6kIyVvzZzUASiOrTfF+Sj9bXC7DoJxeiV8tjQL6loSd0yCx7YyB6rPdLx31U2qCG3F/oXIuDuqd6LFO+4DNIJuxFZqSsU0ea88avovFnWKRYFYRQDfCfcGaBCLn4M4A1ntJ5E57vicwqq2enaZEF5nokCYu9TbKqCC5yCDfL+GhLxT4w4xEJs+anqgou8DOY2q8FMryjb2MehC1dRJ9s4g9NXeTwPkWON4RH+FhIe0AWR/S9ekvQ+t70XHeimGF78LzuU7d7PwrswdIG2VpgF8C53qVQsTDtBJc4CdnkQPbnZY9mbPdDFra3PCXBBQ5QBn2aQqtyhvlyYM4Hb2/mdhsxCUen04GZVvIJZw5PAamMOmjzq8Q+dzAKLXDQ3RUZItWsg4t7W2DP+JDrJDymoMH7E5zQtuEpG03GTIjGCW3LQqOYEsXgFc78x76NeRwY6SNM+IfQoh6myJKRBIcLYxZcwscJ/gI2isTBty2Po9IkYzP0/SS4hGlxRjFAG5z1Jt1LckiB57yWvo35EaolbvA+6fBa24xodL2YjsPpTnj3JgJOqhcgOeLVsYYwoK0wjY+m1D3rGc40CukkaHnkEjarlXrF1B9M6ECQ6Ow0V7R7N4G3LfOHAXtymoyXOb4QhaYHJ/gNBJUkxclpSs7DNcgWWDDmM7Ke5MJpGuioe7w5EOvfTunUKRzOh7G2ylL+6ynHrD54oQO3//cN3yVO+5qMVsPZq0CZIOx4TlcJ8+Vz7V5waL+7WekzUpRFMTnnTlSCq3X5usi8qmIleW/rit1+oQZn1WGSU/sKBYEqMNh1mBOc6PhK8yCfKHdUNQk8o/G19ZPTs5MYfai+DLs5vmee37zEyyH48WW3XA6Xw6+Az8lMhci7N/KleToo7PtTKm+RA887Kqc6E9dyqL/QPTugzMHLbLZtJKqKLFfzVWRNJ63c+95uWT/F7R0U5dDVvuS409AJXhJvD0EwWaWdW8UN11u/7+umaYjT8mJtzZwP/MD4r57fihiHlC5fylHfaqnJdro+Dr7DajvO+vi2EwyD70s8nCH71nzIO1l5Zl+v1DMCb5ebvCMkGHvobXy/hPumGLyX0218/3RyD1GRLOuf9u/OGQyDmto32yMiIiIiIiIiIiIiIiIiIiIiIiIiIiIiIiIiIv7GP8YjWPR/czH2AAAAAElFTkSuQmCC",
    ],
)
def test_img_url_token_counter(img_url):

    from litellm.utils import get_image_dimensions

    width, height = get_image_dimensions(data=img_url)

    print(width, height)

    assert width is not None
    assert height is not None


def test_token_encode_disallowed_special():
    encode(model="gpt-3.5-turbo", text="Hello, world! <|endoftext|>")