LiteLLM Minor Fixes & Improvements (11/05/2024) (#6590)

* fix(pattern_matching_router.py): update model name using correct function

* fix(langfuse.py): metadata deepcopy can cause unhandled error (#6563)

Co-authored-by: seva <seva@inita.com>
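The failure mode here is a blanket `copy.deepcopy` over request metadata that can contain non-copyable values (locks, clients, callables). A minimal sketch of a copy-safe alternative, assuming a standalone helper rather than the Langfuse integration's actual `_prepare_metadata` signature:

```python
import copy
from typing import Any, Dict, Optional

def prepare_metadata(metadata: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Copy metadata key-by-key so one bad value cannot abort logging."""
    if not isinstance(metadata, dict):
        return None
    cleaned: Dict[str, Any] = {}
    for key, value in metadata.items():
        try:
            # deepcopy may raise TypeError on locks, open clients, generators, etc.
            cleaned[key] = copy.deepcopy(value)
        except Exception:
            # keep a readable placeholder instead of dropping the whole payload
            cleaned[key] = f"<non-serializable: {type(value).__name__}>"
    return cleaned
```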

* fix(stream_chunk_builder_utils.py): correctly set prompt tokens + log correct streaming usage

Closes https://github.com/BerriAI/litellm/issues/6488
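The regression test added for this (`test_standard_logging_payload_stream_usage` in the diff below) rebuilds usage from streamed chunks; a minimal usage sketch of the same flow:

```python
import litellm
from litellm.main import stream_chunk_builder

resp = litellm.completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
    stream=True,  # note: no stream_options; usage should still be logged correctly
)
chunks = [chunk for chunk in resp]
built = stream_chunk_builder(chunks=chunks)
print(built.usage)  # prompt_tokens / completion_tokens / total_tokens
```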

* build(deps): bump cookie and express in /docs/my-website (#6566)

Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.

Updates `cookie` from 0.6.0 to 0.7.1
- [Release notes](https://github.com/jshttp/cookie/releases)
- [Commits](https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.1)

Updates `express` from 4.20.0 to 4.21.1
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.1/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.20.0...4.21.1)

---
updated-dependencies:
- dependency-name: cookie
  dependency-type: indirect
- dependency-name: express
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* docs(virtual_keys.md): update Dockerfile reference (#6554)

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>

* (proxy fix) - call connect on prisma client when running setup (#6534)

* critical fix - call connect on prisma client when running setup

* fix test_proxy_server_prisma_setup

* fix test_proxy_server_prisma_setup

* Add 3.5 haiku (#6588)

* feat: add claude-3-5-haiku-20241022 entries

* feat: add claude-3-5-haiku-20241022 and vertex_ai/claude-3-5-haiku@20241022 models

* add missing entries, remove vision

* remove image token costs

* Litellm perf improvements 3 (#6573)

* perf: move writing key to cache, to background task
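A minimal sketch of the pattern, assuming an async cache client; the helper names are illustrative, not the proxy's actual functions:

```python
import asyncio

async def persist_new_key(key_obj, db, cache):
    await db.insert_key(key_obj)  # keep the durable write on the request path
    # fire-and-forget the cache write so the caller is not blocked on it
    asyncio.create_task(cache.async_set_cache(key=key_obj.token, value=key_obj))
    return key_obj
```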

* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils

adds 200ms on calls with pgdb connected

* fix(litellm_pre_call_utils.py): rename call_type to actual call used

* perf(proxy_server.py): remove db logic from _get_config_from_file

was causing db calls to occur on every llm request, if team_id was set on key

* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db

reduces latency/call by ~100ms
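A rough sketch of the idea, with an illustrative cache/db interface (not LiteLLM's actual helpers): remember IDs already confirmed missing so they do not trigger a DB lookup on every request.

```python
async def get_team_object(team_id: str, db, cache):
    missing_key = f"team_id:{team_id}:not_found"
    if await cache.async_get_cache(key=missing_key):
        return None  # known-missing: skip the DB round-trip entirely
    team = await db.fetch_team(team_id)  # hypothetical lookup helper
    if team is None:
        # cache the negative result briefly so a later insert is still picked up
        await cache.async_set_cache(key=missing_key, value=True, ttl=60)
    return team
```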

* fix(proxy_server.py): minor fix on existing_settings not including alerting

* fix(exception_mapping_utils.py): map databricks exception string

* fix(auth_checks.py): fix auth check logic

* test: correctly mark flaky test

* fix(utils.py): handle auth token error for tokenizers.from_pretrained
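A minimal sketch of the fallback, assuming the `tokenizers` package; the exact exception raised for gated repos varies, so it is caught broadly here:

```python
from tokenizers import Tokenizer

def load_hf_tokenizer(model_id: str):
    try:
        return Tokenizer.from_pretrained(model_id)
    except Exception:
        # gated/private repos can raise auth errors when no token is set;
        # return None so the caller falls back to the default token counter
        return None
```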

* build: fix map

* build: fix map

* build: fix json for model map

* fix ImageObject conversion (#6584)

* (fix) litellm.text_completion raises a non-blocking error on simple usage (#6546)

* unit test test_huggingface_text_completion_logprobs

* fix return TextCompletionHandler convert_chat_to_text_completion

* fix hf rest api

* fix test_huggingface_text_completion_logprobs

* fix linting errors

* fix import of LiteLLMResponseObjectHandler

* fix test for LiteLLMResponseObjectHandler

* fix test text completion

* fix: allow 15 seconds for the premium license check

* testing fix bedrock deprecated cohere.command-text-v14

* (feat) add `Predicted Outputs` for OpenAI  (#6594)

* bump openai to openai==1.54.0

* add 'prediction' param

* testing fix bedrock deprecated cohere.command-text-v14

* test test_openai_prediction_param.py

* test_openai_prediction_param_with_caching

* doc Predicted Outputs

* doc Predicted Output
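A usage sketch of the new `prediction` parameter, mirroring OpenAI's Predicted Outputs format; the model chosen here is an example, not a statement of which models LiteLLM gates this to:

```python
import litellm

code = "def sum(a, b):\n    return a + b"

response = litellm.completion(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Rename the function to add_numbers. Reply with code only."},
        {"role": "user", "content": code},
    ],
    # tokens matching the prediction are emitted faster on supported models
    prediction={"type": "content", "content": code},
)
print(response.choices[0].message.content)
```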

* (fix) Vertex Improve Performance when using `image_url`  (#6593)

* fix transformation vertex

* test test_process_gemini_image

* test_image_completion_request

* testing fix - bedrock has deprecated cohere.command-text-v14

* fix vertex pdf
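A sketch of the optimization implied by this change, under the assumption that the win comes from not downloading image bytes just to detect their format; the helper name is illustrative:

```python
from mimetypes import guess_type
from typing import Optional

def infer_gemini_mime_type(image_url: str) -> Optional[str]:
    """Guess the mime type from the URL alone; return None to force a fetch."""
    mime, _ = guess_type(image_url.split("?")[0])
    if mime in ("image/png", "image/jpeg", "image/gif", "image/webp", "application/pdf"):
        # can be passed to Gemini as file_data without touching the bytes
        return mime
    return None
```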

* bump: version 1.51.5 → 1.52.0

* fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check (#6577)

* fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check
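A minimal sketch of a parallel-safe check, assuming an atomic increment on the shared cache (interface illustrative): increment first, then compare, so concurrent requests cannot all read the same stale counter.

```python
async def request_allowed(cache, deployment_id: str, rpm_limit: int) -> bool:
    key = f"{deployment_id}:rpm:current_minute"
    # an atomic increment returns the post-increment value even under concurrency
    current = await cache.async_increment(key=key, value=1, ttl=60)
    return current <= rpm_limit
```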

* fix(lowest_tpm_rpm_v2.py): return headers in correct format

* test: update test

* build(deps): bump cookie and express in /docs/my-website (#6566)

Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.

Updates `cookie` from 0.6.0 to 0.7.1
- [Release notes](https://github.com/jshttp/cookie/releases)
- [Commits](https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.1)

Updates `express` from 4.20.0 to 4.21.1
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.1/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.20.0...4.21.1)

---
updated-dependencies:
- dependency-name: cookie
  dependency-type: indirect
- dependency-name: express
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* docs(virtual_keys.md): update Dockerfile reference (#6554)

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>

* (proxy fix) - call connect on prisma client when running setup (#6534)

* critical fix - call connect on prisma client when running setup

* fix test_proxy_server_prisma_setup

* fix test_proxy_server_prisma_setup

* Add 3.5 haiku (#6588)

* feat: add claude-3-5-haiku-20241022 entries

* feat: add claude-3-5-haiku-20241022 and vertex_ai/claude-3-5-haiku@20241022 models

* add missing entries, remove vision

* remove image token costs

* Litellm perf improvements 3 (#6573)

* perf: move writing key to cache, to background task

* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils

adds 200ms on calls with pgdb connected

* fix(litellm_pre_call_utils.py): rename call_type to actual call used

* perf(proxy_server.py): remove db logic from _get_config_from_file

was causing db calls to occur on every llm request, if team_id was set on key

* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db

reduces latency/call by ~100ms

* fix(proxy_server.py): minor fix on existing_settings not including alerting

* fix(exception_mapping_utils.py): map databricks exception string

* fix(auth_checks.py): fix auth check logic

* test: correctly mark flaky test

* fix(utils.py): handle auth token error for tokenizers.from_pretrained

* build: fix map

* build: fix map

* build: fix json for model map

* test: remove eol model

* fix(proxy_server.py): fix db config loading logic

* fix(proxy_server.py): fix order of config / db updates, to ensure fields not overwritten
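The ordering bug is covered by the new `test_proxy_config_update_from_db`; a sketch of the intended merge, with DB values layered onto the file config so file-only keys (e.g. `callbacks`) survive:

```python
def merge_db_settings(file_config: dict, db_litellm_settings: dict) -> dict:
    merged = dict(file_config.get("litellm_settings") or {})
    merged.update(db_litellm_settings)  # DB wins on conflicts, file-only keys remain
    file_config["litellm_settings"] = merged
    return file_config
```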

* test: skip test if required env var is missing

* test: fix test

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>

* test: mark flaky test

* test: handle anthropic api instability

* test(test_proxy_utils.py): add testing for db config update logic

* Update setuptools in docker and fastapi to latest version, in order to upgrade starlette version (#6597)

* build(deps): bump cookie and express in /docs/my-website (#6566)

Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.

Updates `cookie` from 0.6.0 to 0.7.1
- [Release notes](https://github.com/jshttp/cookie/releases)
- [Commits](https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.1)

Updates `express` from 4.20.0 to 4.21.1
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.1/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.20.0...4.21.1)

---
updated-dependencies:
- dependency-name: cookie
  dependency-type: indirect
- dependency-name: express
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* docs(virtual_keys.md): update Dockerfile reference (#6554)

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>

* (proxy fix) - call connect on prisma client when running setup (#6534)

* critical fix - call connect on prisma client when running setup

* fix test_proxy_server_prisma_setup

* fix test_proxy_server_prisma_setup

* Add 3.5 haiku (#6588)

* feat: add claude-3-5-haiku-20241022 entries

* feat: add claude-3-5-haiku-20241022 and vertex_ai/claude-3-5-haiku@20241022 models

* add missing entries, remove vision

* remove image token costs

* Litellm perf improvements 3 (#6573)

* perf: move writing key to cache, to background task

* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils

adds 200ms on calls with pgdb connected

* fix(litellm_pre_call_utils.py): rename call_type to actual call used

* perf(proxy_server.py): remove db logic from _get_config_from_file

was causing db calls to occur on every llm request, if team_id was set on key

* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db

reduces latency/call by ~100ms

* fix(proxy_server.py): minor fix on existing_settings not including alerting

* fix(exception_mapping_utils.py): map databricks exception string

* fix(auth_checks.py): fix auth check logic

* test: correctly mark flaky test

* fix(utils.py): handle auth token error for tokenizers.from_pretrained

* build: fix map

* build: fix map

* build: fix json for model map

* fix ImageObject conversion (#6584)

* (fix) litellm.text_completion raises a non-blocking error on simple usage (#6546)

* unit test test_huggingface_text_completion_logprobs

* fix return TextCompletionHandler convert_chat_to_text_completion

* fix hf rest api

* fix test_huggingface_text_completion_logprobs

* fix linting errors

* fix import of LiteLLMResponseObjectHandler

* fix test for LiteLLMResponseObjectHandler

* fix test text completion

* fix: allow 15 seconds for the premium license check

* testing fix bedrock deprecated cohere.command-text-v14

* (feat) add `Predicted Outputs` for OpenAI  (#6594)

* bump openai to openai==1.54.0

* add 'prediction' param

* testing fix bedrock deprecated cohere.command-text-v14

* test test_openai_prediction_param.py

* test_openai_prediction_param_with_caching

* doc Predicted Outputs

* doc Predicted Output

* (fix) Vertex Improve Performance when using `image_url`  (#6593)

* fix transformation vertex

* test test_process_gemini_image

* test_image_completion_request

* testing fix - bedrock has deprecated cohere.command-text-v14

* fix vertex pdf

* bump: version 1.51.5 → 1.52.0

* Update setuptools in docker and fastapi to latest version, in order to upgrade starlette version

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: Jacob Hagstedt <wcgs@novonordisk.com>

* fix(langfuse.py): fix linting errors

* fix: fix linting errors

* fix: fix casting error

* fix: fix typing error

* fix: add more tests

* fix(utils.py): fix return_processed_chunk_logic

* Revert "Update setuptools in docker and fastapi to latest verison, in order t…" (#6615)

This reverts commit 1a7f7bdfb7.

* docs fix clarify team_id on team based logging

* doc fix team based logging with langfuse

* fix flake8 checks

* test: bump sleep time

* refactor: replace claude-instant-1.2 with haiku in testing

* fix(proxy_server.py): move to using the standard logging payload in track_cost_callback

* fix(proxy_server.py): fix linting errors

* fix(proxy_server.py): fall back to kwargs["response_cost"] if given
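A sketch of the fallback order described above; key names follow the standard logging payload used elsewhere in this commit, but treat them as illustrative:

```python
def resolve_response_cost(kwargs: dict) -> float:
    payload = kwargs.get("standard_logging_object") or {}
    cost = payload.get("response_cost")
    if cost is None:
        cost = kwargs.get("response_cost")  # fall back to the raw kwargs value
    return float(cost or 0.0)
```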

* test: remove claude-instant-1 from tests

* test: fix claude test

* docs fix clarify team_id on team based logging

* doc fix team based logging with langfuse

* build: remove lint.yml

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Vsevolod Karvetskiy <56288164+karvetskiy@users.noreply.github.com>
Co-authored-by: seva <seva@inita.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
Co-authored-by: Jacob Hagstedt P Suorra <Jacobh2@users.noreply.github.com>
Co-authored-by: Jacob Hagstedt <wcgs@novonordisk.com>
commit 136693cac4 (parent 66c1ee09cf)
Krish Dholakia, 2024-11-07 04:17:05 +05:30, committed via GitHub
32 changed files with 634 additions and 533 deletions

@@ -13,7 +13,7 @@ import litellm
## case 1: set_function_to_prompt not set
def test_function_call_non_openai_model():
try:
model = "claude-instant-1"
model = "claude-3-5-haiku-20241022"
messages = [{"role": "user", "content": "what's the weather in sf?"}]
functions = [
{
@@ -43,38 +43,4 @@ def test_function_call_non_openai_model():
# test_function_call_non_openai_model()
## case 2: add_function_to_prompt set
@pytest.mark.skip(reason="Anthropic now supports tool calling")
def test_function_call_non_openai_model_litellm_mod_set():
litellm.add_function_to_prompt = True
litellm.set_verbose = True
try:
model = "claude-instant-1.2"
messages = [{"role": "user", "content": "what's the weather in sf?"}]
functions = [
{
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
},
"required": ["location"],
},
}
]
response = litellm.completion(
model=model, messages=messages, functions=functions
)
print(f"response: {response}")
except Exception as e:
pytest.fail(f"An error occurred {e}")
# test_function_call_non_openai_model_litellm_mod_set()

@@ -480,28 +480,6 @@ async def test_aaalangfuse_logging_metadata(langfuse_client):
print("generation_from_langfuse", generation)
@pytest.mark.skip(reason="beta test - checking langfuse output")
def test_langfuse_logging():
try:
pre_langfuse_setup()
litellm.set_verbose = True
response = completion(
model="claude-instant-1.2",
messages=[{"role": "user", "content": "Hi 👋 - i'm claude"}],
max_tokens=10,
temperature=0.2,
)
print(response)
# time.sleep(5)
# # check langfuse.log to see if there was a failed response
# search_logs("langfuse.log")
except litellm.Timeout as e:
pass
except Exception as e:
pytest.fail(f"An exception occurred - {e}")
# test_langfuse_logging()

@@ -69,7 +69,7 @@ def test_batch_completions_models():
def test_batch_completion_models_all_responses():
try:
responses = batch_completion_models_all_responses(
models=["j2-light", "claude-instant-1.2"],
models=["j2-light", "claude-3-haiku-20240307"],
messages=[{"role": "user", "content": "write a poem"}],
max_tokens=10,
)

@@ -343,7 +343,7 @@ def test_completion_claude():
try:
# test without max tokens
response = completion(
model="claude-instant-1", messages=messages, request_timeout=10
model="claude-3-5-haiku-20241022", messages=messages, request_timeout=10
)
# Add any assertions here to check response args
print(response)

@@ -1562,3 +1562,65 @@ def test_logging_key_masking_gemini():
trimmed_key = key.split("key=")[1]
trimmed_key = trimmed_key.replace("*", "")
assert "PART" == trimmed_key
@pytest.mark.parametrize("sync_mode", [True, False])
@pytest.mark.asyncio
async def test_standard_logging_payload_stream_usage(sync_mode):
"""
Even if stream_options is not provided, correct usage should be logged
"""
from litellm.types.utils import StandardLoggingPayload
from litellm.main import stream_chunk_builder
stream = True
try:
# sync completion
customHandler = CompletionCustomHandler()
litellm.callbacks = [customHandler]
if sync_mode:
patch_event = "log_success_event"
return_val = MagicMock()
else:
patch_event = "async_log_success_event"
return_val = AsyncMock()
with patch.object(customHandler, patch_event, new=return_val) as mock_client:
if sync_mode:
resp = litellm.completion(
model="anthropic/claude-3-5-sonnet-20240620",
messages=[{"role": "user", "content": "Hey, how's it going?"}],
stream=stream,
)
chunks = []
for chunk in resp:
chunks.append(chunk)
time.sleep(2)
else:
resp = await litellm.acompletion(
model="anthropic/claude-3-5-sonnet-20240620",
messages=[{"role": "user", "content": "Hey, how's it going?"}],
stream=stream,
)
chunks = []
async for chunk in resp:
chunks.append(chunk)
await asyncio.sleep(2)
mock_client.assert_called_once()
standard_logging_object: StandardLoggingPayload = (
mock_client.call_args.kwargs["kwargs"]["standard_logging_object"]
)
built_response = stream_chunk_builder(chunks=chunks)
assert (
built_response.usage.total_tokens
!= standard_logging_object["total_tokens"]
)
print(f"standard_logging_object usage: {built_response.usage}")
except litellm.InternalServerError:
pass

@@ -163,7 +163,7 @@ def invalid_auth(model): # set the model key to an invalid key, depending on th
elif model == "azure/chatgpt-v-2":
temporary_key = os.environ["AZURE_API_KEY"]
os.environ["AZURE_API_KEY"] = "bad-key"
elif model == "claude-instant-1":
elif model == "claude-3-5-haiku-20241022":
temporary_key = os.environ["ANTHROPIC_API_KEY"]
os.environ["ANTHROPIC_API_KEY"] = "bad-key"
elif model == "command-nightly":
@@ -213,7 +213,7 @@ def invalid_auth(model): # set the model key to an invalid key, depending on th
elif model == "chatgpt-test":
os.environ["AZURE_API_KEY"] = temporary_key
azure = True
elif model == "claude-instant-1":
elif model == "claude-3-5-haiku-20241022":
os.environ["ANTHROPIC_API_KEY"] = temporary_key
elif model == "command-nightly":
os.environ["COHERE_API_KEY"] = temporary_key

@@ -77,71 +77,6 @@ async def test_langsmith_queue_logging():
pytest.fail(f"Error occurred: {e}")
@pytest.mark.skip(reason="Flaky test. covered by unit tests on custom logger.")
@pytest.mark.asyncio()
async def test_async_langsmith_logging():
try:
test_langsmith_logger = LangsmithLogger()
run_id = str(uuid.uuid4())
litellm.set_verbose = True
litellm.callbacks = ["langsmith"]
response = await litellm.acompletion(
model="claude-instant-1.2",
messages=[{"role": "user", "content": "what llm are u"}],
max_tokens=10,
temperature=0.2,
metadata={
"id": run_id,
"tags": ["tag1", "tag2"],
"user_api_key": "6eb81e014497d89f3cc1aa9da7c2b37bda6b7fea68e4b710d33d94201e68970c",
"user_api_key_alias": "ishaans-langmsith-key",
"user_api_end_user_max_budget": None,
"litellm_api_version": "1.40.19",
"global_max_parallel_requests": None,
"user_api_key_user_id": "admin",
"user_api_key_org_id": None,
"user_api_key_team_id": "dbe2f686-a686-4896-864a-4c3924458709",
"user_api_key_team_alias": "testing-team",
},
)
print(response)
await asyncio.sleep(3)
print("run_id", run_id)
logged_run_on_langsmith = test_langsmith_logger.get_run_by_id(run_id=run_id)
print("logged_run_on_langsmith", logged_run_on_langsmith)
print("fields in logged_run_on_langsmith", logged_run_on_langsmith.keys())
input_fields_on_langsmith = logged_run_on_langsmith.get("inputs")
extra_fields_on_langsmith = logged_run_on_langsmith.get("extra").get(
"invocation_params"
)
print("\nLogged INPUT ON LANGSMITH", input_fields_on_langsmith)
print("\nextra fields on langsmith", extra_fields_on_langsmith)
assert isinstance(input_fields_on_langsmith, dict)
assert "api_key" not in input_fields_on_langsmith
assert "api_key" not in extra_fields_on_langsmith
# assert user_api_key in extra_fields_on_langsmith
assert "user_api_key" in extra_fields_on_langsmith
assert "user_api_key_user_id" in extra_fields_on_langsmith
assert "user_api_key_team_alias" in extra_fields_on_langsmith
for cb in litellm.callbacks:
if isinstance(cb, LangsmithLogger):
await cb.async_httpx_client.client.aclose()
# test_langsmith_logger.async_httpx_client.close()
except Exception as e:
print(e)
pytest.fail(f"Error occurred: {e}")
# test_langsmith_logging()

@@ -72,7 +72,7 @@
# # old_stdout = sys.stdout
# # sys.stdout = new_stdout = io.StringIO()
# # response = completion(model="claude-instant-1", messages=messages)
# # response = completion(model="claude-3-5-haiku-20241022", messages=messages)
# # # Restore stdout
# # sys.stdout = old_stdout
@@ -154,7 +154,7 @@
# old_stdout = sys.stdout
# sys.stdout = new_stdout = io.StringIO()
# response = completion(model="claude-instant-1", messages=messages, stream=True)
# response = completion(model="claude-3-5-haiku-20241022", messages=messages, stream=True)
# for idx, chunk in enumerate(response):
# pass
@@ -255,7 +255,7 @@
# # sys.stdout = new_stdout = io.StringIO()
# # try:
# # response = completion(model="claude-instant-1", messages=messages)
# # response = completion(model="claude-3-5-haiku-20241022", messages=messages)
# # except AuthenticationError:
# # pass
@@ -327,7 +327,7 @@
# # sys.stdout = new_stdout = io.StringIO()
# # try:
# # response = completion(model="claude-instant-1", messages=messages)
# # response = completion(model="claude-3-5-haiku-20241022", messages=messages)
# # except AuthenticationError:
# # pass

@@ -3,7 +3,7 @@
# BASE_URL = 'http://localhost:8080'
# def test_hello_route():
# data = {"model": "claude-instant-1", "messages": [{"role": "user", "content": "hey, how's it going?"}]}
# data = {"model": "claude-3-5-haiku-20241022", "messages": [{"role": "user", "content": "hey, how's it going?"}]}
# headers = {'Content-Type': 'application/json'}
# response = requests.get(BASE_URL, headers=headers, data=json.dumps(data))
# print(response.text)

@@ -31,63 +31,6 @@ litellm.set_verbose = True
import time
@pytest.mark.skip(reason="duplicate test of logging with callbacks")
@pytest.mark.asyncio()
async def test_async_prometheus_success_logging():
from litellm.integrations.prometheus import PrometheusLogger
pl = PrometheusLogger()
run_id = str(uuid.uuid4())
litellm.set_verbose = True
litellm.callbacks = [pl]
response = await litellm.acompletion(
model="claude-instant-1.2",
messages=[{"role": "user", "content": "what llm are u"}],
max_tokens=10,
mock_response="hi",
temperature=0.2,
metadata={
"id": run_id,
"tags": ["tag1", "tag2"],
"user_api_key": "6eb81e014497d89f3cc1aa9da7c2b37bda6b7fea68e4b710d33d94201e68970c",
"user_api_key_alias": "ishaans-prometheus-key",
"user_api_end_user_max_budget": None,
"litellm_api_version": "1.40.19",
"global_max_parallel_requests": None,
"user_api_key_user_id": "admin",
"user_api_key_org_id": None,
"user_api_key_team_id": "dbe2f686-a686-4896-864a-4c3924458709",
"user_api_key_team_alias": "testing-team",
},
)
print(response)
await asyncio.sleep(3)
# get prometheus logger
test_prometheus_logger = pl
print("done with success request")
print(
"vars of test_prometheus_logger",
vars(test_prometheus_logger.litellm_requests_metric),
)
# Get the metrics
metrics = {}
for metric in REGISTRY.collect():
for sample in metric.samples:
metrics[sample.name] = sample.value
print("metrics from prometheus", metrics)
assert metrics["litellm_requests_metric_total"] == 1.0
assert metrics["litellm_total_tokens_total"] == 30.0
assert metrics["litellm_deployment_success_responses_total"] == 1.0
assert metrics["litellm_deployment_total_requests_total"] == 1.0
assert metrics["litellm_deployment_latency_per_output_token_bucket"] == 1.0
@pytest.mark.asyncio()
async def test_async_prometheus_success_logging_with_callbacks():
@@ -107,7 +50,7 @@ async def test_async_prometheus_success_logging_with_callbacks():
initial_metrics[sample.name] = sample.value
response = await litellm.acompletion(
model="claude-instant-1.2",
model="claude-3-haiku-20240307",
messages=[{"role": "user", "content": "what llm are u"}],
max_tokens=10,
mock_response="hi",

@@ -18,7 +18,7 @@ import time
# sys.stdout = new_stdout = io.StringIO()
# response = completion(model="claude-instant-1.2",
# response = completion(model="claude-3-5-haiku-20241022",
# messages=[{
# "role": "user",
# "content": "Hi 👋 - i'm claude"

@@ -56,7 +56,7 @@ def claude_test_completion():
try:
# OVERRIDE WITH DYNAMIC MAX TOKENS
response_1 = litellm.completion(
model="claude-instant-1.2",
model="claude-3-haiku-20240307",
messages=[{"content": "Hello, how are you?", "role": "user"}],
max_tokens=10,
)
@@ -66,7 +66,7 @@ def claude_test_completion():
# USE CONFIG TOKENS
response_2 = litellm.completion(
model="claude-instant-1.2",
model="claude-3-haiku-20240307",
messages=[{"content": "Hello, how are you?", "role": "user"}],
)
# Add any assertions here to check the response
@@ -77,7 +77,7 @@ def claude_test_completion():
try:
response_3 = litellm.completion(
model="claude-instant-1.2",
model="claude-3-5-haiku-20241022",
messages=[{"content": "Hello, how are you?", "role": "user"}],
n=2,
)

@@ -10,7 +10,7 @@ sys.path.insert(
0, os.path.abspath("../..")
) # Adds the parent directory to the system path
import litellm
from unittest.mock import MagicMock, patch, AsyncMock
from litellm.proxy._types import LitellmUserRoles, UserAPIKeyAuth
from litellm.proxy.auth.auth_utils import is_request_body_safe
@@ -465,3 +465,48 @@ def test_update_internal_user_params():
updated_data_json["budget_duration"]
== litellm.default_internal_user_params["budget_duration"]
)
@pytest.mark.asyncio
async def test_proxy_config_update_from_db():
from litellm.proxy.proxy_server import ProxyConfig
from pydantic import BaseModel
proxy_config = ProxyConfig()
pc = AsyncMock()
test_config = {
"litellm_settings": {
"callbacks": ["prometheus", "otel"],
}
}
class ReturnValue(BaseModel):
param_name: str
param_value: dict
with patch.object(
pc,
"get_generic_data",
new=AsyncMock(
return_value=ReturnValue(
param_name="litellm_settings",
param_value={
"success_callback": "langfuse",
},
)
),
):
new_config = await proxy_config._update_config_from_db(
prisma_client=pc,
config=test_config,
store_model_in_db=True,
)
assert new_config == {
"litellm_settings": {
"callbacks": ["prometheus", "otel"],
"success_callback": "langfuse",
}
}

@@ -1807,7 +1807,7 @@ def test_router_anthropic_key_dynamic():
{
"model_name": "anthropic-claude",
"litellm_params": {
"model": "claude-instant-1.2",
"model": "claude-3-5-haiku-20241022",
"api_key": anthropic_api_key,
},
}

@@ -824,8 +824,8 @@ def test_ausage_based_routing_fallbacks():
"rpm": OPENAI_RPM,
},
{
"model_name": "anthropic-claude-instant-1.2",
"litellm_params": get_anthropic_params("claude-instant-1.2"),
"model_name": "anthropic-claude-3-5-haiku-20241022",
"litellm_params": get_anthropic_params("claude-3-5-haiku-20241022"),
"model_info": {"id": 4},
"rpm": ANTHROPIC_RPM,
},
@@ -834,7 +834,7 @@ def test_ausage_based_routing_fallbacks():
fallbacks_list = [
{"azure/gpt-4-fast": ["azure/gpt-4-basic"]},
{"azure/gpt-4-basic": ["openai-gpt-4"]},
{"openai-gpt-4": ["anthropic-claude-instant-1.2"]},
{"openai-gpt-4": ["anthropic-claude-3-5-haiku-20241022"]},
]
router = Router(
@@ -864,7 +864,7 @@ def test_ausage_based_routing_fallbacks():
assert response._hidden_params["model_id"] == "1"
for i in range(10):
# now make 100 mock requests to OpenAI - expect it to fallback to anthropic-claude-instant-1.2
# now make 100 mock requests to OpenAI - expect it to fallback to anthropic-claude-3-5-haiku-20241022
response = router.completion(
model="azure/gpt-4-fast",
messages=messages,

@@ -17,6 +17,7 @@ from litellm.router import Deployment, LiteLLM_Params, ModelInfo
from concurrent.futures import ThreadPoolExecutor
from collections import defaultdict
from dotenv import load_dotenv
from unittest.mock import patch, MagicMock, AsyncMock
load_dotenv()
@@ -155,3 +156,35 @@ def test_route_with_exception():
result = router.route("openai/gpt-3.5-turbo")
assert result is None
def test_router_pattern_match_e2e():
"""
Tests the end to end flow of the router
"""
from litellm.llms.custom_httpx.http_handler import HTTPHandler
client = HTTPHandler()
router = Router(
model_list=[
{
"model_name": "llmengine/*",
"litellm_params": {"model": "anthropic/*", "api_key": "test"},
}
]
)
with patch.object(client, "post", new=MagicMock()) as mock_post:
router.completion(
model="llmengine/my-custom-model",
messages=[{"role": "user", "content": "Hello, how are you?"}],
client=client,
api_key="test",
)
mock_post.assert_called_once()
print(mock_post.call_args.kwargs["data"])
mock_post.call_args.kwargs["data"] == {
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello, how are you?"}],
}

@@ -38,9 +38,9 @@ def test_router_timeouts():
"tpm": 80000,
},
{
"model_name": "anthropic-claude-instant-1.2",
"model_name": "anthropic-claude-3-5-haiku-20241022",
"litellm_params": {
"model": "claude-instant-1.2",
"model": "claude-3-5-haiku-20241022",
"api_key": "os.environ/ANTHROPIC_API_KEY",
"mock_response": "hello world",
},
@@ -49,7 +49,7 @@
]
fallbacks_list = [
{"openai-gpt-4": ["anthropic-claude-instant-1.2"]},
{"openai-gpt-4": ["anthropic-claude-3-5-haiku-20241022"]},
]
# Configure router

@@ -681,7 +681,7 @@ def test_completion_ollama_hosted_stream():
@pytest.mark.parametrize(
"model",
[
# "claude-instant-1.2",
# "claude-3-5-haiku-20241022",
# "claude-2",
# "mistral/mistral-medium",
"openrouter/openai/gpt-4o-mini",
@@ -1112,7 +1112,7 @@ def test_completion_claude_stream_bad_key():
},
]
response = completion(
model="claude-instant-1",
model="claude-3-5-haiku-20241022",
messages=messages,
stream=True,
max_tokens=50,

@@ -1,6 +1,6 @@
#### What this tests ####
# This tests litellm.token_counter() function
import traceback
import os
import sys
import time
@@ -116,7 +116,9 @@ def test_tokenizers():
openai_tokens = token_counter(model="gpt-3.5-turbo", text=sample_text)
# claude tokenizer
claude_tokens = token_counter(model="claude-instant-1", text=sample_text)
claude_tokens = token_counter(
model="claude-3-5-haiku-20241022", text=sample_text
)
# cohere tokenizer
cohere_tokens = token_counter(model="command-nightly", text=sample_text)
@@ -167,8 +169,9 @@ def test_encoding_and_decoding():
assert openai_text == sample_text
# claude encoding + decoding
claude_tokens = encode(model="claude-instant-1", text=sample_text)
claude_text = decode(model="claude-instant-1", tokens=claude_tokens.ids)
claude_tokens = encode(model="claude-3-5-haiku-20241022", text=sample_text)
claude_text = decode(model="claude-3-5-haiku-20241022", tokens=claude_tokens)
assert claude_text == sample_text
@@ -186,7 +189,7 @@ def test_encoding_and_decoding():
assert llama2_text == sample_text
except Exception as e:
pytest.fail(f"An exception occured: {e}")
pytest.fail(f"An exception occured: {e}\n{traceback.format_exc()}")
# test_encoding_and_decoding()

@@ -26,7 +26,7 @@ def exporter():
return exporter
@pytest.mark.parametrize("model", ["claude-instant-1.2", "gpt-3.5-turbo"])
@pytest.mark.parametrize("model", ["claude-3-5-haiku-20241022", "gpt-3.5-turbo"])
def test_traceloop_logging(exporter, model):
litellm.completion(
model=model,

@@ -57,7 +57,7 @@ test_wandb_logging_async()
def test_wandb_logging():
try:
response = completion(
model="claude-instant-1.2",
model="claude-3-5-haiku-20241022",
messages=[{"role": "user", "content": "Hi 👋 - i'm claude"}],
max_tokens=10,
temperature=0.2,

@@ -1,19 +1,13 @@
import json
import os
import sys
import threading
from datetime import datetime
from pydantic.main import Model
sys.path.insert(
0, os.path.abspath("../..")
) # Adds the parent directory to the system-path
import pytest
import litellm
import asyncio
import logging
from litellm._logging import verbose_logger
from litellm.integrations.langfuse.langfuse import (
LangFuseLogger,
)
@@ -217,3 +211,27 @@ def test_get_langfuse_logger_for_request_with_cached_logger():
assert result == cached_logger
mock_cache.get_cache.assert_called_once()
@pytest.mark.parametrize("metadata", [
{'a': 1, 'b': 2, 'c': 3},
{'a': {'nested_a': 1}, 'b': {'nested_b': 2}},
{'a': [1, 2, 3], 'b': {4, 5, 6}},
{'a': (1, 2), 'b': frozenset([3, 4]), 'c': {'d': [5, 6]}},
{'lock': threading.Lock()},
{'func': lambda x: x + 1},
{
'int': 42,
'str': 'hello',
'list': [1, 2, 3],
'set': {4, 5},
'dict': {'nested': 'value'},
'non_copyable': threading.Lock(),
'function': print
},
['list', 'not', 'a', 'dict'],
{'timestamp': datetime.now()},
{},
None,
])
def test_langfuse_logger_prepare_metadata(metadata):
global_langfuse_logger._prepare_metadata(metadata)

@@ -986,3 +986,16 @@ def test_pattern_match_deployment_set_model_name(
print(updated_model) # Expected output: "openai/fo::hi:static::hello"
assert updated_model == expected_model
updated_models = pattern_router._return_pattern_matched_deployments(
match,
deployments=[
{
"model_name": model_name,
"litellm_params": {"model": litellm_model},
}
],
)
for model in updated_models:
assert model["litellm_params"]["model"] == expected_model

@@ -523,8 +523,8 @@ async def test_key_info_spend_values():
@pytest.mark.asyncio
@pytest.mark.flaky(retries=3, delay=1)
async def test_key_info_spend_values_streaming():
@pytest.mark.flaky(retries=6, delay=2)
async def test_aaaaakey_info_spend_values_streaming():
"""
Test to ensure spend is correctly calculated.
- create key
@@ -545,7 +545,7 @@ async def test_key_info_spend_values_streaming():
completion_tokens=completion_tokens,
)
response_cost = prompt_cost + completion_cost
await asyncio.sleep(5) # allow db log to be updated
await asyncio.sleep(8) # allow db log to be updated
print(f"new_key: {new_key}")
key_info = await get_key_info(
session=session, get_key=new_key, call_key=new_key