LiteLLM Minor Fixes & Improvements (11/05/2024) (#6590)

* fix(pattern_matching_router.py): update model name using correct function

* fix(langfuse.py): metadata deepcopy can cause unhandled error (#6563)

Co-authored-by: seva <seva@inita.com>
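The failure mode here is a blanket `copy.deepcopy` over request metadata that can contain non-copyable values (locks, clients, callables). A minimal sketch of a copy-safe alternative, assuming a standalone helper rather than the Langfuse integration's actual `_prepare_metadata` signature:

```python
import copy
from typing import Any, Dict, Optional

def prepare_metadata(metadata: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Copy metadata key-by-key so one bad value cannot abort logging."""
    if not isinstance(metadata, dict):
        return None
    cleaned: Dict[str, Any] = {}
    for key, value in metadata.items():
        try:
            # deepcopy may raise TypeError on locks, open clients, generators, etc.
            cleaned[key] = copy.deepcopy(value)
        except Exception:
            # keep a readable placeholder instead of dropping the whole payload
            cleaned[key] = f"<non-serializable: {type(value).__name__}>"
    return cleaned
```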

* fix(stream_chunk_builder_utils.py): correctly set prompt tokens + log correct streaming usage

Closes https://github.com/BerriAI/litellm/issues/6488
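The regression test added for this (`test_standard_logging_payload_stream_usage` in the diff below) rebuilds usage from streamed chunks; a minimal usage sketch of the same flow:

```python
import litellm
from litellm.main import stream_chunk_builder

resp = litellm.completion(
    model="anthropic/claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
    stream=True,  # note: no stream_options; usage should still be logged correctly
)
chunks = [chunk for chunk in resp]
built = stream_chunk_builder(chunks=chunks)
print(built.usage)  # prompt_tokens / completion_tokens / total_tokens
```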

* build(deps): bump cookie and express in /docs/my-website (#6566)

Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.

Updates `cookie` from 0.6.0 to 0.7.1
- [Release notes](https://github.com/jshttp/cookie/releases)
- [Commits](https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.1)

Updates `express` from 4.20.0 to 4.21.1
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.1/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.20.0...4.21.1)

---
updated-dependencies:
- dependency-name: cookie
  dependency-type: indirect
- dependency-name: express
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* docs(virtual_keys.md): update Dockerfile reference (#6554)

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>

* (proxy fix) - call connect on prisma client when running setup (#6534)

* critical fix - call connect on prisma client when running setup

* fix test_proxy_server_prisma_setup

* fix test_proxy_server_prisma_setup

* Add 3.5 haiku (#6588)

* feat: add claude-3-5-haiku-20241022 entries

* feat: add claude-3-5-haiku-20241022 and vertex_ai/claude-3-5-haiku@20241022 models

* add missing entries, remove vision

* remove image token costs

* Litellm perf improvements 3 (#6573)

* perf: move writing key to cache, to background task
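A minimal sketch of the pattern, assuming an async cache client; the helper names are illustrative, not the proxy's actual functions:

```python
import asyncio

async def persist_new_key(key_obj, db, cache):
    await db.insert_key(key_obj)  # keep the durable write on the request path
    # fire-and-forget the cache write so the caller is not blocked on it
    asyncio.create_task(cache.async_set_cache(key=key_obj.token, value=key_obj))
    return key_obj
```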

* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils

adds 200ms on calls with pgdb connected

* fix(litellm_pre_call_utils.py): rename call_type to actual call used

* perf(proxy_server.py): remove db logic from _get_config_from_file

was causing db calls to occur on every llm request, if team_id was set on key

* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db

reduces latency/call by ~100ms
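A rough sketch of the idea, with an illustrative cache/db interface (not LiteLLM's actual helpers): remember IDs already confirmed missing so they do not trigger a DB lookup on every request.

```python
async def get_team_object(team_id: str, db, cache):
    missing_key = f"team_id:{team_id}:not_found"
    if await cache.async_get_cache(key=missing_key):
        return None  # known-missing: skip the DB round-trip entirely
    team = await db.fetch_team(team_id)  # hypothetical lookup helper
    if team is None:
        # cache the negative result briefly so a later insert is still picked up
        await cache.async_set_cache(key=missing_key, value=True, ttl=60)
    return team
```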

* fix(proxy_server.py): minor fix on existing_settings not including alerting

* fix(exception_mapping_utils.py): map databricks exception string

* fix(auth_checks.py): fix auth check logic

* test: correctly mark flaky test

* fix(utils.py): handle auth token error for tokenizers.from_pretrained
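A minimal sketch of the fallback, assuming the `tokenizers` package; the exact exception raised for gated repos varies, so it is caught broadly here:

```python
from tokenizers import Tokenizer

def load_hf_tokenizer(model_id: str):
    try:
        return Tokenizer.from_pretrained(model_id)
    except Exception:
        # gated/private repos can raise auth errors when no token is set;
        # return None so the caller falls back to the default token counter
        return None
```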

* build: fix map

* build: fix map

* build: fix json for model map

* fix ImageObject conversion (#6584)

* (fix) litellm.text_completion raises a non-blocking error on simple usage (#6546)

* unit test test_huggingface_text_completion_logprobs

* fix return TextCompletionHandler convert_chat_to_text_completion

* fix hf rest api

* fix test_huggingface_text_completion_logprobs

* fix linting errors

* fix import of LiteLLMResponseObjectHandler

* fix test for LiteLLMResponseObjectHandler

* fix test text completion

* fix: allow 15 seconds for the premium license check

* testing fix bedrock deprecated cohere.command-text-v14

* (feat) add `Predicted Outputs` for OpenAI  (#6594)

* bump openai to openai==1.54.0

* add 'prediction' param

* testing fix bedrock deprecated cohere.command-text-v14

* test test_openai_prediction_param.py

* test_openai_prediction_param_with_caching

* doc Predicted Outputs

* doc Predicted Output
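A usage sketch of the new `prediction` parameter, mirroring OpenAI's Predicted Outputs format; the model chosen here is an example, not a statement of which models LiteLLM gates this to:

```python
import litellm

code = "def sum(a, b):\n    return a + b"

response = litellm.completion(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Rename the function to add_numbers. Reply with code only."},
        {"role": "user", "content": code},
    ],
    # tokens matching the prediction are emitted faster on supported models
    prediction={"type": "content", "content": code},
)
print(response.choices[0].message.content)
```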

* (fix) Vertex Improve Performance when using `image_url`  (#6593)

* fix transformation vertex

* test test_process_gemini_image

* test_image_completion_request

* testing fix - bedrock has deprecated cohere.command-text-v14

* fix vertex pdf
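A sketch of the optimization implied by this change, under the assumption that the win comes from not downloading image bytes just to detect their format; the helper name is illustrative:

```python
from mimetypes import guess_type
from typing import Optional

def infer_gemini_mime_type(image_url: str) -> Optional[str]:
    """Guess the mime type from the URL alone; return None to force a fetch."""
    mime, _ = guess_type(image_url.split("?")[0])
    if mime in ("image/png", "image/jpeg", "image/gif", "image/webp", "application/pdf"):
        # can be passed to Gemini as file_data without touching the bytes
        return mime
    return None
```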

* bump: version 1.51.5 → 1.52.0

* fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check (#6577)

* fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check
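A minimal sketch of a parallel-safe check, assuming an atomic increment on the shared cache (interface illustrative): increment first, then compare, so concurrent requests cannot all read the same stale counter.

```python
async def request_allowed(cache, deployment_id: str, rpm_limit: int) -> bool:
    key = f"{deployment_id}:rpm:current_minute"
    # an atomic increment returns the post-increment value even under concurrency
    current = await cache.async_increment(key=key, value=1, ttl=60)
    return current <= rpm_limit
```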

* fix(lowest_tpm_rpm_v2.py): return headers in correct format

* test: update test

* build(deps): bump cookie and express in /docs/my-website (#6566)

Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.

Updates `cookie` from 0.6.0 to 0.7.1
- [Release notes](https://github.com/jshttp/cookie/releases)
- [Commits](https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.1)

Updates `express` from 4.20.0 to 4.21.1
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.1/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.20.0...4.21.1)

---
updated-dependencies:
- dependency-name: cookie
  dependency-type: indirect
- dependency-name: express
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* docs(virtual_keys.md): update Dockerfile reference (#6554)

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>

* (proxy fix) - call connect on prisma client when running setup (#6534)

* critical fix - call connect on prisma client when running setup

* fix test_proxy_server_prisma_setup

* fix test_proxy_server_prisma_setup

* Add 3.5 haiku (#6588)

* feat: add claude-3-5-haiku-20241022 entries

* feat: add claude-3-5-haiku-20241022 and vertex_ai/claude-3-5-haiku@20241022 models

* add missing entries, remove vision

* remove image token costs

* Litellm perf improvements 3 (#6573)

* perf: move writing key to cache, to background task

* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils

adds 200ms on calls with pgdb connected

* fix(litellm_pre_call_utils.py): rename call_type to actual call used

* perf(proxy_server.py): remove db logic from _get_config_from_file

was causing db calls to occur on every llm request, if team_id was set on key

* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db

reduces latency/call by ~100ms

* fix(proxy_server.py): minor fix on existing_settings not including alerting

* fix(exception_mapping_utils.py): map databricks exception string

* fix(auth_checks.py): fix auth check logic

* test: correctly mark flaky test

* fix(utils.py): handle auth token error for tokenizers.from_pretrained

* build: fix map

* build: fix map

* build: fix json for model map

* test: remove eol model

* fix(proxy_server.py): fix db config loading logic

* fix(proxy_server.py): fix order of config / db updates, to ensure fields not overwritten
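The ordering bug is covered by the new `test_proxy_config_update_from_db`; a sketch of the intended merge, with DB values layered onto the file config so file-only keys (e.g. `callbacks`) survive:

```python
def merge_db_settings(file_config: dict, db_litellm_settings: dict) -> dict:
    merged = dict(file_config.get("litellm_settings") or {})
    merged.update(db_litellm_settings)  # DB wins on conflicts, file-only keys remain
    file_config["litellm_settings"] = merged
    return file_config
```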

* test: skip test if required env var is missing

* test: fix test

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>

* test: mark flaky test

* test: handle anthropic api instability

* test(test_proxy_utils.py): add testing for db config update logic

* Update setuptools in docker and fastapi to latest version, in order to upgrade starlette version (#6597)

* build(deps): bump cookie and express in /docs/my-website (#6566)

Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.

Updates `cookie` from 0.6.0 to 0.7.1
- [Release notes](https://github.com/jshttp/cookie/releases)
- [Commits](https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.1)

Updates `express` from 4.20.0 to 4.21.1
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.1/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.20.0...4.21.1)

---
updated-dependencies:
- dependency-name: cookie
  dependency-type: indirect
- dependency-name: express
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* docs(virtual_keys.md): update Dockerfile reference (#6554)

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>

* (proxy fix) - call connect on prisma client when running setup (#6534)

* critical fix - call connect on prisma client when running setup

* fix test_proxy_server_prisma_setup

* fix test_proxy_server_prisma_setup

* Add 3.5 haiku (#6588)

* feat: add claude-3-5-haiku-20241022 entries

* feat: add claude-3-5-haiku-20241022 and vertex_ai/claude-3-5-haiku@20241022 models

* add missing entries, remove vision

* remove image token costs

* Litellm perf improvements 3 (#6573)

* perf: move writing key to cache, to background task

* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils

adds 200ms on calls with pgdb connected

* fix(litellm_pre_call_utils.py): rename call_type to actual call used

* perf(proxy_server.py): remove db logic from _get_config_from_file

was causing db calls to occur on every llm request, if team_id was set on key

* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db

reduces latency/call by ~100ms

* fix(proxy_server.py): minor fix on existing_settings not including alerting

* fix(exception_mapping_utils.py): map databricks exception string

* fix(auth_checks.py): fix auth check logic

* test: correctly mark flaky test

* fix(utils.py): handle auth token error for tokenizers.from_pretrained

* build: fix map

* build: fix map

* build: fix json for model map

* fix ImageObject conversion (#6584)

* (fix) litellm.text_completion raises a non-blocking error on simple usage (#6546)

* unit test test_huggingface_text_completion_logprobs

* fix return TextCompletionHandler convert_chat_to_text_completion

* fix hf rest api

* fix test_huggingface_text_completion_logprobs

* fix linting errors

* fix import of LiteLLMResponseObjectHandler

* fix test for LiteLLMResponseObjectHandler

* fix test text completion

* fix: allow 15 seconds for the premium license check

* testing fix bedrock deprecated cohere.command-text-v14

* (feat) add `Predicted Outputs` for OpenAI  (#6594)

* bump openai to openai==1.54.0

* add 'prediction' param

* testing fix bedrock deprecated cohere.command-text-v14

* test test_openai_prediction_param.py

* test_openai_prediction_param_with_caching

* doc Predicted Outputs

* doc Predicted Output

* (fix) Vertex Improve Performance when using `image_url`  (#6593)

* fix transformation vertex

* test test_process_gemini_image

* test_image_completion_request

* testing fix - bedrock has deprecated cohere.command-text-v14

* fix vertex pdf

* bump: version 1.51.5 → 1.52.0

* Update setuptools in docker and fastapi to latest version, in order to upgrade starlette version

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: Jacob Hagstedt <wcgs@novonordisk.com>

* fix(langfuse.py): fix linting errors

* fix: fix linting errors

* fix: fix casting error

* fix: fix typing error

* fix: add more tests

* fix(utils.py): fix return_processed_chunk_logic

* Revert "Update setuptools in docker and fastapi to latest verison, in order t…" (#6615)

This reverts commit 1a7f7bdfb7.

* docs fix clarify team_id on team based logging

* doc fix team based logging with langfuse

* fix flake8 checks

* test: bump sleep time

* refactor: replace claude-instant-1.2 with haiku in testing

* fix(proxy_server.py): move to using the standard logging payload in track_cost_callback

* fix(proxy_server.py): fix linting errors

* fix(proxy_server.py): fall back to kwargs["response_cost"] if given
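A sketch of the fallback order described above; key names follow the standard logging payload used elsewhere in this commit, but treat them as illustrative:

```python
def resolve_response_cost(kwargs: dict) -> float:
    payload = kwargs.get("standard_logging_object") or {}
    cost = payload.get("response_cost")
    if cost is None:
        cost = kwargs.get("response_cost")  # fall back to the raw kwargs value
    return float(cost or 0.0)
```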

* test: remove claude-instant-1 from tests

* test: fix claude test

* docs fix clarify team_id on team based logging

* doc fix team based logging with langfuse

* build: remove lint.yml

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Vsevolod Karvetskiy <56288164+karvetskiy@users.noreply.github.com>
Co-authored-by: seva <seva@inita.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
Co-authored-by: Jacob Hagstedt P Suorra <Jacobh2@users.noreply.github.com>
Co-authored-by: Jacob Hagstedt <wcgs@novonordisk.com>
commit 136693cac4 (parent 66c1ee09cf)
Krish Dholakia, 2024-11-07 04:17:05 +05:30, committed via GitHub
32 changed files with 634 additions and 533 deletions

@@ -13,7 +13,7 @@ import litellm
## case 1: set_function_to_prompt not set
def test_function_call_non_openai_model():
try:
model = "claude-instant-1"
model = "claude-3-5-haiku-20241022"
messages = [{"role": "user", "content": "what's the weather in sf?"}]
functions = [
{
@@ -43,38 +43,4 @@ def test_function_call_non_openai_model():
# test_function_call_non_openai_model()
## case 2: add_function_to_prompt set
@pytest.mark.skip(reason="Anthropic now supports tool calling")
def test_function_call_non_openai_model_litellm_mod_set():
litellm.add_function_to_prompt = True
litellm.set_verbose = True
try:
model = "claude-instant-1.2"
messages = [{"role": "user", "content": "what's the weather in sf?"}]
functions = [
{
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
},
"required": ["location"],
},
}
]
response = litellm.completion(
model=model, messages=messages, functions=functions
)
print(f"response: {response}")
except Exception as e:
pytest.fail(f"An error occurred {e}")
# test_function_call_non_openai_model_litellm_mod_set()

@@ -480,28 +480,6 @@ async def test_aaalangfuse_logging_metadata(langfuse_client):
print("generation_from_langfuse", generation)
@pytest.mark.skip(reason="beta test - checking langfuse output")
def test_langfuse_logging():
try:
pre_langfuse_setup()
litellm.set_verbose = True
response = completion(
model="claude-instant-1.2",
messages=[{"role": "user", "content": "Hi 👋 - i'm claude"}],
max_tokens=10,
temperature=0.2,
)
print(response)
# time.sleep(5)
# # check langfuse.log to see if there was a failed response
# search_logs("langfuse.log")
except litellm.Timeout as e:
pass
except Exception as e:
pytest.fail(f"An exception occurred - {e}")
# test_langfuse_logging()

@@ -69,7 +69,7 @@ def test_batch_completions_models():
def test_batch_completion_models_all_responses():
try:
responses = batch_completion_models_all_responses(
models=["j2-light", "claude-instant-1.2"],
models=["j2-light", "claude-3-haiku-20240307"],
messages=[{"role": "user", "content": "write a poem"}],
max_tokens=10,
)

@@ -343,7 +343,7 @@ def test_completion_claude():
try:
# test without max tokens
response = completion(
model="claude-instant-1", messages=messages, request_timeout=10
model="claude-3-5-haiku-20241022", messages=messages, request_timeout=10
)
# Add any assertions here to check response args
print(response)

@@ -1562,3 +1562,65 @@ def test_logging_key_masking_gemini():
trimmed_key = key.split("key=")[1]
trimmed_key = trimmed_key.replace("*", "")
assert "PART" == trimmed_key
@pytest.mark.parametrize("sync_mode", [True, False])
@pytest.mark.asyncio
async def test_standard_logging_payload_stream_usage(sync_mode):
"""
Even if stream_options is not provided, correct usage should be logged
"""
from litellm.types.utils import StandardLoggingPayload
from litellm.main import stream_chunk_builder
stream = True
try:
# sync completion
customHandler = CompletionCustomHandler()
litellm.callbacks = [customHandler]
if sync_mode:
patch_event = "log_success_event"
return_val = MagicMock()
else:
patch_event = "async_log_success_event"
return_val = AsyncMock()
with patch.object(customHandler, patch_event, new=return_val) as mock_client:
if sync_mode:
resp = litellm.completion(
model="anthropic/claude-3-5-sonnet-20240620",
messages=[{"role": "user", "content": "Hey, how's it going?"}],
stream=stream,
)
chunks = []
for chunk in resp:
chunks.append(chunk)
time.sleep(2)
else:
resp = await litellm.acompletion(
model="anthropic/claude-3-5-sonnet-20240620",
messages=[{"role": "user", "content": "Hey, how's it going?"}],
stream=stream,
)
chunks = []
async for chunk in resp:
chunks.append(chunk)
await asyncio.sleep(2)
mock_client.assert_called_once()
standard_logging_object: StandardLoggingPayload = (
mock_client.call_args.kwargs["kwargs"]["standard_logging_object"]
)
built_response = stream_chunk_builder(chunks=chunks)
assert (
built_response.usage.total_tokens
!= standard_logging_object["total_tokens"]
)
print(f"standard_logging_object usage: {built_response.usage}")
except litellm.InternalServerError:
pass

@@ -163,7 +163,7 @@ def invalid_auth(model): # set the model key to an invalid key, depending on th
elif model == "azure/chatgpt-v-2":
temporary_key = os.environ["AZURE_API_KEY"]
os.environ["AZURE_API_KEY"] = "bad-key"
elif model == "claude-instant-1":
elif model == "claude-3-5-haiku-20241022":
temporary_key = os.environ["ANTHROPIC_API_KEY"]
os.environ["ANTHROPIC_API_KEY"] = "bad-key"
elif model == "command-nightly":
@@ -213,7 +213,7 @@ def invalid_auth(model): # set the model key to an invalid key, depending on th
elif model == "chatgpt-test":
os.environ["AZURE_API_KEY"] = temporary_key
azure = True
elif model == "claude-instant-1":
elif model == "claude-3-5-haiku-20241022":
os.environ["ANTHROPIC_API_KEY"] = temporary_key
elif model == "command-nightly":
os.environ["COHERE_API_KEY"] = temporary_key

@@ -77,71 +77,6 @@ async def test_langsmith_queue_logging():
pytest.fail(f"Error occurred: {e}")
@pytest.mark.skip(reason="Flaky test. covered by unit tests on custom logger.")
@pytest.mark.asyncio()
async def test_async_langsmith_logging():
try:
test_langsmith_logger = LangsmithLogger()
run_id = str(uuid.uuid4())
litellm.set_verbose = True
litellm.callbacks = ["langsmith"]
response = await litellm.acompletion(
model="claude-instant-1.2",
messages=[{"role": "user", "content": "what llm are u"}],
max_tokens=10,
temperature=0.2,
metadata={
"id": run_id,
"tags": ["tag1", "tag2"],
"user_api_key": "6eb81e014497d89f3cc1aa9da7c2b37bda6b7fea68e4b710d33d94201e68970c",
"user_api_key_alias": "ishaans-langmsith-key",
"user_api_end_user_max_budget": None,
"litellm_api_version": "1.40.19",
"global_max_parallel_requests": None,
"user_api_key_user_id": "admin",
"user_api_key_org_id": None,
"user_api_key_team_id": "dbe2f686-a686-4896-864a-4c3924458709",
"user_api_key_team_alias": "testing-team",
},
)
print(response)
await asyncio.sleep(3)
print("run_id", run_id)
logged_run_on_langsmith = test_langsmith_logger.get_run_by_id(run_id=run_id)
print("logged_run_on_langsmith", logged_run_on_langsmith)
print("fields in logged_run_on_langsmith", logged_run_on_langsmith.keys())
input_fields_on_langsmith = logged_run_on_langsmith.get("inputs")
extra_fields_on_langsmith = logged_run_on_langsmith.get("extra").get(
"invocation_params"
)
print("\nLogged INPUT ON LANGSMITH", input_fields_on_langsmith)
print("\nextra fields on langsmith", extra_fields_on_langsmith)
assert isinstance(input_fields_on_langsmith, dict)
assert "api_key" not in input_fields_on_langsmith
assert "api_key" not in extra_fields_on_langsmith
# assert user_api_key in extra_fields_on_langsmith
assert "user_api_key" in extra_fields_on_langsmith
assert "user_api_key_user_id" in extra_fields_on_langsmith
assert "user_api_key_team_alias" in extra_fields_on_langsmith
for cb in litellm.callbacks:
if isinstance(cb, LangsmithLogger):
await cb.async_httpx_client.client.aclose()
# test_langsmith_logger.async_httpx_client.close()
except Exception as e:
print(e)
pytest.fail(f"Error occurred: {e}")
# test_langsmith_logging()

@@ -72,7 +72,7 @@
# # old_stdout = sys.stdout
# # sys.stdout = new_stdout = io.StringIO()
# # response = completion(model="claude-instant-1", messages=messages)
# # response = completion(model="claude-3-5-haiku-20241022", messages=messages)
# # # Restore stdout
# # sys.stdout = old_stdout
@@ -154,7 +154,7 @@
# old_stdout = sys.stdout
# sys.stdout = new_stdout = io.StringIO()
# response = completion(model="claude-instant-1", messages=messages, stream=True)
# response = completion(model="claude-3-5-haiku-20241022", messages=messages, stream=True)
# for idx, chunk in enumerate(response):
# pass
@@ -255,7 +255,7 @@
# # sys.stdout = new_stdout = io.StringIO()
# # try:
# # response = completion(model="claude-instant-1", messages=messages)
# # response = completion(model="claude-3-5-haiku-20241022", messages=messages)
# # except AuthenticationError:
# # pass
@@ -327,7 +327,7 @@
# # sys.stdout = new_stdout = io.StringIO()
# # try:
# # response = completion(model="claude-instant-1", messages=messages)
# # response = completion(model="claude-3-5-haiku-20241022", messages=messages)
# # except AuthenticationError:
# # pass

@@ -3,7 +3,7 @@
# BASE_URL = 'http://localhost:8080'
# def test_hello_route():
# data = {"model": "claude-instant-1", "messages": [{"role": "user", "content": "hey, how's it going?"}]}
# data = {"model": "claude-3-5-haiku-20241022", "messages": [{"role": "user", "content": "hey, how's it going?"}]}
# headers = {'Content-Type': 'application/json'}
# response = requests.get(BASE_URL, headers=headers, data=json.dumps(data))
# print(response.text)

@@ -31,63 +31,6 @@ litellm.set_verbose = True
import time
@pytest.mark.skip(reason="duplicate test of logging with callbacks")
@pytest.mark.asyncio()
async def test_async_prometheus_success_logging():
from litellm.integrations.prometheus import PrometheusLogger
pl = PrometheusLogger()
run_id = str(uuid.uuid4())
litellm.set_verbose = True
litellm.callbacks = [pl]
response = await litellm.acompletion(
model="claude-instant-1.2",
messages=[{"role": "user", "content": "what llm are u"}],
max_tokens=10,
mock_response="hi",
temperature=0.2,
metadata={
"id": run_id,
"tags": ["tag1", "tag2"],
"user_api_key": "6eb81e014497d89f3cc1aa9da7c2b37bda6b7fea68e4b710d33d94201e68970c",
"user_api_key_alias": "ishaans-prometheus-key",
"user_api_end_user_max_budget": None,
"litellm_api_version": "1.40.19",
"global_max_parallel_requests": None,
"user_api_key_user_id": "admin",
"user_api_key_org_id": None,
"user_api_key_team_id": "dbe2f686-a686-4896-864a-4c3924458709",
"user_api_key_team_alias": "testing-team",
},
)
print(response)
await asyncio.sleep(3)
# get prometheus logger
test_prometheus_logger = pl
print("done with success request")
print(
"vars of test_prometheus_logger",
vars(test_prometheus_logger.litellm_requests_metric),
)
# Get the metrics
metrics = {}
for metric in REGISTRY.collect():
for sample in metric.samples:
metrics[sample.name] = sample.value
print("metrics from prometheus", metrics)
assert metrics["litellm_requests_metric_total"] == 1.0
assert metrics["litellm_total_tokens_total"] == 30.0
assert metrics["litellm_deployment_success_responses_total"] == 1.0
assert metrics["litellm_deployment_total_requests_total"] == 1.0
assert metrics["litellm_deployment_latency_per_output_token_bucket"] == 1.0
@pytest.mark.asyncio()
async def test_async_prometheus_success_logging_with_callbacks():
@@ -107,7 +50,7 @@ async def test_async_prometheus_success_logging_with_callbacks():
initial_metrics[sample.name] = sample.value
response = await litellm.acompletion(
model="claude-instant-1.2",
model="claude-3-haiku-20240307",
messages=[{"role": "user", "content": "what llm are u"}],
max_tokens=10,
mock_response="hi",

@@ -18,7 +18,7 @@ import time
# sys.stdout = new_stdout = io.StringIO()
# response = completion(model="claude-instant-1.2",
# response = completion(model="claude-3-5-haiku-20241022",
# messages=[{
# "role": "user",
# "content": "Hi 👋 - i'm claude"

@@ -56,7 +56,7 @@ def claude_test_completion():
try:
# OVERRIDE WITH DYNAMIC MAX TOKENS
response_1 = litellm.completion(
model="claude-instant-1.2",
model="claude-3-haiku-20240307",
messages=[{"content": "Hello, how are you?", "role": "user"}],
max_tokens=10,
)
@@ -66,7 +66,7 @@ def claude_test_completion():
# USE CONFIG TOKENS
response_2 = litellm.completion(
model="claude-instant-1.2",
model="claude-3-haiku-20240307",
messages=[{"content": "Hello, how are you?", "role": "user"}],
)
# Add any assertions here to check the response
@@ -77,7 +77,7 @@ def claude_test_completion():
try:
response_3 = litellm.completion(
model="claude-instant-1.2",
model="claude-3-5-haiku-20241022",
messages=[{"content": "Hello, how are you?", "role": "user"}],
n=2,
)

@@ -10,7 +10,7 @@ sys.path.insert(
0, os.path.abspath("../..")
) # Adds the parent directory to the system path
import litellm
from unittest.mock import MagicMock, patch, AsyncMock
from litellm.proxy._types import LitellmUserRoles, UserAPIKeyAuth
from litellm.proxy.auth.auth_utils import is_request_body_safe
@@ -465,3 +465,48 @@ def test_update_internal_user_params():
updated_data_json["budget_duration"]
== litellm.default_internal_user_params["budget_duration"]
)
@pytest.mark.asyncio
async def test_proxy_config_update_from_db():
from litellm.proxy.proxy_server import ProxyConfig
from pydantic import BaseModel
proxy_config = ProxyConfig()
pc = AsyncMock()
test_config = {
"litellm_settings": {
"callbacks": ["prometheus", "otel"],
}
}
class ReturnValue(BaseModel):
param_name: str
param_value: dict
with patch.object(
pc,
"get_generic_data",
new=AsyncMock(
return_value=ReturnValue(
param_name="litellm_settings",
param_value={
"success_callback": "langfuse",
},
)
),
):
new_config = await proxy_config._update_config_from_db(
prisma_client=pc,
config=test_config,
store_model_in_db=True,
)
assert new_config == {
"litellm_settings": {
"callbacks": ["prometheus", "otel"],
"success_callback": "langfuse",
}
}

@@ -1807,7 +1807,7 @@ def test_router_anthropic_key_dynamic():
{
"model_name": "anthropic-claude",
"litellm_params": {
"model": "claude-instant-1.2",
"model": "claude-3-5-haiku-20241022",
"api_key": anthropic_api_key,
},
}

@@ -824,8 +824,8 @@ def test_ausage_based_routing_fallbacks():
"rpm": OPENAI_RPM,
},
{
"model_name": "anthropic-claude-instant-1.2",
"litellm_params": get_anthropic_params("claude-instant-1.2"),
"model_name": "anthropic-claude-3-5-haiku-20241022",
"litellm_params": get_anthropic_params("claude-3-5-haiku-20241022"),
"model_info": {"id": 4},
"rpm": ANTHROPIC_RPM,
},
@@ -834,7 +834,7 @@ def test_ausage_based_routing_fallbacks():
fallbacks_list = [
{"azure/gpt-4-fast": ["azure/gpt-4-basic"]},
{"azure/gpt-4-basic": ["openai-gpt-4"]},
{"openai-gpt-4": ["anthropic-claude-instant-1.2"]},
{"openai-gpt-4": ["anthropic-claude-3-5-haiku-20241022"]},
]
router = Router(
@@ -864,7 +864,7 @@ def test_ausage_based_routing_fallbacks():
assert response._hidden_params["model_id"] == "1"
for i in range(10):
# now make 100 mock requests to OpenAI - expect it to fallback to anthropic-claude-instant-1.2
# now make 100 mock requests to OpenAI - expect it to fallback to anthropic-claude-3-5-haiku-20241022
response = router.completion(
model="azure/gpt-4-fast",
messages=messages,

@@ -17,6 +17,7 @@ from litellm.router import Deployment, LiteLLM_Params, ModelInfo
from concurrent.futures import ThreadPoolExecutor
from collections import defaultdict
from dotenv import load_dotenv
from unittest.mock import patch, MagicMock, AsyncMock
load_dotenv()
@@ -155,3 +156,35 @@ def test_route_with_exception():
result = router.route("openai/gpt-3.5-turbo")
assert result is None
def test_router_pattern_match_e2e():
"""
Tests the end to end flow of the router
"""
from litellm.llms.custom_httpx.http_handler import HTTPHandler
client = HTTPHandler()
router = Router(
model_list=[
{
"model_name": "llmengine/*",
"litellm_params": {"model": "anthropic/*", "api_key": "test"},
}
]
)
with patch.object(client, "post", new=MagicMock()) as mock_post:
router.completion(
model="llmengine/my-custom-model",
messages=[{"role": "user", "content": "Hello, how are you?"}],
client=client,
api_key="test",
)
mock_post.assert_called_once()
print(mock_post.call_args.kwargs["data"])
mock_post.call_args.kwargs["data"] == {
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello, how are you?"}],
}

@@ -38,9 +38,9 @@ def test_router_timeouts():
"tpm": 80000,
},
{
"model_name": "anthropic-claude-instant-1.2",
"model_name": "anthropic-claude-3-5-haiku-20241022",
"litellm_params": {
"model": "claude-instant-1.2",
"model": "claude-3-5-haiku-20241022",
"api_key": "os.environ/ANTHROPIC_API_KEY",
"mock_response": "hello world",
},
@@ -49,7 +49,7 @@
]
fallbacks_list = [
{"openai-gpt-4": ["anthropic-claude-instant-1.2"]},
{"openai-gpt-4": ["anthropic-claude-3-5-haiku-20241022"]},
]
# Configure router

@@ -681,7 +681,7 @@ def test_completion_ollama_hosted_stream():
@pytest.mark.parametrize(
"model",
[
# "claude-instant-1.2",
# "claude-3-5-haiku-20241022",
# "claude-2",
# "mistral/mistral-medium",
"openrouter/openai/gpt-4o-mini",
@@ -1112,7 +1112,7 @@ def test_completion_claude_stream_bad_key():
},
]
response = completion(
model="claude-instant-1",
model="claude-3-5-haiku-20241022",
messages=messages,
stream=True,
max_tokens=50,

@@ -1,6 +1,6 @@
#### What this tests ####
# This tests litellm.token_counter() function
import traceback
import os
import sys
import time
@@ -116,7 +116,9 @@ def test_tokenizers():
openai_tokens = token_counter(model="gpt-3.5-turbo", text=sample_text)
# claude tokenizer
claude_tokens = token_counter(model="claude-instant-1", text=sample_text)
claude_tokens = token_counter(
model="claude-3-5-haiku-20241022", text=sample_text
)
# cohere tokenizer
cohere_tokens = token_counter(model="command-nightly", text=sample_text)
@@ -167,8 +169,9 @@ def test_encoding_and_decoding():
assert openai_text == sample_text
# claude encoding + decoding
claude_tokens = encode(model="claude-instant-1", text=sample_text)
claude_text = decode(model="claude-instant-1", tokens=claude_tokens.ids)
claude_tokens = encode(model="claude-3-5-haiku-20241022", text=sample_text)
claude_text = decode(model="claude-3-5-haiku-20241022", tokens=claude_tokens)
assert claude_text == sample_text
@@ -186,7 +189,7 @@ def test_encoding_and_decoding():
assert llama2_text == sample_text
except Exception as e:
pytest.fail(f"An exception occured: {e}")
pytest.fail(f"An exception occured: {e}\n{traceback.format_exc()}")
# test_encoding_and_decoding()

@@ -26,7 +26,7 @@ def exporter():
return exporter
@pytest.mark.parametrize("model", ["claude-instant-1.2", "gpt-3.5-turbo"])
@pytest.mark.parametrize("model", ["claude-3-5-haiku-20241022", "gpt-3.5-turbo"])
def test_traceloop_logging(exporter, model):
litellm.completion(
model=model,

@@ -57,7 +57,7 @@ test_wandb_logging_async()
def test_wandb_logging():
try:
response = completion(
model="claude-instant-1.2",
model="claude-3-5-haiku-20241022",
messages=[{"role": "user", "content": "Hi 👋 - i'm claude"}],
max_tokens=10,
temperature=0.2,

@@ -1,19 +1,13 @@
import json
import os
import sys
import threading
from datetime import datetime
from pydantic.main import Model
sys.path.insert(
0, os.path.abspath("../..")
) # Adds the parent directory to the system-path
import pytest
import litellm
import asyncio
import logging
from litellm._logging import verbose_logger
from litellm.integrations.langfuse.langfuse import (
LangFuseLogger,
)
@@ -217,3 +211,27 @@ def test_get_langfuse_logger_for_request_with_cached_logger():
assert result == cached_logger
mock_cache.get_cache.assert_called_once()
@pytest.mark.parametrize("metadata", [
{'a': 1, 'b': 2, 'c': 3},
{'a': {'nested_a': 1}, 'b': {'nested_b': 2}},
{'a': [1, 2, 3], 'b': {4, 5, 6}},
{'a': (1, 2), 'b': frozenset([3, 4]), 'c': {'d': [5, 6]}},
{'lock': threading.Lock()},
{'func': lambda x: x + 1},
{
'int': 42,
'str': 'hello',
'list': [1, 2, 3],
'set': {4, 5},
'dict': {'nested': 'value'},
'non_copyable': threading.Lock(),
'function': print
},
['list', 'not', 'a', 'dict'],
{'timestamp': datetime.now()},
{},
None,
])
def test_langfuse_logger_prepare_metadata(metadata):
global_langfuse_logger._prepare_metadata(metadata)

@@ -986,3 +986,16 @@ def test_pattern_match_deployment_set_model_name(
print(updated_model) # Expected output: "openai/fo::hi:static::hello"
assert updated_model == expected_model
updated_models = pattern_router._return_pattern_matched_deployments(
match,
deployments=[
{
"model_name": model_name,
"litellm_params": {"model": litellm_model},
}
],
)
for model in updated_models:
assert model["litellm_params"]["model"] == expected_model

@@ -523,8 +523,8 @@ async def test_key_info_spend_values():
@pytest.mark.asyncio
@pytest.mark.flaky(retries=3, delay=1)
async def test_key_info_spend_values_streaming():
@pytest.mark.flaky(retries=6, delay=2)
async def test_aaaaakey_info_spend_values_streaming():
"""
Test to ensure spend is correctly calculated.
- create key
@@ -545,7 +545,7 @@ async def test_key_info_spend_values_streaming():
completion_tokens=completion_tokens,
)
response_cost = prompt_cost + completion_cost
await asyncio.sleep(5) # allow db log to be updated
await asyncio.sleep(8) # allow db log to be updated
print(f"new_key: {new_key}")
key_info = await get_key_info(
session=session, get_key=new_key, call_key=new_key