(feat) openai prompt caching (non-streaming) - add prompt_tokens_details in usage response (#6039)

* add prompt_tokens_details in usage response

* use _prompt_tokens_details as a param in Usage

* fix linting errors

* fix type error

* fix ci/cd deps

* bump deps for openai

* bump deps openai

* fix llm translation testing

* fix llm translation embedding
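
For context, when OpenAI prompt caching is active, the chat completions API reports cached input tokens under `usage.prompt_tokens_details`. A minimal sketch of the payload shape this change models, with illustrative token counts:

```python
# Illustrative usage payload; the counts and cache split are hypothetical.
usage = {
    "prompt_tokens": 2006,
    "completion_tokens": 38,
    "total_tokens": 2044,
    "completion_tokens_details": None,
    # Populated by OpenAI when part of the prompt was served from cache:
    "prompt_tokens_details": {"cached_tokens": 1920, "audio_tokens": 0},
}
```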
Ishaan Jaff 2024-10-03 11:01:10 -07:00 committed by GitHub
parent 9fccb4a0da
commit 4e88fd65e1
10 changed files with 1515 additions and 1428 deletions


@@ -46,6 +46,7 @@ def mock_chat_response() -> Dict[str, Any]:
             "completion_tokens": 38,
             "completion_tokens_details": None,
             "total_tokens": 268,
+            "prompt_tokens_details": None,
         },
         "system_fingerprint": None,
     }
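
The mock keeps the new field `None`; against the live API it is populated by the provider. A sketch of reading it with the bumped `openai` client (model choice is illustrative):

```python
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative; any prompt-caching-capable model
    messages=[{"role": "user", "content": "hello"}],
)

# prompt_tokens_details is Optional: None when the provider omits it.
details = resp.usage.prompt_tokens_details
if details is not None:
    print("cached prompt tokens:", details.cached_tokens)
```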
@@ -201,6 +202,7 @@ def mock_embedding_response() -> Dict[str, Any]:
             "total_tokens": 8,
             "completion_tokens": 0,
             "completion_tokens_details": None,
+            "prompt_tokens_details": None,
         },
     }
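
On the litellm side, a sketch of how a caller might observe the new field after this change, assuming the standard `litellm.completion` entry point:

```python
import litellm

response = litellm.completion(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "hello"}],
)

# Usage now carries prompt_tokens_details; it stays None when the
# provider reports no prompt-cache activity.
print(response.usage.prompt_tokens_details)
```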