LiteLLM Minor Fixes & Improvements (12/23/2024) - P2 (#7386)

* fix(main.py): support 'mock_timeout=true' param

allows mock requests on the proxy to have a time delay, for testing
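A minimal sketch of how the param can be exercised from the SDK (model name and delay are illustrative, not taken from this commit; the exact call shape through the proxy may differ):

```python
import litellm

# Sketch: with mock_timeout=True the mocked call waits for `timeout` seconds
# and then fails, instead of returning the mock response immediately.
try:
    litellm.completion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello"}],
        mock_response="mocked answer",
        mock_timeout=True,  # the new param
        timeout=3,          # seconds to wait before the mock request times out
    )
except litellm.Timeout as e:
    print(f"mock request timed out: {e}")
```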

* fix(main.py): ensure mock timeouts raise litellm.Timeout error

triggers retry/fallbacks
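Because the failure surfaces as `litellm.Timeout`, the Router's normal retry/fallback handling can pick it up. A rough sketch, assuming hypothetical deployment names and that `mock_timeout`/`timeout` can be passed through `litellm_params` (not taken from this diff):

```python
import asyncio
from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "primary",
            "litellm_params": {
                "model": "gpt-3.5-turbo",
                "mock_timeout": True,  # force this deployment to time out
                "timeout": 1,
            },
        },
        {
            "model_name": "backup",
            "litellm_params": {
                "model": "gpt-4o-mini",
                "mock_response": "answer from the fallback deployment",
            },
        },
    ],
    fallbacks=[{"primary": ["backup"]}],
)

async def main():
    # "primary" raises litellm.Timeout after ~1s, so the router falls back to "backup".
    resp = await router.acompletion(
        model="primary",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(resp.choices[0].message.content)

asyncio.run(main())
```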

* fix: fix fallback + mock timeout testing

* fix(router.py): always return remaining tpm/rpm limits, if limits are known

allows for rate limit headers to be guaranteed
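The headers can then be read off the response's hidden params, mirroring the test change below (a sketch; the `tpm`/`rpm` values are the "known limits" that make the headers available, and their exact placement in the deployment config is an assumption):

```python
import asyncio
from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "gemini/gemini-1.5-flash",
            "litellm_params": {
                "model": "gemini/gemini-1.5-flash",
                "tpm": 1000,  # limits are known, so remaining tpm/rpm is returned
                "rpm": 10,
            },
        }
    ]
)

async def main():
    resp = await router.acompletion(
        model="gemini/gemini-1.5-flash",
        messages=[{"role": "user", "content": "Hello, how are you?"}],
        mock_response="Hello, I'm good.",
    )
    headers = resp._hidden_params["additional_headers"]
    print(headers["x-ratelimit-remaining-tokens"])
    print(headers["x-ratelimit-remaining-requests"])

asyncio.run(main())
```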

* docs(timeout.md): add docs on mock_timeout=true

* fix(main.py): fix linting errors

* test: fix test
Krish Dholakia 2024-12-23 17:41:27 -08:00 committed by GitHub
parent db59e08958
commit 48316520f4
7 changed files with 223 additions and 54 deletions

@@ -344,11 +344,18 @@ async def test_get_remaining_model_group_usage():
         ]
     )
     for _ in range(2):
-        await router.acompletion(
+        resp = await router.acompletion(
             model="gemini/gemini-1.5-flash",
             messages=[{"role": "user", "content": "Hello, how are you?"}],
             mock_response="Hello, I'm good.",
         )
+        assert (
+            "x-ratelimit-remaining-tokens" in resp._hidden_params["additional_headers"]
+        )
+        assert (
+            "x-ratelimit-remaining-requests"
+            in resp._hidden_params["additional_headers"]
+        )
         await asyncio.sleep(1)
     remaining_usage = await router.get_remaining_model_group_usage(