better explain the behavior of usage-based-routing-v2

Daniel Hnyk 2025-03-10 10:24:29 +01:00 committed by GitHub
parent 0fa3ec98ab
commit fbfb86f1e6

@@ -163,9 +163,9 @@ Router provides 4 strategies for routing your calls across multiple deployments:
 **Filters out deployment if tpm/rpm limit exceeded** - If you pass in the deployment's tpm/rpm limits.
-Routes to **deployment with lowest TPM usage** for that minute.
+Routes to **deployment with lowest TPM usage** for that minute. If two deployments have the same usage, it chooses randomly. This does not automatically favor a higher-limit deployment up front, but if usage spikes, the smaller-limit deployment may hit its cap and get excluded, so the bigger-limit one will remain in the pool longer.
-In production, we use Redis to track usage (TPM/RPM) across multiple deployments. This implementation uses **async redis calls** (redis.incr and redis.mget).
+In production, we use Redis to track usage (TPM/RPM) across multiple deployments. This implementation uses **async redis calls** (`redis.incr` and `redis.mget`).
 For Azure, [you get 6 RPM per 1000 TPM](https://stackoverflow.com/questions/77368844/what-is-the-request-per-minute-rate-limit-for-azure-openai-models-for-gpt-3-5-tu)
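
To make the changed passage concrete, here is a minimal sketch of a Router configured with this strategy: two deployments of one model group with different tpm/rpm limits, plus a shared Redis instance for cross-instance usage tracking. The deployment names, keys, endpoints, and limits are hypothetical; `routing_strategy="usage-based-routing-v2"` and the `tpm`/`rpm`/`redis_host`/`redis_port` parameters come from the LiteLLM Router docs this diff edits.

```python
import asyncio
from litellm import Router

# Two deployments of the same model group, each with its own tpm/rpm
# limits (hypothetical values). The router excludes any deployment whose
# limit is exceeded, then routes to the one with the lowest TPM usage for
# the current minute, breaking ties randomly.
model_list = [
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {
            "model": "azure/gpt-35-turbo-small",              # hypothetical deployment
            "api_key": "my-azure-key-1",                      # hypothetical
            "api_base": "https://my-endpoint-1.openai.azure.com",
            "tpm": 100_000,  # on Azure, 1000 TPM ~ 6 RPM, so 100k TPM ~ 600 RPM
            "rpm": 600,
        },
    },
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {
            "model": "azure/gpt-35-turbo-large",              # hypothetical deployment
            "api_key": "my-azure-key-2",                      # hypothetical
            "api_base": "https://my-endpoint-2.openai.azure.com",
            "tpm": 1_000_000,
            "rpm": 6_000,
        },
    },
]

router = Router(
    model_list=model_list,
    routing_strategy="usage-based-routing-v2",
    # In production, point the router at Redis so TPM/RPM counters are
    # shared across router instances (tracked with async redis.incr/redis.mget).
    redis_host="localhost",
    redis_port=6379,
)

async def main():
    response = await router.acompletion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "hello"}],
    )
    print(response)

asyncio.run(main())
```

With these limits, the 100k-TPM deployment hits its cap first during a spike and is filtered out for the rest of the minute, leaving the 1M-TPM deployment in the pool longer, which is exactly the behavior the added sentence describes.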
@@ -1639,4 +1639,4 @@ class RouterGeneralSettings(BaseModel):
     pass_through_all_models: bool = Field(
         default=False
     ) # if passed a model not llm_router model list, pass through the request to litellm.acompletion/embedding
-```
+```
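
The `pass_through_all_models` field in this hunk controls what happens when a request names a model that is not in the router's `model_list`. A hedged sketch of how it might be used, assuming `Router` accepts a `router_general_settings` parameter of this type as the surrounding docs suggest (the model names and API key below are hypothetical):

```python
import asyncio
from litellm import Router
from litellm.types.router import RouterGeneralSettings

router = Router(
    model_list=[
        {
            "model_name": "gpt-3.5-turbo",
            "litellm_params": {"model": "gpt-3.5-turbo", "api_key": "sk-..."},  # hypothetical key
        }
    ],
    # With pass_through_all_models=True, a request for a model that is not in
    # model_list is handed straight to litellm.acompletion/embedding instead of
    # raising a "model not found" error.
    router_general_settings=RouterGeneralSettings(pass_through_all_models=True),
)

async def main():
    # This model is not in model_list, so the request is passed through to
    # litellm.acompletion rather than being routed across deployments.
    response = await router.acompletion(
        model="claude-3-haiku-20240307",
        messages=[{"role": "user", "content": "hello"}],
    )
    print(response)

asyncio.run(main())
```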