diff --git a/docs/my-website/docs/proxy/caching.md b/docs/my-website/docs/proxy/caching.md
index 2be1d8de1..1521f63b0 100644
--- a/docs/my-website/docs/proxy/caching.md
+++ b/docs/my-website/docs/proxy/caching.md
@@ -225,6 +225,32 @@ litellm_settings:
     supported_call_types: ["acompletion", "completion", "embedding", "aembedding"] # defaults to all litellm call types
 ```
 
+
+### Turn on `batch_redis_requests`
+
+**What does it do?**
+When a request is made:
+
+- Check if a key starting with `litellm:<user_api_key>:<call_type>:` exists in-memory; if not, fetch the last 100 cached requests for this key from Redis and store them in-memory
+
+- New requests are cached under this `litellm:..` namespace
+
+**Why?**
+Reduces the number of Redis GET requests. This improved latency by 46% in prod load tests.
+
+**Usage**
+
+```yaml
+litellm_settings:
+  cache: true
+  cache_params:
+    type: redis
+    ... # remaining redis args (host, port, etc.)
+  callbacks: ["batch_redis_requests"] # 👈 KEY CHANGE!
+```
+
+[**SEE CODE**](https://github.com/BerriAI/litellm/blob/main/litellm/proxy/hooks/batch_redis_get.py)
+
 ### Turn on / off caching per request.
 
 The proxy support 3 cache-controls:
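To make the batch-get behavior the added section describes concrete, here is a minimal sketch of the pattern: on the first request for a namespace, pull up to 100 cached entries from Redis in one batched round trip, then serve repeat lookups from memory. This is not the actual `batch_redis_get.py` hook; the names `IN_MEMORY_CACHE`, `warm_namespace`, and `cached_get` are hypothetical, and it assumes the `redis` Python client:

```python
# Illustrative sketch of the batch-Redis-GET pattern described above --
# NOT the actual litellm hook. All names here are hypothetical.
import redis

r = redis.Redis(host="localhost", port=6379)
IN_MEMORY_CACHE: dict[bytes, bytes] = {}
WARMED_NAMESPACES: set[str] = set()


def warm_namespace(namespace: str, limit: int = 100) -> None:
    """On the first request for `namespace`, fetch up to `limit` cached
    entries from Redis in a single MGET and keep them in memory."""
    if namespace in WARMED_NAMESPACES:
        return
    # Collect up to `limit` keys under the namespace prefix...
    keys = [key for key, _ in zip(r.scan_iter(match=f"{namespace}*"), range(limit))]
    # ...then fetch all their values in one round trip instead of N GETs.
    if keys:
        for key, value in zip(keys, r.mget(keys)):
            if value is not None:
                IN_MEMORY_CACHE[key] = value
    WARMED_NAMESPACES.add(namespace)


def cached_get(namespace: str, key: bytes) -> bytes | None:
    """Serve repeat lookups from memory -- no per-request Redis GET."""
    warm_namespace(namespace)
    return IN_MEMORY_CACHE.get(key)
```

Serving repeat lookups from the in-memory layer is what eliminates the per-request Redis GETs and, per the docs above, drove the 46% latency improvement in prod load tests.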