Merge pull request #2846 from BerriAI/litellm_docs_delete_cache_keys

docs - `delete` cache keys
2024-04-04 14:07:50 -07:00 · 2024-04-04 14:07:50 -07:00 · 12e5118367
commit 12e5118367
parent 2e40ab959d 9e9b617934
1 changed files with 81 additions and 26 deletions
--- a/docs/my-website/docs/proxy/caching.md
+++ b/docs/my-website/docs/proxy/caching.md
@@ -265,32 +265,6 @@ litellm_settings:
    supported_call_types: ["acompletion", "completion", "embedding", "aembedding"] # defaults to all litellm call types
 ```

-
-### Turn on `batch_redis_requests` 
-
-**What it does?**
-When a request is made:
-
-- Check if a key starting with `litellm:<hashed_api_key>:<call_type>:` exists in-memory, if no - get the last 100 cached requests for this key and store it
-
-- New requests are stored with this `litellm:..` as the namespace
-
-**Why?**
-Reduce number of redis GET requests. This improved latency by 46% in prod load tests. 
-
-**Usage**
-
-```yaml
-litellm_settings:
-  cache: true
-  cache_params:
-    type: redis
-    ... # remaining redis args (host, port, etc.)
-  callbacks: ["batch_redis_requests"] # 👈 KEY CHANGE!
-```
-
-[**SEE CODE**](https://github.com/BerriAI/litellm/blob/main/litellm/proxy/hooks/batch_redis_get.py)
-
 ### Turn on / off caching per request.  

 The proxy supports 3 cache-controls:
@@ -384,6 +358,87 @@ chat_completion = client.chat.completions.create(
 )
 ```

+### Deleting Cache Keys - `/cache/delete` 
+To delete cache keys, send a POST request to `/cache/delete` with the `keys` you want to delete.
+
+Example 
+```shell
+curl -X POST "http://0.0.0.0:4000/cache/delete" \
+  -H "Authorization: Bearer sk-1234" \
+  -d '{"keys": ["586bf3f3c1bf5aecb55bd9996494d3bbc69eb58397163add6d49537762a7548d", "key2"]}'
+```
+
+```shell
+# {"status":"success"}
+```
+
+#### Viewing Cache Keys from responses
+You can view the cache key in the response headers. On cache hits, the cache key is sent as the `x-litellm-cache-key` response header.
+```shell
+curl -i --location 'http://0.0.0.0:4000/chat/completions' \
+    --header 'Authorization: Bearer sk-1234' \
+    --header 'Content-Type: application/json' \
+    --data '{
+    "model": "gpt-3.5-turbo",
+    "user": "ishan",
+    "messages": [
+        {
+        "role": "user",
+        "content": "what is litellm"
+        }
+    ]
+}'
+```
+
+Response from litellm proxy 
+```json
+date: Thu, 04 Apr 2024 17:37:21 GMT
+content-type: application/json
+x-litellm-cache-key: 586bf3f3c1bf5aecb55bd9996494d3bbc69eb58397163add6d49537762a7548d
+
+{
+    "id": "chatcmpl-9ALJTzsBlXR9zTxPvzfFFtFbFtG6T",
+    "choices": [
+        {
+            "finish_reason": "stop",
+            "index": 0,
+            "message": {
+                "content": "I'm sorr..",
+                "role": "assistant"
+            }
+        }
+    ],
+    "created": 1712252235
+}
+
+```
+
+
+### Turn on `batch_redis_requests` 
+
+**What does it do?**
+When a request is made:
+
+- Check whether a key starting with `litellm:<hashed_api_key>:<call_type>:` exists in memory. If not, fetch the last 100 cached requests for this key and store them in memory
+
+- New requests are stored with this `litellm:..` as the namespace
+
+**Why?**
+Reduces the number of Redis GET requests. This improved latency by 46% in prod load tests.
+
+**Usage**
+
+```yaml
+litellm_settings:
+  cache: true
+  cache_params:
+    type: redis
+    ... # remaining redis args (host, port, etc.)
+  callbacks: ["batch_redis_requests"] # 👈 KEY CHANGE!
+```
+
+[**SEE CODE**](https://github.com/BerriAI/litellm/blob/main/litellm/proxy/hooks/batch_redis_get.py)
+
 ## Supported `cache_params` on proxy config.yaml

 ```yaml