fix set Caching Default Off

Ishaan Jaff 2024-08-24 09:43:39 -07:00
parent feb354d3bc
commit 74f0e60962
3 changed files with 79 additions and 19 deletions

View file

@@ -35,7 +35,7 @@ litellm_settings:
#### [OPTIONAL] Step 1.5: Add redis namespaces, default ttl
-## Namespace
+#### Namespace
If you want to create some folder for your keys, you can set a namespace, like this:
```yaml
@@ -52,7 +52,7 @@ and keys will be stored like:
litellm_caching:<hash>
```
-## Redis Cluster
+#### Redis Cluster
```yaml
model_list:
@@ -68,7 +68,7 @@ litellm_settings:
redis_startup_nodes: [{"host": "127.0.0.1", "port": "7001"}]
```
-## TTL
+#### TTL
```yaml
litellm_settings:
@@ -81,7 +81,7 @@ litellm_settings:
```
-## SSL
+#### SSL
just set `REDIS_SSL="True"` in your .env, and LiteLLM will pick this up.
@@ -397,7 +397,7 @@ litellm_settings:
# /chat/completions, /completions, /embeddings, /audio/transcriptions
```
-### Turn on / off caching per request.
+### **Turn on / off caching per request**
The proxy supports 4 cache-controls:
@@ -699,6 +699,73 @@ x-litellm-cache-key: 586bf3f3c1bf5aecb55bd9996494d3bbc69eb58397163add6d49537762a
```
### **Set Caching Default Off - Opt in only**
1. **Set `mode: default_off` for caching**
```yaml
model_list:
  - model_name: fake-openai-endpoint
    litellm_params:
      model: openai/fake
      api_key: fake-key
      api_base: https://exampleopenaiendpoint-production.up.railway.app/

# default off mode
litellm_settings:
  set_verbose: True
  cache: True
  cache_params:
    mode: default_off # 👈 Key change: cache is default_off
```
2. **Opting in to caching when caching is default off**
<Tabs>
<TabItem value="openai" label="OpenAI Python SDK">
```python
import os
from openai import OpenAI

client = OpenAI(api_key="<litellm-api-key>", base_url="http://0.0.0.0:4000")

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Say this is a test",
        }
    ],
    model="gpt-3.5-turbo",
    extra_body={  # OpenAI python accepts extra args in extra_body
        "cache": {"use-cache": True}
    },
)
```
</TabItem>
<TabItem value="curl" label="curl">
```shell
curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gpt-3.5-turbo",
    "cache": {"use-cache": true},
    "messages": [
      {"role": "user", "content": "Say this is a test"}
    ]
  }'
```
</TabItem>
</Tabs>
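
To confirm the opt-in actually hits the cache, repeat the same request and inspect the `x-litellm-cache-key` response header shown earlier in this doc. Here is a minimal verification sketch (not from the official docs) using the OpenAI SDK's raw-response interface; the proxy URL and `sk-1234` key are the placeholder values from the examples above:

```python
# Verification sketch: send the same opted-in request twice and compare the
# x-litellm-cache-key response header. Placeholders: proxy URL and API key.
from openai import OpenAI

client = OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

def ask(text: str):
    # with_raw_response exposes HTTP headers alongside the parsed completion
    raw = client.chat.completions.with_raw_response.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": text}],
        extra_body={"cache": {"use-cache": True}},  # opt in, since caching is default_off
    )
    return raw.parse(), raw.headers.get("x-litellm-cache-key")

_, first_key = ask("Say this is a test")
_, second_key = ask("Say this is a test")

print("cache key on first call: ", first_key)
print("cache key on second call:", second_key)  # same key -> second call can be served from cache
```

Requests sent without the `cache` field should behave as uncached while `mode: default_off` is set.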
### Turn on `batch_redis_requests`

View file

@@ -5,16 +5,9 @@ model_list:
      api_key: fake-key
      api_base: https://exampleopenaiendpoint-production.up.railway.app/

-guardrails:
-  - guardrail_name: "custom-pre-guard"
-    litellm_params:
-      guardrail: custom_guardrail.myCustomGuardrail
-      mode: "pre_call"
-  - guardrail_name: "custom-during-guard"
-    litellm_params:
-      guardrail: custom_guardrail.myCustomGuardrail
-      mode: "during_call"
-  - guardrail_name: "custom-post-guard"
-    litellm_params:
-      guardrail: custom_guardrail.myCustomGuardrail
-      mode: "post_call"
+# default off mode
+litellm_settings:
+  set_verbose: True
+  cache: True
+  cache_params:
+    mode: default_off

View file

@@ -1604,7 +1604,7 @@ class ProxyConfig:
self._init_cache(cache_params=cache_params)
if litellm.cache is not None:
    verbose_proxy_logger.debug( # noqa
-       f"{blue_color_code}Set Cache on LiteLLM Proxy: {vars(litellm.cache.cache)}{reset_color_code}"
+       f"{blue_color_code}Set Cache on LiteLLM Proxy= {vars(litellm.cache.cache)}{vars(litellm.cache)}{reset_color_code}"
    )
elif key == "cache" and value is False:
    pass
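
As the hunk above shows, the proxy initializes its cache from `cache_params`, so the `mode: default_off` value in the config is handed straight to the underlying LiteLLM cache. A rough sketch of the equivalent SDK-side setup follows; the `mode="default_off"` keyword and the per-call `cache={"use-cache": True}` argument are assumptions inferred from the proxy config and request examples above, not a verified litellm API:

```python
# Hedged sketch: opt-in-only caching when using the litellm SDK directly.
# Assumptions (inferred from the proxy docs above, not verified): litellm.Cache
# accepts mode="default_off", and completion() honors cache={"use-cache": True}.
import litellm
from litellm.caching import Cache

# In-memory cache for illustration; the proxy config above uses Redis instead.
litellm.cache = Cache(type="local", mode="default_off")  # assumption: mode kwarg

messages = [{"role": "user", "content": "Say this is a test"}]

# Not cached: caching is default off and this call does not opt in.
# (Requires OPENAI_API_KEY in the environment.)
litellm.completion(model="gpt-3.5-turbo", messages=messages)

# Cached: explicitly opts in, mirroring the proxy request body's "cache" field.
litellm.completion(
    model="gpt-3.5-turbo",
    messages=messages,
    cache={"use-cache": True},  # assumption: same opt-in control as the proxy
)
```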