Merge pull request #4793 from BerriAI/docs_caching
docs fix control cache on / off per request
This commit is contained in:
commit f7f4b9bee7
1 changed file with 127 additions and 0 deletions

@@ -294,6 +294,11 @@ The proxy supports 4 cache-controls:

**Turn off caching**

Set `no-cache=True` to skip the cache and force a fresh response for this request.

<Tabs>
<TabItem value="openai" label="OpenAI Python SDK">

```python
import os
from openai import OpenAI

@@ -319,9 +324,81 @@ chat_completion = client.chat.completions.create(
    }
)
```
</TabItem>

<TabItem value="curl" label="curl">

```shell
curl http://localhost:4000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer sk-1234" \
    -d '{
        "model": "gpt-3.5-turbo",
        "cache": {"no-cache": true},
        "messages": [
            {"role": "user", "content": "Say this is a test"}
        ]
    }'
```

</TabItem>

</Tabs>
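
For reference, a complete version of the Python request might look like the following. This is a minimal sketch, not the verbatim docs example: it assumes the proxy is running on `http://0.0.0.0:4000`, and it uses the OpenAI SDK's `extra_body` parameter to pass the non-standard `cache` field through to the proxy, mirroring the curl example above.

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url="http://0.0.0.0:4000"  # LiteLLM proxy address (assumed)
)

chat_completion = client.chat.completions.create(
    messages=[{"role": "user", "content": "Say this is a test"}],
    model="gpt-3.5-turbo",
    # `extra_body` forwards fields outside the OpenAI spec; the `cache`
    # field here mirrors the body of the curl example above
    extra_body={"cache": {"no-cache": True}}  # skip the cache for this call
)
```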

**Turn on caching**

By default, caching is always on.

<Tabs>
<TabItem value="openai" label="OpenAI Python SDK">

```python
import os
from openai import OpenAI

client = OpenAI(
    # This is the default and can be omitted
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url="http://0.0.0.0:4000"
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Say this is a test",
        }
    ],
    model="gpt-3.5-turbo"
)
```
</TabItem>

<TabItem value="curl" label="curl">

```shell
curl http://localhost:4000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer sk-1234" \
    -d '{
        "model": "gpt-3.5-turbo",
        "messages": [
            {"role": "user", "content": "Say this is a test"}
        ]
    }'
```

</TabItem>

</Tabs>
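
Since caching is on by default, one quick way to sanity-check it is to send the identical request twice and compare wall-clock latency; a response served from cache should come back noticeably faster. This is a rough heuristic rather than a guarantee, and it assumes the client setup from the example above.

```python
import os
import time
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url="http://0.0.0.0:4000"
)

def timed_request() -> float:
    """Send one chat completion and return the elapsed time in seconds."""
    start = time.perf_counter()
    client.chat.completions.create(
        messages=[{"role": "user", "content": "Say this is a test"}],
        model="gpt-3.5-turbo",
    )
    return time.perf_counter() - start

first = timed_request()   # likely a cache miss
second = timed_request()  # identical request: likely a cache hit
print(f"first: {first:.2f}s, second: {second:.2f}s")
```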

**Set `ttl`**

Set `ttl=600` to cache the response for 10 minutes (600 seconds).

<Tabs>
<TabItem value="openai" label="OpenAI Python SDK">

```python
import os
from openai import OpenAI

@@ -347,6 +424,35 @@ chat_completion = client.chat.completions.create(
    }
)
```
</TabItem>

<TabItem value="curl" label="curl">

```shell
curl http://localhost:4000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer sk-1234" \
    -d '{
        "model": "gpt-3.5-turbo",
        "cache": {"ttl": 600},
        "messages": [
            {"role": "user", "content": "Say this is a test"}
        ]
    }'
```

</TabItem>

</Tabs>
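
A complete version of the `ttl` request in Python might look like this minimal sketch; as above, `extra_body` is assumed to carry the `cache` field to the proxy, and `ttl` is in seconds.

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url="http://0.0.0.0:4000"
)

chat_completion = client.chat.completions.create(
    messages=[{"role": "user", "content": "Say this is a test"}],
    model="gpt-3.5-turbo",
    extra_body={"cache": {"ttl": 600}}  # keep this response cached for 600s
)
```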

**Set `s-maxage`**

Set `s-maxage=600` to only accept responses that were cached within the last 10 minutes (600 seconds).

<Tabs>
<TabItem value="openai" label="OpenAI Python SDK">

```python
import os

@@ -373,6 +479,27 @@ chat_completion = client.chat.completions.create(
    }
)
```
</TabItem>

<TabItem value="curl" label="curl">

```shell
curl http://localhost:4000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer sk-1234" \
    -d '{
        "model": "gpt-3.5-turbo",
        "cache": {"s-maxage": 600},
        "messages": [
            {"role": "user", "content": "Say this is a test"}
        ]
    }'
```

</TabItem>

</Tabs>
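
Likewise, a complete `s-maxage` request in Python might look like the following sketch; it assumes the same `extra_body` passthrough, with `s-maxage` bounding how old an acceptable cached response may be, in seconds.

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url="http://0.0.0.0:4000"
)

chat_completion = client.chat.completions.create(
    messages=[{"role": "user", "content": "Say this is a test"}],
    model="gpt-3.5-turbo",
    # only accept a cached response at most 600s old; otherwise fetch fresh
    extra_body={"cache": {"s-maxage": 600}}
)
```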

### Turn on / off caching per Key