with caching in docs

2025-04-25 02:34:29 +00:00 · 2023-08-18 14:02:01 -07:00 · 2023-08-18 14:02:01 -07:00 · 83c6d22848
commit 83c6d22848
parent d0ba3ba2e5
2 changed files with 43 additions and 0 deletions
--- a/docs/my-website/docs/caching.md
+++ b/docs/my-website/docs/caching.md
@@ -0,0 +1,42 @@
+# Caching `completion()` Responses
+
+liteLLM implements exact-match caching. Enable it by setting one of the following:
+1. `litellm.caching`: When set to `True`, enables caching for all responses. Keys are the input `messages`, and the value stored in the cache is the corresponding `response`.
+
+2. `litellm.caching_with_models`: When set to `True`, enables caching on a per-model basis. Keys are the input `messages + model`, and the value stored in the cache is the corresponding `response`.
+
+## Usage
+1. Caching - `caching`
+Keys in the cache are the input `messages`, so the following example leads to a cache hit, even when the model changes:
+```python
+import litellm
+from litellm import completion
+
+litellm.caching = True
+
+# Identical messages: response1 is cached, so response2 == response1
+response1 = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Tell me a joke."}])
+response2 = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Tell me a joke."}])
+
+# A different model still hits the cache, since keys are only the messages
+response3 = completion(model="command-nightly", messages=[{"role": "user", "content": "Tell me a joke."}])
+# response3 == response1 == response2
+```
+
+
+2. Caching with Models - `caching_with_models`
+Keys in the cache are `messages + model`, so repeating a call with the same model hits the cache, while the same messages sent to a different model do not:
+```python
+import litellm
+from litellm import completion
+
+litellm.caching_with_models = True
+
+# Same model and messages: response1 is cached, so response2 == response1
+response1 = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Tell me a joke."}])
+response2 = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Tell me a joke."}])
+
+# A different model is a cache miss and calls the API, so response3 != response1 (keys are messages + model)
+response3 = completion(model="command-nightly", messages=[{"role": "user", "content": "Tell me a joke."}])
+```
+
--- a/docs/my-website/sidebars.js
+++ b/docs/my-website/sidebars.js
@@ -38,6 +38,7 @@ const sidebars = {
    'token_usage',
    'stream',
    'secret',
+    'caching',
    {
      type: 'category',
      label: 'Logging & Observability',