add gpt cache to docs
This commit is contained in:
parent 4e8bfeb6f1
commit 628cfa29f3
3 changed files with 34 additions and 1 deletion
42 docs/my-website/docs/caching/caching.md Normal file
@@ -0,0 +1,42 @@
# Caching

LiteLLM implements exact-match caching. It can be enabled by setting:

1. `litellm.caching`: When set to `True`, enables caching for all responses. Keys are the input `messages`, and the value stored in the cache is the corresponding `response`.

2. `litellm.caching_with_models`: When set to `True`, enables caching on a per-model basis. Keys are the input `messages + model`, and the value stored in the cache is the corresponding `response`.

## Usage

1. Caching - `litellm.caching`

Keys in the cache are the input `messages`; the following example leads to a cache hit:

```python
import litellm
from litellm import completion

litellm.caching = True

# Make completion calls
response1 = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Tell me a joke."}])
response2 = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Tell me a joke."}])

# response1 == response2; response2 is served from the cache

# With a different model
response3 = completion(model="command-nightly", messages=[{"role": "user", "content": "Tell me a joke."}])

# response3 == response1 == response2, since the cache key is only the messages
```
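
Because this is an exact match on the `messages` payload, even a small change in wording produces a different key and therefore a fresh API call. A minimal sketch of that behaviour (the prompt strings below are illustrative, and the calls assume the relevant API keys are set in the environment):

```python
import litellm
from litellm import completion

litellm.caching = True

joke = [{"role": "user", "content": "Tell me a joke."}]

first = completion(model="gpt-3.5-turbo", messages=joke)   # goes to the API and populates the cache
second = completion(model="gpt-3.5-turbo", messages=joke)  # identical messages -> served from the cache

# A reworded prompt is a different cache key, so this call goes back to the API
third = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Tell me a short joke."}])
```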

2. Caching with Models - `litellm.caching_with_models`

Keys in the cache are `messages + model`; the following example will not lead to a cache hit for the call with a different model:

```python
import litellm
from litellm import completion

litellm.caching_with_models = True

# Make completion calls
response1 = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Tell me a joke."}])
response2 = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Tell me a joke."}])
# response1 == response2; response2 is served from the cache

# With a different model, this calls the API since the key is not in the cache
response3 = completion(model="command-nightly", messages=[{"role": "user", "content": "Tell me a joke."}])

# response3 != response1, since the cache key is messages + model
```
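
A quick way to confirm which calls hit the cache is to time them: a hit returns almost instantly, while a miss pays the full API round trip. A rough, illustrative check (timings will vary, and it assumes API keys for both providers are set in the environment):

```python
import time

import litellm
from litellm import completion

litellm.caching_with_models = True
joke = [{"role": "user", "content": "Tell me a joke."}]

for model in ["gpt-3.5-turbo", "gpt-3.5-turbo", "command-nightly"]:
    start = time.time()
    completion(model=model, messages=joke)
    # The second gpt-3.5-turbo call should be near-instant (cache hit);
    # the first call and the command-nightly call go to the API (cache miss).
    print(f"{model}: {time.time() - start:.2f}s")
```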

26 docs/my-website/docs/caching/gpt_cache.md Normal file
@@ -0,0 +1,26 @@

# Using GPTCache with LiteLLM

GPTCache is a library for creating a semantic cache for LLM queries.

GPTCache Docs: https://gptcache.readthedocs.io/en/latest/index.html#
GPTCache GitHub: https://github.com/zilliztech/GPTCache

## Usage

### Install GPTCache

`pip install gptcache`

### Using GPTCache with LiteLLM `completion()`

```python
import os

from gptcache import cache
from litellm.cache import completion

# Set your .env keys
os.environ['OPENAI_API_KEY'] = ""

# Initialize GPTCache and register the OpenAI key with it
cache.init()
cache.set_openai_key()

messages = [{"role": "user", "content": "what is litellm YC 22?"}]
```
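
As a usage sketch (not from the original snippet): once the cache is initialized, calling the wrapped `completion()` twice with the same question should populate GPTCache on the first call and answer the repeat from the cache. The model name below is illustrative; per the GPTCache docs, similarity-based (semantic) matching requires configuring an embedding function and similarity evaluation in `cache.init()`, while the default setup matches on the exact question.

```python
import os

from gptcache import cache
from litellm.cache import completion

os.environ['OPENAI_API_KEY'] = ""
cache.init()
cache.set_openai_key()

question = [{"role": "user", "content": "what is litellm YC 22?"}]

first = completion(model="gpt-3.5-turbo", messages=question)   # goes to the API and stores the answer in GPTCache
second = completion(model="gpt-3.5-turbo", messages=question)  # same question again: answered from the cache
```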