diff --git a/docs/my-website/docs/caching/caching.md b/docs/my-website/docs/caching/caching.md
index 48deef320..2147def37 100644
--- a/docs/my-website/docs/caching/caching.md
+++ b/docs/my-website/docs/caching/caching.md
@@ -1,42 +1,25 @@
 # LiteLLM - Caching
 
-liteLLM implements exact match caching. It can be enabled by setting
-1. `litellm.caching`: When set to `True`, enables caching for all responses. Keys are the input `messages` and values store in the cache is the corresponding `response`
+## LiteLLM Caches `completion()` and `embedding()` calls when switched on
 
-2. `litellm.caching_with_models`: When set to `True`, enables caching on a per-model basis.Keys are the input `messages + model` and values store in the cache is the corresponding `response`
+LiteLLM implements exact-match caching and supports the following caching backends:
+* In-Memory Caching [Default]
+* Redis Caching Local
+* Redis Caching Hosted
+* GPTCache
 
 ## Usage
 
 1. Caching - cache
 Keys in the cache are `model`, the following example will lead to a cache hit
 ```python
-litellm.caching = True
+import litellm
+from litellm import completion
+from litellm.caching import Cache
+litellm.cache = Cache()
 
 # Make completion calls
 response1 = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Tell me a joke."}])
 response2 = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Tell me a joke."}])
 # response1 == response2, response 1 is cached
-
-# with a diff model
-response3 = completion(model="command-nightly", messages=[{"role": "user", "content": "Tell me a joke."}])
-
-# response3 == response1 == response2, since keys are messages
 ```
-
-
-2. Caching with Models - caching_with_models
-Keys in the cache are `messages + model`, the following example will not lead to a cache hit
-```python
-litellm.caching_with_models = True
-
-# Make completion calls
-response1 = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Tell me a joke."}])
-response2 = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Tell me a joke."}])
-# response1 == response2, response 1 is cached
-
-# with a diff model, this will call the API since the key is not cached
-response3 = completion(model="command-nightly", messages=[{"role": "user", "content": "Tell me a joke."}])
-
-# response3 != response1, since keys are messages + model
-```
-
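The patched docs describe exact-match caching keyed on the model plus the input messages. The behaviour can be sketched without network calls as a small standalone cache; `SimpleCache`, `completion_with_cache`, and the `call_api` hook below are illustrative names, not LiteLLM's actual implementation or API.

```python
# Minimal sketch of exact-match response caching, mirroring the behaviour
# the patch documents for `litellm.cache = Cache()`. This is an assumption-
# laden illustration, not LiteLLM's real Cache class.
class SimpleCache:
    def __init__(self):
        self._store = {}

    def _key(self, model, messages):
        # Exact-match key: model name plus the serialized messages, so the
        # same prompt sent to a different model is a cache miss.
        return (model, tuple((m["role"], m["content"]) for m in messages))

    def get(self, model, messages):
        return self._store.get(self._key(model, messages))

    def set(self, model, messages, response):
        self._store[self._key(model, messages)] = response


cache = SimpleCache()

def completion_with_cache(model, messages, call_api):
    """Return a cached response on an exact match, else call the API."""
    cached = cache.get(model, messages)
    if cached is not None:
        return cached
    response = call_api(model, messages)
    cache.set(model, messages, response)
    return response
```

With this key scheme, two identical `gpt-3.5-turbo` calls hit the cache (as in the example above), while the same messages sent to `command-nightly` trigger a fresh API call.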