diff --git a/docs/my-website/docs/caching/caching.md b/docs/my-website/docs/caching/caching.md
index 1fa9611dd..c3ea0d7fb 100644
--- a/docs/my-website/docs/caching/caching.md
+++ b/docs/my-website/docs/caching/caching.md
@@ -5,7 +5,7 @@ liteLLM implements exact match caching and supports the following Caching:
 * In-Memory Caching [Default]
 * Redis Caching Local
-* Redic Caching Hosted
+* Redis Caching Hosted
 * GPTCache
 
 ## Quick Start Usage - Completion
@@ -45,6 +45,65 @@ response2 = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "conten
 # response1 == response2, response 1 is cached
 ```
 
+### Custom Cache Keys
+
+Define a function that returns the cache key you want to use:
+```python
+# this function takes in *args, **kwargs and returns the key to use for caching
+def custom_get_cache_key(*args, **kwargs):
+    # build the key from the call parameters
+    key = kwargs.get("model", "") + str(kwargs.get("messages", "")) + str(kwargs.get("temperature", "")) + str(kwargs.get("logit_bias", ""))
+    print("key for cache", key)
+    return key
+```
+
+Set your function as `litellm.cache.get_cache_key`:
+```python
+import os
+
+import litellm
+from litellm.caching import Cache
+
+cache = Cache(type="redis", host=os.environ['REDIS_HOST'], port=os.environ['REDIS_PORT'], password=os.environ['REDIS_PASSWORD'])
+
+cache.get_cache_key = custom_get_cache_key  # set the get_cache_key function for your cache
+
+litellm.cache = cache  # set litellm.cache to your cache
+```
+
+### Controlling Caching for each `litellm.completion()` call
+
+`completion()` accepts a `caching` argument (bool, default `False`) that controls whether a cached response may be returned.
+
+Using the caching flag:
+
+**Ensure you have initialized `litellm.cache` with your cache object first.**
+
+```python
+from litellm import completion
+
+# messages is your chat messages list, e.g. [{"role": "user", "content": "..."}]
+response2 = completion(model="gpt-3.5-turbo", messages=messages, temperature=0.1, caching=True)   # may return a cached response
+
+response3 = completion(model="gpt-3.5-turbo", messages=messages, temperature=0.1, caching=False)  # always calls the API
+```
+
+### Detecting Cached Responses
+For responses served from the cache, the returned object includes the param `cache: True`.
+
+Example response for a cache hit:
+```
+{
+  'cache': True,
+  'id': 'chatcmpl-7wggdzd6OXhgE2YhcLJHJNZsEWzZ2',
+  'created': 1694221467,
+  'model': 'gpt-3.5-turbo-0613',
+  'choices': [
+    {
+      'index': 0,
+      'message': {'role': 'assistant', 'content': 'I\'m sorry, but I couldn\'t find any information about "litellm" or how many stars it has. It is possible that you may be referring to a specific product, service, or platform that I am not familiar with. Can you please provide more context or clarify your question?'},
+      'finish_reason': 'stop'
+    }
+  ],
+  'usage': {'prompt_tokens': 17, 'completion_tokens': 59, 'total_tokens': 76},
+}
+```
 ## Caching with Streaming
 LiteLLM can cache your streamed responses for you
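+
+A minimal sketch of what this can look like, assuming the `caching=True` flag documented above also applies to streamed calls made with `stream=True`, with `litellm.cache` already initialized as described earlier (model and prompt here are placeholders):
+
+```python
+from litellm import completion
+
+messages = [{"role": "user", "content": "Tell me a joke."}]
+
+# first call streams from the API and populates the cache
+for chunk in completion(model="gpt-3.5-turbo", messages=messages, stream=True, caching=True):
+    print(chunk)
+
+# an identical call can now be served from the cache
+for chunk in completion(model="gpt-3.5-turbo", messages=messages, stream=True, caching=True):
+    print(chunk)
+```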
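+
+For the non-streaming flow documented above, a rough end-to-end sketch: it initializes the Redis cache, repeats an identical `completion()` call with `caching=True`, and inspects the `cache` field described under "Detecting Cached Responses" (the prompt is a placeholder and the Redis environment variables are assumed to be set):
+
+```python
+import os
+
+import litellm
+from litellm import completion
+from litellm.caching import Cache
+
+litellm.cache = Cache(type="redis", host=os.environ['REDIS_HOST'], port=os.environ['REDIS_PORT'], password=os.environ['REDIS_PASSWORD'])
+
+messages = [{"role": "user", "content": "Tell me a joke."}]
+
+response1 = completion(model="gpt-3.5-turbo", messages=messages, temperature=0.1, caching=True)
+response2 = completion(model="gpt-3.5-turbo", messages=messages, temperature=0.1, caching=True)
+
+# the second, identical call should be a cache hit and carry the `cache` flag
+print(response2.get("cache", False))
+```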