diff --git a/docs/my-website/docs/caching/all_caches.md b/docs/my-website/docs/caching/all_caches.md
index eb309f9b8..1b8bbd8e0 100644
--- a/docs/my-website/docs/caching/all_caches.md
+++ b/docs/my-website/docs/caching/all_caches.md
@@ -212,6 +212,94 @@ If you run the code two times, response1 will use the cache from the first run t
+
+
+## Switch Cache On / Off Per LiteLLM Call
+
+LiteLLM supports 4 cache controls:
+
+- `no-cache`: *Optional(bool)* When `True`, LiteLLM will not return a cached response and will call the actual endpoint instead.
+- `no-store`: *Optional(bool)* When `True`, LiteLLM will not cache the response.
+- `ttl`: *Optional(int)* Caches the response for this many seconds.
+- `s-maxage`: *Optional(int)* Only accepts a cached response if it is at most this many seconds old.
+
+[Let us know if you need more](https://github.com/BerriAI/litellm/issues/1218)
+
+
+
+Example usage of `no-cache`: when `True`, LiteLLM will not return a cached response.
+
+```python
+response = litellm.completion(
+    model="gpt-3.5-turbo",
+    messages=[
+        {
+            "role": "user",
+            "content": "hello who are you"
+        }
+    ],
+    cache={"no-cache": True},
+)
+```
+
+
+
+
+
+Example usage of `no-store`: when `True`, LiteLLM will not cache the response.
+
+```python
+response = litellm.completion(
+    model="gpt-3.5-turbo",
+    messages=[
+        {
+            "role": "user",
+            "content": "hello who are you"
+        }
+    ],
+    cache={"no-store": True},
+)
+```
+
+
+
+
+Example usage of `ttl`: cache the response for 10 seconds.
+
+```python
+response = litellm.completion(
+    model="gpt-3.5-turbo",
+    messages=[
+        {
+            "role": "user",
+            "content": "hello who are you"
+        }
+    ],
+    cache={"ttl": 10},
+)
+```
+
+
+
+
+Example usage of `s-maxage`: only accept a cached response that is at most 60 seconds old.
+
+```python
+response = litellm.completion(
+    model="gpt-3.5-turbo",
+    messages=[
+        {
+            "role": "user",
+            "content": "hello who are you"
+        }
+    ],
+    cache={"s-maxage": 60},
+)
+```
+
+
+
+
 ## Cache Context Manager - Enable, Disable, Update Cache
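To make the difference between the four per-call controls concrete, here is a toy in-memory model in plain Python. It is a sketch under stated assumptions, not LiteLLM's cache code: `ToyCache`, `call`, `fake_endpoint`, and `demo` are hypothetical names invented for illustration, the `cache` dict mirrors the `cache={...}` argument used in the diff, and the logic only models the behavior the docs describe (`no-cache` skips the read, `no-store` skips the write, `ttl` sets an expiry on write, `s-maxage` rejects entries that are too old on read).

```python
import time
from typing import Any, Callable, Dict, List, Optional, Tuple


class ToyCache:
    """Hypothetical in-memory cache modelling the four per-call cache controls.

    A teaching sketch only; this is NOT LiteLLM's implementation.
    """

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        # key -> (stored_at, expires_at or None, value)
        self._store: Dict[str, Tuple[float, Optional[float], Any]] = {}

    def call(self, key: str, compute: Callable[[], Any],
             cache: Optional[Dict[str, Any]] = None) -> Any:
        controls = cache or {}
        now = self._clock()

        # Read path: `no-cache` skips the lookup entirely.
        if not controls.get("no-cache", False) and key in self._store:
            stored_at, expires_at, value = self._store[key]
            expired = expires_at is not None and now >= expires_at
            # `s-maxage` rejects entries older than N seconds, whatever their ttl.
            max_age = controls.get("s-maxage")
            too_old = max_age is not None and now - stored_at > max_age
            if not expired and not too_old:
                return value

        value = compute()  # stands in for the real endpoint call

        # Write path: `no-store` skips saving; `ttl` sets an expiry time.
        if not controls.get("no-store", False):
            ttl = controls.get("ttl")
            expires_at = now + ttl if ttl is not None else None
            self._store[key] = (now, expires_at, value)
        return value


def demo() -> List[str]:
    now = [0.0]  # fake clock so the example is deterministic
    cache = ToyCache(clock=lambda: now[0])
    calls = [0]

    def fake_endpoint() -> str:
        calls[0] += 1
        return f"response #{calls[0]}"

    results = []
    results.append(cache.call("hello", fake_endpoint, {"ttl": 10}))         # miss, stored until t=10
    now[0] = 5.0
    results.append(cache.call("hello", fake_endpoint))                      # hit
    now[0] = 11.0
    results.append(cache.call("hello", fake_endpoint))                      # ttl expired, refetched
    results.append(cache.call("hello", fake_endpoint, {"no-cache": True}))  # lookup skipped
    now[0] = 20.0
    results.append(cache.call("hello", fake_endpoint, {"s-maxage": 5}))     # entry is 9s old, refetched
    results.append(cache.call("other", fake_endpoint, {"no-store": True}))  # computed, not saved
    results.append(cache.call("other", fake_endpoint))                      # still a miss
    return results


if __name__ == "__main__":
    for line in demo():
        print(line)
```

In this model `ttl` is applied when an entry is written, while `s-maxage` is checked when a later call reads it, so a caller can demand a fresher response than the stored entry's `ttl` would otherwise allow. Running `demo()` returns `response #1` twice, then `response #2` through `response #6`.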