From afa0b4eee82904edf15ac59a6f9733a46bb77e08 Mon Sep 17 00:00:00 2001
From: Ishaan Jaff
Date: Mon, 10 Jun 2024 09:18:51 -0700
Subject: [PATCH] docs - cache controls

---
 docs/my-website/docs/caching/all_caches.md | 88 ++++++++++++++++++++++
 1 file changed, 88 insertions(+)

diff --git a/docs/my-website/docs/caching/all_caches.md b/docs/my-website/docs/caching/all_caches.md
index eb309f9b8..1b8bbd8e0 100644
--- a/docs/my-website/docs/caching/all_caches.md
+++ b/docs/my-website/docs/caching/all_caches.md
@@ -212,6 +212,94 @@ If you run the code two times, response1 will use the cache from the first run t

## Switch Cache On / Off Per LiteLLM Call

LiteLLM supports 4 cache controls:

- `no-cache`: *Optional(bool)* When `True`, will not return a cached response; the actual endpoint is called instead.
- `no-store`: *Optional(bool)* When `True`, will not write the response to the cache.
- `ttl`: *Optional(int)* Will cache the response for the user-defined number of seconds.
- `s-maxage`: *Optional(int)* Will only accept a cached response that is no older than the user-defined number of seconds.

[Let us know if you need more](https://github.com/BerriAI/litellm/issues/1218)

Example usage of `no-cache` - skips any cached response and calls the actual endpoint:

```python
import litellm

response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": "hello who are you"
        }
    ],
    # never return a cached response for this call
    cache={"no-cache": True},
)
```

Example usage of `no-store` - the response is returned as usual but is not written to the cache:

```python
import litellm

response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": "hello who are you"
        }
    ],
    # do not store this response in the cache
    cache={"no-store": True},
)
```

Example usage of `ttl` - caches the response for 10 seconds:

```python
import litellm

response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": "hello who are you"
        }
    ],
    # keep this response in the cache for 10 seconds
    cache={"ttl": 10},
)
```

Example usage of `s-maxage` - only accepts cached responses that are at most 60 seconds old:

```python
import litellm

response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": "hello who are you"
        }
    ],
    # reject any cached response older than 60 seconds
    cache={"s-maxage": 60},
)
```

## Cache Context Manager - Enable, Disable, Update Cache
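These helpers switch caching on and off globally rather than per call. Below is a minimal sketch, assuming `litellm.enable_cache()` defaults to a local in-memory cache and `litellm.disable_cache()` turns caching back off for subsequent calls:

```python
import litellm

# enable caching globally (assumed to default to a local in-memory cache)
litellm.enable_cache()

response1 = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "hello who are you"}],
)
# an identical request should now be served from the cache
response2 = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "hello who are you"}],
)

# disable caching again for subsequent calls
litellm.disable_cache()
```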