# LiteLLM - Caching

## LiteLLM caches `completion()` and `embedding()` calls when switched on

liteLLM implements exact match caching. It can be enabled by setting either of the following:

1. `litellm.caching`: when set to `True`, caching is enabled for all responses. The cache key is the input `messages`, and the value stored in the cache is the corresponding `response`.

2. `litellm.caching_with_models`: when set to `True`, caching is enabled on a per-model basis. The cache key is the input `messages + model`, and the value stored in the cache is the corresponding `response`.

liteLLM supports the following caching backends:

* In-Memory Caching [Default]
* Redis Caching Local (see the sketch below)
* Redis Caching Hosted
* GPTCache
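
Redis caching needs connection details for your Redis instance. The sketch below assumes a reachable Redis server and that your installed litellm version accepts a `type="redis"` argument on `Cache`; the host, port, and password values are placeholders, so check the caching docs for your release:

```python
import litellm
from litellm import completion
from litellm.caching import Cache

# Point litellm's cache at Redis instead of the default in-memory store.
# The Cache(type="redis", ...) parameters are assumptions - verify them for your litellm version.
litellm.cache = Cache(type="redis", host="localhost", port="6379", password="your-redis-password")

# Subsequent identical calls are served from Redis.
response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Tell me a joke."}])
```
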
## Usage

1. Caching - cache

Keys in the cache are the input `messages`, so the following example will lead to a cache hit:

```python
import litellm
from litellm import completion
from litellm.caching import Cache

litellm.caching = True   # flag-based switch described above: cache all responses, keyed on the input messages
litellm.cache = Cache()  # default in-memory cache

# Make completion calls
response1 = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Tell me a joke."}])
response2 = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Tell me a joke."}])

# response1 == response2, response 1 is cached

# with a different model
response3 = completion(model="command-nightly", messages=[{"role": "user", "content": "Tell me a joke."}])

# response3 == response1 == response2, since keys are messages
```
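
A quick way to confirm that the cache is working is to time repeated calls: the second identical call should return almost immediately because it never reaches the provider. This is a plain-Python check built on the setup above, not a litellm feature:

```python
import time
from litellm import completion  # continuing from the cache setup in the block above

# First call goes to the API; the repeat is answered from the cache.
start = time.time()
completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Explain caching in one sentence."}])
uncached_seconds = time.time() - start

start = time.time()
completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Explain caching in one sentence."}])
cached_seconds = time.time() - start

print(f"uncached: {uncached_seconds:.2f}s, cached: {cached_seconds:.4f}s")
```
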
2. Caching with Models - caching_with_models

Keys in the cache are `messages + model`, so the same messages sent to a different model will not lead to a cache hit:

```python
import litellm
from litellm import completion

litellm.caching_with_models = True   # per-model caching: keys are messages + model

# Make completion calls
response1 = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Tell me a joke."}])
response2 = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Tell me a joke."}])
# response1 == response2, response 1 is cached

# with a different model, this will call the API since the key is not cached
response3 = completion(model="command-nightly", messages=[{"role": "user", "content": "Tell me a joke."}])

# response3 != response1, since keys are messages + model
```
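
The heading above also mentions `embedding()` calls. A minimal sketch, assuming your litellm version caches embeddings the same way once `litellm.cache` is set; the embedding model name is just an example:

```python
import litellm
from litellm import embedding
from litellm.caching import Cache

litellm.cache = Cache()  # same switch as for completion calls

# Identical inputs should be served from the cache on the second call.
emb1 = embedding(model="text-embedding-ada-002", input=["Tell me a joke."])
emb2 = embedding(model="text-embedding-ada-002", input=["Tell me a joke."])
```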