# LiteLLM - Local Caching
## Caching `completion()` and `embedding()` calls when switched on
LiteLLM implements exact-match caching and supports the following cache backends:
* In-Memory Caching (default)
* Redis Caching (local)
* Redis Caching (hosted)
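
For the Redis backends, point `litellm.cache` at your Redis instance when constructing the `Cache`. A minimal sketch, assuming a reachable Redis server and the `type`, `host`, `port`, and `password` parameters; the connection details are placeholders:

```python
import litellm
from litellm.caching import Cache

# A sketch: back the LiteLLM cache with Redis instead of in-memory storage.
# host/port/password below are placeholders for your own Redis deployment.
litellm.cache = Cache(
    type="redis",
    host="localhost",
    port="6379",
    password="your-redis-password",
)
```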
## Quick Start Usage - Completion
Cache keys are based on the request parameters such as `model` and `messages`; the following example leads to a cache hit on the second call.
```python
import litellm
from litellm import completion
from litellm.caching import Cache

litellm.cache = Cache()

# Make completion calls
response1 = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Tell me a joke."}],
    caching=True
)
response2 = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Tell me a joke."}],
    caching=True
)

# response1 == response2; the second call is served from the cache
```
## Custom Key-Value Pairs
Add custom key-value pairs to your cache.
```python
from litellm.caching import Cache

cache = Cache()

# Store a custom value under your own key
cache.add_cache(cache_key="test-key", result="1234")

# Read it back
cache.get_cache(cache_key="test-key")
```
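
`get_cache` can also be used as a simple read-through check before doing expensive work. A minimal sketch, assuming `get_cache` returns `None` on a miss; the key name is made up for illustration:

```python
# Continues from the example above; assumes get_cache returns None on a miss.
# "user:42:summary" is a hypothetical key.
cached = cache.get_cache(cache_key="user:42:summary")
if cached is None:
    cached = "some expensive result"  # compute or fetch the real value here
    cache.add_cache(cache_key="user:42:summary", result=cached)

print(cached)
```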
## Caching with Streaming
LiteLLM can also cache streamed responses for you.
### Usage
```python
import litellm
from litellm import completion
from litellm.caching import Cache

litellm.cache = Cache()

# Make streaming completion calls
response1 = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Tell me a joke."}],
    stream=True,
    caching=True)
for chunk in response1:
    print(chunk)

response2 = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Tell me a joke."}],
    stream=True,
    caching=True)
for chunk in response2:
    print(chunk)
```
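
If you want the assembled text rather than raw chunks, you can accumulate the streamed deltas as you iterate. A minimal sketch, assuming OpenAI-style chunk objects where the text lives in `chunk.choices[0].delta.content`:

```python
# Continues from the streaming example above (litellm.cache is already set).
# Assumes OpenAI-style chunks exposing choices[0].delta.content.
from litellm import completion

full_text = ""
for chunk in completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Tell me a joke."}],
    stream=True,
    caching=True,
):
    delta_content = chunk.choices[0].delta.content
    if delta_content:
        full_text += delta_content

print(full_text)
```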
## Usage - Embedding()
Cache keys are based on the request parameters such as `model` and `input`; the following example leads to a cache hit on the second call.
```python
import time
import litellm
from litellm import embedding
from litellm.caching import Cache

litellm.cache = Cache()

start_time = time.time()
embedding1 = embedding(model="text-embedding-ada-002", input=["hello from litellm"*5], caching=True)
end_time = time.time()
print(f"Embedding 1 response time: {end_time - start_time} seconds")

start_time = time.time()
embedding2 = embedding(model="text-embedding-ada-002", input=["hello from litellm"*5], caching=True)
end_time = time.time()
print(f"Embedding 2 response time: {end_time - start_time} seconds")
```
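
To confirm the second call was served from the cache, you can compare the returned vectors as well as the response times. A small sketch, assuming an OpenAI-style response layout where the vector sits at `["data"][0]["embedding"]`:

```python
# Continues from the embedding example above.
# Assumes an OpenAI-style response where the vector lives at ["data"][0]["embedding"].
vector1 = embedding1["data"][0]["embedding"]
vector2 = embedding2["data"][0]["embedding"]
assert vector1 == vector2  # the cached call should return the identical embedding
```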