# Caching

Cache LLM Responses

Caching can be enabled by adding the `cache` key to the `config.yaml`.

#### Step 1: Add `cache` to the config.yaml

```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo

litellm_settings:
  set_verbose: True
  cache: True # set cache responses to True, litellm defaults to using a redis cache
```
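
Under the hood, `cache: True` has the proxy initialize LiteLLM's Redis cache. For reference, a minimal sketch of the equivalent setup in the Python SDK, assuming the `litellm` package and a reachable Redis instance (the host, port, and password below are placeholders):

```python
import litellm
from litellm.caching import Cache

# Wire LiteLLM to a Redis-backed cache; credentials here are placeholders
litellm.cache = Cache(
    type="redis",
    host="localhost",
    port=6379,
    password="my-redis-password",
)

# After the first call, an identical request is served from the cache
response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "write a poem about litellm!"}],
)
```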

#### Step 2: Add Redis credentials to .env

Set either `REDIS_URL` or `REDIS_HOST` in your OS environment to enable caching.

```shell
|
|
REDIS_URL = "" # REDIS_URL='redis://username:password@hostname:port/database'
|
|
## OR ##
|
|
REDIS_HOST = "" # REDIS_HOST='redis-18841.c274.us-east-1-3.ec2.cloud.redislabs.com'
|
|
REDIS_PORT = "" # REDIS_PORT='18841'
|
|
REDIS_PASSWORD = "" # REDIS_PASSWORD='liteLlmIsAmazing'
|
|
```

**Additional kwargs**

You can pass any additional `redis.Redis` argument by storing the variable and its value in your OS environment, like this:

```shell
REDIS_<redis-kwarg-name>=""
```
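
For example, `REDIS_SSL` would correspond to the `ssl` argument of `redis.Redis`. A minimal sketch of the idea, not LiteLLM's exact implementation (which is linked below):

```python
import inspect
import os

import redis

# Collect REDIS_* environment variables whose names match redis.Redis kwargs
valid_kwargs = set(inspect.signature(redis.Redis).parameters)

redis_kwargs = {}
for env_key, value in os.environ.items():
    if env_key.startswith("REDIS_"):
        kwarg = env_key[len("REDIS_"):].lower()
        if kwarg in valid_kwargs:
            # NOTE: values stay strings here; real code may need type casts
            redis_kwargs[kwarg] = value

client = redis.Redis(**redis_kwargs)
```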

[**See how it's read from the environment**](https://github.com/BerriAI/litellm/blob/4d7ff1b33b9991dcf38d821266290631d9bcd2dd/litellm/_redis.py#L40)

#### Step 3: Run proxy with config
```shell
$ litellm --config /path/to/config.yaml
```
#### Using Caching

Send the same request twice:

```shell
curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "write a poem about litellm!"}],
    "temperature": 0.7
  }'

curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "write a poem about litellm!"}],
    "temperature": 0.7
  }'
```
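
The same check from Python, a sketch assuming the `openai` package (v1+) and the proxy running locally on port 8000:

```python
from openai import OpenAI

# Any api_key works; the proxy holds the real provider credentials
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="anything")

def ask():
    return client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "write a poem about litellm!"}],
        temperature=0.7,
    )

first = ask()
second = ask()  # identical request - should be served from the Redis cache
print(second.choices[0].message.content)
```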
#### Control caching per completion request

Caching can be switched on or off per `/chat/completions` request.

- Caching **on** for a completion - pass `"caching": true`:

```shell
curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "write a poem about litellm!"}],
    "temperature": 0.7,
    "caching": true
  }'
```

- Caching **off** for a completion - pass `"caching": false`:

```shell
curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "write a poem about litellm!"}],
    "temperature": 0.7,
    "caching": false
  }'
```
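
From Python, this per-request flag can be sent through the `openai` client's `extra_body` parameter (same assumed client and base URL as above):

```python
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="anything")

# Force a fresh completion for this one request; the proxy reads the
# extra "caching" field from the JSON request body
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "write a poem about litellm!"}],
    temperature=0.7,
    extra_body={"caching": False},
)
```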