add gpt cache to docs
This commit is contained in:
parent 4e8bfeb6f1
commit 628cfa29f3
3 changed files with 34 additions and 1 deletion
42 docs/my-website/docs/caching/caching.md Normal file
@@ -0,0 +1,42 @@
# Caching

LiteLLM implements exact-match caching. It can be enabled by setting:

1. `litellm.caching`: When set to `True`, enables caching for all responses. Keys are the input `messages`, and the value stored in the cache is the corresponding `response`.

2. `litellm.caching_with_models`: When set to `True`, enables caching on a per-model basis. Keys are the input `messages + model`, and the value stored in the cache is the corresponding `response`.

## Usage

1. Caching - `litellm.caching`

Keys in the cache are the input `messages`; the following example leads to a cache hit:

```python
import litellm
from litellm import completion

litellm.caching = True

# Make completion calls
response1 = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Tell me a joke."}])
response2 = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Tell me a joke."}])

# response1 == response2; response2 is served from the cache

# With a different model
response3 = completion(model="command-nightly", messages=[{"role": "user", "content": "Tell me a joke."}])

# response3 == response1 == response2, since the cache key is only the messages
```
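
Because this is an exact match on the `messages` payload, even a small change in wording produces a different key and therefore a fresh API call. A minimal sketch of that behaviour (the prompt strings below are illustrative, and the calls assume the relevant API keys are set in the environment):

```python
import litellm
from litellm import completion

litellm.caching = True

joke = [{"role": "user", "content": "Tell me a joke."}]

first = completion(model="gpt-3.5-turbo", messages=joke)   # goes to the API and populates the cache
second = completion(model="gpt-3.5-turbo", messages=joke)  # identical messages -> served from the cache

# A reworded prompt is a different cache key, so this call goes back to the API
third = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Tell me a short joke."}])
```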

2. Caching with Models - `litellm.caching_with_models`

Keys in the cache are `messages + model`; the following example will not lead to a cache hit for the call with a different model:

```python
import litellm
from litellm import completion

litellm.caching_with_models = True

# Make completion calls
response1 = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Tell me a joke."}])
response2 = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Tell me a joke."}])
# response1 == response2; response2 is served from the cache

# With a different model, this calls the API since the key is not in the cache
response3 = completion(model="command-nightly", messages=[{"role": "user", "content": "Tell me a joke."}])

# response3 != response1, since the cache key is messages + model
```
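
A quick way to confirm which calls hit the cache is to time them: a hit returns almost instantly, while a miss pays the full API round trip. A rough, illustrative check (timings will vary, and it assumes API keys for both providers are set in the environment):

```python
import time

import litellm
from litellm import completion

litellm.caching_with_models = True
joke = [{"role": "user", "content": "Tell me a joke."}]

for model in ["gpt-3.5-turbo", "gpt-3.5-turbo", "command-nightly"]:
    start = time.time()
    completion(model=model, messages=joke)
    # The second gpt-3.5-turbo call should be near-instant (cache hit);
    # the first call and the command-nightly call go to the API (cache miss).
    print(f"{model}: {time.time() - start:.2f}s")
```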

26 docs/my-website/docs/caching/gpt_cache.md Normal file
@@ -0,0 +1,26 @@

# Using GPTCache with LiteLLM

GPTCache is a library for creating a semantic cache for LLM queries.

GPTCache Docs: https://gptcache.readthedocs.io/en/latest/index.html#
GPTCache GitHub: https://github.com/zilliztech/GPTCache

## Usage

### Install GPTCache

`pip install gptcache`

### Using GPTCache with LiteLLM `completion()`

```python
import os

from gptcache import cache
from litellm.cache import completion

# Set your .env keys
os.environ['OPENAI_API_KEY'] = ""

# Initialize GPTCache and register the OpenAI key with it
cache.init()
cache.set_openai_key()

messages = [{"role": "user", "content": "what is litellm YC 22?"}]
```
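
As a usage sketch (not from the original snippet): once the cache is initialized, calling the wrapped `completion()` twice with the same question should populate GPTCache on the first call and answer the repeat from the cache. The model name below is illustrative; per the GPTCache docs, similarity-based (semantic) matching requires configuring an embedding function and similarity evaluation in `cache.init()`, while the default setup matches on the exact question.

```python
import os

from gptcache import cache
from litellm.cache import completion

os.environ['OPENAI_API_KEY'] = ""
cache.init()
cache.set_openai_key()

question = [{"role": "user", "content": "what is litellm YC 22?"}]

first = completion(model="gpt-3.5-turbo", messages=question)   # goes to the API and stores the answer in GPTCache
second = completion(model="gpt-3.5-turbo", messages=question)  # same question again: answered from the cache
```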