LiteLLM - Caching
LiteLLM caches `completion()` and `embedding()` calls when caching is switched on.
LiteLLM implements exact-match caching and supports the following cache backends:
- In-Memory Caching [Default]
- Redis Caching (local; see the configuration sketch after this list)
- Redis Caching (hosted)
- GPTCache
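As a minimal sketch of the Redis backend, the `Cache` constructor takes a `type` along with connection parameters; the host, port, and password below are placeholder values you would replace with your own Redis instance:

```python
import litellm
from litellm.caching import Cache

# Configure LiteLLM to cache in Redis instead of in memory.
# host, port, and password are placeholders for your Redis deployment.
litellm.cache = Cache(
    type="redis",
    host="localhost",
    port="6379",
    password="your-redis-password",
)
```

Once `litellm.cache` is set, completion and embedding calls go through the configured backend transparently; no changes are needed at the call sites.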
Usage
Cache keys are built from the request itself (the `model` and the exact messages), so the following example will lead to a cache hit:
```python
import litellm
from litellm import completion
from litellm.caching import Cache

# Enable caching with the default in-memory backend
litellm.cache = Cache()

# Make two identical completion calls
response1 = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Tell me a joke."}])
response2 = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Tell me a joke."}])

# response1 == response2; the second call is served from the cache
```
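The same cache also covers `embedding()` calls. A short sketch, assuming `litellm.cache` is configured as above and using a placeholder input string:

```python
from litellm import embedding

# With litellm.cache set, identical embedding calls also hit the cache
emb1 = embedding(model="text-embedding-ada-002", input=["Hello world"])
emb2 = embedding(model="text-embedding-ada-002", input=["Hello world"])

# emb1 == emb2; the second call is returned from the cache
```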