# Using GPT Cache with LiteLLM
GPTCache lets you slash your LLM API costs by 10x 💰 and boost speed by 100x ⚡

In this tutorial we demo how to use LiteLLM with GPTCache:
* Quick Start Usage
* Advanced Usage
* Setting custom cache keys



In [None]:
# installation
!pip install litellm gptcache

# Set ENV variables


In [12]:
import os
os.environ['OPENAI_API_KEY'] = ""
os.environ['COHERE_API_KEY'] = ""

# Quick Start Usage
By default, GPTCache uses the content in `messages` as the cache key.

Import GPTCache:

In [4]:
import litellm
from litellm.gpt_cache import completion

### using / setting up gpt cache
from gptcache import cache
cache.init()
cache.set_openai_key()
#########################

In [7]:
## two completion calls
import time
question = "why should i use LiteLLM"
for _ in range(2):
    start_time = time.time()
    response = completion(
      model='gpt-3.5-turbo',
      messages=[
        {
            'role': 'user',
            'content': question
        }
      ],
    )
    print(f'Question: {question}, Response {response}')
    print("Time taken: {:.2f}s".format(time.time() - start_time))

Question: why should i use LiteLLM, Response {
  "id": "chatcmpl-7tJozrtW5UzVHNUcxX6cfzRS4nbxd",
  "object": "chat.completion",
  "created": 1693418589,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "There are several reasons why you might consider using LiteLLM:\n\n1. Simplified document management: LiteLLM offers a user-friendly interface that makes it easy to manage and organize your legal documents. You can track versions, organize files into folders, and quickly find what you need.\n\n2. Collaboration and accessibility: LiteLLM allows multiple users to work on documents simultaneously, making it easier for teams to collaborate and exchange feedback. It also provides flexible accessibility, allowing you to access your documents from anywhere, anytime, as long as you have an internet connection.\n\n3. Time-saving features: The platform offers various time-saving features, such as automated d
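The two calls above send identical `messages`, so with the default setup the second one is expected to be answered from the cache rather than by the API. As a rough mental model, exact-match caching keyed on the last message content behaves like the sketch below. This is a plain-dict stand-in, not GPTCache's actual implementation, and `make_cached_completion` and `fake_llm` are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def make_cached_completion(llm_call: Callable[[str, List[Message]], str]):
    """Wrap `llm_call` with an exact-match cache keyed on the last message content."""
    store: Dict[str, str] = {}

    def cached_completion(model: str, messages: List[Message]) -> str:
        key = messages[-1]["content"]  # the model name is NOT part of the key
        if key not in store:
            store[key] = llm_call(model, messages)  # cache miss: call the backend
        return store[key]

    return cached_completion


# Hypothetical backend that records which model was actually called.
calls: List[str] = []


def fake_llm(model: str, messages: List[Message]) -> str:
    calls.append(model)
    return f"answer from {model}"


cached_completion = make_cached_completion(fake_llm)
messages = [{"role": "user", "content": "why should i use LiteLLM"}]

first = cached_completion("gpt-3.5-turbo", messages)
second = cached_completion("gpt-3.5-turbo", messages)   # served from the cache
third = cached_completion("command-nightly", messages)  # also served from the cache

print(calls)  # only the first call reached the backend
print(third)  # the gpt-3.5-turbo answer, even though another model was requested
```

Because the key ignores the model, the third call returns the answer produced for the first model. That is the situation custom cache keys are meant to avoid.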

# Advanced Usage - Setting custom keys for Cache
By default, GPTCache uses the content of `messages` as the cache key, so the model name is not part of the key.

To set a custom cache key, pass a `pre_func` to `cache.init`:
```python
cache.init(pre_func=pre_cache_func)
```

The snippet below defines a `pre_func` that returns the message content plus the model name as the cache key.

## Defining a `pre_func` for GPTCache


In [9]:
### using / setting up gpt cache
from gptcache import cache
from gptcache.processor.pre import last_content_without_prompt
from typing import Dict, Any

# Use this function to set your cache keys for GPTCache.
# `data` holds all the args passed to your completion call.
def pre_cache_func(data: Dict[str, Any], **params: Any) -> Any:
    # Build the cache key from the last message content plus the model name
    print("in pre_cache_func")
    last_content_without_prompt_val = last_content_without_prompt(data, **params)
    print("last content without prompt", last_content_without_prompt_val)
    print("model", data["model"])
    cache_key = last_content_without_prompt_val + data["model"]
    print("cache_key", cache_key)
    return cache_key  # GPTCache uses this value as the cache key
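
To see which keys this composition produces without calling any API, the key logic can be exercised on its own. The sketch below rests on an assumption: `last_user_content` is a hypothetical helper that simply returns the last message's content, which is what `last_content_without_prompt` is expected to return when no prompts are configured. It is not GPTCache's code, and `build_cache_key` is likewise an illustrative name.

```python
from typing import Any, Dict


def last_user_content(data: Dict[str, Any]) -> str:
    # Hypothetical stand-in for gptcache's last_content_without_prompt
    return data["messages"][-1]["content"]


def build_cache_key(data: Dict[str, Any]) -> str:
    # Same composition as pre_cache_func: message content followed by model name
    return last_user_content(data) + data["model"]


messages = [{"role": "user", "content": "why should I use LiteLLM for completions()"}]
openai_key = build_cache_key({"model": "gpt-3.5-turbo", "messages": messages})
cohere_key = build_cache_key({"model": "command-nightly", "messages": messages})

print(openai_key)
print(cohere_key)
print(openai_key != cohere_key)  # different models give different keys
```

Plain concatenation keeps the key easy to read in debug output. If ambiguity between the content and the model name ever matters, adding a separator between the two parts is one option.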


## Init Cache with `pre_func` to set custom keys

In [10]:
# init GPT Cache with custom pre_func
cache.init(pre_func=pre_cache_func)
cache.set_openai_key()

## Using Cache
* The cache key is the last message content + `model`

We make 3 LLM API calls:
* 2 to OpenAI
* 1 to Cohere `command-nightly`

In [14]:
messages = [{"role": "user", "content": "why should I use LiteLLM for completions()"}]
response1 = completion(model="gpt-3.5-turbo", messages=messages)
response2 = completion(model="gpt-3.5-turbo", messages=messages)
response3 = completion(model="command-nightly", messages=messages) # calling cohere command nightly

if response1["choices"] != response2["choices"]: # same model + prompt should hit the cache
    print("Error occurred: caching for the same model + prompt failed")

if response3["choices"] == response2["choices"]: # different models must not share a cache entry
    print("Error occurred: a cached response was returned for a different model")

print("response1", response1)
print("response2", response2)
print("response3", response3)

in pre_cache_func
last content without prompt why should I use LiteLLM for completions()
model gpt-3.5-turbo
cache_key why should I use LiteLLM for completions()gpt-3.5-turbo
in pre_cache_func
last content without prompt why should I use LiteLLM for completions()
model gpt-3.5-turbo
cache_key why should I use LiteLLM for completions()gpt-3.5-turbo
in pre_cache_func
last content without prompt why should I use LiteLLM for completions()
model command-nightly
cache_key why should I use LiteLLM for completions()command-nightly
response1 {
  "id": "chatcmpl-7tKE21PEe43sR6RvZ7pcUmanFwZLf",
  "object": "chat.completion",
  "created": 1693420142,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "There are several reasons why you should use LiteLLM for completions() in your code:\n\n1. Fast and efficient: LiteLLM is implemented in a lightweight manner, making it highly performant. It provides quick and acc