diff --git a/README.md b/README.md
index da8328edc..d1f480079 100644
--- a/README.md
+++ b/README.md
@@ -78,21 +78,14 @@ LiteLLM supports caching `completion()` and `embedding()` calls for all LLMs
 ```python
 import litellm
 from litellm.caching import Cache
-litellm.cache = Cache(type="hosted") # init cache to use api.litellm.ai
+litellm.cache = Cache() # init cache to use api.litellm.ai
 
-# Make completion calls
+# stores this response in cache
 response1 = litellm.completion(
     model="gpt-3.5-turbo",
     messages=[{"role": "user", "content": "Tell me a joke."}]
     caching=True
 )
-
-response2 = litellm.completion(
-    model="gpt-3.5-turbo",
-    messages=[{"role": "user", "content": "Tell me a joke."}],
-    caching=True
-)
-# response1 == response2, response 1 is cached
 ```
 
 ## OpenAI Proxy Server ([Docs](https://docs.litellm.ai/docs/proxy_server))
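
For reference, a minimal sketch of how the cached path is exercised after this change, following the pattern of the `response2` example removed above; the `response1 == response2` check is taken from the removed comment and assumes the cached response compares equal to the original.

```python
import litellm
from litellm.caching import Cache

litellm.cache = Cache()  # init cache to use api.litellm.ai

messages = [{"role": "user", "content": "Tell me a joke."}]

# first call: the response is stored in the cache
response1 = litellm.completion(
    model="gpt-3.5-turbo",
    messages=messages,
    caching=True,
)

# repeat call with the same arguments: served from the cache
response2 = litellm.completion(
    model="gpt-3.5-turbo",
    messages=messages,
    caching=True,
)

# per the removed example's comment, the cached response should match the first
assert response1 == response2
```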