diff --git a/README.md b/README.md
index 0703abe66..da8328edc 100644
--- a/README.md
+++ b/README.md
@@ -72,6 +72,29 @@
 for chunk in result:
     print(chunk['choices'][0]['delta'])
 ```
+
+## Caching ([Docs](https://docs.litellm.ai/docs/caching/))
+LiteLLM supports caching `completion()` and `embedding()` calls for all LLMs.
+```python
+import litellm
+from litellm.caching import Cache
+litellm.cache = Cache(type="hosted") # init cache to use api.litellm.ai
+
+# Make completion calls
+response1 = litellm.completion(
+    model="gpt-3.5-turbo",
+    messages=[{"role": "user", "content": "Tell me a joke."}],
+    caching=True
+)
+
+response2 = litellm.completion(
+    model="gpt-3.5-turbo",
+    messages=[{"role": "user", "content": "Tell me a joke."}],
+    caching=True
+)
+# response1 == response2; the second call is served from the cache
+```
+
 ## OpenAI Proxy Server ([Docs](https://docs.litellm.ai/docs/proxy_server))
 Spin up a local server to translate openai api calls to any non-openai model (e.g. Huggingface, TogetherAI, Ollama, etc.)
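
The README text added above says `embedding()` calls can be cached as well, but only `completion()` is demonstrated. Below is a minimal sketch of the embedding path, assuming `litellm.embedding()` honors the same `caching=True` flag and reusing the hosted cache from the example; the embedding model name is only illustrative.

```python
import litellm
from litellm.caching import Cache

litellm.cache = Cache(type="hosted")  # same hosted cache (api.litellm.ai) as in the README example

# First call hits the provider; an identical second call can be answered from the cache.
# Assumption: embedding() accepts caching=True like completion() does.
embedding1 = litellm.embedding(
    model="text-embedding-ada-002",
    input=["Tell me a joke."],
    caching=True,
)
embedding2 = litellm.embedding(
    model="text-embedding-ada-002",
    input=["Tell me a joke."],
    caching=True,
)
```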