Mirror of https://github.com/BerriAI/litellm.git (synced 2025-04-26 03:04:13 +00:00)

docs prompt cache controls

commit 9e90676058 (parent b3f37b860d)
5 changed files with 144 additions and 19 deletions
docs/my-website/docs/tutorials/prompt_caching.md (new file, 130 lines)

@@ -0,0 +1,130 @@
import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Auto-Inject Prompt Caching Checkpoints

Reduce costs by up to 90% by using LiteLLM to auto-inject prompt caching checkpoints.

<Image img={require('../../img/auto_prompt_caching.png')} style={{ width: '800px', height: 'auto' }} />

## How it works

LiteLLM can automatically inject prompt caching checkpoints into your requests to LLM providers. This allows:

- **Cached Processing**: Long, static parts of your prompts can be cached to avoid repeated processing
- **Cost Reduction**: Only process the dynamic parts of your prompts, significantly reducing API costs
- **Seamless Integration**: No need to modify your application code

## Configuration

You need to specify `cache_control_injection_points` in your model configuration. This tells LiteLLM:

1. Where to add the caching directive (`location`)
2. Which message to target (`role`)

LiteLLM will then automatically add a `cache_control` directive to the specified messages in your requests:

```json
"cache_control": {
    "type": "ephemeral"
}
```
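
To make the behavior concrete, here is a minimal Python sketch of the kind of transformation this produces. It is an illustration of the before/after shown in the Detailed Example below, not LiteLLM's internal implementation; in particular, tagging the last content block and converting plain-string content are assumptions of the sketch.

```python
from typing import Any, Dict, List

def inject_cache_control(
    messages: List[Dict[str, Any]],
    injection_points: List[Dict[str, str]],
) -> List[Dict[str, Any]]:
    """Illustrative only: add an ephemeral cache_control block to every
    message whose role matches a configured injection point."""
    target_roles = {
        point["role"]
        for point in injection_points
        if point.get("location") == "message" and "role" in point
    }
    for message in messages:
        if message.get("role") not in target_roles:
            continue
        content = message.get("content")
        if isinstance(content, list) and content:
            # Content-block form: tag the last block of the message.
            content[-1]["cache_control"] = {"type": "ephemeral"}
        elif isinstance(content, str):
            # Plain-string form: convert to a content block, then tag it
            # (assumed behavior for the sake of the illustration).
            message["content"] = [
                {"type": "text", "text": content, "cache_control": {"type": "ephemeral"}}
            ]
    return messages
```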

## Usage Example

In this example, we'll configure caching for system messages by adding the directive to all messages with `role: system`.

<Tabs>
<TabItem value="litellm config.yaml" label="litellm config.yaml">

```yaml showLineNumbers title="litellm config.yaml"
model_list:
  - model_name: anthropic-auto-inject-cache-system-message
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20240620
      api_key: os.environ/ANTHROPIC_API_KEY
      cache_control_injection_points:
        - location: message
          role: system
```

</TabItem>

<TabItem value="UI" label="LiteLLM UI">

On the LiteLLM UI, you can specify the `cache_control_injection_points` in the `Advanced Settings` tab when adding a model.

<Image img={require('../../img/ui_auto_prompt_caching.png')}/>

</TabItem>
</Tabs>
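
Once the proxy is running with this config, clients call the model as usual; no caching-specific code is needed on the caller's side. Below is a minimal sketch using the OpenAI SDK, assuming the proxy is reachable at `http://localhost:4000` and accepts the virtual key `sk-1234` (both are assumptions; substitute your own values).

```python
from openai import OpenAI

# Assumed values: point these at your own LiteLLM proxy URL and virtual key.
client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

response = client.chat.completions.create(
    model="anthropic-auto-inject-cache-system-message",  # model_name from the config above
    messages=[
        {"role": "system", "content": "You are a helpful assistant. <very long static instructions>"},
        {"role": "user", "content": "What is the main topic of this legal document?"},
    ],
)
print(response.choices[0].message.content)
```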

## Detailed Example

### 1. Original Request to LiteLLM

In this example, we have a very long, static system message and a varying user message. It's efficient to cache the system message since it rarely changes.

```json
{
  "messages": [
    {
      "role": "system",
      "content": [
        {
          "type": "text",
          "text": "You are a helpful assistant. This is a set of very long instructions that you will follow. Here is a legal document that you will use to answer the user's question."
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is the main topic of this legal document?"
        }
      ]
    }
  ]
}
```

### 2. LiteLLM's Modified Request

LiteLLM auto-injects the caching directive into the system message based on our configuration:

```json
{
  "messages": [
    {
      "role": "system",
      "content": [
        {
          "type": "text",
          "text": "You are a helpful assistant. This is a set of very long instructions that you will follow. Here is a legal document that you will use to answer the user's question.",
          "cache_control": {"type": "ephemeral"}
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is the main topic of this legal document?"
        }
      ]
    }
  ]
}
```

When the model provider processes this request, it will recognize the caching directive and only process the system message once, caching it for subsequent requests.
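
To check that caching is actually taking effect, inspect the usage block of the response. Field names vary by provider; the sketch below assumes Anthropic-style cache counters (`cache_creation_input_tokens`, `cache_read_input_tokens`) are passed through in the usage object, which may depend on your provider and LiteLLM version.

```python
# Continues the client example from the Usage Example section above:
# `response` is a chat completion returned through the LiteLLM proxy.
usage = response.usage.model_dump() if response.usage else {}

# On the first request the provider writes the cache; later requests with
# the same static prefix should report cache reads instead.
print("total prompt tokens:", usage.get("prompt_tokens"))
print("cache writes:", usage.get("cache_creation_input_tokens"))
print("cache reads:", usage.get("cache_read_input_tokens"))
```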
BIN  docs/my-website/img/auto_prompt_caching.png  (new file, binary not shown, 1.8 MiB)
BIN  docs/my-website/img/ui_auto_prompt_caching.png  (new file, binary not shown, 85 KiB)
@@ -444,6 +444,7 @@ const sidebars = {
       items: [
         "tutorials/openweb_ui",
         "tutorials/msft_sso",
+        "tutorials/prompt_caching",
         "tutorials/tag_management",
         'tutorials/litellm_proxy_aporia',
         {
@@ -1,24 +1,18 @@
 model_list:
-  - model_name: fake-openai-endpoint
+  - model_name: anthropic-auto-inject-cache-user-message
     litellm_params:
-      model: openai/fake
+      model: anthropic/claude-3-5-sonnet-20240620
-      api_key: fake-key
+      api_key: os.environ/ANTHROPIC_API_KEY
-      api_base: https://exampleopenaiendpoint-production.up.railway.app/
-  - model_name: openai/gpt-4o
+      cache_control_injection_points:
+        - location: message
+          role: user
+
+  - model_name: anthropic-auto-inject-cache-system-message
     litellm_params:
-      model: openai/gpt-4o
+      model: anthropic/claude-3-5-sonnet-20240620
-      api_key: fake-key
+      api_key: os.environ/ANTHROPIC_API_KEY
+      cache_control_injection_points:
+        - location: message
+          role: system

 litellm_settings:
   default_team_settings:
     - team_id: test_dev
       success_callback: ["langfuse", "s3"]
       langfuse_secret: secret-test-key
       langfuse_public_key: public-test-key
     - team_id: my_workflows
       success_callback: ["langfuse", "s3"]
       langfuse_secret: secret-workflows-key
       langfuse_public_key: public-workflows-key

 router_settings:
   enable_tag_filtering: True # 👈 Key Change