forked from phoenix/litellm-mirror

commit b7d4031f89 (parent a0361b0e76): doc aporia_w_litellm

7 changed files with 176 additions and 103 deletions
@@ -1,98 +0,0 @@
import Image from '@theme/IdealImage';

# Split traffic between GPT-4 and Llama2 in Production!

In this tutorial, we'll walk through A/B testing between GPT-4 and Llama2 in production. We'll assume you've deployed Llama2 on Huggingface Inference Endpoints (but any of TogetherAI, Baseten, Ollama, Petals, Openrouter should work as well).

# Relevant Resources:

* 🚀 [Your production dashboard!](https://admin.litellm.ai/)
* [Deploying models on Huggingface](https://huggingface.co/docs/inference-endpoints/guides/create_endpoint)
* [All supported providers on LiteLLM](https://docs.litellm.ai/docs/providers)

# Code Walkthrough

In production, we don't know whether Llama2 will:

* return good results
* respond quickly

### 💡 Route 20% traffic to Llama2

If Llama2 returns poor answers or is extremely slow, we want to roll back this change and use GPT-4 instead.

Instead of routing 100% of our traffic to Llama2, let's **start by routing 20% of traffic** to it and see how it does.

```python
## route 20% of responses to Llama2
split_per_model = {
    "gpt-4": 0.8,
    "huggingface/https://my-unique-endpoint.us-east-1.aws.endpoints.huggingface.cloud": 0.2
}
```
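Conceptually, `split_per_model` is just a probability distribution over models: each request goes to GPT-4 roughly 80% of the time and to the Llama2 endpoint roughly 20% of the time. A minimal sketch of that selection logic, for illustration only (`completion_with_split_tests` handles this internally, and the weights here are assumed to sum to 1.0):

```python
import random

split_per_model = {
    "gpt-4": 0.8,
    "huggingface/https://my-unique-endpoint.us-east-1.aws.endpoints.huggingface.cloud": 0.2
}

# pick one model per request, proportionally to its weight
chosen_model = random.choices(
    population=list(split_per_model.keys()),
    weights=list(split_per_model.values()),
    k=1,
)[0]
print(chosen_model)  # "gpt-4" roughly 80% of the time
```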

## 👨‍💻 Complete Code

### a) For Local

If we're testing this in a script, this is what our complete code looks like.

```python
from litellm import completion_with_split_tests
import os

## set ENV variables
os.environ["OPENAI_API_KEY"] = "openai key"
os.environ["HUGGINGFACE_API_KEY"] = "huggingface key"

## route 20% of responses to Llama2
split_per_model = {
    "gpt-4": 0.8,
    "huggingface/https://my-unique-endpoint.us-east-1.aws.endpoints.huggingface.cloud": 0.2
}

messages = [{"content": "Hello, how are you?", "role": "user"}]

completion_with_split_tests(
    models=split_per_model,
    messages=messages,
)
```

### b) For Production

In production, we don't want to keep going into the code to change model/test details (prompt, split %, etc.) for our completion function and redeploying the change.

LiteLLM exposes a client dashboard to do this in a UI - it instantly updates our completion function in prod.

#### Relevant Code

```python
completion_with_split_tests(..., use_client=True, id="my-unique-id")
```

#### Complete Code

```python
from litellm import completion_with_split_tests
import os

## set ENV variables
os.environ["OPENAI_API_KEY"] = "openai key"
os.environ["HUGGINGFACE_API_KEY"] = "huggingface key"

## route 20% of responses to Llama2
split_per_model = {
    "gpt-4": 0.8,
    "huggingface/https://my-unique-endpoint.us-east-1.aws.endpoints.huggingface.cloud": 0.2
}

messages = [{"content": "Hello, how are you?", "role": "user"}]

completion_with_split_tests(
    models=split_per_model,
    messages=messages,
    use_client=True,
    id="my-unique-id"  # Auto-create this @ https://admin.litellm.ai/
)
```

docs/my-website/docs/tutorials/litellm_proxy_aporia.md (new file, 163 lines)

@@ -0,0 +1,163 @@
import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Use LiteLLM AI Gateway with Aporia Guardrails

In this tutorial, we'll use LiteLLM Proxy with Aporia to detect PII in requests and profanity in responses.

## 1. Set up guardrails on Aporia

### Create Aporia Projects

Create two projects on [Aporia](https://guardrails.aporia.com/):

1. Pre LLM API Call - set all the policies you want to run before the LLM API call
2. Post LLM API Call - set all the policies you want to run after the LLM API call

<Image img={require('../../img/aporia_projs.png')} />

### Pre-Call: Detect PII

Add the `PII - Prompt` policy to your Pre LLM API Call project.

<Image img={require('../../img/aporia_pre.png')} />

### Post-Call: Detect Profanity in Responses

Add the `Toxicity - Response` policy to your Post LLM API Call project.

<Image img={require('../../img/aporia_post.png')} />

## 2. Define Guardrails on your LiteLLM config.yaml

```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: openai/gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

guardrails:
  pre_call:   # guardrail only runs on input before LLM API call
    guardrail: "aporia"  # supported values ["aporia", "bedrock", "lakera"]
    api_key: os.environ/APORIA_API_KEY_1
    api_base: os.environ/APORIA_API_BASE_1
  post_call:  # guardrail only runs on output after LLM API call
    guardrail: "aporia"  # supported values ["aporia", "bedrock", "lakera"]
    api_key: os.environ/APORIA_API_KEY_2
    api_base: os.environ/APORIA_API_BASE_2
```

## 3. Start LiteLLM Gateway

```shell
litellm --config config.yaml --detailed_debug
```

## 4. Test request

<Tabs>
<TabItem label="Fails Guardrail" value="not-allowed">

Expect this request to fail, since `ishaan@berri.ai` in the request is PII.

```shell
curl -i http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "hi my email is ishaan@berri.ai"}
    ]
  }'
```

</TabItem>

<TabItem label="Success" value="allowed">

```shell
curl -i http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "hi what is the weather?"}
    ]
  }'
```

</TabItem>

</Tabs>
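
The gateway exposes an OpenAI-compatible `/chat/completions` endpoint (the curl requests above hit it directly), so the same test can be run from application code. A minimal sketch using the OpenAI Python SDK, assuming the gateway from step 3 is running on `localhost:4000` with the example `sk-1234` key:

```python
from openai import OpenAI, OpenAIError

# point the standard OpenAI client at the LiteLLM gateway
client = OpenAI(api_key="sk-1234", base_url="http://localhost:4000")

try:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "hi my email is ishaan@berri.ai"}],  # contains PII
    )
    print(response.choices[0].message.content)
except OpenAIError as e:
    # the pre-call Aporia guardrail is expected to block this request;
    # the exact error shape / status code depends on your guardrail config
    print("Request blocked:", e)
```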

## Advanced

### Control Guardrails per Project (API Key)

Use this to control which guardrails run per project (API key). In this tutorial, we only want the following guardrails to run for one project:

- pre_call: aporia
- post_call: aporia

**Step 1** Create a key with the guardrail settings

<Tabs>
<TabItem value="/key/generate" label="/key/generate">

```shell
curl -X POST 'http://0.0.0.0:4000/key/generate' \
    -H 'Authorization: Bearer sk-1234' \
    -H 'Content-Type: application/json' \
    -d '{
        "guardrails": {
            "pre_call": ["aporia"],
            "post_call": ["aporia"]
        }
    }'
```

</TabItem>

<TabItem value="/key/update" label="/key/update">

```shell
curl --location 'http://0.0.0.0:4000/key/update' \
    --header 'Authorization: Bearer sk-1234' \
    --header 'Content-Type: application/json' \
    --data '{
        "key": "sk-jNm1Zar7XfNdZXp49Z1kSQ",
        "guardrails": {
            "pre_call": ["aporia"],
            "post_call": ["aporia"]
        }
    }'
```

</TabItem>

</Tabs>
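
If you prefer to script key creation, the same call can be made programmatically. A minimal sketch with Python `requests`, mirroring the `/key/generate` curl above (assumes the gateway runs on `0.0.0.0:4000` with the example `sk-1234` master key):

```python
import requests

# programmatic equivalent of the /key/generate curl above
resp = requests.post(
    "http://0.0.0.0:4000/key/generate",
    headers={"Authorization": "Bearer sk-1234", "Content-Type": "application/json"},
    json={"guardrails": {"pre_call": ["aporia"], "post_call": ["aporia"]}},
)
print(resp.json())  # response contains the newly generated virtual key
```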

**Step 2** Test it with the new key

```shell
curl --location 'http://0.0.0.0:4000/chat/completions' \
    --header 'Authorization: Bearer sk-jNm1Zar7XfNdZXp49Z1kSQ' \
    --header 'Content-Type: application/json' \
    --data '{
        "model": "gpt-3.5-turbo",
        "messages": [
            {
                "role": "user",
                "content": "my email is ishaan@berri.ai"
            }
        ]
    }'
```
BIN docs/my-website/img/aporia_post.png (new file, binary file not shown, 250 KiB)
BIN docs/my-website/img/aporia_pre.png (new file, binary file not shown, 277 KiB)
BIN docs/my-website/img/aporia_projs.png (new file, binary file not shown, 153 KiB)
@@ -248,6 +248,7 @@ const sidebars = {
         type: "category",
         label: "Tutorials",
         items: [
+          'tutorials/litellm_proxy_aporia',
           'tutorials/azure_openai',
           'tutorials/instructor',
           "tutorials/gradio_integration",
@@ -4,8 +4,15 @@ model_list:
       model: openai/gpt-3.5-turbo
       api_key: os.environ/OPENAI_API_KEY
 
-litellm_settings:
-  guardrails:
-    - prompt_injection:
-        callbacks: [aporio_prompt_injection]
-        default_on: true
+guardrails:
+  - guardrail_name: prompt_injection_detection
+    litellm_params:
+      guardrail_name: openai/gpt-3.5-turbo
+      api_key: os.environ/OPENAI_API_KEY
+      api_base: os.environ/OPENAI_API_BASE
+  - guardrail_name: prompt_injection_detection
+    litellm_params:
+      guardrail_name: openai/gpt-3.5-turbo
+      api_key: os.environ/OPENAI_API_KEY
+      api_base: os.environ/OPENAI_API_BASE