forked from phoenix/litellm-mirror
doc aporia_w_litellm
parent a0361b0e76, commit b7d4031f89
7 changed files with 176 additions and 103 deletions

@@ -1,98 +0,0 @@
import Image from '@theme/IdealImage';

# Split traffic between GPT-4 and Llama2 in Production!

In this tutorial, we'll walk through A/B testing between GPT-4 and Llama2 in production. We'll assume you've deployed Llama2 on Huggingface Inference Endpoints (but any of TogetherAI, Baseten, Ollama, Petals, or Openrouter should work as well).

# Relevant Resources:

* 🚀 [Your production dashboard!](https://admin.litellm.ai/)
* [Deploying models on Huggingface](https://huggingface.co/docs/inference-endpoints/guides/create_endpoint)
* [All supported providers on LiteLLM](https://docs.litellm.ai/docs/providers)

# Code Walkthrough

In production, we don't know whether Llama2 will:

* return good results
* respond quickly

### 💡 Route 20% traffic to Llama2

If Llama2 returns poor answers or is extremely slow, we want to roll back this change and use GPT-4 instead.

Instead of routing 100% of our traffic to Llama2, let's **start by routing 20% of traffic** to it and see how it does.

```python
## route 20% of responses to Llama2
split_per_model = {
    "gpt-4": 0.8,
    "huggingface/https://my-unique-endpoint.us-east-1.aws.endpoints.huggingface.cloud": 0.2
}
```
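
For intuition, here is a minimal sketch of what a weighted split like this amounts to on a per-request basis. This is purely illustrative, not LiteLLM's internal routing code:

```python
import random

## illustrative only: pick one model per request according to the split weights
split_per_model = {
    "gpt-4": 0.8,
    "huggingface/https://my-unique-endpoint.us-east-1.aws.endpoints.huggingface.cloud": 0.2
}

models = list(split_per_model.keys())
weights = list(split_per_model.values())

# roughly 80% of calls pick "gpt-4", 20% pick the Llama2 endpoint
chosen_model = random.choices(models, weights=weights, k=1)[0]
print(chosen_model)
```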

## 👨‍💻 Complete Code

### a) For Local

If we're testing this in a script, this is what our complete code looks like.

```python
from litellm import completion_with_split_tests
import os

## set ENV variables
os.environ["OPENAI_API_KEY"] = "openai key"
os.environ["HUGGINGFACE_API_KEY"] = "huggingface key"

## route 20% of responses to Llama2
split_per_model = {
    "gpt-4": 0.8,
    "huggingface/https://my-unique-endpoint.us-east-1.aws.endpoints.huggingface.cloud": 0.2
}

messages = [{"content": "Hello, how are you?", "role": "user"}]

completion_with_split_tests(
    models=split_per_model,
    messages=messages,
)
```

### b) For Production

In production, we don't want to keep going into the code to change model/test details (prompt, split %, etc.) for our completion function and redeploying the changes.

LiteLLM exposes a client dashboard to do this in a UI, and it instantly updates our completion function in prod.

#### Relevant Code

```python
completion_with_split_tests(..., use_client=True, id="my-unique-id")
```

#### Complete Code

```python
from litellm import completion_with_split_tests
import os

## set ENV variables
os.environ["OPENAI_API_KEY"] = "openai key"
os.environ["HUGGINGFACE_API_KEY"] = "huggingface key"

## route 20% of responses to Llama2
split_per_model = {
    "gpt-4": 0.8,
    "huggingface/https://my-unique-endpoint.us-east-1.aws.endpoints.huggingface.cloud": 0.2
}

messages = [{"content": "Hello, how are you?", "role": "user"}]

completion_with_split_tests(
    models=split_per_model,
    messages=messages,
    use_client=True,
    id="my-unique-id"  # Auto-create this @ https://admin.litellm.ai/
)
```

docs/my-website/docs/tutorials/litellm_proxy_aporia.md (new file, 163 lines)

@@ -0,0 +1,163 @@

import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Use LiteLLM AI Gateway with Aporia Guardrails

In this tutorial we will use LiteLLM Proxy with Aporia to detect PII in requests and profanity in responses.

## 1. Setup guardrails on Aporia

### Create Aporia Projects

Create two projects on [Aporia](https://guardrails.aporia.com/):

1. Pre LLM API Call - Set all the policies you want to run on the pre LLM API call
2. Post LLM API Call - Set all the policies you want to run on the post LLM API call

<Image img={require('../../img/aporia_projs.png')} />

### Pre-Call: Detect PII

Add the `PII - Prompt` policy to your Pre LLM API Call project.

<Image img={require('../../img/aporia_pre.png')} />

### Post-Call: Detect Profanity in Responses

Add the `Toxicity - Response` policy to your Post LLM API Call project.

<Image img={require('../../img/aporia_post.png')} />

## 2. Define Guardrails on your LiteLLM config.yaml

```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: openai/gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

guardrails:
  pre_call: # guardrail only runs on input before LLM API call
    guardrail: "aporia" # supported values ["aporia", "bedrock", "lakera"]
    api_key: os.environ/APORIA_API_KEY_1
    api_base: os.environ/APORIA_API_BASE_1
  post_call: # guardrail only runs on output after LLM API call
    guardrail: "aporia" # supported values ["aporia", "bedrock", "lakera"]
    api_key: os.environ/APORIA_API_KEY_2
    api_base: os.environ/APORIA_API_BASE_2
```
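
The `os.environ/...` values above are read from the proxy's environment at startup. As an optional sanity check before launching, you can confirm they are set — a small sketch using the variable names from the config above (adjust if yours differ):

```python
import os

# variable names referenced in the config above; adjust to your setup
required = [
    "OPENAI_API_KEY",
    "APORIA_API_KEY_1", "APORIA_API_BASE_1",
    "APORIA_API_KEY_2", "APORIA_API_BASE_2",
]

missing = [name for name in required if not os.environ.get(name)]
if missing:
    raise SystemExit(f"Missing environment variables: {missing}")
print("All guardrail environment variables are set")
```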

## 3. Start LiteLLM Gateway

```shell
litellm --config config.yaml --detailed_debug
```
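
The gateway listens on port 4000 by default. A quick way to confirm it is reachable from Python — this assumes your LiteLLM version exposes a `/health/liveliness` endpoint (check your version's docs; any request to the port also tells you whether the server is up):

```python
import requests

# assumes the proxy is running locally on the default port 4000
# /health/liveliness is assumed to be available on this LiteLLM version
resp = requests.get("http://localhost:4000/health/liveliness", timeout=5)
print(resp.status_code, resp.text)
```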

## 4. Test request

<Tabs>
<TabItem label="Fails Guardrail" value="not-allowed">

Expect this to fail, since `ishaan@berri.ai` in the request is PII.

```shell
curl -i http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "hi my email is ishaan@berri.ai"}
    ]
  }'
```

</TabItem>

<TabItem label="Success" value="allowed">

```shell
curl -i http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "hi what is the weather?"}
    ]
  }'
```

</TabItem>
</Tabs>
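
The same requests can also be sent through the OpenAI Python SDK pointed at the gateway — a minimal sketch, assuming the proxy is running locally on port 4000 with the `sk-1234` key used above and `openai>=1.0` is installed:

```python
from openai import OpenAI

# point the OpenAI client at the LiteLLM gateway instead of api.openai.com
client = OpenAI(api_key="sk-1234", base_url="http://localhost:4000")

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "hi what is the weather?"}],
)
print(response.choices[0].message.content)
```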

## Advanced
### Control Guardrails per Project (API Key)

Use this to control which guardrail(s) run per project. In this tutorial we only want the following guardrails to run for one project:
- pre_call: aporia
- post_call: aporia

**Step 1** Create a key with guardrail settings

<Tabs>
<TabItem value="/key/generate" label="/key/generate">

```shell
curl -X POST 'http://0.0.0.0:4000/key/generate' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "guardrails": {
      "pre_call": ["aporia"],
      "post_call": ["aporia"]
    }
  }'
```

</TabItem>
<TabItem value="/key/update" label="/key/update">

```shell
curl --location 'http://0.0.0.0:4000/key/update' \
  --header 'Authorization: Bearer sk-1234' \
  --header 'Content-Type: application/json' \
  --data '{
    "key": "sk-jNm1Zar7XfNdZXp49Z1kSQ",
    "guardrails": {
      "pre_call": ["aporia"],
      "post_call": ["aporia"]
    }
  }'
```

</TabItem>
</Tabs>

**Step 2** Test it with the new key

```shell
curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Authorization: Bearer sk-jNm1Zar7XfNdZXp49Z1kSQ' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "user",
        "content": "my email is ishaan@berri.ai"
      }
    ]
  }'
```
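
If you are calling the proxy from Python, the blocked request surfaces as an API error you can catch. A sketch assuming `openai>=1.0` and that the guardrail rejection comes back as a non-2xx status (the exact error message depends on the guardrail):

```python
import openai

# use the key created in Step 1, pointed at the proxy
client = openai.OpenAI(api_key="sk-jNm1Zar7XfNdZXp49Z1kSQ", base_url="http://0.0.0.0:4000")

try:
    client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "my email is ishaan@berri.ai"}],
    )
except openai.APIStatusError as e:
    # expected: the pre_call Aporia guardrail blocks the PII in this prompt
    print("Request blocked:", e.status_code, e.message)
```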

docs/my-website/img/aporia_post.png (new binary file, not shown; 250 KiB)
docs/my-website/img/aporia_pre.png (new binary file, not shown; 277 KiB)
docs/my-website/img/aporia_projs.png (new binary file, not shown; 153 KiB)

@@ -248,6 +248,7 @@ const sidebars = {
        type: "category",
        label: "Tutorials",
        items: [
          'tutorials/litellm_proxy_aporia',
          'tutorials/azure_openai',
          'tutorials/instructor',
          "tutorials/gradio_integration",

@@ -4,8 +4,15 @@ model_list:
      model: openai/gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  guardrails:
    - prompt_injection:
        callbacks: [aporio_prompt_injection]
        default_on: true
guardrails:
  - guardrail_name: prompt_injection_detection
    litellm_params:
      guardrail_name: openai/gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY
      api_base: os.environ/OPENAI_API_BASE
  - guardrail_name: prompt_injection_detection
    litellm_params:
      guardrail_name: openai/gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY
      api_base: os.environ/OPENAI_API_BASE