doc aporia_w_litellm

Ishaan Jaff 2024-08-19 14:36:55 -07:00
parent a0361b0e76
commit b7d4031f89
7 changed files with 176 additions and 103 deletions


@@ -1,98 +0,0 @@
import Image from '@theme/IdealImage';
# Split traffic between GPT-4 and Llama2 in Production!
In this tutorial, we'll walk through A/B testing between GPT-4 and Llama2 in production. We'll assume you've deployed Llama2 on Huggingface Inference Endpoints (but any of TogetherAI, Baseten, Ollama, Petals, or OpenRouter should work as well).
# Relevant Resources:
* 🚀 [Your production dashboard!](https://admin.litellm.ai/)
* [Deploying models on Huggingface](https://huggingface.co/docs/inference-endpoints/guides/create_endpoint)
* [All supported providers on LiteLLM](https://docs.litellm.ai/docs/providers)
# Code Walkthrough
In production, we don't know whether Llama2 will:
* return good results
* respond quickly
### 💡 Route 20% of traffic to Llama2
If Llama2 returns poor answers or is extremely slow, we want to roll back this change and use GPT-4 instead.
Instead of routing 100% of our traffic to Llama2, let's **start by routing 20% of traffic** to it and see how it does.
```python
## route 20% of responses to Llama2
split_per_model = {
"gpt-4": 0.8,
"huggingface/https://my-unique-endpoint.us-east-1.aws.endpoints.huggingface.cloud": 0.2
}
```
## 👨‍💻 Complete Code
### a) For Local
If we're testing this in a script, this is what our complete code looks like.
```python
from litellm import completion_with_split_tests
import os
## set ENV variables
os.environ["OPENAI_API_KEY"] = "openai key"
os.environ["HUGGINGFACE_API_KEY"] = "huggingface key"
## route 20% of responses to Llama2
split_per_model = {
    "gpt-4": 0.8,
    "huggingface/https://my-unique-endpoint.us-east-1.aws.endpoints.huggingface.cloud": 0.2
}

messages = [{"content": "Hello, how are you?", "role": "user"}]

completion_with_split_tests(
    models=split_per_model,
    messages=messages,
)
```
### b) For Production
If we're in production, we don't want to keep editing code to change model/test details (prompt, split %, etc.) for our completion function and redeploying those changes.
LiteLLM exposes a client dashboard to do this in a UI, and it instantly updates our completion function in prod.
#### Relevant Code
```python
completion_with_split_tests(..., use_client=True, id="my-unique-id")
```
#### Complete Code
```python
from litellm import completion_with_split_tests
import os
## set ENV variables
os.environ["OPENAI_API_KEY"] = "openai key"
os.environ["HUGGINGFACE_API_KEY"] = "huggingface key"
## route 20% of responses to Llama2
split_per_model = {
    "gpt-4": 0.8,
    "huggingface/https://my-unique-endpoint.us-east-1.aws.endpoints.huggingface.cloud": 0.2
}

messages = [{"content": "Hello, how are you?", "role": "user"}]

completion_with_split_tests(
    models=split_per_model,
    messages=messages,
    use_client=True,
    id="my-unique-id"  # Auto-create this @ https://admin.litellm.ai/
)
```


@@ -0,0 +1,163 @@
import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Use LiteLLM AI Gateway with Aporia Guardrails
In this tutorial, we will use LiteLLM Proxy with Aporia to detect PII in requests and profanity in responses.
## 1. Setup guardrails on Aporia
### Create Aporia Projects
Create two projects on [Aporia](https://guardrails.aporia.com/):
1. Pre LLM API Call - set all the policies you want to run on the input, before the LLM API call
2. Post LLM API Call - set all the policies you want to run on the output, after the LLM API call
<Image img={require('../../img/aporia_projs.png')} />
### Pre-Call: Detect PII
Add the `PII - Prompt` policy to your Pre LLM API Call project.
<Image img={require('../../img/aporia_pre.png')} />
### Post-Call: Detect Profanity in Responses
Add the `Toxicity - Response` policy to your Post LLM API Call project.
<Image img={require('../../img/aporia_post.png')} />
## 2. Define Guardrails on your LiteLLM config.yaml
```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: openai/gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

guardrails:
  pre_call:   # guardrail only runs on input before the LLM API call
    guardrail: "aporia"   # supported values ["aporia", "bedrock", "lakera"]
    api_key: os.environ/APORIA_API_KEY_1
    api_base: os.environ/APORIA_API_BASE_1
  post_call:  # guardrail only runs on output after the LLM API call
    guardrail: "aporia"   # supported values ["aporia", "bedrock", "lakera"]
    api_key: os.environ/APORIA_API_KEY_2
    api_base: os.environ/APORIA_API_BASE_2
```
## 3. Start LiteLLM Gateway
```shell
litellm --config config.yaml --detailed_debug
```
## 4. Test request
<Tabs>
<TabItem label="Fails Guardrail" value = "not-allowed">
Expect this to fail since since `ishaan@berri.ai` in the request is PII
```shell
curl -i http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "hi my email is ishaan@berri.ai"}
    ]
  }'
```
</TabItem>
<TabItem label="Success" value = "allowed">
```shell
curl -i http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "hi what is the weather?"}
    ]
  }'
```
</TabItem>
</Tabs>
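
If you prefer testing from Python, here is a minimal sketch that sends the same two requests through the gateway using the OpenAI SDK. It assumes the proxy from step 3 is running locally on port 4000 with the `sk-1234` key used above; the exact exception raised when a guardrail blocks a request may vary by LiteLLM version.

```python
# Minimal sketch: exercise the Aporia guardrails through the LiteLLM gateway.
# Assumes the proxy from step 3 is running at http://localhost:4000 with key sk-1234.
import openai

client = openai.OpenAI(
    api_key="sk-1234",                 # proxy key used in the curl examples above
    base_url="http://localhost:4000",  # LiteLLM gateway, not api.openai.com
)

def ask(content: str) -> None:
    try:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": content}],
        )
        print("allowed:", response.choices[0].message.content)
    except openai.APIError as err:
        # Expected for the PII prompt: the pre-call guardrail rejects the request.
        print("blocked:", err)

ask("hi my email is ishaan@berri.ai")  # should be blocked (PII)
ask("hi what is the weather?")         # should succeed
```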
## Advanced
### Control Guardrails per Project (API Key)
Use this to control which guardrails run per project (API key). In this tutorial, we only want the following guardrails to run for one project:
- pre_call: aporia
- post_call: aporia
**Step 1** Create a key with guardrail settings
<Tabs>
<TabItem value="/key/generate" label="/key/generate">
```shell
curl -X POST 'http://0.0.0.0:4000/key/generate' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "guardrails": {
      "pre_call": ["aporia"],
      "post_call": ["aporia"]
    }
  }'
```
</TabItem>
<TabItem value="/key/update" label="/key/update">
```shell
curl --location 'http://0.0.0.0:4000/key/update' \
  --header 'Authorization: Bearer sk-1234' \
  --header 'Content-Type: application/json' \
  --data '{
    "key": "sk-jNm1Zar7XfNdZXp49Z1kSQ",
    "guardrails": {
      "pre_call": ["aporia"],
      "post_call": ["aporia"]
    }
  }'
```
</TabItem>
</Tabs>
**Step 2** Test it with the new key
```shell
curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Authorization: Bearer sk-jNm1Zar7XfNdZXp49Z1kSQ' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "user",
        "content": "my email is ishaan@berri.ai"
      }
    ]
  }'
```
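
The same two steps can also be scripted in Python. This is a sketch rather than the documented API surface: it assumes the `/key/generate` response returns the new key in a `key` field and that the gateway is running locally as above.

```python
# Sketch: create a key with per-key guardrail settings, then call the proxy with it.
# Assumes the gateway runs at http://0.0.0.0:4000 and /key/generate returns {"key": "sk-..."}.
import requests

BASE_URL = "http://0.0.0.0:4000"
MASTER_KEY = "sk-1234"

# Step 1: generate a key that only runs the aporia pre/post guardrails
resp = requests.post(
    f"{BASE_URL}/key/generate",
    headers={"Authorization": f"Bearer {MASTER_KEY}"},
    json={"guardrails": {"pre_call": ["aporia"], "post_call": ["aporia"]}},
)
resp.raise_for_status()
new_key = resp.json()["key"]  # assumption: the new key is returned under "key"

# Step 2: test the new key with a PII-containing prompt (expect the guardrail to block it)
test = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {new_key}"},
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "my email is ishaan@berri.ai"}],
    },
)
print(test.status_code, test.json())
```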

Three binary image files added (not shown): 250 KiB, 277 KiB, and 153 KiB.


@@ -248,6 +248,7 @@ const sidebars = {
       type: "category",
       label: "Tutorials",
       items: [
+        'tutorials/litellm_proxy_aporia',
         'tutorials/azure_openai',
         'tutorials/instructor',
         "tutorials/gradio_integration",


@@ -4,8 +4,15 @@ model_list:
       model: openai/gpt-3.5-turbo
       api_key: os.environ/OPENAI_API_KEY
-litellm_settings:
-  guardrails:
-    - prompt_injection:
-        callbacks: [aporio_prompt_injection]
-        default_on: true
+guardrails:
+  - guardrail_name: prompt_injection_detection
+    litellm_params:
+      guardrail_name: openai/gpt-3.5-turbo
+      api_key: os.environ/OPENAI_API_KEY
+      api_base: os.environ/OPENAI_API_BASE
+  - guardrail_name: prompt_injection_detection
+    litellm_params:
+      guardrail_name: openai/gpt-3.5-turbo
+      api_key: os.environ/OPENAI_API_KEY
+      api_base: os.environ/OPENAI_API_BASE