forked from phoenix/litellm-mirror

commit b7d4031f89 (parent a0361b0e76): doc aporia_w_litellm

7 changed files with 176 additions and 103 deletions
@@ -1,98 +0,0 @@
import Image from '@theme/IdealImage';

# Split traffic between GPT-4 and Llama2 in Production!

In this tutorial, we'll walk through A/B testing between GPT-4 and Llama2 in production. We'll assume you've deployed Llama2 on Huggingface Inference Endpoints (but any of TogetherAI, Baseten, Ollama, Petals, Openrouter should work as well).

# Relevant Resources:

* 🚀 [Your production dashboard!](https://admin.litellm.ai/)
* [Deploying models on Huggingface](https://huggingface.co/docs/inference-endpoints/guides/create_endpoint)
* [All supported providers on LiteLLM](https://docs.litellm.ai/docs/providers)

# Code Walkthrough

In production, we don't know whether Llama2 will:

* return good results
* respond quickly

### 💡 Route 20% traffic to Llama2

If Llama2 returns poor answers or is extremely slow, we want to roll back this change and use GPT-4 instead.

Instead of routing 100% of our traffic to Llama2, let's **start by routing 20% of traffic** to it and see how it does.

```python
## route 20% of responses to Llama2
split_per_model = {
    "gpt-4": 0.8,
    "huggingface/https://my-unique-endpoint.us-east-1.aws.endpoints.huggingface.cloud": 0.2
}
```
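Conceptually, `split_per_model` is just a probability distribution over models: each request goes to GPT-4 roughly 80% of the time and to the Llama2 endpoint roughly 20% of the time. A minimal sketch of that selection logic, for illustration only (`completion_with_split_tests` handles this internally, and the weights here are assumed to sum to 1.0):

```python
import random

split_per_model = {
    "gpt-4": 0.8,
    "huggingface/https://my-unique-endpoint.us-east-1.aws.endpoints.huggingface.cloud": 0.2
}

# pick one model per request, proportionally to its weight
chosen_model = random.choices(
    population=list(split_per_model.keys()),
    weights=list(split_per_model.values()),
    k=1,
)[0]
print(chosen_model)  # "gpt-4" roughly 80% of the time
```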

## 👨‍💻 Complete Code

### a) For Local

If we're testing this in a script, this is what our complete code looks like.

```python
from litellm import completion_with_split_tests
import os

## set ENV variables
os.environ["OPENAI_API_KEY"] = "openai key"
os.environ["HUGGINGFACE_API_KEY"] = "huggingface key"

## route 20% of responses to Llama2
split_per_model = {
    "gpt-4": 0.8,
    "huggingface/https://my-unique-endpoint.us-east-1.aws.endpoints.huggingface.cloud": 0.2
}

messages = [{"content": "Hello, how are you?", "role": "user"}]

completion_with_split_tests(
    models=split_per_model,
    messages=messages,
)
```

### b) For Production

In production, we don't want to keep going into the code to change model/test details (prompt, split %, etc.) for our completion function and redeploying the change.

LiteLLM exposes a client dashboard to do this in a UI - it instantly updates our completion function in prod.

#### Relevant Code

```python
completion_with_split_tests(..., use_client=True, id="my-unique-id")
```

#### Complete Code

```python
from litellm import completion_with_split_tests
import os

## set ENV variables
os.environ["OPENAI_API_KEY"] = "openai key"
os.environ["HUGGINGFACE_API_KEY"] = "huggingface key"

## route 20% of responses to Llama2
split_per_model = {
    "gpt-4": 0.8,
    "huggingface/https://my-unique-endpoint.us-east-1.aws.endpoints.huggingface.cloud": 0.2
}

messages = [{"content": "Hello, how are you?", "role": "user"}]

completion_with_split_tests(
    models=split_per_model,
    messages=messages,
    use_client=True,
    id="my-unique-id"  # Auto-create this @ https://admin.litellm.ai/
)
```

docs/my-website/docs/tutorials/litellm_proxy_aporia.md (new file, 163 lines)

@@ -0,0 +1,163 @@
import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Use LiteLLM AI Gateway with Aporia Guardrails

In this tutorial, we'll use LiteLLM Proxy with Aporia to detect PII in requests and profanity in responses.

## 1. Set up guardrails on Aporia

### Create Aporia Projects

Create two projects on [Aporia](https://guardrails.aporia.com/):

1. Pre LLM API Call - set all the policies you want to run before the LLM API call
2. Post LLM API Call - set all the policies you want to run after the LLM API call

<Image img={require('../../img/aporia_projs.png')} />

### Pre-Call: Detect PII

Add the `PII - Prompt` policy to your Pre LLM API Call project.

<Image img={require('../../img/aporia_pre.png')} />

### Post-Call: Detect Profanity in Responses

Add the `Toxicity - Response` policy to your Post LLM API Call project.

<Image img={require('../../img/aporia_post.png')} />

## 2. Define Guardrails on your LiteLLM config.yaml

```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: openai/gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

guardrails:
  pre_call:   # guardrail only runs on input before LLM API call
    guardrail: "aporia"  # supported values ["aporia", "bedrock", "lakera"]
    api_key: os.environ/APORIA_API_KEY_1
    api_base: os.environ/APORIA_API_BASE_1
  post_call:  # guardrail only runs on output after LLM API call
    guardrail: "aporia"  # supported values ["aporia", "bedrock", "lakera"]
    api_key: os.environ/APORIA_API_KEY_2
    api_base: os.environ/APORIA_API_BASE_2
```

## 3. Start LiteLLM Gateway

```shell
litellm --config config.yaml --detailed_debug
```

## 4. Test request

<Tabs>
<TabItem label="Fails Guardrail" value="not-allowed">

Expect this request to fail, since `ishaan@berri.ai` in the request is PII.

```shell
curl -i http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "hi my email is ishaan@berri.ai"}
    ]
  }'
```

</TabItem>

<TabItem label="Success" value="allowed">

```shell
curl -i http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "user", "content": "hi what is the weather?"}
    ]
  }'
```

</TabItem>

</Tabs>
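
The gateway exposes an OpenAI-compatible `/chat/completions` endpoint (the curl requests above hit it directly), so the same test can be run from application code. A minimal sketch using the OpenAI Python SDK, assuming the gateway from step 3 is running on `localhost:4000` with the example `sk-1234` key:

```python
from openai import OpenAI, OpenAIError

# point the standard OpenAI client at the LiteLLM gateway
client = OpenAI(api_key="sk-1234", base_url="http://localhost:4000")

try:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "hi my email is ishaan@berri.ai"}],  # contains PII
    )
    print(response.choices[0].message.content)
except OpenAIError as e:
    # the pre-call Aporia guardrail is expected to block this request;
    # the exact error shape / status code depends on your guardrail config
    print("Request blocked:", e)
```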

## Advanced

### Control Guardrails per Project (API Key)

Use this to control which guardrails run per project (API key). In this tutorial, we only want the following guardrails to run for one project:

- pre_call: aporia
- post_call: aporia

**Step 1** Create a key with the guardrail settings

<Tabs>
<TabItem value="/key/generate" label="/key/generate">

```shell
curl -X POST 'http://0.0.0.0:4000/key/generate' \
    -H 'Authorization: Bearer sk-1234' \
    -H 'Content-Type: application/json' \
    -d '{
        "guardrails": {
            "pre_call": ["aporia"],
            "post_call": ["aporia"]
        }
    }'
```

</TabItem>

<TabItem value="/key/update" label="/key/update">

```shell
curl --location 'http://0.0.0.0:4000/key/update' \
    --header 'Authorization: Bearer sk-1234' \
    --header 'Content-Type: application/json' \
    --data '{
        "key": "sk-jNm1Zar7XfNdZXp49Z1kSQ",
        "guardrails": {
            "pre_call": ["aporia"],
            "post_call": ["aporia"]
        }
    }'
```

</TabItem>

</Tabs>
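
If you prefer to script key creation, the same call can be made programmatically. A minimal sketch with Python `requests`, mirroring the `/key/generate` curl above (assumes the gateway runs on `0.0.0.0:4000` with the example `sk-1234` master key):

```python
import requests

# programmatic equivalent of the /key/generate curl above
resp = requests.post(
    "http://0.0.0.0:4000/key/generate",
    headers={"Authorization": "Bearer sk-1234", "Content-Type": "application/json"},
    json={"guardrails": {"pre_call": ["aporia"], "post_call": ["aporia"]}},
)
print(resp.json())  # response contains the newly generated virtual key
```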

**Step 2** Test it with the new key

```shell
curl --location 'http://0.0.0.0:4000/chat/completions' \
    --header 'Authorization: Bearer sk-jNm1Zar7XfNdZXp49Z1kSQ' \
    --header 'Content-Type: application/json' \
    --data '{
        "model": "gpt-3.5-turbo",
        "messages": [
            {
                "role": "user",
                "content": "my email is ishaan@berri.ai"
            }
        ]
    }'
```
BIN docs/my-website/img/aporia_post.png (new file, binary file not shown, 250 KiB)
BIN docs/my-website/img/aporia_pre.png (new file, binary file not shown, 277 KiB)
BIN docs/my-website/img/aporia_projs.png (new file, binary file not shown, 153 KiB)
@@ -248,6 +248,7 @@ const sidebars = {
         type: "category",
         label: "Tutorials",
         items: [
+          'tutorials/litellm_proxy_aporia',
           'tutorials/azure_openai',
           'tutorials/instructor',
           "tutorials/gradio_integration",
@@ -4,8 +4,15 @@ model_list:
       model: openai/gpt-3.5-turbo
       api_key: os.environ/OPENAI_API_KEY
 
-litellm_settings:
-  guardrails:
-    - prompt_injection:
-        callbacks: [aporio_prompt_injection]
-        default_on: true
+guardrails:
+  - guardrail_name: prompt_injection_detection
+    litellm_params:
+      guardrail_name: openai/gpt-3.5-turbo
+      api_key: os.environ/OPENAI_API_KEY
+      api_base: os.environ/OPENAI_API_BASE
+  - guardrail_name: prompt_injection_detection
+    litellm_params:
+      guardrail_name: openai/gpt-3.5-turbo
+      api_key: os.environ/OPENAI_API_KEY
+      api_base: os.environ/OPENAI_API_BASE