update tutorial

2023-09-02 10:50:44 -07:00 · 2023-09-02 10:50:44 -07:00 · d4d720010d
commit d4d720010d
parent 0a8aebc0de
2 changed files with 103 additions and 113 deletions
--- a/docs/my-website/docs/tutorials/ab_test_llms.md
+++ b/docs/my-website/docs/tutorials/ab_test_llms.md
@ -1,134 +1,123 @@
 import Image from '@theme/IdealImage';
-# A/B Test LLMs - In Production!
+# Replace GPT-4 with Llama2 in Production!
-
+In this tutorial, we'll walk through replacing our GPT-4 endpoint with Llama2 in production. We'll assume you've deployed Llama2 on Huggingface Inference Endpoints (but any of TogetherAI, Baseten, Ollama, Petals, Openrouter should work as well).
 ## Get Started here: https://admin.litellm.ai/
 ### A/B Test any LLMs in production in 10 lines of code
 <Image img={require('../../img/ab_test_code.png')} />
 ### A/B Testing Dashboard after running code
 <Image img={require('../../img/ab_test_logs.png')} />
-<!-- 
+# Relevant Links: 
-Resources: 
+
-* [Code](https://github.com/BerriAI/litellm/tree/main/cookbook/llm-ab-test-server)
+* 🚀 [Your production dashboard!](https://admin.litellm.ai/)
-* [Sample Dashboard](https://lite-llm-abtest-ui.vercel.app/ishaan_discord@berri.ai)
+
 * [Deploying models on Huggingface](https://huggingface.co/docs/inference-endpoints/guides/create_endpoint)
 * [All supported providers on LiteLLM](https://docs.litellm.ai/docs/completion/supported)
 # Code Walkthrough
 This is the main piece of code that we'll write to handle our A/B test logic. We'll cover specific details in [Setup](#setup)
 ### Define LLMs with their A/B test ratios
 In main.py set select the LLMs you want to AB test in `llm_dict` (and remember to set their API keys in the .env)!
-We support 5+ providers and 100+ LLMs: https://docs.litellm.ai/docs/completion/supported
+## 1. Replace GPT-4 with Llama2 
 LiteLLM is a *drop-in replacement* for the OpenAI python sdk, so let's replace our openai ChatCompletion call with a LiteLLM completion call. 
-```python
+### a) Replace Openai
-llm_dict = {
+Replace this
-    "gpt-4": 0.2,
+```python  
-    "together_ai/togethercomputer/llama-2-70b-chat": 0.4,
+openai.ChatCompletion.create(model="gpt-4", messages=messages)
-    "claude-2": 0.2,
+```
-    "claude-1.2": 0.2
+
 With this
 ```python  
 from litellm import completion
 completion(model="gpt-4", messages=messages)
 ```
 ### b) Replace GPT-4
 Assume Llama2 is deployed at this endpoint: "https://my-unique-endpoint.us-east-1.aws.endpoints.huggingface.cloud" on Huggingface. 
 ```python  
 from litellm import completion
 completion(model="huggingface/https://my-unique-endpoint.us-east-1.aws.endpoints.huggingface.cloud", messages=messages)
 ```
 ## 2. 😱 But what if Llama2 isn't good enough?
 In production, we don't know if Llama2 is going to provide:
 * good results 
 * quickly
 ### 💡 Split traffic b/w GPT-4 + Llama2
 If Llama2 returns poor answers / is extremely slow, we want to roll-back this change, and use GPT-4 instead.
 Instead of routing 100% of our traffic to Llama2, let's **start by just routing 20% traffic** to it and see how it does. 
 ```python 
 ## route 20% of responses to Llama2
 split_per_model = {
 	"gpt-4": 0.8, 
 	"huggingface/https://my-unique-endpoint.us-east-1.aws.endpoints.huggingface.cloud": 0.2
 }
 ```
-### Select LLM + Make Completion call
+## 3. Complete Code
 Call the model using litellm.completion_with_split_tests, this uses the weights passed in to randomly select one of your provided models. [See implementation code](https://github.com/BerriAI/litellm/blob/9ccdbcbd6f14dd18827f59f8a1f9fd52d70443bb/litellm/utils.py#L1928)
-```python
+### a) For Local
 This is what our complete code looks like.
 ```python 
 from litellm import completion_with_split_tests
 import os 
-response = completion_with_split_tests(model=llm_dict, messages=[{ "content": "Hello, how are you?","role": "user"}])
+## set ENV variables
 os.environ["OPENAI_API_KEY"] = "openai key"
 os.environ["HUGGINGFACE_API_KEY"] = "huggingface key"
 ## route 20% of responses to Llama2
 split_per_model = {
 	"gpt-4": 0.8, 
 	"huggingface/https://my-unique-endpoint.us-east-1.aws.endpoints.huggingface.cloud": 0.2
 }
 messages = [{ "content": "Hello, how are you?","role": "user"}]
 completion_with_split_tests(
  models=split_per_model, 
  messages=messages, 
 )
 ```
-### Viewing Logs, Feedback 
+### b) For Production
-In order to view logs set `litellm.token=<your-email>`
+
 If we're in production, we don't want to keep going to code to change model/test details (prompt, split%, etc.) and redeploying changes. 
 LiteLLM exposes a client dashboard to do this in a UI - and instantly updates your test config in prod.
 #### Relevant Code 
 ```python
-import litellm
+completion_with_split_tests(..., use_client=True, id="my-unique-id")
 litellm.token='ishaan_discord@berri.ai'
 ```
-Here is what your logs dashboard looks like:
+#### Complete Code
 ```python 
 from litellm import completion_with_split_tests
 import os 
 ## set ENV variables
 os.environ["OPENAI_API_KEY"] = "openai key"
 os.environ["HUGGINGFACE_API_KEY"] = "huggingface key"
 ## route 20% of responses to Llama2
 split_per_model = {
 	"gpt-4": 0.8, 
 	"huggingface/https://my-unique-endpoint.us-east-1.aws.endpoints.huggingface.cloud": 0.2
 }
 messages = [{ "content": "Hello, how are you?","role": "user"}]
 completion_with_split_tests(
  models=split_per_model, 
  messages=messages, 
  use_client=True, 
  id="my-unique-id" # Auto-create this @ https://admin.litellm.ai/
 )
 ```
 ### A/B Testing Dashboard after running code - https://admin.litellm.ai/
 <Image img={require('../../img/ab_test_logs.png')} />
 Your logs will be available at: 
 https://lite-llm-abtest-nckmhi7ue-clerkieai.vercel.app/your-token
 ### Live Demo UI
 👉https://lite-llm-abtest-nckmhi7ue-clerkieai.vercel.app/ishaan_discord@berri.ai
 ## Viewing Responses + Custom Scores 
 LiteLLM UI allows you to view responses and set custom scores for each response
 ## Setup
 ### Install LiteLLM
 ```
 pip install litellm
 ```
 ### Clone LiteLLM Git Repo
 ```
 git clone https://github.com/BerriAI/litellm/
 ```
 ### Navigate to LiteLLM-A/B Test Server
 ```
 cd litellm/cookbook/llm-ab-test-server
 ```
 ### Run the Server 
 ```
 python3 main.py
 ```
 ## Testing our Server
 The server follows the Input/Output format set by the OpenAI Chat Completions API
 Here is an example request made the LiteLLM Server
 ### Python
 ```python
 import requests
 import json
 url = "http://localhost:5000/chat/completions"
 payload = json.dumps({
  "messages": [
    {
      "content": "who is CTO of litellm",
      "role": "user"
    }
  ]
 })
 headers = {
  'Content-Type': 'application/json'
 }
 response = requests.request("POST", url, headers=headers, data=payload)
 print(response.text)
 ```
 ### Curl Command
 ```
 curl --location 'http://localhost:5000/chat/completions' \
 --header 'Content-Type: application/json' \
 --data '{
    "messages": [
                    { 
                        "content": "who is CTO of litellm",
                        "role": "user"
                    }
                ]
 }
 '
 ```
 -->
--- a/docs/my-website/sidebars.js
+++ b/docs/my-website/sidebars.js
@ -49,6 +49,7 @@ const sidebars = {
      type: 'category',
      label: 'Tutorials',
      items: [
        'tutorials/ab_test_llms_locally',
        'tutorials/huggingface_tutorial', 
        'tutorials/TogetherAI_liteLLM', 
        'tutorials/compare_llms',