import Image from '@theme/IdealImage';

# Split traffic between GPT-4 and Llama2 in Production!

In this tutorial, we'll walk through A/B testing between GPT-4 and Llama2 in production. We'll assume you've deployed Llama2 on Huggingface Inference Endpoints (but any of TogetherAI, Baseten, Ollama, Petals, or OpenRouter should work as well).

# Relevant Resources:

* 🚀 [Your production dashboard!](https://admin.litellm.ai/)
* [Deploying models on Huggingface](https://huggingface.co/docs/inference-endpoints/guides/create_endpoint)
* [All supported providers on LiteLLM](https://docs.litellm.ai/docs/providers)

# Code Walkthrough

In production, we don't know whether Llama2 is going to provide:

* good results
* fast responses

### 💡 Route 20% of traffic to Llama2

If Llama2 returns poor answers or is extremely slow, we want to roll back this change and use GPT-4 instead. So instead of routing 100% of our traffic to Llama2, let's **start by routing 20% of traffic** to it and see how it does.

```python
## route 20% of responses to Llama2
split_per_model = {
    "gpt-4": 0.8,
    "huggingface/https://my-unique-endpoint.us-east-1.aws.endpoints.huggingface.cloud": 0.2
}
```

## 👨‍💻 Complete Code

### a) For Local

If we're testing this in a script, this is what our complete code looks like.

```python
from litellm import completion_with_split_tests
import os

## set ENV variables
os.environ["OPENAI_API_KEY"] = "openai key"
os.environ["HUGGINGFACE_API_KEY"] = "huggingface key"

## route 20% of responses to Llama2
split_per_model = {
    "gpt-4": 0.8,
    "huggingface/https://my-unique-endpoint.us-east-1.aws.endpoints.huggingface.cloud": 0.2
}

messages = [{"content": "Hello, how are you?", "role": "user"}]

completion_with_split_tests(
    models=split_per_model,
    messages=messages,
)
```

### b) For Production

In production, we don't want to keep going into the code to change model/test details (prompt, split %, etc.) for our completion function and redeploying those changes. LiteLLM exposes a client dashboard to do this in a UI - and it instantly updates our completion function in prod.

#### Relevant Code

```python
completion_with_split_tests(..., use_client=True, id="my-unique-id")
```

#### Complete Code

```python
from litellm import completion_with_split_tests
import os

## set ENV variables
os.environ["OPENAI_API_KEY"] = "openai key"
os.environ["HUGGINGFACE_API_KEY"] = "huggingface key"

## route 20% of responses to Llama2
split_per_model = {
    "gpt-4": 0.8,
    "huggingface/https://my-unique-endpoint.us-east-1.aws.endpoints.huggingface.cloud": 0.2
}

messages = [{"content": "Hello, how are you?", "role": "user"}]

completion_with_split_tests(
    models=split_per_model,
    messages=messages,
    use_client=True,
    id="my-unique-id"  # Auto-create this @ https://admin.litellm.ai/
)
```
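
#### How the split works (illustrative sketch)

Conceptually, a split like `split_per_model` just means making a weighted random choice of model on every request. Here's a minimal sketch of that idea using plain `litellm.completion` - note that `pick_model` is a hypothetical helper written for illustration, not LiteLLM's actual implementation.

```python
import os
import random

from litellm import completion

os.environ["OPENAI_API_KEY"] = "openai key"
os.environ["HUGGINGFACE_API_KEY"] = "huggingface key"

## route 20% of responses to Llama2
split_per_model = {
    "gpt-4": 0.8,
    "huggingface/https://my-unique-endpoint.us-east-1.aws.endpoints.huggingface.cloud": 0.2
}

def pick_model(split: dict) -> str:
    """Hypothetical helper: pick one model per request, weighted by its split %."""
    return random.choices(list(split.keys()), weights=list(split.values()), k=1)[0]

messages = [{"content": "Hello, how are you?", "role": "user"}]

## each call independently lands on GPT-4 ~80% of the time, Llama2 ~20%
response = completion(model=pick_model(split_per_model), messages=messages)
```

Over many requests, roughly 20% land on the Llama2 endpoint - enough traffic to judge its quality and latency without betting everything on it.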
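
#### Deciding when to roll back (illustrative sketch)

To decide whether Llama2 is "extremely slow," you need per-model latency numbers. Below is a minimal, hand-rolled sketch of logging latency per model - in practice you'd likely rely on the dashboard above or your own observability stack; `timed_completion` and `avg_latency` are hypothetical helpers, not LiteLLM APIs.

```python
import time
from collections import defaultdict

from litellm import completion

## latency samples per model, e.g. {"gpt-4": [0.9, 1.1, ...]}
latencies = defaultdict(list)

def timed_completion(model: str, messages: list):
    """Hypothetical wrapper: call the model and record how long it took."""
    start = time.perf_counter()
    response = completion(model=model, messages=messages)
    latencies[model].append(time.perf_counter() - start)
    return response

def avg_latency(model: str) -> float:
    """Average latency (seconds) for a model over the samples collected so far."""
    samples = latencies[model]
    return sum(samples) / len(samples) if samples else 0.0
```

If Llama2's average latency (or answer quality, judged however you prefer) falls short, drop its share in `split_per_model` back toward 0 and let GPT-4 take 100% of traffic again.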