
# Azure OpenAI

## API Keys

```python
import os

os.environ["AZURE_API_KEY"] = ""
os.environ["AZURE_API_BASE"] = ""
os.environ["AZURE_API_VERSION"] = ""
```

## Usage


### Completion - using .env variables

```python
import os
from litellm import completion

## set ENV variables
os.environ["AZURE_API_KEY"] = ""
os.environ["AZURE_API_BASE"] = ""
os.environ["AZURE_API_VERSION"] = ""

# azure call
response = completion(
    model="azure/<your_deployment_name>",
    messages=[{"content": "Hello, how are you?", "role": "user"}]
)
```
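litellm returns responses in the OpenAI format, so the reply text can be read from the first choice; a minimal sketch:

```python
# read the assistant's reply from the OpenAI-format response
print(response.choices[0].message.content)
```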

### Completion - using api_key, api_base, api_version

```python
import litellm

# azure call
response = litellm.completion(
    model="azure/<your deployment name>",   # the name of your azure deployment
    api_base="",                            # azure api base
    api_version="",                         # azure api version
    api_key="",                             # azure api key
    messages=[{"role": "user", "content": "good morning"}],
)
```
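Azure calls can also be streamed by passing `stream=True`, in which case litellm yields OpenAI-style chunks. A minimal sketch, assuming the same credentials as above:

```python
import litellm

# stream the azure response chunk by chunk
response = litellm.completion(
    model="azure/<your deployment name>",
    api_base="",
    api_version="",
    api_key="",
    messages=[{"role": "user", "content": "good morning"}],
    stream=True,
)
for chunk in response:
    # each chunk follows the OpenAI streaming format
    content = chunk.choices[0].delta.content
    if content is not None:
        print(content, end="")
```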

## Azure OpenAI Chat Completion Models

| Model Name | Function Call |
|------------|---------------|
| gpt-4 | `completion('azure/<your deployment name>', messages)` |
| gpt-4-0314 | `completion('azure/<your deployment name>', messages)` |
| gpt-4-0613 | `completion('azure/<your deployment name>', messages)` |
| gpt-4-32k | `completion('azure/<your deployment name>', messages)` |
| gpt-4-32k-0314 | `completion('azure/<your deployment name>', messages)` |
| gpt-4-32k-0613 | `completion('azure/<your deployment name>', messages)` |
| gpt-3.5-turbo | `completion('azure/<your deployment name>', messages)` |
| gpt-3.5-turbo-0301 | `completion('azure/<your deployment name>', messages)` |
| gpt-3.5-turbo-0613 | `completion('azure/<your deployment name>', messages)` |
| gpt-3.5-turbo-16k | `completion('azure/<your deployment name>', messages)` |
| gpt-3.5-turbo-16k-0613 | `completion('azure/<your deployment name>', messages)` |
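Note that the call is keyed on your Azure *deployment name*, not the model name in the left column. A minimal sketch, assuming a hypothetical deployment named `my-gpt4-deployment` that serves gpt-4:

```python
from litellm import completion

messages = [{"role": "user", "content": "Hello"}]

# "my-gpt4-deployment" is a hypothetical Azure deployment backed by gpt-4;
# litellm routes on the deployment name, not the underlying model name
response = completion("azure/my-gpt4-deployment", messages)
```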

## Azure API Load-Balancing

Use this if you're trying to load-balance across multiple Azure/OpenAI deployments.

`Router` prevents failed requests by picking the deployment that is below its rate limit and has used the fewest tokens.

In production, `Router` connects to a Redis cache to track usage across multiple deployments.

### Quick Start

```shell
pip install litellm
```
```python
import os
from litellm import Router

model_list = [{ # list of model deployments
    "model_name": "gpt-3.5-turbo", # openai model name
    "litellm_params": { # params for litellm completion/embedding call
        "model": "azure/chatgpt-v-2",
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    },
    "tpm": 240000,
    "rpm": 1800
}, {
    "model_name": "gpt-3.5-turbo", # openai model name
    "litellm_params": { # params for litellm completion/embedding call
        "model": "azure/chatgpt-functioncalling",
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE")
    },
    "tpm": 240000,
    "rpm": 1800
}, {
    "model_name": "gpt-3.5-turbo", # openai model name
    "litellm_params": { # params for litellm completion/embedding call
        "model": "gpt-3.5-turbo",
        "api_key": os.getenv("OPENAI_API_KEY"),
    },
    "tpm": 1000000,
    "rpm": 9000
}]

router = Router(model_list=model_list)

# openai.chat.completions.create replacement
response = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey, how's it going?"}]
)

print(response)
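```

`Router` also exposes an async variant with the same arguments; a minimal sketch using `router.acompletion`:

```python
import asyncio

async def main():
    # async drop-in for router.completion, same arguments
    response = await router.acompletion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hey, how's it going?"}]
    )
    print(response)

asyncio.run(main())
```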

### Redis Queue

```python
import os

router = Router(model_list=model_list,
                redis_host=os.getenv("REDIS_HOST"),
                redis_password=os.getenv("REDIS_PASSWORD"),
                redis_port=int(os.getenv("REDIS_PORT")))  # port must be an int
```