(cookbook) load test router

This commit is contained in:
ishaan-jaff 2023-11-20 15:43:15 -08:00
parent 6bb4d086c8
commit 356332d0da
3 changed files with 158 additions and 0 deletions

View file

@ -0,0 +1,43 @@
Given this context, what is litellm? LiteLLM about: About
Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs). LiteLLM manages
Translating inputs to the provider's completion and embedding endpoints
Guarantees consistent output, text responses will always be available at ['choices'][0]['message']['content']
Exception mapping - common exceptions across providers are mapped to the OpenAI exception types.
10/05/2023: LiteLLM is adopting Semantic Versioning for all commits. Learn more
10/16/2023: Self-hosted OpenAI-proxy server Learn more
Usage (Docs)
Important
LiteLLM v1.0.0 is being launched to require openai>=1.0.0. Track this here
Open In Colab
pip install litellm
from litellm import completion
import os
## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["COHERE_API_KEY"] = "your-cohere-key"
messages = [{ "content": "Hello, how are you?","role": "user"}]
# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)
# cohere call
response = completion(model="command-nightly", messages=messages)
print(response)
Streaming (Docs)
liteLLM supports streaming the model response back, pass stream=True to get a streaming iterator in response.
Streaming is supported for all models (Bedrock, Huggingface, TogetherAI, Azure, OpenAI, etc.)
from litellm import completion
response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)
for chunk in response:
print(chunk['choices'][0]['delta'])
# claude 2
result = completion('claude-2', messages, stream=True)
for chunk in result:
print(chunk['choices'][0]['delta'])

View file

@ -0,0 +1,65 @@
Does litellm support ooobagooba llms? how can i call oobagooba llms. Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs). LiteLLM manages
Translating inputs to the provider's completion and embedding endpoints
Guarantees consistent output, text responses will always be available at ['choices'][0]['message']['content']
Exception mapping - common exceptions across providers are mapped to the OpenAI exception types.
10/05/2023: LiteLLM is adopting Semantic Versioning for all commits. Learn more
10/16/2023: Self-hosted OpenAI-proxy server Learn more
Usage (Docs)
Important
LiteLLM v1.0.0 is being launched to require openai>=1.0.0. Track this here
Open In Colab
pip install litellm
from litellm import completion
import os
## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["COHERE_API_KEY"] = "your-cohere-key"
messages = [{ "content": "Hello, how are you?","role": "user"}]
# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)
# cohere call
response = completion(model="command-nightly", messages=messages)
print(response)
Streaming (Docs)
liteLLM supports streaming the model response back, pass stream=True to get a streaming iterator in response.
Streaming is supported for all models (Bedrock, Huggingface, TogetherAI, Azure, OpenAI, etc.)
from litellm import completion
response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)
for chunk in response:
print(chunk['choices'][0]['delta'])
# claude 2
result = completion('claude-2', messages, stream=True)
for chunk in result:
print(chunk['choices'][0]['delta']) Supported LiteLLM providers Supported Provider (Docs)
Provider Completion Streaming Async Completion Async Streaming
openai ✅ ✅ ✅ ✅
azure ✅ ✅ ✅ ✅
aws - sagemaker ✅ ✅ ✅ ✅
aws - bedrock ✅ ✅ ✅ ✅
cohere ✅ ✅ ✅ ✅
anthropic ✅ ✅ ✅ ✅
huggingface ✅ ✅ ✅ ✅
replicate ✅ ✅ ✅ ✅
together_ai ✅ ✅ ✅ ✅
openrouter ✅ ✅ ✅ ✅
google - vertex_ai ✅ ✅ ✅ ✅
google - palm ✅ ✅ ✅ ✅
ai21 ✅ ✅ ✅ ✅
baseten ✅ ✅ ✅ ✅
vllm ✅ ✅ ✅ ✅
nlp_cloud ✅ ✅ ✅ ✅
aleph alpha ✅ ✅ ✅ ✅
petals ✅ ✅ ✅ ✅
ollama ✅ ✅ ✅ ✅
deepinfra ✅ ✅ ✅ ✅
perplexity-ai ✅ ✅ ✅ ✅
anyscale ✅ ✅ ✅ ✅

View file

@ -0,0 +1,50 @@
What endpoints does the litellm proxy have 💥 OpenAI Proxy Server
LiteLLM Server manages:
Calling 100+ LLMs Huggingface/Bedrock/TogetherAI/etc. in the OpenAI ChatCompletions & Completions format
Set custom prompt templates + model-specific configs (temperature, max_tokens, etc.)
Quick Start
View all the supported args for the Proxy CLI here
$ litellm --model huggingface/bigcode/starcoder
#INFO: Proxy running on http://0.0.0.0:8000
Test
In a new shell, run, this will make an openai.ChatCompletion request
litellm --test
This will now automatically route any requests for gpt-3.5-turbo to bigcode starcoder, hosted on huggingface inference endpoints.
Replace openai base
import openai
openai.api_base = "http://0.0.0.0:8000"
print(openai.chat.completions.create(model="test", messages=[{"role":"user", "content":"Hey!"}]))
Supported LLMs
Bedrock
Huggingface (TGI)
Anthropic
VLLM
OpenAI Compatible Server
TogetherAI
Replicate
Petals
Palm
Azure OpenAI
AI21
Cohere
$ export AWS_ACCESS_KEY_ID=""
$ export AWS_REGION_NAME="" # e.g. us-west-2
$ export AWS_SECRET_ACCESS_KEY=""
$ litellm --model bedrock/anthropic.claude-v2
Server Endpoints
POST /chat/completions - chat completions endpoint to call 100+ LLMs
POST /completions - completions endpoint
POST /embeddings - embedding endpoint for Azure, OpenAI, Huggingface endpoints
GET /models - available models on server