Mirror of https://github.com/BerriAI/litellm.git, synced 2025-04-24 10:14:26 +00:00
(cookbook) load test router
This commit is contained in: parent 6bb4d086c8, commit 356332d0da
3 changed files with 158 additions and 0 deletions
43 cookbook/litellm_router/test_questions/question1.txt Normal file
@@ -0,0 +1,43 @@
Given this context, what is litellm?
LiteLLM - About:
Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs). LiteLLM manages:

Translating inputs to the provider's completion and embedding endpoints
Guarantees consistent output; text responses will always be available at ['choices'][0]['message']['content']
Exception mapping - common exceptions across providers are mapped to the OpenAI exception types.
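For example, the generated text can be read at the same place regardless of which provider served the call; a minimal sketch (assuming an OpenAI key is set in the environment, and the model names are only illustrative):

from litellm import completion

# the same indexing works for "command-nightly", "claude-2", etc.
resp = completion(model="gpt-3.5-turbo",
                  messages=[{"role": "user", "content": "Hi"}])
print(resp['choices'][0]['message']['content'])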
10/05/2023: LiteLLM is adopting Semantic Versioning for all commits. Learn more
10/16/2023: Self-hosted OpenAI-proxy server. Learn more
Usage (Docs)

Important
LiteLLM v1.0.0 is being launched to require openai>=1.0.0. Track this here

Open In Colab
pip install litellm

from litellm import completion
import os

## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["COHERE_API_KEY"] = "your-cohere-key"

messages = [{"content": "Hello, how are you?", "role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# cohere call
response = completion(model="command-nightly", messages=messages)
print(response)
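Because exceptions are mapped to the OpenAI exception types (see above), failures from any of these providers can be caught with a single set of handlers. A minimal sketch, assuming openai>=1.0.0 is installed alongside litellm:

import openai
from litellm import completion

try:
    response = completion(model="command-nightly", messages=messages)
except openai.RateLimitError as e:
    # provider rate limits surface as the OpenAI RateLimitError type
    print(f"rate limited, backing off: {e}")
except openai.APIError as e:
    # other mapped provider errors
    print(f"provider error: {e}")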
Streaming (Docs)
LiteLLM supports streaming the model response back; pass stream=True to get a streaming iterator in the response.
Streaming is supported for all models (Bedrock, Huggingface, TogetherAI, Azure, OpenAI, etc.)
from litellm import completion
response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)
for chunk in response:
    print(chunk['choices'][0]['delta'])

# claude 2
result = completion('claude-2', messages, stream=True)
for chunk in result:
    print(chunk['choices'][0]['delta'])
65 cookbook/litellm_router/test_questions/question2.txt Normal file
@@ -0,0 +1,65 @@
Does litellm support ooobagooba llms? how can i call oobagooba llms.
Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs). LiteLLM manages:

Translating inputs to the provider's completion and embedding endpoints
Guarantees consistent output; text responses will always be available at ['choices'][0]['message']['content']
Exception mapping - common exceptions across providers are mapped to the OpenAI exception types.

10/05/2023: LiteLLM is adopting Semantic Versioning for all commits. Learn more
10/16/2023: Self-hosted OpenAI-proxy server. Learn more
Usage (Docs)

Important
LiteLLM v1.0.0 is being launched to require openai>=1.0.0. Track this here

Open In Colab

pip install litellm

from litellm import completion
import os

## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["COHERE_API_KEY"] = "your-cohere-key"

messages = [{"content": "Hello, how are you?", "role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# cohere call
response = completion(model="command-nightly", messages=messages)
print(response)
Streaming (Docs)
LiteLLM supports streaming the model response back; pass stream=True to get a streaming iterator in the response.
Streaming is supported for all models (Bedrock, Huggingface, TogetherAI, Azure, OpenAI, etc.)
from litellm import completion
response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)
for chunk in response:
    print(chunk['choices'][0]['delta'])

# claude 2
result = completion('claude-2', messages, stream=True)
for chunk in result:
    print(chunk['choices'][0]['delta'])

Supported LiteLLM Providers (Docs)
Provider             Completion   Streaming   Async Completion   Async Streaming
openai               ✅           ✅          ✅                 ✅
azure                ✅           ✅          ✅                 ✅
aws - sagemaker      ✅           ✅          ✅                 ✅
aws - bedrock        ✅           ✅          ✅                 ✅
cohere               ✅           ✅          ✅                 ✅
anthropic            ✅           ✅          ✅                 ✅
huggingface          ✅           ✅          ✅                 ✅
replicate            ✅           ✅          ✅                 ✅
together_ai          ✅           ✅          ✅                 ✅
openrouter           ✅           ✅          ✅                 ✅
google - vertex_ai   ✅           ✅          ✅                 ✅
google - palm        ✅           ✅          ✅                 ✅
ai21                 ✅           ✅          ✅                 ✅
baseten              ✅           ✅          ✅                 ✅
vllm                 ✅           ✅          ✅                 ✅
nlp_cloud            ✅           ✅          ✅                 ✅
aleph alpha          ✅           ✅          ✅                 ✅
petals               ✅           ✅          ✅                 ✅
ollama               ✅           ✅          ✅                 ✅
deepinfra            ✅           ✅          ✅                 ✅
perplexity-ai        ✅           ✅          ✅                 ✅
anyscale             ✅           ✅          ✅                 ✅
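The Async Completion and Async Streaming columns above correspond to litellm's acompletion interface. A minimal sketch of both, assuming an OpenAI key is set in the environment:

import asyncio
from litellm import acompletion

async def main():
    messages = [{"content": "Hello, how are you?", "role": "user"}]

    # async completion
    response = await acompletion(model="gpt-3.5-turbo", messages=messages)
    print(response['choices'][0]['message']['content'])

    # async streaming
    stream = await acompletion(model="gpt-3.5-turbo", messages=messages, stream=True)
    async for chunk in stream:
        print(chunk['choices'][0]['delta'])

asyncio.run(main())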
50 cookbook/litellm_router/test_questions/question3.txt Normal file
@@ -0,0 +1,50 @@
What endpoints does the litellm proxy have? 💥 OpenAI Proxy Server
LiteLLM Server manages:

Calling 100+ LLMs (Huggingface/Bedrock/TogetherAI/etc.) in the OpenAI ChatCompletions & Completions format
Setting custom prompt templates + model-specific configs (temperature, max_tokens, etc.)
Quick Start
View all the supported args for the Proxy CLI here

$ litellm --model huggingface/bigcode/starcoder

#INFO: Proxy running on http://0.0.0.0:8000
Test
In a new shell, run the following; this will make an openai.ChatCompletion request:

litellm --test

This will now automatically route any requests for gpt-3.5-turbo to bigcode starcoder, hosted on Hugging Face Inference Endpoints.
Replace openai base

import openai

# point the openai client at the local LiteLLM proxy (openai>=1.0 style)
openai.base_url = "http://0.0.0.0:8000"

print(openai.chat.completions.create(model="test", messages=[{"role":"user", "content":"Hey!"}]))
Supported LLMs
Bedrock
Huggingface (TGI)
Anthropic
VLLM
OpenAI Compatible Server
TogetherAI
Replicate
Petals
Palm
Azure OpenAI
AI21
Cohere
$ export AWS_ACCESS_KEY_ID=""
$ export AWS_REGION_NAME="" # e.g. us-west-2
$ export AWS_SECRET_ACCESS_KEY=""

$ litellm --model bedrock/anthropic.claude-v2
Server Endpoints
POST /chat/completions - chat completions endpoint to call 100+ LLMs
POST /completions - completions endpoint
POST /embeddings - embedding endpoint for Azure, OpenAI, Huggingface endpoints
GET /models - available models on server
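For illustration, the /models and /chat/completions endpoints can be exercised with the openai>=1.0 client pointed at the proxy. A minimal sketch, assuming the proxy is running locally on port 8000 and no master key is configured (the api_key value is a placeholder):

import openai

# point the client at the local LiteLLM proxy
client = openai.OpenAI(base_url="http://0.0.0.0:8000", api_key="anything")

# GET /models - list the models the server exposes
print(client.models.list())

# POST /chat/completions - chat request routed through the proxy
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey!"}],
)
print(resp.choices[0].message.content)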