LiteLLM
Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs).

LiteLLM manages:
- Translating inputs to the provider's completion and embedding endpoints
- Guaranteeing consistent output: text responses are always available at ['choices'][0]['message']['content']
- Exception mapping: common exceptions across providers are mapped to the OpenAI exception types (see the sketch below)
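For example, here is a minimal sketch of what exception mapping buys you in practice. This example is illustrative, not from the original docs; it assumes OPENAI_API_KEY is set and that the mapped exceptions derive from openai's OpenAIError base class (openai>=1.0.0).

import openai
from litellm import completion

try:
    response = completion(
        model="gpt-3.5-turbo",
        messages=[{"content": "Hello, how are you?", "role": "user"}],
    )
    # consistent output shape: the text is always at this path
    print(response['choices'][0]['message']['content'])
except openai.OpenAIError as e:
    # provider-specific auth/rate-limit/timeout errors surface as OpenAI exception types
    print(f"LLM call failed: {e}")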
10/05/2023: LiteLLM is adopting Semantic Versioning for all commits. Learn more
10/16/2023: Self-hosted OpenAI proxy server. Learn more
Usage (Docs)

Important
LiteLLM v1.0.0 is being launched to require openai>=1.0.0. Track this here.
pip install litellm

from litellm import completion
import os

## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["COHERE_API_KEY"] = "your-cohere-key"

messages = [{"content": "Hello, how are you?", "role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# cohere call
response = completion(model="command-nightly", messages=messages)
print(response)
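The same interface covers the embedding endpoint mentioned above. A small sketch, assuming the OpenAI text-embedding-ada-002 model and that the response follows the OpenAI embedding format with the vector at ['data'][0]['embedding']:

from litellm import embedding

# openai embedding call (OPENAI_API_KEY must be set, as above)
response = embedding(model="text-embedding-ada-002", input=["Hello, how are you?"])
print(len(response['data'][0]['embedding']))  # dimensionality of the returned vector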
Streaming (Docs)

LiteLLM supports streaming the model response back; pass stream=True to get a streaming iterator in the response.
Streaming is supported for all models (Bedrock, Huggingface, TogetherAI, Azure, OpenAI, etc.).
from litellm import completion

response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)
for chunk in response:
    print(chunk['choices'][0]['delta'])

# claude 2 (assumes ANTHROPIC_API_KEY is set in the environment)
result = completion('claude-2', messages, stream=True)
for chunk in result:
    print(chunk['choices'][0]['delta'])
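Since each delta only carries a fragment of the reply, the chunks can be stitched back together. A small sketch, assuming the OpenAI streaming format where the delta behaves like a dict with an optional 'content' key:

from litellm import completion

messages = [{"content": "Hello, how are you?", "role": "user"}]

full_reply = ""
for chunk in completion(model="gpt-3.5-turbo", messages=messages, stream=True):
    delta = chunk['choices'][0]['delta']
    if delta.get('content'):  # the final chunk may carry no content
        full_reply += delta['content']
print(full_reply)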