
import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Quick Start
Quick start CLI, Config, Docker
LiteLLM Server (LLM Gateway) manages:
* **Unified Interface**: Calling 100+ LLMs [Huggingface/Bedrock/TogetherAI/etc.](#other-supported-models) in the OpenAI `ChatCompletions` & `Completions` format
* **Cost tracking**: Authentication, spend tracking & budgets with [Virtual Keys](https://docs.litellm.ai/docs/proxy/virtual_keys)
* **Load Balancing**: between [Multiple Models](#multiple-models---quick-start) + [Deployments of the same model](#multiple-instances-of-1-model) - LiteLLM proxy can handle 1.5k+ requests/second during load tests.
Install LiteLLM with the proxy dependencies:

```shell
$ pip install 'litellm[proxy]'
```
## Quick Start - LiteLLM Proxy CLI
Run the following command to start the LiteLLM proxy:
```shell
$ litellm --model huggingface/bigcode/starcoder
#INFO: Proxy running on http://0.0.0.0:4000
```
:::info

Run with `--detailed_debug` if you need detailed debug logs

```shell
$ litellm --model huggingface/bigcode/starcoder --detailed_debug
```

:::
### Test
In a new shell, run the following command. This makes an `openai.chat.completions` request to the proxy. Ensure you're using openai v1.0.0+.
```shell
litellm --test
```
This automatically routes any requests for `gpt-3.5-turbo` to `bigcode/starcoder`, hosted on Huggingface Inference Endpoints.
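You can also sanity check that the proxy is reachable yourself. A minimal sketch using curl against the `/models` endpoint (assuming the default port 4000 and no authentication configured):

```shell
curl http://0.0.0.0:4000/models
```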
### Supported LLMs
All LiteLLM supported LLMs are supported on the Proxy. See all [supported LLMs](https://docs.litellm.ai/docs/providers).
<Tabs>
<TabItem value="bedrock" label="AWS Bedrock">
```shell
$ export AWS_ACCESS_KEY_ID=
$ export AWS_REGION_NAME=
$ export AWS_SECRET_ACCESS_KEY=
```
```shell
$ litellm --model bedrock/anthropic.claude-v2
```
</TabItem>
<TabItem value="azure" label="Azure OpenAI">
```shell
$ export AZURE_API_KEY=my-api-key
$ export AZURE_API_BASE=my-api-base
```
```shell
$ litellm --model azure/my-deployment-name
```
</TabItem>
<TabItem value="openai" label="OpenAI">
```shell
$ export OPENAI_API_KEY=my-api-key
```
```shell
$ litellm --model gpt-3.5-turbo
```
</TabItem>
<TabItem value="ollama" label="Ollama">
```shell
$ litellm --model ollama/<ollama-model-name>
```
</TabItem>
<TabItem value="openai-proxy" label="OpenAI Compatible Endpoint">
```shell
$ export OPENAI_API_KEY=my-api-key
```
```shell
$ litellm --model openai/<your model name> --api_base <your-api-base> # e.g. http://0.0.0.0:3000
```
</TabItem>
<TabItem value="vertex-ai" label="Vertex AI [Gemini]">
```shell
$ export VERTEX_PROJECT="hardy-project"
$ export VERTEX_LOCATION="us-west"
```
```shell
$ litellm --model vertex_ai/gemini-pro
```
</TabItem>
<TabItem value="huggingface" label="Huggingface (TGI) Deployed">
```shell
$ export HUGGINGFACE_API_KEY=my-api-key #[OPTIONAL]
```
```shell
$ litellm --model huggingface/<your model name> --api_base <your-api-base> # e.g. http://0.0.0.0:3000
```
</TabItem>
<TabItem value="huggingface-local" label="Huggingface (TGI) Local">
```shell
$ litellm --model huggingface/<your model name> --api_base http://0.0.0.0:8001
```
</TabItem>
<TabItem value="aws-sagemaker" label="AWS Sagemaker">
```shell
$ export AWS_ACCESS_KEY_ID=
$ export AWS_REGION_NAME=
$ export AWS_SECRET_ACCESS_KEY=
```
```shell
$ litellm --model sagemaker/jumpstart-dft-meta-textgeneration-llama-2-7b
```
</TabItem>
<TabItem value="anthropic" label="Anthropic">
```shell
$ export ANTHROPIC_API_KEY=my-api-key
```
```shell
$ litellm --model claude-instant-1
```
</TabItem>
<TabItem value="vllm-local" label="VLLM">
Assuming you're running vllm locally
```shell
$ litellm --model vllm/facebook/opt-125m
```
</TabItem>
<TabItem value="together_ai" label="TogetherAI">
```shell
$ export TOGETHERAI_API_KEY=my-api-key
```
```shell
$ litellm --model together_ai/lmsys/vicuna-13b-v1.5-16k
```
</TabItem>
<TabItem value="replicate" label="Replicate">
```shell
$ export REPLICATE_API_KEY=my-api-key
```
```shell
$ litellm \
--model replicate/meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3
```
</TabItem>
<TabItem value="petals" label="Petals">
```shell
$ litellm --model petals/meta-llama/Llama-2-70b-chat-hf
```
</TabItem>
<TabItem value="palm" label="Palm">
```shell
$ export PALM_API_KEY=my-palm-key
```
```shell
$ litellm --model palm/chat-bison
```
</TabItem>
<TabItem value="ai21" label="AI21">
```shell
$ export AI21_API_KEY=my-api-key
```
```shell
$ litellm --model j2-light
```
</TabItem>
<TabItem value="cohere" label="Cohere">
```shell
$ export COHERE_API_KEY=my-api-key
```
```shell
$ litellm --model command-nightly
```
</TabItem>
</Tabs>
## Quick Start - LiteLLM Proxy + Config.yaml
The config allows you to create a model list and set any litellm param per model (e.g. `api_base`, `max_tokens`). See more details about the config [here](https://docs.litellm.ai/docs/proxy/configs).
### Create a Config for LiteLLM Proxy
Example config
```yaml
model_list:
  - model_name: gpt-3.5-turbo # user-facing model alias
    litellm_params: # all params accepted by litellm.completion() - https://docs.litellm.ai/docs/completion/input
      model: azure/<your-deployment-name>
      api_base: <your-azure-api-endpoint>
      api_key: <your-azure-api-key>
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-turbo-small-ca
      api_base: https://my-endpoint-canada-berri992.openai.azure.com/
      api_key: <your-azure-api-key>
  - model_name: vllm-model
    litellm_params:
      model: openai/<your-model-name>
      api_base: <your-api-base> # e.g. http://0.0.0.0:3000
```
### Run proxy with config
```shell
litellm --config your_config.yaml
```
## Using LiteLLM Proxy - Curl Request, OpenAI Package, Langchain
:::info
LiteLLM is compatible with several SDKs, including the OpenAI SDK, Anthropic SDK, Mistral SDK, LlamaIndex, and Langchain (JS, Python).
[More examples here](user_keys)
:::
<Tabs>
<TabItem value="Curl" label="Curl Request">
```shell
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "gpt-3.5-turbo",
    "messages": [
        {
            "role": "user",
            "content": "what llm are you"
        }
    ]
}'
```
</TabItem>
<TabItem value="openai" label="OpenAI v1.0.0+">
```python
import openai

client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": "this is a test request, write a short poem"
        }
    ]
)

print(response)
```
</TabItem>
<TabItem value="langchain" label="Langchain">
```python
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)
from langchain.schema import HumanMessage, SystemMessage

chat = ChatOpenAI(
    openai_api_base="http://0.0.0.0:4000", # set openai_api_base to the LiteLLM Proxy
    model="gpt-3.5-turbo",
    temperature=0.1
)

messages = [
    SystemMessage(
        content="You are a helpful assistant that im using to make a test request to."
    ),
    HumanMessage(
        content="test from litellm. tell me why it's amazing in 1 sentence"
    ),
]
response = chat(messages)

print(response)
```
</TabItem>
<TabItem value="langchain-embedding" label="Langchain Embeddings">
```python
from langchain.embeddings import OpenAIEmbeddings

text = "This is a test document."

embeddings = OpenAIEmbeddings(model="sagemaker-embeddings", openai_api_base="http://0.0.0.0:4000", openai_api_key="temp-key")
query_result = embeddings.embed_query(text)
print("SAGEMAKER EMBEDDINGS")
print(query_result[:5])

embeddings = OpenAIEmbeddings(model="bedrock-embeddings", openai_api_base="http://0.0.0.0:4000", openai_api_key="temp-key")
query_result = embeddings.embed_query(text)
print("BEDROCK EMBEDDINGS")
print(query_result[:5])

embeddings = OpenAIEmbeddings(model="bedrock-titan-embeddings", openai_api_base="http://0.0.0.0:4000", openai_api_key="temp-key")
query_result = embeddings.embed_query(text)
print("TITAN EMBEDDINGS")
print(query_result[:5])
```
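Note: the `sagemaker-embeddings`, `bedrock-embeddings`, and `bedrock-titan-embeddings` names above are assumed to be aliases defined in your proxy config's `model_list`. A minimal sketch of the equivalent raw request to the proxy's `/embeddings` endpoint:

```shell
curl --location 'http://0.0.0.0:4000/embeddings' \
--header 'Content-Type: application/json' \
--data '{
    "model": "sagemaker-embeddings",
    "input": ["This is a test document."]
}'
```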
</TabItem>
<TabItem value="litellm" label="LiteLLM SDK">
This is **not recommended**. The proxy itself is built on the LiteLLM SDK, so calling the proxy through the SDK duplicates that logic and might lead to unexpected errors.
```python
from litellm import completion

response = completion(
    model="openai/gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": "this is a test request, write a short poem"
        }
    ],
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

print(response)
```
</TabItem>
<TabItem value="anthropic-py" label="Anthropic Python SDK">
```python
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:4000", # proxy endpoint
    api_key="sk-s4xN1IiLTCytwtZFJaYQrA", # litellm proxy virtual key
)

message = client.messages.create(
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Hello, Claude",
        }
    ],
    model="claude-3-opus-20240229",
)
print(message.content)
```
</TabItem>
</Tabs>
[**More Info**](./configs.md)
## 📖 Proxy Endpoints - [Swagger Docs](https://litellm-api.up.railway.app/)
- POST `/chat/completions` - chat completions endpoint to call 100+ LLMs
- POST `/completions` - completions endpoint
- POST `/embeddings` - embedding endpoint for Azure, OpenAI, Huggingface endpoints
- GET `/models` - available models on server
- POST `/key/generate` - generate a key to access the proxy
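For example, a minimal sketch exercising two of these endpoints with curl (the master key `sk-1234` is an assumed placeholder; `/key/generate` requires a master key to be configured on the proxy):

```shell
# list the models available on the proxy
curl http://0.0.0.0:4000/models

# generate a virtual key (assumes a master key, e.g. sk-1234, is configured)
curl --location 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{"models": ["gpt-3.5-turbo"], "duration": "20m"}'
```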
## Debugging Proxy
Events that occur during normal operation
```shell
litellm --model gpt-3.5-turbo --debug
```
Detailed information
```shell
litellm --model gpt-3.5-turbo --detailed_debug
```
### Set Debug Level using env variables
Events that occur during normal operation
```shell
export LITELLM_LOG=INFO
```
Detailed information
```shell
export LITELLM_LOG=DEBUG
```
No Logs
```shell
export LITELLM_LOG=None
```
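For example, a minimal sketch that starts the proxy with debug-level logs via the environment variable:

```shell
export LITELLM_LOG=DEBUG
litellm --model gpt-3.5-turbo
```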