Merge branch 'main' into litellm_dev_11_13_2024

Krish Dholakia 2024-11-15 11:18:02 +05:30 committed by GitHub
commit 1dcbfda202
76 changed files with 2836 additions and 560 deletions

View file

@@ -75,6 +75,7 @@ Works for:
- Google AI Studio - Gemini models
- Vertex AI models (Gemini + Anthropic)
- Bedrock Models
- Anthropic API Models
<Tabs>
<TabItem value="sdk" label="SDK">

View file

@@ -93,7 +93,7 @@ curl http://0.0.0.0:4000/v1/chat/completions \
## Check Model Support
Call `litellm.get_model_info` to check if a model/provider supports `prefix`.
<Tabs>
<TabItem value="sdk" label="SDK">
@@ -116,4 +116,4 @@ curl -X GET 'http://0.0.0.0:4000/v1/model/info' \
-H 'Authorization: Bearer $LITELLM_KEY' \
```
</TabItem>
</Tabs>

View file

@@ -957,3 +957,69 @@ curl http://0.0.0.0:4000/v1/chat/completions \
```
</TabItem>
</Tabs>
## Usage - passing 'user_id' to Anthropic
LiteLLM translates the OpenAI `user` param to Anthropic's `metadata[user_id]` param.
<Tabs>
<TabItem value="sdk" label="SDK">
```python
from litellm import completion

messages = [{"role": "user", "content": "What is Anthropic?"}]

response = completion(
    model="claude-3-5-sonnet-20240620",
    messages=messages,
    user="user_123",  # mapped to Anthropic's metadata["user_id"]
)
```
</TabItem>
<TabItem value="proxy" label="PROXY">
1. Setup config.yaml
```yaml
model_list:
  - model_name: claude-3-5-sonnet-20240620
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20240620
      api_key: os.environ/ANTHROPIC_API_KEY
```
2. Start Proxy
```
litellm --config /path/to/config.yaml
```
3. Test it!
```bash
curl http://0.0.0.0:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <YOUR-LITELLM-KEY>" \
-d '{
"model": "claude-3-5-sonnet-20240620",
"messages": [{"role": "user", "content": "What is Anthropic?"}],
"user": "user_123"
}'
```
</TabItem>
</Tabs>
## All Supported OpenAI Params
```
"stream",
"stop",
"temperature",
"top_p",
"max_tokens",
"max_completion_tokens",
"tools",
"tool_choice",
"extra_headers",
"parallel_tool_calls",
"response_format",
"user"
```
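For instance, a hedged sketch combining several of these params on an Anthropic call (parameter values are illustrative):
```python
from litellm import completion

response = completion(
    model="claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Write a haiku about llamas"}],
    temperature=0.7,
    top_p=0.9,
    max_tokens=256,
    stop=["\n\n"],
    stream=False,
    user="user_123",
)
```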

View file

@@ -37,7 +37,7 @@ os.environ["HUGGINGFACE_API_KEY"] = "huggingface_api_key"
messages = [{ "content": "There's a llama in my garden 😱 What should I do?","role": "user"}]
# e.g. Call 'https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct' from Serverless Inference API
response = completion(
model="huggingface/meta-llama/Meta-Llama-3.1-8B-Instruct",
messages=[{ "content": "Hello, how are you?","role": "user"}],
stream=True
@@ -165,14 +165,14 @@ Steps to use
```python
import os
import litellm
from litellm import completion
os.environ["HUGGINGFACE_API_KEY"] = ""
# TGI model: Call https://huggingface.co/glaiveai/glaive-coder-7b
# add the 'huggingface/' prefix to the model to set huggingface as the provider
# set api base to your deployed api endpoint from hugging face
response = completion(
model="huggingface/glaiveai/glaive-coder-7b",
messages=[{ "content": "Hello, how are you?","role": "user"}],
api_base="https://wjiegasee9bmqke2.us-east-1.aws.endpoints.huggingface.cloud"
@@ -383,6 +383,8 @@ def default_pt(messages):
#### Custom prompt templates
```python
import litellm
# Create your own custom prompt template
litellm.register_prompt_template(
    model="togethercomputer/LLaMA-2-7B-32K",

View file

@@ -1,6 +1,13 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Jina AI
https://jina.ai/embeddings/
Supported endpoints:
- /embeddings
- /rerank
## API Key
```python
# env variable
@@ -8,6 +15,10 @@ os.environ['JINA_AI_API_KEY']
```
## Sample Usage - Embedding
<Tabs>
<TabItem value="sdk" label="SDK">
```python
from litellm import embedding
import os
@@ -19,6 +30,142 @@ response = embedding(
)
print(response)
```
</TabItem>
<TabItem value="proxy" label="PROXY">
1. Add to config.yaml
```yaml
model_list:
  - model_name: embedding-model
    litellm_params:
      model: jina_ai/jina-embeddings-v3
      api_key: os.environ/JINA_AI_API_KEY
```
2. Start proxy
```bash
litellm --config /path/to/config.yaml
# RUNNING on http://0.0.0.0:4000/
```
3. Test it!
```bash
curl -L -X POST 'http://0.0.0.0:4000/embeddings' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{"input": ["hello world"], "model": "embedding-model"}'
```
</TabItem>
</Tabs>
## Sample Usage - Rerank
<Tabs>
<TabItem value="sdk" label="SDK">
```python
from litellm import rerank
import os
os.environ["JINA_AI_API_KEY"] = "sk-..."
query = "What is the capital of the United States?"
documents = [
    "Carson City is the capital city of the American state of Nevada.",
    "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
    "Washington, D.C. is the capital of the United States.",
    "Capital punishment has existed in the United States since before it was a country.",
]

response = rerank(
    model="jina_ai/jina-reranker-v2-base-multilingual",
    query=query,
    documents=documents,
    top_n=3,
)
print(response)
```
</TabItem>
<TabItem value="proxy" label="PROXY">
1. Add to config.yaml
```yaml
model_list:
  - model_name: rerank-model
    litellm_params:
      model: jina_ai/jina-reranker-v2-base-multilingual
      api_key: os.environ/JINA_AI_API_KEY
```
2. Start proxy
```bash
litellm --config /path/to/config.yaml
```
3. Test it!
```bash
curl -L -X POST 'http://0.0.0.0:4000/rerank' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{
"model": "rerank-model",
"query": "What is the capital of the United States?",
"documents": [
"Carson City is the capital city of the American state of Nevada.",
"The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
"Washington, D.C. is the capital of the United States.",
"Capital punishment has existed in the United States since before it was a country."
],
"top_n": 3
}'
```
</TabItem>
</Tabs>
## Supported Models
All models listed [here](https://jina.ai/embeddings/) are supported.
## Supported Optional Rerank Parameters
All Cohere rerank parameters are supported.
## Supported Optional Embeddings Parameters
```
dimensions
```
## Provider-specific parameters
Pass any Jina AI-specific parameters as keyword arguments to the `embedding` or `rerank` function, e.g.
<Tabs>
<TabItem value="sdk" label="SDK">
```python
response = embedding(
    model="jina_ai/jina-embeddings-v3",
    input=["good morning from litellm"],
    dimensions=1536,
    my_custom_param="my_custom_value",  # any other Jina AI specific parameters
)
```
</TabItem>
<TabItem value="proxy" label="PROXY">
```bash
curl -L -X POST 'http://0.0.0.0:4000/embeddings' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{"input": ["good morning from litellm"], "model": "jina_ai/jina-embeddings-v3", "dimensions": 1536, "my_custom_param": "my_custom_value"}'
```
</TabItem>
</Tabs>

View file

@@ -1562,6 +1562,10 @@ curl http://0.0.0.0:4000/v1/chat/completions \
## **Embedding Models**
#### Usage - Embedding
<Tabs>
<TabItem value="sdk" label="SDK">
```python
import litellm
from litellm import embedding
@@ -1574,6 +1578,49 @@ response = embedding(
)
print(response)
```
</TabItem>
<TabItem value="proxy" label="LiteLLM PROXY">
1. Add model to config.yaml
```yaml
model_list:
  - model_name: snowflake-arctic-embed-m-long-1731622468876
    litellm_params:
      model: vertex_ai/<your-model-id>
      vertex_project: "adroit-crow-413218"
      vertex_location: "us-central1"
      vertex_credentials: adroit-crow-413218-a956eef1a2a8.json

litellm_settings:
  drop_params: True
```
2. Start Proxy
```
$ litellm --config /path/to/config.yaml
```
3. Make a request using the OpenAI Python SDK or LangChain Python SDK
```python
import openai

client = openai.OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.embeddings.create(
    model="snowflake-arctic-embed-m-long-1731622468876",
    input=["good morning from litellm", "this is another item"],
)
print(response)
```
</TabItem>
</Tabs>
#### Supported Embedding Models
All models listed [here](https://github.com/BerriAI/litellm/blob/57f37f743886a0249f630a6792d49dffc2c5d9b7/model_prices_and_context_window.json#L835) are supported
@@ -1589,6 +1636,7 @@ All models listed [here](https://github.com/BerriAI/litellm/blob/57f37f743886a02
| textembedding-gecko@003 | `embedding(model="vertex_ai/textembedding-gecko@003", input)` |
| text-embedding-preview-0409 | `embedding(model="vertex_ai/text-embedding-preview-0409", input)` |
| text-multilingual-embedding-preview-0409 | `embedding(model="vertex_ai/text-multilingual-embedding-preview-0409", input)` |
| Fine-tuned OR Custom Embedding models | `embedding(model="vertex_ai/<your-model-id>", input)` |
### Supported OpenAI (Unified) Params

View file

@@ -791,9 +791,9 @@ general_settings:
| store_model_in_db | boolean | If true, allows `/model/new` endpoint to store model information in db. Endpoint disabled by default. [Doc on `/model/new` endpoint](./model_management.md#create-a-new-model) |
| max_request_size_mb | int | The maximum size for requests in MB. Requests above this size will be rejected. |
| max_response_size_mb | int | The maximum size for responses in MB. LLM Responses above this size will not be sent. |
| proxy_budget_rescheduler_min_time | int | The minimum time (in seconds) to wait before checking db for budget resets. **Default is 597 seconds** |
| proxy_budget_rescheduler_max_time | int | The maximum time (in seconds) to wait before checking db for budget resets. **Default is 605 seconds** |
| proxy_batch_write_at | int | Time (in seconds) to wait before batch writing spend logs to the db. **Default is 10 seconds** |
| alerting_args | dict | Args for Slack Alerting [Doc on Slack Alerting](./alerting.md) |
| custom_key_generate | str | Custom function for key generation [Doc on custom key generation](./virtual_keys.md#custom--key-generate) |
| allowed_ips | List[str] | List of IPs allowed to access the proxy. If not set, all IPs are allowed. |
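For example, a config.yaml sketch that pins the rescheduler and batch-write settings to the defaults documented above:
```yaml
general_settings:
  proxy_budget_rescheduler_min_time: 597 # seconds
  proxy_budget_rescheduler_max_time: 605 # seconds
  proxy_batch_write_at: 10 # seconds
```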

View file

@@ -66,10 +66,16 @@ Removes any field with `user_api_key_*` from metadata.
Found under `kwargs["standard_logging_object"]`. This is a standard payload, logged for every response.
```python
class StandardLoggingPayload(TypedDict):
    id: str
    trace_id: str  # Trace multiple LLM calls belonging to same overall request (e.g. fallbacks/retries)
    call_type: str
    response_cost: float
    response_cost_failure_debug_info: Optional[
        StandardLoggingModelCostFailureDebugInformation
    ]
    status: StandardLoggingPayloadStatus
    total_tokens: int
    prompt_tokens: int
    completion_tokens: int
@@ -84,13 +90,13 @@ class StandardLoggingPayload(TypedDict):
    metadata: StandardLoggingMetadata
    cache_hit: Optional[bool]
    cache_key: Optional[str]
    saved_cache_cost: float
    request_tags: list
    end_user: Optional[str]
    requester_ip_address: Optional[str]
    messages: Optional[Union[str, list, dict]]
    response: Optional[Union[str, list, dict]]
    error_str: Optional[str]
    model_parameters: dict
    hidden_params: StandardLoggingHiddenParams
@@ -99,12 +105,47 @@ class StandardLoggingHiddenParams(TypedDict):
    cache_key: Optional[str]
    api_base: Optional[str]
    response_cost: Optional[str]
    additional_headers: Optional[StandardLoggingAdditionalHeaders]


class StandardLoggingAdditionalHeaders(TypedDict, total=False):
    x_ratelimit_limit_requests: int
    x_ratelimit_limit_tokens: int
    x_ratelimit_remaining_requests: int
    x_ratelimit_remaining_tokens: int


class StandardLoggingMetadata(StandardLoggingUserAPIKeyMetadata):
    """
    Specific metadata k,v pairs logged to integration for easier cost tracking
    """

    spend_logs_metadata: Optional[dict]  # special param to log k,v pairs to spendlogs for a call
    requester_ip_address: Optional[str]
    requester_metadata: Optional[dict]


class StandardLoggingModelInformation(TypedDict):
    model_map_key: str
    model_map_value: Optional[ModelInfo]


StandardLoggingPayloadStatus = Literal["success", "failure"]


class StandardLoggingModelCostFailureDebugInformation(TypedDict, total=False):
    """
    Debug information, if cost tracking fails.

    Avoid logging sensitive information like response or optional params
    """

    error_str: Required[str]
    traceback_str: Required[str]
    model: str
    cache_hit: Optional[bool]
    custom_llm_provider: Optional[str]
    base_model: Optional[str]
    call_type: str
    custom_pricing: Optional[bool]
```
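As a usage sketch, this payload can be consumed from a custom callback (assuming litellm's `CustomLogger` interface; the field handling is illustrative):
```python
import litellm
from litellm.integrations.custom_logger import CustomLogger


class SpendTracker(CustomLogger):
    def log_success_event(self, kwargs, response_obj, start_time, end_time):
        payload = kwargs.get("standard_logging_object") or {}
        # trace_id groups retries/fallbacks; response_cost is the computed spend
        print(payload.get("trace_id"), payload.get("status"), payload.get("response_cost"))


litellm.callbacks = [SpendTracker()]
```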
## Langfuse

View file

@@ -1,5 +1,6 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import Image from '@theme/IdealImage';
# ⚡ Best Practices for Production
@@ -112,7 +113,35 @@ general_settings:
disable_spend_logs: True
```
## 7. Use Helm PreSync Hook for Database Migrations [BETA]
To ensure only one service manages database migrations, use our [Helm PreSync hook for Database Migrations](https://github.com/BerriAI/litellm/blob/main/deploy/charts/litellm-helm/templates/migrations-job.yaml). This ensures migrations are handled during `helm upgrade` or `helm install`, while LiteLLM pods explicitly disable migrations.
1. **Helm PreSync Hook**:
- The Helm PreSync hook is configured in the chart to run database migrations during deployments.
- The hook always sets `DISABLE_SCHEMA_UPDATE=false`, ensuring migrations are executed reliably.
Reference Settings to set on ArgoCD for `values.yaml`
```yaml
db:
  useExisting: true # use existing Postgres DB
  url: postgresql://ishaanjaffer0324:3rnwpOBau6hT@ep-withered-mud-a5dkdpke.us-east-2.aws.neon.tech/test-argo-cd?sslmode=require # url of existing Postgres DB
```
2. **LiteLLM Pods**:
- Set `DISABLE_SCHEMA_UPDATE=true` in LiteLLM pod configurations to prevent them from running migrations.
Example configuration for LiteLLM pod:
```yaml
env:
  - name: DISABLE_SCHEMA_UPDATE
    value: "true"
```
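With those values in place, a typical rollout might look like the following (a hedged sketch; the chart path follows the repo link above):
```bash
# PreSync hook runs migrations first; pods then start with DISABLE_SCHEMA_UPDATE=true
helm upgrade --install litellm ./deploy/charts/litellm-helm -f values.yaml
```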
## 8. Set LiteLLM Salt Key
If you plan on using the DB, set a salt key for encrypting/decrypting variables in the DB.

View file

@@ -748,4 +748,19 @@ curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
"max_tokens": 300,
"mock_testing_fallbacks": true
}'
```
### Disable Fallbacks per key
You can disable fallbacks per key by setting `disable_fallbacks: true` in your key metadata.
```bash
curl -L -X POST 'http://0.0.0.0:4000/key/generate' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{
"metadata": {
"disable_fallbacks": true
}
}'
```
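To verify, a hedged sketch calling the proxy with the key returned above (`sk-<generated-key>` and `my-model` are placeholders); with `disable_fallbacks: true`, `mock_testing_fallbacks` should surface an error instead of falling back:
```bash
curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
-H 'Authorization: Bearer sk-<generated-key>' \
-H 'Content-Type: application/json' \
-d '{
  "model": "my-model",
  "messages": [{"role": "user", "content": "ping"}],
  "mock_testing_fallbacks": true
}'
```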

View file

@@ -113,4 +113,5 @@ curl http://0.0.0.0:4000/rerank \
|-------------|--------------------|
| Cohere | [Usage](#quick-start) |
| Together AI| [Usage](../docs/providers/togetherai) |
| Azure AI| [Usage](../docs/providers/azure_ai) |
| Jina AI| [Usage](../docs/providers/jina_ai) |

View file

@@ -1,3 +1,6 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Secret Manager
LiteLLM supports reading secrets from AWS Secret Manager, Azure Key Vault, and Google Secret Manager.
@@ -59,14 +62,35 @@ os.environ["AWS_REGION_NAME"] = "" # us-east-1, us-east-2, us-west-1, us-west-2
```
2. Enable AWS Secret Manager in config.
<Tabs>
<TabItem value="read_only" label="Read Keys from AWS Secret Manager">
```yaml
general_settings:
  master_key: os.environ/litellm_master_key
  key_management_system: "aws_secret_manager" # 👈 KEY CHANGE
  key_management_settings:
    hosted_keys: ["litellm_master_key"] # 👈 Specify which env keys you stored on AWS
```
</TabItem>
<TabItem value="write_only" label="Write Virtual Keys to AWS Secret Manager">
This will only store virtual keys in AWS Secret Manager; no keys will be read from it.
```yaml
general_settings:
  key_management_system: "aws_secret_manager" # 👈 KEY CHANGE
  key_management_settings:
    store_virtual_keys: true
    access_mode: "write_only" # Literal["read_only", "write_only", "read_and_write"]
```
</TabItem>
</Tabs>
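If you need both behaviors, a combined sketch (assumption: merging the two tabs via the `read_and_write` access mode listed in the settings reference below):
```yaml
general_settings:
  master_key: os.environ/litellm_master_key
  key_management_system: "aws_secret_manager"
  key_management_settings:
    store_virtual_keys: true
    access_mode: "read_and_write"
    hosted_keys: ["litellm_master_key"]
```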
3. Run proxy
```bash
@@ -181,16 +205,14 @@ litellm --config /path/to/config.yaml
Use encrypted keys from Google KMS on the proxy
### Usage with LiteLLM Proxy Server
Step 1. Add keys to env
```
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/credentials.json"
export GOOGLE_KMS_RESOURCE_NAME="projects/*/locations/*/keyRings/*/cryptoKeys/*"
export PROXY_DATABASE_URL_ENCRYPTED=b'\n$\x00D\xac\xb4/\x8e\xc...'
```
Step 2: Update Config
```yaml
general_settings:
@@ -199,7 +221,7 @@ general_settings:
  master_key: sk-1234
```
Step 3: Start + test proxy
```
$ litellm --config /path/to/config.yaml
@@ -215,3 +237,17 @@ $ litellm --test
<!--
## .env Files
If no secret manager client is specified, Litellm automatically uses the `.env` file to manage sensitive data. -->
## All Secret Manager Settings
All settings related to secret management
```yaml
general_settings:
  key_management_system: "aws_secret_manager" # REQUIRED
  key_management_settings:
    store_virtual_keys: true # OPTIONAL. Defaults to False, when True will store virtual keys in secret manager
    access_mode: "write_only" # OPTIONAL. Literal["read_only", "write_only", "read_and_write"]. Defaults to "read_only"
    hosted_keys: ["litellm_master_key"] # OPTIONAL. Specify which env keys you stored on AWS
```