(docs) using /embeddings with Proxy
commit cf902a53b4 (parent c8f8bd9e57)
3 changed files with 232 additions and 191 deletions
@ -307,6 +307,126 @@ model_list:
$ litellm --config /path/to/config.yaml
```

## Setting Embedding Models

See supported Embedding Providers & Models [here](https://docs.litellm.ai/docs/embedding/supported_embedding)

### Use Sagemaker, Bedrock, Azure, OpenAI, XInference

#### Create Config.yaml

<Tabs>

<TabItem value="sagemaker" label="Sagemaker, Bedrock Embeddings">

Here's how to route between GPT-J embeddings (a Sagemaker endpoint), Amazon Titan embeddings (Bedrock), and Azure OpenAI embeddings on the proxy server:

```yaml
model_list:
  - model_name: sagemaker-embeddings
    litellm_params:
      model: "sagemaker/berri-benchmarking-gpt-j-6b-fp16"
  - model_name: amazon-embeddings
    litellm_params:
      model: "bedrock/amazon.titan-embed-text-v1"
  - model_name: azure-embeddings
    litellm_params:
      model: "azure/azure-embedding-model"
      api_base: "os.environ/AZURE_API_BASE" # os.getenv("AZURE_API_BASE")
      api_key: "os.environ/AZURE_API_KEY" # os.getenv("AZURE_API_KEY")
      api_version: "2023-07-01-preview"

general_settings:
  master_key: sk-1234 # [OPTIONAL] if set, all calls to the proxy require either this key or a valid generated token
```
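
Once the proxy is running with this config, you can hit any of the three model groups by name. A minimal sketch of a request (the `Authorization` header is only needed because `master_key` is set above):

```shell
curl --location 'http://0.0.0.0:8000/embeddings' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
    "model": "amazon-embeddings",
    "input": ["write a litellm poem"]
  }'
```

Swap `"model"` for `sagemaker-embeddings` or `azure-embeddings` to send the same request to the other deployments.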

</TabItem>

<TabItem value="Hugging Face emb" label="Hugging Face Embeddings">

LiteLLM Proxy supports all <a href="https://huggingface.co/models?pipeline_tag=feature-extraction">Feature-Extraction Embedding models</a>.

```yaml
model_list:
  - model_name: deployed-codebert-base
    litellm_params:
      # send request to deployed hugging face inference endpoint
      model: huggingface/microsoft/codebert-base # add huggingface prefix so it routes to hugging face
      api_key: hf_LdS # api key for hugging face inference endpoint
      api_base: https://uysneno1wv2wd4lw.us-east-1.aws.endpoints.huggingface.cloud # your hf inference endpoint
  - model_name: codebert-base
    litellm_params:
      # no api_base set, sends request to hugging face free inference api https://api-inference.huggingface.co/models/
      model: huggingface/microsoft/codebert-base # add huggingface prefix so it routes to hugging face
      api_key: hf_LdS # api key for hugging face
```

</TabItem>

<TabItem value="azure" label="Azure OpenAI Embeddings">

```yaml
model_list:
  - model_name: azure-embedding-model # model group
    litellm_params:
      model: azure/azure-embedding-model # model name for litellm.embedding(model="azure/azure-embedding-model") call
      api_base: your-azure-api-base
      api_key: your-api-key
      api_version: 2023-07-01-preview
```

</TabItem>

<TabItem value="openai" label="OpenAI Embeddings">

```yaml
model_list:
  - model_name: text-embedding-ada-002 # model group
    litellm_params:
      model: text-embedding-ada-002 # model name for litellm.embedding(model="text-embedding-ada-002")
      api_key: your-api-key-1
  - model_name: text-embedding-ada-002
    litellm_params:
      model: text-embedding-ada-002
      api_key: your-api-key-2
```
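
Because both entries share the same `model_name`, they form a single model group: requests to `text-embedding-ada-002` are load-balanced across the two API keys by the proxy's router.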

</TabItem>

<TabItem value="openai emb" label="OpenAI Compatible Embeddings">

<p>Use this for calling /embedding endpoints on <a href="https://github.com/xorbitsai/inference">OpenAI Compatible Servers</a>.</p>

**Note: add the `openai/` prefix to the `litellm_params` `model`, so LiteLLM knows to route this as an OpenAI-format request.**

```yaml
model_list:
  - model_name: text-embedding-ada-002 # model group
    litellm_params:
      model: openai/<your-model-name> # model name for litellm.embedding(model="text-embedding-ada-002")
      api_base: <model-api-base>
```
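
For example, for an XInference server this might look like the following sketch (the group name, model name, and default `9997` port are assumptions about your deployment):

```yaml
model_list:
  - model_name: xinference-embeddings # hypothetical group name
    litellm_params:
      model: openai/bge-base-en # whatever embedding model your xinference instance serves
      api_base: http://localhost:9997/v1 # xinference's OpenAI-compatible endpoint
```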

</TabItem>
</Tabs>

#### Start Proxy

```shell
litellm --config config.yaml
```

#### Make Request

Sends a request to the `deployed-codebert-base` model group:

```shell
curl --location 'http://0.0.0.0:8000/embeddings' \
--header 'Content-Type: application/json' \
--data '{
    "model": "deployed-codebert-base",
    "input": ["write a litellm poem"]
  }'
```
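
The same request through the OpenAI Python SDK (v1.0.0+), pointed at the proxy. A sketch assuming no `master_key` is set, so any placeholder key works:

```python
from openai import OpenAI

# point the client at the LiteLLM proxy instead of api.openai.com
client = OpenAI(api_key="anything", base_url="http://0.0.0.0:8000")

response = client.embeddings.create(
    model="deployed-codebert-base",
    input=["write a litellm poem"],
)
print(response.data[0].embedding[:5])  # first 5 dimensions of the vector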

## Router Settings

Use this to configure things like the routing strategy.
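
A minimal sketch, assuming you want to change the routing strategy (the full list of strategies the Router supports is documented separately; `simple-shuffle` is shown here as an example):

```yaml
router_settings:
  routing_strategy: simple-shuffle # e.g. shuffle requests across deployments in a model group
```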
@ -47,196 +47,9 @@ curl --location 'http://0.0.0.0:8000/v1/embeddings' \
}'
```

## `/embeddings` Request Format
Input, Output, and Exceptions are mapped to the OpenAI format for all supported models

<Tabs>
<TabItem value="Curl" label="Curl Request">

```shell
curl --location 'http://0.0.0.0:8000/embeddings' \
--header 'Content-Type: application/json' \
--data '{
    "model": "text-embedding-ada-002",
    "input": ["write a litellm poem"]
  }'
```
</TabItem>
<TabItem value="openai" label="OpenAI v1.0.0+">

```python
from openai import OpenAI

# set base_url to your proxy server
# set api_key to send to proxy server
client = OpenAI(api_key="<proxy-api-key>", base_url="http://0.0.0.0:8000")

response = client.embeddings.create(
    input=["hello from litellm"],
    model="text-embedding-ada-002"
)

print(response)
```
</TabItem>

<TabItem value="langchain-embedding" label="Langchain Embeddings">

```python
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="sagemaker-embeddings", openai_api_base="http://0.0.0.0:8000", openai_api_key="temp-key")

text = "This is a test document."

query_result = embeddings.embed_query(text)

print("SAGEMAKER EMBEDDINGS")
print(query_result[:5])

embeddings = OpenAIEmbeddings(model="bedrock-embeddings", openai_api_base="http://0.0.0.0:8000", openai_api_key="temp-key")

text = "This is a test document."

query_result = embeddings.embed_query(text)

print("BEDROCK EMBEDDINGS")
print(query_result[:5])

embeddings = OpenAIEmbeddings(model="bedrock-titan-embeddings", openai_api_base="http://0.0.0.0:8000", openai_api_key="temp-key")

text = "This is a test document."

query_result = embeddings.embed_query(text)

print("TITAN EMBEDDINGS")
print(query_result[:5])
```
</TabItem>
</Tabs>

## `/embeddings` Response Format

```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [
        0.0023064255,
        -0.009327292,
        ....
        -0.0028842222,
      ],
      "index": 0
    }
  ],
  "model": "text-embedding-ada-002",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
```

## Supported Models

See supported Embedding Providers & Models [here](https://docs.litellm.ai/docs/embedding/supported_embedding)

#### Create Config.yaml

<Tabs>
<TabItem value="Hugging Face emb" label="Hugging Face Embeddings">

LiteLLM Proxy supports all <a href="https://huggingface.co/models?pipeline_tag=feature-extraction">Feature-Extraction Embedding models</a>.

```yaml
model_list:
  - model_name: deployed-codebert-base
    litellm_params:
      # send request to deployed hugging face inference endpoint
      model: huggingface/microsoft/codebert-base # add huggingface prefix so it routes to hugging face
      api_key: hf_LdS # api key for hugging face inference endpoint
      api_base: https://uysneno1wv2wd4lw.us-east-1.aws.endpoints.huggingface.cloud # your hf inference endpoint
  - model_name: codebert-base
    litellm_params:
      # no api_base set, sends request to hugging face free inference api https://api-inference.huggingface.co/models/
      model: huggingface/microsoft/codebert-base # add huggingface prefix so it routes to hugging face
      api_key: hf_LdS # api key for hugging face
```

</TabItem>

<TabItem value="azure" label="Azure OpenAI Embeddings">

```yaml
model_list:
  - model_name: azure-embedding-model # model group
    litellm_params:
      model: azure/azure-embedding-model # model name for litellm.embedding(model="azure/azure-embedding-model") call
      api_base: your-azure-api-base
      api_key: your-api-key
      api_version: 2023-07-01-preview
```

</TabItem>

<TabItem value="openai" label="OpenAI Embeddings">

```yaml
model_list:
  - model_name: text-embedding-ada-002 # model group
    litellm_params:
      model: text-embedding-ada-002 # model name for litellm.embedding(model="text-embedding-ada-002")
      api_key: your-api-key-1
  - model_name: text-embedding-ada-002
    litellm_params:
      model: text-embedding-ada-002
      api_key: your-api-key-2
```

</TabItem>

<TabItem value="openai emb" label="OpenAI Compatible Embeddings">

<p>Use this for calling /embedding endpoints on <a href="https://github.com/xorbitsai/inference">OpenAI Compatible Servers</a>.</p>

**Note: add the `openai/` prefix to the `litellm_params` `model`, so LiteLLM knows to route this as an OpenAI-format request.**

```yaml
model_list:
  - model_name: text-embedding-ada-002 # model group
    litellm_params:
      model: openai/<your-model-name> # model name for litellm.embedding(model="text-embedding-ada-002")
      api_base: <model-api-base>
```

</TabItem>
</Tabs>

#### Start Proxy

```shell
litellm --config config.yaml
```

#### Make Request

Sends a request to the `deployed-codebert-base` model group:

```shell
curl --location 'http://0.0.0.0:8000/embeddings' \
--header 'Content-Type: application/json' \
--data '{
    "model": "deployed-codebert-base",
    "input": ["write a litellm poem"]
  }'
```
@ -3,6 +3,12 @@ import TabItem from '@theme/TabItem';

# Use with Langchain, OpenAI SDK, Curl

:::info

**Input, Output, Exceptions are mapped to the OpenAI format for all supported models**

:::

How to send requests to the proxy, pass metadata, and allow users to pass in their own OpenAI API key.

## `/chat/completions`
@ -139,7 +145,109 @@ print(response)
```

## Pass User LLM API Keys
## `/embeddings`

### Request Format
Input, Output, and Exceptions are mapped to the OpenAI format for all supported models

<Tabs>
<TabItem value="openai" label="OpenAI Python v1.0.0+">

```python
from openai import OpenAI

# set base_url to your proxy server
# set api_key to send to proxy server
client = OpenAI(api_key="<proxy-api-key>", base_url="http://0.0.0.0:8000")

response = client.embeddings.create(
    input=["hello from litellm"],
    model="text-embedding-ada-002"
)

print(response)
```
</TabItem>
<TabItem value="Curl" label="Curl Request">

```shell
curl --location 'http://0.0.0.0:8000/embeddings' \
--header 'Content-Type: application/json' \
--data '{
    "model": "text-embedding-ada-002",
    "input": ["write a litellm poem"]
  }'
```
</TabItem>

<TabItem value="langchain-embedding" label="Langchain Embeddings">

```python
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="sagemaker-embeddings", openai_api_base="http://0.0.0.0:8000", openai_api_key="temp-key")

text = "This is a test document."

query_result = embeddings.embed_query(text)

print("SAGEMAKER EMBEDDINGS")
print(query_result[:5])

embeddings = OpenAIEmbeddings(model="bedrock-embeddings", openai_api_base="http://0.0.0.0:8000", openai_api_key="temp-key")

text = "This is a test document."

query_result = embeddings.embed_query(text)

print("BEDROCK EMBEDDINGS")
print(query_result[:5])

embeddings = OpenAIEmbeddings(model="bedrock-titan-embeddings", openai_api_base="http://0.0.0.0:8000", openai_api_key="temp-key")

text = "This is a test document."

query_result = embeddings.embed_query(text)

print("TITAN EMBEDDINGS")
print(query_result[:5])
```
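
Note: the model names used here (`sagemaker-embeddings`, `bedrock-embeddings`, `bedrock-titan-embeddings`) are assumed to match `model_name` groups defined in your proxy config; Langchain just passes them through as the OpenAI `model` parameter.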
</TabItem>
</Tabs>

### Response Format

```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [
        0.0023064255,
        -0.009327292,
        ....
        -0.0028842222,
      ],
      "index": 0
    }
  ],
  "model": "text-embedding-ada-002",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
```
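
To pull the vector and token counts out of this response with the OpenAI SDK, reusing the `client` from the request example above (a small sketch):

```python
response = client.embeddings.create(
    model="text-embedding-ada-002",
    input=["write a litellm poem"],
)

vector = response.data[0].embedding  # list of floats
print(len(vector), response.usage.total_tokens)
```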

## Advanced
### Pass User LLM API Keys

Allows your users to pass in their own OpenAI API key (or a key for any LiteLLM-supported provider) to make requests.

Here's how to do it: