forked from phoenix/litellm-mirror
(docs) openai proxy configs
parent 4b4d9bf142, commit 376dae73f9
1 changed file with 45 additions and 36 deletions
litellm_settings:
  set_verbose: True
```

### Quick Start - Config

Here's how you can use multiple LLMs with one proxy `config.yaml`. For example, if you have one model running on a local GPU and another that's hosted (e.g. on Runpod), you can call both through the same LiteLLM proxy by listing them in your `config.yaml`.

#### Step 1: Setup Config

```yaml
model_list:
  - model_name: zephyr-alpha
    litellm_params: # params for litellm.completion() - https://docs.litellm.ai/docs/completion/input#input---request-body
      model: huggingface/HuggingFaceH4/zephyr-7b-alpha
      api_base: http://0.0.0.0:8001
  - model_name: gpt-4
    litellm_params:
      model: gpt-4
      api_key: sk-1233
  - model_name: claude-2
    litellm_params:
      model: claude-2
      api_key: sk-claude
```

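Before starting the proxy, it can help to confirm the YAML parses and to see which aliases it defines. An optional sketch, assuming PyYAML is installed and the file is saved as `config.yaml`:

```python
import yaml  # pip install pyyaml

# Load the proxy config and list the model aliases it will expose
with open("config.yaml") as f:
    config = yaml.safe_load(f)

for entry in config["model_list"]:
    print(entry["model_name"], "->", entry["litellm_params"]["model"])
```
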
#### Step 2: Start Proxy with config

```shell
$ litellm --config /path/to/config.yaml
```

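Once the proxy is running, every alias in `model_list` is reachable through the same OpenAI-compatible endpoint. Here's a minimal sketch of switching between the backends defined above, assuming the proxy is on the default port 8000 and the pre-1.0 `openai` Python SDK (as in the examples below); the dummy API key and message text are placeholders:

```python
import openai

# Point the OpenAI SDK at the LiteLLM proxy instead of api.openai.com
openai.api_base = "http://0.0.0.0:8000"
openai.api_key = "anything"  # placeholder; the proxy config holds the real provider keys

# Any alias from model_list can be passed as `model`
for alias in ["zephyr-alpha", "gpt-4", "claude-2"]:
    completion = openai.ChatCompletion.create(
        model=alias,
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(alias, "->", completion.choices[0].message.content)
```

The proxy resolves each alias to its `litellm_params`, so the client code stays the same when you swap or add backends.
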
#### Step 3: Call a specific model

If your repo lets you set the model name, you can call a specific model by just passing in that model's name:

**Setting model name**

```python
import openai
openai.api_base = "http://0.0.0.0:8000"

completion = openai.ChatCompletion.create(model="zephyr-alpha", messages=[{"role": "user", "content": "Hello world"}])

print(completion.choices[0].message.content)
```

**Setting API Base with model name**

If your repo only lets you specify the api base, then you can add the model name to the api base passed in:

```python
import openai
openai.api_base = "http://0.0.0.0:8000/openai/deployments/zephyr-alpha/chat/completions" # zephyr-alpha will be used

completion = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hello world"}]) # the deployment in the api_base takes precedence over the model name

print(completion.choices[0].message.content)
```

### Save Model-specific params (API Base, API Keys, Temperature, etc.)

You can use the config to save model-specific information like api_base, api_key, temperature, max_tokens, etc.

**Step 1**: Create a `config.yaml` file
```yaml
model_list:
  - model_name: gpt-3.5-turbo
    litellm_params: # params for litellm.completion() - https://docs.litellm.ai/docs/completion/input#input---request-body
      # ...
      api_base: your_api_base # url where model is deployed
```

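Each `litellm_params` block maps onto keyword arguments for `litellm.completion()` (that's what the comment link above points to). A rough sketch of the equivalent direct call for the entry above; the message is illustrative and the parameter values are the placeholders from the config:

```python
import litellm

# Approximately what the proxy does for the "gpt-3.5-turbo" alias:
# forward the incoming request to litellm.completion() with the saved litellm_params.
response = litellm.completion(
    model="gpt-3.5-turbo",                  # from litellm_params.model (value assumed here)
    api_base="your_api_base",               # from litellm_params.api_base (placeholder)
    messages=[{"role": "user", "content": "Hello world"}],  # comes from the client request
)
print(response["choices"][0]["message"]["content"])
```
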
### Set Custom Prompt Templates

LiteLLM by default checks if a model has a [prompt template and applies it](./completion/prompt_formatting.md) (e.g. if a huggingface model has a saved chat template in its tokenizer_config.json). However, you can also set a custom prompt template on your proxy in the `config.yaml`:

**Step 1**: Save your prompt template in a `config.yaml`
```yaml
# Model-specific parameters
model_list:
  - model_name: mistral-7b # model alias
    litellm_params: # actual params for litellm.completion()
      model: "huggingface/mistralai/Mistral-7B-Instruct-v0.1"
      api_base: "<your-api-base>"
      api_key: "<your-api-key>" # [OPTIONAL] for hf inference endpoints
      initial_prompt_value: "\n"
      roles: {"system":{"pre_message":"<|im_start|>system\n", "post_message":"<|im_end|>"}, "assistant":{"pre_message":"<|im_start|>assistant\n","post_message":"<|im_end|>"}, "user":{"pre_message":"<|im_start|>user\n","post_message":"<|im_end|>"}}
      final_prompt_value: "\n"
      bos_token: "<s>"
      eos_token: "</s>"
      max_tokens: 4096
```

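To make the template fields concrete, here's a small illustration of how `initial_prompt_value`, `roles`, `final_prompt_value`, and the BOS/EOS tokens could combine a chat message list into a single prompt string. This is a sketch of the idea only, not LiteLLM's actual formatting code:

```python
# Illustrative sketch: wrap each message with its role's pre/post markers,
# then add the initial/final prompt values and BOS/EOS tokens.
roles = {
    "system": {"pre_message": "<|im_start|>system\n", "post_message": "<|im_end|>"},
    "assistant": {"pre_message": "<|im_start|>assistant\n", "post_message": "<|im_end|>"},
    "user": {"pre_message": "<|im_start|>user\n", "post_message": "<|im_end|>"},
}

def render_prompt(messages, initial="\n", final="\n", bos="<s>", eos="</s>"):
    parts = [bos, initial]
    for m in messages:
        role = roles[m["role"]]
        parts.append(role["pre_message"] + m["content"] + role["post_message"])
    parts.append(final)
    parts.append(eos)
    return "".join(parts)

print(render_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]))
```
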
**Step 2**: Start server with config

```shell
$ litellm --config /path/to/config.yaml
```

## Proxy CLI Arguments

#### --host