Update README.md
This commit is contained in:
parent 928cc4b887
commit c02f222b20
1 changed file with 1 addition and 38 deletions
README.md (39 lines changed)
````diff
@@ -77,45 +77,8 @@ for part in response:
     print(part.choices[0].delta.content or "")
 ```
 
-# Router - load balancing ([Docs](https://docs.litellm.ai/docs/routing))
-
-LiteLLM allows you to load balance between multiple deployments (Azure, OpenAI). It picks the deployment that is below its rate limit and has used the fewest tokens.
-
-```python
-import os
-from litellm import Router
-
-model_list = [{ # list of model deployments
-    "model_name": "gpt-3.5-turbo", # model alias
-    "litellm_params": { # params for litellm completion/embedding call
-        "model": "azure/chatgpt-v-2", # actual model name
-        "api_key": os.getenv("AZURE_API_KEY"),
-        "api_version": os.getenv("AZURE_API_VERSION"),
-        "api_base": os.getenv("AZURE_API_BASE")
-    }
-}, {
-    "model_name": "gpt-3.5-turbo",
-    "litellm_params": { # params for litellm completion/embedding call
-        "model": "azure/chatgpt-functioncalling",
-        "api_key": os.getenv("AZURE_API_KEY"),
-        "api_version": os.getenv("AZURE_API_VERSION"),
-        "api_base": os.getenv("AZURE_API_BASE")
-    }
-}, {
-    "model_name": "gpt-3.5-turbo",
-    "litellm_params": { # params for litellm completion/embedding call
-        "model": "gpt-3.5-turbo",
-        "api_key": os.getenv("OPENAI_API_KEY"),
-    }
-}]
-
-router = Router(model_list=model_list)
-
-# openai.ChatCompletion.create replacement
-response = router.completion(model="gpt-3.5-turbo",
-                             messages=[{"role": "user", "content": "Hey, how's it going?"}])
-
-print(response)
-```
-
 ## OpenAI Proxy ([Docs](https://docs.litellm.ai/docs/simple_proxy))
 
 LiteLLM Proxy manages:
 * Calling 100+ LLMs (Huggingface/Bedrock/TogetherAI/etc.) in the OpenAI ChatCompletions & Completions format
 * Load balancing between multiple models and multiple deployments of the same model; the LiteLLM proxy can handle 1k+ requests/second during load tests
````
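For readers skimming the removed section: the selection rule it describes (route to a deployment that is under its rate limit and has consumed the fewest tokens) fits in a few lines. The sketch below illustrates that rule only and is not litellm's internal implementation; the deployment dicts and their `tpm_used`/`tpm_limit` fields are hypothetical.

```python
# Minimal sketch of the routing rule described in the removed section,
# NOT litellm's actual implementation. `tpm_used`/`tpm_limit` are
# hypothetical per-deployment tokens-per-minute counters.
def pick_deployment(deployments: list[dict]) -> dict:
    # Keep only deployments still under their rate limit...
    under_limit = [d for d in deployments if d["tpm_used"] < d["tpm_limit"]]
    if not under_limit:
        raise RuntimeError("all deployments are currently rate-limited")
    # ...then prefer the one that has used the fewest tokens so far.
    return min(under_limit, key=lambda d: d["tpm_used"])

deployments = [
    {"name": "azure/chatgpt-v-2", "tpm_used": 4_000, "tpm_limit": 240_000},
    {"name": "azure/chatgpt-functioncalling", "tpm_used": 90_000, "tpm_limit": 240_000},
    {"name": "gpt-3.5-turbo", "tpm_used": 239_500, "tpm_limit": 240_000},
]
print(pick_deployment(deployments)["name"])  # azure/chatgpt-v-2
```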
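The proxy bullets above carry no accompanying snippet, so here is a hedged usage sketch: pointing the pre-1.0 OpenAI Python SDK (the same `openai.ChatCompletion.create` style the removed Router example references) at a locally running LiteLLM proxy. The URL, port, and placeholder key are assumptions for illustration, not values from this diff.

```python
# Hypothetical call against a locally running LiteLLM proxy; the proxy
# holds the real provider credentials, so the client key is a dummy.
import openai

openai.api_base = "http://0.0.0.0:8000"  # assumed local proxy address
openai.api_key = "anything"              # placeholder; the proxy manages keys

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
)
print(response)
```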