docs(simple_proxy.md): adding docs
parent 7ed8f8dac8
commit 22fd8953c1
1 changed file with 42 additions and 2 deletions
@@ -4,7 +4,7 @@ import TabItem from '@theme/TabItem';
# 💥 Evaluate LLMs - OpenAI Compatible Server
LiteLLM Server is a simple, fast, and lightweight **OpenAI-compatible server** to call 100+ LLM APIs in the OpenAI Input/Output format
A simple, fast, and lightweight **OpenAI-compatible server** to call 100+ LLM APIs.
LiteLLM Server supports:
@@ -149,7 +149,7 @@ $ litellm --model command-nightly
[**Jump to Code**](https://github.com/BerriAI/litellm/blob/fef4146396d5d87006259e00095a62e3900d6bb4/litellm/proxy.py#L36)
# LM-Evaluation Harness with TGI
# [TUTORIAL] LM-Evaluation Harness with TGI
Evaluate LLMs 20x faster with TGI via litellm proxy's `/completions` endpoint.
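For orientation, here's a minimal sketch of what a request to that `/completions` route looks like with the openai SDK (this assumes the proxy is already running locally on port 8000 in front of your TGI model, and the model name is illustrative) - lm-evaluation-harness sends this same OpenAI-style text-completion request under the hood:

```python
import openai

# point the openai SDK at the local litellm proxy (assumed to be on port 8000)
openai.api_base = "http://0.0.0.0:8000"
openai.api_key = "anything" # dummy value - the openai SDK requires api_key to be set

# OpenAI-style text completion request - the same shape lm-evaluation-harness sends
response = openai.Completion.create(
    model="huggingface/HuggingFaceH4/zephyr-7b-alpha",  # illustrative - use whatever model the proxy was started with
    prompt="The capital of France is",
    max_tokens=10,
)
print(response.choices[0].text)
```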
@@ -209,6 +209,46 @@ model_list:
$ litellm --config /path/to/config.yaml
```
## Multiple Models
Evaluate and compare multiple models.
If you have one model running on a local GPU and another that's hosted (e.g. on Runpod), you can call both through the same litellm server by listing them in your `config.yaml`.
```yaml
model_list:
- model_name: zephyr-alpha
litellm_params: # params for litellm.completion() - https://docs.litellm.ai/docs/completion/input#input---request-body
model: huggingface/HuggingFaceH4/zephyr-7b-alpha
api_base: http://0.0.0.0:8001
- model_name: zephyr-beta
litellm_params:
model: huggingface/HuggingFaceH4/zephyr-7b-beta
api_base: https://<my-hosted-endpoint>
```
### Evaluate model
If your repo lets you set the model name, you can call a specific model by passing in that model's name -
```python
import openai
openai.api_base = "http://0.0.0.0:8000"
openai.api_key = "anything" # dummy value - the openai SDK requires api_key to be set

completion = openai.ChatCompletion.create(model="zephyr-alpha", messages=[{"role": "user", "content": "Hello world"}])
print(completion.choices[0].message.content)
```
If your repo only lets you specify an api base, you can append the model name to the api base you pass in -
```python
import openai
openai.api_base = "http://0.0.0.0:8000/openai/deployments/zephyr-alpha/chat/completions" # zephyr-alpha will be used
openai.api_key = "anything" # dummy value - the openai SDK requires api_key to be set

completion = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hello world"}])
print(completion.choices[0].message.content)
```
## Save Model-specific params (API Base, API Keys, Temperature, etc.)
Use the [router_config_template.yaml](https://github.com/BerriAI/litellm/blob/main/router_config_template.yaml) to save model-specific information like api_base, api_key, temperature, max_tokens, etc.
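As a rough sketch of what that looks like (the exact schema is in the linked template; the keys and values below are illustrative), model-specific params live under each model's `litellm_params`, in the same `model_list` format shown above:

```yaml
model_list:
  - model_name: zephyr-alpha
    litellm_params: # anything here is passed through to litellm.completion()
      model: huggingface/HuggingFaceH4/zephyr-7b-alpha
      api_base: http://0.0.0.0:8001
      api_key: my-hf-api-key   # illustrative placeholder
      temperature: 0.2
      max_tokens: 256
```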