(docs) simple proxy

2025-04-24 18:24:20 +00:00 · 2023-11-29 16:36:06 -08:00 · 2d0432c5b7
commit 2d0432c5b7
parent 52c9159a54
1 changed file with 13 additions and 155 deletions
--- a/docs/my-website/docs/simple_proxy.md
+++ b/docs/my-website/docs/simple_proxy.md
@@ -237,169 +237,27 @@ $ litellm --model command-nightly
 ## Using with OpenAI compatible projects
-LiteLLM allows you to set `openai.api_base` to the proxy server and use all LiteLLM supported LLMs in any OpenAI supported project
+Set `base_url` to the LiteLLM Proxy server
 <Tabs>
-
+<TabItem value="openai" label="OpenAI v1.0.0+">
 <TabItem value="flask evals" label="FLASK Evals">
 FLASK - Fine-grained Language Model Evaluation 
 Use litellm to evaluate any LLM on FLASK https://github.com/kaistAI/FLASK 
 **Step 1: Start the local proxy**
 ```shell
 $ litellm --model huggingface/bigcode/starcoder
 ```
 **Step 2: Set OpenAI API Base & Key**
 ```shell
 $ export OPENAI_API_BASE=http://0.0.0.0:8000
 ```
 **Step 3 Run with FLASK** 
 ```shell
 git clone https://github.com/kaistAI/FLASK
 ```
 ```shell
 cd FLASK/gpt_review
 ```
 Run the eval 
 ```shell
 python gpt4_eval.py -q '../evaluation_set/flask_evaluation.jsonl'
 ```
 </TabItem>
 <TabItem value="FastEval" label="Fast Eval">
 **Step 1: Start the local proxy**
 see supported models [here](https://docs.litellm.ai/docs/simple_proxy)
 ```shell
 $ litellm --model huggingface/bigcode/starcoder
 ```
 **Step 2: Set OpenAI API Base & Key**
 ```shell
 $ export OPENAI_API_BASE=http://0.0.0.0:8000
 ```
 Set this to anything since the proxy has the credentials
 ```shell
 export OPENAI_API_KEY=anything
 ```
 **Step 3 Run with FastEval** 
 **Clone FastEval**
 ```shell
 # Clone this repository, make it the current working directory
 git clone --depth 1 https://github.com/FastEval/FastEval.git
 cd FastEval
 ```
 **Set API Base on FastEval**
 On FastEval make the following **2 line code change** to set `OPENAI_API_BASE`
 https://github.com/FastEval/FastEval/pull/90/files
 ```python
 try:
    api_base = os.environ["OPENAI_API_BASE"] #changed: read api base from .env
    if api_base == None:
        api_base = "https://api.openai.com/v1"
    response = await self.reply_two_attempts_with_different_max_new_tokens(
        conversation=conversation,
        api_base=api_base, # #changed: pass api_base
        api_key=os.environ["OPENAI_API_KEY"],
        temperature=temperature,
        max_new_tokens=max_new_tokens,
 ```
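Note that `os.environ["OPENAI_API_BASE"]` raises `KeyError` when the variable is unset, so the `None` check in the excerpt above cannot take effect on its own (the `except` clause is not shown here). A minimal sketch of the fallback that check appears to intend, using a hypothetical helper `resolve_api_base` that is not part of FastEval:

```python
import os

# Default value taken from the excerpt above.
DEFAULT_API_BASE = "https://api.openai.com/v1"


def resolve_api_base(env=os.environ):
    """Return OPENAI_API_BASE if set and non-empty, else the OpenAI default.

    Hypothetical helper for illustration only; not part of FastEval.
    """
    # .get() returns None for a missing key instead of raising KeyError,
    # and `or` also covers the case of an empty string.
    return env.get("OPENAI_API_BASE") or DEFAULT_API_BASE


if __name__ == "__main__":
    print(resolve_api_base())
```

With `OPENAI_API_BASE=http://0.0.0.0:8000` exported as in Step 2, the helper returns the proxy address; otherwise it falls back to the OpenAI endpoint.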
 **Run FastEval**
 Set `-b` to the benchmark you want to run. Possible values are `mt-bench`, `human-eval-plus`, `ds1000`, `cot`, `cot/gsm8k`, `cot/math`, `cot/bbh`, `cot/mmlu` and `custom-test-data`
 Since LiteLLM provides an OpenAI compatible proxy `-t` and `-m` don't need to change
 `-t` will remain openai
 `-m` will remain gpt-3.5
 ```shell
 ./fasteval -b human-eval-plus -t openai -m gpt-3.5-turbo
 ```
 </TabItem>
 <TabItem value="mlflow" label="ML Flow Eval">
 MLflow provides an API `mlflow.evaluate()` to help evaluate your LLMs https://mlflow.org/docs/latest/llms/llm-evaluate/index.html
 #### Pre Requisites
 ```shell
 pip install litellm
 ```
 ```shell
 pip install mlflow
 ```
 #### Step 1: Start LiteLLM Proxy on the CLI
 LiteLLM allows you to create an OpenAI compatible server for all supported LLMs. [More information on litellm proxy here](https://docs.litellm.ai/docs/simple_proxy)
 ```shell
 $ litellm --model huggingface/bigcode/starcoder
 #INFO: Proxy running on http://0.0.0.0:8000
 ```
 #### Step 2: Run ML Flow
 Before running the eval we will set `openai.api_base` to the litellm proxy from Step 1
 ```python
 openai.api_base = "http://0.0.0.0:8000"
 ```
 ```python
 import openai
-import pandas as pd
+client = openai.OpenAI(
-openai.api_key = "anything"             # this can be anything, we set the key on the proxy
+    api_key="anything",
-openai.api_base = "http://0.0.0.0:8000" # set api base to the proxy from step 1
+    base_url="http://0.0.0.0:8000"
 import mlflow
 eval_data = pd.DataFrame(
    {
        "inputs": [
            "What is the largest country",
            "What is the weather in sf?",
        ],
        "ground_truth": [
            "India is a large country",
            "It's cold in SF today"
        ],
    }
 )
-with mlflow.start_run() as run:
+# request sent to model set on litellm proxy, `litellm --model`
-    system_prompt = "Answer the following question in two sentences"
+response = client.chat.completions.create(model="gpt-3.5-turbo", messages = [
-    logged_model_info = mlflow.openai.log_model(
+    {
-        model="gpt-3.5",
+        "role": "user",
-        task=openai.ChatCompletion,
+        "content": "this is a test request, write a short poem"
-        artifact_path="model",
+    }
-        messages=[
+])
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": "{question}"},
        ],
    )
    # Use predefined question-answering metrics to evaluate our model.
    results = mlflow.evaluate(
        logged_model_info.model_uri,
        eval_data,
        targets="ground_truth",
        model_type="question-answering",
    )
    print(f"See aggregated evaluation results below: \n{results.metrics}")
    # Evaluation result for each data record is available in `results.tables`.
    eval_table = results.tables["eval_results_table"]
    print(f"See evaluation table below: \n{eval_table}")
 print(response)
 ```
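To see what pointing `base_url` at the proxy amounts to on the wire, here is a standard-library sketch that builds the equivalent HTTP request without sending it. It assumes the proxy serves the OpenAI-style `/chat/completions` route at the address used above; `build_chat_request` is a hypothetical helper, not part of LiteLLM or the OpenAI SDK.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, messages):
    """Build (but do not send) a chat completion request for the proxy.

    Hypothetical helper for illustration only.
    """
    # The chat completions route is appended to the configured base URL.
    url = base_url.rstrip("/") + "/chat/completions"
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            # The key is sent as a bearer token; the docs above say it can be anything.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    base_url="http://0.0.0.0:8000",
    api_key="anything",
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "this is a test request, write a short poem"}],
)
print(req.get_method(), req.full_url)

# To actually send it while the proxy started with `litellm --model ...` is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Building the request separately from sending it lets you inspect the target URL and headers even when no proxy is running.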
 </TabItem>