(docs) use fast eval

2023-11-11 12:06:17 -08:00 · b6e6c7bb86
commit b6e6c7bb86
parent 96bca9a836
1 changed file with 57 additions and 1 deletion
--- a/docs/my-website/docs/tutorials/lm_evaluation_harness.md
+++ b/docs/my-website/docs/tutorials/lm_evaluation_harness.md
@@ -1,7 +1,7 @@
 import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';

-# Benchmark LLMs - LM Harness, Flask
+# Benchmark LLMs - LM Harness, FastEval, Flask

 ## LM Harness Benchmarks
 Evaluate LLMs 20x faster with TGI via litellm proxy's `/completions` endpoint. 
@@ -9,6 +9,7 @@ Evaluate LLMs 20x faster with TGI via litellm proxy's `/completions` endpoint.
 This tutorial assumes you're using the `big-refactor` branch of [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/big-refactor)

 **Step 1: Start the local proxy**
+See the supported models [here](https://docs.litellm.ai/docs/simple_proxy).
 ```shell
 $ litellm --model huggingface/bigcode/starcoder
 ```
@@ -41,6 +42,61 @@ python3 -m lm_eval \
  --task crows_pairs_english_age

 ```
+## FastEval
+
+**Step 1: Start the local proxy**
+See the supported models [here](https://docs.litellm.ai/docs/simple_proxy).
+```shell
+$ litellm --model huggingface/bigcode/starcoder
+```
+
+**Step 2: Set OpenAI API Base & Key**
+```shell
+$ export OPENAI_API_BASE=http://0.0.0.0:8000
+```
+
+The API key can be set to any value, since the proxy holds the real credentials:
+```shell
+export OPENAI_API_KEY=anything
+```
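When FastEval is launched from a script rather than an interactive shell, the same two variables can be prepared in Python. This is a minimal sketch, not part of LiteLLM or FastEval: the helper name `proxy_env` is hypothetical, and the default values are simply the ones used in the shell commands above.

```python
import os


def proxy_env(base_url="http://0.0.0.0:8000", api_key="anything", environ=None):
    """Return a copy of the environment pointing OpenAI clients at the local proxy.

    `proxy_env` is an illustrative helper; the defaults mirror the shell exports above.
    """
    env = dict(os.environ if environ is None else environ)
    env["OPENAI_API_BASE"] = base_url  # address of the local litellm proxy
    env["OPENAI_API_KEY"] = api_key    # placeholder; the proxy holds the real credentials
    return env


if __name__ == "__main__":
    env = proxy_env()
    print(env["OPENAI_API_BASE"], env["OPENAI_API_KEY"])
```

The returned dictionary can be passed as the `env` argument of `subprocess.run`, so the current process's own environment is left untouched.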
+
+**Step 3: Run with FastEval**
+
+**Clone FastEval**
+```shell
+# Clone this repository, make it the current working directory
+git clone --depth 1 https://github.com/FastEval/FastEval.git
+cd FastEval
+```
+
+**Set API Base on FastEval**
+
+In FastEval, make the following **two-line code change** so that it reads `OPENAI_API_BASE` from the environment:
+
+https://github.com/FastEval/FastEval/pull/90/files
+```python
+try:
+    api_base = os.environ.get("OPENAI_API_BASE")  # changed: read the API base from the environment
+    if api_base is None:  # variable unset: fall back to the default OpenAI endpoint
+        api_base = "https://api.openai.com/v1"
+    response = await self.reply_two_attempts_with_different_max_new_tokens(
+        conversation=conversation,
+        api_base=api_base,  # changed: pass api_base
+        api_key=os.environ["OPENAI_API_KEY"],
+        temperature=temperature,
+        max_new_tokens=max_new_tokens,
+```
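The fallback logic in that change can be checked in isolation. The sketch below is a standalone illustration, not FastEval code: `resolve_api_base` and `DEFAULT_API_BASE` are hypothetical names. It relies on the fact that `os.environ["NAME"]` raises `KeyError` when the variable is unset, whereas `os.environ.get("NAME")` returns `None`, which is what makes a default possible.

```python
import os

# Default endpoint taken from the snippet above.
DEFAULT_API_BASE = "https://api.openai.com/v1"


def resolve_api_base(environ=None):
    """Return OPENAI_API_BASE if it is set and non-empty, else the default endpoint.

    `resolve_api_base` is an illustrative helper, not a FastEval function.
    """
    environ = os.environ if environ is None else environ
    # .get() returns None for a missing key instead of raising KeyError;
    # `or` also covers the case where the variable is set to an empty string.
    return environ.get("OPENAI_API_BASE") or DEFAULT_API_BASE


if __name__ == "__main__":
    print(resolve_api_base({}))
    print(resolve_api_base({"OPENAI_API_BASE": "http://0.0.0.0:8000"}))
```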
+
+**Run FastEval**
+Set `-b` to the benchmark you want to run. Possible values are `mt-bench`, `human-eval-plus`, `ds1000`, `cot`, `cot/gsm8k`, `cot/math`, `cot/bbh`, `cot/mmlu` and `custom-test-data`.
+
+Since LiteLLM provides an OpenAI-compatible proxy, `-t` and `-m` don't need to change:
+`-t` stays `openai` and
+`-m` stays `gpt-3.5-turbo`.
+
+```shell
+./fasteval -b human-eval-plus -t openai -m gpt-3.5-turbo
+```
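To run several benchmarks in a row, the command line can be assembled programmatically. This is a hedged sketch: `fasteval_command` and `BENCHMARKS` are hypothetical names, the benchmark list is copied from the text above, and only the `-b`, `-t` and `-m` flags shown in this tutorial are used.

```python
import shlex

# Benchmark names as listed in this tutorial.
BENCHMARKS = (
    "mt-bench", "human-eval-plus", "ds1000", "cot", "cot/gsm8k",
    "cot/math", "cot/bbh", "cot/mmlu", "custom-test-data",
)


def fasteval_command(benchmark, model_type="openai", model="gpt-3.5-turbo"):
    """Build the argument list for one FastEval run (illustrative helper)."""
    if benchmark not in BENCHMARKS:
        raise ValueError(f"unknown benchmark: {benchmark!r}")
    return ["./fasteval", "-b", benchmark, "-t", model_type, "-m", model]


if __name__ == "__main__":
    # Print the command as it would be typed in a shell.
    print(shlex.join(fasteval_command("human-eval-plus")))
```

The returned list can be handed to `subprocess.run(..., cwd="FastEval")` from the directory containing the clone, which avoids shell quoting issues.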

 ## FLASK - Fine-grained Language Model Evaluation 
 Use litellm to evaluate any LLM on FLASK https://github.com/kaistAI/FLASK