diff --git a/docs/my-website/docs/tutorials/lm_evaluation_harness.md b/docs/my-website/docs/tutorials/lm_evaluation_harness.md
index a3d0c082e..60789356e 100644
--- a/docs/my-website/docs/tutorials/lm_evaluation_harness.md
+++ b/docs/my-website/docs/tutorials/lm_evaluation_harness.md
@@ -1,7 +1,7 @@
 import Tabs from '@theme/Tabs';
 import TabItem from '@theme/TabItem';
 
-# Benchmark LLMs - LM Harness, Flask
+# Benchmark LLMs - LM Harness, FastEval, Flask
 
 ## LM Harness Benchmarks
 Evaluate LLMs 20x faster with TGI via litellm proxy's `/completions` endpoint.
@@ -9,6 +9,7 @@ Evaluate LLMs 20x faster with TGI via litellm proxy's `/completions` endpoint.
 This tutorial assumes you're using the `big-refactor` branch of [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/big-refactor)
 
 **Step 1: Start the local proxy**
+See supported models [here](https://docs.litellm.ai/docs/simple_proxy).
 ```shell
 $ litellm --model huggingface/bigcode/starcoder
 ```
@@ -41,6 +42,61 @@ python3 -m lm_eval \
   --task crows_pairs_english_age
 ```
 
+## FastEval
+
+**Step 1: Start the local proxy**
+See supported models [here](https://docs.litellm.ai/docs/simple_proxy).
+```shell
+$ litellm --model huggingface/bigcode/starcoder
+```
+
+**Step 2: Set OpenAI API Base & Key**
+```shell
+$ export OPENAI_API_BASE=http://0.0.0.0:8000
+```
+
+Set the API key to anything, since the proxy holds the credentials:
+```shell
+export OPENAI_API_KEY=anything
+```
+
+**Step 3: Run with FastEval**
+
+**Clone FastEval**
+```shell
+# Clone this repository and make it the current working directory
+git clone --depth 1 https://github.com/FastEval/FastEval.git
+cd FastEval
+```
+
+**Set API Base on FastEval**
+
+In FastEval, make the following **2-line code change** to set `OPENAI_API_BASE`:
+
+https://github.com/FastEval/FastEval/pull/90/files
+```python
+try:
+    api_base = os.environ["OPENAI_API_BASE"] #changed: read api base from env
+    if api_base == None:
+        api_base = "https://api.openai.com/v1"
+    response = await self.reply_two_attempts_with_different_max_new_tokens(
+        conversation=conversation,
+        api_base=api_base, #changed: pass api_base
+        api_key=os.environ["OPENAI_API_KEY"],
+        temperature=temperature,
+        max_new_tokens=max_new_tokens,
+```
+
+**Run FastEval**
+Set `-b` to the benchmark you want to run. Possible values are `mt-bench`, `human-eval-plus`, `ds1000`, `cot`, `cot/gsm8k`, `cot/math`, `cot/bbh`, `cot/mmlu`, and `custom-test-data`.
+
+Since LiteLLM provides an OpenAI-compatible proxy, `-t` and `-m` don't need to change:
+`-t` will remain `openai`
+`-m` will remain `gpt-3.5-turbo`
+
+```shell
+./fasteval -b human-eval-plus -t openai -m gpt-3.5-turbo
+```
 ## FLASK - Fine-grained Language Model Evaluation
 Use litellm to evaluate any LLM on FLASK https://github.com/kaistAI/FLASK
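Both the LM Harness and FastEval walkthroughs above rely on the same mechanism: the LiteLLM proxy exposes an OpenAI-compatible `/completions` endpoint, so any OpenAI client pointed at `OPENAI_API_BASE` talks to the locally served model. A minimal sanity check along these lines can confirm the endpoint responds before starting a full benchmark run; this is a sketch, assuming the proxy from Step 1 is listening on `http://0.0.0.0:8000`, the `openai` Python SDK v1+ is installed, and the prompt is just a placeholder:

```python
from openai import OpenAI

# Point an ordinary OpenAI client at the LiteLLM proxy; the key can be any
# string because the proxy itself holds the real provider credentials.
client = OpenAI(base_url="http://0.0.0.0:8000", api_key="anything")

response = client.completions.create(
    model="huggingface/bigcode/starcoder",  # same model passed to `litellm --model`
    prompt="def fibonacci(n):",
    max_tokens=32,
)
print(response.choices[0].text)
```

If this prints a completion, both harnesses should reach the proxy with the same `OPENAI_API_BASE` and `OPENAI_API_KEY` settings.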