forked from phoenix/litellm-mirror
(docs) use fast eval
parent 96bca9a836
commit b6e6c7bb86
1 changed file with 57 additions and 1 deletion

@@ -1,7 +1,7 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Benchmark LLMs - LM Harness, FastEval, Flask

## LM Harness Benchmarks
Evaluate LLMs 20x faster with TGI via litellm proxy's `/completions` endpoint.

@@ -9,6 +9,7 @@ Evaluate LLMs 20x faster with TGI via litellm proxy's `/completions` endpoint.
This tutorial assumes you're using the `big-refactor` branch of [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/big-refactor).
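If you don't have that branch installed yet, a minimal setup sketch looks like the following; the clone/install commands are our assumption, since this tutorial itself doesn't cover installation:
```shell
# Assumed setup (not covered by this tutorial): install the big-refactor branch
git clone --branch big-refactor --depth 1 https://github.com/EleutherAI/lm-evaluation-harness.git
cd lm-evaluation-harness
pip install -e .
```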

**Step 1: Start the local proxy**

See supported models [here](https://docs.litellm.ai/docs/simple_proxy).
```shell
$ litellm --model huggingface/bigcode/starcoder
```
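Optionally, you can sanity-check that the proxy is serving an OpenAI-style `/completions` endpoint before pointing lm-eval at it. This curl call is our addition and assumes the `http://0.0.0.0:8000` address this doc exports later for FastEval:
```shell
# Optional check (assumption: the proxy listens on http://0.0.0.0:8000)
curl http://0.0.0.0:8000/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-3.5-turbo", "prompt": "def fib(n):", "max_tokens": 20}'
```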

@@ -41,6 +42,61 @@ python3 -m lm_eval \
--task crows_pairs_english_age

```

## FastEval

**Step 1: Start the local proxy**

See supported models [here](https://docs.litellm.ai/docs/simple_proxy).
```shell
$ litellm --model huggingface/bigcode/starcoder
```

**Step 2: Set OpenAI API Base & Key**

```shell
$ export OPENAI_API_BASE=http://0.0.0.0:8000
```

Set `OPENAI_API_KEY` to anything; the proxy already holds the real credentials.
```shell
export OPENAI_API_KEY=anything
```
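As an optional check (not a FastEval step), you can confirm the two variables route requests to the local proxy by calling the chat endpoint FastEval's OpenAI client will use:
```shell
# Optional check: the dummy key is accepted because the proxy holds the real credentials
curl "$OPENAI_API_BASE/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Say hi"}]}'
```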

**Step 3: Run with FastEval**

**Clone FastEval**
```shell
# Clone the FastEval repository and make it the current working directory
git clone --depth 1 https://github.com/FastEval/FastEval.git
cd FastEval
```

**Set API Base on FastEval**

In FastEval, make the following **2-line code change** so it reads `OPENAI_API_BASE` from the environment, as in this PR:

https://github.com/FastEval/FastEval/pull/90/files
```python
try:
    api_base = os.environ["OPENAI_API_BASE"] # changed: read api base from .env
    if api_base == None:
        api_base = "https://api.openai.com/v1"
    response = await self.reply_two_attempts_with_different_max_new_tokens(
        conversation=conversation,
        api_base=api_base, # changed: pass api_base
        api_key=os.environ["OPENAI_API_KEY"],
        temperature=temperature,
        max_new_tokens=max_new_tokens,
```

**Run FastEval**

Set `-b` to the benchmark you want to run. Possible values are `mt-bench`, `human-eval-plus`, `ds1000`, `cot`, `cot/gsm8k`, `cot/math`, `cot/bbh`, `cot/mmlu`, and `custom-test-data`.

Since LiteLLM provides an OpenAI-compatible proxy, `-t` and `-m` don't need to change:
- `-t` will remain `openai`
- `-m` will remain `gpt-3.5-turbo`

```shell
./fasteval -b human-eval-plus -t openai -m gpt-3.5-turbo
```
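For example, swapping only `-b` runs a different benchmark from the list above, e.g. MT-Bench:
```shell
./fasteval -b mt-bench -t openai -m gpt-3.5-turbo
```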

## FLASK - Fine-grained Language Model Evaluation

Use litellm to evaluate any LLM on FLASK: https://github.com/kaistAI/FLASK
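FLASK's own setup is outside the scope of this page, but the same proxy pattern shown above for FastEval applies. A rough sketch, with the FLASK-side details left as an assumption to verify against the FLASK repo:
```shell
# Rough sketch (assumptions, not from the FLASK docs):
# 1) in one terminal, serve the model you want to evaluate through the litellm proxy
litellm --model huggingface/bigcode/starcoder

# 2) in another terminal, point the OpenAI client used by the evaluation scripts at the proxy
export OPENAI_API_BASE=http://0.0.0.0:8000
export OPENAI_API_KEY=anything
git clone https://github.com/kaistAI/FLASK.git
cd FLASK
```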