[Evals API][4/n] evals with generation meta-reference impl (#303)

* wip

* dataset validation

* test_scoring

* cleanup

* clean up test

* comments

* error checking

* dataset client

* test client:

* datasetio client

* clean up

* basic scoring function works

* scorer wip

* equality scorer

* score batch impl

* score batch

* update scoring test

* refactor

* validate scorer input

* address comments

* evals with generation

* add all rows scores to ScoringResult

* minor typing

* bugfix

* scoring function def rename

* rebase name

* refactor

* address comments

* Update iOS inference instructions for new quantization

* Small updates to quantization config

* Fix score threshold in faiss

* Bump version to 0.0.45

* Handle both ipv6 and ipv4 interfaces together

* update manifest for build templates

* Update getting_started.md

* chatcompletion & completion input type validation

* inclusion->subsetof

* error checking

* scoring_function -> scoring_fn rename, scorer -> scoring_fn rename

* address comments

* [Evals API][5/n] fixes to generate openapi spec (#323)

* generate openapi

* typing comment, dataset -> dataset_id

* remove custom type

* sample eval run.yaml

---------

Co-authored-by: Dalton Flanagan <6599399+dltn@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
This commit is contained in:
Xi Yan 2024-10-25 13:12:39 -07:00 committed by GitHub
parent 426d821e7f
commit abdf7cddf3
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
31 changed files with 3371 additions and 1296 deletions

View file

@ -29,7 +29,7 @@ class LLMAsJudgeContext(BaseModel):
@json_schema_type
class ScoringFunctionDef(BaseModel):
class ScoringFnDef(BaseModel):
identifier: str
description: Optional[str] = None
metadata: Dict[str, Any] = Field(
@ -48,7 +48,7 @@ class ScoringFunctionDef(BaseModel):
@json_schema_type
class ScoringFunctionDefWithProvider(ScoringFunctionDef):
class ScoringFnDefWithProvider(ScoringFnDef):
provider_id: str = Field(
description="ID of the provider which serves this dataset",
)
@ -57,14 +57,14 @@ class ScoringFunctionDefWithProvider(ScoringFunctionDef):
@runtime_checkable
class ScoringFunctions(Protocol):
@webmethod(route="/scoring_functions/list", method="GET")
async def list_scoring_functions(self) -> List[ScoringFunctionDefWithProvider]: ...
async def list_scoring_functions(self) -> List[ScoringFnDefWithProvider]: ...
@webmethod(route="/scoring_functions/get", method="GET")
async def get_scoring_function(
self, name: str
) -> Optional[ScoringFunctionDefWithProvider]: ...
) -> Optional[ScoringFnDefWithProvider]: ...
@webmethod(route="/scoring_functions/register", method="POST")
async def register_scoring_function(
self, function_def: ScoringFunctionDefWithProvider
self, function_def: ScoringFnDefWithProvider
) -> None: ...