Ashwin Bharambe
7b1a45ee0f
Fix score threshold in faiss
2024-10-24 14:57:10 -07:00
Ashwin Bharambe
86f1efa680
Small updates to quantization config
2024-10-24 14:57:10 -07:00
Dalton Flanagan
1721d91c95
Update iOS inference instructions for new quantization
2024-10-24 14:57:10 -07:00
Xi Yan
d4887fc746
address comments
2024-10-24 14:49:02 -07:00
Xi Yan
ba0186f2c8
refactor
2024-10-24 14:00:41 -07:00
Xi Yan
3db1b3fbcd
rebase name
2024-10-24 13:53:41 -07:00
Xi Yan
97ca72288c
Merge branch 'evals_5' into evals_6
2024-10-24 13:53:00 -07:00
Xi Yan
6053b8dd34
scoring function def rename
2024-10-24 13:51:11 -07:00
Xi Yan
689990b48b
Merge branch 'evals_5' into evals_6
2024-10-24 13:06:11 -07:00
Xi Yan
42bac85e1f
bugfix
2024-10-24 12:16:28 -07:00
Xi Yan
24dce9cb7a
minor typing
2024-10-24 12:08:57 -07:00
Xi Yan
32a496ab0f
Merge branch 'evals_5' into evals_6
2024-10-24 12:01:41 -07:00
Xi Yan
a3a8f32541
add all rows scores to ScoringResult
2024-10-24 11:53:15 -07:00
Xi Yan
737fcb795f
evals with generation
2024-10-24 11:30:13 -07:00
Xi Yan
071dba8871
Merge branch 'main' into evals_5
2024-10-24 09:18:15 -07:00
Ashwin Bharambe
8aa8847b4a
Bump version to 0.0.44
2024-10-24 08:41:39 -07:00
Ashwin Bharambe
7afe51c84d
New quantized models (#301)
2024-10-24 08:38:56 -07:00
Xi Yan
afa0c2b146
address comments
2024-10-23 22:17:38 -07:00
Ashwin Bharambe
05a8d47b98
Add a meta-reference-quantized-gpu distribution
2024-10-23 21:45:50 -07:00
Xi Yan
f5dcc03742
use pytorch/pytorch as base
2024-10-23 20:22:00 -07:00
Xi Yan
59c93548bc
validate scorer input
2024-10-23 17:43:41 -07:00
Xi Yan
0ee82571a8
refactor
2024-10-23 17:30:10 -07:00
Xi Yan
7c803cef86
update scoring test
2024-10-23 17:22:48 -07:00
Xi Yan
3c6555c408
score batch
2024-10-23 16:38:00 -07:00
Xi Yan
eb572faf6f
score batch impl
2024-10-23 16:19:25 -07:00
Xi Yan
4b1d7da030
equality scorer
2024-10-23 16:07:17 -07:00
Xi Yan
cad8c8710b
Merge branch 'main' into evals_5
2024-10-23 15:33:36 -07:00
Xi Yan
caf253e08f
Merge branch 'main' into evals_5
2024-10-23 15:33:00 -07:00
Xi Yan
0cec86453b
Fix issue w/ routing_table api getting added when router api is not specified (#298)
* fix issue w/ enforcing api
* cleanup
* inference only yaml
2024-10-23 15:27:22 -07:00
Xi Yan
35981a1a3b
scorer wip
2024-10-23 15:02:54 -07:00
Xi Yan
70c08e694d
basic scoring function works
2024-10-23 14:42:28 -07:00
Xi Yan
38e31ab525
clean up
2024-10-23 14:08:21 -07:00
Xi Yan
5930a92dc7
datasetio client
2024-10-23 14:04:51 -07:00
Xi Yan
51d5ad67c4
test client
2024-10-23 13:55:55 -07:00
Xi Yan
bb43369521
dataset client
2024-10-23 13:53:58 -07:00
Xi Yan
c5db025320
error checking
2024-10-23 13:17:47 -07:00
Xi Yan
d8bbce6f7c
comments
2024-10-23 13:16:08 -07:00
Xi Yan
5e1323b5bf
clean up test
2024-10-23 13:08:42 -07:00
Xi Yan
555f6e1531
cleanup
2024-10-23 13:07:15 -07:00
Xi Yan
92e32f80ad
test_scoring
2024-10-23 13:01:49 -07:00
Xi Yan
7c280e18fb
dataset validation
2024-10-23 12:08:39 -07:00
Dinesh Yeduguru
21f2e9adf5
Don't set num_predict for all providers (#294)
2024-10-23 11:44:04 -07:00
Ashwin Bharambe
ffb561070d
Support structured output for Together (#289)
2024-10-22 22:36:38 -07:00
Xi Yan
aefa84e70a
wip
2024-10-22 20:00:43 -07:00
Sarthak Deshpande
2e5e46d896
Added tests for persistence (#274)
2024-10-22 19:41:46 -07:00
Xi Yan
821810657f
[Evals API][2/n] datasets / datasetio meta-reference implementation (#288)
* skeleton dataset / datasetio
* dataset datasetio
* config
* address comments
* delete dataset_utils
* address comments
* naming fix
2024-10-22 16:12:16 -07:00
Sarthak Deshpande
8a01b9e40c
Added implementations for get_agents_session, delete_agents_session and delete_agents (#267)
2024-10-22 13:50:43 -07:00
Suraj Subramanian
b81a3bd46a
Fix import conflict for SamplingParams (#285)
A name conflict between llama_models.llama3.api.datatypes.SamplingParams and vllm.sampling_params.SamplingParams causes errors while processing vLLM engine requests.
2024-10-22 12:56:00 -07:00
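The commit message above does not say how the conflict was resolved. One common way to avoid this kind of collision is to import one of the two classes under an alias and convert explicitly between them. The sketch below illustrates that idea only. The two dataclasses are hypothetical stand-ins for the real classes (their fields are assumptions, not the actual library definitions), and `to_engine_params` is an invented helper.

```python
from dataclasses import dataclass
from types import SimpleNamespace


# Hypothetical stand-ins for the two classes named in the commit message.
# The field names here are assumptions made for illustration only.
@dataclass
class _ApiSamplingParams:
    temperature: float = 0.0
    top_p: float = 0.95


@dataclass
class _EngineSamplingParams:
    temperature: float = 1.0
    top_p: float = 1.0
    max_tokens: int = 16


# Stand-ins for the two modules that both export the name "SamplingParams".
api_datatypes = SimpleNamespace(SamplingParams=_ApiSamplingParams)
engine_sampling = SimpleNamespace(SamplingParams=_EngineSamplingParams)

# With the real packages, the same effect would come from something like:
#   from vllm.sampling_params import SamplingParams as VLLMSamplingParams
# so the second import no longer shadows the first.
SamplingParams = api_datatypes.SamplingParams
VLLMSamplingParams = engine_sampling.SamplingParams


def to_engine_params(params, max_tokens):
    """Convert API-level params into engine-level params (hypothetical helper)."""
    return VLLMSamplingParams(
        temperature=params.temperature,
        top_p=params.top_p,
        max_tokens=max_tokens,
    )


engine_params = to_engine_params(SamplingParams(temperature=0.7, top_p=0.9), 32)
```

Keeping the two names distinct makes it explicit at each call site which library's type is being constructed.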
Ashwin Bharambe
c06718fbd5
Add support for Structured Output / Guided decoding (#281)
Added support for structured output in the API, along with a reference implementation for meta-reference.
A few notes:
* Two formats are specified in the API: JSON Schema and EBNF-based grammar
* The implementation only supports JSON for now
We currently use lm-format-enforcer to provide the implementation, but this may change, especially because BNF grammars aren't supported by that library.
Fireworks supports structured output, and Together has limited support for it too. Subsequent PRs will add these changes. We would like all our inference providers to offer structured output for Llama models, since it is an important and highly sought-after capability among developers.
2024-10-22 12:53:34 -07:00
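To make the two response formats mentioned in the commit above more concrete, here is a minimal sketch using only the standard library. It is not the API added in that PR and it is not guided decoding: it only checks after the fact whether a generated string conforms to a small subset of JSON Schema. The names `JsonSchemaFormat`, `GrammarFormat` and `conforms` are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Union


@dataclass
class JsonSchemaFormat:
    """Hypothetical container for a JSON Schema response format."""
    schema: Dict[str, Any]


@dataclass
class GrammarFormat:
    """Hypothetical container for an EBNF grammar response format."""
    ebnf: str


ResponseFormat = Union[JsonSchemaFormat, GrammarFormat]

# Only a few primitive JSON Schema types are handled in this sketch.
_JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def conforms(text: str, fmt: ResponseFormat) -> bool:
    """Return True if `text` is a JSON object matching the schema's
    `required` keys and primitive property types."""
    if isinstance(fmt, GrammarFormat):
        # Mirrors the note above: only the JSON format is handled here.
        raise NotImplementedError("grammar formats are not handled in this sketch")
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict):
        return False
    if any(key not in obj for key in fmt.schema.get("required", [])):
        return False
    props = fmt.schema.get("properties", {})
    for key, value in obj.items():
        expected = props.get(key, {}).get("type")
        if expected not in _JSON_TYPES:
            continue  # unknown or unspecified type: not checked here
        # bool is a subclass of int in Python, so reject it for numeric types.
        if expected in ("integer", "number") and isinstance(value, bool):
            return False
        if not isinstance(value, _JSON_TYPES[expected]):
            return False
    return True


person = JsonSchemaFormat(schema={
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name"],
})
ok = conforms('{"name": "Ada", "age": 36}', person)
```

A real guided-decoding implementation constrains token sampling so that non-conforming output cannot be produced in the first place; a check like this one is only useful as a test or a fallback.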
Anush
4c3d33e6f4
feat: Qdrant Vector index support (#221)
This PR adds support for Qdrant (https://qdrant.tech/) as a vector memory.
I've unit-tested the methods to confirm that they work as intended.
To run Qdrant:
```
docker run -p 6333:6333 qdrant/qdrant
```
2024-10-22 12:50:19 -07:00