Xi Yan
65d914d7b9
change everything to docker build.yaml
2024-10-25 12:19:17 -07:00
Xi Yan
98ba12eb15
precommit
2024-10-25 12:15:21 -07:00
Xi Yan
37010c4b58
symlink
2024-10-25 12:12:48 -07:00
Xi Yan
d17c2e5675
symlink
2024-10-25 12:12:13 -07:00
Xi Yan
02b2e1c4a4
manifest
2024-10-25 12:10:15 -07:00
Xi Yan
32df363b48
fix templates
2024-10-25 12:09:45 -07:00
Xi Yan
19adb4070a
add build files to templates
2024-10-25 12:08:32 -07:00
Ashwin Bharambe
afae4e3d8e
Update docker build flow a little
2024-10-25 10:06:21 -07:00
Ashwin Bharambe
5bed6c276c
Move function around
2024-10-25 09:18:22 -07:00
Ashwin Bharambe
a387ca22e2
Update docker_base for meta-reference-gpu
2024-10-25 09:13:33 -07:00
Ashwin Bharambe
70d59b0f5d
Make vllm inference better
...
Tests still don't pass completely (some hang), so there may be
threading issues.
2024-10-24 22:52:47 -07:00
Xi Yan
cb43caa2c3
start_container.sh prefix llamastack->distribution name
2024-10-24 21:29:17 -07:00
Sarthak Deshpande
df141b6ef3
Fix for get_agents_session ( #300 )
2024-10-24 18:36:27 -07:00
Justin Lee
b6d8246b82
added templates and enhanced readme ( #307 )
...
Co-authored-by: Justin Lee <justinai@fb.com>
2024-10-24 17:07:06 -07:00
Dinesh Yeduguru
3e1c3fdb3f
completion() for tgi ( #295 )
2024-10-24 16:02:41 -07:00
Xi Yan
cb84034567
[Evals API][3/n] scoring_functions / scoring meta-reference implementations ( #296 )
...
* wip
* dataset validation
* test_scoring
* cleanup
* clean up test
* comments
* error checking
* dataset client
* test client:
* datasetio client
* clean up
* basic scoring function works
* scorer wip
* equality scorer
* score batch impl
* score batch
* update scoring test
* refactor
* validate scorer input
* address comments
* add all rows scores to ScoringResult
* bugfix
* scoring function def rename
2024-10-24 14:52:30 -07:00
Xi Yan
e70420a06e
Update getting_started.md
2024-10-24 14:19:35 -07:00
Xi Yan
8615bc9e08
update manifest for build templates
2024-10-24 14:04:13 -07:00
Ashwin Bharambe
94728d6983
Handle both ipv6 and ipv4 interfaces together
2024-10-24 13:59:01 -07:00
Ashwin Bharambe
0538cc297e
Bump version to 0.0.45
2024-10-24 12:14:18 -07:00
Ashwin Bharambe
205bcfdd4e
Fix score threshold in faiss
2024-10-24 12:11:58 -07:00
Ashwin Bharambe
161aef0aae
Small updates to quantization config
2024-10-24 12:08:56 -07:00
Dalton Flanagan
8eceebec98
Update iOS inference instructions for new quantization
2024-10-24 14:47:27 -04:00
Ashwin Bharambe
8aa8847b4a
Bump version to 0.0.44
2024-10-24 08:41:39 -07:00
Ashwin Bharambe
7afe51c84d
New quantized models ( #301 )
2024-10-24 08:38:56 -07:00
Ashwin Bharambe
05a8d47b98
Add a meta-reference-quantized-gpu distribution
2024-10-23 21:45:50 -07:00
Xi Yan
f5dcc03742
use pytorch/pytorch as base
2024-10-23 20:22:00 -07:00
Xi Yan
0cec86453b
Fix issue w/ routing_table api getting added when router api is not specified ( #298 )
...
* fix issue w/ enforcing api
* cleanup
* inference only yaml
2024-10-23 15:27:22 -07:00
Dinesh Yeduguru
21f2e9adf5
dont set num_predict for all providers ( #294 )
2024-10-23 11:44:04 -07:00
Ashwin Bharambe
ffb561070d
Support structured output for Together ( #289 )
2024-10-22 22:36:38 -07:00
Sarthak Deshpande
2e5e46d896
Added tests for persistence ( #274 )
2024-10-22 19:41:46 -07:00
Xi Yan
821810657f
[Evals API][2/n] datasets / datasetio meta-reference implementation ( #288 )
...
* skeleton dataset / datasetio
* dataset datasetio
* config
* address comments
* delete dataset_utils
* address comments
* naming fix
2024-10-22 16:12:16 -07:00
Sarthak Deshpande
8a01b9e40c
Added implementations for get_agents_session, delete_agents_session and delete_agents ( #267 )
2024-10-22 13:50:43 -07:00
Suraj Subramanian
b81a3bd46a
Fix import conflict for SamplingParams ( #285 )
...
A conflict between llama_models.llama3.api.datatypes.SamplingParams and vllm.sampling_params.SamplingParams causes errors while processing vLLM engine requests.
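This kind of clash is ordinary Python name shadowing: when two modules export a class with the same name, the later `from ... import` silently rebinds it. The sketch below is an illustration only. It registers two hypothetical stand-in modules (`fake_llama_datatypes` and `fake_vllm_sampling`, neither of which exists in llama_models or vllm) and shows how aliased imports keep both classes addressable. The actual fix in this commit may differ.

```python
import sys
import types


def _make_module(name: str, origin: str) -> types.ModuleType:
    """Register a hypothetical in-memory module exporting `SamplingParams`."""
    module = types.ModuleType(name)

    class SamplingParams:
        source = origin  # records which stand-in module defined the class

    module.SamplingParams = SamplingParams
    sys.modules[name] = module
    return module


# Stand-ins for the two real packages (assumed names, for illustration only).
_make_module("fake_llama_datatypes", "llama")
_make_module("fake_vllm_sampling", "vllm")

# Plain imports: the second binding silently replaces the first.
from fake_llama_datatypes import SamplingParams
from fake_vllm_sampling import SamplingParams

shadowed_source = SamplingParams.source  # refers to the vllm stand-in

# Aliased imports keep both classes available under distinct names.
from fake_llama_datatypes import SamplingParams as LlamaSamplingParams
from fake_vllm_sampling import SamplingParams as VLLMSamplingParams
```

With the aliases in place, code that builds engine requests can refer to `VLLMSamplingParams` explicitly instead of relying on import order.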
2024-10-22 12:56:00 -07:00
Ashwin Bharambe
c06718fbd5
Add support for Structured Output / Guided decoding ( #281 )
...
Added support for structured output in the API and added a reference implementation for meta-reference.
A few notes:
* Two formats are specified in the API: JSON schema and EBNF-based grammar
* The implementation only supports JSON for now
We currently use lm-format-enforcer for the implementation, but this may change, especially because BNF grammars aren't supported by that library.
Fireworks supports structured output and Together has limited support for it too. Subsequent PRs will add these changes. We would like all our inference providers to offer structured output for Llama models, since it is an important and highly sought-after feature among developers.
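To make the JSON schema format concrete, here is a minimal sketch of what "output that conforms to a schema" means. It is not the API added in this PR and does not use the library mentioned above: the schema, the `conforms` helper, and the field names are all hypothetical, and the check covers only a small subset of JSON Schema (flat objects with `required` and primitive `type`). Guided decoding constrains tokens during generation, whereas this sketch only validates a finished string after the fact.

```python
import json

# Hypothetical schema a caller might supply as the desired response format.
SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "year": {"type": "integer"},
    },
    "required": ["name", "year"],
}

# Mapping from JSON Schema primitive type names to Python types.
_PY_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
}


def conforms(text: str, schema: dict) -> bool:
    """Return True if `text` is a flat JSON object matching `schema`."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return False
    if not isinstance(value, dict):
        return False
    # Every required key must be present.
    if any(key not in value for key in schema.get("required", [])):
        return False
    for key, spec in schema.get("properties", {}).items():
        if key not in value:
            continue
        # bool is a subclass of int in Python, so reject it for numeric types.
        if isinstance(value[key], bool) and spec["type"] != "boolean":
            return False
        if not isinstance(value[key], _PY_TYPES[spec["type"]]):
            return False
    return True
```

A check like this is useful in tests for a provider: free-form output may or may not pass, while output produced under schema-guided decoding is expected to pass.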
2024-10-22 12:53:34 -07:00
Anush
4c3d33e6f4
feat: Qdrant Vector index support ( #221 )
...
This PR adds support for Qdrant - https://qdrant.tech/ to be used as a vector memory.
I've unit-tested the methods to confirm that they work as intended.
To run Qdrant:
```
docker run -p 6333:6333 qdrant/qdrant
```
2024-10-22 12:50:19 -07:00
Suraj Subramanian
668a495aba
Add REST api example for chat_completion ( #286 )
2024-10-22 10:35:20 -07:00
Xi Yan
e45f121c77
[Evals API] [1/n] Initial API ( #287 )
...
* type system api
* datasets api
* fix
* datasetio api
* kill reward scoring
* scoring functions + evals
* move jobs, fix errors
2024-10-22 09:31:19 -07:00
Xi Yan
b279d3bc58
Update README.md
2024-10-22 08:01:33 -07:00
Dinesh Yeduguru
1d241bf3fe
add completion() for ollama ( #280 )
2024-10-21 22:26:33 -07:00
raghotham
e2a5a2e10d
first version of readthedocs ( #278 )
2024-10-22 10:15:58 +05:30
Xi Yan
dbb5ce43fc
Bump version to 0.0.43
2024-10-21 19:10:01 -07:00
Xi Yan
a2ff74a686
telemetry WARNING->WARN fix
2024-10-21 18:52:48 -07:00
Xi Yan
b1451afbc8
Update README.md
2024-10-21 18:21:30 -07:00
Xi Yan
4d2bd2d39e
add more distro templates ( #279 )
...
* verify dockers
* together distro verified
* readme
* fireworks distro
* fireworks compose up
* fireworks verified
2024-10-21 18:15:08 -07:00
Xi Yan
cf27d19dd5
fix sse_generator async
2024-10-21 14:03:42 -07:00
Ashwin Bharambe
1944405dca
Update new_api_provider.md
2024-10-21 14:02:51 -07:00
Ashwin Bharambe
606c48309e
Small updates to encourage integration testing
2024-10-21 13:52:33 -07:00
Xi Yan
cb203b14b4
update README.md
2024-10-21 13:51:39 -07:00
Xi Yan
3a7884345a
Update new_api_provider.md
2024-10-21 13:41:56 -07:00