Commit graph

115 commits

Author SHA1 Message Date
Ashwin Bharambe
ffedb81c11
Significantly simpler and malleable test setup (#360)
* Significantly simpler and malleable test setup

* convert memory tests

* refactor fixtures and add support for composable fixtures

* Fix memory to use the newer fixture organization

* Get agents tests working

* Safety tests work

* yet another refactor to make this more general

now it accepts --inference-model, --safety-model options also

* get multiple providers working for meta-reference (for inference + safety)

* Add README.md

---------

Co-authored-by: Ashwin Bharambe <ashwin@meta.com>
2024-11-04 17:36:43 -08:00
Dinesh Yeduguru
c9bf1d7d0b
pgvector fixes (#369)
Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
2024-11-04 17:01:09 -08:00
Xi Yan
c810a4184d
[docs] update documentations (#356)
* move docs -> source

* Add files via upload

* mv image

* Add files via upload

* colocate iOS setup doc

* delete image

* Add files via upload

* fix

* delete image

* Add files via upload

* Update developer_cookbook.md

* toctree

* wip subfolder

* docs update

* subfolder

* updates

* name

* updates

* index

* updates

* refactor structure

* depth

* docs

* content

* docs

* getting started

* distributions

* fireworks

* fireworks

* update

* theme

* theme

* theme

* pdj theme

* pytorch theme

* css

* theme

* agents example

* format

* index

* headers

* copy button

* test tabs

* test tabs

* fix

* tabs

* tab

* tabs

* sphinx_design

* quick start commands

* size

* width

* css

* css

* download models

* asthetic fix

* tab format

* update

* css

* width

* css

* docs

* tab based

* tab

* tabs

* docs

* style

* image

* css

* color

* typo

* update docs

* missing links

* list templates

* links

* links update

* troubleshooting

* fix

* distributions

* docs

* fix table

* kill llamastack-local-gpu/cpu

* Update index.md

* Update index.md

* mv ios_setup.md

* Update ios_setup.md

* Add remote_or_local.gif

* Update ios_setup.md

* release notes

* typos

* Add ios_setup to index

* nav bar

* hide torctree

* ios image

* links update

* rename

* rename

* docs

* rename

* links

* distributions

* distributions

* distributions

* distributions

* remove release

* remote

---------

Co-authored-by: dltn <6599399+dltn@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2024-11-04 16:52:38 -08:00
Dinesh Yeduguru
ac93dd89cf
fix bedrock impl (#359)
* fix bedrock impl

* fix linter errors

* fix return type and remove debug print
2024-11-03 07:32:30 -08:00
Ashwin Bharambe
bf4f97a2e1 Fix vLLM adapter chat_completion signature 2024-11-01 13:09:03 -07:00
Dalton Flanagan
adecb2a2d3 update for message parsing on ios 2024-11-01 14:37:19 -04:00
Ashwin Bharambe
37b330b4ef
add dynamic clients for all APIs (#348)
* add dynamic clients for all APIs

* fix openapi generator

* inference + memory + agents tests now pass with "remote" providers

* Add docstring which fixes openapi generator :/
2024-10-31 14:46:25 -07:00
Ashwin Bharambe
eccd7dc4a9 Avoid warnings from pydantic for overriding schema
Also fix structured output in completions
2024-10-28 21:39:48 -07:00
Xi Yan
ed833bb758
[Evals API][7/n] braintrust scoring provider (#333)
* wip scoring refactor

* llm as judge, move folders

* test full generation + eval

* extract score regex to llm context

* remove prints, cleanup braintrust in this branch

* braintrust skeleton

* datasetio test fix

* braintrust provider

* remove prints

* dependencies

* change json -> class

* json -> class

* remove initialize

* address nits

* check identifier prefix

* braintrust scoring identifier check, rebase

* udpate MANIFEST

* manifest

* remove braintrust scoring_fn

* remove comments

* tests

* imports fix
2024-10-28 18:59:35 -07:00
Xi Yan
7b8748c53e
[Evals API][6/n] meta-reference llm as judge, registration for ScoringFnDefs (#330)
* wip scoring refactor

* llm as judge, move folders

* test full generation + eval

* extract score regex to llm context

* remove prints, cleanup braintrust in this branch

* change json -> class

* remove initialize

* address nits

* check identifier prefix

* udpate MANIFEST
2024-10-28 14:08:42 -07:00
Ashwin Bharambe
b7d2b83d55 Allow passing provider_registry to resolve_impls() 2024-10-28 11:58:16 -07:00
Dalton Flanagan
44c05c6e7d add vision instruct models for fireworks 2024-10-27 17:54:54 +00:00
Dinesh Yeduguru
9b85d9a841
completion() for fireworks (#329) 2024-10-25 16:12:10 -07:00
Dinesh Yeduguru
7ec79f3b9d
completion() for together (#324)
* completion() for together

* test fixes

* fix client building
2024-10-25 14:21:12 -07:00
Xi Yan
abdf7cddf3
[Evals API][4/n] evals with generation meta-reference impl (#303)
* wip

* dataset validation

* test_scoring

* cleanup

* clean up test

* comments

* error checking

* dataset client

* test client:

* datasetio client

* clean up

* basic scoring function works

* scorer wip

* equality scorer

* score batch impl

* score batch

* update scoring test

* refactor

* validate scorer input

* address comments

* evals with generation

* add all rows scores to ScoringResult

* minor typing

* bugfix

* scoring function def rename

* rebase name

* refactor

* address comments

* Update iOS inference instructions for new quantization

* Small updates to quantization config

* Fix score threshold in faiss

* Bump version to 0.0.45

* Handle both ipv6 and ipv4 interfaces together

* update manifest for build templates

* Update getting_started.md

* chatcompletion & completion input type validation

* inclusion->subsetof

* error checking

* scoring_function -> scoring_fn rename, scorer -> scoring_fn rename

* address comments

* [Evals API][5/n] fixes to generate openapi spec (#323)

* generate openapi

* typing comment, dataset -> dataset_id

* remove custom type

* sample eval run.yaml

---------

Co-authored-by: Dalton Flanagan <6599399+dltn@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2024-10-25 13:12:39 -07:00
Sachin Mehta
c05fbf14b3
Added hadamard transform for spinquant (#326)
* Added hadamard transform for spinquant

* Changed from config to model_args

* Added an assertion for model args

* Use enum.value to check against str

* pre-commit

---------

Co-authored-by: Sachin Mehta <sacmehta@fb.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2024-10-25 12:58:48 -07:00
Xi Yan
07f9bf723f
fix broken --list-templates with adding build.yaml files for packaging (#327)
* add build files to templates

* fix templates

* manifest

* symlink

* symlink

* precommit

* change everything to docker build.yaml

* remove image_type in templates

* fix build from templates CLI

* fix readmes
2024-10-25 12:51:22 -07:00
Ashwin Bharambe
70d59b0f5d Make vllm inference better
Tests still don't pass completely (some hang) so I think there are some
potential threading issues maybe
2024-10-24 22:52:47 -07:00
Sarthak Deshpande
df141b6ef3
Fix for get_agents_session (#300) 2024-10-24 18:36:27 -07:00
Dinesh Yeduguru
3e1c3fdb3f
completion() for tgi (#295) 2024-10-24 16:02:41 -07:00
Xi Yan
cb84034567
[Evals API][3/n] scoring_functions / scoring meta-reference implementations (#296)
* wip

* dataset validation

* test_scoring

* cleanup

* clean up test

* comments

* error checking

* dataset client

* test client:

* datasetio client

* clean up

* basic scoring function works

* scorer wip

* equality scorer

* score batch impl

* score batch

* update scoring test

* refactor

* validate scorer input

* address comments

* add all rows scores to ScoringResult

* bugfix

* scoring function def rename
2024-10-24 14:52:30 -07:00
Ashwin Bharambe
205bcfdd4e Fix score threshold in faiss 2024-10-24 12:11:58 -07:00
Dalton Flanagan
8eceebec98
Update iOS inference instructions for new quantization 2024-10-24 14:47:27 -04:00
Ashwin Bharambe
7afe51c84d
New quantized models (#301) 2024-10-24 08:38:56 -07:00
Ashwin Bharambe
05a8d47b98 Add a meta-reference-quantized-gpu distribution 2024-10-23 21:45:50 -07:00
Dinesh Yeduguru
21f2e9adf5
dont set num_predict for all providers (#294) 2024-10-23 11:44:04 -07:00
Ashwin Bharambe
ffb561070d
Support structured output for Together (#289) 2024-10-22 22:36:38 -07:00
Sarthak Deshpande
2e5e46d896
Added tests for persistence (#274) 2024-10-22 19:41:46 -07:00
Xi Yan
821810657f
[Evals API][2/n] datasets / datasetio meta-reference implementation (#288)
* skeleton dataset / datasetio

* dataset datasetio

* config

* address comments

* delete dataset_utils

* address comments

* naming fix
2024-10-22 16:12:16 -07:00
Sarthak Deshpande
8a01b9e40c
Added implementations for get_agents_session, delete_agents_session and delete_agents (#267) 2024-10-22 13:50:43 -07:00
Suraj Subramanian
b81a3bd46a
Fix import conflict for SamplingParams (#285)
Conflict between llama_models.llama3.api.datatypes.SamplingParams and vllm.sampling_params.SamplingParams results in errors while processing VLLM engine requests
2024-10-22 12:56:00 -07:00
Ashwin Bharambe
c06718fbd5
Add support for Structured Output / Guided decoding (#281)
Added support for structured output in the API and added a reference implementation for meta-reference.

A few notes:

* Two formats are specified in the API: Json schema and EBNF based grammar
* Implementation only supports Json for now
We use lm-format-enhancer to provide the implementation right now but may change this especially because BNF grammars aren't supported by that library.
Fireworks has support for structured output and Together has limited supported for it too. Subsequent PRs will add these changes. We would like all our inference providers to provide structured output for llama models since it is an extremely important and highly sought-after need by the developers.
2024-10-22 12:53:34 -07:00
Anush
4c3d33e6f4
feat: Qdrant Vector index support (#221)
This PR adds support for Qdrant - https://qdrant.tech/ to be used as a vector memory.

I've unit-tested the methods to confirm that they work as intended.

To run Qdrant

```
docker run -p 6333:6333 qdrant/qdrant
```
2024-10-22 12:50:19 -07:00
Dinesh Yeduguru
1d241bf3fe
add completion() for ollama (#280) 2024-10-21 22:26:33 -07:00
Xi Yan
a2ff74a686 telemetry WARNING->WARN fix 2024-10-21 18:52:48 -07:00
Xi Yan
4d2bd2d39e
add more distro templates (#279)
* verify dockers

* together distro verified

* readme

* fireworks distro

* fireworks compose up

* fireworks verified
2024-10-21 18:15:08 -07:00
Xi Yan
23210e8679
llama stack distributions / templates / docker refactor (#266)
* docker compose ollama

* comment

* update compose file

* readme for distributions

* readme

* move distribution folders

* move distribution/templates to distributions/

* rename

* kill distribution/templates

* readme

* readme

* build/developer cookbook/new api provider

* developer cookbook

* readme

* readme

* [bugfix] fix case for agent when memory bank registered without specifying provider_id (#264)

* fix case where memory bank is registered without provider_id

* memory test

* agents unit test

* Add an option to not use elastic agents for meta-reference inference (#269)

* Allow overridding checkpoint_dir via config

* Small rename

* Make all methods `async def` again; add completion() for meta-reference (#270)

PR #201 had made several changes while trying to fix issues with getting the stream=False branches of inference and agents API working. As part of this, it made a change which was slightly gratuitous. Namely, making chat_completion() and brethren "def" instead of "async def".

The rationale was that this allowed the user (within llama-stack) of this to use it as:

```
async for chunk in api.chat_completion(params)
```

However, it causes unnecessary confusion for several folks. Given that clients (e.g., llama-stack-apps) anyway use the SDK methods (which are completely isolated) this choice was not ideal. Let's revert back so the call now looks like:

```
async for chunk in await api.chat_completion(params)
```

Bonus: Added a completion() implementation for the meta-reference provider. Technically should have been another PR :)

* Improve an important error message

* update ollama for llama-guard3

* Add vLLM inference provider for OpenAI compatible vLLM server (#178)

This PR adds vLLM inference provider for OpenAI compatible vLLM server.

* Create .readthedocs.yaml

Trying out readthedocs

* Update event_logger.py (#275)

spelling error

* vllm

* build templates

* delete templates

* tmp add back build to avoid merge conflicts

* vllm

* vllm

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
Co-authored-by: Ashwin Bharambe <ashwin@meta.com>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: raghotham <rsm@meta.com>
Co-authored-by: nehal-a2z <nehal@coderabbit.ai>
2024-10-21 11:17:53 -07:00
Yuan Tang
a27a2cd2af
Add vLLM inference provider for OpenAI compatible vLLM server (#178)
This PR adds vLLM inference provider for OpenAI compatible vLLM server.
2024-10-20 18:43:25 -07:00
Ashwin Bharambe
59c43736e8 update ollama for llama-guard3 2024-10-19 17:26:18 -07:00
Ashwin Bharambe
2089427d60
Make all methods async def again; add completion() for meta-reference (#270)
PR #201 had made several changes while trying to fix issues with getting the stream=False branches of inference and agents API working. As part of this, it made a change which was slightly gratuitous. Namely, making chat_completion() and brethren "def" instead of "async def".

The rationale was that this allowed the user (within llama-stack) of this to use it as:

```
async for chunk in api.chat_completion(params)
```

However, it causes unnecessary confusion for several folks. Given that clients (e.g., llama-stack-apps) anyway use the SDK methods (which are completely isolated) this choice was not ideal. Let's revert back so the call now looks like:

```
async for chunk in await api.chat_completion(params)
```

Bonus: Added a completion() implementation for the meta-reference provider. Technically should have been another PR :)
2024-10-18 20:50:59 -07:00
Ashwin Bharambe
95a96afe34 Small rename 2024-10-18 14:41:38 -07:00
Ashwin Bharambe
71a905e93f Allow overridding checkpoint_dir via config 2024-10-18 14:28:06 -07:00
Ashwin Bharambe
33afd34e6f
Add an option to not use elastic agents for meta-reference inference (#269) 2024-10-18 12:51:10 -07:00
Xi Yan
be3c5c034d
[bugfix] fix case for agent when memory bank registered without specifying provider_id (#264)
* fix case where memory bank is registered without provider_id

* memory test

* agents unit test
2024-10-17 17:28:17 -07:00
Ashwin Bharambe
9fcf5d58e0 Allow overriding MODEL_IDS for inference test 2024-10-17 10:03:27 -07:00
Ashwin Bharambe
09b793c4d6 Fix fp8 implementation which had bit-rotten a bit
I only tested with "on-the-fly" bf16 -> fp8 conversion, not the "load
from fp8" codepath.

YAML I tested with:

```
providers:
  - provider_id: quantized
    provider_type: meta-reference-quantized
    config:
      model: Llama3.1-8B-Instruct
      quantization:
        type: fp8
```
2024-10-15 13:57:01 -07:00
Yuan Tang
80ada04f76
Remove request arg from chat completion response processing (#240)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2024-10-15 13:03:17 -07:00
Yuan Tang
2128e61da2
Fix incorrect completion() signature for Databricks provider (#236) 2024-10-11 08:47:57 -07:00
Xi Yan
ca29980c6b fix agents context retriever 2024-10-10 20:17:29 -07:00
Ashwin Bharambe
1ff0476002 Split off meta-reference-quantized provider 2024-10-10 16:03:19 -07:00