Commit graph

452 commits

Author SHA1 Message Date
Suraj Subramanian
b78ee3a0a5
fix duplicate deploy in compose.yaml (#417) 2024-11-11 10:51:14 -08:00
Ashwin Bharambe
c1f7ba3aed
Split safety into (llama-guard, prompt-guard, code-scanner) (#400)
Splits the meta-reference safety implementation into three distinct providers:

- inline::llama-guard
- inline::prompt-guard
- inline::code-scanner

Note that this PR is a backward-incompatible change to the llama stack server. I have added a deprecation_error field to ProviderSpec -- the server reads it and immediately barfs. This gives the user a specific message about what action to take. An automagical "config upgrade" is a bit too much work to implement right now :/

(Note that we will be gradually prefixing all inline providers with inline:: -- I am only doing this for this set of new providers because otherwise existing configuration files will break even more badly.)
2024-11-11 09:29:18 -08:00
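The deprecation mechanism described in c1f7ba3aed might look roughly like the sketch below. Only the names ProviderSpec and deprecation_error come from the commit message; the other field, the exception class, and the check_provider helper are hypothetical stand-ins, not the actual llama stack code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProviderSpec:
    # Simplified, hypothetical stand-in: the real ProviderSpec has more fields.
    provider_type: str
    # When set, the provider is considered removed and this text tells the
    # user what to do instead.
    deprecation_error: Optional[str] = None


class DeprecatedProviderError(RuntimeError):
    """Hypothetical exception for a config that names a removed provider."""


def check_provider(spec: ProviderSpec) -> None:
    # Fail fast with the author-supplied message rather than attempting an
    # automatic config upgrade.
    if spec.deprecation_error:
        raise DeprecatedProviderError(
            f"Provider '{spec.provider_type}' is no longer available: "
            f"{spec.deprecation_error}"
        )
```

Calling check_provider on a spec without deprecation_error returns silently; a spec that carries a message stops server startup with that message.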
Justin Lee
6d38b1690b
added quickstart w ollama and toolcalling using together (#413)
* added quickstart w ollama and toolcalling using together

* corrected url for colab

---------

Co-authored-by: Justin Lee <justinai@fb.com>
2024-11-09 10:52:26 -08:00
Xi Yan
b0b9c905b3 docs 2024-11-09 10:22:41 -08:00
Xi Yan
cc61fd8083 docs 2024-11-09 09:00:18 -08:00
Xi Yan
0c14761453 docs 2024-11-09 08:57:51 -08:00
Ashwin Bharambe
4986e46188
Distributions updates (slight updates to ollama, add inline-vllm and remote-vllm) (#408)
* remote vllm distro

* add inline-vllm details, fix things

* Write some docs
2024-11-08 18:09:39 -08:00
Xi Yan
ba82021d4b precommit 2024-11-08 17:58:58 -08:00
Xi Yan
1ebf6447c5 add missing inits 2024-11-08 17:54:24 -08:00
Xi Yan
89c3129f0b add missing inits 2024-11-08 17:49:29 -08:00
Xi Yan
f6aaa9c708 Bump version to 0.0.50 2024-11-08 17:28:39 -08:00
Justin Lee
65371a5067
[Docs] Zero-to-Hero notebooks and quick start documentation (#368)
Co-authored-by: Kai Wu <kaiwu@meta.com>
Co-authored-by: Sanyam Bhutani <sanyambhutani@meta.com>
Co-authored-by: Justin Lee <justinai@fb.com>
2024-11-08 17:16:44 -08:00
Dinesh Yeduguru
ec644d3418
migrate model to Resource and new registration signature (#410)
* resource oriented object design for models

* add back llama_model field

* working tests

* register signature fix

* address feedback

---------

Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
2024-11-08 16:12:57 -08:00
Xi Yan
bd0622ef10 update docs 2024-11-08 12:47:05 -08:00
Dalton Flanagan
5625aef48a
Add pip install helper for test and direct scenarios (#404)
* initial branch commit

* pip install helptext

* remove print

* pre-commit
2024-11-08 15:18:21 -05:00
Dinesh Yeduguru
d800a16acd
Resource oriented design for shields (#399)
* init

* working bedrock tests

* bedrock test for inference fixes

* use env vars for bedrock guardrail vars

* add register in meta reference

* use correct shield impl in meta ref

* don't add together fixture

* right naming

* minor updates

* improved registration flow

* address feedback

---------

Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
2024-11-08 12:16:11 -08:00
Xi Yan
7ee9f8d8ac rename 2024-11-08 10:34:48 -08:00
Xi Yan
b1d7376730 kill tgi/cpu 2024-11-08 10:33:45 -08:00
Xi Yan
6192bf43a4
[Evals API][10/n] API updates for EvalTaskDef + new test migration (#379)
* wip

* scoring fn api

* eval api

* eval task

* evaluate api update

* pre commit

* unwrap context -> config

* config field doc

* typo

* naming fix

* separate benchmark / app eval

* api name

* rename

* wip tests

* wip

* datasetio test

* delete unused

* fixture

* scoring resolve

* fix scoring register

* scoring test pass

* score batch

* scoring fix

* fix eval

* test eval works

* remove type ignore

* api refactor

* add default task_eval_id for routing

* add eval_id for jobs

* remove type ignore

* only keep 1 run_eval

* fix optional

* register task required

* register task required

* delete old tests

* delete old tests

* fixture return impl
2024-11-07 21:24:12 -08:00
Xi Yan
8350f2df4c
[docs] refactor remote-hosted distro (#402)
* move docs

* docs
2024-11-07 19:16:38 -08:00
Dalton Flanagan
345ae07317
Factor out create_dist_registry (#398) 2024-11-07 16:13:19 -05:00
Ashwin Bharambe
694c142b89
Add provider deprecation support; change directory structure (#397)
* Add provider deprecation support; change directory structure

* fix a couple dangling imports

* move the meta_reference safety dir also
2024-11-07 13:04:53 -08:00
Xi Yan
36e2538eb0
fix together inference validator (#393) 2024-11-07 11:31:53 -08:00
Yufei (Benny) Chen
31c5fbda5e
[LlamaStack][Fireworks] Update client and add unittest (#390) 2024-11-07 10:11:28 -08:00
Ashwin Bharambe
cfcc0a871c Slightly update PR template 2024-11-06 22:49:01 -08:00
Ashwin Bharambe
489f74a70b Allow simpler initialization of RemoteProviderConfig; fix issue in httpx client 2024-11-06 19:19:26 -08:00
Ashwin Bharambe
064d2a5287
Remove the safety adapter for Together; we can just use "meta-reference" (#387) 2024-11-06 17:36:57 -08:00
Xi Yan
8fc2d212a2
fix safety signature mismatch (#388)
* fix safety sig

* shield_type->identifier
2024-11-06 16:30:47 -08:00
Ashwin Bharambe
7c340f0236 rename test_inference -> test_text_inference 2024-11-06 16:12:50 -08:00
Ashwin Bharambe
3b54ce3499 remote::vllm now works with vision models 2024-11-06 16:07:17 -08:00
Ashwin Bharambe
994732e2e0
impls -> inline, adapters -> remote (#381) 2024-11-06 14:54:05 -08:00
Ashwin Bharambe
b10e9f46bb
Enable remote::vllm (#384)
* Enable remote::vllm

* Kill the giant list of hard coded models
2024-11-06 14:42:44 -08:00
Dinesh Yeduguru
093c9f1987
add bedrock distribution code (#358)
* add bedrock distribution code

* fix linter error

* add bedrock shields support

* linter fixes

* working bedrock safety

* change to return only one violation

* remove env var reading

* refreshable boto credentials

* remove env vars

* address raghu's feedback

* fix session_ttl passing

---------

Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
2024-11-06 14:39:11 -08:00
Dinesh Yeduguru
6ebd553da5
fix routing tables look up key for memory bank (#383)
Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
2024-11-06 13:32:46 -08:00
Xi Yan
748606195b
Kill llama stack configure (#371)
* remove configure

* build msg

* wip

* build->run

* delete prints

* docs

* fix docs, kill configure

* precommit

* update fireworks build

* docs

* clean up build

* comments

* fix

* test

* remove baking build.yaml into docker

* fix msg, urls

* configure msg
2024-11-06 13:32:10 -08:00
Ashwin Bharambe
d289afdbde Fix exception in server when client SSE connection closes 2024-11-06 11:00:34 -08:00
Ashwin Bharambe
cde9bc1388
Enable vision models for (Together, Fireworks, Meta-Reference, Ollama) (#376)
* Enable vision models for Together and Fireworks

* Works with ollama 0.4.0 pre-release with the vision model

* localize media for meta_reference inference

* Fix
2024-11-05 16:22:33 -08:00
Xi Yan
db30809141 precommit 2024-11-05 15:26:13 -08:00
Xi Yan
0706f6c82f add Llama3.2-3B-Instruct:int4-qlora-eo8 2024-11-05 15:22:26 -08:00
Xi Yan
16b7fa4614 quantized model docs 2024-11-05 15:21:13 -08:00
Dinesh Yeduguru
4dd01eeaa1
fix postgres config validation (#380)
* fix postgres config validation

* dont remove types

---------

Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
2024-11-05 15:09:04 -08:00
Dinesh Yeduguru
a2351bf2e9
add ability to persist memory banks created for faiss (#375)
* init

* add tests

* fix tests

* more fixes

* add tests

* make the default path more faiss specific

* fix linter

---------

Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
2024-11-05 14:50:23 -08:00
Dinesh Yeduguru
dcd8cfe0f3
add postgres kvstoreimpl (#374)
* add postgres kvstoreimpl

* make table name configurable

* add validator for table name

* linter fix

---------

Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
2024-11-05 11:42:21 -08:00
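A table-name validator of the kind mentioned in dcd8cfe0f3 could be sketched as follows. The class name, default table name, and exact rules are assumptions for illustration; the commit message does not say how the real validator works. Validation matters here because a table name cannot be passed as a bound query parameter and usually ends up interpolated into the SQL text.

```python
import re
from dataclasses import dataclass

# Conservative rule: unquoted-identifier style, letters, digits, underscores.
_TABLE_NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

# PostgreSQL truncates identifiers longer than 63 bytes by default.
_MAX_IDENTIFIER_LENGTH = 63


@dataclass
class PostgresKVStoreConfigSketch:
    # Hypothetical config class and default; not the actual llama stack names.
    table_name: str = "kvstore"

    def __post_init__(self) -> None:
        if len(self.table_name) > _MAX_IDENTIFIER_LENGTH:
            raise ValueError("table name exceeds 63 characters")
        if not _TABLE_NAME_RE.fullmatch(self.table_name):
            raise ValueError(f"invalid table name: {self.table_name!r}")
```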
Ashwin Bharambe
8de845a96d Kill everything from tests/ 2024-11-04 22:10:16 -08:00
Ashwin Bharambe
f08efc23a6 Kill non-integration older tests 2024-11-04 22:06:34 -08:00
Steve Grubb
122793ab92
Correct a traceback in vllm (#366)
  File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/adapters/inference/vllm/vllm.py", line 136, in _stream_chat_completion
    async for chunk in process_chat_completion_stream_response(
TypeError: process_chat_completion_stream_response() takes 2 positional arguments but 3 were given

This corrects the error by deleting the request variable
2024-11-04 20:49:35 -08:00
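The class of error fixed in 122793ab92 can be reproduced in isolation. In the sketch below, process_stream is a hypothetical two-argument stand-in for process_chat_completion_stream_response (whose real parameters are not shown in the commit message); passing an extra leading argument raises the same kind of TypeError, and dropping it fixes the call.

```python
import asyncio


async def process_stream(stream, formatter):
    # Hypothetical stand-in: an async generator taking exactly two
    # positional arguments.
    async for chunk in stream:
        yield formatter(chunk)


async def fake_stream():
    # Stand-in for a streaming response from an inference server.
    for token in ["a", "b"]:
        yield token


async def collect(*args):
    # Argument binding happens when process_stream(...) is called, so a
    # wrong argument count raises TypeError before any chunk is read.
    return [chunk async for chunk in process_stream(*args)]


if __name__ == "__main__":
    # Correct call: two positional arguments.
    print(asyncio.run(collect(fake_stream(), str.upper)))
    try:
        # Buggy call: an extra leading "request" argument.
        asyncio.run(collect("request", fake_stream(), str.upper))
    except TypeError as exc:
        print(exc)
```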
Ashwin Bharambe
3ca294c359 Bump version to 0.0.49 2024-11-04 20:38:00 -08:00
Ashwin Bharambe
a81178f1f5 The server now depends on SQLite by default 2024-11-04 20:35:53 -08:00
Ashwin Bharambe
9a57a009ee Need to await for get_object_from_identifier() now 2024-11-04 20:33:12 -08:00
Ashwin Bharambe
7cf4c905f3 add support for remote providers in tests 2024-11-04 20:30:46 -08:00