llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-03 09:53:45 +00:00

Author	SHA1	Message	Date
Ashwin Bharambe	f4426f6a43	Fix bug in `llama stack build`; SERVER_DEPENDENCIES were dropped	2024-11-11 20:12:13 -08:00
Ashwin Bharambe	506b99242a	Allow specifying TEST / PYPI VERSION for docker name	2024-11-11 19:56:42 -08:00
Ashwin Bharambe	36da9a600e	add explicit platform	2024-11-11 19:30:15 -08:00
Ashwin Bharambe	218803b7c8	add pypi version to docker tag	2024-11-11 19:20:31 -08:00
Ashwin Bharambe	343458479d	Make sure TEST_PYPI_VERSION is used in docker builds	2024-11-11 18:40:13 -08:00
Ashwin Bharambe	285cd26fb2	Replace colon in path so it doesn't cause issue on Windows	2024-11-11 17:33:53 -08:00
Dinesh Yeduguru	0a3b3d5fb6	migrate scoring fns to resource (#422 ) * fix after rebase * remove print --------- Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-11 17:28:48 -08:00
Dinesh Yeduguru	3802edfc50	migrate evals to resource (#421 ) * migrate evals to resource * remove listing of providers's evals * change the order of params in register * fix after rebase * linter fix --------- Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-11 17:24:03 -08:00
Dinesh Yeduguru	b95cb5308f	migrate dataset to resource (#420 ) * migrate dataset to resource * remove auto discovery * remove listing of providers's datasets * fix after rebase --------- Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-11 17:14:41 -08:00
Dinesh Yeduguru	38cce97597	migrate memory banks to Resource and new registration (#411 ) * migrate memory banks to Resource and new registration * address feedback * address feedback * fix tests * pgvector fix * pgvector fix v2 * remove auto discovery * change register signature to make params required * update client * client fix * use annotated union to parse * remove base MemoryBank inheritance --------- Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-11 17:10:44 -08:00
Ashwin Bharambe	c1f7ba3aed	Split safety into (llama-guard, prompt-guard, code-scanner) (#400 ) Splits the meta-reference safety implementation into three distinct providers: - inline::llama-guard - inline::prompt-guard - inline::code-scanner Note that this PR is a backward incompatible change to the llama stack server. I have added deprecation_error field to ProviderSpec -- the server reads it and immediately barfs. This is used to direct the user with a specific message on what action to perform. An automagical "config upgrade" is a bit too much work to implement right now :/ (Note that we will be gradually prefixing all inline providers with inline:: -- I am only doing this for this set of new providers because otherwise existing configuration files will break even more badly.)	2024-11-11 09:29:18 -08:00
Dinesh Yeduguru	ec644d3418	migrate model to Resource and new registration signature (#410 ) * resource oriented object design for models * add back llama_model field * working tests * register signature fix * address feedback --------- Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-08 16:12:57 -08:00
Dalton Flanagan	5625aef48a	Add pip install helper for test and direct scenarios (#404 ) * initial branch commit * pip install helptext * remove print * pre-commit	2024-11-08 15:18:21 -05:00
Dinesh Yeduguru	d800a16acd	Resource oriented design for shields (#399 ) * init * working bedrock tests * bedrock test for inference fixes * use env vars for bedrock guardrail vars * add register in meta reference * use correct shield impl in meta ref * dont add together fixture * right naming * minor updates * improved registration flow * address feedback --------- Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-08 12:16:11 -08:00
Xi Yan	6192bf43a4	[Evals API][10/n] API updates for EvalTaskDef + new test migration (#379 ) * wip * scoring fn api * eval api * eval task * evaluate api update * pre commit * unwrap context -> config * config field doc * typo * naming fix * separate benchmark / app eval * api name * rename * wip tests * wip * datasetio test * delete unused * fixture * scoring resolve * fix scoring register * scoring test pass * score batch * scoring fix * fix eval * test eval works * remove type ignore * api refactor * add default task_eval_id for routing * add eval_id for jobs * remove type ignore * only keep 1 run_eval * fix optional * register task required * register task required * delete old tests * delete old tests * fixture return impl	2024-11-07 21:24:12 -08:00
Dalton Flanagan	345ae07317	Factor out create_dist_registry (#398 )	2024-11-07 16:13:19 -05:00
Ashwin Bharambe	694c142b89	Add provider deprecation support; change directory structure (#397 ) * Add provider deprecation support; change directory structure * fix a couple dangling imports * move the meta_reference safety dir also	2024-11-07 13:04:53 -08:00
Ashwin Bharambe	489f74a70b	Allow simpler initialization of `RemoteProviderConfig`; fix issue in httpx client	2024-11-06 19:19:26 -08:00
Dinesh Yeduguru	093c9f1987	add bedrock distribution code (#358 ) * add bedrock distribution code * fix linter error * add bedrock shields support * linter fixes * working bedrock safety * change to return only one violation * remove env var reading * refreshable boto credentials * remove env vars * address raghu's feedback * fix session_ttl passing --------- Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-06 14:39:11 -08:00
Dinesh Yeduguru	6ebd553da5	fix routing tables look up key for memory bank (#383 ) Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-06 13:32:46 -08:00
Xi Yan	748606195b	Kill `llama stack configure` (#371 ) * remove configure * build msg * wip * build->run * delete prints * docs * fix docs, kill configure * precommit * update fireworks build * docs * clean up build * comments * fix * test * remove baking build.yaml into docker * fix msg, urls * configure msg	2024-11-06 13:32:10 -08:00
Ashwin Bharambe	d289afdbde	Fix exception in server when client SSE connection closes	2024-11-06 11:00:34 -08:00
Ashwin Bharambe	a81178f1f5	The server now depends on SQLite by default	2024-11-04 20:35:53 -08:00
Ashwin Bharambe	9a57a009ee	Need to await for get_object_from_identifier() now	2024-11-04 20:33:12 -08:00
Ashwin Bharambe	7cf4c905f3	add support for remote providers in tests	2024-11-04 20:30:46 -08:00
Ashwin Bharambe	0763a0b85f	Fix for the fix!	2024-11-04 20:06:01 -08:00
Ashwin Bharambe	fb2678b134	Fix shield_type and routing table breakage	2024-11-04 19:57:15 -08:00
Ashwin Bharambe	ffedb81c11	Significantly simpler and malleable test setup (#360 ) * Significantly simpler and malleable test setup * convert memory tests * refactor fixtures and add support for composable fixtures * Fix memory to use the newer fixture organization * Get agents tests working * Safety tests work * yet another refactor to make this more general now it accepts --inference-model, --safety-model options also * get multiple providers working for meta-reference (for inference + safety) * Add README.md --------- Co-authored-by: Ashwin Bharambe <ashwin@meta.com>	2024-11-04 17:36:43 -08:00
Dinesh Yeduguru	663883cc29	persist registered objects with distribution (#354 ) * persist registered objects with distribution * linter fixes * comment * use annotate and field discriminator * working tests * do not use global state * precommit failures fixed * add back Any * fix imports * remove unnecessary changes in ollama * precommit failures fixed * make kvstore configurable for dist and rename registry * add comment about registry list return * fix linter errors * use registry to hydrate * remove debug print * linter fixes * remove kvstore.db * rename distribution_registry_store --------- Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-04 17:25:06 -08:00
Ashwin Bharambe	37b330b4ef	add dynamic clients for all APIs (#348 ) * add dynamic clients for all APIs * fix openapi generator * inference + memory + agents tests now pass with "remote" providers * Add docstring which fixes openapi generator :/	2024-10-31 14:46:25 -07:00
Steve Grubb	f04b566c5c	Do not cache pip (#349 ) Pip has a 3.3GB cache of torch and friends. Do not keep this in the image.	2024-10-31 09:52:40 -07:00
Ashwin Bharambe	4aa1bf6a60	Kill --name from llama stack build (#340 )	2024-10-28 23:07:32 -07:00
Ashwin Bharambe	b7d2b83d55	Allow passing provider_registry to resolve_impls()	2024-10-28 11:58:16 -07:00
Xi Yan	abdf7cddf3	[Evals API][4/n] evals with generation meta-reference impl (#303 ) * wip * dataset validation * test_scoring * cleanup * clean up test * comments * error checking * dataset client * test client: * datasetio client * clean up * basic scoring function works * scorer wip * equality scorer * score batch impl * score batch * update scoring test * refactor * validate scorer input * address comments * evals with generation * add all rows scores to ScoringResult * minor typing * bugfix * scoring function def rename * rebase name * refactor * address comments * Update iOS inference instructions for new quantization * Small updates to quantization config * Fix score threshold in faiss * Bump version to 0.0.45 * Handle both ipv6 and ipv4 interfaces together * update manifest for build templates * Update getting_started.md * chatcompletion & completion input type validation * inclusion->subsetof * error checking * scoring_function -> scoring_fn rename, scorer -> scoring_fn rename * address comments * [Evals API][5/n] fixes to generate openapi spec (#323) * generate openapi * typing comment, dataset -> dataset_id * remove custom type * sample eval run.yaml --------- Co-authored-by: Dalton Flanagan <6599399+dltn@users.noreply.github.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2024-10-25 13:12:39 -07:00
Xi Yan	07f9bf723f	fix broken --list-templates with adding build.yaml files for packaging (#327 ) * add build files to templates * fix templates * manifest * symlink * symlink * precommit * change everything to docker build.yaml * remove image_type in templates * fix build from templates CLI * fix readmes	2024-10-25 12:51:22 -07:00
Ashwin Bharambe	afae4e3d8e	Update docker build flow a little	2024-10-25 10:06:21 -07:00
Xi Yan	cb43caa2c3	start_container.sh prefix llamastack->distribution name	2024-10-24 21:29:17 -07:00
Xi Yan	cb84034567	[Evals API][3/n] scoring_functions / scoring meta-reference implementations (#296 ) * wip * dataset validation * test_scoring * cleanup * clean up test * comments * error checking * dataset client * test client: * datasetio client * clean up * basic scoring function works * scorer wip * equality scorer * score batch impl * score batch * update scoring test * refactor * validate scorer input * address comments * add all rows scores to ScoringResult * bugfix * scoring function def rename	2024-10-24 14:52:30 -07:00
Ashwin Bharambe	94728d6983	Handle both ipv6 and ipv4 interfaces together	2024-10-24 13:59:01 -07:00
Ashwin Bharambe	05a8d47b98	Add a meta-reference-quantized-gpu distribution	2024-10-23 21:45:50 -07:00
Xi Yan	0cec86453b	Fix issue w/ routing_table api getting added when router api is not specified (#298 ) * fix issue w/ enforcing api * cleanup * inference only yaml	2024-10-23 15:27:22 -07:00
Xi Yan	821810657f	[Evals API][2/n] datasets / datasetio meta-reference implementation (#288 ) * skeleton dataset / datasetio * dataset datasetio * config * address comments * delete dataset_utils * address comments * naming fix	2024-10-22 16:12:16 -07:00
Ashwin Bharambe	c06718fbd5	Add support for Structured Output / Guided decoding (#281 ) Added support for structured output in the API and added a reference implementation for meta-reference. A few notes: * Two formats are specified in the API: Json schema and EBNF based grammar * Implementation only supports Json for now We use lm-format-enforcer to provide the implementation right now but may change this especially because BNF grammars aren't supported by that library. Fireworks has support for structured output and Together has limited support for it too. Subsequent PRs will add these changes. We would like all our inference providers to provide structured output for llama models since it is an extremely important and highly sought-after need by the developers.	2024-10-22 12:53:34 -07:00
Xi Yan	4d2bd2d39e	add more distro templates (#279 ) * verify dockers * together distro verified * readme * fireworks distro * fireworks compose up * fireworks verified	2024-10-21 18:15:08 -07:00
Xi Yan	cf27d19dd5	fix sse_generator async	2024-10-21 14:03:42 -07:00
Xi Yan	af75618348	remove distribution/templates	2024-10-21 13:23:58 -07:00
Xi Yan	23210e8679	llama stack distributions / templates / docker refactor (#266 ) * docker compose ollama * comment * update compose file * readme for distributions * readme * move distribution folders * move distribution/templates to distributions/ * rename * kill distribution/templates * readme * readme * build/developer cookbook/new api provider * developer cookbook * readme * readme * [bugfix] fix case for agent when memory bank registered without specifying provider_id (#264) * fix case where memory bank is registered without provider_id * memory test * agents unit test * Add an option to not use elastic agents for meta-reference inference (#269) * Allow overriding checkpoint_dir via config * Small rename * Make all methods `async def` again; add completion() for meta-reference (#270) PR #201 had made several changes while trying to fix issues with getting the stream=False branches of inference and agents API working. As part of this, it made a change which was slightly gratuitous. Namely, making chat_completion() and brethren "def" instead of "async def". The rationale was that this allowed the user (within llama-stack) of this to use it as: `async for chunk in api.chat_completion(params)` However, it causes unnecessary confusion for several folks. Given that clients (e.g., llama-stack-apps) anyway use the SDK methods (which are completely isolated) this choice was not ideal. Let's revert back so the call now looks like: `async for chunk in await api.chat_completion(params)` Bonus: Added a completion() implementation for the meta-reference provider. Technically should have been another PR :) * Improve an important error message * update ollama for llama-guard3 * Add vLLM inference provider for OpenAI compatible vLLM server (#178) This PR adds vLLM inference provider for OpenAI compatible vLLM server. * Create .readthedocs.yaml Trying out readthedocs * Update event_logger.py (#275) spelling error * vllm * build templates * delete templates * tmp add back build to avoid merge conflicts * vllm * vllm --------- Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> Co-authored-by: Ashwin Bharambe <ashwin@meta.com> Co-authored-by: Yuan Tang <terrytangyuan@gmail.com> Co-authored-by: raghotham <rsm@meta.com> Co-authored-by: nehal-a2z <nehal@coderabbit.ai>	2024-10-21 11:17:53 -07:00
Yuan Tang	a27a2cd2af	Add vLLM inference provider for OpenAI compatible vLLM server (#178 ) This PR adds vLLM inference provider for OpenAI compatible vLLM server.	2024-10-20 18:43:25 -07:00
Ashwin Bharambe	8cfbb9d38b	Improve an important error message	2024-10-19 17:19:54 -07:00
Ashwin Bharambe	2089427d60	Make all methods `async def` again; add completion() for meta-reference (#270 ) PR #201 had made several changes while trying to fix issues with getting the stream=False branches of inference and agents API working. As part of this, it made a change which was slightly gratuitous. Namely, making chat_completion() and brethren "def" instead of "async def". The rationale was that this allowed the user (within llama-stack) of this to use it as: `async for chunk in api.chat_completion(params)` However, it causes unnecessary confusion for several folks. Given that clients (e.g., llama-stack-apps) anyway use the SDK methods (which are completely isolated) this choice was not ideal. Let's revert back so the call now looks like: `async for chunk in await api.chat_completion(params)` Bonus: Added a completion() implementation for the meta-reference provider. Technically should have been another PR :)	2024-10-18 20:50:59 -07:00
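
The two calling conventions described in commit 2089427d60 can be sketched in plain Python. This is only an illustration: the classes `PlainDefApi` and `AsyncDefApi`, the helper `_stream`, and the string `params` are hypothetical stand-ins, not the llama-stack API. It shows why a plain `def` that returns an async generator is iterated directly, while an `async def` that returns one must be awaited first.

```python
import asyncio
from typing import AsyncIterator


async def _stream(params: str) -> AsyncIterator[str]:
    # Hypothetical stand-in for a streaming response: yields one chunk per word.
    for word in params.split():
        yield word


class PlainDefApi:
    # Style described for PR #201: a plain "def" that returns an async generator.
    def chat_completion(self, params: str) -> AsyncIterator[str]:
        return _stream(params)


class AsyncDefApi:
    # Style restored by the commit: "async def", so calling it returns a
    # coroutine that has to be awaited to obtain the async generator.
    async def chat_completion(self, params: str) -> AsyncIterator[str]:
        return _stream(params)


async def collect_plain(api: PlainDefApi, params: str) -> list:
    # async for chunk in api.chat_completion(params)
    return [chunk async for chunk in api.chat_completion(params)]


async def collect_awaited(api: AsyncDefApi, params: str) -> list:
    # async for chunk in await api.chat_completion(params)
    return [chunk async for chunk in await api.chat_completion(params)]


if __name__ == "__main__":
    print(asyncio.run(collect_plain(PlainDefApi(), "hello from llama")))
    print(asyncio.run(collect_awaited(AsyncDefApi(), "hello from llama")))
```

Both helpers collect the same chunks; the only difference is the `await` at the call site. With the `async def` style, dropping the `await` makes `async for` fail, because a coroutine object is not an async iterable.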

95 commits