llama-stack-mirror

Mirror of https://github.com/meta-llama/llama-stack.git, synced 2025-12-21 08:12:25 +00:00

Author	SHA1	Message	Date
Xi Yan	d1633dc412	huggingface provider	2024-11-07 15:20:22 -08:00
Xi Yan	6b889651d6	Merge branch 'main' into eval_task_register	2024-11-07 14:41:29 -08:00
Xi Yan	f05db9a25c	add eval_id for jobs	2024-11-07 14:30:46 -08:00
Xi Yan	ea80f623fb	add default task_eval_id for routing	2024-11-07 14:19:33 -08:00
Xi Yan	51c20f9c29	api refactor	2024-11-07 13:54:26 -08:00
Dalton Flanagan	345ae07317	Factor out create_dist_registry (#398)	2024-11-07 16:13:19 -05:00
Xi Yan	97dcd5704c	Merge branch 'main' into eval_task_register	2024-11-07 13:08:58 -08:00
Ashwin Bharambe	694c142b89	Add provider deprecation support; change directory structure (#397) * Add provider deprecation support; change directory structure * fix a couple dangling imports * move the meta_reference safety dir also	2024-11-07 13:04:53 -08:00
Xi Yan	283b5c1def	Merge branch 'main' into eval_task_register	2024-11-06 21:50:09 -08:00
Xi Yan	3f1ac29d57	test eval works	2024-11-06 21:40:38 -08:00
Xi Yan	413a1b6d8b	fix eval	2024-11-06 21:10:54 -08:00
Ashwin Bharambe	489f74a70b	Allow simpler initialization of `RemoteProviderConfig`; fix issue in httpx client	2024-11-06 19:19:26 -08:00
Xi Yan	56239fce90	scoring fix	2024-11-06 18:07:16 -08:00
Xi Yan	0351072531	fix scoring register	2024-11-06 17:18:16 -08:00
Dinesh Yeduguru	093c9f1987	add bedrock distribution code (#358) * add bedrock distribution code * fix linter error * add bedrock shields support * linter fixes * working bedrock safety * change to return only one violation * remove env var reading * refreshable boto credentials * remove env vars * address raghu's feedback * fix session_ttl passing --------- Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-06 14:39:11 -08:00
Dinesh Yeduguru	6ebd553da5	fix routing tables look up key for memory bank (#383) Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-06 13:32:46 -08:00
Xi Yan	748606195b	Kill `llama stack configure` (#371) * remove configure * build msg * wip * build->run * delete prints * docs * fix docs, kill configure * precommit * update fireworks build * docs * clean up build * comments * fix * test * remove baking build.yaml into docker * fix msg, urls * configure msg	2024-11-06 13:32:10 -08:00
Ashwin Bharambe	d289afdbde	Fix exception in server when client SSE connection closes	2024-11-06 11:00:34 -08:00
Ashwin Bharambe	a81178f1f5	The server now depends on SQLite by default	2024-11-04 20:35:53 -08:00
Ashwin Bharambe	9a57a009ee	Need to await for get_object_from_identifier() now	2024-11-04 20:33:12 -08:00
Ashwin Bharambe	7cf4c905f3	add support for remote providers in tests	2024-11-04 20:30:46 -08:00
Ashwin Bharambe	0763a0b85f	Fix for the fix!	2024-11-04 20:06:01 -08:00
Ashwin Bharambe	fb2678b134	Fix shield_type and routing table breakage	2024-11-04 19:57:15 -08:00
Ashwin Bharambe	ffedb81c11	Significantly simpler and malleable test setup (#360) * Significantly simpler and malleable test setup * convert memory tests * refactor fixtures and add support for composable fixtures * Fix memory to use the newer fixture organization * Get agents tests working * Safety tests work * yet another refactor to make this more general now it accepts --inference-model, --safety-model options also * get multiple providers working for meta-reference (for inference + safety) * Add README.md --------- Co-authored-by: Ashwin Bharambe <ashwin@meta.com>	2024-11-04 17:36:43 -08:00
Dinesh Yeduguru	663883cc29	persist registered objects with distribution (#354) * persist registered objects with distribution * linter fixes * comment * use annotate and field discriminator * working tests * do not use global state * precommit failures fixed * add back Any * fix imports * remove unnecessary changes in ollama * precommit failures fixed * make kvstore configurable for dist and rename registry * add comment about registry list return * fix linter errors * use registry to hydrate * remove debug print * linter fixes * remove kvstore.db * rename distribution_registry_store --------- Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>	2024-11-04 17:25:06 -08:00
Ashwin Bharambe	37b330b4ef	add dynamic clients for all APIs (#348) * add dynamic clients for all APIs * fix openapi generator * inference + memory + agents tests now pass with "remote" providers * Add docstring which fixes openapi generator :/	2024-10-31 14:46:25 -07:00
Steve Grubb	f04b566c5c	Do not cache pip (#349) Pip has a 3.3GB cache of torch and friends. Do not keep this in the image.	2024-10-31 09:52:40 -07:00
Ashwin Bharambe	4aa1bf6a60	Kill --name from llama stack build (#340)	2024-10-28 23:07:32 -07:00
Ashwin Bharambe	b7d2b83d55	Allow passing provider_registry to resolve_impls()	2024-10-28 11:58:16 -07:00
Xi Yan	abdf7cddf3	[Evals API][4/n] evals with generation meta-reference impl (#303) * wip * dataset validation * test_scoring * cleanup * clean up test * comments * error checking * dataset client * test client: * datasetio client * clean up * basic scoring function works * scorer wip * equality scorer * score batch impl * score batch * update scoring test * refactor * validate scorer input * address comments * evals with generation * add all rows scores to ScoringResult * minor typing * bugfix * scoring function def rename * rebase name * refactor * address comments * Update iOS inference instructions for new quantization * Small updates to quantization config * Fix score threshold in faiss * Bump version to 0.0.45 * Handle both ipv6 and ipv4 interfaces together * update manifest for build templates * Update getting_started.md * chatcompletion & completion input type validation * inclusion->subsetof * error checking * scoring_function -> scoring_fn rename, scorer -> scoring_fn rename * address comments * [Evals API][5/n] fixes to generate openapi spec (#323) * generate openapi * typing comment, dataset -> dataset_id * remove custom type * sample eval run.yaml --------- Co-authored-by: Dalton Flanagan <6599399+dltn@users.noreply.github.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2024-10-25 13:12:39 -07:00
Xi Yan	07f9bf723f	fix broken --list-templates with adding build.yaml files for packaging (#327) * add build files to templates * fix templates * manifest * symlink * symlink * precommit * change everything to docker build.yaml * remove image_type in templates * fix build from templates CLI * fix readmes	2024-10-25 12:51:22 -07:00
Ashwin Bharambe	afae4e3d8e	Update docker build flow a little	2024-10-25 10:06:21 -07:00
Xi Yan	cb43caa2c3	start_container.sh prefix llamastack->distribution name	2024-10-24 21:29:17 -07:00
Xi Yan	cb84034567	[Evals API][3/n] scoring_functions / scoring meta-reference implementations (#296) * wip * dataset validation * test_scoring * cleanup * clean up test * comments * error checking * dataset client * test client: * datasetio client * clean up * basic scoring function works * scorer wip * equality scorer * score batch impl * score batch * update scoring test * refactor * validate scorer input * address comments * add all rows scores to ScoringResult * bugfix * scoring function def rename	2024-10-24 14:52:30 -07:00
Ashwin Bharambe	94728d6983	Handle both ipv6 and ipv4 interfaces together	2024-10-24 13:59:01 -07:00
Ashwin Bharambe	05a8d47b98	Add a meta-reference-quantized-gpu distribution	2024-10-23 21:45:50 -07:00
Xi Yan	0cec86453b	Fix issue w/ routing_table api getting added when router api is not specified (#298) * fix issue w/ enforcing api * cleanup * inference only yaml	2024-10-23 15:27:22 -07:00
Xi Yan	821810657f	[Evals API][2/n] datasets / datasetio meta-reference implementation (#288) * skeleton dataset / datasetio * dataset datasetio * config * address comments * delete dataset_utils * address comments * naming fix	2024-10-22 16:12:16 -07:00
Ashwin Bharambe	c06718fbd5	Add support for Structured Output / Guided decoding (#281) Added support for structured output in the API and added a reference implementation for meta-reference. A few notes: * Two formats are specified in the API: JSON schema and EBNF-based grammar * Implementation only supports JSON for now We use lm-format-enforcer to provide the implementation right now but may change this especially because BNF grammars aren't supported by that library. Fireworks has support for structured output and Together has limited support for it too. Subsequent PRs will add these changes. We would like all our inference providers to provide structured output for llama models since it is an extremely important and highly sought-after need by the developers.	2024-10-22 12:53:34 -07:00
Xi Yan	4d2bd2d39e	add more distro templates (#279) * verify dockers * together distro verified * readme * fireworks distro * fireworks compose up * fireworks verified	2024-10-21 18:15:08 -07:00
Xi Yan	cf27d19dd5	fix sse_generator async	2024-10-21 14:03:42 -07:00
Xi Yan	af75618348	remove distribution/templates	2024-10-21 13:23:58 -07:00
Xi Yan	23210e8679	llama stack distributions / templates / docker refactor (#266) * docker compose ollama * comment * update compose file * readme for distributions * readme * move distribution folders * move distribution/templates to distributions/ * rename * kill distribution/templates * readme * readme * build/developer cookbook/new api provider * developer cookbook * readme * readme * [bugfix] fix case for agent when memory bank registered without specifying provider_id (#264) * fix case where memory bank is registered without provider_id * memory test * agents unit test * Add an option to not use elastic agents for meta-reference inference (#269) * Allow overriding checkpoint_dir via config * Small rename * Make all methods `async def` again; add completion() for meta-reference (#270) PR #201 had made several changes while trying to fix issues with getting the stream=False branches of inference and agents API working. As part of this, it made a change which was slightly gratuitous. Namely, making chat_completion() and brethren "def" instead of "async def". The rationale was that this allowed the user (within llama-stack) of this to use it as: ``` async for chunk in api.chat_completion(params) ``` However, it causes unnecessary confusion for several folks. Given that clients (e.g., llama-stack-apps) anyway use the SDK methods (which are completely isolated) this choice was not ideal. Let's revert back so the call now looks like: ``` async for chunk in await api.chat_completion(params) ``` Bonus: Added a completion() implementation for the meta-reference provider. Technically should have been another PR :) * Improve an important error message * update ollama for llama-guard3 * Add vLLM inference provider for OpenAI compatible vLLM server (#178) This PR adds vLLM inference provider for OpenAI compatible vLLM server. * Create .readthedocs.yaml Trying out readthedocs * Update event_logger.py (#275) spelling error * vllm * build templates * delete templates * tmp add back build to avoid merge conflicts * vllm * vllm --------- Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> Co-authored-by: Ashwin Bharambe <ashwin@meta.com> Co-authored-by: Yuan Tang <terrytangyuan@gmail.com> Co-authored-by: raghotham <rsm@meta.com> Co-authored-by: nehal-a2z <nehal@coderabbit.ai>	2024-10-21 11:17:53 -07:00
Yuan Tang	a27a2cd2af	Add vLLM inference provider for OpenAI compatible vLLM server (#178) This PR adds vLLM inference provider for OpenAI compatible vLLM server.	2024-10-20 18:43:25 -07:00
Ashwin Bharambe	8cfbb9d38b	Improve an important error message	2024-10-19 17:19:54 -07:00
Ashwin Bharambe	2089427d60	Make all methods `async def` again; add completion() for meta-reference (#270) PR #201 had made several changes while trying to fix issues with getting the stream=False branches of inference and agents API working. As part of this, it made a change which was slightly gratuitous. Namely, making chat_completion() and brethren "def" instead of "async def". The rationale was that this allowed the user (within llama-stack) of this to use it as: ``` async for chunk in api.chat_completion(params) ``` However, it causes unnecessary confusion for several folks. Given that clients (e.g., llama-stack-apps) anyway use the SDK methods (which are completely isolated) this choice was not ideal. Let's revert back so the call now looks like: ``` async for chunk in await api.chat_completion(params) ``` Bonus: Added a completion() implementation for the meta-reference provider. Technically should have been another PR :)	2024-10-18 20:50:59 -07:00
Ashwin Bharambe	95a96afe34	Small rename	2024-10-18 14:41:38 -07:00
Xi Yan	be3c5c034d	[bugfix] fix case for agent when memory bank registered without specifying provider_id (#264) * fix case where memory bank is registered without provider_id * memory test * agents unit test	2024-10-17 17:28:17 -07:00
Xi Yan	d787d1e84f	config templates restructure, docs (#262) * wip * config templates * readmes	2024-10-16 23:25:10 -07:00
Xi Yan	c4d5d6bb91	Docker compose scripts for remote adapters (#241) * tgi docker compose * path * wait for tgi server to start before starting server * update provider-id * move scripts to distribution/ folder * add readme * readme	2024-10-15 16:32:53 -07:00
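
The calling convention restored in #270 can be illustrated with a minimal, self-contained Python sketch. `FakeInferenceAPI`, `chat_completion_def_style`, and the whitespace "tokenizer" are hypothetical stand-ins, not llama-stack code; the sketch only contrasts a plain `def` that returns an async iterator (iterated directly) with an `async def` coroutine that returns one (awaited first, then iterated).

```python
import asyncio
from typing import AsyncIterator, List


async def _stream_tokens(text: str) -> AsyncIterator[str]:
    # Hypothetical token stream: yields whitespace-separated words.
    for token in text.split():
        yield token


class FakeInferenceAPI:
    """Hypothetical stand-in for an inference API (not llama-stack code)."""

    def chat_completion_def_style(self, params: str) -> AsyncIterator[str]:
        # Plain `def` returning an async iterator: callers iterate directly,
        # e.g. `async for chunk in api.chat_completion_def_style(params)`.
        return _stream_tokens(params)

    async def chat_completion(self, params: str) -> AsyncIterator[str]:
        # `async def` returning an async iterator: calling it produces a
        # coroutine, so callers must `await` it before iterating.
        return _stream_tokens(params)


async def main() -> List[List[str]]:
    api = FakeInferenceAPI()
    before = [c async for c in api.chat_completion_def_style("hello llama stack")]
    after = [c async for c in await api.chat_completion("hello llama stack")]
    return [before, after]


result = asyncio.run(main())
print(result)
```

Both call styles stream the same chunks. Under the assumptions above, one practical reason to prefer the `async def` form is that the coroutine can await setup work (for example, request validation or opening a connection) before it hands back the stream.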


91 commits