Commit graph

1247 commits

Author SHA1 Message Date
Ashwin Bharambe
f45705cd10 Some lightweight cleanup and renaming for bedrock safety adapter 2024-09-24 19:29:56 -07:00
Ashwin Bharambe
a2465f3f9c Revert parts of 0d2eb3bd25 2024-09-24 19:20:51 -07:00
rsgrewal-aws
059e50b389 [aws-bedrock] Support for Bedrock Safety adapter (#96) 2024-09-24 19:16:55 -07:00
Yogish Baliga
b85d675c6f Adding safety adapter for Together 2024-09-24 18:35:48 -07:00
Ashwin Bharambe
0d2eb3bd25 Use inference APIs for running llama guard
Test Plan:

First, start a TGI container with the `meta-llama/Llama-Guard-3-8B` model
serving on port 5099. See https://github.com/meta-llama/llama-stack/pull/53 and its
description for how.
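As a quick sanity check before moving on (a sketch only, not part of the stack; it assumes TGI's standard `/health` endpoint and the port chosen above):

```
import requests

# Assumes the TGI container from the previous step is bound to port 5099
# and exposes TGI's standard /health endpoint.
resp = requests.get("http://localhost:5099/health", timeout=5)
resp.raise_for_status()
print("TGI is reachable on port 5099")
```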

Then run llama-stack with the following run config:

```
image_name: safety
docker_image: null
conda_env: safety
apis_to_serve:
- models
- inference
- shields
- safety
api_providers:
  inference:
    providers:
    - remote::tgi
  safety:
    providers:
    - meta-reference
  telemetry:
    provider_id: meta-reference
    config: {}
routing_table:
  inference:
  - provider_id: remote::tgi
    config:
      url: http://localhost:5099
      api_token: null
      hf_endpoint_name: null
    routing_key: Llama-Guard-3-8B
  safety:
  - provider_id: meta-reference
    config:
      llama_guard_shield:
        model: Llama-Guard-3-8B
        excluded_categories: []
        disable_input_check: false
        disable_output_check: false
      prompt_guard_shield: null
    routing_key: llama_guard
```

Now simply run `python -m llama_stack.apis.safety.client localhost
<port>` and check that the llama_guard shield calls run correctly. (The
injection_shield calls fail as expected since we have not set up a
router for them.)
2024-09-24 17:02:57 -07:00
Xi Yan
c4534217c8 fix cli describe 2024-09-24 14:41:19 -07:00
Ashwin Bharambe
00352bd251 Respect passed in embedding model 2024-09-24 14:40:28 -07:00
Ashwin Bharambe
bda974e660 Make the "all-remote" distribution lightweight in dependencies and size 2024-09-24 14:18:57 -07:00
Ashwin Bharambe
445536de64 Add httpx to core server deps 2024-09-24 10:42:04 -07:00
Ashwin Bharambe
8d511cdf91 Make build_conda_env a bit more robust 2024-09-24 10:12:07 -07:00
Xi Yan
d04cd97aba remove providers/impls/sqlite/* 2024-09-24 01:03:40 -07:00
Ashwin Bharambe
e617273d8c attribute changed (model_args -> arch_args) 2024-09-23 21:44:26 -07:00
Ashwin Bharambe
f136f802b1 Somewhat better error handling 2024-09-23 21:40:14 -07:00
Xi Yan
f92ff86b96 fix shields in agents safety 2024-09-23 21:22:22 -07:00
Ashwin Bharambe
c9005e95ed Another attempt at a proper bugfix for safety violations 2024-09-23 19:06:30 -07:00
Xi Yan
e5bdd6615a bug fix for safety violation 2024-09-23 18:17:15 -07:00
Xi Yan
70fb70a71c fix URL issue with agents 2024-09-23 16:44:25 -07:00
Ashwin Bharambe
ec4fc800cc [API Updates] Model / shield / memory-bank routing + agent persistence + support for private headers (#92)
This is yet another of those large PRs (hopefully we will have fewer and fewer of them as things mature). This one introduces substantial improvements and some simplifications to the stack.

Most important bits:

* Agents reference implementation now has support for session / turn persistence. The default implementation uses SQLite, but there's also support for using Redis.

* We have re-architected the structure of the Stack APIs to allow for more flexible routing. The motivating use cases are (see the sketch after this list):
  - routing model A to ollama and model B to a remote provider like Together
  - routing shield A to local impl while shield B to a remote provider like Bedrock
  - routing a vector memory bank to Weaviate while routing a keyvalue memory bank to Redis

* Support for provider-specific parameters to be passed from the clients. A client can pass data using the `x_llamastack_provider_data` parameter, which can be type-checked and provided to the Adapter implementations.
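To make the first routing use case above concrete, here is a rough sketch in the spirit of the `routing_table` format from the run config earlier in this log; the provider ids, config keys, and URL below are assumptions for illustration, not a confirmed configuration:

```
# Illustrative only: route "model-A" to a local ollama provider and
# "model-B" to a remote Together provider (provider ids and config keys assumed).
routing_table = {
    "inference": [
        {
            "provider_id": "remote::ollama",
            "config": {"url": "http://localhost:11434"},
            "routing_key": "model-A",
        },
        {
            "provider_id": "remote::together",
            "config": {"api_key": None},  # the key can also come from the client (see below)
            "routing_key": "model-B",
        },
    ],
}
```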
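And as a rough illustration of the provider-data mechanism from the last bullet (a sketch only; the header name, endpoint path, port, and key name are assumptions, not the confirmed interface):

```
import json
import requests

# Hypothetical: attach Together credentials as provider data on a single request
# instead of baking them into the distribution configuration.
provider_data = {"together_api_key": "<YOUR_TOGETHER_API_KEY>"}  # assumed key name

resp = requests.post(
    "http://localhost:5000/inference/chat_completion",                  # assumed route and port
    headers={"X-LlamaStack-ProviderData": json.dumps(provider_data)},   # assumed header name
    json={"model": "model-B", "messages": [{"role": "user", "content": "hello"}]},
)
print(resp.status_code)
```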
2024-09-23 14:22:22 -07:00
Hardik Shah
8bf8c07eb3 Respect user-sent instructions in agent config and add them to the system prompt 2024-09-21 16:46:10 -07:00
Xi Yan
06abd7e6c8 update MemoryToolDefinition 2024-09-20 17:51:53 -07:00
Ashwin Bharambe
942cb87a3c remove apis/stack.py 2024-09-20 09:37:08 -07:00
Hardik Shah
7e9e6117e3 do not assume CONDA_PREFIX exists during configuration 2024-09-19 23:39:34 -07:00
Hardik Shah
8fa49593e0 Allow TGI adapter to have non-standard llama model names (#84)
Co-authored-by: Hardik Shah <hjshah@fb.com>
2024-09-19 21:42:15 -07:00
Hardik Shah
42d29f3a5a Allow TGI adapter to have non-standard llama model names 2024-09-19 21:37:02 -07:00
Xi Yan
59af1c8fec fix memory url parsing (#81) 2024-09-19 13:35:03 -07:00
Ashwin Bharambe
132f9429b1 Add a test for the CLI, but disabled since it is not fully done 2024-09-19 13:27:07 -07:00
Ashwin Bharambe
8b3ffa33de Add another test case 2024-09-19 13:02:57 -07:00
Ashwin Bharambe
abb43936ab Add a test runner and 2 very simple tests for agents 2024-09-19 12:22:48 -07:00
Xi Yan
543222ac39 update inference prompt msg 2024-09-19 12:03:24 -07:00
Xi Yan
a30b919ae1 update inference prompt msg 2024-09-19 12:03:24 -07:00
Ashwin Bharambe
9eb01dd664 Add DOCKER_BINARY / DOCKER_OPTS to all scripts 2024-09-19 10:26:41 -07:00
Xi Yan
ca4b87aa05 fix memory client 2024-09-19 09:29:40 -07:00
Xi Yan
6302a1ee90 fix prompt with name args (#80) 2024-09-18 23:48:31 -07:00
Ashwin Bharambe
c63d6cbd08 list(...keys()) so dict_keys does not show up 2024-09-18 23:24:07 -07:00
Ashwin Bharambe
f5eda1decf Add default for max_seq_len 2024-09-18 21:59:10 -07:00
Ashwin Bharambe
9ab27e852b Bug fixes for memory 2024-09-18 21:54:02 -07:00
Ashwin Bharambe
8cdc2f0cfb No RunShieldRequest 2024-09-18 20:38:21 -07:00
Ashwin Bharambe
dff9eab48f Remove "APIs to serve" prompt 2024-09-18 18:26:26 -07:00
Xi Yan
f5d5e32d62 fix docker configure 2024-09-18 17:23:37 -07:00
Xi Yan
1128f69674 CLI: add build templates support, move imports (#77)
* list templates implementation

* relative path

* finalize templates

* remove imports

* remove templates from name, name templates

* fix docker

* fix docker
2024-09-18 14:25:53 -07:00
Xi Yan
6b21523c28 CLI - add back build wizard, configure with name instead of build.yaml (#74)
* add back wizard for build

* conda build path move

* polish message

* run with name only

* prompt for build

* improve comments

* update msgs

* add new lines

* move build.yaml

* address comments

* validator for providers

* move imports

* Please enter -> enter

* comments, get started guide

* nits

* fix cprint import

* fix imports
2024-09-18 11:41:56 -07:00
Xi Yan
e6fdb9df29 fix context retriever (#75) 2024-09-18 08:24:36 -07:00
Ashwin Bharambe
055770a791 Stop asking for "apis to serve" as part of configure 2024-09-17 22:41:10 -07:00
Ashwin Bharambe
9fd431e710 make shield imports more lazy 2024-09-17 21:27:37 -07:00
Ashwin Bharambe
3e27131a69 Don't import pkg_resources until you need it 2024-09-17 20:01:22 -07:00
Ashwin Bharambe
25adc83de8 Fix for safety 2024-09-17 19:56:58 -07:00
Ashwin Bharambe
9487ad8294 API Updates (#73)
* API Keys passed from Client instead of distro configuration

* delete distribution registry

* Rename the "package" word away

* Introduce a "Router" layer for providers

Some providers need to be factored out and treated as thin routing
layers on top of other providers. Consider two examples:

- The inference API should be a routing layer over inference providers,
  routed using the "model" key
- The memory banks API is another instance where various memory bank
  types will be provided by independent providers (e.g., a vector store
  is served by Chroma while a keyvalue memory can be served by Redis or
  PGVector)

This commit introduces a generalized routing layer for this purpose.
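As a rough illustration of the idea (a sketch only, not the actual implementation; the class and provider names are made up):

```
from typing import Any, Callable, Dict

# A router holds a table from routing key (here, a model name) to a provider
# implementation and forwards each call to whichever provider owns that key.
class InferenceRouter:
    def __init__(self, providers: Dict[str, Callable[..., Any]]):
        self.providers = providers

    def chat_completion(self, model: str, messages: list, **kwargs: Any) -> Any:
        if model not in self.providers:
            raise ValueError(f"no provider registered for model {model!r}")
        return self.providers[model](model=model, messages=messages, **kwargs)

# Usage: one model handled by a local provider, another by a remote one.
router = InferenceRouter({
    "model-A": lambda **kw: f"local provider handled {kw['model']}",
    "model-B": lambda **kw: f"remote provider handled {kw['model']}",
})
print(router.chat_completion("model-B", messages=[]))
```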

* update `apis_to_serve`

* llama_toolchain -> llama_stack

* Codemod from llama_toolchain -> llama_stack

- added providers/registry
- cleaned up api/ subdirectories and moved impls away
- restructured api/api.py
- from llama_stack.apis.<api> import foo should work now
- update imports to do llama_stack.apis.<api>
- update many other imports
- added __init__, fixed some registry imports
- updated registry imports
- create_agentic_system -> create_agent
- AgenticSystem -> Agent

* Moved some stuff out of common/; re-generated OpenAPI spec

* llama-toolchain -> llama-stack (hyphens)

* add control plane API

* add redis adapter + sqlite provider

* move core -> distribution

* Some more toolchain -> stack changes

* small naming shenanigans

* Removing custom tool and agent utilities and moving them client side

* Move control plane to distribution server for now

* Remove control plane from API list

* no codeshield dependency randomly plzzzzz

* Add "fire" as a dependency

* add back event loggers

* stack configure fixes

* use brave instead of bing in the example client

* add init file so it gets packaged

* add init files so it gets packaged

* Update MANIFEST

* bug fix

---------

Co-authored-by: Hardik Shah <hjshah@fb.com>
Co-authored-by: Xi Yan <xiyan@meta.com>
Co-authored-by: Ashwin Bharambe <ashwin@meta.com>
2024-09-17 19:51:35 -07:00