llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-10-06 04:34:57 +00:00

Author	SHA1	Message	Date
Ashwin Bharambe	eb2d8a31a5	Add a RoutableProvider protocol, support for multiple routing keys (#163) * Update configure.py to use multiple routing keys for safety * Refactor distribution/datatypes into a providers/datatypes * Cleanup	2024-09-30 17:30:21 -07:00
Xi Yan	d28c3dfe0f	[CLI] simplify docker run (#159) * bake run.yaml inside docker, simplify run * add docker template examples * delete generated Dockerfile * unique deps * clean up debug * default entrypoint * address comments, update output msg * update msg * build output msg * configure msg * unique special_deps * remove quotes in configure	2024-09-30 15:04:04 -07:00
Russell Bryant	8db49de961	docker: Install in editable mode for dev purposes (#160) While rebuilding a stack using the `docker` image type and having `LLAMA_STACK_DIR` set so it installs `llama_stack` from my local source, I noticed that once built, it just used the image build cache and didn't pull in changes to my source. 1. Install in editable mode (`pip install -e`) for dev purposes. 2. Mount the source into the container for `configure` and `run` so that the editable install works. Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-09-30 11:56:31 -07:00
Russell Bryant	cb36be320f	Fix podman+selinux compatibility (#132) When I ran `llama stack configure` for my `docker` based stack on my system using podman + SELinux (CentOS Stream 9), the `podman run` command failed due to SELinux blocking access to the volume mount. As a simple fix, disable SELinux label checking. Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-09-29 20:19:44 -07:00
Ashwin Bharambe	5bf679cab6	Pull (extract) provider data from the provider instead of pushing from the top (#148)	2024-09-29 20:00:51 -07:00
Xi Yan	6a8c2ae1df	[CLI] remove dependency on CONDA_PREFIX in CLI (#144) * remove dependency on CONDA_PREFIX in CLI * lint * typo * more robust	2024-09-28 16:46:47 -07:00
Xi Yan	4ae8c63a2b	pre-commit lint	2024-09-28 16:04:41 -07:00
Xi Yan	6236634d84	[bugfix] fix duplicate api endpoints (#139) * fix server api to serve * remove print	2024-09-27 15:32:50 -07:00
Xi Yan	208b861289	add env for LLAMA_STACK_CONFIG_DIR (#137)	2024-09-27 14:16:46 -07:00
Xi Yan	ca7602a642	fix #100	2024-09-25 15:11:56 -07:00
Lucain	615ed4bfbc	Make TGI adapter compatible with HF Inference API (#97)	2024-09-25 14:08:31 -07:00
Ashwin Bharambe	56aed59eb4	Support for Llama3.2 models and Swift SDK (#98)	2024-09-25 10:29:58 -07:00
poegej	95abbf576b	Bump version to 0.0.24 (#94) Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2024-09-25 09:31:12 -07:00
Yogish Baliga	b85d675c6f	Adding safety adapter for Together	2024-09-24 18:35:48 -07:00
Ashwin Bharambe	0d2eb3bd25	Use inference APIs for running llama guard Test Plan: First, start a TGI container with `meta-llama/Llama-Guard-3-8B` model serving on port 5099. See https://github.com/meta-llama/llama-stack/pull/53 and its description for how. Then run llama-stack with the following run config:
```
image_name: safety
docker_image: null
conda_env: safety
apis_to_serve:
- models
- inference
- shields
- safety
api_providers:
  inference:
    providers:
    - remote::tgi
  safety:
    providers:
    - meta-reference
  telemetry:
    provider_id: meta-reference
    config: {}
routing_table:
  inference:
  - provider_id: remote::tgi
    config:
      url: http://localhost:5099
      api_token: null
      hf_endpoint_name: null
    routing_key: Llama-Guard-3-8B
  safety:
  - provider_id: meta-reference
    config:
      llama_guard_shield:
        model: Llama-Guard-3-8B
        excluded_categories: []
        disable_input_check: false
        disable_output_check: false
      prompt_guard_shield: null
    routing_key: llama_guard
```
Now simply run `python -m llama_stack.apis.safety.client localhost <port>` and check that the llama_guard shield calls run correctly. (The injection_shield calls fail as expected since we have not set up a router for them.)	2024-09-24 17:02:57 -07:00
Ashwin Bharambe	bda974e660	Make the "all-remote" distribution lightweight in dependencies and size	2024-09-24 14:18:57 -07:00
Ashwin Bharambe	445536de64	Add httpx to core server deps	2024-09-24 10:42:04 -07:00
Ashwin Bharambe	8d511cdf91	Make build_conda_env a bit more robust	2024-09-24 10:12:07 -07:00
Ashwin Bharambe	f136f802b1	Somewhat better error handling	2024-09-23 21:40:14 -07:00
Ashwin Bharambe	ec4fc800cc	[API Updates] Model / shield / memory-bank routing + agent persistence + support for private headers (#92) This is yet another of those large PRs (hopefully we will have fewer and fewer of them as things mature fast). This one introduces substantial improvements and some simplifications to the stack. Most important bits: * Agents reference implementation now has support for session / turn persistence. The default implementation uses sqlite but there's also support for using Redis. * We have re-architected the structure of the Stack APIs to allow for more flexible routing. The motivating use cases are: - routing model A to ollama and model B to a remote provider like Together - routing shield A to local impl while shield B to a remote provider like Bedrock - routing a vector memory bank to Weaviate while routing a keyvalue memory bank to Redis * Support for provider specific parameters to be passed from the clients. A client can pass data using `x_llamastack_provider_data` parameter which can be type-checked and provided to the Adapter implementations.	2024-09-23 14:22:22 -07:00
Ashwin Bharambe	abb43936ab	Add a test runner and 2 very simple tests for agents	2024-09-19 12:22:48 -07:00
Ashwin Bharambe	9eb01dd664	Add DOCKER_BINARY / DOCKER_OPTS to all scripts	2024-09-19 10:26:41 -07:00
Ashwin Bharambe	dff9eab48f	Remove "APIs to serve" prompt	2024-09-18 18:26:26 -07:00
Xi Yan	1128f69674	CLI: add build templates support, move imports (#77) * list templates implementation * relative path * finalize templates * remove imports * remove templates from name, name templates * fix docker * fix docker	2024-09-18 14:25:53 -07:00
Ashwin Bharambe	055770a791	Stop asking for "apis to serve" as part of configure	2024-09-17 22:41:10 -07:00
Ashwin Bharambe	9487ad8294	API Updates (#73) * API Keys passed from Client instead of distro configuration * delete distribution registry * Rename the "package" word away * Introduce a "Router" layer for providers Some providers need to be factorized and considered as thin routing layers on top of other providers. Consider two examples: - The inference API should be a routing layer over inference providers, routed using the "model" key - The memory banks API is another instance where various memory bank types will be provided by independent providers (e.g., a vector store is served by Chroma while a keyvalue memory can be served by Redis or PGVector) This commit introduces a generalized routing layer for this purpose. * update `apis_to_serve` * llama_toolchain -> llama_stack * Codemod from llama_toolchain -> llama_stack - added providers/registry - cleaned up api/ subdirectories and moved impls away - restructured api/api.py - from llama_stack.apis.<api> import foo should work now - update imports to do llama_stack.apis.<api> - update many other imports - added __init__, fixed some registry imports - updated registry imports - create_agentic_system -> create_agent - AgenticSystem -> Agent * Moved some stuff out of common/; re-generated OpenAPI spec * llama-toolchain -> llama-stack (hyphens) * add control plane API * add redis adapter + sqlite provider * move core -> distribution * Some more toolchain -> stack changes * small naming shenanigans * Removing custom tool and agent utilities and moving them client side * Move control plane to distribution server for now * Remove control plane from API list * no codeshield dependency randomly plzzzzz * Add "fire" as a dependency * add back event loggers * stack configure fixes * use brave instead of bing in the example client * add init file so it gets packaged * add init files so it gets packaged * Update MANIFEST * bug fix --------- Co-authored-by: Hardik Shah <hjshah@fb.com> Co-authored-by: Xi Yan <xiyan@meta.com> Co-authored-by: Ashwin Bharambe <ashwin@meta.com>	2024-09-17 19:51:35 -07:00

26 commits
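
Several of the commits above (9487ad8294, ec4fc800cc, eb2d8a31a5) describe a routing layer that picks a provider from a routing key such as a model name, with support for more than one key per provider. The sketch below illustrates that idea only; it is not the llama-stack implementation. `RoutingEntry`, `Router`, `resolve`, the provider labels, and the `model-a` / `model-b` / `model-c` keys are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class RoutingEntry:
    """Hypothetical routing-table row: one provider and the keys it serves."""
    provider_id: str
    routing_keys: List[str]
    config: Dict[str, Any] = field(default_factory=dict)


class Router:
    """Toy router that maps a routing key (e.g. a model name) to a provider."""

    def __init__(self, entries: List[RoutingEntry]) -> None:
        self._by_key: Dict[str, RoutingEntry] = {}
        for entry in entries:
            # A single provider may be registered under several routing keys.
            for key in entry.routing_keys:
                if key in self._by_key:
                    raise ValueError(f"duplicate routing key: {key}")
                self._by_key[key] = entry

    def resolve(self, routing_key: str) -> RoutingEntry:
        """Return the entry registered for routing_key, or raise LookupError."""
        try:
            return self._by_key[routing_key]
        except KeyError:
            raise LookupError(f"no provider registered for {routing_key!r}") from None


# Illustrative table: "model-a" goes to a local provider, while
# "model-b" and "model-c" share one remote provider (placeholder labels).
inference_router = Router([
    RoutingEntry("ollama", ["model-a"]),
    RoutingEntry("together", ["model-b", "model-c"], {"api_key": None}),
])
```

Calling `inference_router.resolve("model-b")` returns the entry whose `provider_id` is `"together"`; an unregistered key raises `LookupError`, loosely mirroring the note in 0d2eb3bd25 that calls fail when no router has been set up for a shield.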