llama-stack

forked from phoenix-oss/llama-stack-mirror

Author	SHA1	Message	Date
Xi Yan	23210e8679	llama stack distributions / templates / docker refactor (#266) * docker compose ollama * comment * update compose file * readme for distributions * readme * move distribution folders * move distribution/templates to distributions/ * rename * kill distribution/templates * readme * readme * build/developer cookbook/new api provider * developer cookbook * readme * readme * [bugfix] fix case for agent when memory bank registered without specifying provider_id (#264) * fix case where memory bank is registered without provider_id * memory test * agents unit test * Add an option to not use elastic agents for meta-reference inference (#269) * Allow overridding checkpoint_dir via config * Small rename * Make all methods `async def` again; add completion() for meta-reference (#270) PR #201 had made several changes while trying to fix issues with getting the stream=False branches of inference and agents API working. As part of this, it made a change which was slightly gratuitous. Namely, making chat_completion() and brethren "def" instead of "async def". The rationale was that this allowed the user (within llama-stack) of this to use it as: `async for chunk in api.chat_completion(params)` However, it causes unnecessary confusion for several folks. Given that clients (e.g., llama-stack-apps) anyway use the SDK methods (which are completely isolated) this choice was not ideal. Let's revert back so the call now looks like: `async for chunk in await api.chat_completion(params)` Bonus: Added a completion() implementation for the meta-reference provider. Technically should have been another PR :) * Improve an important error message * update ollama for llama-guard3 * Add vLLM inference provider for OpenAI compatible vLLM server (#178) This PR adds vLLM inference provider for OpenAI compatible vLLM server. * Create .readthedocs.yaml Trying out readthedocs * Update event_logger.py (#275) spelling error * vllm * build templates * delete templates * tmp add back build to avoid merge conflicts * vllm * vllm --------- Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> Co-authored-by: Ashwin Bharambe <ashwin@meta.com> Co-authored-by: Yuan Tang <terrytangyuan@gmail.com> Co-authored-by: raghotham <rsm@meta.com> Co-authored-by: nehal-a2z <nehal@coderabbit.ai>	2024-10-21 11:17:53 -07:00
Ashwin Bharambe	1ff0476002	Split off meta-reference-quantized provider	2024-10-10 16:03:19 -07:00
Ashwin Bharambe	6bb57e72a7	Remove "routing_table" and "routing_key" concepts for the user (#201) This PR makes several core changes to the developer experience surrounding Llama Stack. Background: PR #92 introduced the notion of "routing" to the Llama Stack. It introduces three object types: (1) models, (2) shields and (3) memory banks. Each of these objects can be associated with a distinct provider. So you can get model A to be inferenced locally while model B, C can be inference remotely (e.g.) However, this had a few drawbacks: you could not address the provider instances -- i.e., if you configured "meta-reference" with a given model, you could not assign an identifier to this instance which you could re-use later. the above meant that you could not register a "routing_key" (e.g. model) dynamically and say "please use this existing provider I have already configured" for a new model. the terms "routing_table" and "routing_key" were exposed directly to the user. in my view, this is way too much overhead for a new user (which almost everyone is.) people come to the stack wanting to do ML and encounter a completely unexpected term. What this PR does: This PR structures the run config with only a single prominent key: - providers Providers are instances of configured provider types. Here's an example which shows two instances of the remote::tgi provider which are serving two different models. providers: inference: - provider_id: foo provider_type: remote::tgi config: { ... } - provider_id: bar provider_type: remote::tgi config: { ... } Secondly, the PR adds dynamic registration of { models | shields | memory_banks } to the API surface. The distribution still acts like a "routing table" (as previously) except that it asks the backing providers for a listing of these objects. For example it asks a TGI or Ollama inference adapter what models it is serving. Only the models that are being actually served can be requested by the user for inference. Otherwise, the Stack server will throw an error. When dynamically registering these objects, you can use the provider IDs shown above. Info about providers can be obtained using the Api.inspect set of endpoints (/providers, /routes, etc.) The above examples shows the correspondence between inference providers and models registry items. Things work similarly for the safety <=> shields and memory <=> memory_banks pairs. Registry: This PR also makes it so that Providers need to implement additional methods for registering and listing objects. For example, each Inference provider is now expected to implement the ModelsProtocolPrivate protocol (naming is not great!) which consists of two methods register_model list_models The goal is to inform the provider that a certain model needs to be supported so the provider can make any relevant backend changes if needed (or throw an error if the model cannot be supported.) There are many other cleanups included some of which are detailed in a follow-up comment.	2024-10-10 10:24:13 -07:00
Dalton Flanagan	441052b0fd	avoid jq since non-standard on macOS	2024-10-04 10:11:43 -04:00
Xi Yan	62d266f018	[CLI] avoid configure twice (#171) * avoid configure twice * cleanup tmp config * update output msg * address comment * update msg * script update	2024-10-03 11:20:54 -07:00
Xi Yan	b9b1e8b08b	[bugfix] conda path lookup (#179) * fix conda lookup * comments	2024-10-03 10:45:16 -07:00
Ashwin Bharambe	e9f6150588	A bit cleanup to avoid breakages	2024-10-02 21:31:09 -07:00
Ashwin Bharambe	988a9cada3	Don't ask for Api.inspect in stack build	2024-10-02 21:10:56 -07:00
Ashwin Bharambe	fe4aabd690	provider_id => provider_type, adapter_id => adapter_type	2024-10-02 14:05:59 -07:00
Ashwin Bharambe	df68db644b	Refactoring distribution/distribution.py This file was becoming too large and unclear what it housed. Split it into pieces.	2024-10-02 14:03:02 -07:00
Xi Yan	73decb3781	re-build from name	2024-09-30 16:22:52 -07:00
Xi Yan	4897bf2f85	allow --name to re-build from config	2024-09-30 16:18:12 -07:00
Xi Yan	d28c3dfe0f	[CLI] simplify docker run (#159) * bake run.yaml inside docker, simplify run * add docker template examples * delete generated Dockerfile * unique deps * clean up debug * default entrypoint * address comments, update output msg * update msg * build output msg * configure msg * unique special_deps * remove quotes in configure	2024-09-30 15:04:04 -07:00
Xi Yan	f6a6598d1a	[bugfix] fix #146 (#147) * more robust image type * lint	2024-09-28 17:47:00 -07:00
Xi Yan	6a8c2ae1df	[CLI] remove dependency on CONDA_PREFIX in CLI (#144) * remove dependency on CONDA_PREFIX in CLI * lint * typo * more robust	2024-09-28 16:46:47 -07:00
Ashwin Bharambe	fe460ba103	Avoid importing a lot of stuff	2024-09-28 16:06:10 -07:00
Xi Yan	4ae8c63a2b	pre-commit lint	2024-09-28 16:04:41 -07:00
Russell Bryant	f70c88ab7a	configure: Fix a error msg typo (#131) I got this error message and noticed the typo in the message. It directed the user to run `llama stack build first`, which is not a valid command. Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-09-27 14:00:25 -07:00
Russell Bryant	fb9e6371ec	Validate `name` in `llama stack build` (#128) The first time I ran `llama stack build`, I quickly hit enter at the first prompt asking for a name, assuming it would use the default given in the help text. This caused a failure later on that wasn't very obvious. I was using the `docker` format and a blank name caused an invalid tag format that failed the image build. This change adds validation for the `name` parameter to ensure it's not empty before proceeding. Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-09-27 13:30:55 -07:00
Xi Yan	ca7602a642	fix #100	2024-09-25 15:11:56 -07:00
Ashwin Bharambe	f45705cd10	Some lightweight cleanup and renaming for bedrock safety adapter	2024-09-24 19:29:56 -07:00
Ashwin Bharambe	ec4fc800cc	[API Updates] Model / shield / memory-bank routing + agent persistence + support for private headers (#92) This is yet another of those large PRs (hopefully we will have less and less of them as things mature fast). This one introduces substantial improvements and some simplifications to the stack. Most important bits: * Agents reference implementation now has support for session / turn persistence. The default implementation uses sqlite but there's also support for using Redis. * We have re-architected the structure of the Stack APIs to allow for more flexible routing. The motivating use cases are: - routing model A to ollama and model B to a remote provider like Together - routing shield A to local impl while shield B to a remote provider like Bedrock - routing a vector memory bank to Weaviate while routing a keyvalue memory bank to Redis * Support for provider specific parameters to be passed from the clients. A client can pass data using `x_llamastack_provider_data` parameter which can be type-checked and provided to the Adapter implementations.	2024-09-23 14:22:22 -07:00
Hardik Shah	7e9e6117e3	do not assume CONDA_PREFIX exists during configuration	2024-09-19 23:39:34 -07:00
Xi Yan	6302a1ee90	fix prompt with name args (#80)	2024-09-18 23:48:31 -07:00
Ashwin Bharambe	c63d6cbd08	list(...keys()) so dict_keys does not show up	2024-09-18 23:24:07 -07:00
Ashwin Bharambe	9ab27e852b	Bug fixes for memory	2024-09-18 21:54:02 -07:00
Xi Yan	f5d5e32d62	fix docker configure	2024-09-18 17:23:37 -07:00
Xi Yan	1128f69674	CLI: add build templates support, move imports (#77) * list templates implementation * relative path * finalize templates * remove imports * remove templates from name, name templates * fix docker * fix docker	2024-09-18 14:25:53 -07:00
Xi Yan	6b21523c28	CLI - add back build wizard, configure with name instead of build.yaml (#74) * add back wizard for build * conda build path move * polish message * run with name only * prompt for build * improve comments * update msgs * add new lines * move build.yaml * address comments * validator for providers * move imports * Please enter -> enter * comments, get started guide * nits * fix cprint import * fix imports	2024-09-18 11:41:56 -07:00
Ashwin Bharambe	3e27131a69	Don't import `pkg_resources` until you need it	2024-09-17 20:01:22 -07:00
Ashwin Bharambe	9487ad8294	API Updates (#73) * API Keys passed from Client instead of distro configuration * delete distribution registry * Rename the "package" word away * Introduce a "Router" layer for providers Some providers need to be factorized and considered as thin routing layers on top of other providers. Consider two examples: - The inference API should be a routing layer over inference providers, routed using the "model" key - The memory banks API is another instance where various memory bank types will be provided by independent providers (e.g., a vector store is served by Chroma while a keyvalue memory can be served by Redis or PGVector) This commit introduces a generalized routing layer for this purpose. * update `apis_to_serve` * llama_toolchain -> llama_stack * Codemod from llama_toolchain -> llama_stack - added providers/registry - cleaned up api/ subdirectories and moved impls away - restructured api/api.py - from llama_stack.apis.<api> import foo should work now - update imports to do llama_stack.apis.<api> - update many other imports - added __init__, fixed some registry imports - updated registry imports - create_agentic_system -> create_agent - AgenticSystem -> Agent * Moved some stuff out of common/; re-generated OpenAPI spec * llama-toolchain -> llama-stack (hyphens) * add control plane API * add redis adapter + sqlite provider * move core -> distribution * Some more toolchain -> stack changes * small naming shenanigans * Removing custom tool and agent utilities and moving them client side * Move control plane to distribution server for now * Remove control plane from API list * no codeshield dependency randomly plzzzzz * Add "fire" as a dependency * add back event loggers * stack configure fixes * use brave instead of bing in the example client * add init file so it gets packaged * add init files so it gets packaged * Update MANIFEST * bug fix --------- Co-authored-by: Hardik Shah <hjshah@fb.com> Co-authored-by: Xi Yan <xiyan@meta.com> Co-authored-by: Ashwin Bharambe <ashwin@meta.com>	2024-09-17 19:51:35 -07:00

31 commits
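
Note on commit 23210e8679 (#270): the message describes the call shape `async for chunk in await api.chat_completion(params)` that results from making the API methods `async def` again. The sketch below illustrates that shape only. `ToyInferenceApi`, its `_stream` helper, and `collect` are hypothetical names invented for this illustration and are not taken from the llama-stack code base.

```python
import asyncio
from typing import AsyncIterator, List


class ToyInferenceApi:
    """Hypothetical stand-in for an inference API (not the llama-stack class)."""

    async def chat_completion(self, prompt: str, stream: bool = True):
        # Because this is `async def`, calling it returns a coroutine.
        # Awaiting the coroutine gives an async generator when stream=True,
        # or the fully joined text when stream=False.
        if stream:
            return self._stream(prompt)
        return "".join([tok async for tok in self._stream(prompt)])

    async def _stream(self, prompt: str) -> AsyncIterator[str]:
        # Toy "model": echo the prompt back one whitespace-separated token at a time.
        for token in prompt.split():
            await asyncio.sleep(0)  # yield control, as a real backend call would
            yield token + " "


async def collect(api: ToyInferenceApi, prompt: str) -> List[str]:
    chunks = []
    # The caller awaits first, then iterates over the returned async generator.
    async for chunk in await api.chat_completion(prompt):
        chunks.append(chunk)
    return chunks


if __name__ == "__main__":
    print(asyncio.run(collect(ToyInferenceApi(), "hello world")))
```

With a plain `def` that returns the generator directly, the `await` would be dropped from the loop header; keeping the method `async def` gives the streaming and non-streaming branches the same calling convention.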