Commit graph

31 commits

Author SHA1 Message Date
Xi Yan
23210e8679
llama stack distributions / templates / docker refactor (#266)
* docker compose ollama

* comment

* update compose file

* readme for distributions

* readme

* move distribution folders

* move distribution/templates to distributions/

* rename

* kill distribution/templates

* readme

* readme

* build/developer cookbook/new api provider

* developer cookbook

* readme

* readme

* [bugfix] fix case for agent when memory bank registered without specifying provider_id (#264)

* fix case where memory bank is registered without provider_id

* memory test

* agents unit test

* Add an option to not use elastic agents for meta-reference inference (#269)

* Allow overridding checkpoint_dir via config

* Small rename

* Make all methods `async def` again; add completion() for meta-reference (#270)

PR #201 had made several changes while trying to fix issues with getting the stream=False branches of inference and agents API working. As part of this, it made a change which was slightly gratuitous. Namely, making chat_completion() and brethren "def" instead of "async def".

The rationale was that this allowed the user (within llama-stack) of this to use it as:

```
async for chunk in api.chat_completion(params)
```

However, it causes unnecessary confusion for several folks. Given that clients (e.g., llama-stack-apps) anyway use the SDK methods (which are completely isolated) this choice was not ideal. Let's revert back so the call now looks like:

```
async for chunk in await api.chat_completion(params)
```

Bonus: Added a completion() implementation for the meta-reference provider. Technically should have been another PR :)

* Improve an important error message

* update ollama for llama-guard3

* Add vLLM inference provider for OpenAI compatible vLLM server (#178)

This PR adds vLLM inference provider for OpenAI compatible vLLM server.

* Create .readthedocs.yaml

Trying out readthedocs

* Update event_logger.py (#275)

spelling error

* vllm

* build templates

* delete templates

* tmp add back build to avoid merge conflicts

* vllm

* vllm

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
Co-authored-by: Ashwin Bharambe <ashwin@meta.com>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: raghotham <rsm@meta.com>
Co-authored-by: nehal-a2z <nehal@coderabbit.ai>
2024-10-21 11:17:53 -07:00
Ashwin Bharambe
1ff0476002 Split off meta-reference-quantized provider 2024-10-10 16:03:19 -07:00
Ashwin Bharambe
6bb57e72a7
Remove "routing_table" and "routing_key" concepts for the user (#201)
This PR makes several core changes to the developer experience surrounding Llama Stack.

Background: PR #92 introduced the notion of "routing" to the Llama Stack. It introduces three object types: (1) models, (2) shields and (3) memory banks. Each of these objects can be associated with a distinct provider. So you can get model A to be inferenced locally while model B, C can be inference remotely (e.g.)

However, this had a few drawbacks:

you could not address the provider instances -- i.e., if you configured "meta-reference" with a given model, you could not assign an identifier to this instance which you could re-use later.
the above meant that you could not register a "routing_key" (e.g. model) dynamically and say "please use this existing provider I have already configured" for a new model.
the terms "routing_table" and "routing_key" were exposed directly to the user. in my view, this is way too much overhead for a new user (which almost everyone is.) people come to the stack wanting to do ML and encounter a completely unexpected term.
What this PR does: This PR structures the run config with only a single prominent key:

- providers
Providers are instances of configured provider types. Here's an example which shows two instances of the remote::tgi provider which are serving two different models.

providers:
  inference:
  - provider_id: foo
    provider_type: remote::tgi
    config: { ... }
  - provider_id: bar
    provider_type: remote::tgi
    config: { ... }
Secondly, the PR adds dynamic registration of { models | shields | memory_banks } to the API surface. The distribution still acts like a "routing table" (as previously) except that it asks the backing providers for a listing of these objects. For example it asks a TGI or Ollama inference adapter what models it is serving. Only the models that are being actually served can be requested by the user for inference. Otherwise, the Stack server will throw an error.

When dynamically registering these objects, you can use the provider IDs shown above. Info about providers can be obtained using the Api.inspect set of endpoints (/providers, /routes, etc.)

The above examples shows the correspondence between inference providers and models registry items. Things work similarly for the safety <=> shields and memory <=> memory_banks pairs.

Registry: This PR also makes it so that Providers need to implement additional methods for registering and listing objects. For example, each Inference provider is now expected to implement the ModelsProtocolPrivate protocol (naming is not great!) which consists of two methods

register_model
list_models
The goal is to inform the provider that a certain model needs to be supported so the provider can make any relevant backend changes if needed (or throw an error if the model cannot be supported.)

There are many other cleanups included some of which are detailed in a follow-up comment.
2024-10-10 10:24:13 -07:00
Dalton Flanagan
441052b0fd avoid jq since non-standard on macOS 2024-10-04 10:11:43 -04:00
Xi Yan
62d266f018
[CLI] avoid configure twice (#171)
* avoid configure twice

* cleanup tmp config

* update output msg

* address comment

* update msg

* script update
2024-10-03 11:20:54 -07:00
Xi Yan
b9b1e8b08b
[bugfix] conda path lookup (#179)
* fix conda lookup

* comments
2024-10-03 10:45:16 -07:00
Ashwin Bharambe
e9f6150588 A bit cleanup to avoid breakages 2024-10-02 21:31:09 -07:00
Ashwin Bharambe
988a9cada3 Don't ask for Api.inspect in stack build 2024-10-02 21:10:56 -07:00
Ashwin Bharambe
fe4aabd690 provider_id => provider_type, adapter_id => adapter_type 2024-10-02 14:05:59 -07:00
Ashwin Bharambe
df68db644b Refactoring distribution/distribution.py
This file was becoming too large and unclear what it housed. Split it
into pieces.
2024-10-02 14:03:02 -07:00
Xi Yan
73decb3781 re-build from name 2024-09-30 16:22:52 -07:00
Xi Yan
4897bf2f85 allow --name to re-build from config 2024-09-30 16:18:12 -07:00
Xi Yan
d28c3dfe0f
[CLI] simplify docker run (#159)
* bake run.yaml inside docker, simplify run

* add docker template examples

* delete generated Dockerfile

* unique deps

* clean up debug

* default entrypoint

* address comments, update output msg

* update msg

* build output msg

* configure msg

* unique special_deps

* remove quotes in configure
2024-09-30 15:04:04 -07:00
Xi Yan
f6a6598d1a
[bugfix] fix #146 (#147)
* more robust image type

* lint
2024-09-28 17:47:00 -07:00
Xi Yan
6a8c2ae1df
[CLI] remove dependency on CONDA_PREFIX in CLI (#144)
* remove dependency on CONDA_PREFIX in CLI

* lint

* typo

* more robust
2024-09-28 16:46:47 -07:00
Ashwin Bharambe
fe460ba103 Avoid importing a lot of stuff 2024-09-28 16:06:10 -07:00
Xi Yan
4ae8c63a2b pre-commit lint 2024-09-28 16:04:41 -07:00
Russell Bryant
f70c88ab7a
configure: Fix a error msg typo (#131)
I got this error message and noticed the typo in the message. It
directed the user to run `llama stack build first`, which is not a
valid command.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-09-27 14:00:25 -07:00
Russell Bryant
fb9e6371ec
Validate name in llama stack build (#128)
The first time I ran `llama stack build`, I quickly hit enter at the
first prompt asking for a name, assuming it would use the default
given in the help text. This caused a failure later on that wasn't
very obvious. I was using the `docker` format and a blank name caused
an invalid tag format that failed the image build.

This change adds validation for the `name` parameter to ensure it's
not empty before proceeding.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-09-27 13:30:55 -07:00
Xi Yan
ca7602a642 fix #100 2024-09-25 15:11:56 -07:00
Ashwin Bharambe
f45705cd10 Some lightweight cleanup and renaming for bedrock safety adapter 2024-09-24 19:29:56 -07:00
Ashwin Bharambe
ec4fc800cc
[API Updates] Model / shield / memory-bank routing + agent persistence + support for private headers (#92)
This is yet another of those large PRs (hopefully we will have less and less of them as things mature fast). This one introduces substantial improvements and some simplifications to the stack.

Most important bits:

* Agents reference implementation now has support for session / turn persistence. The default implementation uses sqlite but there's also support for using Redis.

* We have re-architected the structure of the Stack APIs to allow for more flexible routing. The motivating use cases are:
  - routing model A to ollama and model B to a remote provider like Together
  - routing shield A to local impl while shield B to a remote provider like Bedrock
  - routing a vector memory bank to Weaviate while routing a keyvalue memory bank to Redis

* Support for provider specific parameters to be passed from the clients. A client can pass data using `x_llamastack_provider_data` parameter which can be type-checked and provided to the Adapter implementations.
2024-09-23 14:22:22 -07:00
Hardik Shah
7e9e6117e3 do not assume CONDA_PREFIX exists during configuration 2024-09-19 23:39:34 -07:00
Xi Yan
6302a1ee90
fix prompt with name args (#80) 2024-09-18 23:48:31 -07:00
Ashwin Bharambe
c63d6cbd08 list(...keys()) so dict_keys does not show up 2024-09-18 23:24:07 -07:00
Ashwin Bharambe
9ab27e852b Bug fixes for memory 2024-09-18 21:54:02 -07:00
Xi Yan
f5d5e32d62 fix docker configure 2024-09-18 17:23:37 -07:00
Xi Yan
1128f69674
CLI: add build templates support, move imports (#77)
* list templates implementation

* relative path

* finalize templates

* remove imports

* remove templates from name, name templates

* fix docker

* fix docker
2024-09-18 14:25:53 -07:00
Xi Yan
6b21523c28
CLI - add back build wizard, configure with name instead of build.yaml (#74)
* add back wizard for build

* conda build path move

* polish message

* run with name only

* prompt for build

* improve comments

* update msgs

* add new lines

* move build.yaml

* address comments

* validator for providers

* move imports

* Please enter -> enter

* comments, get started guide

* nits

* fix cprint import

* fix imports
2024-09-18 11:41:56 -07:00
Ashwin Bharambe
3e27131a69 Don't import pkg_resources until you need it 2024-09-17 20:01:22 -07:00
Ashwin Bharambe
9487ad8294
API Updates (#73)
* API Keys passed from Client instead of distro configuration

* delete distribution registry

* Rename the "package" word away

* Introduce a "Router" layer for providers

Some providers need to be factorized and considered as thin routing
layers on top of other providers. Consider two examples:

- The inference API should be a routing layer over inference providers,
  routed using the "model" key
- The memory banks API is another instance where various memory bank
  types will be provided by independent providers (e.g., a vector store
  is served by Chroma while a keyvalue memory can be served by Redis or
  PGVector)

This commit introduces a generalized routing layer for this purpose.

* update `apis_to_serve`

* llama_toolchain -> llama_stack

* Codemod from llama_toolchain -> llama_stack

- added providers/registry
- cleaned up api/ subdirectories and moved impls away
- restructured api/api.py
- from llama_stack.apis.<api> import foo should work now
- update imports to do llama_stack.apis.<api>
- update many other imports
- added __init__, fixed some registry imports
- updated registry imports
- create_agentic_system -> create_agent
- AgenticSystem -> Agent

* Moved some stuff out of common/; re-generated OpenAPI spec

* llama-toolchain -> llama-stack (hyphens)

* add control plane API

* add redis adapter + sqlite provider

* move core -> distribution

* Some more toolchain -> stack changes

* small naming shenanigans

* Removing custom tool and agent utilities and moving them client side

* Move control plane to distribution server for now

* Remove control plane from API list

* no codeshield dependency randomly plzzzzz

* Add "fire" as a dependency

* add back event loggers

* stack configure fixes

* use brave instead of bing in the example client

* add init file so it gets packaged

* add init files so it gets packaged

* Update MANIFEST

* bug fix

---------

Co-authored-by: Hardik Shah <hjshah@fb.com>
Co-authored-by: Xi Yan <xiyan@meta.com>
Co-authored-by: Ashwin Bharambe <ashwin@meta.com>
2024-09-17 19:51:35 -07:00