Commit graph

55 commits

Author SHA1 Message Date
Yuan Tang
2128e61da2
Fix incorrect completion() signature for Databricks provider (#236) 2024-10-11 08:47:57 -07:00
Xi Yan
7ff5800dea generate openapi 2024-10-10 15:30:34 -07:00
Ashwin Bharambe
6bb57e72a7
Remove "routing_table" and "routing_key" concepts for the user (#201)
This PR makes several core changes to the developer experience surrounding Llama Stack.

Background: PR #92 introduced the notion of "routing" to the Llama Stack. It introduces three object types: (1) models, (2) shields and (3) memory banks. Each of these objects can be associated with a distinct provider. So you can get model A to be inferenced locally while model B, C can be inference remotely (e.g.)

However, this had a few drawbacks:

you could not address the provider instances -- i.e., if you configured "meta-reference" with a given model, you could not assign an identifier to this instance which you could re-use later.
the above meant that you could not register a "routing_key" (e.g. model) dynamically and say "please use this existing provider I have already configured" for a new model.
the terms "routing_table" and "routing_key" were exposed directly to the user. in my view, this is way too much overhead for a new user (which almost everyone is.) people come to the stack wanting to do ML and encounter a completely unexpected term.
What this PR does: This PR structures the run config with only a single prominent key:

- providers
Providers are instances of configured provider types. Here's an example which shows two instances of the remote::tgi provider which are serving two different models.

providers:
  inference:
  - provider_id: foo
    provider_type: remote::tgi
    config: { ... }
  - provider_id: bar
    provider_type: remote::tgi
    config: { ... }
Secondly, the PR adds dynamic registration of { models | shields | memory_banks } to the API surface. The distribution still acts like a "routing table" (as previously) except that it asks the backing providers for a listing of these objects. For example it asks a TGI or Ollama inference adapter what models it is serving. Only the models that are being actually served can be requested by the user for inference. Otherwise, the Stack server will throw an error.

When dynamically registering these objects, you can use the provider IDs shown above. Info about providers can be obtained using the Api.inspect set of endpoints (/providers, /routes, etc.)

The above examples shows the correspondence between inference providers and models registry items. Things work similarly for the safety <=> shields and memory <=> memory_banks pairs.

Registry: This PR also makes it so that Providers need to implement additional methods for registering and listing objects. For example, each Inference provider is now expected to implement the ModelsProtocolPrivate protocol (naming is not great!) which consists of two methods

register_model
list_models
The goal is to inform the provider that a certain model needs to be supported so the provider can make any relevant backend changes if needed (or throw an error if the model cannot be supported.)

There are many other cleanups included some of which are detailed in a follow-up comment.
2024-10-10 10:24:13 -07:00
Dalton Flanagan
8c3010553f
Fix agents path in generate.py 2024-10-10 11:41:03 -04:00
Xi Yan
6b094b72d3
Update cli_reference.md 2024-10-08 15:32:06 -07:00
Xi Yan
ce70d21f65
Add files via upload 2024-10-08 15:29:19 -07:00
Xi Yan
2366e18873
refactor docs (#209) 2024-10-07 10:21:26 -07:00
Xi Yan
29138a5167
Update getting_started.md 2024-10-05 12:28:02 -07:00
Xi Yan
6d4013ac99
Update getting_started.md 2024-10-05 12:14:59 -07:00
raghotham
00ed9a410b
Update getting_started.md
update discord invite link
2024-10-03 23:28:43 -07:00
Ashwin Bharambe
210b71b0ba
fix prompt guard (#177)
Several other fixes to configure. Add support for 1b/3b models in ollama.
2024-10-03 11:07:53 -07:00
Ashwin Bharambe
8d049000e3 Add an introspection "Api.inspect" API 2024-10-02 15:41:14 -07:00
Adrian Cole
01d93be948
Adds markdown-link-check and fixes a broken link (#165)
Signed-off-by: Adrian Cole <adrian.cole@elastic.co>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2024-10-02 14:26:20 -07:00
Ashwin Bharambe
fe4aabd690 provider_id => provider_type, adapter_id => adapter_type 2024-10-02 14:05:59 -07:00
Ashwin Bharambe
4a75d922a9 Make Llama Guard 1B the default 2024-10-02 09:48:26 -07:00
Russell Bryant
43744455d7
docs: Note how to use podman (#130)
Podman works as an alternative to Docker, but it wasn't immediately
obvious going through the quickstart how to enable it aside from
installing the docker alias. Add a note that points users to the
correct env var to use podman.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-09-27 14:00:40 -07:00
Deep Doshi
557ae38289
Update getting_started.ipynb (#117)
Update hyperlink to `llama-stack-apps` to point it correctly to the desired github repo
2024-09-26 14:43:04 -07:00
Xi Yan
2802ac8e9d
add llama-stack.png 2024-09-26 11:17:46 -07:00
Karthi Keyan
995a1a1d00
Reordered pip install and llama model download (#112)
Only after pip install step, llama cli command could be used (which is also specified in the notebook), so its common sense to put it before
2024-09-26 10:37:15 -07:00
Mark Sze
3c99f08267
minor typo and HuggingFace -> Hugging Face (#113) 2024-09-26 09:48:23 -07:00
machina-source
37be3fb184
Fix links & format (#104)
Fix broken examples link to llama-stack-apps repo
Remove extra space in README.md
2024-09-25 14:18:46 -07:00
Abhishek
851c30597a
chore (doc): fix typo for setup instructionllama-stack to llama-stack-apps (#103) 2024-09-25 13:27:55 -07:00
Ashwin Bharambe
56aed59eb4
Support for Llama3.2 models and Swift SDK (#98) 2024-09-25 10:29:58 -07:00
Ashwin Bharambe
ec4fc800cc
[API Updates] Model / shield / memory-bank routing + agent persistence + support for private headers (#92)
This is yet another of those large PRs (hopefully we will have less and less of them as things mature fast). This one introduces substantial improvements and some simplifications to the stack.

Most important bits:

* Agents reference implementation now has support for session / turn persistence. The default implementation uses sqlite but there's also support for using Redis.

* We have re-architected the structure of the Stack APIs to allow for more flexible routing. The motivating use cases are:
  - routing model A to ollama and model B to a remote provider like Together
  - routing shield A to local impl while shield B to a remote provider like Bedrock
  - routing a vector memory bank to Weaviate while routing a keyvalue memory bank to Redis

* Support for provider specific parameters to be passed from the clients. A client can pass data using `x_llamastack_provider_data` parameter which can be type-checked and provided to the Adapter implementations.
2024-09-23 14:22:22 -07:00
Xi Yan
06abd7e6c8 update MemoryToolDefinition 2024-09-20 17:51:53 -07:00
Ashwin Bharambe
942cb87a3c remove apis/stack.py 2024-09-20 09:37:08 -07:00
Xi Yan
543222ac39 update inference prompt msg 2024-09-19 12:03:24 -07:00
Xi Yan
880ed37026
Update cli_reference.md 2024-09-18 23:05:24 -07:00
Xi Yan
5c4a2dc0e1
Update getting_started.md 2024-09-18 23:03:14 -07:00
Xi Yan
f3f5873e9e regenerate openapi spec 2024-09-18 19:28:05 -07:00
Xi Yan
5ec64ac68c moving rfc->docs 2024-09-18 16:54:24 -07:00
Xi Yan
2c1ad10710 move openapi from rfcs->docs 2024-09-18 16:09:17 -07:00
Xi Yan
45e20ff431 update getting started 2024-09-18 15:40:48 -07:00
Xi Yan
2f9e952813 update getting started guide 2024-09-18 15:35:54 -07:00
Xi Yan
6b21523c28
CLI - add back build wizard, configure with name instead of build.yaml (#74)
* add back wizard for build

* conda build path move

* polish message

* run with name only

* prompt for build

* improve comments

* update msgs

* add new lines

* move build.yaml

* address comments

* validator for providers

* move imports

* Please enter -> enter

* comments, get started guide

* nits

* fix cprint import

* fix imports
2024-09-18 11:41:56 -07:00
Dalton Flanagan
eea0a83bd1
Update getting_started.md
config is now a positional argument
2024-09-18 00:47:41 -04:00
Ashwin Bharambe
25adc83de8 Fix for safety 2024-09-17 19:56:58 -07:00
Ashwin Bharambe
9487ad8294
API Updates (#73)
* API Keys passed from Client instead of distro configuration

* delete distribution registry

* Rename the "package" word away

* Introduce a "Router" layer for providers

Some providers need to be factorized and considered as thin routing
layers on top of other providers. Consider two examples:

- The inference API should be a routing layer over inference providers,
  routed using the "model" key
- The memory banks API is another instance where various memory bank
  types will be provided by independent providers (e.g., a vector store
  is served by Chroma while a keyvalue memory can be served by Redis or
  PGVector)

This commit introduces a generalized routing layer for this purpose.

* update `apis_to_serve`

* llama_toolchain -> llama_stack

* Codemod from llama_toolchain -> llama_stack

- added providers/registry
- cleaned up api/ subdirectories and moved impls away
- restructured api/api.py
- from llama_stack.apis.<api> import foo should work now
- update imports to do llama_stack.apis.<api>
- update many other imports
- added __init__, fixed some registry imports
- updated registry imports
- create_agentic_system -> create_agent
- AgenticSystem -> Agent

* Moved some stuff out of common/; re-generated OpenAPI spec

* llama-toolchain -> llama-stack (hyphens)

* add control plane API

* add redis adapter + sqlite provider

* move core -> distribution

* Some more toolchain -> stack changes

* small naming shenanigans

* Removing custom tool and agent utilities and moving them client side

* Move control plane to distribution server for now

* Remove control plane from API list

* no codeshield dependency randomly plzzzzz

* Add "fire" as a dependency

* add back event loggers

* stack configure fixes

* use brave instead of bing in the example client

* add init file so it gets packaged

* add init files so it gets packaged

* Update MANIFEST

* bug fix

---------

Co-authored-by: Hardik Shah <hjshah@fb.com>
Co-authored-by: Xi Yan <xiyan@meta.com>
Co-authored-by: Ashwin Bharambe <ashwin@meta.com>
2024-09-17 19:51:35 -07:00
Xi Yan
ce6c868499
Update cli_reference.md 2024-09-16 12:02:46 -07:00
Xi Yan
ed4272e31e
Update getting_started.md 2024-09-16 11:55:10 -07:00
Xi Yan
d9147f3184
CLI Update: build -> configure -> run (#69)
* remove configure from build

* remove config from build

* configure to regenerate file

* update memory providers

* remove comments

* udpate build script

* add reedme

* update doc

* rename getting started

* update build cli

* update docker build script

* configure update

* clean up configure

* [tmp fix] hardware requirement tmp fix

* clean up build

* fix configure

* add example build files for conda & docker

* remove resolve_distribution_spec

* remove available_distribution_specs

* example build files

* update example build files

* more clean up on build

* add name args to override name

* move distribution to yaml files

* generate distribution specs

* getting started guide

* getting started

* add build yaml to Dockerfile

* cleanup distribution_dependencies

* configure from  docker image name

* build relative paths

* minor comment

* getting started

* Update getting_started.md

* Update getting_started.md

* address comments, configure within docker file

* remove distribution types!

* update getting started

* update documentation

* remove listing distribution

* minor heading

* address nits, remove docker_image=null

* gitignore
2024-09-16 11:02:26 -07:00
Celina Hanouti
736092f6bc
[Inference] Use huggingface_hub inference client for TGI adapter (#53)
* Use huggingface_hub inference client for TGI inference

* Update the default value for TGI URL

* Use InferenceClient.text_generation for TGI inference

* Fixes post-review and split TGI adapter into local and Inference Endpoints ones

* Update CLI reference and add typing

* Rename TGI Adapter class

* Use HfApi to get the namespace when not provide in the hf endpoint name

* Remove unecessary method argument

* Improve TGI adapter initialization condition

* Move helper into impl file + fix merging conflicts
2024-09-12 09:11:35 -07:00
Xi Yan
89300df5dc
Add config file based CLI (#60)
* config file for build

* fix build command

* configure script with config

* fix configure script to work with config file

* update build.sh

* update readme

* distribution_type -> distribution

* fix run-config/config-file to config

* move import to inline

* only consume config as argument

* update configure to only consume config

* update readme

* update readme
2024-09-11 11:39:46 -07:00
Ashwin Bharambe
70e682fbdf Update distribution_id -> distribution_type, provider_id -> provider_type 2024-09-07 08:42:28 -07:00
raghotham
225cd75074
Update cli_reference.md
Made it easier to follow along with numbered steps
2024-09-04 18:50:10 -07:00
Ashwin Bharambe
7bc7785b0d
API Updates: fleshing out RAG APIs, introduce "llama stack" CLI command (#51)
* add tools to chat completion request

* use templates for generating system prompts

* Moved ToolPromptFormat and jinja templates to llama_models.llama3.api

* <WIP> memory changes

- inlined AgenticSystemInstanceConfig so API feels more ergonomic
- renamed it to AgentConfig, AgentInstance -> Agent
- added a MemoryConfig and `memory` parameter
- added `attachments` to input and `output_attachments` to the response

- some naming changes

* InterleavedTextAttachment -> InterleavedTextMedia, introduce memory tool

* flesh out memory banks API

* agentic loop has a RAG implementation

* faiss provider implementation

* memory client works

* re-work tool definitions, fix FastAPI issues, fix tool regressions

* fix agentic_system utils

* basic RAG seems to work

* small bug fixes for inline attachments

* Refactor custom tool execution utilities

* Bug fix, show memory retrieval steps in EventLogger

* No need for api_key for Remote providers

* add special unicode character ↵ to showcase newlines in model prompt templates

* remove api.endpoints imports

* combine datatypes.py and endpoints.py into api.py

* Attachment / add TTL api

* split batch_inference from inference

* minor import fixes

* use a single impl for ChatFormat.decode_assistant_mesage

* use interleaved_text_media_as_str() utilityt

* Fix api.datatypes imports

* Add blobfile for tiktoken

* Add ToolPromptFormat to ChatFormat.encode_message so that tools are encoded properly

* templates take optional --format={json,function_tag}

* Rag Updates

* Add `api build` subcommand -- WIP

* fix

* build + run image seems to work

* <WIP> adapters

* bunch more work to make adapters work

* api build works for conda now

* ollama remote adapter works

* Several smaller fixes to make adapters work

Also, reorganized the pattern of __init__ inside providers so
configuration can stay lightweight

* llama distribution -> llama stack + containers (WIP)

* All the new CLI for api + stack work

* Make Fireworks and Together into the Adapter format

* Some quick fixes to the CLI behavior to make it consistent

* Updated README phew

* Update cli_reference.md

* llama_toolchain/distribution -> llama_toolchain/core

* Add termcolor

* update paths

* Add a log just for consistency

* chmod +x scripts

* Fix api dependencies not getting added to configuration

* missing import lol

* Delete utils.py; move to agentic system

* Support downloading of URLs for attachments for code interpreter

* Simplify and generalize `llama api build` yay

* Update `llama stack configure` to be very simple also

* Fix stack start

* Allow building an "adhoc" distribution

* Remote `llama api []` subcommands

* Fixes to llama stack commands and update docs

* Update documentation again and add error messages to llama stack start

* llama stack start -> llama stack run

* Change name of build for less confusion

* Add pyopenapi fork to the repository, update RFC assets

* Remove conflicting annotation

* Added a "--raw" option for model template printing

---------

Co-authored-by: Hardik Shah <hjshah@fb.com>
Co-authored-by: Ashwin Bharambe <ashwin@meta.com>
Co-authored-by: Dalton Flanagan <6599399+dltn@users.noreply.github.com>
2024-09-03 22:39:39 -07:00
varunfb
9777639a1c
Updated URLs and addressed feedback (#37) 2024-08-22 13:34:46 -07:00
varunfb
4930616ec7
Updated cli instructions with additonal details for each subcommands (#36) 2024-08-22 12:20:47 -07:00
Jeff Tang
b4af8c0e00
update cli ref doc: llama model template names related; separation of copy-and-pastable commands with their outputs (#34) 2024-08-21 20:41:30 -07:00
Dalton Flanagan
416097a9ea
Rename inline -> local (#24)
* Rename the "inline" distribution to "local"

* further rename

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2024-08-08 17:39:03 -04:00