Commit graph

53 commits

Author SHA1 Message Date
Ashwin Bharambe
d6fcdefec7 Bump version to 0.0.63 2024-12-17 23:15:27 -08:00
Ashwin Bharambe
eea478618d Bump version to 0.0.62 2024-12-17 18:19:47 -08:00
Ashwin Bharambe
02b43be9d7 Bump version to 0.0.61 2024-12-10 10:18:44 -08:00
Ashwin Bharambe
1ad691bb04 Bump version to 0.0.60 2024-12-09 22:19:51 -08:00
Ashwin Bharambe
baae4f7b51 Bump version to 0.0.59 2024-12-09 21:22:20 -08:00
Ashwin Bharambe
2c5c73f7ca Bump version to 0.0.58 2024-12-06 08:36:00 -08:00
dltn
4c7b1a8fb3 Bump version to 0.0.57 2024-12-02 19:48:46 -08:00
Dinesh Yeduguru
fe48b9fb8c Bump version to 0.0.56 2024-11-30 12:27:31 -08:00
Ashwin Bharambe
45fd73218a Bump version to 0.0.55 2024-11-23 09:03:58 -08:00
Ashwin Bharambe
2137b0af40 Bump version to 0.0.54 2024-11-21 16:28:30 -08:00
Ashwin Bharambe
dd5466e17d Bump version to 0.0.53 2024-11-19 16:44:15 -08:00
Ashwin Bharambe
394519d68a Add llama-stack-client as a legitimate dependency for llama-stack 2024-11-19 11:44:35 -08:00
Xi Yan
f6aaa9c708 Bump version to 0.0.50 2024-11-08 17:28:39 -08:00
Ashwin Bharambe
3ca294c359 Bump version to 0.0.49 2024-11-04 20:38:00 -08:00
Xi Yan
4d60ab8531 Bump version to 0.0.48 2024-11-04 17:37:32 -08:00
Ashwin Bharambe
8a3b64d1be Bump version to 0.0.47 2024-10-27 22:30:38 -07:00
Ashwin Bharambe
426d821e7f Bump version to 0.0.46 2024-10-25 13:10:55 -07:00
Ashwin Bharambe
0538cc297e Bump version to 0.0.45 2024-10-24 12:14:18 -07:00
Ashwin Bharambe
8aa8847b4a Bump version to 0.0.44 2024-10-24 08:41:39 -07:00
Xi Yan
dbb5ce43fc Bump version to 0.0.43 2024-10-21 19:10:01 -07:00
Xi Yan
209cd3d35e Bump version to 0.0.42 2024-10-14 11:13:04 -07:00
Ashwin Bharambe
89d24a07f0 Bump version to 0.0.41 2024-10-10 10:27:03 -07:00
Ashwin Bharambe
bfb0e92034 Bump version to 0.0.40 2024-10-04 09:33:43 -07:00
Ashwin Bharambe
dc75aab547 Add setuptools dependency 2024-10-04 09:30:54 -07:00
Dalton Flanagan
441052b0fd avoid jq since non-standard on macOS 2024-10-04 10:11:43 -04:00
Dalton Flanagan
9bf2e354ae CLI now requires jq 2024-10-04 10:05:59 -04:00
Ashwin Bharambe
8d41e6caa9 Bump version to 0.0.39 2024-10-03 11:31:03 -07:00
Ashwin Bharambe
c02a90e4c8 Bump version to 0.0.38 2024-10-03 05:42:47 -07:00
Ashwin Bharambe
9b93ee2c2b Bump version to 0.0.37 2024-10-02 10:15:08 -07:00
Ashwin Bharambe
a80b707ff8 Ensure we always ask for pydantic>=2 2024-10-02 06:29:06 -07:00
Ashwin Bharambe
c8fa26482d Bump version to 0.0.36 2024-09-25 11:58:15 -07:00
Ashwin Bharambe
a227edb480 Bump version to 0.0.35 2024-09-25 10:34:59 -07:00
Ashwin Bharambe
56aed59eb4 Support for Llama3.2 models and Swift SDK (#98) 2024-09-25 10:29:58 -07:00
Ashwin Bharambe
7b35a4c827 Bump version to 0.0.24 2024-09-24 10:15:20 -07:00
Ashwin Bharambe
cd850c16de Bump version to 0.0.23 2024-09-24 09:08:40 -07:00
Ashwin Bharambe
9eb5ec3e4b Bump version to 0.0.21 2024-09-23 14:23:21 -07:00
Xi Yan
21058be0c1 Bump version to 0.0.19 2024-09-18 15:48:38 -07:00
Hardik Shah
29ce73ff7a update requirements, added prompt-toolkit 2024-09-18 15:21:45 -07:00
Ashwin Bharambe
81ff7476d3 Bump version to 0.0.18 2024-09-17 20:08:04 -07:00
Ashwin Bharambe
9487ad8294 API Updates (#73)
* API Keys passed from Client instead of distro configuration

* delete distribution registry

* Rename the "package" word away

* Introduce a "Router" layer for providers

Some providers need to be factorized and considered as thin routing
layers on top of other providers. Consider two examples:

- The inference API should be a routing layer over inference providers,
  routed using the "model" key
- The memory banks API is another instance where various memory bank
  types will be provided by independent providers (e.g., a vector store
  is served by Chroma while a keyvalue memory can be served by Redis or
  PGVector)

This commit introduces a generalized routing layer for this purpose.
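
The routing idea sketches out in a few lines. The following is a minimal illustration, using hypothetical names (InferenceProvider, InferenceRouter) rather than the actual classes introduced here:

    from typing import Dict, Protocol

    class InferenceProvider(Protocol):
        """Any concrete inference backend (hypothetical interface)."""
        def chat_completion(self, model: str, prompt: str) -> str: ...

    class InferenceRouter:
        """Thin routing layer: dispatches on the "model" key."""
        def __init__(self, routes: Dict[str, InferenceProvider]) -> None:
            self.routes = routes  # model name -> provider serving it

        def chat_completion(self, model: str, prompt: str) -> str:
            provider = self.routes.get(model)
            if provider is None:
                raise ValueError(f"no provider registered for model {model!r}")
            return provider.chat_completion(model, prompt)

The same shape generalizes to memory banks: the routing key becomes the bank type, with each type backed by an independent provider (Chroma, Redis, PGVector, and so on).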

* update `apis_to_serve`

* llama_toolchain -> llama_stack

* Codemod from llama_toolchain -> llama_stack

- added providers/registry
- cleaned up api/ subdirectories and moved impls away
- restructured api/api.py
- from llama_stack.apis.<api> import foo should work now
- update imports to do llama_stack.apis.<api>
- update many other imports
- added __init__, fixed some registry imports
- updated registry imports
- create_agentic_system -> create_agent
- AgenticSystem -> Agent

* Moved some stuff out of common/; re-generated OpenAPI spec

* llama-toolchain -> llama-stack (hyphens)

* add control plane API

* add redis adapter + sqlite provider

* move core -> distribution

* Some more toolchain -> stack changes

* small naming shenanigans

* Removing custom tool and agent utilities and moving them client side

* Move control plane to distribution server for now

* Remove control plane from API list

* no codeshield dependency randomly plzzzzz

* Add "fire" as a dependency

* add back event loggers

* stack configure fixes

* use brave instead of bing in the example client

* add init file so it gets packaged

* add init files so it gets packaged

* Update MANIFEST

* bug fix

---------

Co-authored-by: Hardik Shah <hjshah@fb.com>
Co-authored-by: Xi Yan <xiyan@meta.com>
Co-authored-by: Ashwin Bharambe <ashwin@meta.com>
2024-09-17 19:51:35 -07:00
Xi Yan
f294eac5f5 Bump version to 0.0.17 2024-09-16 13:10:05 -07:00
Ashwin Bharambe
53ab18d6bb Bump version to 0.0.16 2024-09-14 08:09:45 -07:00
Ashwin Bharambe
7a283ea076 Bump version to 0.0.15 2024-09-13 17:23:12 -07:00
Xi Yan
6a863f9b78 Bump version to 0.0.14 2024-09-12 21:24:07 -07:00
Yufei (Benny) Chen
406c3b24d4 upgrade llama_models (#55) 2024-09-06 12:03:13 -07:00
Ashwin Bharambe
7bc7785b0d API Updates: fleshing out RAG APIs, introduce "llama stack" CLI command (#51)
* add tools to chat completion request

* use templates for generating system prompts

* Moved ToolPromptFormat and jinja templates to llama_models.llama3.api

* <WIP> memory changes

- inlined AgenticSystemInstanceConfig so API feels more ergonomic
- renamed it to AgentConfig, AgentInstance -> Agent
- added a MemoryConfig and `memory` parameter
- added `attachments` to input and `output_attachments` to the response

- some naming changes

* InterleavedTextAttachment -> InterleavedTextMedia, introduce memory tool

* flesh out memory banks API

* agentic loop has a RAG implementation

* faiss provider implementation

* memory client works

* re-work tool definitions, fix FastAPI issues, fix tool regressions

* fix agentic_system utils

* basic RAG seems to work

* small bug fixes for inline attachments

* Refactor custom tool execution utilities

* Bug fix, show memory retrieval steps in EventLogger

* No need for api_key for Remote providers

* add special unicode character ↵ to showcase newlines in model prompt templates

* remove api.endpoints imports

* combine datatypes.py and endpoints.py into api.py

* Attachment / add TTL api

* split batch_inference from inference

* minor import fixes

* use a single impl for ChatFormat.decode_assistant_message

* use interleaved_text_media_as_str() utility

* Fix api.datatypes imports

* Add blobfile for tiktoken

* Add ToolPromptFormat to ChatFormat.encode_message so that tools are encoded properly

* templates take optional --format={json,function_tag}

* Rag Updates

* Add `api build` subcommand -- WIP

* fix

* build + run image seems to work

* <WIP> adapters

* bunch more work to make adapters work

* api build works for conda now

* ollama remote adapter works

* Several smaller fixes to make adapters work

Also, reorganized the pattern of __init__ inside providers so
configuration can stay lightweight
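
One way to keep configuration lightweight is to limit a provider package's __init__ to a config class plus a factory that imports the heavy implementation lazily. A sketch of that pattern (hypothetical module and class names, not the exact code in this commit):

    # providers/ollama/__init__.py  (hypothetical layout)
    from pydantic import BaseModel

    class OllamaConfig(BaseModel):
        """Importing the config pulls in no heavy dependencies."""
        url: str = "http://localhost:11434"

    async def get_provider_impl(config: OllamaConfig):
        # Defer the expensive import until the provider is instantiated.
        from .ollama import OllamaInferenceImpl  # hypothetical module

        impl = OllamaInferenceImpl(config)
        await impl.initialize()
        return impl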

* llama distribution -> llama stack + containers (WIP)

* All the new CLI for api + stack work

* Make Fireworks and Together into the Adapter format

* Some quick fixes to the CLI behavior to make it consistent

* Updated README phew

* Update cli_reference.md

* llama_toolchain/distribution -> llama_toolchain/core

* Add termcolor

* update paths

* Add a log just for consistency

* chmod +x scripts

* Fix api dependencies not getting added to configuration

* missing import lol

* Delete utils.py; move to agentic system

* Support downloading of URLs for attachments for code interpreter

* Simplify and generalize `llama api build` yay

* Update `llama stack configure` to be very simple also

* Fix stack start

* Allow building an "adhoc" distribution

* Remove `llama api []` subcommands

* Fixes to llama stack commands and update docs

* Update documentation again and add error messages to llama stack start

* llama stack start -> llama stack run

* Change name of build for less confusion

* Add pyopenapi fork to the repository, update RFC assets

* Remove conflicting annotation

* Added a "--raw" option for model template printing

---------

Co-authored-by: Hardik Shah <hjshah@fb.com>
Co-authored-by: Ashwin Bharambe <ashwin@meta.com>
Co-authored-by: Dalton Flanagan <6599399+dltn@users.noreply.github.com>
2024-09-03 22:39:39 -07:00
Ashwin Bharambe
870cd7bb8b Add blobfile for tiktoken 2024-08-26 14:50:53 -07:00
Hardik Shah
37da47ef8e upgrade pydantic to latest 2024-08-12 15:14:21 -07:00
Ashwin Bharambe
e830814399 Introduce Llama stack distributions (#22)
* Add distribution CLI scaffolding

* More progress towards `llama distribution install`

* getting closer to a distro definition, distro install + configure works

* Distribution server now functioning

* read existing configuration, save enums properly

* Remove inference uvicorn server entrypoint and llama inference CLI command

* updated dependency and client model name

* Improved exception handling

* local imports for faster cli

* undo a typo, add a passthrough distribution

* implement full-passthrough in the server

* add safety adapters, configuration handling, server + clients

* cleanup, moving stuff to common, nuke utils

* Add a Path() wrapper at the earliest place

* fixes

* Bring agentic system api to toolchain

Add adapter dependencies and resolve adapters using a topological sort
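
Resolving in dependency order is a standard topological sort; a minimal sketch with a hypothetical dependency map (Python's graphlib does the ordering):

    from graphlib import TopologicalSorter

    # Hypothetical adapter -> dependencies map: e.g. the agentic system
    # needs inference and safety resolved before it can start.
    deps = {
        "agentic_system": {"inference", "safety"},
        "safety": {"inference"},
        "inference": set(),
    }

    # static_order() yields each adapter only after all its dependencies.
    for api in TopologicalSorter(deps).static_order():
        print(f"resolving adapter: {api}")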

* refactor to reduce size of `agentic_system`

* move straggler files and fix some important existing bugs

* ApiSurface -> Api

* refactor a method out

* Adapter -> Provider

* Make each inference provider into its own subdirectory

* installation fixes

* Rename Distribution -> DistributionSpec, simplify RemoteProviders

* dict key instead of attr

* update inference config to take model and not model_dir

* Fix passthrough streaming, send headers properly not part of body :facepalm

* update safety to use model sku ids and not model dirs

* Update cli_reference.md

* minor fixes

* add DistributionConfig, fix a bug in model download

* Make install + start scripts do proper configuration automatically

* Update CLI_reference

* Nuke fp8_requirements, fold fbgemm into common requirements

* Update README, add newline between API surface configurations

* Refactor download functionality out of the Command so can be reused

* Add `llama model download` alias for `llama download`

* Show message about checksum file so users can check themselves

* Simpler intro statements

* get ollama working

* Reduce a bunch of dependencies from toolchain

Some improvements to the distribution install script

* Avoid using `conda run` since it buffers everything

* update dependencies and rely on LLAMA_TOOLCHAIN_DIR for dev purposes

* add validation for configuration input

* resort imports

* make optional subclasses default to yes for configuration

* Remove additional_pip_packages; move deps to providers

* for inline make 8b model the default

* Add scripts to MANIFEST

* allow installing from test.pypi.org

* Fix #2 to help with testing packages

* Must install llama-models at that same version first

* fix PIP_ARGS

---------

Co-authored-by: Hardik Shah <hjshah@fb.com>
Co-authored-by: Hardik Shah <hjshah@meta.com>
2024-08-08 13:38:41 -07:00
Hardik Shah
156bfa0e15 Added Ollama as an inference impl (#20)
* fix non-streaming api in inference server

* unit test for inline inference

* Added non-streaming ollama inference impl

* add streaming support for ollama inference with tests

* addressing comments

---------

Co-authored-by: Hardik Shah <hjshah@fb.com>
2024-07-31 22:08:37 -07:00