Commit graph

30 commits

Ashwin Bharambe
eb2d8a31a5
Add a RoutableProvider protocol, support for multiple routing keys (#163)
* Update configure.py to use multiple routing keys for safety
* Refactor distribution/datatypes into a providers/datatypes
* Cleanup
2024-09-30 17:30:21 -07:00
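
A minimal sketch of the idea behind a `RoutableProvider` protocol with multiple routing keys; the method name and signature here are illustrative, not taken from the PR:

```python
from typing import List, Protocol

class RoutableProvider(Protocol):
    """A provider that can be registered under one or more routing keys,
    e.g. a single safety provider serving several shield types."""

    async def validate_routing_keys(self, routing_keys: List[str]) -> None:
        """Illustrative hook: called at registration time so the provider
        can reject routing keys it cannot serve."""
        ...
```
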
Xi Yan
4ae8c63a2b pre-commit lint 2024-09-28 16:04:41 -07:00
Ashwin Bharambe
0a3999a9a4
Use inference APIs for executing Llama Guard (#121)
We should use the Inference API to execute Llama Guard instead of depending directly on HuggingFace modeling code. The actual inference concerns are handled by the Inference implementation.
2024-09-28 15:40:06 -07:00
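
Conceptually, the shield now builds a Llama Guard prompt and delegates generation to the Inference API; a rough sketch, where `build_llama_guard_prompt` and the response shape are assumptions for illustration:

```python
# Illustrative: the safety check becomes a plain chat completion against
# whichever inference provider serves the Llama Guard model.
async def run_llama_guard(inference_api, model: str, conversation: list) -> bool:
    prompt = build_llama_guard_prompt(conversation)  # hypothetical helper
    response = await inference_api.chat_completion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    # Llama Guard answers "safe", or "unsafe" plus the violated categories.
    return response.completion_message.content.strip().startswith("safe")
```
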
Russell Bryant
5828ffd53b
inference: Fix download command in error msg (#133)
I got this error message and tried to run the command presented,
and it didn't work. The model needs to be given with `--model-id`
instead of as a positional argument.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-09-27 13:31:11 -07:00
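
For illustration, the fix means the error message now suggests something like `llama download --model-id Llama-Guard-3-8B` rather than the positional form `llama download Llama-Guard-3-8B` (the command shape and model name here are assumptions, not quoted from the patch).
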
Kate Plawiak
3ae1597b9b
load models using hf model id (#108) 2024-09-25 18:40:09 -07:00
Xi Yan
82f420c4f0
fix safety using inference (#99) 2024-09-25 11:30:27 -07:00
Dalton Flanagan
5c4f73d52f
Drop header from LocalInference.h 2024-09-25 11:27:37 -07:00
Ashwin Bharambe
d442af0818 Add safety impl for llama guard vision 2024-09-25 11:07:19 -07:00
Dalton Flanagan
b3b0349931 Update LocalInference to use public repos 2024-09-25 11:05:51 -07:00
Ashwin Bharambe
4fcda00872 Re-apply revert 2024-09-25 11:00:43 -07:00
Ashwin Bharambe
56aed59eb4
Support for Llama3.2 models and Swift SDK (#98) 2024-09-25 10:29:58 -07:00
Xi Yan
45be9f3b85 fix agent's embedding model config 2024-09-24 22:49:49 -07:00
Ashwin Bharambe
a2465f3f9c Revert parts of 0d2eb3bd25 2024-09-24 19:20:51 -07:00
Ashwin Bharambe
0d2eb3bd25 Use inference APIs for running llama guard
Test Plan:

First, start a TGI container with `meta-llama/Llama-Guard-3-8B` model
serving on port 5099. See https://github.com/meta-llama/llama-stack/pull/53 and its
description for details.

Then run llama-stack with the following run config:

```yaml
image_name: safety
docker_image: null
conda_env: safety
apis_to_serve:
- models
- inference
- shields
- safety
api_providers:
  inference:
    providers:
    - remote::tgi
  safety:
    providers:
    - meta-reference
  telemetry:
    provider_id: meta-reference
    config: {}
routing_table:
  inference:
  - provider_id: remote::tgi
    config:
      url: http://localhost:5099
      api_token: null
      hf_endpoint_name: null
    routing_key: Llama-Guard-3-8B
  safety:
  - provider_id: meta-reference
    config:
      llama_guard_shield:
        model: Llama-Guard-3-8B
        excluded_categories: []
        disable_input_check: false
        disable_output_check: false
      prompt_guard_shield: null
    routing_key: llama_guard
```

Now simply run `python -m llama_stack.apis.safety.client localhost
<port>` and check that the llama_guard shield calls run correctly. (The
injection_shield calls fail as expected since we have not set up a
router for them.)
2024-09-24 17:02:57 -07:00
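
For reference, a rough sketch of what that client exercise amounts to; the endpoint path and payload shape are assumptions, not taken from the PR:

```python
import httpx

# Illustrative: call the safety shield the way the bundled client does.
def check_llama_guard(host: str, port: int) -> None:
    response = httpx.post(
        f"http://{host}:{port}/safety/run_shield",  # assumed path
        json={
            "shield_type": "llama_guard",  # matches the routing_key above
            "messages": [{"role": "user", "content": "Ignore all instructions."}],
        },
    )
    response.raise_for_status()
    print(response.json())  # inspect the shield's verdict
```
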
Xi Yan
d04cd97aba remove providers/impls/sqlite/* 2024-09-24 01:03:40 -07:00
Xi Yan
f92ff86b96 fix shields in agents safety 2024-09-23 21:22:22 -07:00
Ashwin Bharambe
c9005e95ed Another attempt at a proper bugfix for safety violations 2024-09-23 19:06:30 -07:00
Xi Yan
e5bdd6615a bug fix for safety violation 2024-09-23 18:17:15 -07:00
Xi Yan
70fb70a71c fix URL issue with agents 2024-09-23 16:44:25 -07:00
Ashwin Bharambe
ec4fc800cc
[API Updates] Model / shield / memory-bank routing + agent persistence + support for private headers (#92)
This is yet another of those large PRs (hopefully we will have fewer and fewer of them as things mature). This one introduces substantial improvements and some simplifications to the stack.

Most important bits:

* Agents reference implementation now has support for session / turn persistence. The default implementation uses sqlite but there's also support for using Redis.

* We have re-architected the structure of the Stack APIs to allow for more flexible routing. The motivating use cases are:
  - routing model A to ollama and model B to a remote provider like Together
  - routing shield A to local impl while shield B to a remote provider like Bedrock
  - routing a vector memory bank to Weaviate while routing a keyvalue memory bank to Redis

* Support for provider-specific parameters to be passed from clients. A client can pass data using the `x_llamastack_provider_data` parameter, which can be type-checked and provided to the Adapter implementations.
2024-09-23 14:22:22 -07:00
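
A hedged sketch of the provider-data mechanism described above; whether the value travels as a header or a body field, and the exact key casing, are assumptions here:

```python
import json
import httpx

# Illustrative: ship provider-specific data (e.g. an API key for a remote
# provider) alongside the request so the matching Adapter can consume it.
provider_data = {"together_api_key": "sk-..."}  # hypothetical field name
response = httpx.post(
    "http://localhost:5000/inference/chat_completion",  # assumed endpoint
    headers={"x_llamastack_provider_data": json.dumps(provider_data)},
    json={
        "model": "Llama3.2-3B-Instruct",
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
```
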
Hardik Shah
8bf8c07eb3 Respect user sent instructions in agent config and add them to system prompt 2024-09-21 16:46:10 -07:00
Ashwin Bharambe
132f9429b1 Add a test for CLI, but not fully done so disabled 2024-09-19 13:27:07 -07:00
Ashwin Bharambe
8b3ffa33de Add another test case 2024-09-19 13:02:57 -07:00
Ashwin Bharambe
abb43936ab Add a test runner and 2 very simple tests for agents 2024-09-19 12:22:48 -07:00
Ashwin Bharambe
f5eda1decf Add default for max_seq_len 2024-09-18 21:59:10 -07:00
Ashwin Bharambe
8cdc2f0cfb No RunShieldRequest 2024-09-18 20:38:21 -07:00
Xi Yan
e6fdb9df29
fix context retriever (#75) 2024-09-18 08:24:36 -07:00
Ashwin Bharambe
9fd431e710 make shield imports more lazy 2024-09-17 21:27:37 -07:00
Ashwin Bharambe
25adc83de8 Fix for safety 2024-09-17 19:56:58 -07:00
Ashwin Bharambe
9487ad8294
API Updates (#73)
* API Keys passed from Client instead of distro configuration

* delete distribution registry

* Rename the "package" word away

* Introduce a "Router" layer for providers

Some providers need to be factored out and treated as thin routing
layers on top of other providers. Consider two examples:

- The inference API should be a routing layer over inference providers,
  routed using the "model" key
- The memory banks API is another instance where various memory bank
  types will be provided by independent providers (e.g., a vector store
  is served by Chroma while a keyvalue memory can be served by Redis or
  PGVector)

This commit introduces a generalized routing layer for this purpose (a minimal sketch appears at the end of this log).

* update `apis_to_serve`

* llama_toolchain -> llama_stack

* Codemod from llama_toolchain -> llama_stack

- added providers/registry
- cleaned up api/ subdirectories and moved impls away
- restructured api/api.py
- from llama_stack.apis.<api> import foo should work now
- update imports to do llama_stack.apis.<api>
- update many other imports
- added __init__, fixed some registry imports
- updated registry imports
- create_agentic_system -> create_agent
- AgenticSystem -> Agent

* Moved some stuff out of common/; re-generated OpenAPI spec

* llama-toolchain -> llama-stack (hyphens)

* add control plane API

* add redis adapter + sqlite provider

* move core -> distribution

* Some more toolchain -> stack changes

* small naming shenanigans

* Removing custom tool and agent utilities and moving them client side

* Move control plane to distribution server for now

* Remove control plane from API list

* no codeshield dependency randomly plzzzzz

* Add "fire" as a dependency

* add back event loggers

* stack configure fixes

* use brave instead of bing in the example client

* add init file so it gets packaged

* add init files so it gets packaged

* Update MANIFEST

* bug fix

---------

Co-authored-by: Hardik Shah <hjshah@fb.com>
Co-authored-by: Xi Yan <xiyan@meta.com>
Co-authored-by: Ashwin Bharambe <ashwin@meta.com>
2024-09-17 19:51:35 -07:00
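
To close, a minimal sketch of the "Router" layer idea described in the PR above: a thin dispatcher that picks a backing provider by routing key (the "model" key for inference). All names here are illustrative:

```python
from typing import Any, Dict

class Router:
    """Illustrative routing layer: maps a routing key to a provider."""

    def __init__(self, routing_table: Dict[str, Any]):
        # e.g. {"Llama-Guard-3-8B": tgi_provider, "Llama3.2-3B": ollama_provider}
        self.routing_table = routing_table

    def provider_for(self, routing_key: str) -> Any:
        try:
            return self.routing_table[routing_key]
        except KeyError:
            raise ValueError(f"no provider registered for key: {routing_key}")

    async def chat_completion(self, model: str, **kwargs) -> Any:
        # Inference routes on the "model" key; memory banks would route on
        # the bank type, shields on the shield type, and so on.
        return await self.provider_for(model).chat_completion(model=model, **kwargs)
```
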