Commit graph

1278 commits

Author SHA1 Message Date
Russell Bryant
8db49de961
docker: Install in editable mode for dev purposes (#160)
While rebuilding a stack using the `docker` image type and having
`LLAMA_STACK_DIR` set so it installs `llama_stack` from my local
source, I noticed that once built, it just used the image build cache
and didn't pull in changes to my source.

1. Install in editable mode (`pip install -e`) for dev purposes.

2. Mount the source into the container for `configure` and `run` so
   that the editable install works.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-09-30 11:56:31 -07:00
Russell Bryant
cb36be320f
Fix podman+selinux compatibility (#132)
When I ran `llama stack configure` for my `docker` based stack on my
system using podman + SELinux (CentOS Stream 9), The `podman run`
command failed due to SELinux blocking access to the volume mount.

As a simple fix, disable SELinux label checking.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-09-29 20:19:44 -07:00
moritalous
2bd785354d
fix broken bedrock inference provider (#151) 2024-09-29 20:17:58 -07:00
Byung Chun Kim
2f096ca509
accepts not model itself. (#153) 2024-09-29 20:16:50 -07:00
Ashwin Bharambe
5bf679cab6
Pull (extract) provider data from the provider instead of pushing from the top (#148) 2024-09-29 20:00:51 -07:00
Xi Yan
f6a6598d1a
[bugfix] fix #146 (#147)
* more robust image type

* lint
2024-09-28 17:47:00 -07:00
Xi Yan
6a8c2ae1df
[CLI] remove dependency on CONDA_PREFIX in CLI (#144)
* remove dependency on CONDA_PREFIX in CLI

* lint

* typo

* more robust
2024-09-28 16:46:47 -07:00
Ashwin Bharambe
fe460ba103 Avoid importing a lot of stuff 2024-09-28 16:06:10 -07:00
Xi Yan
4ae8c63a2b pre-commit lint 2024-09-28 16:04:41 -07:00
Ashwin Bharambe
ced5fb6388 Small cleanup for together safety implementation 2024-09-28 15:47:35 -07:00
Yogish Baliga
940968ee3f
fixing safety inference and safety adapter for new API spec. Pinned t… (#105)
* fixing safety inference and safety adapter for new API spec. Pinned the llama_models version to 0.0.24 as the latest version 0.0.35 has the model descriptor name changed. I was getting the missing package error during runtime as well, hence added the dependency to requirements.txt

* support Llama 3.2 models in Together inference adapter and cleanup Together safety adapter

* fixing model names

* adding vision guard to Together safety
2024-09-28 15:45:38 -07:00
Ashwin Bharambe
0a3999a9a4
Use inference APIs for executing Llama Guard (#121)
We should use Inference APIs to execute Llama Guard instead of directly needing to use HuggingFace modeling related code. The actual inference consideration is handled by Inference.
2024-09-28 15:40:06 -07:00
Xi Yan
6236634d84
[bugfix] fix duplicate api endpoints (#139)
* fix server api to serve

* remove print
2024-09-27 15:32:50 -07:00
Xi Yan
208b861289
add env for LLAMA_STACK_CONFIG_DIR (#137) 2024-09-27 14:16:46 -07:00
Russell Bryant
f70c88ab7a
configure: Fix a error msg typo (#131)
I got this error message and noticed the typo in the message. It
directed the user to run `llama stack build first`, which is not a
valid command.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-09-27 14:00:25 -07:00
Russell Bryant
5828ffd53b
inference: Fix download command in error msg (#133)
I got this error message and tried to the run the command presented
and it didn't work. The model needs to be give with `--model-id`
instead of as a positional argument.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-09-27 13:31:11 -07:00
Russell Bryant
fb9e6371ec
Validate name in llama stack build (#128)
The first time I ran `llama stack build`, I quickly hit enter at the
first prompt asking for a name, assuming it would use the default
given in the help text. This caused a failure later on that wasn't
very obvious. I was using the `docker` format and a blank name caused
an invalid tag format that failed the image build.

This change adds validation for the `name` parameter to ensure it's
not empty before proceeding.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-09-27 13:30:55 -07:00
Mark Sze
3c99f08267
minor typo and HuggingFace -> Hugging Face (#113) 2024-09-26 09:48:23 -07:00
Kate Plawiak
3ae1597b9b
load models using hf model id (#108) 2024-09-25 18:40:09 -07:00
Xi Yan
ca7602a642 fix #100 2024-09-25 15:11:56 -07:00
Lucain
615ed4bfbc
Make TGI adapter compatible with HF Inference API (#97) 2024-09-25 14:08:31 -07:00
Xi Yan
82f420c4f0
fix safety using inference (#99) 2024-09-25 11:30:27 -07:00
Dalton Flanagan
5c4f73d52f
Drop header from LocalInference.h 2024-09-25 11:27:37 -07:00
Ashwin Bharambe
d442af0818 Add safety impl for llama guard vision 2024-09-25 11:07:19 -07:00
Dalton Flanagan
b3b0349931 Update LocalInference to use public repos 2024-09-25 11:05:51 -07:00
Ashwin Bharambe
4fcda00872 Re-apply revert 2024-09-25 11:00:43 -07:00
Ashwin Bharambe
d82a9d94e3 Small fix to the prompt-format error message 2024-09-25 10:56:13 -07:00
Ashwin Bharambe
56aed59eb4
Support for Llama3.2 models and Swift SDK (#98) 2024-09-25 10:29:58 -07:00
poegej
95abbf576b
Bump version to 0.0.24 (#94)
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2024-09-25 09:31:12 -07:00
Ashwin Bharambe
ed8d10775a Remove key 2024-09-25 05:53:49 -07:00
Xi Yan
45be9f3b85 fix agent's embedding model config 2024-09-24 22:49:49 -07:00
Ashwin Bharambe
f45705cd10 Some lightweight cleanup and renaming for bedrock safety adapter 2024-09-24 19:29:56 -07:00
Ashwin Bharambe
a2465f3f9c Revert parts of 0d2eb3bd25 2024-09-24 19:20:51 -07:00
rsgrewal-aws
059e50b389
[aws-bedrock] Support for Bedrock Safety adapter (#96) 2024-09-24 19:16:55 -07:00
Yogish Baliga
b85d675c6f Adding safety adapter for Together 2024-09-24 18:35:48 -07:00
Ashwin Bharambe
0d2eb3bd25 Use inference APIs for running llama guard
Test Plan:

First, start a TGI container with `meta-llama/Llama-Guard-3-8B` model
serving on port 5099. See https://github.com/meta-llama/llama-stack/pull/53 and its
description for how.

Then run llama-stack with the following run config:

```
image_name: safety
docker_image: null
conda_env: safety
apis_to_serve:
- models
- inference
- shields
- safety
api_providers:
  inference:
    providers:
    - remote::tgi
  safety:
    providers:
    - meta-reference
  telemetry:
    provider_id: meta-reference
    config: {}
routing_table:
  inference:
  - provider_id: remote::tgi
    config:
      url: http://localhost:5099
      api_token: null
      hf_endpoint_name: null
    routing_key: Llama-Guard-3-8B
  safety:
  - provider_id: meta-reference
    config:
      llama_guard_shield:
        model: Llama-Guard-3-8B
        excluded_categories: []
        disable_input_check: false
        disable_output_check: false
      prompt_guard_shield: null
    routing_key: llama_guard
```

Now simply run `python -m llama_stack.apis.safety.client localhost
<port>` and check that the llama_guard shield calls run correctly. (The
injection_shield calls fail as expected since we have not set up a
router for them.)
2024-09-24 17:02:57 -07:00
Xi Yan
c4534217c8 fix cli describe 2024-09-24 14:41:19 -07:00
Ashwin Bharambe
00352bd251 Respect passed in embedding model 2024-09-24 14:40:28 -07:00
Ashwin Bharambe
bda974e660 Make the "all-remote" distribution lightweight in dependencies and size 2024-09-24 14:18:57 -07:00
Ashwin Bharambe
445536de64 Add httpx to core server deps 2024-09-24 10:42:04 -07:00
Ashwin Bharambe
8d511cdf91 Make build_conda_env a bit more robust 2024-09-24 10:12:07 -07:00
Xi Yan
d04cd97aba remove providers/impls/sqlite/* 2024-09-24 01:03:40 -07:00
Ashwin Bharambe
e617273d8c attribute changed (model_args -> arch_args) 2024-09-23 21:44:26 -07:00
Ashwin Bharambe
f136f802b1 Somewhat better error handling 2024-09-23 21:40:14 -07:00
Xi Yan
f92ff86b96 fix shields in agents safety 2024-09-23 21:22:22 -07:00
Ashwin Bharambe
c9005e95ed Another attempt at a proper bugfix for safety violations 2024-09-23 19:06:30 -07:00
Xi Yan
e5bdd6615a bug fix for safety violation 2024-09-23 18:17:15 -07:00
Xi Yan
70fb70a71c fix URL issue with agents 2024-09-23 16:44:25 -07:00
Ashwin Bharambe
ec4fc800cc
[API Updates] Model / shield / memory-bank routing + agent persistence + support for private headers (#92)
This is yet another of those large PRs (hopefully we will have less and less of them as things mature fast). This one introduces substantial improvements and some simplifications to the stack.

Most important bits:

* Agents reference implementation now has support for session / turn persistence. The default implementation uses sqlite but there's also support for using Redis.

* We have re-architected the structure of the Stack APIs to allow for more flexible routing. The motivating use cases are:
  - routing model A to ollama and model B to a remote provider like Together
  - routing shield A to local impl while shield B to a remote provider like Bedrock
  - routing a vector memory bank to Weaviate while routing a keyvalue memory bank to Redis

* Support for provider specific parameters to be passed from the clients. A client can pass data using `x_llamastack_provider_data` parameter which can be type-checked and provided to the Adapter implementations.
2024-09-23 14:22:22 -07:00
Hardik Shah
8bf8c07eb3 Respect user sent instructions in agent config and add them to system prompt 2024-09-21 16:46:10 -07:00