Commit graph

62 commits

Author SHA1 Message Date
Ashwin Bharambe
8a175129fc fix weaviate, update run.yamls 2024-10-09 22:15:28 -07:00
Ashwin Bharambe
b55034c0de Another round of simplification and clarity for models/shields/memory_banks stuff 2024-10-09 19:19:26 -07:00
Ashwin Bharambe
73a0a34e39 Kill non-llama guard shields 2024-10-08 17:47:03 -07:00
Ashwin Bharambe
24c61403b7 Fixes 2024-10-08 17:43:25 -07:00
Ashwin Bharambe
a86f3ae07d Update run.yaml 2024-10-08 17:41:06 -07:00
Ashwin Bharambe
924b1fba09 minor 2024-10-08 17:26:26 -07:00
Ashwin Bharambe
8eee5b9adc Fix server conditional awaiting on coroutines 2024-10-08 17:23:42 -07:00
Ashwin Bharambe
7f1160296c Updates to server.py to clean up streaming vs non-streaming stuff
Also make sure agent turn create is correctly marked
2024-10-08 17:23:42 -07:00
Ashwin Bharambe
0c9eb3341c Separate chat_completion stream and non-stream implementations
This is a pretty important requirement. The streaming response type is
an AsyncGenerator while the non-stream one is a single object. So far
this has worked _sometimes_ due to various pre-existing hacks (and in
some cases, just failed.)
2024-10-08 17:23:40 -07:00
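The distinction drawn in the commit above can be sketched in a few lines. This is a minimal illustration, not the repository's code: the function names `chat_completion_non_stream`, `chat_completion_stream`, and `collect` are hypothetical, and the "echo" behaviour stands in for a real model call.

```python
import asyncio
from typing import AsyncGenerator, List, Tuple


async def chat_completion_non_stream(prompt: str) -> str:
    """Non-streaming path (hypothetical): a coroutine that resolves to one object."""
    return f"echo: {prompt}"


async def chat_completion_stream(prompt: str) -> AsyncGenerator[str, None]:
    """Streaming path (hypothetical): an async generator that yields chunks."""
    for word in f"echo: {prompt}".split():
        yield word


async def collect(prompt: str) -> Tuple[str, List[str]]:
    # A coroutine is awaited exactly once to obtain its single result.
    single = await chat_completion_non_stream(prompt)
    # An async generator is consumed with `async for`; awaiting it raises TypeError.
    chunks = [chunk async for chunk in chat_completion_stream(prompt)]
    return single, chunks
```

Because the two call sites differ (`await` versus `async for`), a single function that returns either shape forces every caller to inspect the result at runtime, which is presumably the kind of hack the commit message refers to.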
Ashwin Bharambe
4ab6e1b81a Add really basic testing for memory API
weaviate does not work; the cluster URL seems malformed
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
3ae2b712e8 Add inference test
Run it as:

```
PROVIDER_ID=test-remote \
 PROVIDER_CONFIG=$PWD/llama_stack/providers/tests/inference/provider_config_example.yaml \
 pytest -s llama_stack/providers/tests/inference/test_inference.py \
 --tb=auto \
 --disable-warnings
```
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
4fa467731e Fix a bug in meta-reference inference when stream=False
Also introduce a gross hack (to cover grosser(?) hack) to ensure
non-stream requests don't send back responses in SSE format. Not sure
which of these hacks is grosser.
2024-10-08 17:23:02 -07:00
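To illustrate the difference the commit above is concerned with, here is a minimal sketch of the two wire formats. It assumes nothing about the actual server code: `encode_sse_event` and `encode_response` are hypothetical helpers, and only the `data: ...` framing followed by a blank line is taken from the Server-Sent Events format itself.

```python
import json
from typing import Any, Dict


def encode_sse_event(payload: Dict[str, Any]) -> str:
    """Frame one payload as a Server-Sent Events message (hypothetical helper)."""
    return f"data: {json.dumps(payload)}\n\n"


def encode_response(payload: Dict[str, Any], stream: bool) -> str:
    """Use SSE framing only for streaming requests; otherwise return plain JSON."""
    return encode_sse_event(payload) if stream else json.dumps(payload)
```

A non-streaming client that receives the SSE-framed form would fail to parse it as JSON, which is why the two cases need different encodings.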
Ashwin Bharambe
353c7dc82a A few bug fixes for covering corner cases 2024-10-08 17:23:02 -07:00
Zain Hasan
118c0ef105 Partial cleanup of weaviate 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
3725e74906 memory bank registration fixes 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
91e0063593 Introduce model_store, shield_store, memory_bank_store 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
e45a417543 more fixes, plug shutdown handlers
Still, FastAPI's SIGINT handler is not calling ours
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
60dead6196 apis_to_serve -> apis 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
59302a86df inference registry updates 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
4215cc9331 Push registration methods onto the backing providers 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
5a7b01d292 Significantly upgrade the interactive configuration experience 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
f3923e3f0b Redo the { models, shields, memory_banks } typeset 2024-10-08 17:23:02 -07:00
Xi Yan
27587f32bc fix db path 2024-10-06 11:46:08 -07:00
Xi Yan
cfe3ad33b3 fix db path 2024-10-06 11:45:35 -07:00
Prithu Dasgupta
7abab7604b
add databricks provider (#83)
* add databricks provider

* update provider and test
2024-10-05 23:35:54 -07:00
Russell Bryant
f73e247ba1
Inline vLLM inference provider (#181)
This is just like `local` using `meta-reference` for everything except
it uses `vllm` for inference.

Docker works, but so far `conda` is a bit easier to use with the vllm
provider. The default container base image does not include all the
necessary libraries for all vllm features. More cuda dependencies are
necessary.

I started changing the base image used in this template, but it also
required changes to the Dockerfile, so it was getting too involved to
include in the first PR.

Working so far:

* `python -m llama_stack.apis.inference.client localhost 5000 --model Llama3.2-1B-Instruct --stream True`
* `python -m llama_stack.apis.inference.client localhost 5000 --model Llama3.2-1B-Instruct --stream False`

Example:

```
$ python -m llama_stack.apis.inference.client localhost 5000 --model Llama3.2-1B-Instruct --stream False
User>hello world, write me a 2 sentence poem about the moon
Assistant>
The moon glows bright in the midnight sky
A beacon of light,
```

I have only tested these models:

* `Llama3.1-8B-Instruct` - across 4 GPUs (tensor_parallel_size = 4)
* `Llama3.2-1B-Instruct` - on a single GPU (tensor_parallel_size = 1)
2024-10-05 23:34:16 -07:00
Ashwin Bharambe
7f49315822 Kill a derpy import 2024-10-03 11:25:58 -07:00
Xi Yan
62d266f018
[CLI] avoid configure twice (#171)
* avoid configure twice

* cleanup tmp config

* update output msg

* address comment

* update msg

* script update
2024-10-03 11:20:54 -07:00
Ashwin Bharambe
210b71b0ba
fix prompt guard (#177)
Several other fixes to configure. Add support for 1b/3b models in ollama.
2024-10-03 11:07:53 -07:00
Ashwin Bharambe
e9f6150588 A bit of cleanup to avoid breakages 2024-10-02 21:31:09 -07:00
Xi Yan
703ab9385f fix routing table key list 2024-10-02 18:23:31 -07:00
Ashwin Bharambe
8d049000e3 Add an introspection "Api.inspect" API 2024-10-02 15:41:14 -07:00
Ashwin Bharambe
fe4aabd690 provider_id => provider_type, adapter_id => adapter_type 2024-10-02 14:05:59 -07:00
Ashwin Bharambe
df68db644b Refactoring distribution/distribution.py
This file was becoming too large, and it was unclear what it housed. Split
it into pieces.
2024-10-02 14:03:02 -07:00
Russell Bryant
204eb6d810
docker: Check for selinux before using --security-opt (#167)
Before using `--security-opt label=disable`, check that SELinux is
enabled. Otherwise, the option is not relevant.

This fixes errors on Mac.

Closes #166

Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-10-02 10:37:41 -07:00
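The check described in the commit above can be sketched as follows. This is an assumption-laden illustration rather than the project's build script: the helper names `selinux_enabled` and `security_opts` are hypothetical, and the sketch relies on the standard `selinuxenabled` utility, which exits with status 0 when SELinux is enabled and is typically absent on macOS.

```python
import shutil
import subprocess
from typing import List


def selinux_enabled() -> bool:
    """Return True only if `selinuxenabled` exists on PATH and exits with 0."""
    exe = shutil.which("selinuxenabled")
    if exe is None:
        # No SELinux tooling installed (for example on a Mac).
        return False
    return subprocess.run([exe], check=False).returncode == 0


def security_opts(enabled: bool) -> List[str]:
    """Container run flags to add, given whether SELinux is enabled."""
    return ["--security-opt", "label=disable"] if enabled else []
```

A caller would append `security_opts(selinux_enabled())` to the container run arguments, so the flag is simply omitted on hosts where it is not relevant.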
Ashwin Bharambe
bf0d111c53 Fix build script 2024-10-02 10:04:23 -07:00
Ashwin Bharambe
eb2d8a31a5
Add a RoutableProvider protocol, support for multiple routing keys (#163)
* Update configure.py to use multiple routing keys for safety
* Refactor distribution/datatypes into a providers/datatypes
* Cleanup
2024-09-30 17:30:21 -07:00
Xi Yan
d28c3dfe0f
[CLI] simplify docker run (#159)
* bake run.yaml inside docker, simplify run

* add docker template examples

* delete generated Dockerfile

* unique deps

* clean up debug

* default entrypoint

* address comments, update output msg

* update msg

* build output msg

* configure msg

* unique special_deps

* remove quotes in configure
2024-09-30 15:04:04 -07:00
Russell Bryant
8db49de961
docker: Install in editable mode for dev purposes (#160)
While rebuilding a stack using the `docker` image type and having
`LLAMA_STACK_DIR` set so it installs `llama_stack` from my local
source, I noticed that once built, it just used the image build cache
and didn't pull in changes to my source.

1. Install in editable mode (`pip install -e`) for dev purposes.

2. Mount the source into the container for `configure` and `run` so
   that the editable install works.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-09-30 11:56:31 -07:00
Russell Bryant
cb36be320f
Fix podman+selinux compatibility (#132)
When I ran `llama stack configure` for my `docker` based stack on my
system using podman + SELinux (CentOS Stream 9), the `podman run`
command failed due to SELinux blocking access to the volume mount.

As a simple fix, disable SELinux label checking.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-09-29 20:19:44 -07:00
Ashwin Bharambe
5bf679cab6
Pull (extract) provider data from the provider instead of pushing from the top (#148) 2024-09-29 20:00:51 -07:00
Xi Yan
6a8c2ae1df
[CLI] remove dependency on CONDA_PREFIX in CLI (#144)
* remove dependency on CONDA_PREFIX in CLI

* lint

* typo

* more robust
2024-09-28 16:46:47 -07:00
Xi Yan
4ae8c63a2b pre-commit lint 2024-09-28 16:04:41 -07:00
Xi Yan
6236634d84
[bugfix] fix duplicate api endpoints (#139)
* fix server api to serve

* remove print
2024-09-27 15:32:50 -07:00
Xi Yan
208b861289
add env for LLAMA_STACK_CONFIG_DIR (#137) 2024-09-27 14:16:46 -07:00
Xi Yan
ca7602a642 fix #100 2024-09-25 15:11:56 -07:00
Lucain
615ed4bfbc
Make TGI adapter compatible with HF Inference API (#97) 2024-09-25 14:08:31 -07:00
Ashwin Bharambe
56aed59eb4
Support for Llama3.2 models and Swift SDK (#98) 2024-09-25 10:29:58 -07:00
poegej
95abbf576b
Bump version to 0.0.24 (#94)
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2024-09-25 09:31:12 -07:00
Yogish Baliga
b85d675c6f Adding safety adapter for Together 2024-09-24 18:35:48 -07:00