Commit graph

77 commits

Author SHA1 Message Date
Ashwin Bharambe
bbd3a02615 Make Together inference work using the raw completions API 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
3ae2b712e8 Add inference test
Run it as:

```
PROVIDER_ID=test-remote \
 PROVIDER_CONFIG=$PWD/llama_stack/providers/tests/inference/provider_config_example.yaml \
 pytest -s llama_stack/providers/tests/inference/test_inference.py \
 --tb=auto \
 --disable-warnings
```
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
4fa467731e Fix a bug in meta-reference inference when stream=False
Also introduce a gross hack (to cover a grosser(?) hack) to ensure
non-stream requests don't send back responses in SSE format. Not sure
which of these hacks is grosser.
2024-10-08 17:23:02 -07:00
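The stream vs. non-stream framing that this commit fixes can be sketched as follows. This is only an illustrative sketch of the idea, not the actual meta-reference code; the function name is assumed:

```python
import json

def encode_response(payload: dict, stream: bool) -> str:
    """Illustrative only: frame a payload as an SSE event for streaming
    requests, and as a plain JSON body otherwise."""
    if stream:
        # Server-Sent Events frame: "data: <json>" followed by a blank line
        return f"data: {json.dumps(payload)}\n\n"
    # Non-stream requests should get a bare JSON body, not SSE framing
    return json.dumps(payload)
```

The bug class here is a server that always emits the `data: ...` SSE framing, which breaks clients that asked for a plain JSON response.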
Ashwin Bharambe
a05599c67a Weaviate "should" work (i.e., is code-complete) but not tested 2024-10-08 17:23:02 -07:00
Zain Hasan
118c0ef105 Partial cleanup of weaviate 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
862f8ddb8d more memory related fixes; memory.client now works 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
099a95b614 slight upgrade to CLI 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
1550187cd8 cleanup 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
91e0063593 Introduce model_store, shield_store, memory_bank_store 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
e45a417543 more fixes, plug shutdown handlers
Still, FastAPI's SIGINT handler is not calling ours.
2024-10-08 17:23:02 -07:00
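Plugging a shutdown handler into SIGINT can be sketched like this. A hypothetical minimal sketch, not the server's actual wiring; as the commit notes, frameworks such as FastAPI/uvicorn install their own SIGINT handler, which can replace a handler registered this way:

```python
import signal

shutdown_called = []

def handle_sigint(signum, frame):
    # Run our cleanup; a real handler would shut down providers,
    # close connections, flush logs, etc.
    shutdown_called.append(signum)

# Register our handler for SIGINT. If the web framework later installs
# its own handler, ours silently stops being invoked.
signal.signal(signal.SIGINT, handle_sigint)
```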
Ashwin Bharambe
60dead6196 apis_to_serve -> apis 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
59302a86df inference registry updates 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
4215cc9331 Push registration methods onto the backing providers 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
5a7b01d292 Significantly upgrade the interactive configuration experience 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
f3923e3f0b Redo the { models, shields, memory_banks } typeset 2024-10-08 17:23:02 -07:00
Xi Yan
4d5f7459aa
[bugfix] Fix logprobs on meta-reference impl (#213)
* fix log probs

* add back LogProbsConfig

* error handling

* bugfix
2024-10-07 19:42:39 -07:00
Mindaugas
53d440e952
Fix ValueError in case chunks are empty (#206) 2024-10-07 08:55:06 -07:00
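The class of bug fixed in #206 can be illustrated with an early-return guard like this (hypothetical names; the actual fix lives in the memory code):

```python
def average_chunk_length(chunks: list[str]) -> float:
    """Illustrative: guard against empty input before a computation
    that would otherwise raise on zero chunks."""
    if not chunks:
        # Without this guard, the division below raises on empty input
        # (the real bug surfaced as a ValueError when chunks were empty).
        return 0.0
    return sum(len(c) for c in chunks) / len(chunks)
```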
Ashwin Bharambe
4263764493 Fix adapter_id -> adapter_type for Weaviate 2024-10-07 06:46:32 -07:00
Zain Hasan
f4f7618120
add Weaviate memory adapter (#95) 2024-10-06 22:21:50 -07:00
Prithu Dasgupta
7abab7604b
add databricks provider (#83)
* add databricks provider

* update provider and test
2024-10-05 23:35:54 -07:00
Russell Bryant
f73e247ba1
Inline vLLM inference provider (#181)
This is just like `local` using `meta-reference` for everything except
it uses `vllm` for inference.

Docker works, but so far `conda` is a bit easier to use with the vllm
provider. The default container base image does not include all the
necessary libraries for all vllm features; more CUDA dependencies are
needed.

I started changing this base image used in this template, but it also
required changes to the Dockerfile, so it was getting too involved to
include in the first PR.

Working so far:

* `python -m llama_stack.apis.inference.client localhost 5000 --model Llama3.2-1B-Instruct --stream True`
* `python -m llama_stack.apis.inference.client localhost 5000 --model Llama3.2-1B-Instruct --stream False`

Example:

```
$ python -m llama_stack.apis.inference.client localhost 5000 --model Llama3.2-1B-Instruct --stream False
User>hello world, write me a 2 sentence poem about the moon
Assistant>
The moon glows bright in the midnight sky
A beacon of light,
```

I have only tested these models:

* `Llama3.1-8B-Instruct` - across 4 GPUs (tensor_parallel_size = 4)
* `Llama3.2-1B-Instruct` - on a single GPU (tensor_parallel_size = 1)
2024-10-05 23:34:16 -07:00
Mindaugas
9d16129603
Add 'url' property to Redis KV config (#192) 2024-10-05 11:26:26 -07:00
Ashwin Bharambe
f913b57397 fix fp8 imports 2024-10-03 14:40:21 -07:00
Ashwin Bharambe
210b71b0ba
fix prompt guard (#177)
Several other fixes to `configure`. Add support for 1B/3B models in ollama.
2024-10-03 11:07:53 -07:00
Ashwin Bharambe
19ce6bf009 Don't validate prompt-guard anymore 2024-10-02 20:43:57 -07:00
Ashwin Bharambe
8d049000e3 Add an introspection "Api.inspect" API 2024-10-02 15:41:14 -07:00
Adrian Cole
01d93be948
Adds markdown-link-check and fixes a broken link (#165)
Signed-off-by: Adrian Cole <adrian.cole@elastic.co>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2024-10-02 14:26:20 -07:00
Ashwin Bharambe
fe4aabd690 provider_id => provider_type, adapter_id => adapter_type 2024-10-02 14:05:59 -07:00
Ashwin Bharambe
df68db644b Refactoring distribution/distribution.py
This file was becoming too large, and it was unclear what it housed.
Split it into pieces.
2024-10-02 14:03:02 -07:00
Ashwin Bharambe
227b69e6e6 Fix sample memory impl 2024-10-02 10:13:09 -07:00
Ashwin Bharambe
335dea849a fix sample impls 2024-10-02 10:10:31 -07:00
Ashwin Bharambe
4a75d922a9 Make Llama Guard 1B the default 2024-10-02 09:48:26 -07:00
Ashwin Bharambe
eb2d8a31a5
Add a RoutableProvider protocol, support for multiple routing keys (#163)
* Update configure.py to use multiple routing keys for safety
* Refactor distribution/datatypes into a providers/datatypes
* Cleanup
2024-09-30 17:30:21 -07:00
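A protocol like the one introduced in #163 might look roughly like this. The method name and shape are assumptions for illustration, not the actual definition:

```python
import asyncio
from typing import List, Protocol

class RoutableProvider(Protocol):
    """A provider that can be registered under one or more routing keys
    (e.g. several model identifiers served by a single backend)."""

    async def validate_routing_keys(self, routing_keys: List[str]) -> None:
        ...

class ExampleProvider:
    """Concrete provider satisfying the protocol structurally."""

    def __init__(self) -> None:
        self.keys: List[str] = []

    async def validate_routing_keys(self, routing_keys: List[str]) -> None:
        if not routing_keys:
            raise ValueError("at least one routing key is required")
        self.keys = routing_keys
```

Because `Protocol` uses structural subtyping, `ExampleProvider` never needs to inherit from `RoutableProvider`; supporting a list of keys (rather than one) is what allows a single provider to serve multiple models.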
moritalous
2bd785354d
fix broken bedrock inference provider (#151) 2024-09-29 20:17:58 -07:00
Byung Chun Kim
2f096ca509
accepts not model itself. (#153) 2024-09-29 20:16:50 -07:00
Ashwin Bharambe
5bf679cab6
Pull (extract) provider data from the provider instead of pushing from the top (#148) 2024-09-29 20:00:51 -07:00
Xi Yan
4ae8c63a2b pre-commit lint 2024-09-28 16:04:41 -07:00
Ashwin Bharambe
ced5fb6388 Small cleanup for together safety implementation 2024-09-28 15:47:35 -07:00
Yogish Baliga
940968ee3f
fixing safety inference and safety adapter for new API spec. Pinned t… (#105)
* fixing safety inference and safety adapter for new API spec. Pinned the llama_models version to 0.0.24, as the latest version 0.0.35 has changed the model descriptor name. I was also getting a missing-package error at runtime, hence added the dependency to requirements.txt

* support Llama 3.2 models in Together inference adapter and cleanup Together safety adapter

* fixing model names

* adding vision guard to Together safety
2024-09-28 15:45:38 -07:00
Ashwin Bharambe
0a3999a9a4
Use inference APIs for executing Llama Guard (#121)
We should use the Inference APIs to execute Llama Guard instead of directly depending on HuggingFace modeling code. The actual inference concerns are handled by the Inference API.
2024-09-28 15:40:06 -07:00
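The shape of that change can be sketched as dependency injection of the inference call. The function and prompt below are stand-ins, not the real llama-stack APIs or the real Llama Guard prompt:

```python
from typing import Callable

def run_llama_guard(user_message: str,
                    chat_completion: Callable[[str], str]) -> bool:
    """Illustrative: classify a message as safe/unsafe by delegating
    generation to an injected inference callable, instead of loading
    HuggingFace model code directly inside the safety provider."""
    prompt = f"Task: check whether the following message is unsafe.\n\n{user_message}"
    verdict = chat_completion(prompt)
    # Llama Guard-style models answer "safe" or "unsafe" plus categories.
    return verdict.strip().lower().startswith("safe")
```

The safety provider then only needs an inference client, so any backend (meta-reference, Together, etc.) can execute the guard model.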
Russell Bryant
5828ffd53b
inference: Fix download command in error msg (#133)
I got this error message, tried to run the command presented,
and it didn't work. The model needs to be given with `--model-id`
instead of as a positional argument.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-09-27 13:31:11 -07:00
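The corrected invocation shape (option, not positional) can be illustrated with argparse. This mirrors only the flag's form; the names are assumptions, not the CLI's real implementation:

```python
import argparse

# Illustrative parser: the model must be passed via --model-id,
# not as a positional argument.
parser = argparse.ArgumentParser(prog="llama download")
parser.add_argument("--model-id", required=True)

# argparse maps "--model-id" to the attribute "model_id"
args = parser.parse_args(["--model-id", "Llama3.2-1B-Instruct"])
```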
Kate Plawiak
3ae1597b9b
load models using hf model id (#108) 2024-09-25 18:40:09 -07:00
Lucain
615ed4bfbc
Make TGI adapter compatible with HF Inference API (#97) 2024-09-25 14:08:31 -07:00
Xi Yan
82f420c4f0
fix safety using inference (#99) 2024-09-25 11:30:27 -07:00
Dalton Flanagan
5c4f73d52f
Drop header from LocalInference.h 2024-09-25 11:27:37 -07:00
Ashwin Bharambe
d442af0818 Add safety impl for llama guard vision 2024-09-25 11:07:19 -07:00
Dalton Flanagan
b3b0349931 Update LocalInference to use public repos 2024-09-25 11:05:51 -07:00
Ashwin Bharambe
4fcda00872 Re-apply revert 2024-09-25 11:00:43 -07:00
Ashwin Bharambe
56aed59eb4
Support for Llama3.2 models and Swift SDK (#98) 2024-09-25 10:29:58 -07:00
poegej
95abbf576b
Bump version to 0.0.24 (#94)
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2024-09-25 09:31:12 -07:00