Commit graph

304 commits

Author SHA1 Message Date
Ashwin Bharambe
05e73d12b3 introduce openai_compat with the completions (not chat-completions) API
This keeps the prompt encoding layer in our control (see
`chat_completion_request_to_prompt()` method)
2024-10-08 17:23:42 -07:00
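A minimal sketch of what a prompt-encoding layer like `chat_completion_request_to_prompt()` might do: render chat messages into the Llama 3 turn format before calling a raw completions API. The tag strings follow the published Llama 3 prompt format; the function body and message shape here are illustrative assumptions, not the actual implementation.

```python
def chat_completion_request_to_prompt(messages: list[dict]) -> str:
    # Keep prompt encoding in our control instead of delegating it to
    # the provider's chat-completions endpoint.
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # Cue the model to generate the assistant turn.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)
```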
Ashwin Bharambe
0c9eb3341c Separate chat_completion stream and non-stream implementations
This is a pretty important requirement. The streaming response type is
an AsyncGenerator while the non-stream one is a single object. So far
this has worked _sometimes_ due to various pre-existing hacks (and in
some cases, just failed).
2024-10-08 17:23:40 -07:00
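A small sketch (names illustrative) of why the two paths need separate implementations: the non-stream call is an awaitable that resolves to a single object, while the stream call returns an `AsyncGenerator` that must be iterated, and the two cannot share one return type cleanly.

```python
import asyncio
from typing import AsyncGenerator


async def chat_completion(request) -> dict:
    # non-stream: resolves to one complete response object
    return {"content": "hello"}


async def chat_completion_stream(request) -> AsyncGenerator[dict, None]:
    # stream: an async generator yielding incremental chunks
    for token in ["hel", "lo"]:
        yield {"delta": token}


async def main():
    resp = await chat_completion(None)
    chunks = [c async for c in chat_completion_stream(None)]
    return resp, chunks


resp, chunks = asyncio.run(main())
```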
Ashwin Bharambe
f8752ab8dc weaviate fixes, test now passes 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
f21ad1173e improve memory test, but it fails on chromadb :/ 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
4ab6e1b81a Add really basic testing for memory API
weaviate does not work; the cluster URL seems malformed
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
dba7caf1d0 Fix fireworks and update the test
Sadly, don't look for eom_id / eot_id, since providers don't return the
last token
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
bbd3a02615 Make Together inference work using the raw completions API 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
3ae2b712e8 Add inference test
Run it as:

```
PROVIDER_ID=test-remote \
 PROVIDER_CONFIG=$PWD/llama_stack/providers/tests/inference/provider_config_example.yaml \
 pytest -s llama_stack/providers/tests/inference/test_inference.py \
 --tb=auto \
 --disable-warnings
```
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
4fa467731e Fix a bug in meta-reference inference when stream=False
Also introduce a gross hack (to cover a grosser(?) hack) to ensure
non-stream requests don't send back responses in SSE format. Not sure
which of these hacks is grosser.
2024-10-08 17:23:02 -07:00
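An illustrative sketch of the response-shape problem this commit works around: a streaming endpoint emits Server-Sent Events (`data: ...` framed lines), while a non-stream request must get back the plain object, never SSE-framed text. The names here are hypothetical, not the actual server code.

```python
import json


def encode_sse(event: dict) -> str:
    # SSE framing: a "data:" field terminated by a blank line
    return f"data: {json.dumps(event)}\n\n"


def respond(result: dict, stream: bool):
    if stream:
        # stream=True: frame each chunk as an SSE event
        return (encode_sse(chunk) for chunk in result["chunks"])
    # stream=False: return the object itself, not SSE-framed text
    return result["final"]
```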
Ashwin Bharambe
353c7dc82a A few bug fixes for covering corner cases 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
a05599c67a Weaviate "should" work (i.e., is code-complete) but not tested 2024-10-08 17:23:02 -07:00
Zain Hasan
118c0ef105 Partial cleanup of weaviate 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
862f8ddb8d more memory related fixes; memory.client now works 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
3725e74906 memory bank registration fixes 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
099a95b614 slight upgrade to CLI 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
1550187cd8 cleanup 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
91e0063593 Introduce model_store, shield_store, memory_bank_store 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
e45a417543 more fixes, plug shutdown handlers
still, FastAPI's SIGINT handler is not calling ours
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
60dead6196 apis_to_serve -> apis 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
59302a86df inference registry updates 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
4215cc9331 Push registration methods onto the backing providers 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
5a7b01d292 Significantly upgrade the interactive configuration experience 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
8d157a8197 rename 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
f3923e3f0b Redo the { models, shields, memory_banks } typeset 2024-10-08 17:23:02 -07:00
Xi Yan
6b094b72d3 Update cli_reference.md 2024-10-08 15:32:06 -07:00
Xi Yan
ce70d21f65 Add files via upload 2024-10-08 15:29:19 -07:00
Dalton Flanagan
2d4f7d8acf Create SECURITY.md 2024-10-08 13:30:40 -04:00
Yuan Tang
48d0d2001e Add classifiers in setup.py (#217)
* Add classifiers in setup.py

* Update setup.py

* Update setup.py
2024-10-08 06:55:16 -07:00
Xi Yan
4d5f7459aa [bugfix] Fix logprobs on meta-reference impl (#213)
* fix log probs

* add back LogProbsConfig

* error handling

* bugfix
2024-10-07 19:42:39 -07:00
Yuan Tang
e4ae09d090 Add .idea to .gitignore (#216)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2024-10-07 19:38:43 -07:00
Xi Yan
16ba0fa06f Update README.md 2024-10-07 11:24:27 -07:00
Russell Bryant
996efa9b42 README.md: Add vLLM to providers table (#207)
Signed-off-by: Russell Bryant <russell.bryant@gmail.com>
2024-10-07 10:26:52 -07:00
Xi Yan
2366e18873 refactor docs (#209) 2024-10-07 10:21:26 -07:00
Mindaugas
53d440e952 Fix ValueError in case chunks are empty (#206) 2024-10-07 08:55:06 -07:00
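A hedged sketch of the class of bug fixed in #206: an aggregation over retrieved chunks (here, stacking their embeddings with NumPy) raises `ValueError` when the chunk list is empty, so the empty case needs an early return. The function and shapes are illustrative, not the actual code.

```python
import numpy as np


def embeddings_matrix(chunk_embeddings: list[np.ndarray]) -> np.ndarray:
    if not chunk_embeddings:
        # np.stack([]) raises "ValueError: need at least one array to stack",
        # so handle the no-chunks case explicitly.
        return np.empty((0, 0))
    return np.stack(chunk_embeddings)
```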
Russell Bryant
a4e775c465 download: improve help text (#204) 2024-10-07 08:40:04 -07:00
Ashwin Bharambe
4263764493 Fix adapter_id -> adapter_type for Weaviate 2024-10-07 06:46:32 -07:00
Zain Hasan
f4f7618120 add Weaviate memory adapter (#95) 2024-10-06 22:21:50 -07:00
Xi Yan
27587f32bc fix db path 2024-10-06 11:46:08 -07:00
Xi Yan
cfe3ad33b3 fix db path 2024-10-06 11:45:35 -07:00
Prithu Dasgupta
7abab7604b add databricks provider (#83)
* add databricks provider

* update provider and test
2024-10-05 23:35:54 -07:00
Russell Bryant
f73e247ba1 Inline vLLM inference provider (#181)
This is just like `local` using `meta-reference` for everything except
it uses `vllm` for inference.

Docker works, but so far `conda` is a bit easier to use with the vllm
provider. The default container base image does not include all the
necessary libraries for all vllm features. More cuda dependencies are
necessary.

I started changing the base image used in this template, but it also
required changes to the Dockerfile, so it was getting too involved to
include in the first PR.

Working so far:

* `python -m llama_stack.apis.inference.client localhost 5000 --model Llama3.2-1B-Instruct --stream True`
* `python -m llama_stack.apis.inference.client localhost 5000 --model Llama3.2-1B-Instruct --stream False`

Example:

```
$ python -m llama_stack.apis.inference.client localhost 5000 --model Llama3.2-1B-Instruct --stream False
User>hello world, write me a 2 sentence poem about the moon
Assistant>
The moon glows bright in the midnight sky
A beacon of light,
```

I have only tested these models:

* `Llama3.1-8B-Instruct` - across 4 GPUs (tensor_parallel_size = 4)
* `Llama3.2-1B-Instruct` - on a single GPU (tensor_parallel_size = 1)
2024-10-05 23:34:16 -07:00
Xi Yan
29138a5167 Update getting_started.md 2024-10-05 12:28:02 -07:00
Xi Yan
6d4013ac99 Update getting_started.md 2024-10-05 12:14:59 -07:00
Mindaugas
9d16129603 Add 'url' property to Redis KV config (#192) 2024-10-05 11:26:26 -07:00
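An illustrative sketch (field names assumed, not taken from the actual config class) of why a Redis KV config benefits from a derived `url` property: callers can consume a single connection string instead of assembling host and port themselves.

```python
from dataclasses import dataclass


@dataclass
class RedisKVStoreConfig:
    host: str = "localhost"
    port: int = 6379

    @property
    def url(self) -> str:
        # Single connection string derived from the individual fields
        return f"redis://{self.host}:{self.port}"
```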
Ashwin Bharambe
bfb0e92034 Bump version to 0.0.40 2024-10-04 09:33:43 -07:00
Ashwin Bharambe
dc75aab547 Add setuptools dependency 2024-10-04 09:30:54 -07:00
Dalton Flanagan
441052b0fd avoid jq since non-standard on macOS 2024-10-04 10:11:43 -04:00
Dalton Flanagan
9bf2e354ae CLI now requires jq 2024-10-04 10:05:59 -04:00
raghotham
00ed9a410b Update getting_started.md
update discord invite link
2024-10-03 23:28:43 -07:00
AshleyT3
734f59d3b8 Check that the model is found before use. (#182) 2024-10-03 23:24:47 -07:00
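A minimal sketch of the guard described in #182: resolve the model from a registry and fail with a clear error before attempting to use it. The registry shape and error message are illustrative assumptions.

```python
def resolve_model_or_raise(registry: dict, model_id: str):
    model = registry.get(model_id)
    if model is None:
        # Fail fast with an actionable message instead of a later crash
        raise ValueError(
            f"Model {model_id} not found; known models: {sorted(registry)}"
        )
    return model
```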