325 commits

Author SHA1 Message Date
Ashwin Bharambe
238e658cdf Kill irrelevant (now) method 2024-10-09 22:11:18 -07:00
Ashwin Bharambe
77a486f176 added tool calling test 2024-10-09 22:01:28 -07:00
Ashwin Bharambe
ef4b74c935 Add a simple agents test case 2024-10-09 21:52:49 -07:00
Ashwin Bharambe
2d94ca71a9 Pass memory bank API to agent impl 2024-10-09 21:16:57 -07:00
Ashwin Bharambe
6788173ffc re-gen openapi spec 2024-10-09 21:13:11 -07:00
Ashwin Bharambe
fcd22b6baa Make Safety test work, other cleanup 2024-10-09 21:09:50 -07:00
Ashwin Bharambe
ba1f294cc6 Safety test placeholder 2024-10-09 19:35:48 -07:00
Ashwin Bharambe
b55034c0de Another round of simplification and clarity for models/shields/memory_banks stuff 2024-10-09 19:19:26 -07:00
Ashwin Bharambe
73a0a34e39 Kill non-llama guard shields 2024-10-08 17:47:03 -07:00
Ashwin Bharambe
24c61403b7 Fixes 2024-10-08 17:43:25 -07:00
Ashwin Bharambe
a86f3ae07d Update run.yaml 2024-10-08 17:41:06 -07:00
Ashwin Bharambe
924b1fba09 minor 2024-10-08 17:26:26 -07:00
Ashwin Bharambe
f40cd62306 Test fixes 2024-10-08 17:23:42 -07:00
Ashwin Bharambe
8eee5b9adc Fix server conditional awaiting on coroutines 2024-10-08 17:23:42 -07:00
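The commit above does not show its diff here. As a rough illustration of what "conditional awaiting" can look like in a server dispatch path, the sketch below awaits a handler's result only when it is awaitable and passes async generators through untouched. The names `call_handler`, `sync_handler`, `async_handler`, and `stream_handler` are hypothetical and are not taken from the repository.

```python
import asyncio
import inspect


def sync_handler():
    # A plain function: returns a value directly.
    return "sync"


async def async_handler():
    # A coroutine function: calling it returns a coroutine that must be awaited.
    return "async"


async def stream_handler():
    # An async generator function: calling it returns an async generator,
    # which is iterated with `async for`, not awaited.
    yield "a"
    yield "b"


async def call_handler(handler, *args):
    """Call a handler and await the result only if it is awaitable (hypothetical helper)."""
    result = handler(*args)
    if inspect.isawaitable(result):
        result = await result
    return result


async def collect_stream():
    gen = await call_handler(stream_handler)
    return [item async for item in gen]


if __name__ == "__main__":
    print(asyncio.run(call_handler(sync_handler)))
    print(asyncio.run(call_handler(async_handler)))
    print(asyncio.run(collect_stream()))
```

`inspect.isawaitable` returns false for an async generator object, so streaming handlers are handed back as-is and the caller decides how to iterate them.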
Ashwin Bharambe
216e7eb4d5 Move async with SEMAPHORE inside the async methods 2024-10-08 17:23:42 -07:00
Ashwin Bharambe
4540d8bd87 move codeshield into an independent safety provider 2024-10-08 17:23:42 -07:00
Ashwin Bharambe
380b9dab90 regen openapi specs 2024-10-08 17:23:42 -07:00
Ashwin Bharambe
7f1160296c Updates to server.py to clean up streaming vs non-streaming stuff
Also make sure agent turn create is correctly marked
2024-10-08 17:23:42 -07:00
Ashwin Bharambe
640c5c54f7 rename augment_messages 2024-10-08 17:23:42 -07:00
Ashwin Bharambe
336cf7a674 update vllm; not quite tested yet 2024-10-08 17:23:42 -07:00
Ashwin Bharambe
ed899a5dec Convert TGI to work with openai_compat 2024-10-08 17:23:42 -07:00
Ashwin Bharambe
05e73d12b3 introduce openai_compat with the completions (not chat-completions) API
This keeps the prompt encoding layer in our control (see
`chat_completion_request_to_prompt()` method)
2024-10-08 17:23:42 -07:00
Ashwin Bharambe
0c9eb3341c Separate chat_completion stream and non-stream implementations
This is a pretty important requirement. The streaming response type is
an AsyncGenerator while the non-stream one is a single object. So far
this has worked _sometimes_ due to various pre-existing hacks (and in
some cases, just failed.)
2024-10-08 17:23:40 -07:00
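The distinction described in the commit message above (an AsyncGenerator for streaming, a single object otherwise) can be sketched as follows. This is a minimal illustration under assumed names, not the repository's implementation: `ChatCompletionResponse`, `ChatCompletionChunk`, `_generate_tokens`, and the two `chat_completion_*` functions are all hypothetical stand-ins.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator


@dataclass
class ChatCompletionResponse:  # hypothetical non-stream response type
    content: str


@dataclass
class ChatCompletionChunk:  # hypothetical stream chunk type
    delta: str


async def _generate_tokens(prompt: str) -> AsyncGenerator[str, None]:
    # Stand-in for a model: echoes the prompt back one word at a time.
    for token in prompt.split():
        await asyncio.sleep(0)
        yield token


async def chat_completion_nonstream(prompt: str) -> ChatCompletionResponse:
    # Coroutine: awaiting it yields exactly one response object.
    tokens = [t async for t in _generate_tokens(prompt)]
    return ChatCompletionResponse(content=" ".join(tokens))


async def chat_completion_stream(prompt: str) -> AsyncGenerator[ChatCompletionChunk, None]:
    # Async generator: consumed with `async for`, never awaited directly.
    async for token in _generate_tokens(prompt):
        yield ChatCompletionChunk(delta=token)


def chat_completion(prompt: str, stream: bool = False):
    # Plain (non-async) dispatcher, so each branch keeps its own return type.
    if stream:
        return chat_completion_stream(prompt)
    return chat_completion_nonstream(prompt)


async def main():
    response = await chat_completion("hello from the model", stream=False)
    chunks = [c.delta async for c in chat_completion("hello from the model", stream=True)]
    return response, chunks


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Keeping the dispatcher a plain function avoids mixing `return value` and `yield` in one `async def`, which Python rejects.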
Ashwin Bharambe
f8752ab8dc weaviate fixes, test now passes 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
f21ad1173e improve memory test, but it fails on chromadb :/ 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
4ab6e1b81a Add really basic testing for memory API
weaviate does not work; the cluster URL seems malformed
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
dba7caf1d0 Fix fireworks and update the test
Don't look for eom_id / eot_id sadly since providers don't return the
last token
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
bbd3a02615 Make Together inference work using the raw completions API 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
3ae2b712e8 Add inference test
Run it as:

```
PROVIDER_ID=test-remote \
 PROVIDER_CONFIG=$PWD/llama_stack/providers/tests/inference/provider_config_example.yaml \
 pytest -s llama_stack/providers/tests/inference/test_inference.py \
 --tb=auto \
 --disable-warnings
```
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
4fa467731e Fix a bug in meta-reference inference when stream=False
Also introduce a gross hack (to cover grosser(?) hack) to ensure
non-stream requests don't send back responses in SSE format. Not sure
which of these hacks is grosser.
2024-10-08 17:23:02 -07:00
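For context on the SSE remark in the commit above: a server-sent event frames each payload as a `data:` line followed by a blank line, whereas a non-stream response is just the JSON body. The sketch below only illustrates that difference; `encode_sse` and `encode_response` are hypothetical helpers, not functions from the repository, and the hack the commit mentions is not reproduced here.

```python
import json


def encode_sse(payload: dict) -> str:
    # One SSE event: a "data:" field terminated by a blank line.
    return f"data: {json.dumps(payload)}\n\n"


def encode_response(payload: dict, stream: bool):
    """Return (content_type, body) for a single payload (hypothetical helper)."""
    if stream:
        return "text/event-stream", encode_sse(payload)
    # Non-stream requests get plain JSON with no SSE framing.
    return "application/json", json.dumps(payload)


if __name__ == "__main__":
    print(encode_response({"text": "hi"}, stream=True))
    print(encode_response({"text": "hi"}, stream=False))
```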
Ashwin Bharambe
353c7dc82a A few bug fixes for covering corner cases 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
a05599c67a Weaviate "should" work (i.e., is code-complete) but not tested 2024-10-08 17:23:02 -07:00
Zain Hasan
118c0ef105 Partial cleanup of weaviate 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
862f8ddb8d more memory related fixes; memory.client now works 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
3725e74906 memory bank registration fixes 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
099a95b614 slight upgrade to CLI 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
1550187cd8 cleanup 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
91e0063593 Introduce model_store, shield_store, memory_bank_store 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
e45a417543 more fixes, plug shutdown handlers
Still, FastAPI's SIGINT handler is not calling ours
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
60dead6196 apis_to_serve -> apis 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
59302a86df inference registry updates 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
4215cc9331 Push registration methods onto the backing providers 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
5a7b01d292 Significantly upgrade the interactive configuration experience 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
8d157a8197 rename 2024-10-08 17:23:02 -07:00
Ashwin Bharambe
f3923e3f0b Redo the { models, shields, memory_banks } typeset 2024-10-08 17:23:02 -07:00
Xi Yan
6b094b72d3 Update cli_reference.md 2024-10-08 15:32:06 -07:00
Xi Yan
ce70d21f65 Add files via upload 2024-10-08 15:29:19 -07:00
Dalton Flanagan
2d4f7d8acf Create SECURITY.md 2024-10-08 13:30:40 -04:00
Yuan Tang
48d0d2001e Add classifiers in setup.py (#217)
* Add classifiers in setup.py

* Update setup.py

* Update setup.py
2024-10-08 06:55:16 -07:00
Xi Yan
4d5f7459aa [bugfix] Fix logprobs on meta-reference impl (#213)
* fix log probs

* add back LogProbsConfig

* error handling

* bugfix
2024-10-07 19:42:39 -07:00