Ashwin Bharambe
77a486f176
added tool calling test
2024-10-09 22:01:28 -07:00
Ashwin Bharambe
ef4b74c935
Add a simple agents test case
2024-10-09 21:52:49 -07:00
Ashwin Bharambe
2d94ca71a9
Pass memory bank API to agent impl
2024-10-09 21:16:57 -07:00
Ashwin Bharambe
6788173ffc
re-gen openapi spec
2024-10-09 21:13:11 -07:00
Ashwin Bharambe
fcd22b6baa
Make Safety test work, other cleanup
2024-10-09 21:09:50 -07:00
Ashwin Bharambe
ba1f294cc6
Safety test placeholder
2024-10-09 19:35:48 -07:00
Ashwin Bharambe
b55034c0de
Another round of simplification and clarity for models/shields/memory_banks stuff
2024-10-09 19:19:26 -07:00
Ashwin Bharambe
73a0a34e39
Kill non-llama guard shields
2024-10-08 17:47:03 -07:00
Ashwin Bharambe
24c61403b7
Fixes
2024-10-08 17:43:25 -07:00
Ashwin Bharambe
a86f3ae07d
Update run.yaml
2024-10-08 17:41:06 -07:00
Ashwin Bharambe
924b1fba09
minor
2024-10-08 17:26:26 -07:00
Ashwin Bharambe
f40cd62306
Test fixes
2024-10-08 17:23:42 -07:00
Ashwin Bharambe
8eee5b9adc
Fix server conditional awaiting on coroutines
2024-10-08 17:23:42 -07:00
Ashwin Bharambe
216e7eb4d5
Move `async with SEMAPHORE` inside the async methods
2024-10-08 17:23:42 -07:00
Ashwin Bharambe
4540d8bd87
move codeshield into an independent safety provider
2024-10-08 17:23:42 -07:00
Ashwin Bharambe
380b9dab90
regen openapi specs
2024-10-08 17:23:42 -07:00
Ashwin Bharambe
7f1160296c
Updates to server.py to clean up streaming vs non-streaming stuff
Also make sure agent turn create is correctly marked
2024-10-08 17:23:42 -07:00
Ashwin Bharambe
640c5c54f7
rename augment_messages
2024-10-08 17:23:42 -07:00
Ashwin Bharambe
336cf7a674
update vllm; not quite tested yet
2024-10-08 17:23:42 -07:00
Ashwin Bharambe
ed899a5dec
Convert TGI to work with openai_compat
2024-10-08 17:23:42 -07:00
Ashwin Bharambe
05e73d12b3
introduce openai_compat with the completions (not chat-completions) API
This keeps the prompt encoding layer in our control (see
`chat_completion_request_to_prompt()` method)
2024-10-08 17:23:42 -07:00
Ashwin Bharambe
0c9eb3341c
Separate chat_completion stream and non-stream implementations
This is a pretty important requirement. The streaming response type is
an AsyncGenerator while the non-stream one is a single object. So far
this has worked _sometimes_ due to various pre-existing hacks (and in
some cases, just failed.)
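
The distinction described above can be illustrated with a minimal sketch. It is not the code from this commit: every name here (`chat_completion`, `_stream_chat_completion`, `_nonstream_chat_completion`, `collect`) is hypothetical, and the "model" is a stand-in that just splits or upper-cases the prompt. It only shows why the two paths want separate implementations: one is an async generator that must be iterated, the other is a coroutine that must be awaited.

```python
import asyncio
from typing import AsyncGenerator, List


async def _stream_chat_completion(prompt: str) -> AsyncGenerator[str, None]:
    # Streaming path (hypothetical): an async generator yielding one chunk at a time.
    for token in prompt.split():
        yield token


async def _nonstream_chat_completion(prompt: str) -> str:
    # Non-streaming path (hypothetical): a coroutine resolving to a single object.
    return prompt.upper()


def chat_completion(prompt: str, stream: bool = False):
    # Plain (non-async) dispatcher: returns an async generator when stream=True
    # and a coroutine otherwise, so each path keeps exactly one return type.
    if stream:
        return _stream_chat_completion(prompt)
    return _nonstream_chat_completion(prompt)


async def collect(prompt: str, stream: bool) -> List[str]:
    # The caller picks the consumption protocol from the same `stream` flag.
    if stream:
        return [chunk async for chunk in chat_completion(prompt, stream=True)]
    return [await chat_completion(prompt, stream=False)]


if __name__ == "__main__":
    print(asyncio.run(collect("hello streaming world", stream=True)))
    print(asyncio.run(collect("hello streaming world", stream=False)))
```

Mixing the two in a single `async def` tends to fail because a function body containing `yield` is always an async generator, so it can no longer return a single awaited value.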
2024-10-08 17:23:40 -07:00
Ashwin Bharambe
f8752ab8dc
weaviate fixes, test now passes
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
f21ad1173e
improve memory test, but it fails on chromadb :/
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
4ab6e1b81a
Add really basic testing for memory API
weaviate does not work; the cluster URL seems malformed
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
dba7caf1d0
Fix fireworks and update the test
Don't look for eom_id / eot_id sadly since providers don't return the
last token
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
bbd3a02615
Make Together inference work using the raw completions API
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
3ae2b712e8
Add inference test
Run it as:
```
PROVIDER_ID=test-remote \
PROVIDER_CONFIG=$PWD/llama_stack/providers/tests/inference/provider_config_example.yaml \
pytest -s llama_stack/providers/tests/inference/test_inference.py \
--tb=auto \
--disable-warnings
```
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
4fa467731e
Fix a bug in meta-reference inference when stream=False
Also introduce a gross hack (to cover grosser(?) hack) to ensure
non-stream requests don't send back responses in SSE format. Not sure
which of these hacks is grosser.
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
353c7dc82a
A few bug fixes for covering corner cases
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
a05599c67a
Weaviate "should" work (i.e., is code-complete) but not tested
2024-10-08 17:23:02 -07:00
Zain Hasan
118c0ef105
Partial cleanup of weaviate
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
862f8ddb8d
more memory related fixes; memory.client now works
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
3725e74906
memory bank registration fixes
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
099a95b614
slight upgrade to CLI
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
1550187cd8
cleanup
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
91e0063593
Introduce model_store, shield_store, memory_bank_store
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
e45a417543
more fixes, plug shutdown handlers
still, FastAPI's SIGINT handler is not calling ours
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
60dead6196
apis_to_serve -> apis
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
59302a86df
inference registry updates
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
4215cc9331
Push registration methods onto the backing providers
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
5a7b01d292
Significantly upgrade the interactive configuration experience
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
8d157a8197
rename
2024-10-08 17:23:02 -07:00
Ashwin Bharambe
f3923e3f0b
Redo the { models, shields, memory_banks } typeset
2024-10-08 17:23:02 -07:00
Xi Yan
6b094b72d3
Update cli_reference.md
2024-10-08 15:32:06 -07:00
Xi Yan
ce70d21f65
Add files via upload
2024-10-08 15:29:19 -07:00
Dalton Flanagan
2d4f7d8acf
Create SECURITY.md
2024-10-08 13:30:40 -04:00
Yuan Tang
48d0d2001e
Add classifiers in setup.py (#217)
* Add classifiers in setup.py
* Update setup.py
* Update setup.py
2024-10-08 06:55:16 -07:00
Xi Yan
4d5f7459aa
[bugfix] Fix logprobs on meta-reference impl (#213)
* fix log probs
* add back LogProbsConfig
* error handling
* bugfix
2024-10-07 19:42:39 -07:00
Yuan Tang
e4ae09d090
Add .idea to .gitignore (#216)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2024-10-07 19:38:43 -07:00