Commit graph

348 commits

Author SHA1 Message Date
Xi Yan
8593c94b91 Merge branch 'main' into ollama_docker 2024-10-21 11:13:01 -07:00
Xi Yan
8a50426d47 vllm 2024-10-21 11:12:26 -07:00
Xi Yan
88187bc5f6 vllm 2024-10-21 11:07:04 -07:00
Xi Yan
acfcbca14a tmp add back build to avoid merge conflicts 2024-10-21 11:04:26 -07:00
Xi Yan
202667f3db delete templates 2024-10-21 11:03:34 -07:00
Xi Yan
3ca822f4cd build templates 2024-10-21 11:02:32 -07:00
Xi Yan
ca2e7f52bd vllm 2024-10-21 11:00:50 -07:00
nehal-a2z
8ef3d3d239 Update event_logger.py (#275)
spelling error
2024-10-21 10:48:50 -07:00
nehal-a2z
c995219731 Update event_logger.py (#275)
spelling error
2024-10-21 10:46:53 -07:00
raghotham
af52c22c5e Create .readthedocs.yaml
Trying out readthedocs
2024-10-21 10:46:47 -07:00
Yuan Tang
74e6356b51 Add vLLM inference provider for OpenAI compatible vLLM server (#178)
This PR adds a vLLM inference provider for an OpenAI-compatible vLLM server (an illustrative usage sketch follows this entry).
2024-10-21 10:46:45 -07:00
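For readers unfamiliar with the setup above, here is a minimal sketch of querying an OpenAI-compatible vLLM server directly with the standard openai Python client. The base URL, API key, and model name are illustrative assumptions, not values taken from this PR or from the llama-stack provider it adds.

```
# Minimal sketch: calling an OpenAI-compatible vLLM server directly.
# The URL, api_key, and model name are illustrative assumptions; they are
# not taken from the llama-stack provider added in this PR.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local vLLM endpoint
    api_key="not-needed-for-local-vllm",  # local vLLM typically ignores the key
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```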
Ashwin Bharambe
391dedd1c0 update ollama for llama-guard3 2024-10-21 10:46:40 -07:00
Ashwin Bharambe
89759a0ad3 Improve an important error message 2024-10-21 10:46:40 -07:00
Ashwin Bharambe
5863f65874 Make all methods async def again; add completion() for meta-reference (#270)
PR #201 had made several changes while trying to fix issues with getting the stream=False branches of the inference and agents APIs working. As part of this, it made a change that was slightly gratuitous: namely, making chat_completion() and its brethren "def" instead of "async def".

The rationale was that this allowed callers (within llama-stack) to use it as:

```
async for chunk in api.chat_completion(params)
```

However, this caused unnecessary confusion for several folks. Given that clients (e.g., llama-stack-apps) use the SDK methods (which are completely isolated) anyway, this choice was not ideal. Let's revert so the call now looks like:

```
async for chunk in await api.chat_completion(params)
```

Bonus: added a completion() implementation for the meta-reference provider. Technically this should have been another PR :) (A minimal sketch of the resulting calling pattern follows this entry.)
2024-10-21 10:46:40 -07:00
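To make the revert above concrete, here is a minimal sketch of why the calling pattern becomes "async for chunk in await api.chat_completion(params)": when chat_completion() is an "async def" method that returns an async generator, the call must first be awaited to obtain the generator, and only then iterated. The Api and Chunk classes below are hypothetical stand-ins, not the real llama-stack types.

```
import asyncio
from typing import AsyncIterator

# Hypothetical stand-ins for illustration only; not the real llama-stack types.
class Chunk:
    def __init__(self, text: str) -> None:
        self.text = text

class Api:
    async def chat_completion(self, params: dict) -> AsyncIterator[Chunk]:
        # An `async def` method: calling it produces a coroutine that must be
        # awaited; the awaited result is the async generator to iterate over.
        async def _stream() -> AsyncIterator[Chunk]:
            for word in ["hello", "world"]:
                yield Chunk(word)
        return _stream()

async def main() -> None:
    api = Api()
    # Because chat_completion() is `async def`, the call itself is awaited
    # first, and only then do we iterate over the returned stream.
    async for chunk in await api.chat_completion({"messages": []}):
        print(chunk.text)

asyncio.run(main())
```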
Ashwin Bharambe
92aca57bfa Small rename 2024-10-21 10:46:40 -07:00
Ashwin Bharambe
6f4537b4c4 Allow overriding checkpoint_dir via config 2024-10-21 10:46:40 -07:00
Ashwin Bharambe
a90ab5878b Add an option to not use elastic agents for meta-reference inference (#269) 2024-10-21 10:46:40 -07:00
Xi Yan
2f5c410c73 [bugfix] fix case for agent when memory bank registered without specifying provider_id (#264)
* fix case where memory bank is registered without provider_id

* memory test

* agents unit test
2024-10-21 10:46:40 -07:00
Xi Yan
29c8edb4f6 readme 2024-10-21 09:11:25 -07:00
Xi Yan
5ea36b0274 readme 2024-10-21 09:03:05 -07:00
Xi Yan
d4caab3c67 developer cookbook 2024-10-21 09:01:34 -07:00
Xi Yan
302fa5c4bb build/developer cookbook/new api provider 2024-10-21 09:01:22 -07:00
raghotham
cae5b0708b Create .readthedocs.yaml
Trying out readthedocs
2024-10-21 11:48:19 +05:30
Yuan Tang
a27a2cd2af Add vLLM inference provider for OpenAI compatible vLLM server (#178)
This PR adds a vLLM inference provider for an OpenAI-compatible vLLM server.
2024-10-20 18:43:25 -07:00
Ashwin Bharambe
59c43736e8 update ollama for llama-guard3 2024-10-19 17:26:18 -07:00
Ashwin Bharambe
8cfbb9d38b Improve an important error message 2024-10-19 17:19:54 -07:00
Ashwin Bharambe
2089427d60 Make all methods async def again; add completion() for meta-reference (#270)
PR #201 had made several changes while trying to fix issues with getting the stream=False branches of the inference and agents APIs working. As part of this, it made a change that was slightly gratuitous: namely, making chat_completion() and its brethren "def" instead of "async def".

The rationale was that this allowed callers (within llama-stack) to use it as:

```
async for chunk in api.chat_completion(params)
```

However, this caused unnecessary confusion for several folks. Given that clients (e.g., llama-stack-apps) use the SDK methods (which are completely isolated) anyway, this choice was not ideal. Let's revert so the call now looks like:

```
async for chunk in await api.chat_completion(params)
```

Bonus: added a completion() implementation for the meta-reference provider. Technically this should have been another PR :)
2024-10-18 20:50:59 -07:00
Xi Yan
f58441cc21 readme 2024-10-18 18:55:29 -07:00
Xi Yan
100b5fecd4 readme 2024-10-18 18:53:49 -07:00
Xi Yan
955743ba7a kill distribution/templates 2024-10-18 17:32:11 -07:00
Xi Yan
c830235936 rename 2024-10-18 17:28:26 -07:00
Xi Yan
cbb423a32f move distribution/templates to distributions/ 2024-10-18 17:21:50 -07:00
Xi Yan
b4aca0aeb6 move distribution folders 2024-10-18 17:05:41 -07:00
Ashwin Bharambe
95a96afe34 Small rename 2024-10-18 14:41:38 -07:00
Xi Yan
fd90d2ae97 readme 2024-10-18 14:30:44 -07:00
Ashwin Bharambe
71a905e93f Allow overriding checkpoint_dir via config 2024-10-18 14:28:06 -07:00
Xi Yan
a3f748a875 readme for distributions 2024-10-18 14:21:44 -07:00
Ashwin Bharambe
33afd34e6f Add an option to not use elastic agents for meta-reference inference (#269) 2024-10-18 12:51:10 -07:00
Xi Yan
dcac9e4874 update compose file 2024-10-18 11:12:27 -07:00
Xi Yan
542ffbee72 comment 2024-10-17 19:37:22 -07:00
Xi Yan
293d8f2895 docker compose ollama 2024-10-17 19:31:29 -07:00
Xi Yan
be3c5c034d [bugfix] fix case for agent when memory bank registered without specifying provider_id (#264)
* fix case where memory bank is registered without provider_id

* memory test

* agents unit test
2024-10-17 17:28:17 -07:00
Ashwin Bharambe
9fcf5d58e0 Allow overriding MODEL_IDS for inference test 2024-10-17 10:03:27 -07:00
Xi Yan
02be26098a getting started 2024-10-16 23:56:21 -07:00
Xi Yan
cf9e5b76b2 Update getting_started.md 2024-10-16 23:52:29 -07:00
Xi Yan
7cc47da8f2 Update getting_started.md 2024-10-16 23:50:31 -07:00
Xi Yan
d787d1e84f config templates restructure, docs (#262)
* wip

* config templates

* readmes
2024-10-16 23:25:10 -07:00
Tam
a07dfffbbf initial changes (#261)
Update the parsing logic for comma-separated list and download function
2024-10-16 23:15:59 -07:00
ATH
319a6b5f83 Update getting_started.md (#260) 2024-10-16 18:05:36 -07:00
Xi Yan
c4d5d6bb91 Docker compose scripts for remote adapters (#241)
* tgi docker compose

* path

* wait for the tgi server to start before starting the server (see the sketch after this entry)

* update provider-id

* move scripts to distribution/ folder

* add readme

* readme
2024-10-15 16:32:53 -07:00
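As a rough illustration of the "wait for tgi server to start" step above, the sketch below polls a TGI health endpoint before launching the stack server. The port, timeout, and run command are assumptions for illustration and are not taken from the compose scripts in this PR.

```
# Illustrative sketch only: poll a TGI health endpoint until it responds,
# then launch the stack server. The endpoint, port, and command below are
# assumptions, not the actual contents of this PR's compose scripts.
import subprocess
import time

import requests

TGI_HEALTH_URL = "http://localhost:8080/health"  # assumed TGI port

def wait_for_tgi(timeout_s: float = 120.0, interval_s: float = 2.0) -> None:
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            if requests.get(TGI_HEALTH_URL, timeout=2).status_code == 200:
                return  # TGI is up and healthy
        except requests.RequestException:
            pass  # server not reachable yet; keep polling
        time.sleep(interval_s)
    raise TimeoutError(f"TGI did not become healthy at {TGI_HEALTH_URL}")

wait_for_tgi()
# Hypothetical command; the real compose setup starts the server differently.
subprocess.run(["llama", "stack", "run", "my-config"], check=True)
```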