Commit graph

334 commits

Author SHA1 Message Date
Xi Yan
202667f3db delete templates 2024-10-21 11:03:34 -07:00
Xi Yan
3ca822f4cd build templates 2024-10-21 11:02:32 -07:00
Xi Yan
ca2e7f52bd vllm 2024-10-21 11:00:50 -07:00
nehal-a2z
8ef3d3d239 Update event_logger.py (#275)
spelling error
2024-10-21 10:48:50 -07:00
raghotham
af52c22c5e Create .readthedocs.yaml
Trying out readthedocs
2024-10-21 10:46:47 -07:00
Yuan Tang
74e6356b51 Add vLLM inference provider for OpenAI compatible vLLM server (#178)
This PR adds a vLLM inference provider for an OpenAI-compatible vLLM server.
2024-10-21 10:46:45 -07:00
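As an aside on what "OpenAI compatible" means here: a vLLM server exposes the standard OpenAI chat-completions HTTP API, so a stock OpenAI client can talk to it directly. The sketch below illustrates that surface only; it is not the llama-stack provider code from this PR, and the base URL, API key, and model name are assumptions.

```
# Illustration of the OpenAI-compatible surface a vLLM server exposes; not the
# llama-stack provider added in #178. Base URL, API key, and model are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",   # assumed local vLLM OpenAI-compatible endpoint
    api_key="not-needed-for-local-vllm",   # vLLM typically ignores the key unless auth is configured
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # whichever model the vLLM server was launched with
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```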
Ashwin Bharambe
391dedd1c0 update ollama for llama-guard3 2024-10-21 10:46:40 -07:00
Ashwin Bharambe
89759a0ad3 Improve an important error message 2024-10-21 10:46:40 -07:00
Ashwin Bharambe
5863f65874 Make all methods async def again; add completion() for meta-reference (#270)
PR #201 made several changes while trying to fix issues with getting the stream=False branches of the inference and agents APIs working. As part of this, it made a change which was slightly gratuitous: namely, it made chat_completion() and its brethren "def" instead of "async def".

The rationale was that this allowed the user (within llama-stack) of this to use it as:

```
async for chunk in api.chat_completion(params)
```

However, this caused unnecessary confusion for several folks. Given that clients (e.g., llama-stack-apps) use the SDK methods (which are completely isolated) anyway, this choice was not ideal. Let's revert so that the call now looks like:

```
async for chunk in await api.chat_completion(params)
```

Bonus: Added a completion() implementation for the meta-reference provider. Technically, this should have been another PR :)
2024-10-21 10:46:40 -07:00
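To make the calling-convention difference above concrete: a plain "def" that returns an async generator can be iterated directly, while an "async def" must itself be awaited before iteration, which is where the extra `await` in `async for chunk in await api.chat_completion(params)` comes from. The toy sketch below uses made-up names, not the actual llama-stack API.

```
# Toy sketch of the two calling conventions; names are illustrative only.
import asyncio
from typing import AsyncIterator


def chat_completion_plain_def(params: dict) -> AsyncIterator[str]:
    # Plain "def" returning an async generator: callers write
    #   async for chunk in api.chat_completion(params)
    async def _gen() -> AsyncIterator[str]:
        for chunk in ("hello", "world"):
            yield chunk
    return _gen()


async def chat_completion_async_def(params: dict) -> AsyncIterator[str]:
    # "async def": calling it produces a coroutine, so callers must await it first:
    #   async for chunk in await api.chat_completion(params)
    async def _gen() -> AsyncIterator[str]:
        for chunk in ("hello", "world"):
            yield chunk
    return _gen()


async def main() -> None:
    async for chunk in chat_completion_plain_def({}):
        print(chunk)
    async for chunk in await chat_completion_async_def({}):
        print(chunk)


asyncio.run(main())
```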
Ashwin Bharambe
92aca57bfa Small rename 2024-10-21 10:46:40 -07:00
Ashwin Bharambe
6f4537b4c4 Allow overriding checkpoint_dir via config 2024-10-21 10:46:40 -07:00
Ashwin Bharambe
a90ab5878b Add an option to not use elastic agents for meta-reference inference (#269) 2024-10-21 10:46:40 -07:00
Xi Yan
2f5c410c73 [bugfix] fix case for agent when memory bank registered without specifying provider_id (#264)
* fix case where memory bank is registered without provider_id

* memory test

* agents unit test
2024-10-21 10:46:40 -07:00
Xi Yan
29c8edb4f6 readme 2024-10-21 09:11:25 -07:00
Xi Yan
5ea36b0274 readme 2024-10-21 09:03:05 -07:00
Xi Yan
d4caab3c67 developer cookbook 2024-10-21 09:01:34 -07:00
Xi Yan
302fa5c4bb build/developer cookbook/new api provider 2024-10-21 09:01:22 -07:00
Xi Yan
f58441cc21 readme 2024-10-18 18:55:29 -07:00
Xi Yan
100b5fecd4 readme 2024-10-18 18:53:49 -07:00
Xi Yan
955743ba7a kill distribution/templates 2024-10-18 17:32:11 -07:00
Xi Yan
c830235936 rename 2024-10-18 17:28:26 -07:00
Xi Yan
cbb423a32f move distribution/templates to distributions/ 2024-10-18 17:21:50 -07:00
Xi Yan
b4aca0aeb6 move distribution folders 2024-10-18 17:05:41 -07:00
Xi Yan
fd90d2ae97 readme 2024-10-18 14:30:44 -07:00
Xi Yan
a3f748a875 readme for distributions 2024-10-18 14:21:44 -07:00
Xi Yan
dcac9e4874 update compose file 2024-10-18 11:12:27 -07:00
Xi Yan
542ffbee72 comment 2024-10-17 19:37:22 -07:00
Xi Yan
293d8f2895 docker compose ollama 2024-10-17 19:31:29 -07:00
Ashwin Bharambe
9fcf5d58e0 Allow overriding MODEL_IDS for inference test 2024-10-17 10:03:27 -07:00
Xi Yan
02be26098a getting started 2024-10-16 23:56:21 -07:00
Xi Yan
cf9e5b76b2 Update getting_started.md 2024-10-16 23:52:29 -07:00
Xi Yan
7cc47da8f2 Update getting_started.md 2024-10-16 23:50:31 -07:00
Xi Yan
d787d1e84f config templates restructure, docs (#262)
* wip

* config templates

* readmes
2024-10-16 23:25:10 -07:00
Tam
a07dfffbbf initial changes (#261)
Update the parsing logic for comma-separated lists and the download function
2024-10-16 23:15:59 -07:00
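For context, parsing a comma-separated list in a download flow generally amounts to splitting on commas, trimming whitespace, and dropping empty entries. The sketch below shows that general shape with a hypothetical helper; it is not the actual `llama download` code from this PR.

```
# Hypothetical helper showing the usual shape of comma-separated-list parsing;
# not the actual implementation from #261.
def parse_model_list(raw: str) -> list[str]:
    """Split a comma-separated string, trimming whitespace and dropping empties."""
    return [item.strip() for item in raw.split(",") if item.strip()]


assert parse_model_list("Llama3.1-8B-Instruct, Llama-Guard-3-8B,") == [
    "Llama3.1-8B-Instruct",
    "Llama-Guard-3-8B",
]
```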
ATH
319a6b5f83 Update getting_started.md (#260) 2024-10-16 18:05:36 -07:00
Xi Yan
c4d5d6bb91 Docker compose scripts for remote adapters (#241)
* tgi docker compose

* path

* wait for tgi server to start before starting server (see the readiness sketch after this entry)

* update provider-id

* move scripts to distribution/ folder

* add readme

* readme
2024-10-15 16:32:53 -07:00
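The "wait for tgi server to start" step above is a readiness gate: keep polling the inference server's health endpoint until it answers before bringing up the stack server. The sketch below shows that pattern in Python, assuming a TGI-style `/health` endpoint and a locally mapped port; the actual scripts in this commit most likely do the equivalent in shell or via docker-compose healthchecks.

```
# Illustrative readiness poll; endpoint path and port are assumptions, and the
# real compose scripts likely implement this in shell or compose healthchecks.
import time
import urllib.error
import urllib.request


def wait_for_server(url: str, timeout_s: float = 120.0, interval_s: float = 2.0) -> None:
    """Poll `url` until it returns HTTP 200 or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return
        except (urllib.error.URLError, OSError):
            pass
        time.sleep(interval_s)
    raise TimeoutError(f"Server at {url} did not become healthy in {timeout_s}s")


wait_for_server("http://localhost:8080/health")  # assumed TGI health endpoint
```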
Matthieu FRONTON
770647dede Fix broken rendering in Google Colab (#247) 2024-10-15 15:41:49 -07:00
Ashwin Bharambe
09b793c4d6 Fix fp8 implementation which had bit-rotted a bit
I only tested with "on-the-fly" bf16 -> fp8 conversion, not the "load
from fp8" codepath.

YAML I tested with:

```
providers:
  - provider_id: quantized
    provider_type: meta-reference-quantized
    config:
      model: Llama3.1-8B-Instruct
      quantization:
        type: fp8
```
2024-10-15 13:57:01 -07:00
Yuan Tang
80ada04f76 Remove request arg from chat completion response processing (#240)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2024-10-15 13:03:17 -07:00
Xi Yan
209cd3d35e Bump version to 0.0.42 2024-10-14 11:13:04 -07:00
Yuan Tang
a2b87ed0cb Switch to pre-commit/action (#239) 2024-10-11 11:09:11 -07:00
Yuan Tang
05282d1234 Enable pre-commit on main branch (#237) 2024-10-11 10:03:59 -07:00
Yuan Tang
2128e61da2 Fix incorrect completion() signature for Databricks provider (#236) 2024-10-11 08:47:57 -07:00
Dalton Flanagan
9fbe8852aa Add Swift Package Index badge 2024-10-10 23:39:25 -04:00
Xi Yan
ca29980c6b fix agents context retriever 2024-10-10 20:17:29 -07:00
Ashwin Bharambe
1ff0476002 Split off meta-reference-quantized provider 2024-10-10 16:03:19 -07:00
Xi Yan
7ff5800dea generate openapi 2024-10-10 15:30:34 -07:00
Dalton Flanagan
a3e65d58a9 Add logo 2024-10-10 15:04:21 -04:00
Russell Bryant
eba9d1ea14 ci: Run pre-commit checks in CI (#176)
Run the pre-commit checks in a GitHub workflow to validate that a PR or a direct push to the repo does not introduce new errors.
2024-10-10 11:21:59 -07:00
Ashwin Bharambe
89d24a07f0 Bump version to 0.0.41 2024-10-10 10:27:03 -07:00