Xi Yan
202667f3db
delete templates
2024-10-21 11:03:34 -07:00
Xi Yan
3ca822f4cd
build templates
2024-10-21 11:02:32 -07:00
Xi Yan
ca2e7f52bd
vllm
2024-10-21 11:00:50 -07:00
nehal-a2z
8ef3d3d239
Update event_logger.py (#275)
spelling error
2024-10-21 10:48:50 -07:00
raghotham
af52c22c5e
Create .readthedocs.yaml
Trying out readthedocs
2024-10-21 10:46:47 -07:00
Yuan Tang
74e6356b51
Add vLLM inference provider for OpenAI compatible vLLM server (#178)
This PR adds a vLLM inference provider for an OpenAI-compatible vLLM server.
2024-10-21 10:46:45 -07:00
Ashwin Bharambe
391dedd1c0
update ollama for llama-guard3
2024-10-21 10:46:40 -07:00
Ashwin Bharambe
89759a0ad3
Improve an important error message
2024-10-21 10:46:40 -07:00
Ashwin Bharambe
5863f65874
Make all methods async def again; add completion() for meta-reference (#270)
PR #201 made several changes while trying to fix issues with the stream=False branches of the inference and agents APIs. As part of this, it made one change that was slightly gratuitous: turning chat_completion() and its brethren into "def" instead of "async def" methods.
The rationale was that this let callers (within llama-stack) use them as:
```
async for chunk in api.chat_completion(params)
```
However, this caused unnecessary confusion for several folks. Given that clients (e.g., llama-stack-apps) use the SDK methods (which are completely isolated) anyway, the choice was not ideal. Let's revert, so the call now looks like:
```
async for chunk in await api.chat_completion(params)
```
Bonus: Added a completion() implementation for the meta-reference provider. Technically should have been another PR :)
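The two call shapes differ only in whether the method itself is a coroutine. A minimal sketch of the post-revert shape (the `Api` class and its body here are hypothetical stand-ins, not the actual llama-stack code):

```python
import asyncio
from typing import AsyncIterator

# Hypothetical stand-in for the inference API; names are illustrative.
class Api:
    async def chat_completion(self, params: dict) -> AsyncIterator[str]:
        # Because this is "async def", calling it returns a coroutine
        # that must first be awaited; the awaited result is the stream.
        async def stream() -> AsyncIterator[str]:
            for token in ["hello", " ", "world"]:
                yield token
        return stream()

async def main() -> list[str]:
    api = Api()
    received = []
    # The post-revert call shape: await the coroutine, then iterate.
    async for chunk in await api.chat_completion({"stream": True}):
        received.append(chunk)
    return received

print(asyncio.run(main()))  # prints ['hello', ' ', 'world']
```

Had chat_completion() stayed a plain "def" returning the iterator directly, the inner `await` would be dropped, which is the call shape PR #201 had introduced.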
2024-10-21 10:46:40 -07:00
Ashwin Bharambe
92aca57bfa
Small rename
2024-10-21 10:46:40 -07:00
Ashwin Bharambe
6f4537b4c4
Allow overriding checkpoint_dir via config
2024-10-21 10:46:40 -07:00
Ashwin Bharambe
a90ab5878b
Add an option to not use elastic agents for meta-reference inference (#269)
2024-10-21 10:46:40 -07:00
Xi Yan
2f5c410c73
[bugfix] fix case for agent when memory bank registered without specifying provider_id (#264)
* fix case where memory bank is registered without provider_id
* memory test
* agents unit test
2024-10-21 10:46:40 -07:00
Xi Yan
29c8edb4f6
readme
2024-10-21 09:11:25 -07:00
Xi Yan
5ea36b0274
readme
2024-10-21 09:03:05 -07:00
Xi Yan
d4caab3c67
developer cookbook
2024-10-21 09:01:34 -07:00
Xi Yan
302fa5c4bb
build/developer cookbook/new api provider
2024-10-21 09:01:22 -07:00
Xi Yan
f58441cc21
readme
2024-10-18 18:55:29 -07:00
Xi Yan
100b5fecd4
readme
2024-10-18 18:53:49 -07:00
Xi Yan
955743ba7a
kill distribution/templates
2024-10-18 17:32:11 -07:00
Xi Yan
c830235936
rename
2024-10-18 17:28:26 -07:00
Xi Yan
cbb423a32f
move distribution/templates to distributions/
2024-10-18 17:21:50 -07:00
Xi Yan
b4aca0aeb6
move distribution folders
2024-10-18 17:05:41 -07:00
Xi Yan
fd90d2ae97
readme
2024-10-18 14:30:44 -07:00
Xi Yan
a3f748a875
readme for distributions
2024-10-18 14:21:44 -07:00
Xi Yan
dcac9e4874
update compose file
2024-10-18 11:12:27 -07:00
Xi Yan
542ffbee72
comment
2024-10-17 19:37:22 -07:00
Xi Yan
293d8f2895
docker compose ollama
2024-10-17 19:31:29 -07:00
Ashwin Bharambe
9fcf5d58e0
Allow overriding MODEL_IDS for inference test
2024-10-17 10:03:27 -07:00
Xi Yan
02be26098a
getting started
2024-10-16 23:56:21 -07:00
Xi Yan
cf9e5b76b2
Update getting_started.md
2024-10-16 23:52:29 -07:00
Xi Yan
7cc47da8f2
Update getting_started.md
2024-10-16 23:50:31 -07:00
Xi Yan
d787d1e84f
config templates restructure, docs (#262)
* wip
* config templates
* readmes
2024-10-16 23:25:10 -07:00
Tam
a07dfffbbf
initial changes (#261)
Update the parsing logic for comma-separated lists and the download function
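Parsing a comma-separated list of the kind mentioned here might look like the following sketch (function name and input format are assumptions, not the actual code from this commit):

```python
def parse_model_list(raw: str) -> list[str]:
    # Split on commas, strip whitespace, and drop empty entries so that
    # trailing or doubled commas do not produce blank names.
    return [part.strip() for part in raw.split(",") if part.strip()]

print(parse_model_list("Llama3.1-8B, Llama3.1-70B,"))
# prints ['Llama3.1-8B', 'Llama3.1-70B']
```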
2024-10-16 23:15:59 -07:00
ATH
319a6b5f83
Update getting_started.md (#260)
2024-10-16 18:05:36 -07:00
Xi Yan
c4d5d6bb91
Docker compose scripts for remote adapters (#241)
* tgi docker compose
* path
* wait for tgi server to start before starting server
* update provider-id
* move scripts to distribution/ folder
* add readme
* readme
2024-10-15 16:32:53 -07:00
Matthieu FRONTON
770647dede
Fix broken rendering in Google Colab (#247)
2024-10-15 15:41:49 -07:00
Ashwin Bharambe
09b793c4d6
Fix fp8 implementation, which had bit-rotted a bit
I only tested with "on-the-fly" bf16 -> fp8 conversion, not the "load
from fp8" codepath.
YAML I tested with:
```
providers:
  - provider_id: quantized
    provider_type: meta-reference-quantized
    config:
      model: Llama3.1-8B-Instruct
      quantization:
        type: fp8
```
2024-10-15 13:57:01 -07:00
Yuan Tang
80ada04f76
Remove request arg from chat completion response processing (#240)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2024-10-15 13:03:17 -07:00
Xi Yan
209cd3d35e
Bump version to 0.0.42
2024-10-14 11:13:04 -07:00
Yuan Tang
a2b87ed0cb
Switch to pre-commit/action (#239)
2024-10-11 11:09:11 -07:00
Yuan Tang
05282d1234
Enable pre-commit on main branch (#237)
2024-10-11 10:03:59 -07:00
Yuan Tang
2128e61da2
Fix incorrect completion() signature for Databricks provider (#236)
2024-10-11 08:47:57 -07:00
Dalton Flanagan
9fbe8852aa
Add Swift Package Index badge
2024-10-10 23:39:25 -04:00
Xi Yan
ca29980c6b
fix agents context retriever
2024-10-10 20:17:29 -07:00
Ashwin Bharambe
1ff0476002
Split off meta-reference-quantized provider
2024-10-10 16:03:19 -07:00
Xi Yan
7ff5800dea
generate openapi
2024-10-10 15:30:34 -07:00
Dalton Flanagan
a3e65d58a9
Add logo
2024-10-10 15:04:21 -04:00
Russell Bryant
eba9d1ea14
ci: Run pre-commit checks in CI (#176)
Run the pre-commit checks in a github workflow to validate that a PR
or a direct push to the repo does not introduce new errors.
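A workflow of the kind described might look like the following sketch (file name and action versions are assumptions; #239 later switched this repo to pre-commit/action):

```yaml
# .github/workflows/pre-commit.yml (hypothetical layout)
name: pre-commit
on: [push, pull_request]
jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
      # Runs all hooks configured in .pre-commit-config.yaml
      - uses: pre-commit/action@v3.0.1
```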
2024-10-10 11:21:59 -07:00
Ashwin Bharambe
89d24a07f0
Bump version to 0.0.41
2024-10-10 10:27:03 -07:00