Xi Yan
74de9bebd1
registry
2024-12-26 18:34:00 -08:00
Xi Yan
27da763af9
more fixes
2024-12-26 18:30:42 -08:00
Xi Yan
6596caed55
vllm
2024-12-26 18:25:28 -08:00
Xi Yan
206554e853
stack imports
2024-12-26 18:23:40 -08:00
Xi Yan
3c84f491ec
imports
2024-12-26 18:21:53 -08:00
Xi Yan
7c12cda244
llama guard
2024-12-26 18:18:01 -08:00
Xi Yan
f58e92f8d3
prompt guard
2024-12-26 18:15:55 -08:00
Xi Yan
61be406b49
scoring
2024-12-26 18:14:53 -08:00
Xi Yan
fcac7cfafa
braintrust
2024-12-26 18:13:43 -08:00
Xi Yan
71d50ab368
telemetry & sample
2024-12-26 18:12:51 -08:00
Xi Yan
c4b9b3cb52
huggingface
2024-12-26 18:11:10 -08:00
Xi Yan
d40e527471
bedrock
2024-12-26 18:10:23 -08:00
Xi Yan
28428c320a
databricks
2024-12-26 18:08:50 -08:00
Xi Yan
6f7f02fbad
fireworks
2024-12-26 18:08:08 -08:00
Xi Yan
f97638a323
ollama import remove
2024-12-26 18:07:18 -08:00
Xi Yan
165777a181
impls imports remove
2024-12-26 18:05:19 -08:00
Xi Yan
b641902bfa
impls imports remove
2024-12-26 18:01:45 -08:00
Xi Yan
c1ef055f39
test prompt adapter
2024-12-26 17:49:17 -08:00
Xi Yan
2fe4acd64d
text inference
2024-12-26 17:45:25 -08:00
Xi Yan
16cfe1014e
vision inference
2024-12-26 17:31:42 -08:00
Xi Yan
3b1f20ac00
memory tests fix
2024-12-26 17:27:01 -08:00
Xi Yan
3f86c19150
builds
2024-12-26 17:21:23 -08:00
Xi Yan
8a8550fe9b
cli imports
2024-12-26 17:19:40 -08:00
Xi Yan
21a6bd57ea
fix imports
2024-12-26 17:17:03 -08:00
Xi Yan
c6d3fc6fb6
datatypes
2024-12-26 17:00:56 -08:00
Xi Yan
6c6b5fb091
openai_compat
2024-12-26 16:59:06 -08:00
Xi Yan
9ab0730294
kvstore
2024-12-26 16:55:40 -08:00
Xi Yan
30fee82407
vector_store
2024-12-26 16:54:33 -08:00
Xi Yan
b7bc1c6297
telemetry
2024-12-26 16:48:54 -08:00
Xi Yan
bb0a3f5c8e
remove more imports
2024-12-26 16:43:30 -08:00
Xi Yan
93ed8aa814
remove more imports
2024-12-26 16:39:31 -08:00
Xi Yan
0a0c01fbc2
test agents imports
2024-12-26 16:32:23 -08:00
Xi Yan
9bdb7236b2
Merge branch 'main' into remove_import_stars
2024-12-26 15:50:12 -08:00
Xi Yan
88c967a3e2
fix client-sdk memory/safety test
2024-12-26 15:49:15 -08:00
Xi Yan
b05d8fd956
fix client-sdk agents/inference test
2024-12-26 15:49:14 -08:00
Xi Yan
19c99e36a0
update playground doc video
2024-12-26 15:49:14 -08:00
Xi Yan
70db039ff4
fix client-sdk memory/safety test
2024-12-26 15:48:28 -08:00
Xi Yan
b6aca4c8bb
fix client-sdk agents/inference test
2024-12-26 15:44:34 -08:00
Xi Yan
da26d22f90
remove imports 1/n
2024-12-26 15:19:06 -08:00
Xi Yan
4e1d0a2fc5
update playground doc video
2024-12-26 14:50:19 -08:00
Xi Yan
28ce511986
fix --endpoint docs
2024-12-26 14:32:07 -08:00
Ikko Eltociear Ashimine
7ba95a8e74
docs: update evals_reference/index.md (#675)
...
# What does this PR do?
minor fix
## Sources
Please link relevant resources if necessary.
## Before submitting
- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-12-26 11:32:37 -08:00
Aidan Do
21fb92d7cf
Add 3.3 70B to Ollama inference provider (#681)
...
# What does this PR do?
Adds 3.3 70B support to Ollama inference provider
## Test Plan
<details>
<summary>Manual</summary>
```bash
# 42GB to download
ollama pull llama3.3:70b
ollama run llama3.3:70b --keepalive 60m
export LLAMA_STACK_PORT=5000
pip install -e . \
&& llama stack build --template ollama --image-type conda \
&& llama stack run ./distributions/ollama/run.yaml \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=Llama3.3-70B-Instruct \
--env OLLAMA_URL=http://localhost:11434
export LLAMA_STACK_PORT=5000
llama-stack-client --endpoint http://localhost:$LLAMA_STACK_PORT \
inference chat-completion \
--model-id Llama3.3-70B-Instruct \
--message "hello, what model are you?"
```
<img width="1221" alt="image"
src="https://github.com/user-attachments/assets/dcffbdd9-94c8-4d47-9f95-4ef6c3756294"
/>
</details>
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-12-25 22:15:58 -08:00
Yuan Tang
fa371fdc9e
Removed unnecessary CONDA_PREFIX env var in installation guide (#683)
...
This is not needed since `conda activate stack` has already been
executed.
2024-12-23 13:17:30 -08:00
Yuan Tang
987e651755
Add missing venv option in --image-type (#677)
...
"venv" option is supported but not mentioned in the prompt.
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2024-12-21 21:10:13 -08:00
Botao Chen
bae197c37e
Fix post training apis broken by torchtune release (#674)
...
There was a torchtune release this morning
(https://github.com/pytorch/torchtune/releases/tag/v0.5.0) that breaks the
post-training APIs.
## Test
Spun up the server; post training works again after the fix.
<img width="1314" alt="Screenshot 2024-12-20 at 4 08 54 PM"
src="https://github.com/user-attachments/assets/dfae724d-ebf0-4846-9715-096efa060cee"
/>
## Note
We need to think hard about how to avoid this happening again, and do a fast
follow-up on this after the holidays.
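One way to reduce exposure to surprise upstream releases (a suggestion only, not part of this PR) would be to pin the torchtune version in the post-training provider's requirements, for example:

```
# hypothetical requirements pin; 0.5.0 is the release that triggered this fix
torchtune==0.5.0
```

An upper bound (e.g. `torchtune>=0.4,<0.6`) would be a looser alternative if exact pinning is too restrictive.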
2024-12-20 16:12:02 -08:00
Botao Chen
06cb0c837e
[torchtune integration] post training + eval (#670)
...
## What does this PR do?
- Add the related APIs to the experimental-post-training template to enable
eval on the finetuned checkpoint in the template
- A small bug fix in meta reference eval
- A small error-handling improvement in post training
## Test Plan
From the client side, issued an E2E post-training request
(https://github.com/meta-llama/llama-stack-client-python/pull/70) and got
eval results successfully.
<img width="1315" alt="Screenshot 2024-12-20 at 12 06 59 PM"
src="https://github.com/user-attachments/assets/a09bd524-59ae-490c-908f-2e36ccf27c0a"
/>
2024-12-20 13:43:13 -08:00
Dinesh Yeduguru
c8be0bf1c9
Tools API with brave and MCP providers (#639)
...
This PR adds a new Tools API and two tool runtime providers: Brave
and MCP.
Test plan:
```
curl -X POST 'http://localhost:5000/alpha/toolgroups/register' \
-H 'Content-Type: application/json' \
-d '{ "tool_group_id": "simple_tool",
"tool_group": {
"type": "model_context_protocol",
"endpoint": {"uri": "http://localhost:56000/sse"}
},
"provider_id": "model-context-protocol"
}'
curl -X POST 'http://localhost:5000/alpha/toolgroups/register' \
-H 'Content-Type: application/json' \
-d '{
"tool_group_id": "search", "provider_id": "brave-search",
"tool_group": {
"type": "user_defined",
"tools": [
{
"name": "brave_search",
"description": "A web search tool",
"parameters": [
{
"name": "query",
"parameter_type": "string",
"description": "The query to search"
}
],
"metadata": {},
"tool_prompt_format": "json"
}
]
}
}'
curl -X GET http://localhost:5000/alpha/tools/list | jq .
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 662 100 662 0 0 333k 0 --:--:-- --:--:-- --:--:-- 646k
[
{
"identifier": "brave_search",
"provider_resource_id": "brave_search",
"provider_id": "brave-search",
"type": "tool",
"tool_group": "search",
"description": "A web search tool",
"parameters": [
{
"name": "query",
"parameter_type": "string",
"description": "The query to search"
}
],
"metadata": {},
"tool_prompt_format": "json"
},
{
"identifier": "fetch",
"provider_resource_id": "fetch",
"provider_id": "model-context-protocol",
"type": "tool",
"tool_group": "simple_tool",
"description": "Fetches a website and returns its content",
"parameters": [
{
"name": "url",
"parameter_type": "string",
"description": "URL to fetch"
}
],
"metadata": {
"endpoint": "http://localhost:56000/sse"
},
"tool_prompt_format": "json"
}
]
curl -X POST 'http://localhost:5000/alpha/tool-runtime/invoke' \
-H 'Content-Type: application/json' \
-d '{
"tool_name": "fetch",
"args": {
"url": "http://google.com/"
}
}'
curl -X POST 'http://localhost:5000/alpha/tool-runtime/invoke' \
-H 'Content-Type: application/json' -H 'X-LlamaStack-ProviderData: {"api_key": "<KEY>"}' \
-d '{
"tool_name": "brave_search",
"args": {
"query": "who is meta ceo"
}
}'
```
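For reference, the `user_defined` registration body above can also be assembled programmatically before POSTing it. A minimal sketch, mirroring only the payload shape shown in the curl calls (the field names come from the test plan, not from any client SDK):

```python
import json

# Build the same brave-search toolgroup registration body as the curl call above.
tool_group = {
    "tool_group_id": "search",
    "provider_id": "brave-search",
    "tool_group": {
        "type": "user_defined",
        "tools": [
            {
                "name": "brave_search",
                "description": "A web search tool",
                "parameters": [
                    {
                        "name": "query",
                        "parameter_type": "string",
                        "description": "The query to search",
                    }
                ],
                "metadata": {},
                "tool_prompt_format": "json",
            }
        ],
    },
}

# This string is what gets POSTed to /alpha/toolgroups/register.
body = json.dumps(tool_group)
```

Sending it with any HTTP client (with `Content-Type: application/json`) is then equivalent to the second curl command above.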
2024-12-19 21:25:17 -08:00
Aidan Do
17fdb47e5e
Add Llama 70B 3.3 to fireworks (#654)
...
# What does this PR do?
- Makes Llama 3.3 70B available on Fireworks
## Test Plan
```shell
pip install -e . \
&& llama stack build --config distributions/fireworks/build.yaml --image-type conda \
&& llama stack run distributions/fireworks/run.yaml \
--port 5000
```
```python
from llama_stack_client import LlamaStackClient

# client pointed at the stack server started above (port 5000)
client = LlamaStackClient(base_url="http://localhost:5000")

response = client.inference.chat_completion(
model_id="Llama3.3-70B-Instruct",
messages=[
{"role": "user", "content": "hello world"},
],
)
```
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-12-19 17:32:49 -08:00
Dinesh Yeduguru
8b8d1c1ef4
fix trace starting in library client (#655)
...
# What does this PR do?
Because of the way the library client sets up async IO boundaries, tracing
was broken with streaming. This PR fixes tracing to start in the right
place so the lifetime of async generator functions is captured correctly.
Test plan:
Script ran:
https://gist.github.com/yanxi0830/f6645129e55ab12de3cd6ec71564c69e
Before: No spans returned for a session
Now: We see spans
<img width="1678" alt="Screenshot 2024-12-18 at 9 50 46 PM"
src="https://github.com/user-attachments/assets/58a3b0dd-a41c-489a-b89a-075e698a2c03"
/>
2024-12-19 16:13:52 -08:00