Commit graph

807 commits

Author SHA1 Message Date
Xi Yan
a6091fa158 server 2024-12-26 18:35:06 -08:00
Xi Yan
74de9bebd1 registry 2024-12-26 18:34:00 -08:00
Xi Yan
27da763af9 more fixes 2024-12-26 18:30:42 -08:00
Xi Yan
6596caed55 vllm 2024-12-26 18:25:28 -08:00
Xi Yan
206554e853 stack imports 2024-12-26 18:23:40 -08:00
Xi Yan
3c84f491ec imports 2024-12-26 18:21:53 -08:00
Xi Yan
7c12cda244 llama guard 2024-12-26 18:18:01 -08:00
Xi Yan
f58e92f8d3 prompt guard 2024-12-26 18:15:55 -08:00
Xi Yan
61be406b49 scoring 2024-12-26 18:14:53 -08:00
Xi Yan
fcac7cfafa braintrust 2024-12-26 18:13:43 -08:00
Xi Yan
71d50ab368 telemetry & sample 2024-12-26 18:12:51 -08:00
Xi Yan
c4b9b3cb52 huggingface 2024-12-26 18:11:10 -08:00
Xi Yan
d40e527471 bedrock 2024-12-26 18:10:23 -08:00
Xi Yan
28428c320a databricks 2024-12-26 18:08:50 -08:00
Xi Yan
6f7f02fbad fireworks 2024-12-26 18:08:08 -08:00
Xi Yan
f97638a323 ollama import remove 2024-12-26 18:07:18 -08:00
Xi Yan
165777a181 impls imports remove 2024-12-26 18:05:19 -08:00
Xi Yan
b641902bfa impls imports remove 2024-12-26 18:01:45 -08:00
Xi Yan
c1ef055f39 test prompt adapter 2024-12-26 17:49:17 -08:00
Xi Yan
2fe4acd64d text inference 2024-12-26 17:45:25 -08:00
Xi Yan
16cfe1014e vision inference 2024-12-26 17:31:42 -08:00
Xi Yan
3b1f20ac00 memory tests fix 2024-12-26 17:27:01 -08:00
Xi Yan
3f86c19150 builds 2024-12-26 17:21:23 -08:00
Xi Yan
8a8550fe9b cli imports 2024-12-26 17:19:40 -08:00
Xi Yan
21a6bd57ea fix imports 2024-12-26 17:17:03 -08:00
Xi Yan
c6d3fc6fb6 datatypes 2024-12-26 17:00:56 -08:00
Xi Yan
6c6b5fb091 openai_compat 2024-12-26 16:59:06 -08:00
Xi Yan
9ab0730294 kvstore 2024-12-26 16:55:40 -08:00
Xi Yan
30fee82407 vector_store 2024-12-26 16:54:33 -08:00
Xi Yan
b7bc1c6297 telemetry 2024-12-26 16:48:54 -08:00
Xi Yan
bb0a3f5c8e remove more imports 2024-12-26 16:43:30 -08:00
Xi Yan
93ed8aa814 remove more imports 2024-12-26 16:39:31 -08:00
Xi Yan
0a0c01fbc2 test agents imports 2024-12-26 16:32:23 -08:00
Xi Yan
9bdb7236b2 Merge branch 'main' into remove_import_stars 2024-12-26 15:50:12 -08:00
Xi Yan
88c967a3e2 fix client-sdk memory/safety test 2024-12-26 15:49:15 -08:00
Xi Yan
b05d8fd956 fix client-sdk agents/inference test 2024-12-26 15:49:14 -08:00
Xi Yan
19c99e36a0 update playground doc video 2024-12-26 15:49:14 -08:00
Xi Yan
70db039ff4 fix client-sdk memory/safety test 2024-12-26 15:48:28 -08:00
Xi Yan
b6aca4c8bb fix client-sdk agents/inference test 2024-12-26 15:44:34 -08:00
Xi Yan
da26d22f90 remove imports 1/n 2024-12-26 15:19:06 -08:00
Xi Yan
4e1d0a2fc5 update playground doc video 2024-12-26 14:50:19 -08:00
Xi Yan
28ce511986 fix --endpoint docs 2024-12-26 14:32:07 -08:00
Ikko Eltociear Ashimine
7ba95a8e74
docs: update evals_reference/index.md (#675)
# What does this PR do?

minor fix

## Before submitting

- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-12-26 11:32:37 -08:00
Aidan Do
21fb92d7cf
Add 3.3 70B to Ollama inference provider (#681)
# What does this PR do?

Adds Llama 3.3 70B support to the Ollama inference provider

## Test Plan

<details>
<summary>Manual</summary>

```bash
# 42GB to download
ollama pull llama3.3:70b

ollama run llama3.3:70b --keepalive 60m

export LLAMA_STACK_PORT=5000
pip install -e . \
  && llama stack build --template ollama --image-type conda \
  && llama stack run ./distributions/ollama/run.yaml \
  --port $LLAMA_STACK_PORT \
  --env INFERENCE_MODEL=Llama3.3-70B-Instruct \
  --env OLLAMA_URL=http://localhost:11434

export LLAMA_STACK_PORT=5000
llama-stack-client --endpoint http://localhost:$LLAMA_STACK_PORT \
  inference chat-completion \
  --model-id Llama3.3-70B-Instruct \
  --message "hello, what model are you?"
```

<img width="1221" alt="image"
src="https://github.com/user-attachments/assets/dcffbdd9-94c8-4d47-9f95-4ef6c3756294"
/>

</details>
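For reference, the `llama-stack-client` CLI call in the test plan boils down to a single JSON request body. A minimal Python sketch of that body (the `/alpha/inference/chat-completion` path is an assumption based on the server's alpha routes; the field names mirror the CLI flags above):

```python
# Hypothetical sketch of the request the CLI sends; assumes the stack from
# the shell commands above is running on localhost:5000.
import json

payload = {
    "model_id": "Llama3.3-70B-Instruct",
    "messages": [{"role": "user", "content": "hello, what model are you?"}],
}
body = json.dumps(payload)

# To actually send it (requires a running stack):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:5000/alpha/inference/chat-completion",
#     data=body.encode(), headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read().decode())
print(body)
```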

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-12-25 22:15:58 -08:00
Yuan Tang
fa371fdc9e
Removed unnecessary CONDA_PREFIX env var in installation guide (#683)
This is not needed since `conda activate stack` has already been
executed.
2024-12-23 13:17:30 -08:00
Yuan Tang
987e651755
Add missing venv option in --image-type (#677)
"venv" option is supported but not mentioned in the prompt.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2024-12-21 21:10:13 -08:00
Botao Chen
bae197c37e
Fix post training apis broken by torchtune release (#674)
There was a torchtune release this morning
(https://github.com/pytorch/torchtune/releases/tag/v0.5.0) that breaks the
post-training APIs.

## Test
Spun up the server; post training works again after the fix.
<img width="1314" alt="Screenshot 2024-12-20 at 4 08 54 PM"
src="https://github.com/user-attachments/assets/dfae724d-ebf0-4846-9715-096efa060cee"
/>


## Note
We need to think hard about how to avoid this happening again, and fast-follow
on this after the holidays.
2024-12-20 16:12:02 -08:00
Botao Chen
06cb0c837e
[torchtune integration] post training + eval (#670)
## What does this PR do?

- Add related APIs in the experimental-post-training template to enable eval
on the finetuned checkpoint in the template
- A small bug fix on meta reference eval
- A small error-handling improvement on post training


## Test Plan
From the client side, issued an E2E post-training request
(https://github.com/meta-llama/llama-stack-client-python/pull/70) and got
eval results successfully.

<img width="1315" alt="Screenshot 2024-12-20 at 12 06 59 PM"
src="https://github.com/user-attachments/assets/a09bd524-59ae-490c-908f-2e36ccf27c0a"
/>
2024-12-20 13:43:13 -08:00
Dinesh Yeduguru
c8be0bf1c9
Tools API with brave and MCP providers (#639)
This PR adds a new Tools API and two tool runtime providers: Brave Search
and MCP.

Test plan:
```
curl -X POST 'http://localhost:5000/alpha/toolgroups/register' \
-H 'Content-Type: application/json' \
-d '{ "tool_group_id": "simple_tool",
  "tool_group": {
    "type": "model_context_protocol",
    "endpoint": {"uri": "http://localhost:56000/sse"}
  },
  "provider_id": "model-context-protocol"
}'

 curl -X POST 'http://localhost:5000/alpha/toolgroups/register' \
-H 'Content-Type: application/json' \
-d '{
  "tool_group_id": "search", "provider_id": "brave-search",
  "tool_group": {
    "type": "user_defined",
    "tools": [
      {
        "name": "brave_search",
        "description": "A web search tool",
        "parameters": [
          {
            "name": "query",
            "parameter_type": "string",
            "description": "The query to search"
          }
        ],
        "metadata": {},
        "tool_prompt_format": "json"
      }
    ]
  }
}'

 curl -X GET http://localhost:5000/alpha/tools/list | jq .
[
  {
    "identifier": "brave_search",
    "provider_resource_id": "brave_search",
    "provider_id": "brave-search",
    "type": "tool",
    "tool_group": "search",
    "description": "A web search tool",
    "parameters": [
      {
        "name": "query",
        "parameter_type": "string",
        "description": "The query to search"
      }
    ],
    "metadata": {},
    "tool_prompt_format": "json"
  },
  {
    "identifier": "fetch",
    "provider_resource_id": "fetch",
    "provider_id": "model-context-protocol",
    "type": "tool",
    "tool_group": "simple_tool",
    "description": "Fetches a website and returns its content",
    "parameters": [
      {
        "name": "url",
        "parameter_type": "string",
        "description": "URL to fetch"
      }
    ],
    "metadata": {
      "endpoint": "http://localhost:56000/sse"
    },
    "tool_prompt_format": "json"
  }
]

curl -X POST 'http://localhost:5000/alpha/tool-runtime/invoke' \
-H 'Content-Type: application/json' \
-d '{
    "tool_name": "fetch",
    "args": {
        "url": "http://google.com/"
    }
}'

 curl -X POST 'http://localhost:5000/alpha/tool-runtime/invoke' \
-H 'Content-Type: application/json' -H 'X-LlamaStack-ProviderData: {"api_key": "<KEY>"}' \
-d '{
    "tool_name": "brave_search",
    "args": {
        "query": "who is meta ceo"
    }
}'
```
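The curl payloads above are the whole interface; building them programmatically is sometimes easier to read. A hypothetical Python sketch of the `brave_search` toolgroup registration body, using only the endpoint path and field names shown in the curl examples (everything else is an assumption):

```python
# Builds the same registration payload as the second curl command above.
import json

registration = {
    "tool_group_id": "search",
    "provider_id": "brave-search",
    "tool_group": {
        "type": "user_defined",
        "tools": [
            {
                "name": "brave_search",
                "description": "A web search tool",
                "parameters": [
                    {
                        "name": "query",
                        "parameter_type": "string",
                        "description": "The query to search",
                    }
                ],
                "metadata": {},
                "tool_prompt_format": "json",
            }
        ],
    },
}
body = json.dumps(registration)

# To send it (requires a running stack on localhost:5000):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:5000/alpha/toolgroups/register",
#     data=body.encode(), headers={"Content-Type": "application/json"})
# urllib.request.urlopen(req)
print(body)
```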
2024-12-19 21:25:17 -08:00
Aidan Do
17fdb47e5e
Add Llama 70B 3.3 to fireworks (#654)
# What does this PR do?

- Makes Llama 3.3 70B available on Fireworks

## Test Plan

```shell
pip install -e . \
&& llama stack build --config distributions/fireworks/build.yaml --image-type conda \
&& llama stack run distributions/fireworks/run.yaml \
  --port 5000
```

```python
# Assumes the stack built and run by the shell commands above is listening
# on port 5000.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")
response = client.inference.chat_completion(
    model_id="Llama3.3-70B-Instruct",
    messages=[
        {"role": "user", "content": "hello world"},
    ],
)
```

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-12-19 17:32:49 -08:00