Commit graph

801 commits

Author SHA1 Message Date
Xi Yan
7c12cda244 llama guard 2024-12-26 18:18:01 -08:00
Xi Yan
f58e92f8d3 prompt guard 2024-12-26 18:15:55 -08:00
Xi Yan
61be406b49 scoring 2024-12-26 18:14:53 -08:00
Xi Yan
fcac7cfafa braintrust 2024-12-26 18:13:43 -08:00
Xi Yan
71d50ab368 telemetry & sample 2024-12-26 18:12:51 -08:00
Xi Yan
c4b9b3cb52 huggingface 2024-12-26 18:11:10 -08:00
Xi Yan
d40e527471 bedrock 2024-12-26 18:10:23 -08:00
Xi Yan
28428c320a databricks 2024-12-26 18:08:50 -08:00
Xi Yan
6f7f02fbad fireworks 2024-12-26 18:08:08 -08:00
Xi Yan
f97638a323 ollama import remove 2024-12-26 18:07:18 -08:00
Xi Yan
165777a181 impls imports remove 2024-12-26 18:05:19 -08:00
Xi Yan
b641902bfa impls imports remove 2024-12-26 18:01:45 -08:00
Xi Yan
c1ef055f39 test prompt adapter 2024-12-26 17:49:17 -08:00
Xi Yan
2fe4acd64d text inference 2024-12-26 17:45:25 -08:00
Xi Yan
16cfe1014e vision inference 2024-12-26 17:31:42 -08:00
Xi Yan
3b1f20ac00 memory tests fix 2024-12-26 17:27:01 -08:00
Xi Yan
3f86c19150 builds 2024-12-26 17:21:23 -08:00
Xi Yan
8a8550fe9b cli imports 2024-12-26 17:19:40 -08:00
Xi Yan
21a6bd57ea fix imports 2024-12-26 17:17:03 -08:00
Xi Yan
c6d3fc6fb6 datatypes 2024-12-26 17:00:56 -08:00
Xi Yan
6c6b5fb091 openai_compat 2024-12-26 16:59:06 -08:00
Xi Yan
9ab0730294 kvstore 2024-12-26 16:55:40 -08:00
Xi Yan
30fee82407 vector_store 2024-12-26 16:54:33 -08:00
Xi Yan
b7bc1c6297 telemetry 2024-12-26 16:48:54 -08:00
Xi Yan
bb0a3f5c8e remove more imports 2024-12-26 16:43:30 -08:00
Xi Yan
93ed8aa814 remove more imports 2024-12-26 16:39:31 -08:00
Xi Yan
0a0c01fbc2 test agents imports 2024-12-26 16:32:23 -08:00
Xi Yan
9bdb7236b2 Merge branch 'main' into remove_import_stars 2024-12-26 15:50:12 -08:00
Xi Yan
88c967a3e2 fix client-sdk memory/safety test 2024-12-26 15:49:15 -08:00
Xi Yan
b05d8fd956 fix client-sdk agents/inference test 2024-12-26 15:49:14 -08:00
Xi Yan
19c99e36a0 update playground doc video 2024-12-26 15:49:14 -08:00
Xi Yan
70db039ff4 fix client-sdk memory/safety test 2024-12-26 15:48:28 -08:00
Xi Yan
b6aca4c8bb fix client-sdk agents/inference test 2024-12-26 15:44:34 -08:00
Xi Yan
da26d22f90 remove imports 1/n 2024-12-26 15:19:06 -08:00
Xi Yan
4e1d0a2fc5 update playground doc video 2024-12-26 14:50:19 -08:00
Xi Yan
28ce511986 fix --endpoint docs 2024-12-26 14:32:07 -08:00
Ikko Eltociear Ashimine
7ba95a8e74
docs: update evals_reference/index.md (#675)
# What does this PR do?

minor fix

## Before submitting

- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-12-26 11:32:37 -08:00
Aidan Do
21fb92d7cf
Add 3.3 70B to Ollama inference provider (#681)
# What does this PR do?

Adds Llama 3.3 70B support to the Ollama inference provider

## Test Plan

<details>
<summary>Manual</summary>

```bash
# 42GB to download
ollama pull llama3.3:70b

ollama run llama3.3:70b --keepalive 60m

export LLAMA_STACK_PORT=5000
pip install -e . \
  && llama stack build --template ollama --image-type conda \
  && llama stack run ./distributions/ollama/run.yaml \
  --port $LLAMA_STACK_PORT \
  --env INFERENCE_MODEL=Llama3.3-70B-Instruct \
  --env OLLAMA_URL=http://localhost:11434

export LLAMA_STACK_PORT=5000
llama-stack-client --endpoint http://localhost:$LLAMA_STACK_PORT \
  inference chat-completion \
  --model-id Llama3.3-70B-Instruct \
  --message "hello, what model are you?"
```

<img width="1221" alt="image"
src="https://github.com/user-attachments/assets/dcffbdd9-94c8-4d47-9f95-4ef6c3756294"
/>

</details>

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-12-25 22:15:58 -08:00
Yuan Tang
fa371fdc9e
Removed unnecessary CONDA_PREFIX env var in installation guide (#683)
This is not needed since `conda activate stack` has already been
executed.
2024-12-23 13:17:30 -08:00
Yuan Tang
987e651755
Add missing venv option in --image-type (#677)
The "venv" option is supported but not mentioned in the prompt.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2024-12-21 21:10:13 -08:00
Botao Chen
bae197c37e
Fix post training apis broken by torchtune release (#674)
There was a torchtune release this morning
(https://github.com/pytorch/torchtune/releases/tag/v0.5.0) that breaks the
post-training APIs.

## Test
Spun up the server; post training works again after the fix:
<img width="1314" alt="Screenshot 2024-12-20 at 4 08 54 PM"
src="https://github.com/user-attachments/assets/dfae724d-ebf0-4846-9715-096efa060cee"
/>


## Note
We need to think hard about how to avoid this happening again and
fast-follow on this after the holidays.
2024-12-20 16:12:02 -08:00
Botao Chen
06cb0c837e
[torchtune integration] post training + eval (#670)
## What does this PR do?

- Add the related APIs to the experimental-post-training template to enable
eval on the finetuned checkpoint in the template
- A small bug fix on meta reference eval
- A small error-handling improvement on post training


## Test Plan
From the client side, issued an E2E post-training request
(https://github.com/meta-llama/llama-stack-client-python/pull/70) and got
eval results successfully

<img width="1315" alt="Screenshot 2024-12-20 at 12 06 59 PM"
src="https://github.com/user-attachments/assets/a09bd524-59ae-490c-908f-2e36ccf27c0a"
/>
2024-12-20 13:43:13 -08:00
Dinesh Yeduguru
c8be0bf1c9
Tools API with brave and MCP providers (#639)
This PR adds a new Tools API and two tool runtime providers: Brave
and MCP.

Test plan:
```
curl -X POST 'http://localhost:5000/alpha/toolgroups/register' \
-H 'Content-Type: application/json' \
-d '{ "tool_group_id": "simple_tool",
  "tool_group": {
    "type": "model_context_protocol",
    "endpoint": {"uri": "http://localhost:56000/sse"}
  },
  "provider_id": "model-context-protocol"
}'

 curl -X POST 'http://localhost:5000/alpha/toolgroups/register' \
-H 'Content-Type: application/json' \
-d '{
  "tool_group_id": "search", "provider_id": "brave-search",
  "tool_group": {
    "type": "user_defined",
    "tools": [
      {
        "name": "brave_search",
        "description": "A web search tool",
        "parameters": [
          {
            "name": "query",
            "parameter_type": "string",
            "description": "The query to search"
          }
        ],
        "metadata": {},
        "tool_prompt_format": "json"
      }
    ]
  }
}'

 curl -X GET http://localhost:5000/alpha/tools/list | jq .
[
  {
    "identifier": "brave_search",
    "provider_resource_id": "brave_search",
    "provider_id": "brave-search",
    "type": "tool",
    "tool_group": "search",
    "description": "A web search tool",
    "parameters": [
      {
        "name": "query",
        "parameter_type": "string",
        "description": "The query to search"
      }
    ],
    "metadata": {},
    "tool_prompt_format": "json"
  },
  {
    "identifier": "fetch",
    "provider_resource_id": "fetch",
    "provider_id": "model-context-protocol",
    "type": "tool",
    "tool_group": "simple_tool",
    "description": "Fetches a website and returns its content",
    "parameters": [
      {
        "name": "url",
        "parameter_type": "string",
        "description": "URL to fetch"
      }
    ],
    "metadata": {
      "endpoint": "http://localhost:56000/sse"
    },
    "tool_prompt_format": "json"
  }
]

curl -X POST 'http://localhost:5000/alpha/tool-runtime/invoke' \
-H 'Content-Type: application/json' \
-d '{
    "tool_name": "fetch",
    "args": {
        "url": "http://google.com/"
    }
}'

 curl -X POST 'http://localhost:5000/alpha/tool-runtime/invoke' \
-H 'Content-Type: application/json' -H 'X-LlamaStack-ProviderData: {"api_key": "<KEY>"}' \
-d '{
    "tool_name": "brave_search",
    "args": {
        "query": "who is meta ceo"
    }
}'
```
2024-12-19 21:25:17 -08:00
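For reference, the MCP toolgroup registration body from the curl example above can be built in Python as well. This is a sketch of the request payload only; the URI, IDs, and field names are copied from the test plan, not from a documented client API:

```python
import json

# Same toolgroup registration body as the first curl example above; the
# endpoint URI, tool_group_id, and provider_id are copied from the test plan.
mcp_toolgroup = {
    "tool_group_id": "simple_tool",
    "tool_group": {
        "type": "model_context_protocol",
        "endpoint": {"uri": "http://localhost:56000/sse"},
    },
    "provider_id": "model-context-protocol",
}

# Serialize for a POST to /alpha/toolgroups/register
body = json.dumps(mcp_toolgroup)
print(body)
```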
Aidan Do
17fdb47e5e
Add Llama 70B 3.3 to fireworks (#654)
# What does this PR do?

- Makes Llama 3.3 70B available on Fireworks

## Test Plan

```shell
pip install -e . \
&& llama stack build --config distributions/fireworks/build.yaml --image-type conda \
&& llama stack run distributions/fireworks/run.yaml \
  --port 5000
```

```python
# Assumes the llama-stack-client package and a stack running on port 5000
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")
response = client.inference.chat_completion(
    model_id="Llama3.3-70B-Instruct",
    messages=[
        {"role": "user", "content": "hello world"},
    ],
)
```

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-12-19 17:32:49 -08:00
Dinesh Yeduguru
8b8d1c1ef4
fix trace starting in library client (#655)
# What does this PR do?

Because of the way the library client sets up async IO boundaries, tracing
was broken with streaming. This PR fixes tracing to start in the right
place so that the lifetime of async generator functions is captured
correctly.

Test plan:
Script ran:
https://gist.github.com/yanxi0830/f6645129e55ab12de3cd6ec71564c69e

Before: No spans returned for a session


Now: We see spans
<img width="1678" alt="Screenshot 2024-12-18 at 9 50 46 PM"
src="https://github.com/user-attachments/assets/58a3b0dd-a41c-489a-b89a-075e698a2c03"
/>
2024-12-19 16:13:52 -08:00
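As a generic illustration of the underlying issue (not the actual fix in this PR), a span that should cover an async generator's lifetime has to start before the first item is yielded and end in a `finally`, otherwise the span closes while the stream is still producing. A minimal sketch with a stand-in `Span` class:

```python
import asyncio

events = []

class Span:
    """Stand-in for a tracing span; records start/end into `events`."""
    def __init__(self, name):
        self.name = name
    def start(self):
        events.append(f"start:{self.name}")
    def end(self):
        events.append(f"end:{self.name}")

async def traced_async_gen(span, agen):
    # Start the span before the first item is produced and end it only when
    # the generator is exhausted or closed, so the span covers the
    # generator's full lifetime rather than just its creation.
    span.start()
    try:
        async for item in agen:
            yield item
    finally:
        span.end()

async def numbers():
    for i in range(3):
        yield i

async def main():
    async for n in traced_async_gen(Span("stream"), numbers()):
        events.append(n)

asyncio.run(main())
print(events)  # ['start:stream', 0, 1, 2, 'end:stream']
```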
cdgamarose-nv
ddf37ea467
Fixed imports for inference (#661)
# What does this PR do?

- [x] Addresses issue (#issue)
```
    from .nvidia import NVIDIAInferenceAdapter
  File "/localhome/local-cdgamarose/llama-stack/llama_stack/providers/remote/inference/nvidia/nvidia.py", line 37, in <module>
    from .openai_utils import (
  File "/localhome/local-cdgamarose/llama-stack/llama_stack/providers/remote/inference/nvidia/openai_utils.py", line 11, in <module>
    from llama_models.llama3.api.datatypes import (
ImportError: cannot import name 'CompletionMessage' from 'llama_models.llama3.api.datatypes' (/localhome/local-cdgamarose/.local/lib/python3.10/site-packages/llama_models/llama3/api/datatypes.py)
++ error_handler 62
```

## Test Plan
Deploy NIM using docker from
https://build.nvidia.com/meta/llama-3_1-8b-instruct?snippet_tab=Docker
```
(lsmyenv) local-cdgamarose@a4u8g-0006:~/llama-stack$ python3 -m pytest -s -v --providers inference=nvidia llama_stack/providers/tests/inference/ --env NVIDIA_BASE_URL=http://localhost:8000 -k test_completion --inference-model Llama3.1-8B-Instruct
======================================================================================== test session starts =========================================================================================
platform linux -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /localhome/local-cdgamarose/anaconda3/envs/lsmyenv/bin/python3
cachedir: .pytest_cache
rootdir: /localhome/local-cdgamarose/llama-stack
configfile: pyproject.toml
plugins: anyio-4.7.0, asyncio-0.25.0
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None
collected 24 items / 21 deselected / 3 selected

llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion[-nvidia] Initializing NVIDIAInferenceAdapter(http://localhost:8000)...
Checking NVIDIA NIM health...
Checking NVIDIA NIM health...
PASSED
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_logprobs[-nvidia] SKIPPED (Other inference providers don't support completion() yet)
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output[-nvidia] SKIPPED (This test is not quite robust)

====================================================================== 1 passed, 2 skipped, 21 deselected, 2 warnings in 1.57s =======================================================================
```

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.
2024-12-19 14:19:36 -08:00
Ashwin Bharambe
540fc4d717
Fix Meta reference GPU implementation (#663)
By performing in-place mutations, we lost. Never in life do that.
2024-12-19 14:09:45 -08:00
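The commit message is terse; as a generic illustration of the pitfall (not the actual code in the Meta reference implementation), mutating a shared structure in place leaks state back to the caller, while working on a copy does not:

```python
def build_prompt_inplace(messages):
    # BUG: appends to the caller's list, so the caller (and any repeated
    # call) sees state it never added.
    messages.append({"role": "system", "content": "header"})
    return messages

def build_prompt_copy(messages):
    # Fix: copy first, mutate the copy, leave the caller's list alone.
    msgs = list(messages)
    msgs.append({"role": "system", "content": "header"})
    return msgs

mutated = [{"role": "user", "content": "hi"}]
build_prompt_inplace(mutated)

intact = [{"role": "user", "content": "hi"}]
build_prompt_copy(intact)

print(len(mutated), len(intact))  # 2 1 -- the in-place call corrupted its input
```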
Ashwin Bharambe
f19eb8eee3 Update types in parallel_utils for meta-reference-gpu impl 2024-12-19 13:58:41 -08:00
Vladimir Ivic
b33086d632 Adding @vladimirivic to the owners file 2024-12-19 13:22:10 -08:00
Xi Yan
5be2ea37b1 fix context_retriever model->model_id 2024-12-19 12:52:00 -08:00