Commit graph

1729 commits

Author SHA1 Message Date
Dinesh Yeduguru
6104bd06a0
feat: add different sinks for otel traces and metrics (#1731)
# What does this PR do?
Since we now start recording and exporting metrics, we no longer can use
single OTEL endpoint to export both traces and metrics. This PR adds two
sinks: OTEL_TRACE and OTEL_METRIC to be able to selectively enable the
exporters.

## Test Plan
Start server with OTEL_TRACE as sink and verify traces show up in jaeger
![Screenshot 2025-03-20 at 3 12
25 PM](https://github.com/user-attachments/assets/51007f28-b5ed-4853-912a-965a5cfe83af)
2025-03-20 15:51:41 -07:00
Hardik Shah
127bac6869
fix: Default to port 8321 everywhere (#1734)
As titled, moved all instances of 5001 to 8321
2025-03-20 15:50:41 -07:00
Hardik Shah
581e8ae562
fix: docker run with --pull always to fetch the latest image (#1733)
As titled
2025-03-20 15:35:48 -07:00
Ashwin Bharambe
f95bc29ca9
fix: handle registry errors gracefully (#1732)
We need to be able to handle stale registry entries gracefully. More
needs to be done when we are deleting important attributes from
resources which could have been persisted. But at the very least, the
server cannot die.

## Test Plan

Added unit tests
2025-03-20 15:24:07 -07:00
Yuan Tang
f5a5c5d459
docs: Add instruction on enabling tool calling for remote vLLM (#1719)
# What does this PR do?

This PR adds a link to tool calling instructions in vLLM. Users have
asked about this many times, e.g.
https://github.com/meta-llama/llama-stack/issues/1648#issuecomment-2740642077

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-03-20 15:18:17 -07:00
Ihar Hrachyshka
be03cb7523
chore: Don't hide stderr from api generator (#1720)
# What does this PR do?

If the generator fails, pre-commit logs will now show how it failed.

Note: stdout is still suppressed, so that regular informational messages
do not pollute pre-commit output when all the hook does is update
generated files.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

Inject a failure in the generator code and confirm it's seen in the
output.

```
$ git diff
diff --git a/docs/openapi_generator/pyopenapi/utility.py b/docs/openapi_generator/pyopenapi/utility.py
index f60a33bb..482e26ef 100644
--- a/docs/openapi_generator/pyopenapi/utility.py
+++ b/docs/openapi_generator/pyopenapi/utility.py
@@ -127,6 +127,7 @@ def is_optional_type(type_: Any) -> bool:

 def validate_api_method_return_types() -> List[str]:
     """Validate that all API methods have proper return types."""
+    raise NotImplementedError("This function is not implemented yet")
     errors = []
     protocols = api_protocol_map()
```

```
$ pre-commit run --all-files
check for merge conflicts................................................Passed
trim trailing whitespace.................................................Passed
check for added large files..............................................Passed
fix end of files.........................................................Passed
Insert license in comments...............................................Passed
ruff.....................................................................Passed
ruff-format..............................................................Passed
blacken-docs.............................................................Passed
uv-lock..................................................................Passed
uv-export................................................................Passed
mypy.....................................................................Passed
Distribution Template Codegen............................................Passed
API Spec Codegen.........................................................Failed
- hook id: openapi-codegen
- exit code: 1

warning: `VIRTUAL_ENV=/Users/ihrachys/.cache/pre-commit/repo9p35zuhm/py_env-python3` does not match the project environment path `.venv` and will be ignored; use `--active` to target the active environment instead
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/generate.py", line 91, in <module>
    fire.Fire(main)
  File "/Users/ihrachys/.cache/uv/archive-v0/FBgkcwcN-PaJ0NAur__7J/lib/python3.11/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/.cache/uv/archive-v0/FBgkcwcN-PaJ0NAur__7J/lib/python3.11/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/.cache/uv/archive-v0/FBgkcwcN-PaJ0NAur__7J/lib/python3.11/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/generate.py", line 44, in main
    return_type_errors = validate_api_method_return_types()
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/pyopenapi/utility.py", line 130, in validate_api_method_return_types
    raise NotImplementedError("This function is not implemented yet")
NotImplementedError: This function is not implemented yet
```

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-03-20 15:17:52 -07:00
Dinesh Yeduguru
86f617a197
fix: tracing middleware to not start for lifespan events (#1730)
# What does this PR do?
Tracing middleware should not start tracing for lifespan events.
Lifespan event happens at server startup and shutdown and if we start
tracing for them, we will have an active trace for the lifetime of the
server, which messes up with regular tracing since we always expect the
traces to be never nested.

We started hitting this issue since
https://github.com/meta-llama/llama-stack/pull/1495.

## Test Plan
* llama stack run ~/.llama/distributions/fireworks/fireworks-run.yaml
* Verify in sqlite store that the trace now has non null span id
![Screenshot 2025-03-20 at 1 49
47 PM](https://github.com/user-attachments/assets/d77354a7-d5f1-4b53-a946-6adbd7a4f772)
2025-03-20 14:22:19 -07:00
Yuan Tang
029e4fc64d
fix: Add missing gcc in container build. Fixes #1716 (#1727)
# What does this PR do?

This should fix https://github.com/meta-llama/llama-stack/issues/1716

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-03-20 15:50:56 -04:00
ehhuang
ea6a4a14ce
feat(api): simplify client imports (#1687)
# What does this PR do?
closes #1554 

## Test Plan
test_agents.py
2025-03-20 10:15:49 -07:00
Ihar Hrachyshka
515c16e352
chore: mypy violations cleanup for inline::{telemetry,tool_runtime,vector_io} (#1711)
# What does this PR do?

Clean up mypy violations for inline::{telemetry,tool_runtime,vector_io}.
This also makes API accept a tool call result without any content (like
RAG tool already may produce).

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-03-20 10:01:10 -07:00
Ihar Hrachyshka
355134f51d
fix: Support types.UnionType in schemas (#1721)
# What does this PR do?

Since Python 3.10, unions can be expressed as `type1 | type2`. Sadly,
while this is functionally equivalent to `Union[type1, type2]`, the type
of the expression is different (`types.UnionType`, not `typing.Union`).

We should handle both in schemas.

## Test Plan

Switch a schema type from Union to `|` and confirm the generator doesn't
crash with:

```
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/generate.py", line 91, in <module>
    fire.Fire(main)
  File "/Users/ihrachys/.cache/uv/archive-v0/FBgkcwcN-PaJ0NAur__7J/lib/python3.11/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/.cache/uv/archive-v0/FBgkcwcN-PaJ0NAur__7J/lib/python3.11/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/.cache/uv/archive-v0/FBgkcwcN-PaJ0NAur__7J/lib/python3.11/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/generate.py", line 55, in main
    spec = Specification(
           ^^^^^^^^^^^^^^
  File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/pyopenapi/utility.py", line 30, in __init__
    self.document = generator.generate()
                    ^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/pyopenapi/generator.py", line 782, in generate
    operation = self._build_operation(op)
                ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/pyopenapi/generator.py", line 648, in _build_operation
    "application/json": builder.build_media_type(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/pyopenapi/generator.py", line 221, in build_media_type
    schema = self.schema_builder.classdef_to_ref(item_type)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/pyopenapi/generator.py", line 135, in classdef_to_ref
    type_schema = self.classdef_to_schema(typ)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/pyopenapi/generator.py", line 116, in classdef_to_schema
    type_schema, type_definitions = self.schema_generator.classdef_to_schema(typ)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/src/llama-stack/llama_stack/strong_typing/schema.py", line 607, in classdef_to_schema
    types_defined[sub_name] = self._type_to_schema_with_lookup(sub_type)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/src/llama-stack/llama_stack/strong_typing/schema.py", line 564, in _type_to_schema_with_lookup
    type_schema = self.type_to_schema(data_type, force_expand=True)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/src/llama-stack/llama_stack/strong_typing/schema.py", line 320, in type_to_schema
    return self._type_to_schema(data_type, force_expand, json_schema_extra) | common_info
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/src/llama-stack/llama_stack/strong_typing/schema.py", line 487, in _type_to_schema
    property_docstrings = get_class_property_docstrings(typ, self.options.property_description_fun)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ihrachys/src/llama-stack/llama_stack/strong_typing/schema.py", line 94, in get_class_property_docstrings
    for base in inspect.getmro(data_type):
                ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/w2wykgpkzidnnr6cpw8wf94ghb0p8big-python3-3.11.11/lib/python3.11/inspect.py", line 731, in getmro
    return cls.__mro__
           ^^^^^^^^^^^
AttributeError: 'types.UnionType' object has no attribute '__mro__'. Did you mean: '__or__'?
```

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-03-20 09:54:02 -07:00
Ihar Hrachyshka
5403582582
fix: Restore discriminator for AlgorithmConfig (#1706) 2025-03-20 07:33:26 -07:00
ehhuang
af8b4484a3
fix: update default tool call system prompt (#1712)
# What does this PR do?
closes #1584 

This should be a rather innocuous change. 

## Test Plan

Verify that there's no more tool call parsing error for example in issue
<img width="1216" alt="image"
src="https://github.com/user-attachments/assets/a5a6f4e8-2093-4ca2-bc06-794b707a0429"
/>

LLAMA_STACK_CONFIG=fireworks pytest -s -v
tests/integration/agents/test_agents.py --safety-shield
meta-llama/Llama-Guard-3-8B --text-model
meta-llama/Llama-3.1-8B-Instruct
2025-03-19 22:49:24 -07:00
Ashwin Bharambe
01a25d9744
feat(server): add attribute based access control for resources (#1703)
This PR introduces a way to implement Attribute Based Access Control
(ABAC) for the Llama Stack server.

The rough design is:
- https://github.com/meta-llama/llama-stack/pull/1626 added a way for
the Llama Stack server to query an authenticator
- We build upon that and expect "access attributes" as part of the
response. These attributes indicate the scopes available for the
request.
- We use these attributes to perform access control for registered
resources as well as for constructing the default access control
policies for newly created resources.
- By default, if you support authentication but don't return access
attributes, we will add a unique namespace pointing to the API_KEY. That
way, all resources by default will be scoped to API_KEYs.

An important aspect of this design is that Llama Stack stays out of the
business of credential management or the CRUD for attributes. How you
manage your namespaces or projects is entirely up to you. The design
only implements access control checks for the metadata / book-keeping
information that the Stack tracks.

### Limitations

- Currently, read vs. write vs. admin permissions aren't made explicit,
but this can be easily extended by adding appropriate attributes to the
`AccessAttributes` data structure.
- This design does not apply to agent instances since they are not
considered resources the Stack knows about. Agent instances are
completely within the scope of the Agents API provider.

### Test Plan

Added unit tests, existing integration tests
2025-03-19 21:28:52 -07:00
ehhuang
c4e1b8d094
fix: better tool call parsing error message (#1710)
# What does this PR do?
context #1584

## Test Plan
<img width="1366" alt="image"
src="https://github.com/user-attachments/assets/b490b590-3270-43cb-838e-8446a8948f1d"
/>
2025-03-19 20:39:10 -07:00
Ihar Hrachyshka
41bd350539
chore: Don't set type variables from register_schema() (#1713)
# What does this PR do?

Don't set type variables from register_schema().

`mypy` is not happy about it since type variables are calculated at
runtime and hence the typing hints are not available during static
analysis.

Good news is there is no good reason to set the variables from the
return type.

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-03-19 20:29:00 -07:00
Charlie Doern
a483a58c6e
chore: deprecate /v1/inspect/providers (#1678)
# What does this PR do?

with the new /v1/providers API, /v1/inspect/providers is duplicative,
deprecate it by removing the route, and add a test for the full
/v1/providers API

resolves #1623 

## Test Plan

`uv run pytest -v tests/integration/providers --stack-config=ollama
--text-model="meta-llama/Llama-3.2-3B-Instruct"
--embedding-model=all-MiniLM-L6-v2`

<img width="1512" alt="Screenshot 2025-03-18 at 9 18 38 AM"
src="https://github.com/user-attachments/assets/2db30f25-3ff6-4374-b39d-0047f093fe36"
/>

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-03-19 20:27:06 -07:00
Charlie Doern
1f04ca357b
fix: telemetry logger (#1714)
# What does this PR do?

currently if you have a run yaml without temeletry the following error
is hit:

TypeError: TelemetryAdapter.__init__() missing 1 required positional
argument: 'deps'

this is because the TelemetryAdapter requires a deps arg to be passed.
Pass {} to avoid errors.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-03-19 20:26:13 -07:00
Botao Chen
f369871083
feat: [New Eval Benchamark] IfEval (#1708)
# What does this PR do?
In this PR, we added a new eval open benchmark IfEval based on paper
https://arxiv.org/abs/2311.07911 to measure the model capability of
instruction following.


## Test Plan
spin up a llama stack server with open-benchmark template

run `llama-stack-client --endpoint xxx eval run-benchmark
"meta-reference-ifeval" --model-id "meta-llama/Llama-3.3-70B-Instruct"
--output-dir "/home/markchen1015/" --num-examples 20` on client side and
get the eval aggregate results
2025-03-19 16:39:59 -07:00
Michael Clifford
a7008dc15d
fix: Correctly set CLI_ARGS using BUILD_PLATFORM env with llama stack… (#1702)
# What does this PR do?
This PR updates `build_container.sh` to prevent an "unknown flag" error
when using the `BUILD_PLATFORM` environment variable during `llama stack
build`.


[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

Closes #1699 


## Test Plan

Running the following code with out these changes results in an "unknown
flag" error.

```
CONTAINER_BINARY=podman BUILD_PLATFORM=linux/amd64 llama stack build --template ollama --image-type container 
``` 

With these changes, the same command should build the image correctly.

Signed-off-by: Michael Clifford <mcliffor@redhat.com>
2025-03-19 16:18:11 -07:00
ehhuang
b6b103a20d
docs: update for mcp tools (#1705)
# What does this PR do?


## Test Plan
read
2025-03-19 15:45:53 -07:00
yyymeta
d117bfe597
feat: [new open benchmark] DocVQA (#1647)
# What does this PR do?
DocVQA asks model to look a a picture, then answer a question given in
text, with a text answer by text information in the picture. these
questions often require understanding of relative positions of texts
within the picture.

original dataset is defined in the "Task1" of
https://www.docvqa.org/datasets


## Test Plan
setup llama server with 

```
llama stack run ./llama_stack/templates/open-benchmark/run.yaml
```


then send traffic:

```
 llama-stack-client eval run-benchmark "meta-reference-docvqa"  --model-id   meta-llama/Llama-3.3-70B-Instruct     --output-dir /tmp/gpqa    --num-examples   200
```
2025-03-19 14:56:14 -07:00
ehhuang
1902e5754c
fix: toolgroups unregister (#1704)
# What does this PR do?
FAILED
tests/integration/tools/test_tools.py::test_toolsgroups_unregister[None]
- AttributeError: 'coroutine' object has no attribute 'data'

## Test Plan
LLAMA_STACK_CONFIG=fireworks pytest -s -v
tests/integration/tools/test_tools.py
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/1704).
* #1705
* __->__ #1704
2025-03-19 13:43:51 -07:00
Botao Chen
ab777ef5cd
fix: fix open-benchmark template (#1695)
## What does this PR do?
open-benchmark templated is broken after the datasets api refactor due
to 2 reasons
- provider_id and provider_resource_id are no longer needed 
- the type in run.yaml will be resolved as dict

this PR is to fix the above 2 issues 

## Test 
spin up a llama stack server successfully with llama stack run
`llama_stack/templates/open-benchmark/run.yaml`
2025-03-19 11:27:11 -07:00
Xi Yan
c1d18283d2
feat(eval api): (2.2/n) delete eval / scoring / scoring_fn apis (#1700)
# What does this PR do?
- To make it easier, delete existing `eval/scoring/scoring_function`
apis. There will be a bunch of broken impls here. The sequence is:
1. migrate benchmark graders
2. clean up existing scoring functions

- Add a skeleton evaluation impl to make tests pass. 

## Test Plan
tested in following PRs

[//]: # (## Documentation)
2025-03-19 11:04:23 -07:00
Derek Higgins
6949bd1999
fix: Call pandas.read_* in a seperate thread (#1698)
These block on io reads which in turn block the
server. Move them to their own thread.

Closes: #1697

# What does this PR do?
To avoid blocking the main eventloop, updates datasetio/localfs to load
data in a seperate thread

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-03-19 10:46:37 -07:00
Hardik Shah
65ca85ba6b
fix: Updating ToolCall.arguments to allow for json strings that can be decoded on client side (#1685)
### What does this PR do?

Currently, `ToolCall.arguments` is a `Dict[str, RecursiveType]`.
However, on the client SDK side -- the `RecursiveType` gets deserialized
into a number ( both int and float get collapsed ) and hence when params
are `int` they get converted to float which might break client side
tools that might be doing type checking.

Closes: https://github.com/meta-llama/llama-stack/issues/1683

### Test Plan
Stainless changes --
https://github.com/meta-llama/llama-stack-client-python/pull/204
```
pytest -s -v --stack-config=fireworks tests/integration/agents/test_agents.py  --text-model meta-llama/Llama-3.1-8B-Instruct
```
2025-03-19 10:36:19 -07:00
ehhuang
113f3a259c
docs: add documentation for RAGDocument (#1693)
# What does this PR do?


## Test Plan
2025-03-19 10:16:00 -07:00
Francisco Arceo
5418e63919
chore: Add triagers list #1561 (#1701)
# What does this PR do?
Adds triagers list

## Closes #1561

## Documentation
Was provided here: https://github.com/meta-llama/llama-stack/pull/1621

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-03-19 09:59:17 -07:00
Xi Yan
0048274ec0 update 2025-03-19 09:49:53 -07:00
Xi Yan
42447729e4 update 2025-03-19 09:48:30 -07:00
Xi Yan
a92756a4b7 result_data in evaluation response 2025-03-18 22:09:35 -07:00
Yuan Tang
7c0448456e
docs: Remove mentions of focus on Llama models (#1690)
# What does this PR do?

This is a follow-up of
https://github.com/meta-llama/llama-stack/issues/965 to avoid mentioning
exclusive support on Llama models.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-03-19 00:17:22 -04:00
Xi Yan
08c0c5505e
feat(eval api): (2.1/n) fix resolver for benchmark routing table + fix precommit (#1691)
# What does this PR do?
- fixes routing table so that `llama stack run` works
- fixes pre-commit
- one of many fixes to address implementation fix

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
```
llama stack run
```

[//]: # (## Documentation)
2025-03-18 21:09:49 -07:00
Xi Yan
bf135f38b1 precommit 2025-03-18 20:48:03 -07:00
Xi Yan
205a50f10b openapi gen 2025-03-18 20:38:05 -07:00
Xi Yan
24d48b3692 Merge branch 'main' into eval_api_final 2025-03-18 20:17:24 -07:00
Xi Yan
913e6eb50f
Update llama_stack/apis/evaluation/evaluation.py
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-03-18 20:16:24 -07:00
Xi Yan
820b9a00c7
Update llama_stack/apis/evaluation/evaluation.py
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-03-18 20:16:15 -07:00
Xi Yan
85cad639ca
Update llama_stack/apis/evaluation/evaluation.py
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-03-18 20:16:08 -07:00
Xi Yan
d994499f09 update EvaluationTask 2025-03-18 19:30:01 -07:00
Xi Yan
f107e3229b update EvaluationTask 2025-03-18 19:28:34 -07:00
Xi Yan
5e817cd56a update 2025-03-18 18:16:00 -07:00
Xi Yan
398319fe7a agent config 2025-03-18 18:14:55 -07:00
Xi Yan
238cdc4e69 grading 2025-03-18 18:12:06 -07:00
Xi Yan
b98497ee56 docs 2025-03-18 18:10:45 -07:00
Xi Yan
e860c536da pre 2025-03-18 18:01:40 -07:00
Ashwin Bharambe
5b39d5a76a
feat(auth, rfc): Add support for Bearer (api_key) Authentication (#1626)
This PR adds support (or is a proposal for) for supporting API KEY
authentication on the Llama Stack server end. `llama-stack-client`
already supports accepting an api_key parameter and passes it down
through every request as an `Authentication: ` header.

Currently, Llama Stack does not propose APIs for handling authentication
or authorization for resources of any kind. Given that, and the fact
that any deployment will typically have _some_ authentication system
present, we simply adopt a delegation mechanism: delegate to an HTTPS
endpoint performing key management / authentication.

It is configured via: 
```yaml
server: 
   auth:
     endpoint: <...>
```

in the run.yaml configuration.


## How It Works

When authentication is enabled:

1. Every API request must include an `Authorization: Bearer <token>`
header
2. The server will send a _POST_ validation request to the configured
endpoint with the following payload:
   ```json
   {
     "api_key": "<token>",
     "request": {
       "path": "/api/path",
       "headers": { "header1": "value1", ... },
       "params": { "param1": "value1", ... }
     }
   }
   ```
3. If the authentication endpoint returns a 200 status code, the request
is allowed to proceed
4. If the authentication endpoint returns any other status code, a 401
Unauthorized response is returned

## Test Plan

Unit tests
2025-03-18 16:24:18 -07:00
yyymeta
b79e0435de
fix: avoid tensor memory error (#1688)
# What does this PR do?

we randomly get errors like the following, it's most likely due to
accessing an object that is already deallocated

```

E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732] Traceback (most recent call last):
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]   File "/home/yyy/.conda/envs/myenv/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 90, in _wrap
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]     fn(i, *args)
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]   File "/home/yyy/.conda/envs/myenv/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 611, in _wrap
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]     ret = record(fn)(*args_)
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]   File "/home/yyy/.conda/envs/myenv/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]     return f(*args, **kwargs)
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]   File "/home/yyy/internal-llama-stack/llama_stack/providers/inline/inference/meta_reference/parallel_utils.py", line 249, in worker_process_entrypoint
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]     task = req_gen.send(result)
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]   File "/home/yyy/internal-llama-stack/llama_stack/providers/inline/inference/meta_reference/parallel_utils.py", line 156, in retrieve_requests
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]     torch.distributed.broadcast_object_list(
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]   File "/home/yyy/.conda/envs/myenv/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 81, in wrapper
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]     return func(*args, **kwargs)
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]   File "/home/yyy/.conda/envs/myenv/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 3504, in broadcast_object_list
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]     object_list[i] = _tensor_to_object(obj_view, obj_size, group)
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]   File "/home/yyy/.conda/envs/myenv/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 2961, in _tensor_to_object
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]     return _unpickler(io.BytesIO(buf)).load()
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732] EOFError: Ran out of input
E0318 12:55:24.472000 1562188 site-packages/torch/distributed/elastic/multiprocessing/api.py:732]
Process SpawnProcess-1:
Traceback (most recent call last):
```

## Test Plan
start server

```
llama-stack-client eval run-benchmark mmmu_v1  --model-id meta-llama/Llama-4-17B-Omni-Instruct  --output-dir /tmp/mmmu_standard --num-examples 30
```

[//]: # (## Documentation)
2025-03-18 16:17:29 -07:00
Xi Yan
a69759613a comments 2025-03-18 15:01:41 -07:00