Commit graph

362 commits

Author SHA1 Message Date
Xi Yan
bc0cd07008 Merge branch 'main' into eval_api_final 2025-03-26 12:29:45 -07:00
Hardik Shah
cb2a9784ab
fix: multiple issues with getting_started notebook (#1795)
Fixes multiple issues 

1. llama stack build of dependencies was breaking with incompatible
numpy / pandas when importing datasets

Moved the notebook to start a local server instead of using library as a
client. This way the setup is cleaner since its all contained and by
using `uv run --with` we can test both the server setup process too in
CI and release time.

2. The change to [1] surfaced some other issues 
- running `llama stack run` was defaulting to conda env name 
- provider data was not being managed properly 
- Some notebook cells (telemetry for evals) were not updated with latest
changes

Fixed all the issues and update the notebook. 

### Test 

1. Manually run it all in local env 
2. `pytest -v -s --nbval-lax docs/getting_started.ipynb`
2025-03-26 10:59:12 -07:00
Ihar Hrachyshka
367c08f01e
feat(api): don't return a payload on file delete (#1640)
# What does this PR do?

This is to stay consistent with other APIs.

This change registers files in API, even though there are still no
providers. Removing tests that require a provider existing for a merged
API to enable it in API layer.

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-03-25 17:12:36 -07:00
Xi Yan
97e7717c9b fix precommit 2025-03-23 16:42:50 -07:00
Xi Yan
3f8c7a584a precommit 2025-03-23 16:00:48 -07:00
Xi Yan
45f6d5cd08 remove ScoringFn 2025-03-23 15:57:48 -07:00
Xi Yan
a54d757ade merge 2025-03-23 15:48:14 -07:00
ehhuang
06788643b3
feat(telemetry): clean up spans (#1760) 2025-03-21 20:05:11 -07:00
Dinesh Yeduguru
5eb15684b4
feat: use same trace ids in stack and otel (#1759)
# What does this PR do?
1) Uses otel compatible id generation for stack
2) Stack starts returning trace id info in the header of response
3) We inject the same trace id that we have into otel in order to force
it to use our trace ids.

## Test Plan
```
 curl -i --request POST \
  --url http://localhost:8321/v1/inference/chat-completion \
  --header 'content-type: application/json' \
  --data '{
  "model_id": "meta-llama/Llama-3.1-70B-Instruct",
  "messages": [
    {
      "role": "user",
      "content": {
        "type": "text",
        "text": "where do humans live"
      }
    }
  ],
  "stream": false
}'
HTTP/1.1 200 OK
date: Fri, 21 Mar 2025 21:51:19 GMT
server: uvicorn
content-length: 1712
content-type: application/json
x-trace-id: 595101ede31ece116ebe35b26d67e8cf

{"metrics":[{"metric":"prompt_tokens","value":10,"unit":null},{"metric":"completion_tokens","value":320,"unit":null},{"metric":"total_tokens","value":330,"unit":null}],"completion_message":{"role":"assistant","content":"Humans live on the planet Earth, specifically on its landmasses and in its oceans. Here's a breakdown of where humans live:\n\n1. **Continents:** Humans inhabit all seven continents:\n\t* Africa\n\t* Antarctica ( temporary residents, mostly scientists and researchers)\n\t* Asia\n\t* Australia\n\t* Europe\n\t* North America\n\t* South America\n2. **Countries:** There are 196 countries recognized by the United Nations, and humans live in almost all of them.\n3. **Cities and towns:** Many humans live in urban areas, such as cities and towns, which are often located near coastlines, rivers, or other bodies of water.\n4. **Rural areas:** Some humans live in rural areas, such as villages, farms, and countryside.\n5. **Islands:** Humans inhabit many islands around the world, including tropical islands, island nations, and islands in the Arctic and Antarctic regions.\n6. **Underwater habitats:** A few humans live in underwater habitats, such as research stations and submarines.\n7. **Space:** A small number of humans have lived in space, including astronauts on the International Space Station and those who have visited the Moon.\n\nIn terms of specific environments, humans live in a wide range of ecosystems, including:\n\n* Deserts\n* Forests\n* Grasslands\n* Mountains\n* Oceans\n* Rivers\n* Tundras\n* Wetlands\n\nOverall, humans are incredibly adaptable and can be found living in almost every corner of the globe.","stop_reason":"end_of_turn","tool_calls":[]},"logprobs":null}
```

Same trace id in Jaeger and sqlite:

![Screenshot 2025-03-21 at 2 51
53 PM](https://github.com/user-attachments/assets/38cc04b0-568c-4b9d-bccd-d3b90e581c27)
![Screenshot 2025-03-21 at 2 52
38 PM](https://github.com/user-attachments/assets/722383ad-6305-4020-8a1c-6cfdf381c25f)
2025-03-21 15:41:26 -07:00
Xi Yan
baf68c665c
fix: fix jobs api literal return type (#1757)
# What does this PR do?

- We cannot directly return a literal type

> Note: this is not final jobs API change

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
<img width="837" alt="image"
src="https://github.com/user-attachments/assets/18a17561-35f9-443d-987d-54afdd6ff40c"
/>


[//]: # (## Documentation)
2025-03-21 14:04:21 -07:00
ehhuang
34f89bfbd6
feat(telemetry): use zero-width space to avoid clutter (#1754)
# What does this PR do?
Before 
<img width="858" alt="image"
src="https://github.com/user-attachments/assets/6cefb1ae-5603-4818-85ea-a0c337b986bc"
/>

Note the redundant 'llama-stack' in front of every span

## Test Plan
<img width="1171" alt="image"
src="https://github.com/user-attachments/assets/bdc5fd5b-ff1f-4f10-8b40-cff2ea93dd1f"
/>
2025-03-21 12:02:10 -07:00
ehhuang
f76550ce4e
feat(telemetry): normalize path (#1739)
# What does this PR do?
This will prevent 'operations' from being flooded 
<img width="401" alt="image"
src="https://github.com/user-attachments/assets/c95e0eeb-4a10-4003-88df-9bb6d0a548cd"
/>


Before
<img width="1049" alt="image"
src="https://github.com/user-attachments/assets/157fb614-e007-4cb3-a571-226e50525bfa"
/>


## Test Plan
After
<img width="811" alt="image"
src="https://github.com/user-attachments/assets/b2b10344-1d73-44e5-abee-a9f039090963"
/>
2025-03-21 10:17:43 -07:00
Ashwin Bharambe
03b5c61bfc
feat: make sure agent sessions are under access control (#1737)
This builds on top of #1703.

Agent sessions are now properly access controlled.

## Test Plan

Added unit tests
2025-03-21 07:31:16 -07:00
Ashwin Bharambe
f95bc29ca9
fix: handle registry errors gracefully (#1732)
We need to be able to handle stale registry entries gracefully. More
needs to be done when we are deleting important attributes from
resources which could have been persisted. But at the very least, the
server cannot die.

## Test Plan

Added unit tests
2025-03-20 15:24:07 -07:00
Dinesh Yeduguru
86f617a197
fix: tracing middleware to not start for lifespan events (#1730)
# What does this PR do?
Tracing middleware should not start tracing for lifespan events.
Lifespan event happens at server startup and shutdown and if we start
tracing for them, we will have an active trace for the lifetime of the
server, which messes up with regular tracing since we always expect the
traces to be never nested.

We started hitting this issue since
https://github.com/meta-llama/llama-stack/pull/1495.

## Test Plan
* llama stack run ~/.llama/distributions/fireworks/fireworks-run.yaml
* Verify in sqlite store that the trace now has non null span id
![Screenshot 2025-03-20 at 1 49
47 PM](https://github.com/user-attachments/assets/d77354a7-d5f1-4b53-a946-6adbd7a4f772)
2025-03-20 14:22:19 -07:00
Yuan Tang
029e4fc64d
fix: Add missing gcc in container build. Fixes #1716 (#1727)
# What does this PR do?

This should fix https://github.com/meta-llama/llama-stack/issues/1716

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-03-20 15:50:56 -04:00
ehhuang
ea6a4a14ce
feat(api): simplify client imports (#1687)
# What does this PR do?
closes #1554 

## Test Plan
test_agents.py
2025-03-20 10:15:49 -07:00
Ashwin Bharambe
01a25d9744
feat(server): add attribute based access control for resources (#1703)
This PR introduces a way to implement Attribute Based Access Control
(ABAC) for the Llama Stack server.

The rough design is:
- https://github.com/meta-llama/llama-stack/pull/1626 added a way for
the Llama Stack server to query an authenticator
- We build upon that and expect "access attributes" as part of the
response. These attributes indicate the scopes available for the
request.
- We use these attributes to perform access control for registered
resources as well as for constructing the default access control
policies for newly created resources.
- By default, if you support authentication but don't return access
attributes, we will add a unique namespace pointing to the API_KEY. That
way, all resources by default will be scoped to API_KEYs.

An important aspect of this design is that Llama Stack stays out of the
business of credential management or the CRUD for attributes. How you
manage your namespaces or projects is entirely up to you. The design
only implements access control checks for the metadata / book-keeping
information that the Stack tracks.

### Limitations

- Currently, read vs. write vs. admin permissions aren't made explicit,
but this can be easily extended by adding appropriate attributes to the
`AccessAttributes` data structure.
- This design does not apply to agent instances since they are not
considered resources the Stack knows about. Agent instances are
completely within the scope of the Agents API provider.

### Test Plan

Added unit tests, existing integration tests
2025-03-19 21:28:52 -07:00
Charlie Doern
a483a58c6e
chore: deprecate /v1/inspect/providers (#1678)
# What does this PR do?

with the new /v1/providers API, /v1/inspect/providers is duplicative,
deprecate it by removing the route, and add a test for the full
/v1/providers API

resolves #1623 

## Test Plan

`uv run pytest -v tests/integration/providers --stack-config=ollama
--text-model="meta-llama/Llama-3.2-3B-Instruct"
--embedding-model=all-MiniLM-L6-v2`

<img width="1512" alt="Screenshot 2025-03-18 at 9 18 38 AM"
src="https://github.com/user-attachments/assets/2db30f25-3ff6-4374-b39d-0047f093fe36"
/>

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-03-19 20:27:06 -07:00
Charlie Doern
1f04ca357b
fix: telemetry logger (#1714)
# What does this PR do?

currently if you have a run yaml without temeletry the following error
is hit:

TypeError: TelemetryAdapter.__init__() missing 1 required positional
argument: 'deps'

this is because the TelemetryAdapter requires a deps arg to be passed.
Pass {} to avoid errors.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-03-19 20:26:13 -07:00
Michael Clifford
a7008dc15d
fix: Correctly set CLI_ARGS using BUILD_PLATFORM env with llama stack… (#1702)
# What does this PR do?
This PR updates `build_container.sh` to prevent an "unknown flag" error
when using the `BUILD_PLATFORM` environment variable during `llama stack
build`.


[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

Closes #1699 


## Test Plan

Running the following code with out these changes results in an "unknown
flag" error.

```
CONTAINER_BINARY=podman BUILD_PLATFORM=linux/amd64 llama stack build --template ollama --image-type container 
``` 

With these changes, the same command should build the image correctly.

Signed-off-by: Michael Clifford <mcliffor@redhat.com>
2025-03-19 16:18:11 -07:00
ehhuang
1902e5754c
fix: toolgroups unregister (#1704)
# What does this PR do?
FAILED
tests/integration/tools/test_tools.py::test_toolsgroups_unregister[None]
- AttributeError: 'coroutine' object has no attribute 'data'

## Test Plan
LLAMA_STACK_CONFIG=fireworks pytest -s -v
tests/integration/tools/test_tools.py
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/1704).
* #1705
* __->__ #1704
2025-03-19 13:43:51 -07:00
Botao Chen
ab777ef5cd
fix: fix open-benchmark template (#1695)
## What does this PR do?
open-benchmark templated is broken after the datasets api refactor due
to 2 reasons
- provider_id and provider_resource_id are no longer needed 
- the type in run.yaml will be resolved as dict

this PR is to fix the above 2 issues 

## Test 
spin up a llama stack server successfully with llama stack run
`llama_stack/templates/open-benchmark/run.yaml`
2025-03-19 11:27:11 -07:00
Xi Yan
c1d18283d2
feat(eval api): (2.2/n) delete eval / scoring / scoring_fn apis (#1700)
# What does this PR do?
- To make it easier, delete existing `eval/scoring/scoring_function`
apis. There will be a bunch of broken impls here. The sequence is:
1. migrate benchmark graders
2. clean up existing scoring functions

- Add a skeleton evaluation impl to make tests pass. 

## Test Plan
tested in following PRs

[//]: # (## Documentation)
2025-03-19 11:04:23 -07:00
Xi Yan
08c0c5505e
feat(eval api): (2.1/n) fix resolver for benchmark routing table + fix precommit (#1691)
# What does this PR do?
- fixes routing table so that `llama stack run` works
- fixes pre-commit
- one of many fixes to address implementation fix

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
```
llama stack run
```

[//]: # (## Documentation)
2025-03-18 21:09:49 -07:00
Xi Yan
24d48b3692 Merge branch 'main' into eval_api_final 2025-03-18 20:17:24 -07:00
Xi Yan
e860c536da pre 2025-03-18 18:01:40 -07:00
Ashwin Bharambe
5b39d5a76a
feat(auth, rfc): Add support for Bearer (api_key) Authentication (#1626)
This PR adds support (or is a proposal for) for supporting API KEY
authentication on the Llama Stack server end. `llama-stack-client`
already supports accepting an api_key parameter and passes it down
through every request as an `Authentication: ` header.

Currently, Llama Stack does not propose APIs for handling authentication
or authorization for resources of any kind. Given that, and the fact
that any deployment will typically have _some_ authentication system
present, we simply adopt a delegation mechanism: delegate to an HTTPS
endpoint performing key management / authentication.

It is configured via: 
```yaml
server: 
   auth:
     endpoint: <...>
```

in the run.yaml configuration.


## How It Works

When authentication is enabled:

1. Every API request must include an `Authorization: Bearer <token>`
header
2. The server will send a _POST_ validation request to the configured
endpoint with the following payload:
   ```json
   {
     "api_key": "<token>",
     "request": {
       "path": "/api/path",
       "headers": { "header1": "value1", ... },
       "params": { "param1": "value1", ... }
     }
   }
   ```
3. If the authentication endpoint returns a 200 status code, the request
is allowed to proceed
4. If the authentication endpoint returns any other status code, a 401
Unauthorized response is returned

## Test Plan

Unit tests
2025-03-18 16:24:18 -07:00
Xi Yan
a69759613a comments 2025-03-18 15:01:41 -07:00
Sarthak Deshpande
9c8e88ea9c
fix: Fixed import errors for UI and playground (#1666)
# What does this PR do?
Fixed import errors for playground and ui

---------

Co-authored-by: sarthakdeshpande <sarthak.deshpande@engati.com>
2025-03-18 15:00:48 -07:00
Xi Yan
46f2ba5910 Merge branch 'main' into eval_api_final 2025-03-18 14:49:57 -07:00
Sébastien Han
c029fbcd13
fix: return 4xx for non-existent resources in GET requests (#1635)
# What does this PR do?

- Removed Optional return types for GET methods
- Raised ValueError when requested resource is not found
- Ensures proper 4xx response for missing resources
- Updated the API generator to check for wrong signatures

```
$ uv run --with ".[dev]" ./docs/openapi_generator/run_openapi_generator.sh
Validating API method return types...

API Method Return Type Validation Errors:

Method ScoringFunctions.get_scoring_function returns Optional type
```

Closes: https://github.com/meta-llama/llama-stack/issues/1630

## Test Plan

Run the server then:

```
curl http://127.0.0.1:8321/v1/models/foo     
{"detail":"Invalid value: Model 'foo' not found"}%  
```

Server log:

```
INFO:     127.0.0.1:52307 - "GET /v1/models/foo HTTP/1.1" 400 Bad Request
09:51:42.654 [END] /v1/models/foo [StatusCode.OK] (134.65ms)
 09:51:42.651 [ERROR] Error executing endpoint route='/v1/models/{model_id:path}' method='get'
Traceback (most recent call last):
  File "/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py", line 193, in endpoint
    return await maybe_await(value)
  File "/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py", line 156, in maybe_await
    return await value
  File "/Users/leseb/Documents/AI/llama-stack/llama_stack/providers/utils/telemetry/trace_protocol.py", line 102, in async_wrapper
    result = await method(self, *args, **kwargs)
  File "/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/routers/routing_tables.py", line 217, in get_model
    raise ValueError(f"Model '{model_id}' not found")
ValueError: Model 'foo' not found
```

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-03-18 14:06:53 -07:00
Jamie Land
f4dc290705
feat: Created Playground Containerfile and Image Workflow (#1256)
# What does this PR do?
Adds a container file that can be used to build the playground UI.

This file will be built by this PR in the stack-ops repo:
https://github.com/meta-llama/llama-stack-ops/pull/9

Docker command in the docs will need to change once I know the address
of the official repository.

## Test Plan

Tested image on my local Openshift Instance using this helm chart:
https://github.com/Jaland/llama-stack-helm/tree/main/llama-stack

[//]: # (## Documentation)

---------

Co-authored-by: Jamie Land <hokie10@gmail.com>
2025-03-18 09:26:49 -07:00
Xi Yan
66cd83fb58 Merge branch 'main' into eval_api_final 2025-03-17 17:00:30 -07:00
Xi Yan
5287b437ae
feat(api): (1/n) datasets api clean up (#1573)
## PR Stack
- https://github.com/meta-llama/llama-stack/pull/1573
- https://github.com/meta-llama/llama-stack/pull/1625
- https://github.com/meta-llama/llama-stack/pull/1656
- https://github.com/meta-llama/llama-stack/pull/1657
- https://github.com/meta-llama/llama-stack/pull/1658
- https://github.com/meta-llama/llama-stack/pull/1659
- https://github.com/meta-llama/llama-stack/pull/1660

**Client SDK**
- https://github.com/meta-llama/llama-stack-client-python/pull/203

**CI**
- 1391130488
<img width="1042" alt="image"
src="https://github.com/user-attachments/assets/69636067-376d-436b-9204-896e2dd490ca"
/>
-- the test_rag_agent_with_attachments is flaky and not related to this
PR

## Doc
<img width="789" alt="image"
src="https://github.com/user-attachments/assets/b88390f3-73d6-4483-b09a-a192064e32d9"
/>


## Client Usage
```python
client.datasets.register(
    source={
        "type": "uri",
        "uri": "lsfs://mydata.jsonl",
    },
    schema="jsonl_messages",
    # optional 
    dataset_id="my_first_train_data"
)

# quick prototype debugging
client.datasets.register(
    data_reference={
        "type": "rows",
        "rows": [
                "messages": [...],
        ],
    },
    schema="jsonl_messages",
)
```

## Test Plan
- CI:
1387805545

```
LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/datasets/test_datasets.py
```

```
LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/scoring/test_scoring.py
```

```
pytest -v -s --nbval-lax ./docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb
```
2025-03-17 16:55:45 -07:00
Sébastien Han
24fd06879e
refactor: simplify command execution and remove PTY handling (#1641)
# What does this PR do?

A PTY is unnecessary for interactive mode since `subprocess.run()`
already inherits the calling terminal’s stdin, stdout, and stderr,
allowing natural interaction. Using a PTY can introduce unwanted side
effects like buffering issues and inconsistent signal handling. Standard
input/output is sufficient for most interactive programs.

This commit simplifies the command execution by:

1. Removing PTY-based execution in favor of direct subprocess handling
2. Consolidating command execution into a single run_command function
3. Improving error handling with specific subprocess error types
4. Adding proper type hints and documentation
5. Maintaining Ctrl+C handling for graceful interruption

## Test Plan

```
llama stack run
```

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-03-17 15:03:14 -07:00
Nathan Weinberg
e48af78b76
fix: add shutdown method for ProviderImpl (#1670)
# What does this PR do?
Currently there is no shutdown method implemented for the `ProviderImpl`
class

This leads to the following warning
```shell
INFO:     Waiting for application shutdown.
INFO     2025-03-17 17:25:13,280 __main__:145 server: Shutting down                                                     
INFO     2025-03-17 17:25:13,282 __main__:129 server: Shutting down ModelsRoutingTable                                  
INFO     2025-03-17 17:25:13,284 __main__:129 server: Shutting down DatasetsRoutingTable                                
INFO     2025-03-17 17:25:13,286 __main__:129 server: Shutting down DatasetIORouter                                     
INFO     2025-03-17 17:25:13,287 __main__:129 server: Shutting down TelemetryAdapter                                    
INFO     2025-03-17 17:25:13,288 __main__:129 server: Shutting down InferenceRouter                                     
INFO     2025-03-17 17:25:13,290 __main__:129 server: Shutting down ShieldsRoutingTable                                 
INFO     2025-03-17 17:25:13,291 __main__:129 server: Shutting down SafetyRouter                                        
INFO     2025-03-17 17:25:13,292 __main__:129 server: Shutting down VectorDBsRoutingTable                               
INFO     2025-03-17 17:25:13,293 __main__:129 server: Shutting down VectorIORouter                                      
INFO     2025-03-17 17:25:13,294 __main__:129 server: Shutting down ToolGroupsRoutingTable                              
INFO     2025-03-17 17:25:13,295 __main__:129 server: Shutting down ToolRuntimeRouter                                   
INFO     2025-03-17 17:25:13,296 __main__:129 server: Shutting down MetaReferenceAgentsImpl                             
INFO     2025-03-17 17:25:13,297 __main__:129 server: Shutting down ScoringFunctionsRoutingTable                        
INFO     2025-03-17 17:25:13,298 __main__:129 server: Shutting down ScoringRouter                                       
INFO     2025-03-17 17:25:13,299 __main__:129 server: Shutting down BenchmarksRoutingTable                              
INFO     2025-03-17 17:25:13,300 __main__:129 server: Shutting down EvalRouter                                          
INFO     2025-03-17 17:25:13,301 __main__:129 server: Shutting down DistributionInspectImpl                             
INFO     2025-03-17 17:25:13,303 __main__:129 server: Shutting down ProviderImpl                                        
WARNING  2025-03-17 17:25:13,304 __main__:134 server: No shutdown method for ProviderImpl                               
INFO:     Application shutdown complete.
INFO:     Finished server process [1]
```

## Test Plan
Start a server and shut it down

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-03-17 14:55:40 -07:00
Jeff MAURY
f11b6db40d
fix: build distribution with podman (#1671)
# What does this PR do?

Update the container build script so that it is compatible with podman.
The --progress=plain is now the default option and can be overriden.

## Test Plan
N/A

[//]: # (## Documentation)

Signed-off-by: Jeff MAURY <jmaury@redhat.com>
2025-03-17 14:30:06 -07:00
Xi Yan
c80d1f906b precommit 2025-03-16 19:40:18 -07:00
Xi Yan
035b2dcb60 new apis 2025-03-16 19:33:57 -07:00
Xi Yan
a6fa3aa5a2
feat(dataset api): (1.6/n) fix all iterrows callsites (#1660)
# What does this PR do?
- as title

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
CI

```
pytest -v -s --nbval-lax ./docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb
```
<img width="587" alt="image"
src="https://github.com/user-attachments/assets/4a25f493-501e-43f4-9836-d9802223a93a"
/>


[//]: # (## Documentation)
2025-03-15 17:24:16 -07:00
Xi Yan
a568bf3f9d
feat(dataset api): (1.5/n) fix dataset registeration (#1659)
# What does this PR do?

- fix dataset registeration & iterrows
> NOTE: the URL endpoint is changed to datasetio due to flaky path
routing

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
```
LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/datasets/test_datasets.py
```
<img width="854" alt="image"
src="https://github.com/user-attachments/assets/0168b352-1c5a-48d1-8e9a-93141d418e54"
/>


[//]: # (## Documentation)
2025-03-15 16:48:09 -07:00
Xi Yan
2c9d624910
feat(dataset api): (1.4/n) fix resolver signature mismatch (#1658)
# What does this PR do?
- fix datasets api signature mis-match so that llama stack run can start

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
```
llama stack run
```
<img width="626" alt="image"
src="https://github.com/user-attachments/assets/59072d1a-ccb6-453a-80e8-d87419896c41"
/>


[//]: # (## Documentation)
2025-03-15 14:56:11 -07:00
Charlie Doern
78d4872c0c
feat: add support for logging config in the run.yaml (#1408)
# What does this PR do?

a user should be able to store a static logging configuration outside of
their environment. This would make sense to store in the run yaml given
that we store other things like server configuration in there.

The environment variable settings override the config settings if both
are available.

The format in the config looks like this:

```
logging_config:
  category_levels:
    VALID_CATEGORY: VALID_STRING_LOG_LEVEL
```

any specified category out of the following:

`core | server | router | inference | agents | safety | eval | tools |
client`

combined with any of the following log levels:

`debug | info | warning | error | critical`

can be placed in the category_levels list in order to achieve the
desired log level

## Test Plan

Test locally with a run config like the following:

```
version: '2'
image_name: ollama
logging_config:
  category_levels:
      server: debug
apis:
...
```

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-03-14 12:36:25 -07:00
Xi Yan
33b096cc21
fix: OpenAPI with provider get (#1627)
# What does this PR do?
- https://github.com/meta-llama/llama-stack/pull/1429 introduces
GetProviderResponse in OpenAPI, which is not needed, and not correctly
defined.

cc @cdoern 


[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
```
llama-stack-client providers list
```
<img width="610" alt="image"
src="https://github.com/user-attachments/assets/2f7b62a5-daf2-4bf9-9505-69755c7025fc"
/>


[//]: # (## Documentation)
2025-03-13 19:56:32 -07:00
Charlie Doern
a062723d03
feat: add provider API for listing and inspecting provider info (#1429)
# What does this PR do?

currently the `inspect` API for providers is really a `list` API. Create
a new `providers` API which has a GET `providers/{provider_id}` inspect
API
which returns "user friendly" configuration to the end user. Also add a
GET `/providers` endpoint which returns the list of providers as
`inspect/providers` does today.

This API follows CRUD and is more intuitive/RESTful.

This work is part of the RFC at
https://github.com/meta-llama/llama-stack/pull/1359

sensitive fields are redacted using `redact_sensetive_fields` on the
server side before returning a response:

<img width="456" alt="Screenshot 2025-03-13 at 4 40 21 PM"
src="https://github.com/user-attachments/assets/9465c221-2a26-42f8-a08a-6ac4a9fecce8"
/>


## Test Plan

using https://github.com/meta-llama/llama-stack-client-python/pull/181 a
user is able to to run the following:

`llama stack build --template ollama --image-type venv`
`llama stack run --image-type venv
~/.llama/distributions/ollama/ollama-run.yaml`
`llama-stack-client providers inspect ollama`

<img width="378" alt="Screenshot 2025-03-13 at 4 39 35 PM"
src="https://github.com/user-attachments/assets/8273d05d-8bc3-44c6-9e4b-ef95e48d5466"
/>


also, was able to run the new test_list integration test locally with
ollama:

<img width="1509" alt="Screenshot 2025-03-13 at 11 03 40 AM"
src="https://github.com/user-attachments/assets/9b9db166-f02f-45b0-86a4-306d85149bc8"
/>

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-03-13 15:07:21 -07:00
Dinesh Yeduguru
99bbe0e70b
feat: Add new compact MetricInResponse type (#1593)
# What does this PR do?
This change adds a compact type to include metrics in response as
opposed to the full MetricEvent which is relevant for internal logging
purposes.

## Test Plan
```
LLAMA_STACK_CONFIG=~/.llama/distributions/fireworks/fireworks-run.yaml pytest -s -v agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B --text-model meta-llama/Llama-3.1-8B-Instruct

 llama stack run ~/.llama/distributions/fireworks/fireworks-run.yaml

curl --request POST \
  --url http://localhost:8321/v1/inference/chat-completion \
  --header 'content-type: application/json' \
  --data '{
  "model_id": "meta-llama/Llama-3.1-70B-Instruct",
  "messages": [
    {
      "role": "user",
      "content": {
        "type": "text",
        "text": "where do humans live"
      }
    }
  ],
  "stream": false
}'

{
  "metrics": [
    {
      "metric": "prompt_tokens",
      "value": 10,
      "unit": null
    },
    {
      "metric": "completion_tokens",
      "value": 522,
      "unit": null
    },
    {
      "metric": "total_tokens",
      "value": 532,
      "unit": null
    }
  ],
  "completion_message": {
    "role": "assistant",
    "content": "Humans live in various parts of the world...............",
    "stop_reason": "out_of_tokens",
    "tool_calls": []
  },
  "logprobs": null
}
```
2025-03-12 15:45:44 -07:00
ehhuang
1311faf3f5
fix: logging (#1598)
Summary:

Test Plan:
2025-03-12 14:57:31 -07:00
Dinesh Yeduguru
0fdb15bcc7
fix: fix build error in context.py (#1595)
# What does this PR do?
This fixes the build error


## Test Plan
pre-commit run --all-files
check for merge
conflicts................................................Passed
trim trailing
whitespace.................................................Passed
check for added large
files..............................................Passed
fix end of
files.........................................................Passed
Insert license in
comments...............................................Passed

ruff.....................................................................Passed

ruff-format..............................................................Passed

blacken-docs.............................................................Passed

uv-lock..................................................................Passed

uv-export................................................................Passed

mypy.....................................................................Passed
Distribution Template
Codegen............................................Passed
2025-03-12 13:26:23 -07:00
Dinesh Yeduguru
58d08d100e
feat: Add back inference metrics and preserve context variables across asyncio boundary (#1552)
# What does this PR do?
This PR adds back the changes in #1300  which were reverted in  #1476 .

It also adds logic to preserve context variables across asyncio
boundary. this is needed with the library client since the async
generator logic yields control to code outside the event loop, and on
resuming, does not have the same context as before and this requires
preserving the context vars.

address #1477 
## Test Plan


```
 curl --request POST \
  --url http://localhost:8321/v1/inference/chat-completion \
  --header 'content-type: application/json' \
  --data '{
  "model_id": "meta-llama/Llama-3.1-70B-Instruct",
  "messages": [
    {
      "role": "user",
      "content": {
        "type": "text",
        "text": "where do humans live"
      }
    }
  ],
  "stream": false
}' | jq .

{
  "metrics": [
    {
      "trace_id": "kCZwO3tyQC-FuAGb",
      "span_id": "bsP_5a5O",
      "timestamp": "2025-03-11T16:47:38.549084Z",
      "attributes": {
        "model_id": "meta-llama/Llama-3.1-70B-Instruct",
        "provider_id": "fireworks"
      },
      "type": "metric",
      "metric": "prompt_tokens",
      "value": 10,
      "unit": "tokens"
    },
    {
      "trace_id": "kCZwO3tyQC-FuAGb",
      "span_id": "bsP_5a5O",
      "timestamp": "2025-03-11T16:47:38.549449Z",
      "attributes": {
        "model_id": "meta-llama/Llama-3.1-70B-Instruct",
        "provider_id": "fireworks"
      },
      "type": "metric",
      "metric": "completion_tokens",
      "value": 369,
      "unit": "tokens"
    },
    {
      "trace_id": "kCZwO3tyQC-FuAGb",
      "span_id": "bsP_5a5O",
      "timestamp": "2025-03-11T16:47:38.549457Z",
      "attributes": {
        "model_id": "meta-llama/Llama-3.1-70B-Instruct",
        "provider_id": "fireworks"
      },
      "type": "metric",
      "metric": "total_tokens",
      "value": 379,
      "unit": "tokens"
    }
  ],
  "completion_message": {
    "role": "assistant",
    "content": "Humans live on the planet Earth, specifically on its landmasses and in its oceans. Here's a breakdown of where humans live:\n\n1. **Continents:** Humans inhabit all seven continents:\n\t* Africa\n\t* Antarctica ( temporary residents, mostly scientists and researchers)\n\t* Asia\n\t* Australia\n\t* Europe\n\t* North America\n\t* South America\n2. **Countries:** There are 196 countries recognized by the United Nations, and humans live in almost all of them.\n3. **Cities and towns:** Many humans live in urban areas, such as cities and towns, which are often located near coastlines, rivers, or other bodies of water.\n4. **Rural areas:** Some humans live in rural areas, such as villages, farms, and countryside.\n5. **Islands:** Humans inhabit many islands around the world, including those in the Pacific, Indian, and Atlantic Oceans.\n6. **Mountains and highlands:** Humans live in mountainous regions, such as the Himalayas, the Andes, and the Rocky Mountains.\n7. **Deserts:** Some humans live in desert regions, such as the Sahara, the Mojave, and the Atacama.\n8. **Coastal areas:** Many humans live in coastal areas, such as beaches, ports, and coastal cities.\n9. **Underwater habitats:** A few humans live in underwater habitats, such as research stations and submarines.\n10. **Space:** A small number of humans have lived in space, including astronauts on the International Space Station and those who have visited the Moon.\n\nOverall, humans can be found living in almost every environment on Earth, from the frozen tundra to the hottest deserts, and from the highest mountains to the deepest oceans.",
    "stop_reason": "end_of_turn",
    "tool_calls": []
  },
  "logprobs": null
}

```

Orignal repro no longer showing any error:
```
LLAMA_STACK_DISABLE_VERSION_CHECK=true llama stack run ~/.llama/distributions/fireworks/fireworks-run.yaml
python -m examples.agents.e2e_loop_with_client_tools localhost 8321
```

client logs:
https://gist.github.com/dineshyv/047c7e87b18a5792aa660e311ea53166
server logs:
https://gist.github.com/dineshyv/97a2174099619e9916c7c490be26e559
2025-03-12 12:01:03 -07:00