Commit graph

1523 commits

Author SHA1 Message Date
Xi Yan
18de4cd08a comments 2025-03-12 18:38:07 -07:00
Xi Yan
8942071b3b Merge branch 'main' into pr1573 2025-03-12 18:23:39 -07:00
Dinesh Yeduguru
99bbe0e70b
feat: Add new compact MetricInResponse type (#1593)
# What does this PR do?
This change adds a compact type for including metrics in a response, as
opposed to the full MetricEvent, which is only relevant for internal
logging purposes.
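
A minimal sketch of what such a compact type could look like, with field names inferred from the response shown in the test plan below (the actual class definition may differ):

```python
from typing import Optional, Union

from pydantic import BaseModel


class MetricInResponse(BaseModel):
    """Compact metric carried inline with a response.

    Unlike the full MetricEvent, it drops trace/span identifiers and
    timestamps, which are only relevant for internal logging.
    """

    metric: str
    value: Union[int, float]
    unit: Optional[str] = None
```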

## Test Plan
```
LLAMA_STACK_CONFIG=~/.llama/distributions/fireworks/fireworks-run.yaml pytest -s -v agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B --text-model meta-llama/Llama-3.1-8B-Instruct

 llama stack run ~/.llama/distributions/fireworks/fireworks-run.yaml

curl --request POST \
  --url http://localhost:8321/v1/inference/chat-completion \
  --header 'content-type: application/json' \
  --data '{
  "model_id": "meta-llama/Llama-3.1-70B-Instruct",
  "messages": [
    {
      "role": "user",
      "content": {
        "type": "text",
        "text": "where do humans live"
      }
    }
  ],
  "stream": false
}'

{
  "metrics": [
    {
      "metric": "prompt_tokens",
      "value": 10,
      "unit": null
    },
    {
      "metric": "completion_tokens",
      "value": 522,
      "unit": null
    },
    {
      "metric": "total_tokens",
      "value": 532,
      "unit": null
    }
  ],
  "completion_message": {
    "role": "assistant",
    "content": "Humans live in various parts of the world...............",
    "stop_reason": "out_of_tokens",
    "tool_calls": []
  },
  "logprobs": null
}
```
2025-03-12 15:45:44 -07:00
Nathan Weinberg
ad939c97c3
docs: add unit test badge to README (#1591)
# What does this PR do?
This PR adds a simple unit test badge to the project README

It also modifies the workflow to run on merges to main, so that the
status reflected in the README is that of main and not pull request
branches

---------

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-03-12 15:41:35 -07:00
ehhuang
1311faf3f5
fix: logging (#1598)
Summary:

Test Plan:
2025-03-12 14:57:31 -07:00
Dinesh Yeduguru
0fdb15bcc7
fix: fix build error in context.py (#1595)
# What does this PR do?
This fixes the build error


## Test Plan
```
pre-commit run --all-files

check for merge conflicts................................................Passed
trim trailing whitespace.................................................Passed
check for added large files..............................................Passed
fix end of files.........................................................Passed
Insert license in comments...............................................Passed
ruff.....................................................................Passed
ruff-format..............................................................Passed
blacken-docs.............................................................Passed
uv-lock..................................................................Passed
uv-export................................................................Passed
mypy.....................................................................Passed
Distribution Template Codegen............................................Passed
```
2025-03-12 13:26:23 -07:00
Xi Yan
f840018088 Merge branch 'main' into pr1573 2025-03-12 12:31:49 -07:00
ehhuang
b7a9c45477
chore: deprecate ToolResponseMessage in agent.resume API (#1566)
# Summary:
closes #1431 

# Test Plan:
LLAMA_STACK_CONFIG=fireworks pytest -s -v
tests/integration/agents/test_agents.py --safety-shield
meta-llama/Llama-Guard-3-8B --text-model
meta-llama/Llama-3.1-8B-Instruct
2025-03-12 12:10:21 -07:00
Dinesh Yeduguru
58d08d100e
feat: Add back inference metrics and preserve context variables across asyncio boundary (#1552)
# What does this PR do?
This PR adds back the changes in #1300  which were reverted in  #1476 .

It also adds logic to preserve context variables across the asyncio
boundary. This is needed with the library client since the async
generator logic yields control to code outside the event loop, and on
resuming it does not have the same context as before, which requires
preserving the context vars.
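
A rough sketch of the idea (names are illustrative, not the actual llama-stack helpers): capture the context variables when the async generator is created and re-apply them before each resumption.

```python
import contextvars
from typing import AsyncGenerator, List, TypeVar

T = TypeVar("T")


def preserve_contexts(
    gen: AsyncGenerator[T, None],
    context_vars: List[contextvars.ContextVar],
) -> AsyncGenerator[T, None]:
    # Capture the values visible when the generator is wrapped
    # (assumes the variables are set at this point).
    captured = {var: var.get() for var in context_vars}

    async def wrapper() -> AsyncGenerator[T, None]:
        it = gen.__aiter__()
        while True:
            # The consumer may resume us from a different context, so
            # restore the captured values before each step.
            for var, value in captured.items():
                var.set(value)
            try:
                item = await it.__anext__()
            except StopAsyncIteration:
                break
            yield item

    return wrapper()
```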

address #1477 
## Test Plan


```
 curl --request POST \
  --url http://localhost:8321/v1/inference/chat-completion \
  --header 'content-type: application/json' \
  --data '{
  "model_id": "meta-llama/Llama-3.1-70B-Instruct",
  "messages": [
    {
      "role": "user",
      "content": {
        "type": "text",
        "text": "where do humans live"
      }
    }
  ],
  "stream": false
}' | jq .

{
  "metrics": [
    {
      "trace_id": "kCZwO3tyQC-FuAGb",
      "span_id": "bsP_5a5O",
      "timestamp": "2025-03-11T16:47:38.549084Z",
      "attributes": {
        "model_id": "meta-llama/Llama-3.1-70B-Instruct",
        "provider_id": "fireworks"
      },
      "type": "metric",
      "metric": "prompt_tokens",
      "value": 10,
      "unit": "tokens"
    },
    {
      "trace_id": "kCZwO3tyQC-FuAGb",
      "span_id": "bsP_5a5O",
      "timestamp": "2025-03-11T16:47:38.549449Z",
      "attributes": {
        "model_id": "meta-llama/Llama-3.1-70B-Instruct",
        "provider_id": "fireworks"
      },
      "type": "metric",
      "metric": "completion_tokens",
      "value": 369,
      "unit": "tokens"
    },
    {
      "trace_id": "kCZwO3tyQC-FuAGb",
      "span_id": "bsP_5a5O",
      "timestamp": "2025-03-11T16:47:38.549457Z",
      "attributes": {
        "model_id": "meta-llama/Llama-3.1-70B-Instruct",
        "provider_id": "fireworks"
      },
      "type": "metric",
      "metric": "total_tokens",
      "value": 379,
      "unit": "tokens"
    }
  ],
  "completion_message": {
    "role": "assistant",
    "content": "Humans live on the planet Earth, specifically on its landmasses and in its oceans. Here's a breakdown of where humans live:\n\n1. **Continents:** Humans inhabit all seven continents:\n\t* Africa\n\t* Antarctica ( temporary residents, mostly scientists and researchers)\n\t* Asia\n\t* Australia\n\t* Europe\n\t* North America\n\t* South America\n2. **Countries:** There are 196 countries recognized by the United Nations, and humans live in almost all of them.\n3. **Cities and towns:** Many humans live in urban areas, such as cities and towns, which are often located near coastlines, rivers, or other bodies of water.\n4. **Rural areas:** Some humans live in rural areas, such as villages, farms, and countryside.\n5. **Islands:** Humans inhabit many islands around the world, including those in the Pacific, Indian, and Atlantic Oceans.\n6. **Mountains and highlands:** Humans live in mountainous regions, such as the Himalayas, the Andes, and the Rocky Mountains.\n7. **Deserts:** Some humans live in desert regions, such as the Sahara, the Mojave, and the Atacama.\n8. **Coastal areas:** Many humans live in coastal areas, such as beaches, ports, and coastal cities.\n9. **Underwater habitats:** A few humans live in underwater habitats, such as research stations and submarines.\n10. **Space:** A small number of humans have lived in space, including astronauts on the International Space Station and those who have visited the Moon.\n\nOverall, humans can be found living in almost every environment on Earth, from the frozen tundra to the hottest deserts, and from the highest mountains to the deepest oceans.",
    "stop_reason": "end_of_turn",
    "tool_calls": []
  },
  "logprobs": null
}

```

Original repro no longer shows any error:
```
LLAMA_STACK_DISABLE_VERSION_CHECK=true llama stack run ~/.llama/distributions/fireworks/fireworks-run.yaml
python -m examples.agents.e2e_loop_with_client_tools localhost 8321
```

client logs:
https://gist.github.com/dineshyv/047c7e87b18a5792aa660e311ea53166
server logs:
https://gist.github.com/dineshyv/97a2174099619e9916c7c490be26e559
2025-03-12 12:01:03 -07:00
Xi Yan
c7139b0b67
fix: fix precommit (#1594)
# What does this PR do?

- fix precommit

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
CI

[//]: # (## Documentation)
2025-03-12 11:59:21 -07:00
Xi Yan
31e3409909 Merge branch 'main' into pr1573 2025-03-12 11:38:02 -07:00
Botao Chen
90ca4d94de
fix: fix passthrough inference provider to make it work for agent (#1577)
## What does this PR do?
We noticed that the passthrough inference provider doesn't work agent
due to the type mis-match between client and server. We manually cast
the llama stack client type to llama stack server type to fix the issue.
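
The "manual cast" amounts to re-validating the client-side pydantic object against the server-side model. A sketch with stand-in types (the real llama-stack / llama-stack-client classes differ):

```python
from pydantic import BaseModel


class ClientCompletionMessage(BaseModel):  # stand-in for the client type
    role: str
    content: str


class ServerCompletionMessage(BaseModel):  # stand-in for the server type
    role: str
    content: str


def to_server_type(msg: ClientCompletionMessage) -> ServerCompletionMessage:
    # Re-validate the client payload against the server-side model so that
    # isinstance checks inside the agent loop pass.
    return ServerCompletionMessage(**msg.model_dump())
```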

## test 
run `python -m examples.agents.hello localhost 8321` within
llama-stack-apps

<img width="1073" alt="Screenshot 2025-03-11 at 8 43 44 PM"
src="https://github.com/user-attachments/assets/bd1bdd31-606a-420c-a249-95f6184cc0b1"
/>

fix https://github.com/meta-llama/llama-stack/issues/1560
2025-03-12 11:16:17 -07:00
Botao Chen
0b0be70605
feat: Add open benchmark template codegen (#1579)
## What does this PR do?

As title, add codegen for open-benchmark template

## test 

Checked the newly generated run.yaml file; it's identical before and
after the change.

Also adds a small improvement to the together template so that a missing
TOGETHER_API_KEY won't crash the server, which is consistent with the
user experience of other remote providers.
2025-03-12 11:12:08 -07:00
Charlie Doern
4eee349acd
fix: respect log_level in uvicorn and third party libs (#1524)
# What does this PR do?

uvicorn has a `log_level` arg in `uvicorn.run`; pass in the effective
level set by the logger.

Additionally, third-party libraries like httpx are using our logging
format but not honoring our log level.

This seems unintended, so loop through all items in the loggerDict and
apply the same log level we have set.
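
A sketch of both pieces, assuming an already-computed effective level (helper name and wiring are illustrative, not the actual server code):

```python
import logging

import uvicorn


def run_server(app, effective_level: int = logging.WARNING) -> None:
    # Apply our effective level to every logger that third-party libraries
    # (e.g. httpx) have already registered, so they stop emitting below it.
    for name in logging.root.manager.loggerDict:
        logging.getLogger(name).setLevel(effective_level)

    # Pass the same level down to uvicorn instead of its INFO default.
    uvicorn.run(
        app,
        host="0.0.0.0",
        port=8321,
        log_level=logging.getLevelName(effective_level).lower(),
    )
```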


## Test Plan

before:

```
llama stack run --image-type venv ~/.llama/distributions/ollama/ollama-run.yaml
Environment variable LLAMA_STACK_LOGGING found: all=warn
Using virtual environment: /Users/charliedoern/projects/Documents/llama-stack/venv
+ python -m llama_stack.distribution.server.server --yaml-config /Users/charliedoern/.llama/distributions/ollama/ollama-run.yaml --port 8321
Environment variable LLAMA_STACK_LOGGING found: all=warn
WARNING  2025-03-10 16:05:49,706 root:71 uncategorized: Warning: `bwrap` is not available. Code interpreter tool will
         not work correctly.
INFO     2025-03-10 16:05:49,916 datasets:54 uncategorized: PyTorch version 2.5.1 available.
INFO     2025-03-10 16:05:50,010 httpx:1740 uncategorized: HTTP Request: GET http://localhost:11434/api/ps "HTTP/1.1 200
         OK"
INFO     2025-03-10 16:05:50,297 httpx:1740 uncategorized: HTTP Request: POST http://localhost:11434/api/pull "HTTP/1.1
         200 OK"
INFO     2025-03-10 16:05:50,314 httpx:1740 uncategorized: HTTP Request: GET http://localhost:11434/api/tags "HTTP/1.1
         200 OK"
INFO:     Started server process [89663]
INFO:     Waiting for application startup.
INFO:     ASGI 'lifespan' protocol appears unsupported.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
```

after:

```
llama stack run --image-type venv ~/.llama/distributions/ollama/ollama-run.yaml
Environment variable LLAMA_STACK_LOGGING found: all=warn
Using virtual environment: /Users/charliedoern/projects/Documents/llama-stack/venv
+ python -m llama_stack.distribution.server.server --yaml-config /Users/charliedoern/.llama/distributions/ollama/ollama-run.yaml --port 8321
Environment variable LLAMA_STACK_LOGGING found: all=warn
WARNING  2025-03-10 16:05:20,429 root:71 uncategorized: Warning: `bwrap` is not available. Code interpreter tool will
         not work correctly.
INFO     2025-03-10 16:05:20,639 datasets:54 uncategorized: PyTorch version 2.5.1 available.
```

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-03-12 11:07:28 -07:00
Nathan Weinberg
00da911167
ci: run unit tests on all supported python versions (#1575)
# What does this PR do?
Python unit tests running via GitHub Actions were only running with
Python 3.10.

The project supports all Python versions greater than or equal to 3.10.

This commit adds 3.11, 3.12, and 3.13 to the test matrix for better
coverage and confidence for non-3.10 users.

## Test Plan
All tests pass locally with python 3.11, 3.12, and 3.13

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-03-12 09:55:11 -07:00
Ihar Hrachyshka
b1a9b4cfa8
chore: Expand mypy exclusions list (#1543)
# What does this PR do?

Expand the mypy exclude list.

It will be easier to enable typing checks for specific modules if we
have an explicit list of violators that we can reduce over time, item by
item.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

pre-commit passes.

[//]: # (## Documentation)

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-03-12 09:53:04 -07:00
Xi Yan
1d80ec7f81 upgrade doc 2025-03-12 00:17:58 -07:00
Xi Yan
0abedd070c comment 2025-03-12 00:13:27 -07:00
ehhuang
59dddafd12
feat: convert typehints from client_tool to litellm format (#1565)
Summary:
supports
https://github.com/meta-llama/llama-stack-client-python/pull/193

Test Plan:
LLAMA_STACK_CONFIG=fireworks pytest -s -v
tests/integration/agents/test_agents.py --safety-shield
meta-llama/Llama-Guard-3-8B --text-model
meta-llama/Llama-3.1-8B-Instruct
2025-03-11 20:02:11 -07:00
Xi Yan
817331e76e precommit 2025-03-11 18:34:38 -07:00
Xi Yan
0e47c65051 update 2025-03-11 18:29:55 -07:00
Xi Yan
02aa9a1e85 remove json_schema_type decorator 2025-03-11 16:08:06 -07:00
Xi Yan
0e8a53ab69 openapi 2025-03-11 15:03:48 -07:00
Xi Yan
8592c2b48a precommit 2025-03-11 14:56:12 -07:00
Xi Yan
bc551e6459 datasets api
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
2025-03-11 14:44:49 -07:00
LESSuseLESS
2370e826bc
test: adding an e2e test for measuring TTFT (#1568)
# What does this PR do?

The TTFT number largely depends on input length. Ideally we'd have a
"standard" test that we can use to measure against any llama stack
deployment.

TODO: Once JSON is replaced with YAML, I will add "notes" for each test
to explain its purpose in place.

## Test plan

Please refer to e2e test doc for setup.
```
LLAMA_STACK_PORT=8322 pytest -v -s --stack-config="http://localhost:8322" \
--text-model="meta-llama/Llama-3.2-3B-Instruct" \
tests/integration/inference/test_text_inference.py::test_text_chat_completion_first_token_profiling
```
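
For reference, a rough standalone way to eyeball TTFT against the same endpoint, using plain `requests` with streaming (port and model name taken from the command above; this is only a sketch, not the pytest profiling test itself):

```python
import time

import requests

URL = "http://localhost:8322/v1/inference/chat-completion"
payload = {
    "model_id": "meta-llama/Llama-3.2-3B-Instruct",
    "messages": [
        {"role": "user", "content": {"type": "text", "text": "where do humans live"}}
    ],
    "stream": True,
}

start = time.perf_counter()
with requests.post(URL, json=payload, stream=True) as resp:
    for line in resp.iter_lines():
        if line:  # first non-empty SSE chunk == first token arrived
            print(f"TTFT: {time.perf_counter() - start:.3f}s")
            break
```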
2025-03-11 14:41:55 -07:00
Josh Salomon
5f90be5388
fix: Fixed bad file name in inline::localfs (#1358)
Bug https://github.com/meta-llama/llama-stack/issues/1357

# What does this PR do?
Fix a bug of a wrong file name in inline::localfs datasetio provider

[//]: # (If resolving an issue, uncomment and update the line below)
# (Closes #1357)

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: Josh Salomon <jsalomon@redhat.com>
2025-03-11 12:46:11 -07:00
Xi Yan
43044f29e2
fix: fix llama stack run with missing agent impl (#1559)
# What does this PR do?

- the recent merge https://github.com/meta-llama/llama-stack/pull/1410
introduced an error
```
ValueError: Provider meta-reference (Api.agents) does not implement the following methods:
[('list_agent_sessions', 'not_actually_implemented'), ('list_agents', 'not_actually_implemented')]
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
```
llama stack run
```

```
LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/agents/test_agents.py --text-model meta-llama/Llama-3.3-70B-Instruct
```

1379530386

[//]: # (## Documentation)
2025-03-11 11:22:22 -07:00
Dinesh Yeduguru
85501ed875
fix: remove Llama-3.2-1B-Instruct for fireworks (#1558)
# What does this PR do?
remove Llama-3.2-1B-Instruct for fireworks as it no longer appears to
be hosted on the website.


## Test Plan

python distro_codegen.py
2025-03-11 11:19:29 -07:00
Nathan Weinberg
275bab1373
test: loosen Python 3.10 version for unit tests (#1547)
# What does this PR do?
as I brought up in #1515, it shouldn't be necessary to tie the unit test
runner to an exact z-stream of Python 3.10

updated so the unit test runner always uses the latest z-stream of Python 3.10

## Test Plan
```shell
$ uv run -p 3.10 --with-editable . --with-editable ".[dev]" --with-editable ".[unit]" pytest --cov=llama_stack -s -v tests/unit/ --junitxml=pytest-report.xml
```

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-03-11 11:11:32 -07:00
Charlie Doern
b647ecd9ed
feat: add support for LLAMA_STACK_LOG_FILE (#1450)
# What does this PR do?

Setting $LLAMA_STACK_LOG_FILE will pipe the logs to a file as well as
stdout. This is done by using a logging FileHandler.
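
A minimal sketch of the mechanism (format string and function name are illustrative; the real wiring lives in the project's logging setup):

```python
import logging
import os


def build_handlers() -> list:
    # Always log to the console; additionally mirror to a file when
    # LLAMA_STACK_LOG_FILE is set.
    handlers: list = [logging.StreamHandler()]
    log_file = os.environ.get("LLAMA_STACK_LOG_FILE")
    if log_file:
        handlers.append(logging.FileHandler(log_file))
    return handlers


logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s: %(message)s",
    handlers=build_handlers(),
)
```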

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-03-11 11:09:31 -07:00
Sébastien Han
83a2c78615
feat(api): list agents / sessions and get agent (#1410)
# What does this PR do?

Add support for listing agents, describing an agent, and retrieving
session IDs for a given agent. This is only the API definition, the
implementations will come separately.

Closes: https://github.com/meta-llama/llama-stack/issues/1294

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-03-11 10:33:46 -07:00
Ihar Hrachyshka
aca82df7ed
fix: Multiple fixes for server shutdown (fix lifespan handling; fix handling CancelledError when raised by provider; let uvicorn handle signals) (#1495)
# What does this PR do?

If an implementation raises CancelledError (e.g. when it runs its own async
loop for jobs), the main server shutdown handler gets confused and
doesn't attempt to shut down the main loop tasks.

While at it, also fixing the following failure when this happens:

```
UnboundLocalError: cannot access local variable 'loop' where it is not
associated with a value
```

Shutdown handlers were not running because lifespan logic had been broken
since ~Oct 2024. Fixed that too, and `lifespan` is now enforced (making
sure the server will crash when it fails to interact with the app through
middleware).
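
A sketch of the shape of the fix in the per-implementation shutdown loop (names taken from the log output below; the exact code differs). The key point is that CancelledError does not derive from Exception on Python 3.8+, so it has to be caught explicitly or it escapes and derails the rest of the shutdown:

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def shutdown(impls_by_name: dict) -> None:
    for impl_name, impl in impls_by_name.items():
        logger.info("Shutting down %s", impl_name)
        try:
            if hasattr(impl, "shutdown"):
                await asyncio.wait_for(impl.shutdown(), timeout=5)
            else:
                logger.warning("No shutdown method for %s", impl_name)
        except asyncio.TimeoutError:
            logger.exception("Shutdown of %s timed out", impl_name)
        except (Exception, asyncio.CancelledError) as e:
            # Catch CancelledError explicitly: a bare `except Exception`
            # would let it propagate and confuse the main shutdown handler.
            logger.exception("Failed to shutdown %s: %s", impl_name, e)
```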

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

Spotted while working on
https://github.com/meta-llama/llama-stack/pull/1437

One way to trigger it without the PR above is to add `raise
CancelledError` in
any of the running providers' `shutdown` methods; then `kill -INT <pid>`
the
server process.

Validated this with the following test patch:

```
diff --git a/llama_stack/distribution/server/server.py b/llama_stack/distribution/server/server.py
index b85c463a..10dad83e 100644
--- a/llama_stack/distribution/server/server.py
+++ b/llama_stack/distribution/server/server.py
@@ -174,6 +174,7 @@ def handle_signal(app, signum, _) -> None:
         except asyncio.CancelledError:
             pass
         finally:
+            logger.info("Stopping event loop")
             loop.stop()
 
     loop = asyncio.get_running_loop()
diff --git a/llama_stack/providers/inline/post_training/torchtune/post_training.py b/llama_stack/providers/inline/post_training/torchtune/post_training.py
index b837362d..163f43d8 100644
--- a/llama_stack/providers/inline/post_training/torchtune/post_training.py
+++ b/llama_stack/providers/inline/post_training/torchtune/post_training.py
@@ -3,6 +3,7 @@
 #
 # This source code is licensed under the terms described in the LICENSE file in
 # the root directory of this source tree.
+import asyncio
 from datetime import datetime
 from typing import Any, Dict, Optional
 
@@ -43,6 +44,9 @@ class TorchtunePostTrainingImpl:
         self.jobs = {}
         self.checkpoints_dict = {}
 
+    async def shutdown(self) -> None:
+        raise asyncio.CancelledError("Shutdown")
+
     async def supervised_fine_tune(
         self,
         job_uuid: str,
```

Without the fix:

```
INFO:     Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
INFO:     Shutting down
INFO:     Finished server process [52099]
INFO     2025-03-07 23:25:33,548 __main__:143 server: Received signal SIGINT (2). Exiting gracefully...
INFO     2025-03-07 23:25:33,550 __main__:150 server: Shutting down DatasetsRoutingTable
INFO     2025-03-07 23:25:33,551 __main__:177 server: Stopping event loop
ERROR    2025-03-07 23:25:33,552 asyncio:1785 uncategorized: unhandled exception during asyncio.run() shutdown
         task: <Task finished name='Task-12' coro=<handle_signal.<locals>.shutdown() done, defined at
         /home/ec2-user/src/llama-stack/schedule/llama_stack/distribution/server/server.py:145>
         exception=UnboundLocalError("cannot access local variable 'loop' where it is not associated with a value")>
         ╭───────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────╮
         │ /home/ec2-user/src/llama-stack/schedule/llama_stack/distribution/server/server.py:178 in shutdown           │
         │                                                                                                             │
         │   175 │   │   │   pass                                                                                      │
         │   176 │   │   finally:                                                                                      │
         │   177 │   │   │   logger.info("Stopping event loop")                                                        │
         │ ❱ 178 │   │   │   loop.stop()                                                                               │
         │   179 │                                                                                                     │
         │   180 │   loop = asyncio.get_running_loop()                                                                 │
         │   181 │   loop.create_task(shutdown())                                                                      │
         ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
         UnboundLocalError: cannot access local variable 'loop' where it is not associated with a value

```

With the fix, now seeing the following messages when the server is
killed:

```
INFO:     Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
INFO:     Shutting down
INFO:     Finished server process [50836]
INFO     2025-03-07 23:20:35,182 __main__:143 server: Received signal SIGINT (2). Exiting gracefully...
INFO     2025-03-07 23:20:35,184 __main__:149 server: Shutting down DatasetsRoutingTable
ERROR    2025-03-07 23:20:35,185 __main__:158 server: Failed to shutdown DatasetsRoutingTable: {CancelledError()}
         ╭───────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────╮
         │ /usr/lib64/python3.11/asyncio/tasks.py:476 in wait_for                                                      │
         │                                                                                                             │
         │   473 │   try:                                                                                              │
         │   474 │   │   # wait until the future completes or the timeout                                              │
         │   475 │   │   try:                                                                                          │
         │ ❱ 476 │   │   │   await waiter                                                                              │
         │   477 │   │   except exceptions.CancelledError:                                                             │
         │   478 │   │   │   if fut.done():                                                                            │
         │   479 │   │   │   │   return fut.result()                                                                   │
         ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
         CancelledError

         During handling of the above exception, another exception occurred:

         ╭───────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────╮
         │ /home/ec2-user/src/llama-stack/schedule/llama_stack/distribution/server/server.py:152 in shutdown           │
         │                                                                                                             │
         │   149 │   │   │   logger.info("Shutting down %s", impl_name)                                                │
         │   150 │   │   │   try:                                                                                      │
         │   151 │   │   │   │   if hasattr(impl, "shutdown"):                                                         │
         │ ❱ 152 │   │   │   │   │   await asyncio.wait_for(impl.shutdown(), timeout=5)                                │
         │   153 │   │   │   │   else:                                                                                 │
         │   154 │   │   │   │   │   logger.warning("No shutdown method for %s", impl_name)                            │
         │   155 │   │   │   except asyncio.TimeoutError:                                                              │
         │                                                                                                             │
         │ /usr/lib64/python3.11/asyncio/tasks.py:479 in wait_for                                                      │
         │                                                                                                             │
         │   476 │   │   │   await waiter                                                                              │
         │   477 │   │   except exceptions.CancelledError:                                                             │
         │   478 │   │   │   if fut.done():                                                                            │
         │ ❱ 479 │   │   │   │   return fut.result()                                                                   │
         │   480 │   │   │   else:                                                                                     │
         │   481 │   │   │   │   fut.remove_done_callback(cb)                                                          │
         │   482 │   │   │   │   # We must ensure that the task is not running                                         │
         │                                                                                                             │
         │ /home/ec2-user/src/llama-stack/schedule/llama_stack/distribution/routers/routing_tables.py:131 in shutdown  │
         │                                                                                                             │
         │   128 │   │   │   elif api == Api.tool_runtime:                                                             │
         │   129 │   │   │   │   p.tool_store = self                                                                   │
         │   130 │                                                                                                     │
         │ ❱ 131 │   async def shutdown(self) -> None:                                                                 │
         │   132 │   │   for p in self.impls_by_provider_id.values():                                                  │
         │   133 │   │   │   await p.shutdown()                                                                        │
         │   134                                                                                                       │
         ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
         CancelledError
INFO     2025-03-07 23:20:35,295 __main__:149 server: Shutting down DatasetIORouter
INFO     2025-03-07 23:20:35,296 __main__:149 server: Shutting down ScoringFunctionsRoutingTable
INFO     2025-03-07 23:20:35,297 __main__:149 server: Shutting down ScoringRouter
INFO     2025-03-07 23:20:35,298 __main__:149 server: Shutting down ModelsRoutingTable
INFO     2025-03-07 23:20:35,299 __main__:149 server: Shutting down InferenceRouter
INFO     2025-03-07 23:20:35,300 __main__:149 server: Shutting down ShieldsRoutingTable
INFO     2025-03-07 23:20:35,300 __main__:149 server: Shutting down SafetyRouter
INFO     2025-03-07 23:20:35,301 __main__:149 server: Shutting down VectorDBsRoutingTable
INFO     2025-03-07 23:20:35,302 __main__:149 server: Shutting down VectorIORouter
INFO     2025-03-07 23:20:35,303 __main__:149 server: Shutting down ToolGroupsRoutingTable
INFO     2025-03-07 23:20:35,304 __main__:149 server: Shutting down ToolRuntimeRouter
INFO     2025-03-07 23:20:35,304 __main__:149 server: Shutting down MetaReferenceAgentsImpl
INFO     2025-03-07 23:20:35,305 __main__:149 server: Shutting down TelemetryAdapter
INFO     2025-03-07 23:20:35,306 __main__:149 server: Shutting down TorchtunePostTrainingImpl
ERROR    2025-03-07 23:20:35,307 __main__:158 server: Failed to shutdown TorchtunePostTrainingImpl:
         {CancelledError('Shutdown')}
         ╭───────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────╮
         │ /home/ec2-user/src/llama-stack/schedule/llama_stack/distribution/server/server.py:152 in shutdown           │
         │                                                                                                             │
         │   149 │   │   │   logger.info("Shutting down %s", impl_name)                                                │
         │   150 │   │   │   try:                                                                                      │
         │   151 │   │   │   │   if hasattr(impl, "shutdown"):                                                         │
         │ ❱ 152 │   │   │   │   │   await asyncio.wait_for(impl.shutdown(), timeout=5)                                │
         │   153 │   │   │   │   else:                                                                                 │
         │   154 │   │   │   │   │   logger.warning("No shutdown method for %s", impl_name)                            │
         │   155 │   │   │   except asyncio.TimeoutError:                                                              │
         │                                                                                                             │
         │ /usr/lib64/python3.11/asyncio/tasks.py:489 in wait_for                                                      │
         │                                                                                                             │
         │   486 │   │   │   │   raise                                                                                 │
         │   487 │   │                                                                                                 │
         │   488 │   │   if fut.done():                                                                                │
         │ ❱ 489 │   │   │   return fut.result()                                                                       │
         │   490 │   │   else:                                                                                         │
         │   491 │   │   │   fut.remove_done_callback(cb)                                                              │
         │   492 │   │   │   # We must ensure that the task is not running                                             │
         │                                                                                                             │
         │ /home/ec2-user/src/llama-stack/schedule/llama_stack/providers/inline/post_training/torchtune/post_training. │
         │ py:48 in shutdown                                                                                           │
         │                                                                                                             │
         │    45 │   │   self.checkpoints_dict = {}                                                                    │
         │    46 │                                                                                                     │
         │    47 │   async def shutdown(self) -> None:                                                                 │
         │ ❱  48 │   │   raise asyncio.CancelledError("Shutdown")                                                      │
         │    49 │                                                                                                     │
         │    50 │   async def supervised_fine_tune(                                                                   │
         │    51 │   │   self,                                                                                         │
         ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
         CancelledError: Shutdown
INFO     2025-03-07 23:20:35,352 __main__:149 server: Shutting down BenchmarksRoutingTable
INFO     2025-03-07 23:20:35,353 __main__:149 server: Shutting down EvalRouter
INFO     2025-03-07 23:20:35,354 __main__:149 server: Shutting down DistributionInspectImpl
INFO     2025-03-07 23:20:35,355 __main__:177 server: Stopping event loop
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/ec2-user/src/llama-stack/schedule/llama_stack/distribution/server/server.py", line 488, in <module>
    main()
  File "/home/ec2-user/src/llama-stack/schedule/llama_stack/distribution/server/server.py", line 476, in main
    uvicorn.run(**uvicorn_config)
  File "/home/ec2-user/src/llama-stack/schedule/venv/lib64/python3.11/site-packages/uvicorn/main.py", line 579, in run
    server.run()
  File "/home/ec2-user/src/llama-stack/schedule/venv/lib64/python3.11/site-packages/uvicorn/server.py", line 66, in run
    return asyncio.run(self.serve(sockets=sockets))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/asyncio/runners.py", line 189, in run
    with Runner(debug=debug) as runner:
  File "/usr/lib64/python3.11/asyncio/runners.py", line 63, in __exit__
    self.close()
  File "/usr/lib64/python3.11/asyncio/runners.py", line 71, in close
    _cancel_all_tasks(loop)
  File "/usr/lib64/python3.11/asyncio/runners.py", line 201, in _cancel_all_tasks
    loop.run_until_complete(tasks.gather(*to_cancel, return_exceptions=True))
  File "/usr/lib64/python3.11/asyncio/base_events.py", line 652, in run_until_complete
    raise RuntimeError('Event loop stopped before Future completed.')
RuntimeError: Event loop stopped before Future completed.
++ error_handler 104
++ echo 'Error occurred in script at line: 104'
Error occurred in script at line: 104
++ exit 1
```

With all patches included, the shutdown now looks as follows:

```
$ kill -INT $(ps ax | grep  llama_stack.distribution.server.server | grep -v nvim | awk -e '{print $1}' | sort | head -n 1)
```

```
20:56:09.308 [START]
INFO:     Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO     2025-03-10 20:56:43,961 __main__:140 server: Shutting down
INFO     2025-03-10 20:56:43,962 __main__:124 server: Shutting down DatasetsRoutingTable
INFO     2025-03-10 20:56:43,964 __main__:124 server: Shutting down DatasetIORouter
INFO     2025-03-10 20:56:43,965 __main__:124 server: Shutting down ScoringFunctionsRoutingTable
INFO     2025-03-10 20:56:43,966 __main__:124 server: Shutting down ScoringRouter
INFO     2025-03-10 20:56:43,967 __main__:124 server: Shutting down ModelsRoutingTable
INFO     2025-03-10 20:56:43,968 __main__:124 server: Shutting down InferenceRouter
INFO     2025-03-10 20:56:43,969 __main__:124 server: Shutting down ShieldsRoutingTable
INFO     2025-03-10 20:56:43,971 __main__:124 server: Shutting down SafetyRouter
INFO     2025-03-10 20:56:43,972 __main__:124 server: Shutting down VectorDBsRoutingTable
INFO     2025-03-10 20:56:43,973 __main__:124 server: Shutting down VectorIORouter
INFO     2025-03-10 20:56:43,974 __main__:124 server: Shutting down ToolGroupsRoutingTable
INFO     2025-03-10 20:56:43,975 __main__:124 server: Shutting down ToolRuntimeRouter
INFO     2025-03-10 20:56:43,976 __main__:124 server: Shutting down MetaReferenceAgentsImpl
INFO     2025-03-10 20:56:43,977 __main__:124 server: Shutting down TelemetryAdapter
INFO     2025-03-10 20:56:43,978 __main__:124 server: Shutting down TorchtunePostTrainingImpl
WARNING  2025-03-10 20:56:43,979 __main__:129 server: No shutdown method for TorchtunePostTrainingImpl
INFO     2025-03-10 20:56:43,979 __main__:124 server: Shutting down BenchmarksRoutingTable
INFO     2025-03-10 20:56:43,980 __main__:124 server: Shutting down EvalRouter
INFO     2025-03-10 20:56:43,981 __main__:124 server: Shutting down DistributionInspectImpl
INFO:     Application shutdown complete.
INFO:     Finished server process [33862]
```

[//]: # (## Documentation)

---------

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-03-11 10:30:55 -07:00
Kelly Brown
d33b8ea3dc
docs: Small nits in llama CLI reference (#1542)
**Description:** Fixes some small nits in the llama CLI reference.
Note: This PR contains a few nits but also some small suggestions; feel
free to close if not necessary.
2025-03-11 10:12:18 -07:00
Ihar Hrachyshka
c3d7d17bc4
chore: fix typing hints for get_provider_impl deps arguments (#1544)
# What does this PR do?

It's a dict that may contain different types, as per the resolver's
instantiate_provider implementation. (AFAIU it also never contains
ProviderSpecs, but *instances* of provider implementations.)
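
In other words, the hint becomes something like `Dict[Api, Any]` rather than `Dict[Api, ProviderSpec]`. A toy illustration with a stub Api enum (the real signature lives in each provider's `get_provider_impl`):

```python
from enum import Enum
from typing import Any, Dict


class Api(Enum):  # stub for illustration only
    inference = "inference"
    safety = "safety"


async def get_provider_impl(config: Any, deps: Dict[Api, Any]) -> Any:
    # `deps` holds already-instantiated provider implementations keyed by
    # Api, not ProviderSpec objects, hence the Any value type.
    inference_impl = deps[Api.inference]
    return inference_impl
```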

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

mypy passing if enabled checks for these modules. (See #1543)

[//]: # (## Documentation)

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-03-11 10:07:28 -07:00
Ihar Hrachyshka
04106b94aa
docs: Remove duplicate docs on api docs generator (#1534)
# What does this PR do?

Since #892, we also need to install ruamel. Instead of maintaining the
list of script dependencies in multiple places, remove it and assume
developers read CONTRIBUTING.md docs.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

Just docs.

[//]: # (## Documentation)

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-03-11 10:01:46 -07:00
Ihar Hrachyshka
0e73186a11
fix: Add missing shutdown handler for TorchtunePostTrainingImpl (#1535)
# What does this PR do?

Added missing shutdown handler. (Currently empty.)

Without it, when server shuts down, it posts the following warning:

```
__main__:129 server: No shutdown method for TorchtunePostTrainingImpl
```
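
The added handler is essentially a no-op; roughly:

```python
class TorchtunePostTrainingImpl:
    async def shutdown(self) -> None:
        # Nothing to clean up yet; the method's presence silences the
        # "No shutdown method for TorchtunePostTrainingImpl" warning.
        pass
```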

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>


[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

(The test plan assumes shutdown logic is fixed, see #1495)

Without the patch:

```
INFO:     Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO     2025-03-10 20:56:43,961 __main__:140 server: Shutting down
INFO     2025-03-10 20:56:43,962 __main__:124 server: Shutting down DatasetsRoutingTable
INFO     2025-03-10 20:56:43,964 __main__:124 server: Shutting down DatasetIORouter
INFO     2025-03-10 20:56:43,965 __main__:124 server: Shutting down ScoringFunctionsRoutingTable
INFO     2025-03-10 20:56:43,966 __main__:124 server: Shutting down ScoringRouter
INFO     2025-03-10 20:56:43,967 __main__:124 server: Shutting down ModelsRoutingTable
INFO     2025-03-10 20:56:43,968 __main__:124 server: Shutting down InferenceRouter
INFO     2025-03-10 20:56:43,969 __main__:124 server: Shutting down ShieldsRoutingTable
INFO     2025-03-10 20:56:43,971 __main__:124 server: Shutting down SafetyRouter
INFO     2025-03-10 20:56:43,972 __main__:124 server: Shutting down VectorDBsRoutingTable
INFO     2025-03-10 20:56:43,973 __main__:124 server: Shutting down VectorIORouter
INFO     2025-03-10 20:56:43,974 __main__:124 server: Shutting down ToolGroupsRoutingTable
INFO     2025-03-10 20:56:43,975 __main__:124 server: Shutting down ToolRuntimeRouter
INFO     2025-03-10 20:56:43,976 __main__:124 server: Shutting down MetaReferenceAgentsImpl
INFO     2025-03-10 20:56:43,977 __main__:124 server: Shutting down TelemetryAdapter
INFO     2025-03-10 20:56:43,978 __main__:124 server: Shutting down TorchtunePostTrainingImpl
WARNING  2025-03-10 20:56:43,979 __main__:129 server: No shutdown method for TorchtunePostTrainingImpl
INFO     2025-03-10 20:56:43,979 __main__:124 server: Shutting down BenchmarksRoutingTable
INFO     2025-03-10 20:56:43,980 __main__:124 server: Shutting down EvalRouter
INFO     2025-03-10 20:56:43,981 __main__:124 server: Shutting down DistributionInspectImpl
INFO:     Application shutdown complete.
INFO:     Finished server process [33862]
```

Run with the patch and observe no warning:

```
$ kill -INT $(ps ax | grep  llama_stack.distribution.server.server | grep -v nvim | awk -e '{print $1}' | sort | head -n 1)
```

```
INFO:     Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO     2025-03-11 00:32:56,863 __main__:140 server: Shutting down
INFO     2025-03-11 00:32:56,864 __main__:124 server: Shutting down DatasetsRoutingTable
INFO     2025-03-11 00:32:56,866 __main__:124 server: Shutting down DatasetIORouter
INFO     2025-03-11 00:32:56,867 __main__:124 server: Shutting down ScoringFunctionsRoutingTable
INFO     2025-03-11 00:32:56,868 __main__:124 server: Shutting down ScoringRouter
INFO     2025-03-11 00:32:56,869 __main__:124 server: Shutting down ModelsRoutingTable
INFO     2025-03-11 00:32:56,870 __main__:124 server: Shutting down InferenceRouter
INFO     2025-03-11 00:32:56,871 __main__:124 server: Shutting down ShieldsRoutingTable
INFO     2025-03-11 00:32:56,872 __main__:124 server: Shutting down SafetyRouter
INFO     2025-03-11 00:32:56,873 __main__:124 server: Shutting down VectorDBsRoutingTable
INFO     2025-03-11 00:32:56,874 __main__:124 server: Shutting down VectorIORouter
INFO     2025-03-11 00:32:56,875 __main__:124 server: Shutting down ToolGroupsRoutingTable
INFO     2025-03-11 00:32:56,876 __main__:124 server: Shutting down ToolRuntimeRouter
INFO     2025-03-11 00:32:56,877 __main__:124 server: Shutting down MetaReferenceAgentsImpl
INFO     2025-03-11 00:32:56,878 __main__:124 server: Shutting down TelemetryAdapter
INFO     2025-03-11 00:32:56,879 __main__:124 server: Shutting down TorchtunePostTrainingImpl
INFO     2025-03-11 00:32:56,880 __main__:124 server: Shutting down BenchmarksRoutingTable
INFO     2025-03-11 00:32:56,881 __main__:124 server: Shutting down EvalRouter
INFO     2025-03-11 00:32:56,882 __main__:124 server: Shutting down DistributionInspectImpl

```

[//]: # (## Documentation)

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-03-11 10:01:09 -07:00
Ashwin Bharambe
e13c92f269
revert: feat(server): Use system packages for execution (#1551)
Reverts meta-llama/llama-stack#1252

The above PR breaks the following invocation:
```bash
llama stack run ~/.llama/distributions/together/together-run.yaml
```
2025-03-11 09:58:25 -07:00
Dinesh Yeduguru
ead9397e22
fix: tracing fixes for trace context propagation across coroutines (#1522)
# What does this PR do?
This PR has two fixes needed for correct trace context propagation
across the asyncio boundary.
Fix 1: Start using context vars to store the global trace context.
This is needed since we cannot use the same trace context across
coroutines, as the state is shared; each coroutine
should have its own trace context so that it can store
its state correctly.
Fix 2: Start a new span for each new coroutine started for running
shields, to keep the span tree clean.
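
A small, self-contained illustration of why context vars help here: every task spawned via `asyncio.gather` gets a copy of the caller's context, so each shield coroutine can install its own trace context/span without clobbering its siblings (the variable name and span values are made up for the example):

```python
import asyncio
import contextvars

CURRENT_TRACE_CONTEXT = contextvars.ContextVar("trace_context", default=None)


async def run_one_shield(name: str) -> None:
    # Each task has its own copy of the context, so this set() is local.
    CURRENT_TRACE_CONTEXT.set(f"span-for-{name}")
    await asyncio.sleep(0)
    assert CURRENT_TRACE_CONTEXT.get() == f"span-for-{name}"


async def main() -> None:
    CURRENT_TRACE_CONTEXT.set("root-span")
    await asyncio.gather(*(run_one_shield(s) for s in ("llama-guard", "prompt-guard")))
    # The parent's trace context is untouched by the shield coroutines.
    assert CURRENT_TRACE_CONTEXT.get() == "root-span"


asyncio.run(main())
```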


## Test Plan

### Integration tests with server
LLAMA_STACK_DISABLE_VERSION_CHECK=true llama stack run
~/.llama/distributions/together/together-run.yaml
LLAMA_STACK_CONFIG=http://localhost:8321 pytest -s --safety-shield
meta-llama/Llama-Guard-3-8B --text-model
meta-llama/Llama-3.1-8B-Instruct
server logs:
https://gist.github.com/dineshyv/51ac5d9864ed031d0d89ce77352821fe
test logs:
https://gist.github.com/dineshyv/e66acc1c4648a42f1854600609c467f3
 
### Integration tests with library client
LLAMA_STACK_CONFIG=fireworks pytest -s --safety-shield
meta-llama/Llama-Guard-3-8B --text-model
meta-llama/Llama-3.1-8B-Instruct

logs: https://gist.github.com/dineshyv/ca160696a0b167223378673fb1dcefb8

### Apps test with server:
```
LLAMA_STACK_DISABLE_VERSION_CHECK=true llama stack run ~/.llama/distributions/together/together-run.yaml
python -m examples.agents.e2e_loop_with_client_tools localhost 8321
```
server logs:
https://gist.github.com/dineshyv/1717a572d8f7c14279c36123b79c5797
app logs:
https://gist.github.com/dineshyv/44167e9f57806a0ba3b710c32aec02f8
2025-03-11 07:12:48 -07:00
Botao Chen
e3edca7739
feat: [new open benchmark] Math 500 (#1538)
## What does this PR do?
Created a new math_500 open-benchmark based on OpenAI's [Let's Verify
Step by Step](https://arxiv.org/abs/2305.20050) paper and hugging face's
[HuggingFaceH4/MATH-500](https://huggingface.co/datasets/HuggingFaceH4/MATH-500)
dataset.

The challenging part of this benchmark is to parse the generated and
expected answers and verify whether they are the same. For the parsing
part, we refer to [Minerva: Solving Quantitative Reasoning Problems with Language
Models](https://research.google/blog/minerva-solving-quantitative-reasoning-problems-with-language-models/).

To simplify the parsing logic, as a next step we plan to also refer to
what [simple-eval](https://github.com/openai/simple-evals) is doing:
using an LLM as judge to check whether the generated answer matches the
expected answer.


## Test Plan
On the server side, spin up a server with the open-benchmark template:
`llama stack run llama_stack/templates/open-benchmark/run.yaml`

On the client side, issue an open benchmark eval request `llama-stack-client
--endpoint xxx eval run-benchmark "meta-reference-math-500" --model-id
"meta-llama/Llama-3.3-70B-Instruct" --output-dir "/home/markchen1015/"
--num-examples 20` and get the aggregated eval results
<img width="238" alt="Screenshot 2025-03-10 at 7 57 04 PM"
src="https://github.com/user-attachments/assets/2c9da042-3b70-470e-a7c4-69f4cc24d1fb"
/>

Checked the generated answers and the related scoring; they make sense.
2025-03-10 20:38:28 -07:00
Courtney Pacheco
ff853ccc38
fix: Use --with-editable to capture accurate code coverage reporting (#1532)
# What does this PR do?
I created a PR earlier today, but I realized the code coverage reporting
isn't correct: #1512

Essentially, we need to use `--with-editable` to enable develop/editable
mode through `uv`. Using editable mode will create a package.egg-link
file, and that allows pytest to accurately capture code coverage.

Before, some files had "0%" or "100%" coverage, which isn't accurate:

<img width="1455" alt="Screenshot 2025-03-10 at 10 01 53 AM"
src="https://github.com/user-attachments/assets/c425515a-9ecd-4962-a2d4-18cd16d12f25"
/>

More info on `--with-editable`:
https://docs.astral.sh/uv/reference/cli/#uv-run--with-editable

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
Tested locally

<img width="775" alt="Screenshot 2025-03-10 at 7 00 14 PM"
src="https://github.com/user-attachments/assets/31141318-5cf6-4666-8676-b5d8c8d2e719"
/>

Screenshot from CI:

<img width="1000" alt="Screenshot 2025-03-10 at 7 07 57 PM"
src="https://github.com/user-attachments/assets/47092909-ff8d-4e97-80dc-2a16d948405a"
/>

[//]: # (## Documentation)

Signed-off-by: Courtney Pacheco <6019922+courtneypacheco@users.noreply.github.com>
2025-03-10 19:30:28 -04:00
Ashwin Bharambe
dc84bc755a
fix: revert to using faiss for ollama distro (#1530)
This is unfortunate because `sqlite-vec` seems promising. But its PIP
package is not quite complete. It does not have a binary for arm64 (I
think, or maybe it even lacks 64-bit builds?), which results in the arm64
container failing with
```
File "/usr/local/lib/python3.10/site-packages/sqlite_vec/__init__.py", line 17, in load
    conn.load_extension(loadable_path())
sqlite3.OperationalError: /usr/local/lib/python3.10/site-packages/sqlite_vec/vec0.so: wrong ELF class: ELFCLASS32
```

To get around it, I tried to install from source via `uv pip install
sqlite-vec --no-binary=sqlite-vec`; however, it even lacks a source
distribution, which makes that impossible.

## Test Plan

Build the container locally using: 

```bash
LLAMA_STACK_DIR=. llama stack build --template ollama --image-type container
```

Run the container as: 

```
podman run --privileged -it -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
   -v ~/.llama:/root/.llama \
    --env INFERENCE_MODEL=$INFERENCE_MODEL \
    --env OLLAMA_URL=http://host.containers.internal:11434 \
    -v ~/local/llama-stack:/app/llama-stack-source \
    localhost/distribution-ollama:dev --port $LLAMA_STACK_PORT
```

Verify the container starts up correctly. Without this patch, it would
encounter the ELFCLASS32 error.
2025-03-10 16:15:17 -07:00
Sébastien Han
21e39633d8
feat(server): Use system packages for execution (#1252)
# What does this PR do?

Users prefer to rely on the main CLI rather than invoking the server
through a Python module. Users interact with a high-level CLI rather
than needing to know internal module structures.

Now, when running llama stack run <path-to-config>, the server will
attempt to use the system package or a virtual environment if one is
active.

This also eliminates the current process dependency chain when running
from a virtual environment:

-> llama stack run
   -> start_env.sh
      -> python -m server...
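
A sketch of the kind of check involved (the function name is hypothetical): detect an active virtual environment and otherwise fall back to the system packages, instead of bouncing through `start_env.sh`.

```python
import os
import sys


def current_python_environment() -> str:
    # An active virtualenv sets VIRTUAL_ENV and changes sys.prefix away
    # from the base interpreter prefix; otherwise use the system packages.
    if os.environ.get("VIRTUAL_ENV") or sys.prefix != sys.base_prefix:
        return f"virtual environment: {sys.prefix}"
    return "system packages"
```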

Signed-off-by: Sébastien Han <seb@redhat.com>

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

Run:

```
ollama run llama3.2:3b-instruct-fp16 --keepalive=2m &
llama stack run ./llama_stack/templates/ollama/run.yaml --disable-ipv6
```

Notice that the server starts and shuts down normally.

[//]: # (## Documentation)

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-03-10 16:01:03 -07:00
Reid
feacf89548
docs: improve integration test doc (#1502)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

It should use `export` for the API key env var.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-03-10 15:50:46 -07:00
Sébastien Han
91b1b92908
build: revamp "test" dependencies from pyproject (#1468)
# What does this PR do?

The `test` section has been updated to include only the essential
dependencies needed for running integration tests, which are shared
across all providers. If a provider requires additional dependencies,
please add them to your environment separately. When using uv to
run your tests, you can specify extra dependencies with the
`--with` flag.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-03-10 15:43:16 -07:00
Sébastien Han
201a7567ef
test: add inspect unit test (#1417)
# What does this PR do?

Add unit tests for the inspect endpoint.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

```
$ ollama run llama3.2:3b-instruct-fp16 --keepalive=60m &
$ LLAMA_STACK_CONFIG=./llama_stack/templates/ollama/run.yaml uv run
pytest -v -s tests/integration/inspect/test_inspect.py

/Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.10/site-packages/pytest_asyncio/plugin.py:207:
PytestDeprecationWarning: The configuration option
"asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the
fixture caching scope. Future versions of pytest-asyncio will default
the loop scope for asynchronous fixtures to function scope. Set the
default fixture loop scope explicitly in order to avoid unexpected
behavior in the future. Valid fixture loop scopes are: "function",
"class", "module", "package", "session"


warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
============================================== test session starts
==============================================
platform darwin -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 --
/Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.10.16', 'Platform':
'macOS-15.3.1-arm64-arm-64bit', 'Packages': {'pytest': '8.3.4',
'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1',
'asyncio': '0.25.3', 'anyio': '4.8.0', 'nbval': '0.11.0'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0,
nbval-0.11.0
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None
collected 2 items


tests/integration/inspect/test_inspect.py::TestInspect::test_health[txt=8B]
PASSED

tests/integration/inspect/test_inspect.py::TestInspect::test_version[txt=8B]
PASSED

========================================= 2 passed, 3 warnings in 2.26s
===================================
```

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-03-10 15:36:18 -07:00
Charlie Doern
7559b4055e
chore: add color to Env Variable message (#1525)
# What does this PR do?

currently the `"Environment variable LLAMA_STACK_LOGGING found"` message
is printed with no color. Switch to cprint and highlight it in yellow for
visibility
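
Roughly (using `termcolor.cprint`; the exact message formatting may differ):

```python
from termcolor import cprint

env_setting = "all=warn"
# Highlight the notice in yellow instead of printing it unstyled.
cprint(f"Environment variable LLAMA_STACK_LOGGING found: {env_setting}", color="yellow")
```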

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-03-10 15:29:40 -07:00
Ihar Hrachyshka
a64021bb47
fix: Disable async loop warning messages during test run (#1526)
# What does this PR do?

The test class by default enables debug mode, which produces some
unexpected warnings like:

```
tests/unit/models/test_prompt_adapter.py::PrepareMessagesTests::test_completion_message_encoding
WARNING  2025-03-10 20:41:48,577 asyncio:1904 uncategorized: Executing <Task pending name='Task-1'
  coro=<IsolatedAsyncioTestCase._asyncioLoopRunner() running at
  /home/ec2-user/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/unittest/async_case.py:95
  > wait_for=<Future pending cb=[Task.task_wakeup()] created at
  /home/ec2-user/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/asyncio/base_events.py:42
  9> created at
  /home/ec2-user/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/unittest/async_case.py:11
  7> took 0.231 seconds
PASSED
```

I suggest we disable these since they are not very useful and can
confuse other developers.
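
One way to silence them (a sketch, not necessarily the exact change in the PR) is to turn debug mode back off on the loop that `IsolatedAsyncioTestCase` sets up:

```python
import asyncio
import unittest


class PrepareMessagesTests(unittest.IsolatedAsyncioTestCase):
    async def asyncSetUp(self) -> None:
        # IsolatedAsyncioTestCase enables asyncio debug mode, which logs
        # "Executing <Task ...> took N seconds" warnings; disable it here.
        asyncio.get_running_loop().set_debug(False)

    async def test_completion_message_encoding(self) -> None:
        ...
```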

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

Run tests. The warnings are no longer seen.

[//]: # (## Documentation)

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-03-10 15:29:08 -07:00
ehhuang
0e3c0cf8de
fix: server logging (#1521)
Summary:

Test Plan:

ERROR 2025-03-10 10:53:00,804 __main__:239 server: Error executing
endpoint route='/v1/inference/chat-completion'
         method='post'
2025-03-10 15:25:23 -07:00
Sarthak Deshpande
921f8b1125
chore: Together async client (#1510)
# What does this PR do?
Uses together async client instead of sync client

[//]: # (If resolving an issue, uncomment and update the line below)

## Test Plan
The command to run the test is in the image below (2 tests fail, and they
were failing for the old stable version as well, with the same errors).
<img width="1689" alt="image"
src="https://github.com/user-attachments/assets/503db720-5379-425d-9844-0225010e41a1"
/>


[//]: # (## Documentation)

---------

Co-authored-by: sarthakdeshpande <sarthak.deshpande@engati.com>
2025-03-10 15:25:01 -07:00