# What does this PR do?
The current `.python-version` file forces `uv` to
set up the development environment with Python 3.10.
This causes an error if a dev system does not have
Python 3.10, even though the project officially
supports newer versions of Python as well.
Since `uv` can use `pyproject.toml` to determine
supported Python versions, we can safely remove this file from
the repo and stop tracking it in git.
Follows up on https://github.com/meta-llama/llama-stack/pull/1172
## Test Plan
N/A
---------
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
Summary:
This is not used anywhere.
Closes #1421
Test Plan:
LLAMA_STACK_CONFIG=fireworks pytest -s -v
tests/integration/agents/test_agents.py --safety-shield
meta-llama/Llama-Guard-3-8B --text-model
meta-llama/Llama-3.1-8B-Instruct --record-responses
Summary:
1. Adds an option to not use bwrap for code execution
2. Disables bwrap when running tests on Macs
Test Plan:
```
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/integration/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B --text-model meta-llama/Llama-3.1-8B-Instruct
```
Verify code_interpreter result in logs
INFO 2025-03-11 08:10:39,858
llama_stack.providers.inline.agents.meta_reference.agent_instance:1032
agents: tool
call code_interpreter completed with result:
content='completed\n\n541\n' error_message=None error_code=None
metadata=None
# What does this PR do?
additional artifacts make test results more human-readable
## Test Plan
Ran locally
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
Summary:
Refactoring only.
Centralize the toolgroup preprocessing logic in one place.
Test Plan:
LLAMA_STACK_CONFIG=fireworks pytest -s -v
tests/api/agents/test_agents.py --safety-shield
meta-llama/Llama-Guard-3-8B
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/1381).
* #1384
* __->__ #1381
# What does this PR do?
This change adds a compact type for including metrics in responses, as
opposed to the full MetricEvent, which is relevant for internal logging
purposes.
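As a rough illustration of the idea, the sketch below projects a full event onto a compact shape carrying only `metric`, `value`, and `unit` (the three fields visible in the response in the test plan). The class names `FullMetricEvent` and `CompactMetric` and the helper `to_compact` are hypothetical and are not taken from the llama-stack code base.

```python
from dataclasses import dataclass, asdict
from typing import Optional, Union


@dataclass
class FullMetricEvent:
    """Hypothetical stand-in for the full event used for internal logging."""

    trace_id: str
    span_id: str
    timestamp: str
    metric: str
    value: Union[int, float]
    unit: Optional[str] = None


@dataclass
class CompactMetric:
    """Hypothetical compact shape returned to API callers."""

    metric: str
    value: Union[int, float]
    unit: Optional[str] = None


def to_compact(event: FullMetricEvent) -> CompactMetric:
    # Drop tracing details; keep only what a caller needs to read the metric.
    return CompactMetric(metric=event.metric, value=event.value, unit=event.unit)


event = FullMetricEvent("trace", "span", "2025-03-11T00:00:00Z", "prompt_tokens", 10)
print(asdict(to_compact(event)))
```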
## Test Plan
```
LLAMA_STACK_CONFIG=~/.llama/distributions/fireworks/fireworks-run.yaml pytest -s -v agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B --text-model meta-llama/Llama-3.1-8B-Instruct
llama stack run ~/.llama/distributions/fireworks/fireworks-run.yaml
curl --request POST \
--url http://localhost:8321/v1/inference/chat-completion \
--header 'content-type: application/json' \
--data '{
"model_id": "meta-llama/Llama-3.1-70B-Instruct",
"messages": [
{
"role": "user",
"content": {
"type": "text",
"text": "where do humans live"
}
}
],
"stream": false
}'
{
"metrics": [
{
"metric": "prompt_tokens",
"value": 10,
"unit": null
},
{
"metric": "completion_tokens",
"value": 522,
"unit": null
},
{
"metric": "total_tokens",
"value": 532,
"unit": null
}
],
"completion_message": {
"role": "assistant",
"content": "Humans live in various parts of the world...............",
"stop_reason": "out_of_tokens",
"tool_calls": []
},
"logprobs": null
}
```
# What does this PR do?
This PR adds a simple unit test badge to the project README.
It also modifies the workflow to run on merges to main, so that the
status reflected in the README is that of main and not of pull request
branches.
---------
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
This fixes the build error
## Test Plan
pre-commit run --all-files
check for merge conflicts................................................Passed
trim trailing whitespace.................................................Passed
check for added large files..............................................Passed
fix end of files.........................................................Passed
Insert license in comments...............................................Passed
ruff.....................................................................Passed
ruff-format..............................................................Passed
blacken-docs.............................................................Passed
uv-lock..................................................................Passed
uv-export................................................................Passed
mypy.....................................................................Passed
Distribution Template Codegen............................................Passed
# What does this PR do?
This PR adds back the changes in #1300, which were reverted in #1476.
It also adds logic to preserve context variables across the asyncio
boundary. This is needed with the library client because the async
generator logic yields control to code outside the event loop and, on
resuming, does not have the same context as before, so the context
vars must be preserved explicitly.
Addresses #1477
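The problem and one possible workaround can be sketched as follows. This is a minimal illustration, not the code from this PR: `trace_id`, `stream`, `preserve_contexts`, and `drive` are hypothetical names, and the synchronous `drive` loop only assumes that a caller steps the generator with one `run_until_complete` call per item. Each such call runs in a fresh task with a fresh copy of the caller's context, so values set in one step are lost in the next unless they are saved and restored.

```python
import asyncio
import contextvars
from typing import Any, List

# Hypothetical context variable standing in for the trace context.
trace_id: contextvars.ContextVar = contextvars.ContextVar("trace_id", default=None)


async def stream():
    trace_id.set("abc")   # set while producing the first item
    yield 1
    yield trace_id.get()  # read while producing the second item


def preserve_contexts(gen, context_vars):
    """Wrap an async generator so the listed context vars survive each yield."""

    async def wrapper():
        while True:
            try:
                item = await gen.__anext__()
            except StopAsyncIteration:
                break
            # Remember the values as they are right before control leaves the loop.
            saved = [(var, var.get()) for var in context_vars]
            yield item
            # Resumed in a new task with a new context: restore the saved values.
            for var, value in saved:
                var.set(value)

    return wrapper()


def drive(gen) -> List[Any]:
    """Step an async generator from synchronous code, one task per item."""
    loop = asyncio.new_event_loop()
    items = []
    try:
        while True:
            try:
                items.append(loop.run_until_complete(gen.__anext__()))
            except StopAsyncIteration:
                break
    finally:
        loop.close()
    return items


print(drive(stream()))                                 # second item loses the value
print(drive(preserve_contexts(stream(), [trace_id])))  # second item keeps the value
```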
## Test Plan
```
curl --request POST \
--url http://localhost:8321/v1/inference/chat-completion \
--header 'content-type: application/json' \
--data '{
"model_id": "meta-llama/Llama-3.1-70B-Instruct",
"messages": [
{
"role": "user",
"content": {
"type": "text",
"text": "where do humans live"
}
}
],
"stream": false
}' | jq .
{
"metrics": [
{
"trace_id": "kCZwO3tyQC-FuAGb",
"span_id": "bsP_5a5O",
"timestamp": "2025-03-11T16:47:38.549084Z",
"attributes": {
"model_id": "meta-llama/Llama-3.1-70B-Instruct",
"provider_id": "fireworks"
},
"type": "metric",
"metric": "prompt_tokens",
"value": 10,
"unit": "tokens"
},
{
"trace_id": "kCZwO3tyQC-FuAGb",
"span_id": "bsP_5a5O",
"timestamp": "2025-03-11T16:47:38.549449Z",
"attributes": {
"model_id": "meta-llama/Llama-3.1-70B-Instruct",
"provider_id": "fireworks"
},
"type": "metric",
"metric": "completion_tokens",
"value": 369,
"unit": "tokens"
},
{
"trace_id": "kCZwO3tyQC-FuAGb",
"span_id": "bsP_5a5O",
"timestamp": "2025-03-11T16:47:38.549457Z",
"attributes": {
"model_id": "meta-llama/Llama-3.1-70B-Instruct",
"provider_id": "fireworks"
},
"type": "metric",
"metric": "total_tokens",
"value": 379,
"unit": "tokens"
}
],
"completion_message": {
"role": "assistant",
"content": "Humans live on the planet Earth, specifically on its landmasses and in its oceans. Here's a breakdown of where humans live:\n\n1. **Continents:** Humans inhabit all seven continents:\n\t* Africa\n\t* Antarctica ( temporary residents, mostly scientists and researchers)\n\t* Asia\n\t* Australia\n\t* Europe\n\t* North America\n\t* South America\n2. **Countries:** There are 196 countries recognized by the United Nations, and humans live in almost all of them.\n3. **Cities and towns:** Many humans live in urban areas, such as cities and towns, which are often located near coastlines, rivers, or other bodies of water.\n4. **Rural areas:** Some humans live in rural areas, such as villages, farms, and countryside.\n5. **Islands:** Humans inhabit many islands around the world, including those in the Pacific, Indian, and Atlantic Oceans.\n6. **Mountains and highlands:** Humans live in mountainous regions, such as the Himalayas, the Andes, and the Rocky Mountains.\n7. **Deserts:** Some humans live in desert regions, such as the Sahara, the Mojave, and the Atacama.\n8. **Coastal areas:** Many humans live in coastal areas, such as beaches, ports, and coastal cities.\n9. **Underwater habitats:** A few humans live in underwater habitats, such as research stations and submarines.\n10. **Space:** A small number of humans have lived in space, including astronauts on the International Space Station and those who have visited the Moon.\n\nOverall, humans can be found living in almost every environment on Earth, from the frozen tundra to the hottest deserts, and from the highest mountains to the deepest oceans.",
"stop_reason": "end_of_turn",
"tool_calls": []
},
"logprobs": null
}
```
Original repro no longer shows any error:
```
LLAMA_STACK_DISABLE_VERSION_CHECK=true llama stack run ~/.llama/distributions/fireworks/fireworks-run.yaml
python -m examples.agents.e2e_loop_with_client_tools localhost 8321
```
client logs:
https://gist.github.com/dineshyv/047c7e87b18a5792aa660e311ea53166
server logs:
https://gist.github.com/dineshyv/97a2174099619e9916c7c490be26e559
# What does this PR do?
- fix precommit
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
CI
[//]: # (## Documentation)
## What does this PR do?
We noticed that the passthrough inference provider doesn't work with agents
due to a type mismatch between client and server. We manually cast
the llama stack client type to the llama stack server type to fix the issue.
## test
run `python -m examples.agents.hello localhost 8321` within
llama-stack-apps
<img width="1073" alt="Screenshot 2025-03-11 at 8 43 44 PM"
src="https://github.com/user-attachments/assets/bd1bdd31-606a-420c-a249-95f6184cc0b1"
/>
fix https://github.com/meta-llama/llama-stack/issues/1560
## What does this PR do?
As the title says, add codegen for the open-benchmark template
## test
Checked the newly generated run.yaml file; it is identical before and
after the change
Also adds a small improvement to the together template so that a missing
TOGETHER_API_KEY won't crash the server, which is consistent with the user
experience of other remote providers
# What does this PR do?
`uvicorn.run` accepts a `log_level` argument; pass in the effective
level set by the logger.
Additionally, third party libraries like httpx are using our logging
format, but not honoring our log level.
This seems unintended, so loop through all items in the loggerDict and
apply the same log level as what we have set.
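The loop over `loggerDict` described above might look roughly like the sketch below. The function name `apply_level_everywhere` is hypothetical, and the snippet does not show the `uvicorn.run` call itself; it only illustrates applying one level to every logger that has already been created.

```python
import logging


def apply_level_everywhere(level: int) -> None:
    """Apply one log level to the root logger and every known named logger."""
    logging.getLogger().setLevel(level)
    # loggerDict maps names to Logger objects or internal PlaceHolder entries;
    # only real Logger objects have a level to set.
    for logger in list(logging.root.manager.loggerDict.values()):
        if isinstance(logger, logging.Logger):
            logger.setLevel(level)


# Stand-in for a third-party library logger that configured its own level.
third_party = logging.getLogger("demo_third_party.client")
third_party.setLevel(logging.INFO)

apply_level_everywhere(logging.WARNING)
print(third_party.isEnabledFor(logging.INFO))
```

Loggers created after this call are not touched, so a sketch like this would need to run once third-party modules have been imported.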
## Test Plan
before:
```
llama stack run --image-type venv ~/.llama/distributions/ollama/ollama-run.yaml
Environment variable LLAMA_STACK_LOGGING found: all=warn
Using virtual environment: /Users/charliedoern/projects/Documents/llama-stack/venv
+ python -m llama_stack.distribution.server.server --yaml-config /Users/charliedoern/.llama/distributions/ollama/ollama-run.yaml --port 8321
Environment variable LLAMA_STACK_LOGGING found: all=warn
WARNING 2025-03-10 16:05:49,706 root:71 uncategorized: Warning: `bwrap` is not available. Code interpreter tool will
not work correctly.
INFO 2025-03-10 16:05:49,916 datasets:54 uncategorized: PyTorch version 2.5.1 available.
INFO 2025-03-10 16:05:50,010 httpx:1740 uncategorized: HTTP Request: GET http://localhost:11434/api/ps "HTTP/1.1 200
OK"
INFO 2025-03-10 16:05:50,297 httpx:1740 uncategorized: HTTP Request: POST http://localhost:11434/api/pull "HTTP/1.1
200 OK"
INFO 2025-03-10 16:05:50,314 httpx:1740 uncategorized: HTTP Request: GET http://localhost:11434/api/tags "HTTP/1.1
200 OK"
INFO: Started server process [89663]
INFO: Waiting for application startup.
INFO: ASGI 'lifespan' protocol appears unsupported.
INFO: Application startup complete.
INFO: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
```
after:
```
llama stack run --image-type venv ~/.llama/distributions/ollama/ollama-run.yaml
Environment variable LLAMA_STACK_LOGGING found: all=warn
Using virtual environment: /Users/charliedoern/projects/Documents/llama-stack/venv
+ python -m llama_stack.distribution.server.server --yaml-config /Users/charliedoern/.llama/distributions/ollama/ollama-run.yaml --port 8321
Environment variable LLAMA_STACK_LOGGING found: all=warn
WARNING 2025-03-10 16:05:20,429 root:71 uncategorized: Warning: `bwrap` is not available. Code interpreter tool will
not work correctly.
INFO 2025-03-10 16:05:20,639 datasets:54 uncategorized: PyTorch version 2.5.1 available.
```
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
Python unit tests running via GitHub Actions were only running with
Python 3.10.
The project supports all Python versions greater than or equal to 3.10.
This commit adds 3.11, 3.12, and 3.13 to the test matrix for better
coverage and confidence for non-3.10 users.
## Test Plan
All tests pass locally with python 3.11, 3.12, and 3.13
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
Expand the mypy exclude list.
It will be easier to enable typing checks for specific modules if we
have an explicit list of violators that we can reduce over time, item by
item.
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
pre-commit passes.
[//]: # (## Documentation)
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
The TTFT (time to first token) number largely depends on input length. Ideally we would have a
"standard" test that we can use to measure against any llama stack
serving.
TODO: Once JSON is replaced with YAML, I will add "notes" for each test
to explain the purpose of each test in place.
## Test plan
Please refer to e2e test doc for setup.
```
LLAMA_STACK_PORT=8322 pytest -v -s --stack-config="http://localhost:8322" \
--text-model="meta-llama/Llama-3.2-3B-Instruct" \
tests/integration/inference/test_text_inference.py::test_text_chat_completion_first_token_profiling
```
Bug https://github.com/meta-llama/llama-stack/issues/1357
# What does this PR do?
Fix a bug caused by a wrong file name in the inline::localfs datasetio provider
[//]: # (If resolving an issue, uncomment and update the line below)
Closes #1357
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
[//]: # (## Documentation)
Signed-off-by: Josh Salomon <jsalomon@redhat.com>
# What does this PR do?
- A recent merge, https://github.com/meta-llama/llama-stack/pull/1410,
introduced the following error:
```
ValueError: Provider meta-reference (Api.agents) does not implement the following methods:
[('list_agent_sessions', 'not_actually_implemented'), ('list_agents', 'not_actually_implemented')]
```
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
```
llama stack run
```
```
LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/agents/test_agents.py --text-model meta-llama/Llama-3.3-70B-Instruct
```
1379530386
[//]: # (## Documentation)
# What does this PR do?
Remove Llama-3.2-1B-Instruct for fireworks, as it no longer appears to
be hosted on the website.
## Test Plan
python distro_codegen.py
# What does this PR do?
As I brought up in #1515, it shouldn't be necessary to tie the unit test
runner to an exact z-stream of Python 3.10.
Updated so the unit test runner always uses the latest z-stream of Python 3.10.
## Test Plan
```shell
$ uv run -p 3.10 --with-editable . --with-editable ".[dev]" --with-editable ".[unit]" pytest --cov=llama_stack -s -v tests/unit/ --junitxml=pytest-report.xml
```
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
Setting $LLAMA_STACK_LOG_FILE will pipe the logs to a file as well as
stdout. This is done by using a logging FileHandler.
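A minimal sketch of this approach, assuming the handler list is built from the environment at setup time. The helper name `build_handlers` is hypothetical; only the `LLAMA_STACK_LOG_FILE` variable and the use of `logging.FileHandler` come from the description above.

```python
import logging
import os
import sys
from typing import List, Mapping


def build_handlers(env: Mapping[str, str] = os.environ) -> List[logging.Handler]:
    """Always log to stdout; additionally log to a file when the variable is set."""
    handlers: List[logging.Handler] = [logging.StreamHandler(sys.stdout)]
    log_file = env.get("LLAMA_STACK_LOG_FILE")
    if log_file:
        # FileHandler opens the file in append mode by default.
        handlers.append(logging.FileHandler(log_file))
    return handlers


logging.basicConfig(level=logging.INFO, handlers=build_handlers())
logging.getLogger(__name__).info("logging configured")
```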
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
Add support for listing agents, describing an agent, and retrieving
session IDs for a given agent. This is only the API definition; the
implementations will come separately.
Closes: https://github.com/meta-llama/llama-stack/issues/1294
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
If an implementation raises CancelledError (e.g. when it runs its own async
loop for jobs), the main server shutdown handler gets confused and
doesn't attempt to shut down the main loop tasks.
While at it, also fixing the following failure when this happens:
```
UnboundLocalError: cannot access local variable 'loop' where it is not
associated with a value
```
Shutdown handlers were not running because lifespan logic was broken
since ~Oct 2024. Fixed that too and enforcing `lifespan` now (making
sure server will crash when it fails to interact with app through
middleware).
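The failure mode can be illustrated with a small sketch. `CancelledError` derives from `BaseException` on current Python versions, so a plain `except Exception` around a provider's `shutdown()` does not catch it and the remaining providers are never shut down. The classes and the `shutdown_all` helper below are hypothetical and only loosely mirror the loop visible in the tracebacks in the test plan.

```python
import asyncio
from typing import Dict, List


class GoodImpl:
    async def shutdown(self) -> None:
        return None


class CancellingImpl:
    async def shutdown(self) -> None:
        # Simulates a provider whose own job loop surfaces a cancellation.
        raise asyncio.CancelledError("Shutdown")


async def shutdown_all(impls: Dict[str, object], timeout: float = 5) -> List[str]:
    """Shut down every impl; return the names of those that failed."""
    failed = []
    for name, impl in impls.items():
        try:
            if hasattr(impl, "shutdown"):
                await asyncio.wait_for(impl.shutdown(), timeout=timeout)
        except asyncio.TimeoutError:
            failed.append(name)
        except (Exception, asyncio.CancelledError):
            # CancelledError must be listed explicitly, otherwise it escapes
            # the loop and the providers after this one are skipped.
            failed.append(name)
    return failed


print(asyncio.run(shutdown_all({"a": CancellingImpl(), "b": GoodImpl()})))
```

Swallowing `CancelledError` like this is only reasonable in a final shutdown path; elsewhere a cancellation requested from outside should normally be re-raised.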
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
Spotted while working on
https://github.com/meta-llama/llama-stack/pull/1437
One way to trigger it without the PR above is to add `raise
CancelledError` in
any of the running providers' `shutdown` methods; then `kill -INT <pid>`
the
server process.
Validated this with the following test patch:
```
diff --git a/llama_stack/distribution/server/server.py b/llama_stack/distribution/server/server.py
index b85c463a..10dad83e 100644
--- a/llama_stack/distribution/server/server.py
+++ b/llama_stack/distribution/server/server.py
@@ -174,6 +174,7 @@ def handle_signal(app, signum, _) -> None:
except asyncio.CancelledError:
pass
finally:
+ logger.info("Stopping event loop")
loop.stop()
loop = asyncio.get_running_loop()
diff --git a/llama_stack/providers/inline/post_training/torchtune/post_training.py b/llama_stack/providers/inline/post_training/torchtune/post_training.py
index b837362d..163f43d8 100644
--- a/llama_stack/providers/inline/post_training/torchtune/post_training.py
+++ b/llama_stack/providers/inline/post_training/torchtune/post_training.py
@@ -3,6 +3,7 @@
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
+import asyncio
from datetime import datetime
from typing import Any, Dict, Optional
@@ -43,6 +44,9 @@ class TorchtunePostTrainingImpl:
self.jobs = {}
self.checkpoints_dict = {}
+ async def shutdown(self) -> None:
+ raise asyncio.CancelledError("Shutdown")
+
async def supervised_fine_tune(
self,
job_uuid: str,
```
Without the fix:
```
INFO: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
INFO: Shutting down
INFO: Finished server process [52099]
INFO 2025-03-07 23:25:33,548 __main__:143 server: Received signal SIGINT (2). Exiting gracefully...
INFO 2025-03-07 23:25:33,550 __main__:150 server: Shutting down DatasetsRoutingTable
INFO 2025-03-07 23:25:33,551 __main__:177 server: Stopping event loop
ERROR 2025-03-07 23:25:33,552 asyncio:1785 uncategorized: unhandled exception during asyncio.run() shutdown
task: <Task finished name='Task-12' coro=<handle_signal.<locals>.shutdown() done, defined at
/home/ec2-user/src/llama-stack/schedule/llama_stack/distribution/server/server.py:145>
exception=UnboundLocalError("cannot access local variable 'loop' where it is not associated with a value")>
╭───────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────╮
│ /home/ec2-user/src/llama-stack/schedule/llama_stack/distribution/server/server.py:178 in shutdown │
│ │
│ 175 │ │ │ pass │
│ 176 │ │ finally: │
│ 177 │ │ │ logger.info("Stopping event loop") │
│ ❱ 178 │ │ │ loop.stop() │
│ 179 │ │
│ 180 │ loop = asyncio.get_running_loop() │
│ 181 │ loop.create_task(shutdown()) │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
UnboundLocalError: cannot access local variable 'loop' where it is not associated with a value
```
With the fix, now seeing the following messages when the server is
killed:
```
INFO: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
INFO: Shutting down
INFO: Finished server process [50836]
INFO 2025-03-07 23:20:35,182 __main__:143 server: Received signal SIGINT (2). Exiting gracefully...
INFO 2025-03-07 23:20:35,184 __main__:149 server: Shutting down DatasetsRoutingTable
ERROR 2025-03-07 23:20:35,185 __main__:158 server: Failed to shutdown DatasetsRoutingTable: {CancelledError()}
╭───────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────╮
│ /usr/lib64/python3.11/asyncio/tasks.py:476 in wait_for │
│ │
│ 473 │ try: │
│ 474 │ │ # wait until the future completes or the timeout │
│ 475 │ │ try: │
│ ❱ 476 │ │ │ await waiter │
│ 477 │ │ except exceptions.CancelledError: │
│ 478 │ │ │ if fut.done(): │
│ 479 │ │ │ │ return fut.result() │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
CancelledError
During handling of the above exception, another exception occurred:
╭───────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────╮
│ /home/ec2-user/src/llama-stack/schedule/llama_stack/distribution/server/server.py:152 in shutdown │
│ │
│ 149 │ │ │ logger.info("Shutting down %s", impl_name) │
│ 150 │ │ │ try: │
│ 151 │ │ │ │ if hasattr(impl, "shutdown"): │
│ ❱ 152 │ │ │ │ │ await asyncio.wait_for(impl.shutdown(), timeout=5) │
│ 153 │ │ │ │ else: │
│ 154 │ │ │ │ │ logger.warning("No shutdown method for %s", impl_name) │
│ 155 │ │ │ except asyncio.TimeoutError: │
│ │
│ /usr/lib64/python3.11/asyncio/tasks.py:479 in wait_for │
│ │
│ 476 │ │ │ await waiter │
│ 477 │ │ except exceptions.CancelledError: │
│ 478 │ │ │ if fut.done(): │
│ ❱ 479 │ │ │ │ return fut.result() │
│ 480 │ │ │ else: │
│ 481 │ │ │ │ fut.remove_done_callback(cb) │
│ 482 │ │ │ │ # We must ensure that the task is not running │
│ │
│ /home/ec2-user/src/llama-stack/schedule/llama_stack/distribution/routers/routing_tables.py:131 in shutdown │
│ │
│ 128 │ │ │ elif api == Api.tool_runtime: │
│ 129 │ │ │ │ p.tool_store = self │
│ 130 │ │
│ ❱ 131 │ async def shutdown(self) -> None: │
│ 132 │ │ for p in self.impls_by_provider_id.values(): │
│ 133 │ │ │ await p.shutdown() │
│ 134 │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
CancelledError
INFO 2025-03-07 23:20:35,295 __main__:149 server: Shutting down DatasetIORouter
INFO 2025-03-07 23:20:35,296 __main__:149 server: Shutting down ScoringFunctionsRoutingTable
INFO 2025-03-07 23:20:35,297 __main__:149 server: Shutting down ScoringRouter
INFO 2025-03-07 23:20:35,298 __main__:149 server: Shutting down ModelsRoutingTable
INFO 2025-03-07 23:20:35,299 __main__:149 server: Shutting down InferenceRouter
INFO 2025-03-07 23:20:35,300 __main__:149 server: Shutting down ShieldsRoutingTable
INFO 2025-03-07 23:20:35,300 __main__:149 server: Shutting down SafetyRouter
INFO 2025-03-07 23:20:35,301 __main__:149 server: Shutting down VectorDBsRoutingTable
INFO 2025-03-07 23:20:35,302 __main__:149 server: Shutting down VectorIORouter
INFO 2025-03-07 23:20:35,303 __main__:149 server: Shutting down ToolGroupsRoutingTable
INFO 2025-03-07 23:20:35,304 __main__:149 server: Shutting down ToolRuntimeRouter
INFO 2025-03-07 23:20:35,304 __main__:149 server: Shutting down MetaReferenceAgentsImpl
INFO 2025-03-07 23:20:35,305 __main__:149 server: Shutting down TelemetryAdapter
INFO 2025-03-07 23:20:35,306 __main__:149 server: Shutting down TorchtunePostTrainingImpl
ERROR 2025-03-07 23:20:35,307 __main__:158 server: Failed to shutdown TorchtunePostTrainingImpl:
{CancelledError('Shutdown')}
╭───────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────╮
│ /home/ec2-user/src/llama-stack/schedule/llama_stack/distribution/server/server.py:152 in shutdown │
│ │
│ 149 │ │ │ logger.info("Shutting down %s", impl_name) │
│ 150 │ │ │ try: │
│ 151 │ │ │ │ if hasattr(impl, "shutdown"): │
│ ❱ 152 │ │ │ │ │ await asyncio.wait_for(impl.shutdown(), timeout=5) │
│ 153 │ │ │ │ else: │
│ 154 │ │ │ │ │ logger.warning("No shutdown method for %s", impl_name) │
│ 155 │ │ │ except asyncio.TimeoutError: │
│ │
│ /usr/lib64/python3.11/asyncio/tasks.py:489 in wait_for │
│ │
│ 486 │ │ │ │ raise │
│ 487 │ │ │
│ 488 │ │ if fut.done(): │
│ ❱ 489 │ │ │ return fut.result() │
│ 490 │ │ else: │
│ 491 │ │ │ fut.remove_done_callback(cb) │
│ 492 │ │ │ # We must ensure that the task is not running │
│ │
│ /home/ec2-user/src/llama-stack/schedule/llama_stack/providers/inline/post_training/torchtune/post_training. │
│ py:48 in shutdown │
│ │
│ 45 │ │ self.checkpoints_dict = {} │
│ 46 │ │
│ 47 │ async def shutdown(self) -> None: │
│ ❱ 48 │ │ raise asyncio.CancelledError("Shutdown") │
│ 49 │ │
│ 50 │ async def supervised_fine_tune( │
│ 51 │ │ self, │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
CancelledError: Shutdown
INFO 2025-03-07 23:20:35,352 __main__:149 server: Shutting down BenchmarksRoutingTable
INFO 2025-03-07 23:20:35,353 __main__:149 server: Shutting down EvalRouter
INFO 2025-03-07 23:20:35,354 __main__:149 server: Shutting down DistributionInspectImpl
INFO 2025-03-07 23:20:35,355 __main__:177 server: Stopping event loop
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/ec2-user/src/llama-stack/schedule/llama_stack/distribution/server/server.py", line 488, in <module>
main()
File "/home/ec2-user/src/llama-stack/schedule/llama_stack/distribution/server/server.py", line 476, in main
uvicorn.run(**uvicorn_config)
File "/home/ec2-user/src/llama-stack/schedule/venv/lib64/python3.11/site-packages/uvicorn/main.py", line 579, in run
server.run()
File "/home/ec2-user/src/llama-stack/schedule/venv/lib64/python3.11/site-packages/uvicorn/server.py", line 66, in run
return asyncio.run(self.serve(sockets=sockets))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/asyncio/runners.py", line 189, in run
with Runner(debug=debug) as runner:
File "/usr/lib64/python3.11/asyncio/runners.py", line 63, in __exit__
self.close()
File "/usr/lib64/python3.11/asyncio/runners.py", line 71, in close
_cancel_all_tasks(loop)
File "/usr/lib64/python3.11/asyncio/runners.py", line 201, in _cancel_all_tasks
loop.run_until_complete(tasks.gather(*to_cancel, return_exceptions=True))
File "/usr/lib64/python3.11/asyncio/base_events.py", line 652, in run_until_complete
raise RuntimeError('Event loop stopped before Future completed.')
RuntimeError: Event loop stopped before Future completed.
++ error_handler 104
++ echo 'Error occurred in script at line: 104'
Error occurred in script at line: 104
++ exit 1
```
With all patches included, the shutdown now looks as follows:
```
$ kill -INT $(ps ax | grep llama_stack.distribution.server.server | grep -v nvim | awk -e '{print $1}' | sort | head -n 1)
```
```
20:56:09.308 [START]
INFO: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO 2025-03-10 20:56:43,961 __main__:140 server: Shutting down
INFO 2025-03-10 20:56:43,962 __main__:124 server: Shutting down DatasetsRoutingTable
INFO 2025-03-10 20:56:43,964 __main__:124 server: Shutting down DatasetIORouter
INFO 2025-03-10 20:56:43,965 __main__:124 server: Shutting down ScoringFunctionsRoutingTable
INFO 2025-03-10 20:56:43,966 __main__:124 server: Shutting down ScoringRouter
INFO 2025-03-10 20:56:43,967 __main__:124 server: Shutting down ModelsRoutingTable
INFO 2025-03-10 20:56:43,968 __main__:124 server: Shutting down InferenceRouter
INFO 2025-03-10 20:56:43,969 __main__:124 server: Shutting down ShieldsRoutingTable
INFO 2025-03-10 20:56:43,971 __main__:124 server: Shutting down SafetyRouter
INFO 2025-03-10 20:56:43,972 __main__:124 server: Shutting down VectorDBsRoutingTable
INFO 2025-03-10 20:56:43,973 __main__:124 server: Shutting down VectorIORouter
INFO 2025-03-10 20:56:43,974 __main__:124 server: Shutting down ToolGroupsRoutingTable
INFO 2025-03-10 20:56:43,975 __main__:124 server: Shutting down ToolRuntimeRouter
INFO 2025-03-10 20:56:43,976 __main__:124 server: Shutting down MetaReferenceAgentsImpl
INFO 2025-03-10 20:56:43,977 __main__:124 server: Shutting down TelemetryAdapter
INFO 2025-03-10 20:56:43,978 __main__:124 server: Shutting down TorchtunePostTrainingImpl
WARNING 2025-03-10 20:56:43,979 __main__:129 server: No shutdown method for TorchtunePostTrainingImpl
INFO 2025-03-10 20:56:43,979 __main__:124 server: Shutting down BenchmarksRoutingTable
INFO 2025-03-10 20:56:43,980 __main__:124 server: Shutting down EvalRouter
INFO 2025-03-10 20:56:43,981 __main__:124 server: Shutting down DistributionInspectImpl
INFO: Application shutdown complete.
INFO: Finished server process [33862]
```
[//]: # (## Documentation)
---------
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
**Description:** Fixes some small nits in the llama CLI reference
Note: This PR fixes a few nits and also includes some small
suggestions. Feel free to close if not necessary.
# What does this PR do?
It's a dict that may contain different types, as per
resolver:instantiate_provider implementation. (AFAIU it also never
contains ProviderSpecs, but *instances* of provider implementations.)
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
mypy passes if checks are enabled for these modules. (See #1543)
[//]: # (## Documentation)
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
Since #892, we also need to install ruamel. Instead of maintaining the
list of script dependencies in multiple places, remove it and assume
developers read CONTRIBUTING.md docs.
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
Just docs.
[//]: # (## Documentation)
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
Added missing shutdown handler. (Currently empty.)
Without it, when server shuts down, it posts the following warning:
```
__main__:129 server: No shutdown method for TorchtunePostTrainingImpl
```
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
(The test plan assumes shutdown logic is fixed, see #1495)
Without the patch:
```
INFO: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO 2025-03-10 20:56:43,961 __main__:140 server: Shutting down
INFO 2025-03-10 20:56:43,962 __main__:124 server: Shutting down DatasetsRoutingTable
INFO 2025-03-10 20:56:43,964 __main__:124 server: Shutting down DatasetIORouter
INFO 2025-03-10 20:56:43,965 __main__:124 server: Shutting down ScoringFunctionsRoutingTable
INFO 2025-03-10 20:56:43,966 __main__:124 server: Shutting down ScoringRouter
INFO 2025-03-10 20:56:43,967 __main__:124 server: Shutting down ModelsRoutingTable
INFO 2025-03-10 20:56:43,968 __main__:124 server: Shutting down InferenceRouter
INFO 2025-03-10 20:56:43,969 __main__:124 server: Shutting down ShieldsRoutingTable
INFO 2025-03-10 20:56:43,971 __main__:124 server: Shutting down SafetyRouter
INFO 2025-03-10 20:56:43,972 __main__:124 server: Shutting down VectorDBsRoutingTable
INFO 2025-03-10 20:56:43,973 __main__:124 server: Shutting down VectorIORouter
INFO 2025-03-10 20:56:43,974 __main__:124 server: Shutting down ToolGroupsRoutingTable
INFO 2025-03-10 20:56:43,975 __main__:124 server: Shutting down ToolRuntimeRouter
INFO 2025-03-10 20:56:43,976 __main__:124 server: Shutting down MetaReferenceAgentsImpl
INFO 2025-03-10 20:56:43,977 __main__:124 server: Shutting down TelemetryAdapter
INFO 2025-03-10 20:56:43,978 __main__:124 server: Shutting down TorchtunePostTrainingImpl
WARNING 2025-03-10 20:56:43,979 __main__:129 server: No shutdown method for TorchtunePostTrainingImpl
INFO 2025-03-10 20:56:43,979 __main__:124 server: Shutting down BenchmarksRoutingTable
INFO 2025-03-10 20:56:43,980 __main__:124 server: Shutting down EvalRouter
INFO 2025-03-10 20:56:43,981 __main__:124 server: Shutting down DistributionInspectImpl
INFO: Application shutdown complete.
INFO: Finished server process [33862]
```
Run with the patch and observe no warning:
```
$ kill -INT $(ps ax | grep llama_stack.distribution.server.server | grep -v nvim | awk -e '{print $1}' | sort | head -n 1)
```
```
INFO: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO 2025-03-11 00:32:56,863 __main__:140 server: Shutting down
INFO 2025-03-11 00:32:56,864 __main__:124 server: Shutting down DatasetsRoutingTable
INFO 2025-03-11 00:32:56,866 __main__:124 server: Shutting down DatasetIORouter
INFO 2025-03-11 00:32:56,867 __main__:124 server: Shutting down ScoringFunctionsRoutingTable
INFO 2025-03-11 00:32:56,868 __main__:124 server: Shutting down ScoringRouter
INFO 2025-03-11 00:32:56,869 __main__:124 server: Shutting down ModelsRoutingTable
INFO 2025-03-11 00:32:56,870 __main__:124 server: Shutting down InferenceRouter
INFO 2025-03-11 00:32:56,871 __main__:124 server: Shutting down ShieldsRoutingTable
INFO 2025-03-11 00:32:56,872 __main__:124 server: Shutting down SafetyRouter
INFO 2025-03-11 00:32:56,873 __main__:124 server: Shutting down VectorDBsRoutingTable
INFO 2025-03-11 00:32:56,874 __main__:124 server: Shutting down VectorIORouter
INFO 2025-03-11 00:32:56,875 __main__:124 server: Shutting down ToolGroupsRoutingTable
INFO 2025-03-11 00:32:56,876 __main__:124 server: Shutting down ToolRuntimeRouter
INFO 2025-03-11 00:32:56,877 __main__:124 server: Shutting down MetaReferenceAgentsImpl
INFO 2025-03-11 00:32:56,878 __main__:124 server: Shutting down TelemetryAdapter
INFO 2025-03-11 00:32:56,879 __main__:124 server: Shutting down TorchtunePostTrainingImpl
INFO 2025-03-11 00:32:56,880 __main__:124 server: Shutting down BenchmarksRoutingTable
INFO 2025-03-11 00:32:56,881 __main__:124 server: Shutting down EvalRouter
INFO 2025-03-11 00:32:56,882 __main__:124 server: Shutting down DistributionInspectImpl
```
[//]: # (## Documentation)
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Reverts meta-llama/llama-stack#1252
The above PR breaks the following invocation:
```bash
llama stack run ~/.llama/distributions/together/together-run.yaml
```
# What does this PR do?
This PR has two fixes needed for correct trace context propagation
across asyncio boundaries.
Fix 1: Start using context vars to store the global trace context.
This is needed because a single trace context cannot be used across
coroutines: its state would be shared between them. Each coroutine
should have its own trace context so that it can store
its state correctly.
Fix 2: Start a new span for each new coroutine started for running
shields, to keep the span tree clean.
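A minimal sketch of the idea behind Fix 1, not the code in this PR: it uses only the standard `contextvars` and `asyncio` modules, and the names `_trace_stack`, `run_shield`, and `run_all_shields` are hypothetical. Each task created by `asyncio.gather` runs in a copy of the caller's context, so a value set inside one coroutine is not visible to its siblings or to the parent.

```python
import asyncio
import contextvars

# Hypothetical stand-in for the global trace context. An immutable tuple
# is used so that no mutable state is shared between coroutines.
_trace_stack = contextvars.ContextVar("trace_stack", default=())


async def run_shield(name):
    # This coroutine runs as its own task, in a copy of the parent's
    # context, so the set() below only affects this task.
    _trace_stack.set(_trace_stack.get() + (f"shield:{name}",))
    await asyncio.sleep(0)  # yield so the sibling task interleaves
    return _trace_stack.get()


async def run_all_shields():
    _trace_stack.set(("root",))
    # gather() wraps each coroutine in a task; each task copies the
    # current context at creation time.
    per_task = await asyncio.gather(run_shield("a"), run_shield("b"))
    return per_task, _trace_stack.get()


if __name__ == "__main__":
    per_task, parent = asyncio.run(run_all_shields())
    print(per_task, parent)
```

With a plain module-level global instead of a `ContextVar`, both tasks would append to the same stack, which is the sharing problem described above.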
## Test Plan
### Integration tests with server
LLAMA_STACK_DISABLE_VERSION_CHECK=true llama stack run
~/.llama/distributions/together/together-run.yaml
LLAMA_STACK_CONFIG=http://localhost:8321 pytest -s --safety-shield
meta-llama/Llama-Guard-3-8B --text-model
meta-llama/Llama-3.1-8B-Instruct
server logs:
https://gist.github.com/dineshyv/51ac5d9864ed031d0d89ce77352821fe
test logs:
https://gist.github.com/dineshyv/e66acc1c4648a42f1854600609c467f3
### Integration tests with library client
LLAMA_STACK_CONFIG=fireworks pytest -s --safety-shield
meta-llama/Llama-Guard-3-8B --text-model
meta-llama/Llama-3.1-8B-Instruct
logs: https://gist.github.com/dineshyv/ca160696a0b167223378673fb1dcefb8
### Apps test with server:
```
LLAMA_STACK_DISABLE_VERSION_CHECK=true llama stack run ~/.llama/distributions/together/together-run.yaml
python -m examples.agents.e2e_loop_with_client_tools localhost 8321
```
server logs:
https://gist.github.com/dineshyv/1717a572d8f7c14279c36123b79c5797
app logs:
https://gist.github.com/dineshyv/44167e9f57806a0ba3b710c32aec02f8
## What does this PR do?
Created a new math_500 open-benchmark based on OpenAI's [Let's Verify
Step by Step](https://arxiv.org/abs/2305.20050) paper and Hugging Face's
[HuggingFaceH4/MATH-500](https://huggingface.co/datasets/HuggingFaceH4/MATH-500)
dataset.
The challenging part of this benchmark is parsing the generated and
expected answers and verifying that they are the same. For the parsing part, we
refer to [Minerva: Solving Quantitative Reasoning Problems with Language
Models](https://research.google/blog/minerva-solving-quantitative-reasoning-problems-with-language-models/).
To simplify the parsing logic, as the next step, we plan to also refer to
what [simple-evals](https://github.com/openai/simple-evals) is doing:
using an LLM as a judge to check whether the generated answer matches the expected
answer.
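To illustrate why the parsing step is the hard part, here is a rough sketch of one common approach: take the last `\boxed{...}` expression from the generated text and compare it to the expected answer after light normalization. This is an assumption for illustration only, not the parser added in this PR; the helper names `extract_boxed`, `normalize`, and `is_match` are hypothetical, and real answers need far more normalization than shown.

```python
def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None."""
    marker = r"\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are inside the opening brace of \boxed{
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # unbalanced braces


def normalize(answer):
    # Deliberately minimal: drop all whitespace and a trailing period.
    return "".join(answer.split()).rstrip(".")


def is_match(generated, expected):
    got = extract_boxed(generated)
    return got is not None and normalize(got) == normalize(expected)
```

String comparison like this misses equivalent forms such as `0.5` versus `\frac{1}{2}`, which is one motivation for the LLM-as-judge follow-up mentioned above.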
## Test Plan
On the server side, spin up a server with the open-benchmark template: `llama
stack run llama_stack/templates/open-benchmark/run.yaml`
On the client side, issue an open benchmark eval request `llama-stack-client
--endpoint xxx eval run-benchmark "meta-reference-math-500" --model-id
"meta-llama/Llama-3.3-70B-Instruct" --output-dir "/home/markchen1015/"
--num-examples 20` and get the aggregated eval results
<img width="238" alt="Screenshot 2025-03-10 at 7 57 04 PM"
src="https://github.com/user-attachments/assets/2c9da042-3b70-470e-a7c4-69f4cc24d1fb"
/>
Checked the generated answers and the related scoring; they make sense.