Bug https://github.com/meta-llama/llama-stack/issues/1357
# What does this PR do?
Fixes a bug where a wrong file name was used in the inline::localfs datasetio provider.

Closes #1357
## Test Plan
Signed-off-by: Josh Salomon <jsalomon@redhat.com>
# What does this PR do?
- The recent merge https://github.com/meta-llama/llama-stack/pull/1410 introduced the following error:
```
ValueError: Provider meta-reference (Api.agents) does not implement the following methods:
[('list_agent_sessions', 'not_actually_implemented'), ('list_agents', 'not_actually_implemented')]
```
## Test Plan
```
llama stack run
```
```
LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/agents/test_agents.py --text-model meta-llama/Llama-3.3-70B-Instruct
```
# What does this PR do?
Remove Llama-3.2-1B-Instruct for Fireworks, as it no longer appears to be hosted on the website.
## Test Plan
python distro_codegen.py
# What does this PR do?
As I brought up in #1515, it shouldn't be necessary to tie the unit test runner to an exact z-stream of Python 3.10.
Updated so the unit test runner always uses the latest z-stream of Python 3.10.
## Test Plan
```shell
$ uv run -p 3.10 --with-editable . --with-editable ".[dev]" --with-editable ".[unit]" pytest --cov=llama_stack -s -v tests/unit/ --junitxml=pytest-report.xml
```
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
Setting `$LLAMA_STACK_LOG_FILE` will pipe the logs to a file as well as stdout. This is done by using a logging `FileHandler`.
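A minimal sketch of the idea, assuming the handler is attached during logging setup (names here are illustrative, not the exact ones used in the codebase):

```python
import logging
import os

def maybe_add_file_handler(logger: logging.Logger) -> None:
    # If LLAMA_STACK_LOG_FILE is set, mirror every record to that file in
    # addition to the existing stdout handler.
    log_file = os.environ.get("LLAMA_STACK_LOG_FILE")
    if log_file:
        handler = logging.FileHandler(log_file)
        handler.setFormatter(logging.Formatter("%(asctime)s %(name)s: %(message)s"))
        logger.addHandler(handler)
```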
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
Add support for listing agents, describing an agent, and retrieving session IDs for a given agent. This is only the API definition; the implementations will come separately.
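A rough sketch of the shape of the new endpoints (signatures are illustrative only; the real definitions live in the agents API module and may differ):

```python
from typing import List, Protocol

class Agents(Protocol):
    # List all agents known to the provider.
    async def list_agents(self) -> List[dict]: ...

    # Describe a single agent by its identifier.
    async def get_agent(self, agent_id: str) -> dict: ...

    # Retrieve the session IDs associated with a given agent.
    async def list_agent_sessions(self, agent_id: str) -> List[str]: ...
```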
Closes: https://github.com/meta-llama/llama-stack/issues/1294
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
If an implementation raises `CancelledError` (e.g. when it runs its own async loop for jobs), the main server shutdown handler gets confused and doesn't attempt to shut down the main loop tasks.
While at it, this also fixes the following failure when that happens:
```
UnboundLocalError: cannot access local variable 'loop' where it is not
associated with a value
```
Shutdown handlers were not running because the lifespan logic had been broken since ~Oct 2024. Fixed that too, and `lifespan` is now enforced (making sure the server crashes when it fails to interact with the app through middleware).
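A minimal sketch of the intended shutdown behavior, assuming the handler swallows `CancelledError` from individual providers so the rest of the teardown still runs (names are illustrative, not the exact server code):

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

async def shutdown(impls: dict) -> None:
    # Shut down every provider, tolerating providers that raise
    # CancelledError or exceed the timeout, then stop the loop.
    for name, impl in impls.items():
        logger.info("Shutting down %s", name)
        try:
            if hasattr(impl, "shutdown"):
                await asyncio.wait_for(impl.shutdown(), timeout=5)
            else:
                logger.warning("No shutdown method for %s", name)
        except (asyncio.TimeoutError, asyncio.CancelledError) as exc:
            logger.error("Failed to shutdown %s: %r", name, exc)
    asyncio.get_running_loop().stop()
```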
## Test Plan
Spotted while working on https://github.com/meta-llama/llama-stack/pull/1437

One way to trigger it without the PR above is to add `raise CancelledError` in any of the running providers' `shutdown` methods; then `kill -INT <pid>` the server process.
Validated this with the following test patch:
```
diff --git a/llama_stack/distribution/server/server.py b/llama_stack/distribution/server/server.py
index b85c463a..10dad83e 100644
--- a/llama_stack/distribution/server/server.py
+++ b/llama_stack/distribution/server/server.py
@@ -174,6 +174,7 @@ def handle_signal(app, signum, _) -> None:
except asyncio.CancelledError:
pass
finally:
+ logger.info("Stopping event loop")
loop.stop()
loop = asyncio.get_running_loop()
diff --git a/llama_stack/providers/inline/post_training/torchtune/post_training.py b/llama_stack/providers/inline/post_training/torchtune/post_training.py
index b837362d..163f43d8 100644
--- a/llama_stack/providers/inline/post_training/torchtune/post_training.py
+++ b/llama_stack/providers/inline/post_training/torchtune/post_training.py
@@ -3,6 +3,7 @@
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
+import asyncio
from datetime import datetime
from typing import Any, Dict, Optional
@@ -43,6 +44,9 @@ class TorchtunePostTrainingImpl:
self.jobs = {}
self.checkpoints_dict = {}
+ async def shutdown(self) -> None:
+ raise asyncio.CancelledError("Shutdown")
+
async def supervised_fine_tune(
self,
job_uuid: str,
```
Without the fix:
```
INFO: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
INFO: Shutting down
INFO: Finished server process [52099]
INFO 2025-03-07 23:25:33,548 __main__:143 server: Received signal SIGINT (2). Exiting gracefully...
INFO 2025-03-07 23:25:33,550 __main__:150 server: Shutting down DatasetsRoutingTable
INFO 2025-03-07 23:25:33,551 __main__:177 server: Stopping event loop
ERROR 2025-03-07 23:25:33,552 asyncio:1785 uncategorized: unhandled exception during asyncio.run() shutdown
task: <Task finished name='Task-12' coro=<handle_signal.<locals>.shutdown() done, defined at
/home/ec2-user/src/llama-stack/schedule/llama_stack/distribution/server/server.py:145>
exception=UnboundLocalError("cannot access local variable 'loop' where it is not associated with a value")>
╭───────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────╮
│ /home/ec2-user/src/llama-stack/schedule/llama_stack/distribution/server/server.py:178 in shutdown │
│ │
│ 175 │ │ │ pass │
│ 176 │ │ finally: │
│ 177 │ │ │ logger.info("Stopping event loop") │
│ ❱ 178 │ │ │ loop.stop() │
│ 179 │ │
│ 180 │ loop = asyncio.get_running_loop() │
│ 181 │ loop.create_task(shutdown()) │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
UnboundLocalError: cannot access local variable 'loop' where it is not associated with a value
```
With the fix, now seeing the following messages when the server is
killed:
```
INFO: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
INFO: Shutting down
INFO: Finished server process [50836]
INFO 2025-03-07 23:20:35,182 __main__:143 server: Received signal SIGINT (2). Exiting gracefully...
INFO 2025-03-07 23:20:35,184 __main__:149 server: Shutting down DatasetsRoutingTable
ERROR 2025-03-07 23:20:35,185 __main__:158 server: Failed to shutdown DatasetsRoutingTable: {CancelledError()}
╭───────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────╮
│ /usr/lib64/python3.11/asyncio/tasks.py:476 in wait_for │
│ │
│ 473 │ try: │
│ 474 │ │ # wait until the future completes or the timeout │
│ 475 │ │ try: │
│ ❱ 476 │ │ │ await waiter │
│ 477 │ │ except exceptions.CancelledError: │
│ 478 │ │ │ if fut.done(): │
│ 479 │ │ │ │ return fut.result() │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
CancelledError
During handling of the above exception, another exception occurred:
╭───────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────╮
│ /home/ec2-user/src/llama-stack/schedule/llama_stack/distribution/server/server.py:152 in shutdown │
│ │
│ 149 │ │ │ logger.info("Shutting down %s", impl_name) │
│ 150 │ │ │ try: │
│ 151 │ │ │ │ if hasattr(impl, "shutdown"): │
│ ❱ 152 │ │ │ │ │ await asyncio.wait_for(impl.shutdown(), timeout=5) │
│ 153 │ │ │ │ else: │
│ 154 │ │ │ │ │ logger.warning("No shutdown method for %s", impl_name) │
│ 155 │ │ │ except asyncio.TimeoutError: │
│ │
│ /usr/lib64/python3.11/asyncio/tasks.py:479 in wait_for │
│ │
│ 476 │ │ │ await waiter │
│ 477 │ │ except exceptions.CancelledError: │
│ 478 │ │ │ if fut.done(): │
│ ❱ 479 │ │ │ │ return fut.result() │
│ 480 │ │ │ else: │
│ 481 │ │ │ │ fut.remove_done_callback(cb) │
│ 482 │ │ │ │ # We must ensure that the task is not running │
│ │
│ /home/ec2-user/src/llama-stack/schedule/llama_stack/distribution/routers/routing_tables.py:131 in shutdown │
│ │
│ 128 │ │ │ elif api == Api.tool_runtime: │
│ 129 │ │ │ │ p.tool_store = self │
│ 130 │ │
│ ❱ 131 │ async def shutdown(self) -> None: │
│ 132 │ │ for p in self.impls_by_provider_id.values(): │
│ 133 │ │ │ await p.shutdown() │
│ 134 │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
CancelledError
INFO 2025-03-07 23:20:35,295 __main__:149 server: Shutting down DatasetIORouter
INFO 2025-03-07 23:20:35,296 __main__:149 server: Shutting down ScoringFunctionsRoutingTable
INFO 2025-03-07 23:20:35,297 __main__:149 server: Shutting down ScoringRouter
INFO 2025-03-07 23:20:35,298 __main__:149 server: Shutting down ModelsRoutingTable
INFO 2025-03-07 23:20:35,299 __main__:149 server: Shutting down InferenceRouter
INFO 2025-03-07 23:20:35,300 __main__:149 server: Shutting down ShieldsRoutingTable
INFO 2025-03-07 23:20:35,300 __main__:149 server: Shutting down SafetyRouter
INFO 2025-03-07 23:20:35,301 __main__:149 server: Shutting down VectorDBsRoutingTable
INFO 2025-03-07 23:20:35,302 __main__:149 server: Shutting down VectorIORouter
INFO 2025-03-07 23:20:35,303 __main__:149 server: Shutting down ToolGroupsRoutingTable
INFO 2025-03-07 23:20:35,304 __main__:149 server: Shutting down ToolRuntimeRouter
INFO 2025-03-07 23:20:35,304 __main__:149 server: Shutting down MetaReferenceAgentsImpl
INFO 2025-03-07 23:20:35,305 __main__:149 server: Shutting down TelemetryAdapter
INFO 2025-03-07 23:20:35,306 __main__:149 server: Shutting down TorchtunePostTrainingImpl
ERROR 2025-03-07 23:20:35,307 __main__:158 server: Failed to shutdown TorchtunePostTrainingImpl:
{CancelledError('Shutdown')}
╭───────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────╮
│ /home/ec2-user/src/llama-stack/schedule/llama_stack/distribution/server/server.py:152 in shutdown │
│ │
│ 149 │ │ │ logger.info("Shutting down %s", impl_name) │
│ 150 │ │ │ try: │
│ 151 │ │ │ │ if hasattr(impl, "shutdown"): │
│ ❱ 152 │ │ │ │ │ await asyncio.wait_for(impl.shutdown(), timeout=5) │
│ 153 │ │ │ │ else: │
│ 154 │ │ │ │ │ logger.warning("No shutdown method for %s", impl_name) │
│ 155 │ │ │ except asyncio.TimeoutError: │
│ │
│ /usr/lib64/python3.11/asyncio/tasks.py:489 in wait_for │
│ │
│ 486 │ │ │ │ raise │
│ 487 │ │ │
│ 488 │ │ if fut.done(): │
│ ❱ 489 │ │ │ return fut.result() │
│ 490 │ │ else: │
│ 491 │ │ │ fut.remove_done_callback(cb) │
│ 492 │ │ │ # We must ensure that the task is not running │
│ │
│ /home/ec2-user/src/llama-stack/schedule/llama_stack/providers/inline/post_training/torchtune/post_training. │
│ py:48 in shutdown │
│ │
│ 45 │ │ self.checkpoints_dict = {} │
│ 46 │ │
│ 47 │ async def shutdown(self) -> None: │
│ ❱ 48 │ │ raise asyncio.CancelledError("Shutdown") │
│ 49 │ │
│ 50 │ async def supervised_fine_tune( │
│ 51 │ │ self, │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
CancelledError: Shutdown
INFO 2025-03-07 23:20:35,352 __main__:149 server: Shutting down BenchmarksRoutingTable
INFO 2025-03-07 23:20:35,353 __main__:149 server: Shutting down EvalRouter
INFO 2025-03-07 23:20:35,354 __main__:149 server: Shutting down DistributionInspectImpl
INFO 2025-03-07 23:20:35,355 __main__:177 server: Stopping event loop
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/ec2-user/src/llama-stack/schedule/llama_stack/distribution/server/server.py", line 488, in <module>
main()
File "/home/ec2-user/src/llama-stack/schedule/llama_stack/distribution/server/server.py", line 476, in main
uvicorn.run(**uvicorn_config)
File "/home/ec2-user/src/llama-stack/schedule/venv/lib64/python3.11/site-packages/uvicorn/main.py", line 579, in run
server.run()
File "/home/ec2-user/src/llama-stack/schedule/venv/lib64/python3.11/site-packages/uvicorn/server.py", line 66, in run
return asyncio.run(self.serve(sockets=sockets))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/asyncio/runners.py", line 189, in run
with Runner(debug=debug) as runner:
File "/usr/lib64/python3.11/asyncio/runners.py", line 63, in __exit__
self.close()
File "/usr/lib64/python3.11/asyncio/runners.py", line 71, in close
_cancel_all_tasks(loop)
File "/usr/lib64/python3.11/asyncio/runners.py", line 201, in _cancel_all_tasks
loop.run_until_complete(tasks.gather(*to_cancel, return_exceptions=True))
File "/usr/lib64/python3.11/asyncio/base_events.py", line 652, in run_until_complete
raise RuntimeError('Event loop stopped before Future completed.')
RuntimeError: Event loop stopped before Future completed.
++ error_handler 104
++ echo 'Error occurred in script at line: 104'
Error occurred in script at line: 104
++ exit 1
```
With all patches included, the shutdown now looks as follows:
```
$ kill -INT $(ps ax | grep llama_stack.distribution.server.server | grep -v nvim | awk -e '{print $1}' | sort | head -n 1)
```
```
20:56:09.308 [START]
INFO: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO 2025-03-10 20:56:43,961 __main__:140 server: Shutting down
INFO 2025-03-10 20:56:43,962 __main__:124 server: Shutting down DatasetsRoutingTable
INFO 2025-03-10 20:56:43,964 __main__:124 server: Shutting down DatasetIORouter
INFO 2025-03-10 20:56:43,965 __main__:124 server: Shutting down ScoringFunctionsRoutingTable
INFO 2025-03-10 20:56:43,966 __main__:124 server: Shutting down ScoringRouter
INFO 2025-03-10 20:56:43,967 __main__:124 server: Shutting down ModelsRoutingTable
INFO 2025-03-10 20:56:43,968 __main__:124 server: Shutting down InferenceRouter
INFO 2025-03-10 20:56:43,969 __main__:124 server: Shutting down ShieldsRoutingTable
INFO 2025-03-10 20:56:43,971 __main__:124 server: Shutting down SafetyRouter
INFO 2025-03-10 20:56:43,972 __main__:124 server: Shutting down VectorDBsRoutingTable
INFO 2025-03-10 20:56:43,973 __main__:124 server: Shutting down VectorIORouter
INFO 2025-03-10 20:56:43,974 __main__:124 server: Shutting down ToolGroupsRoutingTable
INFO 2025-03-10 20:56:43,975 __main__:124 server: Shutting down ToolRuntimeRouter
INFO 2025-03-10 20:56:43,976 __main__:124 server: Shutting down MetaReferenceAgentsImpl
INFO 2025-03-10 20:56:43,977 __main__:124 server: Shutting down TelemetryAdapter
INFO 2025-03-10 20:56:43,978 __main__:124 server: Shutting down TorchtunePostTrainingImpl
WARNING 2025-03-10 20:56:43,979 __main__:129 server: No shutdown method for TorchtunePostTrainingImpl
INFO 2025-03-10 20:56:43,979 __main__:124 server: Shutting down BenchmarksRoutingTable
INFO 2025-03-10 20:56:43,980 __main__:124 server: Shutting down EvalRouter
INFO 2025-03-10 20:56:43,981 __main__:124 server: Shutting down DistributionInspectImpl
INFO: Application shutdown complete.
INFO: Finished server process [33862]
```
---------
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
**Description:** Fixes some small nits in the llama CLI reference.
Note: There are a few nits in this PR, but it also has some small suggestions; feel free to close if not necessary.
# What does this PR do?
It's a dict that may contain different types, as per the `resolver.instantiate_provider` implementation. (AFAIU it also never contains `ProviderSpec`s, but *instances* of provider implementations.)
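A tiny illustrative snippet of what the corrected annotation amounts to (the variable and module names here are assumptions, not the actual code):

```python
from typing import Any, Dict

# The mapping holds instantiated provider implementations of varying types,
# so Dict[str, Any] describes it more honestly than Dict[str, ProviderSpec].
impls: Dict[str, Any] = {}
```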
## Test Plan
mypy passes when checks are enabled for these modules. (See #1543)
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
Since #892, we also need to install ruamel. Instead of maintaining the
list of script dependencies in multiple places, remove it and assume
developers read CONTRIBUTING.md docs.
## Test Plan
Just docs.
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
Added a missing shutdown handler. (Currently empty.)
Without it, when the server shuts down, it logs the following warning:
```
__main__:129 server: No shutdown method for TorchtunePostTrainingImpl
```
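A sketch of the added handler, assuming a no-op coroutine on the provider implementation (the real method lives in the torchtune post-training provider):

```python
class TorchtunePostTrainingImpl:
    async def shutdown(self) -> None:
        # Nothing to clean up yet; the method exists so the server's
        # generic shutdown loop finds a handler and stops warning.
        pass
```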
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
## Test Plan
(The test plan assumes shutdown logic is fixed, see #1495)
Without the patch:
```
INFO: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO 2025-03-10 20:56:43,961 __main__:140 server: Shutting down
INFO 2025-03-10 20:56:43,962 __main__:124 server: Shutting down DatasetsRoutingTable
INFO 2025-03-10 20:56:43,964 __main__:124 server: Shutting down DatasetIORouter
INFO 2025-03-10 20:56:43,965 __main__:124 server: Shutting down ScoringFunctionsRoutingTable
INFO 2025-03-10 20:56:43,966 __main__:124 server: Shutting down ScoringRouter
INFO 2025-03-10 20:56:43,967 __main__:124 server: Shutting down ModelsRoutingTable
INFO 2025-03-10 20:56:43,968 __main__:124 server: Shutting down InferenceRouter
INFO 2025-03-10 20:56:43,969 __main__:124 server: Shutting down ShieldsRoutingTable
INFO 2025-03-10 20:56:43,971 __main__:124 server: Shutting down SafetyRouter
INFO 2025-03-10 20:56:43,972 __main__:124 server: Shutting down VectorDBsRoutingTable
INFO 2025-03-10 20:56:43,973 __main__:124 server: Shutting down VectorIORouter
INFO 2025-03-10 20:56:43,974 __main__:124 server: Shutting down ToolGroupsRoutingTable
INFO 2025-03-10 20:56:43,975 __main__:124 server: Shutting down ToolRuntimeRouter
INFO 2025-03-10 20:56:43,976 __main__:124 server: Shutting down MetaReferenceAgentsImpl
INFO 2025-03-10 20:56:43,977 __main__:124 server: Shutting down TelemetryAdapter
INFO 2025-03-10 20:56:43,978 __main__:124 server: Shutting down TorchtunePostTrainingImpl
WARNING 2025-03-10 20:56:43,979 __main__:129 server: No shutdown method for TorchtunePostTrainingImpl
INFO 2025-03-10 20:56:43,979 __main__:124 server: Shutting down BenchmarksRoutingTable
INFO 2025-03-10 20:56:43,980 __main__:124 server: Shutting down EvalRouter
INFO 2025-03-10 20:56:43,981 __main__:124 server: Shutting down DistributionInspectImpl
INFO: Application shutdown complete.
INFO: Finished server process [33862]
```
Run with the patch and observe no warning:
```
$ kill -INT $(ps ax | grep llama_stack.distribution.server.server | grep -v nvim | awk -e '{print $1}' | sort | head -n 1)
```
```
INFO: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO 2025-03-11 00:32:56,863 __main__:140 server: Shutting down
INFO 2025-03-11 00:32:56,864 __main__:124 server: Shutting down DatasetsRoutingTable
INFO 2025-03-11 00:32:56,866 __main__:124 server: Shutting down DatasetIORouter
INFO 2025-03-11 00:32:56,867 __main__:124 server: Shutting down ScoringFunctionsRoutingTable
INFO 2025-03-11 00:32:56,868 __main__:124 server: Shutting down ScoringRouter
INFO 2025-03-11 00:32:56,869 __main__:124 server: Shutting down ModelsRoutingTable
INFO 2025-03-11 00:32:56,870 __main__:124 server: Shutting down InferenceRouter
INFO 2025-03-11 00:32:56,871 __main__:124 server: Shutting down ShieldsRoutingTable
INFO 2025-03-11 00:32:56,872 __main__:124 server: Shutting down SafetyRouter
INFO 2025-03-11 00:32:56,873 __main__:124 server: Shutting down VectorDBsRoutingTable
INFO 2025-03-11 00:32:56,874 __main__:124 server: Shutting down VectorIORouter
INFO 2025-03-11 00:32:56,875 __main__:124 server: Shutting down ToolGroupsRoutingTable
INFO 2025-03-11 00:32:56,876 __main__:124 server: Shutting down ToolRuntimeRouter
INFO 2025-03-11 00:32:56,877 __main__:124 server: Shutting down MetaReferenceAgentsImpl
INFO 2025-03-11 00:32:56,878 __main__:124 server: Shutting down TelemetryAdapter
INFO 2025-03-11 00:32:56,879 __main__:124 server: Shutting down TorchtunePostTrainingImpl
INFO 2025-03-11 00:32:56,880 __main__:124 server: Shutting down BenchmarksRoutingTable
INFO 2025-03-11 00:32:56,881 __main__:124 server: Shutting down EvalRouter
INFO 2025-03-11 00:32:56,882 __main__:124 server: Shutting down DistributionInspectImpl
```
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Reverts meta-llama/llama-stack#1252
The above PR breaks the following invocation:
```bash
llama stack run ~/.llama/distributions/together/together-run.yaml
```
# What does this PR do?
This PR has two fixes needed for correct trace context propagation across asyncio boundaries.
Fix 1: Start using context vars to store the global trace context. This is needed since we cannot use the same trace context across coroutines, because the state is shared; each coroutine should have its own trace context so that each of them can record its state correctly.
Fix 2: Start a new span for each coroutine started to run shields, to keep the span tree clean.
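A minimal sketch of both fixes, assuming a contextvar for the current trace context and a hypothetical `start_span` helper (names are illustrative, not the tracing API actually used):

```python
import contextvars

# Fix 1: each coroutine gets its own view of the current trace context
# instead of sharing mutable module-level state.
CURRENT_TRACE_CONTEXT = contextvars.ContextVar("trace_context", default=None)

async def run_shield_with_span(shield, messages):
    # Fix 2: every shield coroutine opens its own child span under the
    # caller's context so the span tree stays clean.
    parent = CURRENT_TRACE_CONTEXT.get()
    span = start_span("run_shield", parent=parent)  # hypothetical helper
    token = CURRENT_TRACE_CONTEXT.set(span)
    try:
        return await shield.run(messages)
    finally:
        CURRENT_TRACE_CONTEXT.reset(token)
        span.end()  # hypothetical
```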
## Test Plan
### Integration tests with server
LLAMA_STACK_DISABLE_VERSION_CHECK=true llama stack run ~/.llama/distributions/together/together-run.yaml
LLAMA_STACK_CONFIG=http://localhost:8321 pytest -s --safety-shield meta-llama/Llama-Guard-3-8B --text-model meta-llama/Llama-3.1-8B-Instruct
server logs:
https://gist.github.com/dineshyv/51ac5d9864ed031d0d89ce77352821fe
test logs:
https://gist.github.com/dineshyv/e66acc1c4648a42f1854600609c467f3
### Integration tests with library client
LLAMA_STACK_CONFIG=fireworks pytest -s --safety-shield meta-llama/Llama-Guard-3-8B --text-model meta-llama/Llama-3.1-8B-Instruct
logs: https://gist.github.com/dineshyv/ca160696a0b167223378673fb1dcefb8
### Apps test with server:
```
LLAMA_STACK_DISABLE_VERSION_CHECK=true llama stack run ~/.llama/distributions/together/together-run.yaml
python -m examples.agents.e2e_loop_with_client_tools localhost 8321
```
server logs:
https://gist.github.com/dineshyv/1717a572d8f7c14279c36123b79c5797
app logs:
https://gist.github.com/dineshyv/44167e9f57806a0ba3b710c32aec02f8
# What does this PR do?
Created a new math_500 open-benchmark based on OpenAI's [Let's Verify Step by Step](https://arxiv.org/abs/2305.20050) paper and Hugging Face's [HuggingFaceH4/MATH-500](https://huggingface.co/datasets/HuggingFaceH4/MATH-500) dataset.
The challenging part of this benchmark is parsing the generated and expected answers and verifying whether they are the same. For the parsing part, we refer to [Minerva: Solving Quantitative Reasoning Problems with Language Models](https://research.google/blog/minerva-solving-quantitative-reasoning-problems-with-language-models/).
To simplify the parsing logic, as a next step we plan to also follow what [simple-evals](https://github.com/openai/simple-evals) does: use an LLM as judge to check whether the generated answer matches the expected answer or not.
## Test Plan
On the server side, spin up a server with the open-benchmark template: `llama stack run llama_stack/templates/open-benchmark/run.yaml`.
On the client side, issue an open benchmark eval request: `llama-stack-client --endpoint xxx eval run-benchmark "meta-reference-math-500" --model-id "meta-llama/Llama-3.3-70B-Instruct" --output-dir "/home/markchen1015/" --num-examples 20` and get the aggregated eval results.
<img width="238" alt="Screenshot 2025-03-10 at 7 57 04 PM"
src="https://github.com/user-attachments/assets/2c9da042-3b70-470e-a7c4-69f4cc24d1fb"
/>
Checked the generated answers and the related scoring, and they make sense.
This is unfortunate because `sqlite-vec` seems promising. But its PIP package is not quite complete. It does not have a binary for arm64 (I think, or maybe it even lacks 64-bit builds?), which results in the arm64 container failing with:
```
File "/usr/local/lib/python3.10/site-packages/sqlite_vec/init.py", line 17, in load
conn.load_extension(loadable_path())
sqlite3.OperationalError: /usr/local/lib/python3.10/site-packages/sqlite_vec/vec0.so: wrong ELF class: ELFCLASS32
```
To get around this, I tried to install from source via `uv pip install sqlite-vec --no-binary=sqlite-vec`; however, it even lacks a source distribution, which makes that impossible.
## Test Plan
Build the container locally using:
```bash
LLAMA_STACK_DIR=. llama stack build --template ollama --image-type container
```
Run the container as:
```
podman run --privileged -it -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
--env INFERENCE_MODEL=$INFERENCE_MODEL \
--env OLLAMA_URL=http://host.containers.internal:11434 \
-v ~/local/llama-stack:/app/llama-stack-source \
localhost/distribution-ollama:dev --port $LLAMA_STACK_PORT
```
Verify the container starts up correctly. Without this patch, it would
encounter the ELFCLASS32 error.
# What does this PR do?
Users prefer to rely on the main CLI rather than invoking the server through a Python module. Users interact with a high-level CLI rather than needing to know internal module structures.
Now, when running `llama stack run <path-to-config>`, the server will attempt to use the system package or a virtual environment if one is active.
This also eliminates the current process dependency chain when running
from a virtual environment:
-> llama stack run
-> start_env.sh
-> python -m server...
Signed-off-by: Sébastien Han <seb@redhat.com>
## Test Plan
Run:
```
ollama run llama3.2:3b-instruct-fp16 --keepalive=2m &
llama stack run ./llama_stack/templates/ollama/run.yaml --disable-ipv6
```
Notice that the server starts and shuts down normally.
---------
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
# What does this PR do?
It should use `export` for the API key environment variable.
## Test Plan
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
The `test` section has been updated to include only the essential
dependencies needed for running integration tests, which are shared
across all providers. If a provider requires additional dependencies,
please add them to your environment separately. When using uv to
run your tests, you can specify extra dependencies with the
`--with` flag.
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Add unit tests for the inspect endpoint.
## Test Plan
```
$ ollama run llama3.2:3b-instruct-fp16 --keepalive=60m &
$ LLAMA_STACK_CONFIG=./llama_stack/templates/ollama/run.yaml uv run pytest -v -s tests/integration/inspect/test_inspect.py
/Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.10/site-packages/pytest_asyncio/plugin.py:207:
PytestDeprecationWarning: The configuration option
"asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the
fixture caching scope. Future versions of pytest-asyncio will default
the loop scope for asynchronous fixtures to function scope. Set the
default fixture loop scope explicitly in order to avoid unexpected
behavior in the future. Valid fixture loop scopes are: "function",
"class", "module", "package", "session"
warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
============================================== test session starts
==============================================
platform darwin -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 --
/Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.10.16', 'Platform':
'macOS-15.3.1-arm64-arm-64bit', 'Packages': {'pytest': '8.3.4',
'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1',
'asyncio': '0.25.3', 'anyio': '4.8.0', 'nbval': '0.11.0'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0,
nbval-0.11.0
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None
collected 2 items
tests/integration/inspect/test_inspect.py::TestInspect::test_health[txt=8B]
PASSED
tests/integration/inspect/test_inspect.py::TestInspect::test_version[txt=8B]
PASSED
========================================= 2 passed, 3 warnings in 2.26s
===================================
```
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Currently the `"Environment variable LLAMA_STACK_LOGGING found"` message is printed with no color. Switch to `cprint` and highlight it in yellow for visibility.
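A one-line sketch of the change, assuming `cprint` here refers to termcolor's helper:

```python
from termcolor import cprint

# Highlight the notice in yellow so it stands out among the startup output.
cprint("Environment variable LLAMA_STACK_LOGGING found", "yellow")
```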
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
The test class by default enables debug mode, which produces some
unexpected warnings like:
```
tests/unit/models/test_prompt_adapter.py::PrepareMessagesTests::test_completion_message_encoding
WARNING 2025-03-10 20:41:48,577 asyncio:1904 uncategorized: Executing <Task pending name='Task-1'
coro=<IsolatedAsyncioTestCase._asyncioLoopRunner() running at
/home/ec2-user/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/unittest/async_case.py:95
> wait_for=<Future pending cb=[Task.task_wakeup()] created at
/home/ec2-user/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/asyncio/base_events.py:42
9> created at
/home/ec2-user/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/unittest/async_case.py:11
7> took 0.231 seconds
PASSED
```
I suggest we disable these since they are not very useful and can
confuse other developers.
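One plausible way to silence these, assuming the warnings come from asyncio's debug mode being enabled for the test loop (the actual change in the PR may differ):

```python
import asyncio
import unittest

class PrepareMessagesTests(unittest.IsolatedAsyncioTestCase):
    async def asyncSetUp(self):
        # Turn off asyncio debug mode so slow-callback warnings such as
        # "Executing <Task ...> took 0.231 seconds" are not emitted.
        asyncio.get_running_loop().set_debug(False)
```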
## Test Plan
Run tests. The warnings are no longer seen.
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
Uses the Together async client instead of the sync client.
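A sketch of what the switch looks like, assuming the `together` SDK's `AsyncTogether` client (the model name and call shape are illustrative):

```python
from together import AsyncTogether

async def chat(messages: list[dict]):
    # Awaiting the async client keeps the event loop free instead of
    # blocking it the way the synchronous Together client would.
    client = AsyncTogether()
    return await client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=messages,
    )
```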
## Test Plan
The command to run the test is in the image below. (2 tests fail, and they were failing for the old stable version as well, with the same errors.)
<img width="1689" alt="image"
src="https://github.com/user-attachments/assets/503db720-5379-425d-9844-0225010e41a1"
/>
---------
Co-authored-by: sarthakdeshpande <sarthak.deshpande@engati.com>
We removed `llama-models` as a dep, which was previously pulling `jinja2` in for us. This did not get caught in the release process because the distros we use for testing (fireworks / together) pull it in via sentence-transformers, which we don't use in all distros (notably ollama).
See #1511
## Test Plan
Ran `llama-stack-ops/actions/test-and-cut/main.sh` with
`ONLY_TEST_DONT_CUT=1 COMMIT_ID=origin/fix_jinja2` and by making it
build the ollama docker. Ran the docker to ensure it does not error out
with jinja2 dependency error. (Unfortunately there is another error with
sqlite_vec there.)
This disambiguates the "Image" term from the alternative "container image" usage and allows for:
```python
if image_type == LlamaStackImagetype.venv:
    ...
```
access rather than `ImageType.venv.value`.
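A sketch of the renamed enum, assuming a str-valued `Enum` so members compare directly against strings without reaching for `.value` (member names beyond `venv` are assumptions):

```python
import enum

class LlamaStackImagetype(str, enum.Enum):
    # The str mixin lets call sites write `image_type == LlamaStackImagetype.venv`
    # while still comparing equal to the plain string "venv".
    venv = "venv"
    conda = "conda"
    container = "container"
```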
# What does this PR do?
Changes enum usage to comply with semantic Python styling and naming conventions.
## Test Plan
The refactor was automated and small, so a simple run-through of creating images was done.
Signed-off-by: James Kunstle <jkunstle@redhat.com>
# What does this PR do?
This PR allows unit test code coverage percentages to be reported in PR builds. Today's output tells the end user which tests passed and which tests failed:
<img width="744" alt="Screenshot 2025-03-10 at 9 44 28 AM"
src="https://github.com/user-attachments/assets/40b1a578-951f-4b74-8a37-a39c039b1d7e"
/>
If a contributor is creating a new module within Llama Stack and starts
writing unit tests for that module, it might be difficult for Llama
Stack maintainers to immediately determine the code coverage percentage
for that new module.
To allow for code coverage reporting in the CI, we simply need to
install `pytest-cov` so we can use the `--cov` flag with the existing
`pytest` command.
Ideally, it would be nicer to have a bot report code coverage, but this
PR can be a temporary solution.
## Test Plan
I ran these changes locally:
<img width="1455" alt="Screenshot 2025-03-10 at 10 01 53 AM"
src="https://github.com/user-attachments/assets/dfd765c6-5979-42a3-b899-7713a3f202e6"
/>
PR build to confirm the expected behavior:
<img width="1326" alt="Screenshot 2025-03-10 at 12 47 36 PM"
src="https://github.com/user-attachments/assets/fe94f1e6-fbb5-4e57-9902-197502c50621"
/>
Signed-off-by: Courtney Pacheco <6019922+courtneypacheco@users.noreply.github.com>
# What does this PR do?
It should have been changed in this PR:
https://github.com/meta-llama/llama-stack/pull/1190/files#diff-53e3f35ced54ee5e57dc8b0d3b04770ed84f2f6434c6f492f42569b3c2810ecd
## Test Plan
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
- Together's inference only supports 3.1 for structured decoding
## Test Plan
```
pytest -v -s --nbval-lax ./docs/getting_started.ipynb
```
# What does this PR do?
## Test Plan
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
This PR converts blocking calls for built-in tools like Wolfram, Brave, Tavily, and Bing into non-blocking async calls.
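An illustrative sketch of the pattern, assuming `httpx.AsyncClient` replaces a blocking HTTP call inside a tool (the endpoint and parameters are placeholders, not the ones used by the real built-in tools):

```python
import httpx

async def run_search(query: str, api_key: str) -> dict:
    # The await points let other coroutines run while the HTTP request is in
    # flight, instead of stalling the event loop on a blocking call.
    async with httpx.AsyncClient() as client:
        response = await client.get(
            "https://api.search.example/v1/search",
            params={"q": query},
            headers={"Authorization": f"Bearer {api_key}"},
        )
        response.raise_for_status()
        return response.json()
```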
## Test Plan
pytest -s -v tool_runtime/test_builtin_tools.py --stack-config=together --text-model=meta-llama/Llama-3.1-8B-Instruct
Used the command above to get the below results:
<img width="1710" alt="image"
src="https://github.com/user-attachments/assets/76b0ca06-f6e4-45fa-a114-0449bef2325b"
/>
<img width="1389" alt="image"
src="https://github.com/user-attachments/assets/5220ccbb-7882-4240-b17e-f362ad46d25b"
/>
<img width="1432" alt="image"
src="https://github.com/user-attachments/assets/bb93a41e-e82a-4c98-a22d-6b0e320aa974"
/>
---------
Co-authored-by: sarthakdeshpande <sarthak.deshpande@engati.com>
Concurrent requests should not trample (or reuse) each other's provider data. Provider data should be scoped to each request.
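A minimal sketch of request-scoped provider data, assuming a `contextvars.ContextVar` set per request rather than a thread-local shared across concurrent requests (function and variable names are illustrative):

```python
import contextvars
from typing import Any, Dict, Optional

PROVIDER_DATA: contextvars.ContextVar = contextvars.ContextVar("provider_data", default=None)

def set_request_provider_data(headers: Dict[str, str]) -> None:
    # Each incoming request stores its own provider data (e.g. API keys);
    # concurrent requests no longer see or overwrite each other's values.
    PROVIDER_DATA.set(dict(headers))

def get_request_provider_data() -> Optional[Dict[str, Any]]:
    return PROVIDER_DATA.get()
```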
## Test Plan
Set the uvicorn server to have a single worker process + thread by
updating the config:
```python
uvicorn_config = {
...
"workers": 1,
"loop": "asyncio",
}
```
Then perform the following steps on `origin/main` (without this change).
(1) Run the server using `llama stack run dev` without having
`FIREWORKS_API_KEY` in the environment.
(2) Run a test by specifying the FIREWORKS_API_KEY env var so it gets
stored in the thread local
```
pytest -s -v tests/integration/inference/test_text_inference.py \
--stack-config http://localhost:8321 \
--text-model accounts/fireworks/models/llama-v3p1-8b-instruct \
-k test_text_chat_completion_with_tool_calling_and_streaming \
--env FIREWORKS_API_KEY=<...>
```
Ensure you don't have any other API keys in the environment (otherwise
the bug will not reproduce due to other specifics in our testing code.)
Verify this works.
(3) Run the same command again without specifying FIREWORKS_API_KEY. See
that the request actually succeeds when it *should have failed*.
----
Now do the same tests on this branch, verify step (3) results in
failure.
Finally, run the full `test_text_inference.py` test suite with this
change, verify it succeeds.
Summary:
```
| File "/Users/erichuang/projects/llama-stack/llama_stack/distribution/server/server.py", line 213, in sse_generator
| logger.exception(f"Error in sse_generator: {e}")
| File "/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py", line 1864, in exception
| self.log(ERROR, msg, *args, exc_info=exc_info, **kwargs)
| File "/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py", line 1879, in log
| self.logger.log(level, msg, *args, **kwargs)
| File "/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py", line 1547, in log
| self._log(level, msg, args, **kwargs)
| File "/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py", line 1624, in _log
| self.handle(record)
| File "/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py", line 1634, in handle
| self.callHandlers(record)
| File "/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py", line 1696, in callHandlers
| hdlr.handle(record)
| File "/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py", line 968, in handle
| self.emit(record)
| File "/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/site-packages/rich/logging.py", line 167, in emit
| message_renderable = self.render_message(record, message)
| File "/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/site-packages/rich/logging.py", line 193, in render_message
| message_text = Text.from_markup(message) if use_markup else Text(message)
| File "/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/site-packages/rich/text.py", line 287, in from_markup
| rendered_text = render(text, style, emoji=emoji, emoji_variant=emoji_variant)
| File "/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/site-packages/rich/markup.py", line 167, in render
| raise MarkupError(
| rich.errors.MarkupError: closing tag '[/INST]' at position 105 doesn't match any open tag
```
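A plausible sketch of the fix, assuming the message is escaped before it reaches the Rich log handler so literal text like `[/INST]` is not parsed as Rich markup (the actual fix may instead disable markup on the handler):

```python
import logging

from rich.markup import escape

logger = logging.getLogger(__name__)

def log_sse_error(e: Exception) -> None:
    # Escape Rich markup metacharacters so model output containing tokens
    # like "[/INST]" cannot crash the RichHandler.
    logger.exception("Error in sse_generator: %s", escape(str(e)))
```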
Test Plan:
Reran the failing rag_with_vector_db example.
# What does this PR do?
- do not dump all commit history in CHANGELOG
cc @terrytangyuan
## Test Plan
```
python scripts/gen-changelog.py
```
Summary:
CI writes files to /tmp:
```
[{"__module__": "llama_stack.apis.inference.inference", "__pydantic__":
"SystemMessage", "data": {"content": "You are a helpful assistant",
"role": "system"}}, {"__module__":
"llama_stack.apis.inference.inference", "__pydantic__": "UserMessage",
"data": {"content": "Here is a csv file, can you describe it?",
"context": null, "role": "user"}}, {"__module__":
"llama_stack.apis.inference.inference", "__pydantic__":
"ToolResponseMessage", "data": {"call_id": "", "content": [{"text": "#
User provided a file accessible to you at
\\"/tmp/tmp7k7dg6qk/gcDtT5M8inflation.csv\\"\\nYou can use
code_interpreter to load and inspect it.", "type": "text"}], "role":
"tool", "tool_name": {"__enum__": "BuiltinTool", "__module__":
"llama_stack.models.llama.datatypes", "value": "code_interpreter"}}}]],
{"response_format": null, "sa
Test Plan: