# What does this PR do?
Fixes a bunch of violations.
Note: this patch touches all files except post_training.py, which will be significantly changed by #1437; it is therefore left out of the picture for now.
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
Testing with https://github.com/meta-llama/llama-stack/pull/1543
Also checked that GPU training works with the change:
```
INFO: ::1:53316 - "POST /v1/post-training/supervised-fine-tune HTTP/1.1" 200 OK
INFO: ::1:53316 - "GET /v1/post-training/job/status?job_uuid=test-jobb5ca2d84-d541-42f8-883b-762828b4c0e7 HTTP/1.1" 200 OK
INFO: ::1:53316 - "GET /v1/post-training/job/artifacts?job_uuid=test-jobb5ca2d84-d541-42f8-883b-762828b4c0e7 HTTP/1.1" 200 OK
21:24:01.161 [END] /v1/post-training/supervised-fine-tune [StatusCode.OK] (32526.75ms)
21:23:28.769 [DEBUG] Setting manual seed to local seed 3918872849. Local seed is seed + rank = 3918872849 + 0
21:23:28.996 [INFO] Identified model_type = Llama3_2. Ignoring output.weight in checkpoint in favor of the tok_embedding.weight tied weights.
21:23:29.933 [INFO] Memory stats after model init:
GPU peak memory allocation: 6.05 GiB
GPU peak memory reserved: 6.10 GiB
GPU peak memory active: 6.05 GiB
21:23:29.934 [INFO] Model is initialized with precision torch.bfloat16.
21:23:30.115 [INFO] Tokenizer is initialized.
21:23:30.118 [INFO] Optimizer is initialized.
21:23:30.119 [INFO] Loss is initialized.
21:23:30.896 [INFO] Dataset and Sampler are initialized.
21:23:30.898 [INFO] Learning rate scheduler is initialized.
21:23:31.618 [INFO] Memory stats after model init:
GPU peak memory allocation: 6.24 GiB
GPU peak memory reserved: 6.30 GiB
GPU peak memory active: 6.24 GiB
21:23:31.620 [INFO] Starting checkpoint save...
21:23:59.428 [INFO] Model checkpoint of size 6.43 GB saved to /home/ec2-user/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/consolidated.00.pth
21:23:59.445 [INFO] Adapter checkpoint of size 0.00 GB saved to /home/ec2-user/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/adapter/adapter.pth
```
[//]: # (## Documentation)
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
Made the code interpreter tool call async so that it is non-blocking.
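A minimal sketch of the pattern (the blocking helper below is a hypothetical placeholder, not the actual implementation):
```python
import asyncio

def execute_code_blocking(code: str) -> str:
    # placeholder for the synchronous interpreter call
    ...

async def run_code_interpreter(code: str) -> str:
    # offload the blocking call to a worker thread so the agent's event
    # loop stays responsive while the code executes
    return await asyncio.to_thread(execute_code_blocking, code)
```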
## Test Plan
pytest -s -v tests/integration/agents/test_agents.py
--stack-config=together --text-model=meta-llama/Llama-3.3-70B-Instruct
<img width="1693" alt="image"
src="https://github.com/user-attachments/assets/42520bb6-7acf-42d5-b71f-b35ca149d722"
/>
[//]: # (## Documentation)
Co-authored-by: sarthakdeshpande <sarthak.deshpande@engati.com>
# What does this PR do?
- Removed Optional return types for GET methods
- Raised ValueError when the requested resource is not found
- Ensured a proper 4xx response for missing resources
- Updated the API generator to check for wrong signatures
```
$ uv run --with ".[dev]" ./docs/openapi_generator/run_openapi_generator.sh
Validating API method return types...
API Method Return Type Validation Errors:
Method ScoringFunctions.get_scoring_function returns Optional type
```
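The pattern applied to the GET methods looks roughly like this (illustrative sketch based on the `get_model` traceback in the test plan, not the full diff):
```python
async def get_model(self, model_id: str) -> Model:
    # previously returned Optional[Model]; a missing resource now raises,
    # which the server maps to a 4xx response instead of a null body
    model = await self.get_object_by_identifier("model", model_id)
    if model is None:
        raise ValueError(f"Model '{model_id}' not found")
    return model
```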
Closes: https://github.com/meta-llama/llama-stack/issues/1630
## Test Plan
Run the server then:
```
curl http://127.0.0.1:8321/v1/models/foo
{"detail":"Invalid value: Model 'foo' not found"}%
```
Server log:
```
INFO: 127.0.0.1:52307 - "GET /v1/models/foo HTTP/1.1" 400 Bad Request
09:51:42.654 [END] /v1/models/foo [StatusCode.OK] (134.65ms)
09:51:42.651 [ERROR] Error executing endpoint route='/v1/models/{model_id:path}' method='get'
Traceback (most recent call last):
File "/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py", line 193, in endpoint
return await maybe_await(value)
File "/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py", line 156, in maybe_await
return await value
File "/Users/leseb/Documents/AI/llama-stack/llama_stack/providers/utils/telemetry/trace_protocol.py", line 102, in async_wrapper
result = await method(self, *args, **kwargs)
File "/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/routers/routing_tables.py", line 217, in get_model
raise ValueError(f"Model '{model_id}' not found")
ValueError: Model 'foo' not found
```
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Removed the local execution option from the remote Qdrant provider and introduced an explicit inline provider for embedded execution.
Updated the ollama template to include this option: this part can be reverted in case we don't want to have two default `vector_io` providers.
(Closes #1082)
## Test Plan
Build and run an ollama distro:
```bash
llama stack build --template ollama --image-type conda
llama stack run --image-type conda ollama
```
Run one of the sample ingestion applications, like
[rag_with_vector_db.py](https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/rag_with_vector_db.py),
but replace this line:
```py
selected_vector_provider = vector_providers[0]
```
with the following, to use the `qdrant` provider:
```py
selected_vector_provider = vector_providers[1]
```
After running the test code, verify the timestamp of the Qdrant store:
```bash
% ls -ltr ~/.llama/distributions/ollama/qdrant.db/collection/test_vector_db_*
total 784
-rw-r--r--@ 1 dmartino staff 401408 Feb 26 10:07 storage.sqlite
```
[//]: # (## Documentation)
---------
Signed-off-by: Daniele Martinoli <dmartino@redhat.com>
Co-authored-by: Francisco Arceo <farceo@redhat.com>
# What does this PR do?
Support NVIDIA-hosted Llama 3.2 11B/90B vision models. They are not hosted on the common https://integrate.api.nvidia.com/v1 endpoint; they are hosted on their own individual URLs.
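A sketch of the shape of the change; the per-model URLs below are placeholders, not the real endpoints:
```python
# models that cannot be reached via the shared endpoint get their own base URL
MODEL_BASE_URLS = {
    "meta/llama-3.2-11b-vision-instruct": "https://<11b-vision-endpoint>/v1",  # placeholder
    "meta/llama-3.2-90b-vision-instruct": "https://<90b-vision-endpoint>/v1",  # placeholder
}

DEFAULT_BASE_URL = "https://integrate.api.nvidia.com/v1"

def base_url_for(model_id: str) -> str:
    return MODEL_BASE_URLS.get(model_id, DEFAULT_BASE_URL)
```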
## Test Plan
`LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -s -v
tests/client-sdk/inference/test_vision_inference.py
--inference-model=meta/llama-3.2-11b-vision-instruct -k image`
# What does this PR do?
Adds a container file that can be used to build the playground UI.
This file will be built by this PR in the stack-ops repo:
https://github.com/meta-llama/llama-stack-ops/pull/9
The Docker command in the docs will need to change once the address of the official repository is known.
## Test Plan
Tested the image on my local OpenShift instance using this Helm chart:
https://github.com/Jaland/llama-stack-helm/tree/main/llama-stack
[//]: # (## Documentation)
---------
Co-authored-by: Jamie Land <hokie10@gmail.com>
# What does this PR do?
Add the option to not verify SSL certificates for the remote-vllm provider. This allows the llama stack server to talk to remote LLMs which have self-signed certificates.
Partially addresses #1545
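A minimal sketch of the mechanism, assuming the flag is plumbed through to the HTTP client (the `tls_verify` name here is an assumption, not necessarily the actual config field):
```python
import httpx

def make_vllm_http_client(base_url: str, tls_verify: bool = True) -> httpx.AsyncClient:
    # tls_verify=False disables certificate verification so the stack can
    # talk to vLLM endpoints that present self-signed certificates
    return httpx.AsyncClient(base_url=base_url, verify=tls_verify)
```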
# Summary:
Includes fixes to get test_agents working with the OpenAI model, e.g. tool parsing and message conversion.
# Test Plan:
```
LLAMA_STACK_CONFIG=dev pytest -s -v tests/integration/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B --text-model openai/gpt-4o-mini
```
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/1550).
* #1556
* __->__ #1550
# What does this PR do?
A PTY is unnecessary for interactive mode since `subprocess.run()`
already inherits the calling terminal’s stdin, stdout, and stderr,
allowing natural interaction. Using a PTY can introduce unwanted side
effects like buffering issues and inconsistent signal handling. Standard
input/output is sufficient for most interactive programs.
This commit simplifies the command execution by:
1. Removing PTY-based execution in favor of direct subprocess handling
2. Consolidating command execution into a single `run_command` function (sketched below)
3. Improving error handling with specific subprocess error types
4. Adding proper type hints and documentation
5. Maintaining Ctrl+C handling for graceful interruption
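A rough sketch of the simplified `run_command` (illustrative, not the exact code):
```python
import subprocess

def run_command(command: list[str]) -> int:
    """Run `command` attached to the caller's terminal and return its exit code.

    stdin/stdout/stderr are inherited from the parent process, so no PTY is
    needed for interactive programs.
    """
    try:
        return subprocess.run(command, check=False).returncode
    except subprocess.SubprocessError as e:
        print(f"Command failed: {e}")
        return 1
    except KeyboardInterrupt:
        # graceful Ctrl+C handling
        print("Interrupted")
        return 130
```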
## Test Plan
```
llama stack run
```
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Currently there is no shutdown method implemented for the `ProviderImpl` class. This leads to the following warning:
```shell
INFO: Waiting for application shutdown.
INFO 2025-03-17 17:25:13,280 __main__:145 server: Shutting down
INFO 2025-03-17 17:25:13,282 __main__:129 server: Shutting down ModelsRoutingTable
INFO 2025-03-17 17:25:13,284 __main__:129 server: Shutting down DatasetsRoutingTable
INFO 2025-03-17 17:25:13,286 __main__:129 server: Shutting down DatasetIORouter
INFO 2025-03-17 17:25:13,287 __main__:129 server: Shutting down TelemetryAdapter
INFO 2025-03-17 17:25:13,288 __main__:129 server: Shutting down InferenceRouter
INFO 2025-03-17 17:25:13,290 __main__:129 server: Shutting down ShieldsRoutingTable
INFO 2025-03-17 17:25:13,291 __main__:129 server: Shutting down SafetyRouter
INFO 2025-03-17 17:25:13,292 __main__:129 server: Shutting down VectorDBsRoutingTable
INFO 2025-03-17 17:25:13,293 __main__:129 server: Shutting down VectorIORouter
INFO 2025-03-17 17:25:13,294 __main__:129 server: Shutting down ToolGroupsRoutingTable
INFO 2025-03-17 17:25:13,295 __main__:129 server: Shutting down ToolRuntimeRouter
INFO 2025-03-17 17:25:13,296 __main__:129 server: Shutting down MetaReferenceAgentsImpl
INFO 2025-03-17 17:25:13,297 __main__:129 server: Shutting down ScoringFunctionsRoutingTable
INFO 2025-03-17 17:25:13,298 __main__:129 server: Shutting down ScoringRouter
INFO 2025-03-17 17:25:13,299 __main__:129 server: Shutting down BenchmarksRoutingTable
INFO 2025-03-17 17:25:13,300 __main__:129 server: Shutting down EvalRouter
INFO 2025-03-17 17:25:13,301 __main__:129 server: Shutting down DistributionInspectImpl
INFO 2025-03-17 17:25:13,303 __main__:129 server: Shutting down ProviderImpl
WARNING 2025-03-17 17:25:13,304 __main__:134 server: No shutdown method for ProviderImpl
INFO: Application shutdown complete.
INFO: Finished server process [1]
```
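The gist of the fix is adding the hook; a minimal sketch (the body here is a placeholder, the actual implementation may do real cleanup):
```python
class ProviderImpl:
    ...

    async def shutdown(self) -> None:
        # nothing to clean up yet; having the hook silences the
        # "No shutdown method for ProviderImpl" warning on server shutdown
        pass
```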
## Test Plan
Start a server and shut it down
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
Update the container build script so that it is compatible with podman. `--progress=plain` is now the default option and can be overridden.
## Test Plan
N/A
[//]: # (## Documentation)
Signed-off-by: Jeff MAURY <jmaury@redhat.com>
# What does this PR do?
The `generate_response_prompt` had an import error; this fixes that error.
Co-authored-by: sarthakdeshpande <sarthak.deshpande@engati.com>
# What does this PR do?
The current passthrough impl returns chatcompletion_message.content as a TextItem(), not a plain string, so it is not compatible with other providers and causes parsing errors downstream.
Change away from the generic pydantic conversion and explicitly parse out content.text.
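A sketch of the explicit parsing described above (the helper name is illustrative):
```python
def content_as_str(content) -> str:
    # passthrough may hand back a TextItem-like object instead of a plain
    # string; unwrap it so downstream parsing always sees a str
    if isinstance(content, str):
        return content
    if hasattr(content, "text"):
        return content.text
    raise ValueError(f"Unexpected content type: {type(content)}")
```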
## Test Plan
Set up a llama server with passthrough, then run:
```
llama-stack-client eval run-benchmark "MMMU_Pro_standard" --model-id meta-llama/Llama-3-8B --output-dir /tmp/ --num-examples 20
```
Works without parsing errors.
# What does this PR do?
Create a new dataset, BFCL_v3, from
https://gorilla.cs.berkeley.edu/blogs/13_bfcl_v3_multi_turn.html
Overall, each question asks the model to perform a task described in natural language; additionally, a set of available functions and their schemas is given for the model to choose from. The model is required to write the function call, including the function name and parameters, to achieve the stated purpose. The results are validated against the provided ground truth to make sure that the generated function call and the ground-truth function call are syntactically and semantically equivalent, by checking their ASTs.
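As a toy illustration of the AST check (not the actual BFCL grader):
```python
import ast

def same_call(generated: str, ground_truth: str) -> bool:
    # ast.dump normalizes formatting, so only the call structure is compared
    return ast.dump(ast.parse(generated)) == ast.dump(ast.parse(ground_truth))

assert same_call("get_weather(city='SF', unit='C')",
                 "get_weather( city = 'SF', unit = 'C' )")
```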
## Test Plan
Start the server with:
```
llama stack run ./llama_stack/templates/ollama/run.yaml
```
then send traffic:
```
llama-stack-client eval run-benchmark "bfcl" --model-id meta-llama/Llama-3.2-3B-Instruct --output-dir /tmp/gpqa --num-examples 2
```
[//]: # (## Documentation)
# What does this PR do?
A user should be able to store a static logging configuration outside of their environment. It makes sense to store this in the run YAML, given that we store other things like server configuration there.
The environment variable settings override the config settings if both are available.
The format in the config looks like this:
```
logging_config:
category_levels:
VALID_CATEGORY: VALID_STRING_LOG_LEVEL
```
Any category out of the following:
`core | server | router | inference | agents | safety | eval | tools | client`
combined with any of the following log levels:
`debug | info | warning | error | critical`
can be placed in the `category_levels` map in order to achieve the desired log level.
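A sketch of the precedence described above, assuming `LLAMA_STACK_LOGGING` entries look like `server=debug` separated by `;` (the parsing details are illustrative):
```python
import os

def effective_category_levels(config_levels: dict[str, str]) -> dict[str, str]:
    # start from the run-yaml category_levels, then let the env var win
    levels = dict(config_levels)
    for entry in filter(None, os.environ.get("LLAMA_STACK_LOGGING", "").split(";")):
        category, _, level = entry.partition("=")
        levels[category.strip()] = level.strip()
    return levels
```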
## Test Plan
Test locally with a run config like the following:
```
version: '2'
image_name: ollama
logging_config:
category_levels:
server: debug
apis:
...
```
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
Updates the help text for the `llama model prompt-format` command to
clarify that users should provide a specific model name (e.g.,
Llama3.1-8B, Llama3.2-11B-Vision), not a model family. Removes the
default value and field for `--model-name` to prevent users from
mistakenly thinking a model family name is acceptable. Adds guidance to
run `llama model list` to view valid model names.
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
Output of `llama model prompt-format -h` Before:
```
(venv) alina@fedora:~/dev/llama/llama-stack$ llama model prompt-format -h
usage: llama model prompt-format [-h] [-m MODEL_NAME]
Show llama model message formats
options:
-h, --help show this help message and exit
-m MODEL_NAME, --model-name MODEL_NAME
Model Family (llama3_1, llama3_X, etc.)
Example:
llama model prompt-format <options>
(venv) alina@fedora:~/dev/llama/llama-stack$ llama model prompt-format --model-name llama3_1
usage: llama model prompt-format [-h] [-m MODEL_NAME]
llama model prompt-format: error: llama3_1 is not a valid Model. Choose one from --
Llama3.1-8B
Llama3.1-70B
Llama3.1-405B
Llama3.1-8B-Instruct
Llama3.1-70B-Instruct
Llama3.1-405B-Instruct
Llama3.2-1B
Llama3.2-3B
Llama3.2-1B-Instruct
Llama3.2-3B-Instruct
Llama3.2-11B-Vision
Llama3.2-90B-Vision
Llama3.2-11B-Vision-Instruct
Llama3.2-90B-Vision-Instruct
```
Output of `llama model prompt-format -h` After:
```
(venv) alina@fedora:~/dev/llama/llama-stack$ llama model prompt-format -h
usage: llama model prompt-format [-h] [-m MODEL_NAME]
Show llama model message formats
options:
-h, --help show this help message and exit
-m MODEL_NAME, --model-name MODEL_NAME
Example: Llama3.1-8B or Llama3.2-11B-Vision, etc
(Run `llama model list` to see a list of valid model names)
Example:
llama model prompt-format <options>
```
Signed-off-by: Alina Ryan <aliryan@redhat.com>
# What does this PR do?
Updated all instances of datetime.now() to use timezone.utc for
consistency in handling time across different systems. This ensures that
timestamps are always in Coordinated Universal Time (UTC), avoiding
issues with time zone discrepancies and promoting uniformity in
time-related data.
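For reference, the pattern is the standard-library one:
```python
from datetime import datetime, timezone

# before: naive local time, which varies across systems
created_at = datetime.now()

# after: timezone-aware UTC timestamp
created_at = datetime.now(timezone.utc)
```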
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Currently the `inspect` API for providers is really a `list` API. Create a new `providers` API which has a GET `providers/{provider_id}` inspect endpoint that returns user-friendly configuration to the end user. Also add a GET `/providers` endpoint which returns the list of providers, as `inspect/providers` does today.
This API follows CRUD conventions and is more intuitive/RESTful.
This work is part of the RFC at
https://github.com/meta-llama/llama-stack/pull/1359
Sensitive fields are redacted using `redact_sensetive_fields` on the server side before returning a response:
<img width="456" alt="Screenshot 2025-03-13 at 4 40 21 PM"
src="https://github.com/user-attachments/assets/9465c221-2a26-42f8-a08a-6ac4a9fecce8"
/>
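The new endpoints can also be exercised directly; a quick sketch assuming the usual `/v1` prefix and an `ollama` provider:
```python
import httpx

base = "http://localhost:8321/v1"
print(httpx.get(f"{base}/providers").json())          # list all providers
print(httpx.get(f"{base}/providers/ollama").json())   # inspect a single provider
```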
## Test Plan
Using https://github.com/meta-llama/llama-stack-client-python/pull/181, a user is able to run the following:
`llama stack build --template ollama --image-type venv`
`llama stack run --image-type venv
~/.llama/distributions/ollama/ollama-run.yaml`
`llama-stack-client providers inspect ollama`
<img width="378" alt="Screenshot 2025-03-13 at 4 39 35 PM"
src="https://github.com/user-attachments/assets/8273d05d-8bc3-44c6-9e4b-ef95e48d5466"
/>
Also, I was able to run the new test_list integration test locally with ollama:
<img width="1509" alt="Screenshot 2025-03-13 at 11 03 40 AM"
src="https://github.com/user-attachments/assets/9b9db166-f02f-45b0-86a4-306d85149bc8"
/>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
- Fix issue w/ passthrough provider
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
llama stack run
[//]: # (## Documentation)
Summary:
This is not used anywhere.
Closes #1421
Test Plan:
LLAMA_STACK_CONFIG=fireworks pytest -s -v
tests/integration/agents/test_agents.py --safety-shield
meta-llama/Llama-Guard-3-8B --text-model
meta-llama/Llama-3.1-8B-Instruct --record-responses
Summary:
1. Adds an option to not use bwrap for code execution
2. Disables bwrap when running tests on Macs
Test Plan:
```
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/integration/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B --text-model meta-llama/Llama-3.1-8B-Instruct
```
Verify the code_interpreter result in the logs:
INFO 2025-03-11 08:10:39,858
llama_stack.providers.inline.agents.meta_reference.agent_instance:1032
agents: tool
call code_interpreter completed with result:
content='completed\n\n541\n' error_message=None error_code=None
metadata=None
Summary:
Refactoring only.
Centralize logic to preprocess toolgroup to one place.
Test Plan:
LLAMA_STACK_CONFIG=fireworks pytest -s -v
tests/api/agents/test_agents.py --safety-shield
meta-llama/Llama-Guard-3-8B
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/1381).
* #1384
* __->__ #1381
# What does this PR do?
This change adds a compact type to include metrics in the response, as opposed to the full MetricEvent, which is relevant for internal logging purposes.
## Test Plan
```
LLAMA_STACK_CONFIG=~/.llama/distributions/fireworks/fireworks-run.yaml pytest -s -v agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B --text-model meta-llama/Llama-3.1-8B-Instruct
llama stack run ~/.llama/distributions/fireworks/fireworks-run.yaml
curl --request POST \
--url http://localhost:8321/v1/inference/chat-completion \
--header 'content-type: application/json' \
--data '{
"model_id": "meta-llama/Llama-3.1-70B-Instruct",
"messages": [
{
"role": "user",
"content": {
"type": "text",
"text": "where do humans live"
}
}
],
"stream": false
}'
{
"metrics": [
{
"metric": "prompt_tokens",
"value": 10,
"unit": null
},
{
"metric": "completion_tokens",
"value": 522,
"unit": null
},
{
"metric": "total_tokens",
"value": 532,
"unit": null
}
],
"completion_message": {
"role": "assistant",
"content": "Humans live in various parts of the world...............",
"stop_reason": "out_of_tokens",
"tool_calls": []
},
"logprobs": null
}
```
# What does this PR do?
This fixes the build error
## Test Plan
pre-commit run --all-files
check for merge conflicts................................................Passed
trim trailing whitespace.................................................Passed
check for added large files..............................................Passed
fix end of files.........................................................Passed
Insert license in comments...............................................Passed
ruff.....................................................................Passed
ruff-format..............................................................Passed
blacken-docs.............................................................Passed
uv-lock..................................................................Passed
uv-export................................................................Passed
mypy.....................................................................Passed
Distribution Template Codegen............................................Passed
# What does this PR do?
This PR adds back the changes in #1300 which were reverted in #1476.
It also adds logic to preserve context variables across the asyncio boundary. This is needed with the library client, since the async generator logic yields control to code outside the event loop and, on resuming, does not have the same context as before; this requires preserving the context vars.
Addresses #1477
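A minimal sketch of the general technique (names are illustrative, not the exact helper added by this PR):
```python
import contextvars
from typing import AsyncGenerator, List, TypeVar

T = TypeVar("T")

async def with_preserved_contexts(
    gen: AsyncGenerator[T, None],
    context_vars: List[contextvars.ContextVar],
) -> AsyncGenerator[T, None]:
    # capture the values visible to the caller...
    captured = {cv: cv.get() for cv in context_vars}
    while True:
        # ...and restore them before resuming the wrapped generator, since
        # control may have passed through a different context in between
        for cv, value in captured.items():
            cv.set(value)
        try:
            item = await gen.__anext__()
        except StopAsyncIteration:
            return
        yield item
```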
## Test Plan
```
curl --request POST \
--url http://localhost:8321/v1/inference/chat-completion \
--header 'content-type: application/json' \
--data '{
"model_id": "meta-llama/Llama-3.1-70B-Instruct",
"messages": [
{
"role": "user",
"content": {
"type": "text",
"text": "where do humans live"
}
}
],
"stream": false
}' | jq .
{
"metrics": [
{
"trace_id": "kCZwO3tyQC-FuAGb",
"span_id": "bsP_5a5O",
"timestamp": "2025-03-11T16:47:38.549084Z",
"attributes": {
"model_id": "meta-llama/Llama-3.1-70B-Instruct",
"provider_id": "fireworks"
},
"type": "metric",
"metric": "prompt_tokens",
"value": 10,
"unit": "tokens"
},
{
"trace_id": "kCZwO3tyQC-FuAGb",
"span_id": "bsP_5a5O",
"timestamp": "2025-03-11T16:47:38.549449Z",
"attributes": {
"model_id": "meta-llama/Llama-3.1-70B-Instruct",
"provider_id": "fireworks"
},
"type": "metric",
"metric": "completion_tokens",
"value": 369,
"unit": "tokens"
},
{
"trace_id": "kCZwO3tyQC-FuAGb",
"span_id": "bsP_5a5O",
"timestamp": "2025-03-11T16:47:38.549457Z",
"attributes": {
"model_id": "meta-llama/Llama-3.1-70B-Instruct",
"provider_id": "fireworks"
},
"type": "metric",
"metric": "total_tokens",
"value": 379,
"unit": "tokens"
}
],
"completion_message": {
"role": "assistant",
"content": "Humans live on the planet Earth, specifically on its landmasses and in its oceans. Here's a breakdown of where humans live:\n\n1. **Continents:** Humans inhabit all seven continents:\n\t* Africa\n\t* Antarctica ( temporary residents, mostly scientists and researchers)\n\t* Asia\n\t* Australia\n\t* Europe\n\t* North America\n\t* South America\n2. **Countries:** There are 196 countries recognized by the United Nations, and humans live in almost all of them.\n3. **Cities and towns:** Many humans live in urban areas, such as cities and towns, which are often located near coastlines, rivers, or other bodies of water.\n4. **Rural areas:** Some humans live in rural areas, such as villages, farms, and countryside.\n5. **Islands:** Humans inhabit many islands around the world, including those in the Pacific, Indian, and Atlantic Oceans.\n6. **Mountains and highlands:** Humans live in mountainous regions, such as the Himalayas, the Andes, and the Rocky Mountains.\n7. **Deserts:** Some humans live in desert regions, such as the Sahara, the Mojave, and the Atacama.\n8. **Coastal areas:** Many humans live in coastal areas, such as beaches, ports, and coastal cities.\n9. **Underwater habitats:** A few humans live in underwater habitats, such as research stations and submarines.\n10. **Space:** A small number of humans have lived in space, including astronauts on the International Space Station and those who have visited the Moon.\n\nOverall, humans can be found living in almost every environment on Earth, from the frozen tundra to the hottest deserts, and from the highest mountains to the deepest oceans.",
"stop_reason": "end_of_turn",
"tool_calls": []
},
"logprobs": null
}
```
Original repro no longer shows any error:
```
LLAMA_STACK_DISABLE_VERSION_CHECK=true llama stack run ~/.llama/distributions/fireworks/fireworks-run.yaml
python -m examples.agents.e2e_loop_with_client_tools localhost 8321
```
client logs:
https://gist.github.com/dineshyv/047c7e87b18a5792aa660e311ea53166
server logs:
https://gist.github.com/dineshyv/97a2174099619e9916c7c490be26e559
# What does this PR do?
- fix precommit
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
CI
[//]: # (## Documentation)
## What does this PR do?
We noticed that the passthrough inference provider doesn't work with agents due to a type mismatch between client and server. We manually cast the llama stack client type to the llama stack server type to fix the issue.
## test
run `python -m examples.agents.hello localhost 8321` within
llama-stack-apps
<img width="1073" alt="Screenshot 2025-03-11 at 8 43 44 PM"
src="https://github.com/user-attachments/assets/bd1bdd31-606a-420c-a249-95f6184cc0b1"
/>
Fixes https://github.com/meta-llama/llama-stack/issues/1560
## What does this PR do?
As title, add codegen for open-benchmark template
## test
Checked the newly generated run.yaml file; it's identical before and after the change.
Also added a small improvement to the together template so that a missing TOGETHER_API_KEY won't crash the server, which is consistent with the user experience of other remote providers.
# What does this PR do?
uvicorn has a `log_level` arg in uvicorn.run; pass in the effective level set by the logger.
Additionally, third-party libraries like httpx are using our logging format but not honoring our log level. This seems unintended, so loop through all items in the loggerDict and apply the same log level as what we have set.
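A sketch of the two parts (not the exact code):
```python
import logging

def align_third_party_loggers(level: int) -> None:
    # third-party libraries such as httpx register their own loggers; walk
    # the registry and apply the same effective level we configured
    for name in logging.root.manager.loggerDict:
        logging.getLogger(name).setLevel(level)

# and hand the effective level to uvicorn as well, e.g.:
# uvicorn.run(app, log_level=logging.getLevelName(level).lower())
```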
## Test Plan
before:
```
llama stack run --image-type venv ~/.llama/distributions/ollama/ollama-run.yaml
Environment variable LLAMA_STACK_LOGGING found: all=warn
Using virtual environment: /Users/charliedoern/projects/Documents/llama-stack/venv
+ python -m llama_stack.distribution.server.server --yaml-config /Users/charliedoern/.llama/distributions/ollama/ollama-run.yaml --port 8321
Environment variable LLAMA_STACK_LOGGING found: all=warn
WARNING 2025-03-10 16:05:49,706 root:71 uncategorized: Warning: `bwrap` is not available. Code interpreter tool will
not work correctly.
INFO 2025-03-10 16:05:49,916 datasets:54 uncategorized: PyTorch version 2.5.1 available.
INFO 2025-03-10 16:05:50,010 httpx:1740 uncategorized: HTTP Request: GET http://localhost:11434/api/ps "HTTP/1.1 200
OK"
INFO 2025-03-10 16:05:50,297 httpx:1740 uncategorized: HTTP Request: POST http://localhost:11434/api/pull "HTTP/1.1
200 OK"
INFO 2025-03-10 16:05:50,314 httpx:1740 uncategorized: HTTP Request: GET http://localhost:11434/api/tags "HTTP/1.1
200 OK"
INFO: Started server process [89663]
INFO: Waiting for application startup.
INFO: ASGI 'lifespan' protocol appears unsupported.
INFO: Application startup complete.
INFO: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
```
after:
```
llama stack run --image-type venv ~/.llama/distributions/ollama/ollama-run.yaml
Environment variable LLAMA_STACK_LOGGING found: all=warn
Using virtual environment: /Users/charliedoern/projects/Documents/llama-stack/venv
+ python -m llama_stack.distribution.server.server --yaml-config /Users/charliedoern/.llama/distributions/ollama/ollama-run.yaml --port 8321
Environment variable LLAMA_STACK_LOGGING found: all=warn
WARNING 2025-03-10 16:05:20,429 root:71 uncategorized: Warning: `bwrap` is not available. Code interpreter tool will
not work correctly.
INFO 2025-03-10 16:05:20,639 datasets:54 uncategorized: PyTorch version 2.5.1 available.
```
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Bug https://github.com/meta-llama/llama-stack/issues/1357
# What does this PR do?
Fix a bug with a wrong file name in the inline::localfs datasetio provider.
[//]: # (If resolving an issue, uncomment and update the line below)
Closes #1357
## Test Plan
[//]: # (## Documentation)
Signed-off-by: Josh Salomon <jsalomon@redhat.com>
# What does this PR do?
- The recent merge https://github.com/meta-llama/llama-stack/pull/1410 introduced an error:
```
ValueError: Provider meta-reference (Api.agents) does not implement the following methods:
[('list_agent_sessions', 'not_actually_implemented'), ('list_agents', 'not_actually_implemented')]
```
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
```
llama stack run
```
```
LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/agents/test_agents.py --text-model meta-llama/Llama-3.3-70B-Instruct
```
1379530386
[//]: # (## Documentation)
# What does this PR do?
Remove Llama-3.2-1B-Instruct for fireworks, as it no longer appears to be hosted on the website.
## Test Plan
python distro_codegen.py
# What does this PR do?
Setting $LLAMA_STACK_LOG_FILE will pipe the logs to a file as well as stdout. This is done by using a logging FileHandler.
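Roughly, the handler wiring looks like this (sketch, not the exact code):
```python
import logging
import os

log_file = os.environ.get("LLAMA_STACK_LOG_FILE")
if log_file:
    # mirror everything that goes to stdout into the requested file
    handler = logging.FileHandler(log_file)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
    logging.getLogger().addHandler(handler)
```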
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
Add support for listing agents, describing an agent, and retrieving
session IDs for a given agent. This is only the API definition; the implementations will come separately.
Closes: https://github.com/meta-llama/llama-stack/issues/1294
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
If an implementation raises CancelledError (e.g. when it runs its own async loop for jobs), the main server shutdown handler gets confused and doesn't attempt to shut down the main loop tasks.
While at it, this also fixes the following failure when that happens:
```
UnboundLocalError: cannot access local variable 'loop' where it is not
associated with a value
```
Shutdown handlers were not running because the lifespan logic had been broken since ~Oct 2024. Fixed that too, and `lifespan` is now enforced (making sure the server will crash when it fails to interact with the app through middleware).
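A minimal sketch of the kind of guard the per-implementation shutdown loop needs (illustrative, not the exact patch):
```python
import asyncio
import logging

logger = logging.getLogger(__name__)

async def shutdown_impl(impl, impl_name: str) -> None:
    if not hasattr(impl, "shutdown"):
        logger.warning("No shutdown method for %s", impl_name)
        return
    try:
        await asyncio.wait_for(impl.shutdown(), timeout=5)
    except asyncio.TimeoutError:
        logger.exception("Shutdown timeout for %s", impl_name)
    except (asyncio.CancelledError, Exception) as e:
        # swallow CancelledError so the remaining implementations still get
        # a chance to shut down before the event loop is stopped
        logger.exception("Failed to shutdown %s: %s", impl_name, e)
```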
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
Spotted while working on
https://github.com/meta-llama/llama-stack/pull/1437
One way to trigger it without the PR above is to add `raise
CancelledError` in
any of the running providers' `shutdown` methods; then `kill -INT <pid>`
the
server process.
Validated this with the following test patch:
```
diff --git a/llama_stack/distribution/server/server.py b/llama_stack/distribution/server/server.py
index b85c463a..10dad83e 100644
--- a/llama_stack/distribution/server/server.py
+++ b/llama_stack/distribution/server/server.py
@@ -174,6 +174,7 @@ def handle_signal(app, signum, _) -> None:
except asyncio.CancelledError:
pass
finally:
+ logger.info("Stopping event loop")
loop.stop()
loop = asyncio.get_running_loop()
diff --git a/llama_stack/providers/inline/post_training/torchtune/post_training.py b/llama_stack/providers/inline/post_training/torchtune/post_training.py
index b837362d..163f43d8 100644
--- a/llama_stack/providers/inline/post_training/torchtune/post_training.py
+++ b/llama_stack/providers/inline/post_training/torchtune/post_training.py
@@ -3,6 +3,7 @@
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
+import asyncio
from datetime import datetime
from typing import Any, Dict, Optional
@@ -43,6 +44,9 @@ class TorchtunePostTrainingImpl:
self.jobs = {}
self.checkpoints_dict = {}
+ async def shutdown(self) -> None:
+ raise asyncio.CancelledError("Shutdown")
+
async def supervised_fine_tune(
self,
job_uuid: str,
```
Without the fix:
```
INFO: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
INFO: Shutting down
INFO: Finished server process [52099]
INFO 2025-03-07 23:25:33,548 __main__:143 server: Received signal SIGINT (2). Exiting gracefully...
INFO 2025-03-07 23:25:33,550 __main__:150 server: Shutting down DatasetsRoutingTable
INFO 2025-03-07 23:25:33,551 __main__:177 server: Stopping event loop
ERROR 2025-03-07 23:25:33,552 asyncio:1785 uncategorized: unhandled exception during asyncio.run() shutdown
task: <Task finished name='Task-12' coro=<handle_signal.<locals>.shutdown() done, defined at
/home/ec2-user/src/llama-stack/schedule/llama_stack/distribution/server/server.py:145>
exception=UnboundLocalError("cannot access local variable 'loop' where it is not associated with a value")>
╭───────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────╮
│ /home/ec2-user/src/llama-stack/schedule/llama_stack/distribution/server/server.py:178 in shutdown │
│ │
│ 175 │ │ │ pass │
│ 176 │ │ finally: │
│ 177 │ │ │ logger.info("Stopping event loop") │
│ ❱ 178 │ │ │ loop.stop() │
│ 179 │ │
│ 180 │ loop = asyncio.get_running_loop() │
│ 181 │ loop.create_task(shutdown()) │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
UnboundLocalError: cannot access local variable 'loop' where it is not associated with a value
```
With the fix, now seeing the following messages when the server is
killed:
```
INFO: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
INFO: Shutting down
INFO: Finished server process [50836]
INFO 2025-03-07 23:20:35,182 __main__:143 server: Received signal SIGINT (2). Exiting gracefully...
INFO 2025-03-07 23:20:35,184 __main__:149 server: Shutting down DatasetsRoutingTable
ERROR 2025-03-07 23:20:35,185 __main__:158 server: Failed to shutdown DatasetsRoutingTable: {CancelledError()}
╭───────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────╮
│ /usr/lib64/python3.11/asyncio/tasks.py:476 in wait_for │
│ │
│ 473 │ try: │
│ 474 │ │ # wait until the future completes or the timeout │
│ 475 │ │ try: │
│ ❱ 476 │ │ │ await waiter │
│ 477 │ │ except exceptions.CancelledError: │
│ 478 │ │ │ if fut.done(): │
│ 479 │ │ │ │ return fut.result() │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
CancelledError
During handling of the above exception, another exception occurred:
╭───────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────╮
│ /home/ec2-user/src/llama-stack/schedule/llama_stack/distribution/server/server.py:152 in shutdown │
│ │
│ 149 │ │ │ logger.info("Shutting down %s", impl_name) │
│ 150 │ │ │ try: │
│ 151 │ │ │ │ if hasattr(impl, "shutdown"): │
│ ❱ 152 │ │ │ │ │ await asyncio.wait_for(impl.shutdown(), timeout=5) │
│ 153 │ │ │ │ else: │
│ 154 │ │ │ │ │ logger.warning("No shutdown method for %s", impl_name) │
│ 155 │ │ │ except asyncio.TimeoutError: │
│ │
│ /usr/lib64/python3.11/asyncio/tasks.py:479 in wait_for │
│ │
│ 476 │ │ │ await waiter │
│ 477 │ │ except exceptions.CancelledError: │
│ 478 │ │ │ if fut.done(): │
│ ❱ 479 │ │ │ │ return fut.result() │
│ 480 │ │ │ else: │
│ 481 │ │ │ │ fut.remove_done_callback(cb) │
│ 482 │ │ │ │ # We must ensure that the task is not running │
│ │
│ /home/ec2-user/src/llama-stack/schedule/llama_stack/distribution/routers/routing_tables.py:131 in shutdown │
│ │
│ 128 │ │ │ elif api == Api.tool_runtime: │
│ 129 │ │ │ │ p.tool_store = self │
│ 130 │ │
│ ❱ 131 │ async def shutdown(self) -> None: │
│ 132 │ │ for p in self.impls_by_provider_id.values(): │
│ 133 │ │ │ await p.shutdown() │
│ 134 │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
CancelledError
INFO 2025-03-07 23:20:35,295 __main__:149 server: Shutting down DatasetIORouter
INFO 2025-03-07 23:20:35,296 __main__:149 server: Shutting down ScoringFunctionsRoutingTable
INFO 2025-03-07 23:20:35,297 __main__:149 server: Shutting down ScoringRouter
INFO 2025-03-07 23:20:35,298 __main__:149 server: Shutting down ModelsRoutingTable
INFO 2025-03-07 23:20:35,299 __main__:149 server: Shutting down InferenceRouter
INFO 2025-03-07 23:20:35,300 __main__:149 server: Shutting down ShieldsRoutingTable
INFO 2025-03-07 23:20:35,300 __main__:149 server: Shutting down SafetyRouter
INFO 2025-03-07 23:20:35,301 __main__:149 server: Shutting down VectorDBsRoutingTable
INFO 2025-03-07 23:20:35,302 __main__:149 server: Shutting down VectorIORouter
INFO 2025-03-07 23:20:35,303 __main__:149 server: Shutting down ToolGroupsRoutingTable
INFO 2025-03-07 23:20:35,304 __main__:149 server: Shutting down ToolRuntimeRouter
INFO 2025-03-07 23:20:35,304 __main__:149 server: Shutting down MetaReferenceAgentsImpl
INFO 2025-03-07 23:20:35,305 __main__:149 server: Shutting down TelemetryAdapter
INFO 2025-03-07 23:20:35,306 __main__:149 server: Shutting down TorchtunePostTrainingImpl
ERROR 2025-03-07 23:20:35,307 __main__:158 server: Failed to shutdown TorchtunePostTrainingImpl:
{CancelledError('Shutdown')}
╭───────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────╮
│ /home/ec2-user/src/llama-stack/schedule/llama_stack/distribution/server/server.py:152 in shutdown │
│ │
│ 149 │ │ │ logger.info("Shutting down %s", impl_name) │
│ 150 │ │ │ try: │
│ 151 │ │ │ │ if hasattr(impl, "shutdown"): │
│ ❱ 152 │ │ │ │ │ await asyncio.wait_for(impl.shutdown(), timeout=5) │
│ 153 │ │ │ │ else: │
│ 154 │ │ │ │ │ logger.warning("No shutdown method for %s", impl_name) │
│ 155 │ │ │ except asyncio.TimeoutError: │
│ │
│ /usr/lib64/python3.11/asyncio/tasks.py:489 in wait_for │
│ │
│ 486 │ │ │ │ raise │
│ 487 │ │ │
│ 488 │ │ if fut.done(): │
│ ❱ 489 │ │ │ return fut.result() │
│ 490 │ │ else: │
│ 491 │ │ │ fut.remove_done_callback(cb) │
│ 492 │ │ │ # We must ensure that the task is not running │
│ │
│ /home/ec2-user/src/llama-stack/schedule/llama_stack/providers/inline/post_training/torchtune/post_training. │
│ py:48 in shutdown │
│ │
│ 45 │ │ self.checkpoints_dict = {} │
│ 46 │ │
│ 47 │ async def shutdown(self) -> None: │
│ ❱ 48 │ │ raise asyncio.CancelledError("Shutdown") │
│ 49 │ │
│ 50 │ async def supervised_fine_tune( │
│ 51 │ │ self, │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
CancelledError: Shutdown
INFO 2025-03-07 23:20:35,352 __main__:149 server: Shutting down BenchmarksRoutingTable
INFO 2025-03-07 23:20:35,353 __main__:149 server: Shutting down EvalRouter
INFO 2025-03-07 23:20:35,354 __main__:149 server: Shutting down DistributionInspectImpl
INFO 2025-03-07 23:20:35,355 __main__:177 server: Stopping event loop
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/ec2-user/src/llama-stack/schedule/llama_stack/distribution/server/server.py", line 488, in <module>
main()
File "/home/ec2-user/src/llama-stack/schedule/llama_stack/distribution/server/server.py", line 476, in main
uvicorn.run(**uvicorn_config)
File "/home/ec2-user/src/llama-stack/schedule/venv/lib64/python3.11/site-packages/uvicorn/main.py", line 579, in run
server.run()
File "/home/ec2-user/src/llama-stack/schedule/venv/lib64/python3.11/site-packages/uvicorn/server.py", line 66, in run
return asyncio.run(self.serve(sockets=sockets))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/asyncio/runners.py", line 189, in run
with Runner(debug=debug) as runner:
File "/usr/lib64/python3.11/asyncio/runners.py", line 63, in __exit__
self.close()
File "/usr/lib64/python3.11/asyncio/runners.py", line 71, in close
_cancel_all_tasks(loop)
File "/usr/lib64/python3.11/asyncio/runners.py", line 201, in _cancel_all_tasks
loop.run_until_complete(tasks.gather(*to_cancel, return_exceptions=True))
File "/usr/lib64/python3.11/asyncio/base_events.py", line 652, in run_until_complete
raise RuntimeError('Event loop stopped before Future completed.')
RuntimeError: Event loop stopped before Future completed.
++ error_handler 104
++ echo 'Error occurred in script at line: 104'
Error occurred in script at line: 104
++ exit 1
```
With all patches included, the shutdown now looks as follows:
```
$ kill -INT $(ps ax | grep llama_stack.distribution.server.server | grep -v nvim | awk -e '{print $1}' | sort | head -n 1)
```
```
20:56:09.308 [START]
INFO: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO 2025-03-10 20:56:43,961 __main__:140 server: Shutting down
INFO 2025-03-10 20:56:43,962 __main__:124 server: Shutting down DatasetsRoutingTable
INFO 2025-03-10 20:56:43,964 __main__:124 server: Shutting down DatasetIORouter
INFO 2025-03-10 20:56:43,965 __main__:124 server: Shutting down ScoringFunctionsRoutingTable
INFO 2025-03-10 20:56:43,966 __main__:124 server: Shutting down ScoringRouter
INFO 2025-03-10 20:56:43,967 __main__:124 server: Shutting down ModelsRoutingTable
INFO 2025-03-10 20:56:43,968 __main__:124 server: Shutting down InferenceRouter
INFO 2025-03-10 20:56:43,969 __main__:124 server: Shutting down ShieldsRoutingTable
INFO 2025-03-10 20:56:43,971 __main__:124 server: Shutting down SafetyRouter
INFO 2025-03-10 20:56:43,972 __main__:124 server: Shutting down VectorDBsRoutingTable
INFO 2025-03-10 20:56:43,973 __main__:124 server: Shutting down VectorIORouter
INFO 2025-03-10 20:56:43,974 __main__:124 server: Shutting down ToolGroupsRoutingTable
INFO 2025-03-10 20:56:43,975 __main__:124 server: Shutting down ToolRuntimeRouter
INFO 2025-03-10 20:56:43,976 __main__:124 server: Shutting down MetaReferenceAgentsImpl
INFO 2025-03-10 20:56:43,977 __main__:124 server: Shutting down TelemetryAdapter
INFO 2025-03-10 20:56:43,978 __main__:124 server: Shutting down TorchtunePostTrainingImpl
WARNING 2025-03-10 20:56:43,979 __main__:129 server: No shutdown method for TorchtunePostTrainingImpl
INFO 2025-03-10 20:56:43,979 __main__:124 server: Shutting down BenchmarksRoutingTable
INFO 2025-03-10 20:56:43,980 __main__:124 server: Shutting down EvalRouter
INFO 2025-03-10 20:56:43,981 __main__:124 server: Shutting down DistributionInspectImpl
INFO: Application shutdown complete.
INFO: Finished server process [33862]
```
[//]: # (## Documentation)
---------
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
It's a dict that may contain different types, as per
resolver:instantiate_provider implementation. (AFAIU it also never
contains ProviderSpecs, but *instances* of provider implementations.)
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
mypy passing if enabled checks for these modules. (See #1543)
[//]: # (## Documentation)
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
Added a missing shutdown handler. (Currently empty.)
Without it, when the server shuts down, it logs the following warning:
```
__main__:129 server: No shutdown method for TorchtunePostTrainingImpl
```
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
(The test plan assumes shutdown logic is fixed, see #1495)
Without the patch:
```
INFO: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO 2025-03-10 20:56:43,961 __main__:140 server: Shutting down
INFO 2025-03-10 20:56:43,962 __main__:124 server: Shutting down DatasetsRoutingTable
INFO 2025-03-10 20:56:43,964 __main__:124 server: Shutting down DatasetIORouter
INFO 2025-03-10 20:56:43,965 __main__:124 server: Shutting down ScoringFunctionsRoutingTable
INFO 2025-03-10 20:56:43,966 __main__:124 server: Shutting down ScoringRouter
INFO 2025-03-10 20:56:43,967 __main__:124 server: Shutting down ModelsRoutingTable
INFO 2025-03-10 20:56:43,968 __main__:124 server: Shutting down InferenceRouter
INFO 2025-03-10 20:56:43,969 __main__:124 server: Shutting down ShieldsRoutingTable
INFO 2025-03-10 20:56:43,971 __main__:124 server: Shutting down SafetyRouter
INFO 2025-03-10 20:56:43,972 __main__:124 server: Shutting down VectorDBsRoutingTable
INFO 2025-03-10 20:56:43,973 __main__:124 server: Shutting down VectorIORouter
INFO 2025-03-10 20:56:43,974 __main__:124 server: Shutting down ToolGroupsRoutingTable
INFO 2025-03-10 20:56:43,975 __main__:124 server: Shutting down ToolRuntimeRouter
INFO 2025-03-10 20:56:43,976 __main__:124 server: Shutting down MetaReferenceAgentsImpl
INFO 2025-03-10 20:56:43,977 __main__:124 server: Shutting down TelemetryAdapter
INFO 2025-03-10 20:56:43,978 __main__:124 server: Shutting down TorchtunePostTrainingImpl
WARNING 2025-03-10 20:56:43,979 __main__:129 server: No shutdown method for TorchtunePostTrainingImpl
INFO 2025-03-10 20:56:43,979 __main__:124 server: Shutting down BenchmarksRoutingTable
INFO 2025-03-10 20:56:43,980 __main__:124 server: Shutting down EvalRouter
INFO 2025-03-10 20:56:43,981 __main__:124 server: Shutting down DistributionInspectImpl
INFO: Application shutdown complete.
INFO: Finished server process [33862]
```
Run with the patch and observe no warning:
```
$ kill -INT $(ps ax | grep llama_stack.distribution.server.server | grep -v nvim | awk -e '{print $1}' | sort | head -n 1)
```
```
INFO: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO 2025-03-11 00:32:56,863 __main__:140 server: Shutting down
INFO 2025-03-11 00:32:56,864 __main__:124 server: Shutting down DatasetsRoutingTable
INFO 2025-03-11 00:32:56,866 __main__:124 server: Shutting down DatasetIORouter
INFO 2025-03-11 00:32:56,867 __main__:124 server: Shutting down ScoringFunctionsRoutingTable
INFO 2025-03-11 00:32:56,868 __main__:124 server: Shutting down ScoringRouter
INFO 2025-03-11 00:32:56,869 __main__:124 server: Shutting down ModelsRoutingTable
INFO 2025-03-11 00:32:56,870 __main__:124 server: Shutting down InferenceRouter
INFO 2025-03-11 00:32:56,871 __main__:124 server: Shutting down ShieldsRoutingTable
INFO 2025-03-11 00:32:56,872 __main__:124 server: Shutting down SafetyRouter
INFO 2025-03-11 00:32:56,873 __main__:124 server: Shutting down VectorDBsRoutingTable
INFO 2025-03-11 00:32:56,874 __main__:124 server: Shutting down VectorIORouter
INFO 2025-03-11 00:32:56,875 __main__:124 server: Shutting down ToolGroupsRoutingTable
INFO 2025-03-11 00:32:56,876 __main__:124 server: Shutting down ToolRuntimeRouter
INFO 2025-03-11 00:32:56,877 __main__:124 server: Shutting down MetaReferenceAgentsImpl
INFO 2025-03-11 00:32:56,878 __main__:124 server: Shutting down TelemetryAdapter
INFO 2025-03-11 00:32:56,879 __main__:124 server: Shutting down TorchtunePostTrainingImpl
INFO 2025-03-11 00:32:56,880 __main__:124 server: Shutting down BenchmarksRoutingTable
INFO 2025-03-11 00:32:56,881 __main__:124 server: Shutting down EvalRouter
INFO 2025-03-11 00:32:56,882 __main__:124 server: Shutting down DistributionInspectImpl
```
[//]: # (## Documentation)
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>