# What does this PR do?
Clean up telemetry code since the telemetry API has been remove.
- moved telemetry files out of providers to core
- removed from Api
## Test Plan
❯ OTEL_SERVICE_NAME=llama_stack
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 uv run llama stack run
starter
❯ curl http://localhost:8321/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4o-mini",
"messages": [
{
"role": "user",
"content": "Hello!"
}
]
}'
-> verify traces in Grafana
CI
- Moved environment variable parsing and `setup_logging()` call from
module level to proper initialization points
- Added explicit `setup_logging()` calls in `server.py::create_app()`
and `library_client.py::AsyncLlamaStackAsLibraryClient.__init__()`
Module-level side effects are bad practice and can cause issues with
import order, testing, and circular dependencies. The previous
implementation ran logging setup on every import of the log module,
which is unpredictable and difficult to control.
---------
Co-authored-by: Claude <noreply@anthropic.com>
Propagate test IDs from client to server via HTTP headers to maintain
proper test isolation when running with server-based stack configs.
Without
this, recorded/replayed inference requests in server mode would leak
across
tests.
Changes:
- Patch client _prepare_request to inject test ID into provider data
header
- Sync test context from provider data on server side before storage
operations
- Set LLAMA_STACK_TEST_STACK_CONFIG_TYPE env var based on stack config
- Configure console width for cleaner log output in CI
- Add SQLITE_STORE_DIR temp directory for test data isolation
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
Addresses Issue #3271 - "Starting LLS server locally on a terminal with
120 chars width results in an output with empty lines".
This removes the specific 150-character width limit specified for the
Console, and will now auto-detect the terminal width instead. Now the
formatting of Console output is consistent across different sizes of
terminal windows.
<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Closes#3271
## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
Launching the server with several different sizes of terminal windows
results in Console output without unexpected spacing. e.g. `python -m
llama_stack.core.server.server /tmp/run.yaml --port 8321`
---------
Signed-off-by: Doug Edgar <dedgar@redhat.com>
Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>
# What does this PR do?
- categories like "core::server" is not recognized so it's level is not
set by 'all=debug'
- removed spammy telemetry debug logging
## Test Plan
test server launched with LLAMA_STACK_LOGGING='all=debug'
# What does this PR do?
This PR adds a step in pre-commit to enforce using `llama_stack` logger.
Currently, various parts of the code base uses different loggers. As a
custom `llama_stack` logger exist and used in the codebase, it is better
to standardize its utilization.
Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>
# What does this PR do?
commands where the output is important like `llama stack build
--print-deps-only` (soon to be `llama stack show`) print some log.py
`cprint`'s on _every_ execution of the CLI
for example:
<img width="912" height="331" alt="Screenshot 2025-08-18 at 1 16 30 PM"
src="https://github.com/user-attachments/assets/e5bf18fb-74a1-438c-861a-8a26eea7d014"
/>
the yellow text is likely unnecessary.
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Some fixes to MCP tests. And a bunch of fixes for Vector providers.
I also enabled a bunch of Vector IO tests to be used with
`LlamaStackLibraryClient`
## Test Plan
Run Responses tests with llama stack library client:
```
pytest -s -v tests/integration/non_ci/responses/ --stack-config=server:starter \
--text-model openai/gpt-4o \
--embedding-model=sentence-transformers/all-MiniLM-L6-v2 \
-k "client_with_models"
```
Do the same with `-k openai_client`
The rest should be taken care of by CI.
Using commas is much more shell-friendly. A semi-colon is a statement
delimiter and must be escaped.
This change is backwards incompatible but I imagine not many people are
using this. I could be wrong. Looking for feedback.
# What does this PR do?
currently `print` is being used with custom formatting to achieve
telemetry output in the console_span_processor
This causes telemetry not to show up in log files when using
`LLAMA_STACK_LOG_FILE`. During testing it looks like telemetry is not
being captured when it is
switch to using Rich formatting with the logger and then strip the
formatting off when a log file is being used so the formatting looks
normal
## Test Plan
before:
console:
<img width="967" height="127" alt="Screenshot 2025-07-21 at 4 02 15 PM"
src="https://github.com/user-attachments/assets/b09518cc-9d38-4970-9877-70e2c41fcbb5"
/>
log file (no telemetry):
```
2025-07-21 16:01:32,481 llama_stack.providers.remote.inference.ollama.ollama:117 inference: checking connectivity to Ollama at `http://localhost:11434`...
2025-07-21 16:01:34,779 opentelemetry.trace:537 uncategorized: Overriding of current TracerProvider is not allowed
2025-07-21 16:01:35,083 __main__:587 server: Listening on ['::', '0.0.0.0']:8321
2025-07-21 16:01:35,091 uvicorn.error:84 uncategorized: Started server process [68679]
2025-07-21 16:01:35,091 uvicorn.error:48 uncategorized: Waiting for application startup.
2025-07-21 16:01:35,092 __main__:163 server: Starting up
2025-07-21 16:01:35,092 uvicorn.error:62 uncategorized: Application startup complete.
2025-07-21 16:01:35,092 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
2025-07-21 16:01:37,167 uvicorn.access:473 uncategorized: 127.0.0.1:53145 - "POST /v1/openai/v1/chat/completions HTTP/1.1" 200
```
after:
console:
<img width="797" height="165" alt="Screenshot 2025-07-22 at 3 28 44 PM"
src="https://github.com/user-attachments/assets/44d40e3b-6502-439d-9ea5-38058b289962"
/>
log file:
```
2025-07-21 15:59:51,481 llama_stack.providers.remote.inference.ollama.ollama:117 inference: checking connectivity to Ollama at `http://localhost:11434`...
2025-07-21 15:59:53,801 opentelemetry.trace:537 uncategorized: Overriding of current TracerProvider is not allowed
2025-07-21 15:59:54,059 __main__:587 server: Listening on ['::', '0.0.0.0']:8321
2025-07-21 15:59:54,066 uvicorn.error:84 uncategorized: Started server process [68578]
2025-07-21 15:59:54,067 uvicorn.error:48 uncategorized: Waiting for application startup.
2025-07-21 15:59:54,067 __main__:163 server: Starting up
2025-07-21 15:59:54,067 uvicorn.error:62 uncategorized: Application startup complete.
2025-07-21 15:59:54,068 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
2025-07-21 15:59:55,381 [TELEMETRY] 19:59:55.381 /v1/openai/v1/chat/completions
2025-07-21 15:59:55,619 uvicorn.access:473 uncategorized: 127.0.0.1:53102 - "POST /v1/openai/v1/chat/completions HTTP/1.1" 200
2025-07-21 15:59:55,621 [TELEMETRY] 19:59:55.621 /v1/openai/v1/chat/completions [StatusCode.OK] (240.07ms)
2025-07-21 15:59:55,622 [TELEMETRY] 19:59:55.620 127.0.0.1:53102 - "POST /v1/openai/v1/chat/completions HTTP/1.1" 200
```
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
* Given that our API packages use "import *" in `__init.py__` we don't
need to do `from llama_stack.apis.models.models` but simply from
llama_stack.apis.models. The decision to use `import *` is debatable and
should probably be revisited at one point.
* Remove unneeded Ruff F401 rule
* Consolidate Ruff F403 rule in the pyprojectfrom
llama_stack.apis.models.models
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
The test wasn't using the correct virtual environment. Also augment the
console width for logs.
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
The goal of this PR is code base modernization.
Schema reflection code needed a minor adjustment to handle UnionTypes
and collections.abc.AsyncIterator. (Both are preferred for latest Python
releases.)
Note to reviewers: almost all changes here are automatically generated
by pyupgrade. Some additional unused imports were cleaned up. The only
change worth of note can be found under `docs/openapi_generator` and
`llama_stack/strong_typing/schema.py` where reflection code was updated
to deal with "newer" types.
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
Set a formatter for log file handler that does not pollute log messages
with color tags.
## Test Plan
Successfully tested with `LLAMA_STACK_LOG_FILE=server.log llama stack
run ...`
# What does this PR do?
a user should be able to store a static logging configuration outside of
their environment. This would make sense to store in the run yaml given
that we store other things like server configuration in there.
The environment variable settings override the config settings if both
are available.
The format in the config looks like this:
```
logging_config:
category_levels:
VALID_CATEGORY: VALID_STRING_LOG_LEVEL
```
any specified category out of the following:
`core | server | router | inference | agents | safety | eval | tools |
client`
combined with any of the following log levels:
`debug | info | warning | error | critical`
can be placed in the category_levels list in order to achieve the
desired log level
## Test Plan
Test locally with a run config like the following:
```
version: '2'
image_name: ollama
logging_config:
category_levels:
server: debug
apis:
...
```
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
uvicorn has a `log_level` arg in uvicorn.run, pass in the effective
level set by the logger.
Additionally, third party libraries like httpx are using our logging
format, but not honoring our log level.
This seems unintended, so loop through all items in the loggerDict and
apply the same log level as what we have set.
## Test Plan
before:
```
llama stack run --image-type venv ~/.llama/distributions/ollama/ollama-run.yaml
Environment variable LLAMA_STACK_LOGGING found: all=warn
Using virtual environment: /Users/charliedoern/projects/Documents/llama-stack/venv
+ python -m llama_stack.distribution.server.server --yaml-config /Users/charliedoern/.llama/distributions/ollama/ollama-run.yaml --port 8321
Environment variable LLAMA_STACK_LOGGING found: all=warn
WARNING 2025-03-10 16:05:49,706 root:71 uncategorized: Warning: `bwrap` is not available. Code interpreter tool will
not work correctly.
INFO 2025-03-10 16:05:49,916 datasets:54 uncategorized: PyTorch version 2.5.1 available.
INFO 2025-03-10 16:05:50,010 httpx:1740 uncategorized: HTTP Request: GET http://localhost:11434/api/ps "HTTP/1.1 200
OK"
INFO 2025-03-10 16:05:50,297 httpx:1740 uncategorized: HTTP Request: POST http://localhost:11434/api/pull "HTTP/1.1
200 OK"
INFO 2025-03-10 16:05:50,314 httpx:1740 uncategorized: HTTP Request: GET http://localhost:11434/api/tags "HTTP/1.1
200 OK"
INFO: Started server process [89663]
INFO: Waiting for application startup.
INFO: ASGI 'lifespan' protocol appears unsupported.
INFO: Application startup complete.
INFO: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
```
after:
```
llama stack run --image-type venv ~/.llama/distributions/ollama/ollama-run.yaml
Environment variable LLAMA_STACK_LOGGING found: all=warn
Using virtual environment: /Users/charliedoern/projects/Documents/llama-stack/venv
+ python -m llama_stack.distribution.server.server --yaml-config /Users/charliedoern/.llama/distributions/ollama/ollama-run.yaml --port 8321
Environment variable LLAMA_STACK_LOGGING found: all=warn
WARNING 2025-03-10 16:05:20,429 root:71 uncategorized: Warning: `bwrap` is not available. Code interpreter tool will
not work correctly.
INFO 2025-03-10 16:05:20,639 datasets:54 uncategorized: PyTorch version 2.5.1 available.
```
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
setting $LLAMA_STACK_LOG_FILE will pipe the logs to a file as well as
stdout. this is done by using a logging FileHandler
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
currently the `"Environment variable LLAMA_STACK_LOGGING found"` message
is printed with no color switch to cprint and highlight in yellow for
visibility
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Summary:
| File
"/Users/erichuang/projects/llama-stack/llama_stack/distribution/server/server.py",
line 213, in sse_generator
| logger.exception(f"Error in sse_generator: {e}")
| File
"/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py",
line 1864, in exception
| self.log(ERROR, msg, *args, exc_info=exc_info, **kwargs)
| File
"/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py",
line 1879, in log
| self.logger.log(level, msg, *args, **kwargs)
| File
"/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py",
line 1547, in log
| self._log(level, msg, args, **kwargs)
| File
"/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py",
line 1624, in _log
| self.handle(record)
| File
"/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py",
line 1634, in handle
| self.callHandlers(record)
| File
"/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py",
line 1696, in callHandlers
| hdlr.handle(record)
| File
"/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/logging/__init__.py",
line 968, in handle
| self.emit(record)
| File
"/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/site-packages/rich/logging.py",
line 167, in emit
| message_renderable = self.render_message(record, message)
| File
"/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/site-packages/rich/logging.py",
line 193, in render_message
| message_text = Text.from_markup(message) if use_markup else
Text(message)
| File
"/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/site-packages/rich/text.py",
line 287, in from_markup
| rendered_text = render(text, style, emoji=emoji,
emoji_variant=emoji_variant)
| File
"/opt/homebrew/Caskroom/miniconda/base/envs/myenv/lib/python3.10/site-packages/rich/markup.py",
line 167, in render
| raise MarkupError(
| rich.errors.MarkupError: closing tag '[/INST]' at position 105 doesn't
match any open tag
Test Plan:
reran failing rag_with_vector_db example
# What does this PR do?
This commit introduces a new logging system that allows loggers to be
assigned
a category while retaining the logger name based on the file name. The
log
format includes both the logger name and the category, producing output
like:
```
INFO 2025-03-03 21:44:11,323 llama_stack.distribution.stack:103 [core]: Tool_groups: builtin::websearch served by
tavily-search
```
Key features include:
- Category-based logging: Loggers can be assigned a category (e.g.,
"core", "server") when programming. The logger can be loaded like
this: `logger = get_logger(name=__name__, category="server")`
- Environment variable control: Log levels can be configured
per-category using the
`LLAMA_STACK_LOGGING` environment variable. For example:
`LLAMA_STACK_LOGGING="server=DEBUG;core=debug"` enables DEBUG level for
the "server"
and "core" categories.
- `LLAMA_STACK_LOGGING="all=debug"` sets DEBUG level globally for all
categories and
third-party libraries.
This provides fine-grained control over logging levels while maintaining
a clean and
informative log format.
The formatter uses the rich library which provides nice colors better
stack traces like so:
```
ERROR 2025-03-03 21:49:37,124 asyncio:1758 [uncategorized]: unhandled exception during asyncio.run() shutdown
task: <Task finished name='Task-16' coro=<handle_signal.<locals>.shutdown() done, defined at
/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py:146>
exception=UnboundLocalError("local variable 'loop' referenced before assignment")>
╭────────────────────────────────────── Traceback (most recent call last) ───────────────────────────────────────╮
│ /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py:178 in shutdown │
│ │
│ 175 │ │ except asyncio.CancelledError: │
│ 176 │ │ │ pass │
│ 177 │ │ finally: │
│ ❱ 178 │ │ │ loop.stop() │
│ 179 │ │
│ 180 │ loop = asyncio.get_running_loop() │
│ 181 │ loop.create_task(shutdown()) │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
UnboundLocalError: local variable 'loop' referenced before assignment
```
Co-authored-by: Ashwin Bharambe <@ashwinb>
Signed-off-by: Sébastien Han <seb@redhat.com>
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
```
python -m llama_stack.distribution.server.server --yaml-config ./llama_stack/templates/ollama/run.yaml
INFO 2025-03-03 21:55:35,918 __main__:365 [server]: Using config file: llama_stack/templates/ollama/run.yaml
INFO 2025-03-03 21:55:35,925 __main__:378 [server]: Run configuration:
INFO 2025-03-03 21:55:35,928 __main__:380 [server]: apis:
- agents
```
[//]: # (## Documentation)
---------
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>