A bunch of miscellaneous cleanup focusing on tests, which ended up
speeding up the starter distro substantially.
- Pulled llama stack client init for tests into `pytest_sessionstart` so
it does not clobber output
- Profiling that showed where we were doing lots of heavy imports for
starter, so I made those imports lazy (see the sketch below)
- starter now starts 20+ seconds faster on my Mac
- A few other smallish refactors for `compat_client`
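A minimal sketch of the lazy-import pattern, with illustrative names (the starter distro's actual deferred imports may be structured differently):
```python
import importlib


class LazyModule:
    """Proxy that defers a heavy import until first attribute access."""

    def __init__(self, module_name: str):
        self._module_name = module_name
        self._module = None

    def __getattr__(self, attr: str):
        if self._module is None:
            self._module = importlib.import_module(self._module_name)
        return getattr(self._module, attr)


# Instead of paying the import cost at startup with `import torch`:
torch = LazyModule("torch")  # imported only when torch.<attr> is touched
```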
# What does this PR do?
Extend the Shields Protocol and implement the capability to unregister
previously registered shields and CLI for shields management.
Closes #2581
## Test Plan
First off, test the API for shields:
1. Install and start Ollama:
`ollama serve`
2. Pull Llama Guard Model in Ollama:
`ollama pull llama-guard3:8b`
3. Configure env variables:
```
export ENABLE_OLLAMA=ollama
export OLLAMA_URL=http://localhost:11434
```
4. Build Llama Stack distro:
`llama stack build --template starter --image-type venv`
5. Start Llama Stack server:
`llama stack run starter --port 8321`
6. Check if Ollama model is available:
`curl -X GET http://localhost:8321/v1/models | jq '.data[] |
select(.provider_id=="ollama")'`
7. Register a new Shield using Ollama provider:
```
curl -X POST http://localhost:8321/v1/shields \
-H "Content-Type: application/json" \
-d '{
"shield_id": "test-shield",
"provider_id": "llama-guard",
"provider_shield_id": "ollama/llama-guard3:8b",
"params": {}
}'
```
`{"identifier":"test-shield","provider_resource_id":"ollama/llama-guard3:8b","provider_id":"llama-guard","type":"shield","owner":{"principal":"","attributes":{}},"params":{}}%
`
8. Check if shield was registered:
`curl -X GET http://localhost:8321/v1/shields/test-shield`
`{"identifier":"test-shield","provider_resource_id":"ollama/llama-guard3:8b","provider_id":"llama-guard","type":"shield","owner":{"principal":"","attributes":{}},"params":{}}%
`
9. Run shield:
```
curl -X POST http://localhost:8321/v1/safety/run-shield \
-H "Content-Type: application/json" \
-d '{
"shield_id": "test-shield",
"messages": [
{
"role": "user",
"content": "How can I hack into someone computer?"
}
],
"params": {}
}'
```
`{"violation":{"violation_level":"error","user_message":"I can't answer
that. Can I help with something
else?","metadata":{"violation_type":"S2"}}}% `
10. Unregister shield:
`curl -X DELETE http://localhost:8321/v1/shields/test-shield`
`null`
11. Verify shield was deleted:
`curl -X GET http://localhost:8321/v1/shields/test-shield`
`{"detail":"Invalid value: Shield 'test-shield' not found"}%`
All tests passed ✅
```
========================================================================== 430 passed, 194 warnings in 19.54s ==========================================================================
/Users/iamiller/GitHub/llama-stack/.venv/lib/python3.12/site-packages/litellm/llms/custom_httpx/async_client_cleanup.py:78: RuntimeWarning: coroutine 'close_litellm_async_clients' was never awaited
loop.close()
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
Wrote HTML report to htmlcov-3.12/index.html
```
# What does this PR do?
1. Creates a new `SessionNotFoundError` class
2. Implements the new class where appropriate
Relates to #2379
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
1. Creates a new `ToolGroupNotFoundError` class
2. Implements the new class where appropriate
Relates to #2379
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
As the title says: Distributions are in, Templates are out.
`llama stack build --template` --> `llama stack build --distro`. For
backward compatibility, the previous option is kept but results in a
warning.
Updated `server.py` to remove the "config_or_template" backward
compatibility, since it has been a couple of releases since that change.
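A minimal sketch of how the deprecated alias can be handled, assuming argparse-style parsing; the flag wiring here is illustrative, not the actual CLI code:
```python
import argparse

parser = argparse.ArgumentParser(prog="llama stack build")
parser.add_argument("--distro", help="distribution to build")
# Deprecated alias, kept for backward compatibility but hidden from help.
parser.add_argument("--template", help=argparse.SUPPRESS)

args = parser.parse_args()
if args.template:
    print("WARNING: --template is deprecated, please use --distro instead")
    args.distro = args.distro or args.template
```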
# What does this PR do?
Remove score_threshold based check from `OpenAIVectorStoreMixin`
Closes: https://github.com/meta-llama/llama-stack/issues/3018
## Test Plan
# What does this PR do?
This PR is responsible for removal of Conda support in Llama Stack
Closes #2539
## Test Plan
# What does this PR do?
Closes #2995
Update `SambaNovaInferenceAdapter` to use `LiteLLMOpenAIMixin` efficiently.
## Test Plan
```
$ uv run pytest -s -v tests/integration/inference --stack-config inference=sambanova --text-model sambanova/Meta-Llama-3.1-8B-Instruct
...
======================== 10 passed, 84 skipped, 3 xfailed, 51 warnings in 8.14s ========================
```
# What does this PR do?
Adds support for the OpenAI Vector Store APIs in Qdrant.
Closes #2463
## Test Plan
Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
# What does this PR do?
This PR (1) enables the files API for Weaviate and (2) enables
integration tests for Weaviate, which adds a docker container to the
github action.
This PR also handles a couple of edge cases in creating the collection
and ensures the tests all pass.
## Test Plan
CI enabled
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
Improve user experience by providing specific guidance when no API key
is available, showing both provider data header and config options with
the correct field name for each provider.
Also adds comprehensive test coverage for API key resolution scenarios.
Addresses #2990 for providers using the LiteLLM OpenAI mixin.
## Test Plan
`./scripts/unit-tests.sh
tests/unit/providers/inference/test_litellm_openai_mixin.py`
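A hypothetical sketch of the kind of guidance message described, with illustrative provider and field names (the real message construction lives in the mixin):
```python
def missing_api_key_message(provider: str, config_field: str, header_field: str) -> str:
    """Build an actionable error for a missing API key."""
    return (
        f"API key for {provider} not found. Either configure it in the run "
        f"config under the provider's `{config_field}` field, or pass it "
        f"per-request in the X-LlamaStack-Provider-Data header as "
        f'{{"{header_field}": "<key>"}}.'
    )


print(missing_api_key_message("groq", "api_key", "groq_api_key"))
```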
`get_vector_db()` will raise an exception if a vector store can't be
returned, so the client-side handling is redundant.
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
This PR adds support for Direct Preference Optimization (DPO) training
via the existing HuggingFace inline provider. It introduces a new DPO
training recipe, config schema updates, dataset integration, and
end-to-end testing to support preference-based fine-tuning with TRL.
## Test Plan
Added integration test:
tests/integration/post_training/test_post_training.py::TestPostTraining::test_preference_optimize
Ran tests on both CPU and CUDA environments
---------
Co-authored-by: Ubuntu <ubuntu@ip-172-31-43-83.ec2.internal>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
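A minimal sketch of preference tuning with TRL's `DPOTrainer`, the mechanism this recipe builds on; the model, dataset, and hyperparameters here are illustrative assumptions, not the provider's actual defaults:
```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "Qwen/Qwen2-0.5B-Instruct"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# DPO expects rows with "prompt", "chosen", and "rejected" fields.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

config = DPOConfig(output_dir="dpo-out", beta=0.1, per_device_train_batch_size=2)
trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```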
# What does this PR do?
I've been tinkering a little with a simple chat playground in the UI, so
I'm opening the PR with what's kind of a WIP.
If you look at the first commit, that includes the bulk of the changes.
The rest of the changed files come from installing the `shadcn`
components.
Note this is missing a lot; e.g.,
- sessions
- document upload
- audio (the shadcn components install these by default from
https://shadcn-chatbot-kit.vercel.app/docs/components/chat)
I still need to wire up a lot more to make it actually fully functional
but it does basic chat using the LS Typescript Client.
Basic demo:
<img width="1329" height="1430" alt="Image"
src="https://github.com/user-attachments/assets/917a2096-36d4-4925-b83b-f1f2cda98698"
/>
<img width="1319" height="1424" alt="Image"
src="https://github.com/user-attachments/assets/fab1583b-1c72-4bf3-baf2-405aee13c6bb"
/>
## Test Plan
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
This PR focuses on improving the developer experience by adding
comprehensive docstrings to the API data models across the Llama Stack.
These docstrings provide detailed explanations for each model and its
fields, making the API easier to understand and use.
**Key changes:**
- **Added Docstrings:** Added reST formatted docstrings to Pydantic
models in the `llama_stack/apis/` directory. This includes models for:
- Agents (`agents.py`)
- Benchmarks (`benchmarks.py`)
- Datasets (`datasets.py`)
- Inference (`inference.py`)
- And many other API modules.
- **OpenAPI Spec Update:** Regenerated the OpenAPI specification
(`docs/_static/llama-stack-spec.yaml` and
`docs/_static/llama-stack-spec.html`) to include the new docstrings.
This will be reflected in the API documentation, providing richer
information to users.
**Impact:**
- Developers using the Llama Stack API will have a better understanding
of the data structures.
- The auto-generated API documentation is now more informative.
---------
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
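An illustrative example of the docstring style added, on a made-up model rather than one of the actual API classes:
```python
from pydantic import BaseModel


class CompletionRequest(BaseModel):
    """A request to generate a completion for a given prompt.

    :param model: The identifier of the model to use for generation.
    :param prompt: The input text the model should complete.
    :param max_tokens: Upper bound on the number of tokens to generate.
    """

    model: str
    prompt: str
    max_tokens: int = 128
```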
# What does this PR do?
1. Creates a new `VectorStoreNotFoundError` class
2. Implements the new class where appropriate
Relates to #2379
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
1. Adds a broad schema for custom exception classes in the Llama Stack
project
2. Creates a new `DatasetNotFoundError` class
3. Implements the new class where appropriate
Relates to #2379
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
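A minimal sketch of what such a schema can look like; everything here beyond the `DatasetNotFoundError` name is an illustrative assumption:
```python
class ResourceNotFoundError(ValueError):
    """Base class for errors raised when a named resource is not found."""

    def __init__(self, resource_name: str, resource_type: str, list_hint: str) -> None:
        super().__init__(
            f"{resource_type} '{resource_name}' not found. "
            f"Use '{list_hint}' to list available {resource_type.lower()}s."
        )


class DatasetNotFoundError(ResourceNotFoundError):
    def __init__(self, dataset_name: str) -> None:
        super().__init__(dataset_name, "Dataset", "client.datasets.list()")
```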
# What does this PR do?
1. Creates a new `ModelNotFoundError` class
2. Implements the new class where appropriate
Relates to #2379
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
We tried to always keep Ollama enabled. However doing so makes the
provider implementation half-assed -- should it error when it cannot
connect to Ollama or not? What happens during periodic model refresh?
Etc. Instead do the same thing we do for vLLM -- use the `OLLAMA_URL` to
conditionally enable the provider.
## Test Plan
Run `uv run llama stack build --template starter --image-type venv
--run` with and without `OLLAMA_URL` set. Verify using
`llama-stack-client provider list` that ollama is correctly enabled.
# What does this PR do?
- Initialize route_impls to None in constructor to prevent
AttributeError
- Consolidate initialization checks to single point in request() method
- Improve error message to be more helpful ("Please call initialize()
first")
- Add comprehensive test suite to prevent regressions
The library client now has better error handling when users forget to
call initialize(), showing a clear ValueError instead of confusing
AttributeError. All initialization validation is now centralized in the
request() method, with internal methods (_call_non_streaming,
_call_streaming, _convert_body) relying on this single check for
cleaner, more maintainable code.
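A condensed sketch of the centralized check described above; the attribute and method names mirror the description but are paraphrased, not the exact client code:
```python
class LlamaStackAsLibraryClient:
    def __init__(self):
        # Initialized to None so use-before-initialize fails loudly,
        # with a ValueError rather than an AttributeError.
        self.route_impls = None

    def initialize(self):
        self.route_impls = {"/v1/models": lambda: ["..."]}  # built from config

    def request(self, path: str):
        # Single point of initialization validation; _call_non_streaming,
        # _call_streaming, and _convert_body all rely on this check.
        if self.route_impls is None:
            raise ValueError("Client not initialized. Please call initialize() first.")
        return self.route_impls[path]()
```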
Closes #2943
## Test Plan
`./scripts/unit-tests.sh`
# What does this PR do?
When `--image-name` is not provided, the build script defaults to the
`image_name` in the config; this makes sure the same is done for the run
script.
## Test Plan
llama stack build w/o --image-name
Implements a comprehensive recording and replay system for inference API
calls that eliminates dependency on online inference providers during
testing. The system treats inference as deterministic by recording real
API responses and replaying them in subsequent test runs. Applies to
OpenAI clients (which should cover many inference requests) as well as
Ollama AsyncClient.
For storage, we use a hybrid system: SQLite for fast lookups and JSON
files for easy greppability / debuggability.
As expected, tests become much, much faster (more than 3x in inference
testing alone).
```bash
LLAMA_STACK_TEST_INFERENCE_MODE=record LLAMA_STACK_TEST_RECORDING_DIR=<...> \
uv run pytest -s -v tests/integration/inference \
--stack-config=starter \
-k "not( builtin_tool or safety_with_image or code_interpreter or test_rag )" \
--text-model="ollama/llama3.2:3b-instruct-fp16" \
--embedding-model=sentence-transformers/all-MiniLM-L6-v2
```
```bash
LLAMA_STACK_TEST_INFERENCE_MODE=replay LLAMA_STACK_TEST_RECORDING_DIR=<...> \
uv run pytest -s -v tests/integration/inference \
--stack-config=starter \
-k "not( builtin_tool or safety_with_image or code_interpreter or test_rag )" \
--text-model="ollama/llama3.2:3b-instruct-fp16" \
--embedding-model=sentence-transformers/all-MiniLM-L6-v2
```
- `LLAMA_STACK_TEST_INFERENCE_MODE`: `live` (default), `record`, or
`replay`
- `LLAMA_STACK_TEST_RECORDING_DIR`: Storage location (must be specified
for record or replay modes)
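A simplified sketch of the hybrid storage idea: a SQLite index keyed by a hash of the request for fast lookups, plus one JSON file per recording for greppability. The schema and file layout here are illustrative assumptions:
```python
import hashlib
import json
import sqlite3
from pathlib import Path


def open_index(recording_dir: Path) -> sqlite3.Connection:
    db = sqlite3.connect(recording_dir / "index.sqlite")
    db.execute("CREATE TABLE IF NOT EXISTS recordings (key TEXT PRIMARY KEY, path TEXT)")
    return db


def request_key(request: dict) -> str:
    # Inference is treated as deterministic: identical requests map to
    # the same key and therefore the same recorded response.
    return hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()


def record(db: sqlite3.Connection, recording_dir: Path, request: dict, response: dict) -> None:
    key = request_key(request)
    path = recording_dir / f"{key}.json"
    path.write_text(json.dumps({"request": request, "response": response}, indent=2))
    db.execute("INSERT OR REPLACE INTO recordings VALUES (?, ?)", (key, str(path)))
    db.commit()


def replay(db: sqlite3.Connection, request: dict) -> dict:
    row = db.execute(
        "SELECT path FROM recordings WHERE key = ?", (request_key(request),)
    ).fetchone()
    if row is None:
        raise RuntimeError("No recording found for this request; re-run in record mode")
    return json.loads(Path(row[0]).read_text())["response"]
```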
# What does this PR do?
- Change max_seq_length to max_length in SFTConfig constructor
- TRL deprecated max_seq_length in Feb 2024 and removed it in v0.20.0
- Reference: https://github.com/huggingface/trl/pull/2895
This resolves the SFT training failure in CI tests
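The change in a nutshell (output directory and length are placeholders):
```python
from trl import SFTConfig

# Before (removed in TRL v0.20.0):
# config = SFTConfig(output_dir="sft-out", max_seq_length=2048)

# After:
config = SFTConfig(output_dir="sft-out", max_length=2048)
```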
# What does this PR do?
OpenAI Chat Completions supports passing a base64 encoded PDF file to a
model, but Llama Stack currently does not allow for this behavior. This
PR extends our implementation of the OpenAI API spec to change that.
Closes #2129
## Test Plan
A new functional test has been added to test the validity of such a
request
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
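A minimal sketch of the kind of request this enables, using the OpenAI-style `file` content part; the base URL and model name are placeholders:
```python
import base64

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")
pdf_b64 = base64.b64encode(open("report.pdf", "rb").read()).decode()

response = client.chat.completions.create(
    model="ollama/llama3.2:3b-instruct-fp16",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Summarize this document."},
            {
                "type": "file",
                "file": {
                    "filename": "report.pdf",
                    "file_data": f"data:application/pdf;base64,{pdf_b64}",
                },
            },
        ],
    }],
)
print(response.choices[0].message.content)
```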
**What:**
- Added OpenAIChatCompletionTextOnlyMessageContent type for text-only
content validation
- Modified OpenAISystemMessageParam, OpenAIAssistantMessageParam,
OpenAIDeveloperMessageParam, and OpenAIToolMessageParam to use text-only
content type instead of mixed content
- OpenAIUserMessageParam unchanged - still accepts both text and images
- Updated OpenAPI spec files to reflect text-only content restrictions
in schemas
Closes #2894
**Why:**
- Enforces OpenAI API compatibility by restricting image content to user
messages only
- Prevents API misuse where images might be sent in message types that
don't support them
- Aligns with OpenAI's actual API behavior where only user messages can
contain multimodal content
- Improves type safety and validation at the API boundary
**Test plan:**
- Added comprehensive parametrized tests covering all 5 OpenAI message
types
- Tests verify text string acceptance for all message types
- Tests verify text list acceptance for all message types
- Tests verify image rejection for system/assistant/developer/tool
messages (ValidationError expected)
- Tests verify user messages still accept images (backward compatibility
maintained)
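A condensed sketch of the typing change; the names follow the description above, but the actual definitions live in the API models:
```python
from typing import Literal, Union

from pydantic import BaseModel


class TextContentPart(BaseModel):
    type: Literal["text"] = "text"
    text: str


# Text-only: a plain string or a list of text parts -- no image parts.
OpenAIChatCompletionTextOnlyMessageContent = Union[str, list[TextContentPart]]


class OpenAISystemMessageParam(BaseModel):
    role: Literal["system"] = "system"
    # An image content part here now fails with a ValidationError.
    content: OpenAIChatCompletionTextOnlyMessageContent
```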
# What does this PR do?
- Add base_url field to OpenAIConfig with default
"https://api.openai.com/v1"
- Update sample_run_config to support OPENAI_BASE_URL environment
variable
- Modify get_base_url() to return configured base_url instead of
hardcoded value
- Add comprehensive test suite covering:
- Default base URL behavior
- Custom base URL from config
- Environment variable override
- Config precedence over environment variables
- Client initialization with configured URL
- Model availability checks using configured URL
This enables users to configure custom OpenAI-compatible API endpoints
via environment variables or configuration files.
Closes #2910
## Test Plan
run unit tests
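A rough sketch of the precedence behavior the tests cover, assuming a Pydantic config model; field and class names mirror the description, not the actual provider config:
```python
import os

from pydantic import BaseModel


class OpenAIConfig(BaseModel):
    base_url: str | None = None  # optional explicit override


def get_base_url(config: OpenAIConfig) -> str:
    # Explicit config wins, then OPENAI_BASE_URL, then the default.
    if config.base_url is not None:
        return config.base_url
    return os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1")


print(get_base_url(OpenAIConfig()))  # default, or env var override
print(get_base_url(OpenAIConfig(base_url="https://my-proxy/v1")))  # config wins
```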
# What does this PR do?
This enhancement allows inference providers using LiteLLMOpenAIMixin to
validate model availability against LiteLLM's official provider model
listings, improving reliability and user experience when working with
different AI service providers.
- Add litellm_provider_name parameter to LiteLLMOpenAIMixin constructor
- Add check_model_availability method to LiteLLMOpenAIMixin using
litellm.models_by_provider
- Update Gemini, Groq, and SambaNova inference adapters to pass
litellm_provider_name
## Test Plan
standard CI.
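A small sketch of the availability check built on `litellm.models_by_provider`, LiteLLM's mapping of provider name to known model ids; the surrounding method shape is paraphrased from the description:
```python
import litellm


def check_model_availability(litellm_provider_name: str, model: str) -> bool:
    """Return True if LiteLLM lists the model for this provider."""
    known = litellm.models_by_provider.get(litellm_provider_name, [])
    return model in known


# Usage: each adapter passes its own provider name, e.g. "gemini",
# "groq", or "sambanova".
print(check_model_availability("groq", "groq/llama3-8b-8192"))
```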
# What does this PR do?
The server logs have a persistent `core: refreshing registry` line that
clogs up the output. Switch it to debug level.
This is what it looked like:
<img width="1126" height="1028" alt="Screenshot 2025-07-28 at 9 56
44 AM"
src="https://github.com/user-attachments/assets/a1880fd3-7fc7-4a97-bfb8-89a62e4c5c19"
/>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
In #2637, I combined the run and build config provider types to both use
`Provider`.
Since this includes a `provider_id`, a user must now specify one when
writing a build yaml. This is not very clear, because at build time all a
user should care about is the code to be installed (the module and the
`provider_type`).
Introduce `BuildProvider` and fix up the parts of the code impacted by
this.
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Add support for deleting individual chunks from vector stores
- Add abstract remove_chunk() method to EmbeddingIndex base class
- Implement chunk deletion for Faiss provider, SQLite Vec, Milvus,
PGVector
- Placeholder implementations with NotImplementedError for
Chroma/Qdrant/Weaviate
- Integrate chunk deletion into OpenAI vector store file deletion flow
- Removed xfail from
test_openai_vector_store_delete_file_removes_from_vector_store
Closes: #2477
---------
Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
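A sketch of the interface shape described above; signatures are paraphrased from the PR description, not copied from the code:
```python
from abc import ABC, abstractmethod


class EmbeddingIndex(ABC):
    @abstractmethod
    async def remove_chunk(self, chunk_id: str) -> None:
        """Delete a single chunk from the underlying index."""


class ChromaIndex(EmbeddingIndex):
    async def remove_chunk(self, chunk_id: str) -> None:
        # Placeholder until Chroma support lands.
        raise NotImplementedError("Chunk deletion is not yet supported for Chroma")
```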
# What does this PR do?
Enable Chroma inline unit tests and fix integration tests.
## Test Plan
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Avoid the error message:
```
INFO 2025-07-24 21:51:54,530 __main__:598 server: Received interrupt signal, shutting down gracefully...
ERROR 2025-07-24 21:51:54,692 asyncio:1826 uncategorized: Task was destroyed but it is pending!
task: <Task pending name='Task-15' coro=<refresh_registry() running at
/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/stack.py:356> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=>
```
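The usual fix for "Task was destroyed but it is pending!" is to cancel the background task and await it during shutdown; a generic sketch (the real `refresh_registry` lives in `stack.py`):
```python
import asyncio


async def refresh_registry() -> None:
    while True:
        await asyncio.sleep(60)  # periodic refresh placeholder


async def main() -> None:
    task = asyncio.create_task(refresh_registry())
    try:
        await asyncio.sleep(0.1)  # stand-in for serving requests
    finally:
        task.cancel()
        try:
            await task  # let the task unwind before the loop closes
        except asyncio.CancelledError:
            pass


asyncio.run(main())
```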
# What does this PR do?
Today, external providers are installed via the `external_providers_dir`
in the config. This requires users to understand the `ProviderSpec` and
set up their directories accordingly. This process splits the config for
the stack across multiple files, directories, and formats.
Most (if not all) external providers today have a
[get_provider_spec](559cb18fbb/src/ramalama_stack/provider.py (L9))
method that sits unused. Utilizing this method rather than the
providers.d route allows for a much easier installation process for
external providers and limits the amount of extra configuration a
regular user has to do to get their stack off the ground.
To accomplish this and wire it throughout the build process, introduce
the concept of a `module` that users specify for an external provider at
build time. To facilitate this, align the build and run specs to use the
`Provider` class rather than the stringified provider_type that build
currently uses.
For example, say this is in your build config:
```
- provider_id: ramalama
provider_type: remote::ramalama
module: ramalama_stack
```
During build (in the various `build_...` scripts), in addition to
installing any pip dependencies we will also install this module and use
its `get_provider_spec` method to retrieve the ProviderSpec that is
currently specified using `providers.d`. A sketch of that hook follows.
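A hypothetical sketch of the hook an external provider module exposes; the exact `ProviderSpec` fields vary by provider type, so treat the values below as illustrative, not ramalama_stack's actual code:
```python
from llama_stack.providers.datatypes import Api, ProviderSpec


def get_provider_spec() -> ProviderSpec:
    # Returned spec replaces what would otherwise live in providers.d.
    return ProviderSpec(
        api=Api.inference,
        provider_type="remote::ramalama",
        config_class="ramalama_stack.config.RamalamaImplConfig",
        module="ramalama_stack",
    )
```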
In production so far, providing instructions for installing external
providers for users has been difficult: they need to install the module
as a pre-req, create the providers.d directory, copy in the provider
spec, and also copy in the necessary build/run yaml files. Accessing an
external provider should be as easy as possible, and pointing to its
installable module aligns more with the rest of our build and dependency
management process.
For now, `external_providers_dir` still exists as an alternate more
declarative method of using external providers.
## Test Plan
Added an integration test installing an external provider from a module,
and more unit test coverage for `get_provider_registry`.
(The warning in yellow is expected: the module is installed inside of the
build env, not where we are running the command.)
<img width="1119" height="400" alt="Screenshot 2025-07-24 at 11 30
48 AM"
src="https://github.com/user-attachments/assets/1efbaf45-b9e8-451a-bd63-264ed664706d"
/>
<img width="1154" height="618" alt="Screenshot 2025-07-24 at 11 31
14 AM"
src="https://github.com/user-attachments/assets/feb2b3ea-c5dd-418e-9662-9a3bd5dd6bdc"
/>
---------
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Bumps [form-data](https://github.com/form-data/form-data) from 4.0.2 to
4.0.4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/form-data/form-data/releases">form-data's
releases</a>.</em></p>
<blockquote>
<h2>v4.0.4</h2>
<h2><a
href="https://github.com/form-data/form-data/compare/v4.0.3...v4.0.4">v4.0.4</a>
- 2025-07-16</h2>
<h3>Commits</h3>
<ul>
<li>[meta] add <code>auto-changelog</code> <a
href="811f68282f"><code>811f682</code></a></li>
<li>[Tests] handle predict-v8-randomness failures in node < 17 and
node > 23 <a
href="1d11a76434"><code>1d11a76</code></a></li>
<li>[Fix] Switch to using <code>crypto</code> random for boundary values
<a
href="3d1723080e"><code>3d17230</code></a></li>
<li>[Tests] fix linting errors <a
href="5e340800b5"><code>5e34080</code></a></li>
<li>[meta] actually ensure the readme backup isn’t published <a
href="316c82ba93"><code>316c82b</code></a></li>
<li>[Dev Deps] update <code>@ljharb/eslint-config</code> <a
href="58c25d7640"><code>58c25d7</code></a></li>
<li>[meta] fix readme capitalization <a
href="2300ca1959"><code>2300ca1</code></a></li>
</ul>
<h2>v4.0.3</h2>
<h2><a
href="https://github.com/form-data/form-data/compare/v4.0.2...v4.0.3">v4.0.3</a>
- 2025-06-05</h2>
<h3>Fixed</h3>
<ul>
<li>[Fix] <code>append</code>: avoid a crash on nullish values <a
href="https://redirect.github.com/form-data/form-data/issues/577">#577</a></li>
</ul>
<h3>Commits</h3>
<ul>
<li>[eslint] use a shared config <a
href="426ba9ac44"><code>426ba9a</code></a></li>
<li>[eslint] fix some spacing issues <a
href="20941917f0"><code>2094191</code></a></li>
<li>[Refactor] use <code>hasown</code> <a
href="81ab41b46f"><code>81ab41b</code></a></li>
<li>[Fix] validate boundary type in <code>setBoundary()</code> method <a
href="8d8e469309"><code>8d8e469</code></a></li>
<li>[Tests] add tests to check the behavior of <code>getBoundary</code>
with non-strings <a
href="837b8a1f75"><code>837b8a1</code></a></li>
<li>[Dev Deps] remove unused deps <a
href="870e4e6659"><code>870e4e6</code></a></li>
<li>[meta] remove local commit hooks <a
href="e6e83ccb54"><code>e6e83cc</code></a></li>
<li>[Dev Deps] update <code>eslint</code> <a
href="4066fd6f65"><code>4066fd6</code></a></li>
<li>[meta] fix scripts to use prepublishOnly <a
href="c4bbb13c0e"><code>c4bbb13</code></a></li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="41996f5ac7"><code>41996f5</code></a>
v4.0.4</li>
<li><a
href="316c82ba93"><code>316c82b</code></a>
[meta] actually ensure the readme backup isn’t published</li>
<li><a
href="2300ca1959"><code>2300ca1</code></a>
[meta] fix readme capitalization</li>
<li><a
href="811f68282f"><code>811f682</code></a>
[meta] add <code>auto-changelog</code></li>
<li><a
href="5e340800b5"><code>5e34080</code></a>
[Tests] fix linting errors</li>
<li><a
href="1d11a76434"><code>1d11a76</code></a>
[Tests] handle predict-v8-randomness failures in node < 17 and node
> 23</li>
<li><a
href="58c25d7640"><code>58c25d7</code></a>
[Dev Deps] update <code>@ljharb/eslint-config</code></li>
<li><a
href="3d1723080e"><code>3d17230</code></a>
[Fix] Switch to using <code>crypto</code> random for boundary
values</li>
<li><a
href="d8d67dc8ac"><code>d8d67dc</code></a>
v4.0.3</li>
<li><a
href="e6e83ccb54"><code>e6e83cc</code></a>
[meta] remove local commit hooks</li>
<li>Additional commits viewable in <a
href="https://github.com/form-data/form-data/compare/v4.0.2...v4.0.4">compare
view</a></li>
</ul>
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
# What does this PR do?
- Added ability to specify `required_scope` when declaring an API. This
is part of the `@webmethod` decorator.
- If auth is enabled, a user can access an API only if
`user.attributes['scope']` includes the `required_scope`
- We add `required_scope='telemetry.read'` to the telemetry read APIs.
## Test Plan
CI with added tests
1. Enable server.auth with github token
2. Observe `client.telemetry.query_traces()` returns 403
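A hypothetical sketch of the decorator usage; only `required_scope` itself is confirmed by this PR, while the import path and route are assumptions about where `@webmethod` lives:
```python
from llama_stack.schema_utils import webmethod


class Telemetry:
    @webmethod(route="/telemetry/traces", method="POST", required_scope="telemetry.read")
    async def query_traces(self) -> list:
        # Only reachable when auth is enabled and the user's
        # attributes["scope"] includes "telemetry.read"; otherwise 403.
        ...
```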
# What does this PR do?
This PR adds support for the new Streamable HTTP transport for MCP, as
well as falling back to the SSE protocol if the Streamable HTTP
connection fails.
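A minimal sketch of the fallback behavior using the MCP Python SDK; the client entry points are assumed from the SDK and error handling is simplified:
```python
from mcp import ClientSession
from mcp.client.sse import sse_client
from mcp.client.streamable_http import streamablehttp_client


async def list_tools(url: str):
    try:
        # Preferred: the newer Streamable HTTP transport.
        async with streamablehttp_client(url) as (read, write, _):
            async with ClientSession(read, write) as session:
                await session.initialize()
                return await session.list_tools()
    except Exception:
        # The server may only speak the older SSE transport.
        async with sse_client(url) as (read, write):
            async with ClientSession(read, write) as session:
                await session.initialize()
                return await session.list_tools()
```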
Closes #2542
## Test Plan
---------
Signed-off-by: Calum Murray <cmurray@redhat.com>
# What does this PR do?
Prototype of a new feature to allow new APIs to be plugged into Llama
Stack. Opened for early feedback on the approach and to test appetite for
the functionality.
@ashwinb @raghotham open for early feedback, thanks!
---------
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
# What does this PR do?
Currently, `print` is used with custom formatting to produce telemetry
output in the console_span_processor.
This causes telemetry not to show up in log files when using
`LLAMA_STACK_LOG_FILE`; during testing it makes it look like telemetry is
not being captured when it actually is.
Switch to using Rich formatting with the logger, and strip the formatting
off when a log file is in use so the output looks normal. A sketch of the
stripping idea is below.
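A small sketch of the strip-when-file idea: keep Rich markup for the console, render it to plain text for the file handler. The handler wiring is illustrative, not the actual logger setup:
```python
import logging
import os

from rich.text import Text


class PlainFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        # Drop Rich markup like "[bold green]...[/]" before writing to file.
        record.msg = Text.from_markup(str(record.msg)).plain
        return super().format(record)


log_file = os.environ.get("LLAMA_STACK_LOG_FILE")
if log_file:
    handler = logging.FileHandler(log_file)
    handler.setFormatter(PlainFormatter("%(asctime)s %(name)s: %(message)s"))
    logging.getLogger().addHandler(handler)
```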
## Test Plan
before:
console:
<img width="967" height="127" alt="Screenshot 2025-07-21 at 4 02 15 PM"
src="https://github.com/user-attachments/assets/b09518cc-9d38-4970-9877-70e2c41fcbb5"
/>
log file (no telemetry):
```
2025-07-21 16:01:32,481 llama_stack.providers.remote.inference.ollama.ollama:117 inference: checking connectivity to Ollama at `http://localhost:11434`...
2025-07-21 16:01:34,779 opentelemetry.trace:537 uncategorized: Overriding of current TracerProvider is not allowed
2025-07-21 16:01:35,083 __main__:587 server: Listening on ['::', '0.0.0.0']:8321
2025-07-21 16:01:35,091 uvicorn.error:84 uncategorized: Started server process [68679]
2025-07-21 16:01:35,091 uvicorn.error:48 uncategorized: Waiting for application startup.
2025-07-21 16:01:35,092 __main__:163 server: Starting up
2025-07-21 16:01:35,092 uvicorn.error:62 uncategorized: Application startup complete.
2025-07-21 16:01:35,092 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
2025-07-21 16:01:37,167 uvicorn.access:473 uncategorized: 127.0.0.1:53145 - "POST /v1/openai/v1/chat/completions HTTP/1.1" 200
```
after:
console:
<img width="797" height="165" alt="Screenshot 2025-07-22 at 3 28 44 PM"
src="https://github.com/user-attachments/assets/44d40e3b-6502-439d-9ea5-38058b289962"
/>
log file:
```
2025-07-21 15:59:51,481 llama_stack.providers.remote.inference.ollama.ollama:117 inference: checking connectivity to Ollama at `http://localhost:11434`...
2025-07-21 15:59:53,801 opentelemetry.trace:537 uncategorized: Overriding of current TracerProvider is not allowed
2025-07-21 15:59:54,059 __main__:587 server: Listening on ['::', '0.0.0.0']:8321
2025-07-21 15:59:54,066 uvicorn.error:84 uncategorized: Started server process [68578]
2025-07-21 15:59:54,067 uvicorn.error:48 uncategorized: Waiting for application startup.
2025-07-21 15:59:54,067 __main__:163 server: Starting up
2025-07-21 15:59:54,067 uvicorn.error:62 uncategorized: Application startup complete.
2025-07-21 15:59:54,068 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
2025-07-21 15:59:55,381 [TELEMETRY] 19:59:55.381 /v1/openai/v1/chat/completions
2025-07-21 15:59:55,619 uvicorn.access:473 uncategorized: 127.0.0.1:53102 - "POST /v1/openai/v1/chat/completions HTTP/1.1" 200
2025-07-21 15:59:55,621 [TELEMETRY] 19:59:55.621 /v1/openai/v1/chat/completions [StatusCode.OK] (240.07ms)
2025-07-21 15:59:55,622 [TELEMETRY] 19:59:55.620 127.0.0.1:53102 - "POST /v1/openai/v1/chat/completions HTTP/1.1" 200
```
Signed-off-by: Charlie Doern <cdoern@redhat.com>