# What does this PR do?
Enable ruff for scripts.
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
[//]: # (## Documentation)
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
support nvidia hosted 3.2 11b/90b vision models. they are not hosted on
the common https://integrate.api.nvidia.com/v1. they are hosted on their
own individual urls.
## Test Plan
`LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -s -v
tests/client-sdk/inference/test_vision_inference.py
--inference-model=meta/llama-3.2-11b-vision-instruct -k image`
# What does this PR do?
Adds a container file that can be used to build the playground UI.
This file will be built by this PR in the stack-ops repo:
https://github.com/meta-llama/llama-stack-ops/pull/9
Docker command in the docs will need to change once I know the address
of the official repository.
## Test Plan
Tested image on my local Openshift Instance using this helm chart:
https://github.com/Jaland/llama-stack-helm/tree/main/llama-stack
[//]: # (## Documentation)
---------
Co-authored-by: Jamie Land <hokie10@gmail.com>
# What does this PR do?
Run additional tests in a matrix to accelerate the process and clearly
identify failing providers.
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Add the option to not verify SSL certificates for the remote-vllm
provider. This allows llama stack server to talk to remote LLMs which
have self-signed certificates
Partially addresses #1545
# Summary:
Includes fixes to get test_agents working with openAI model, e.g. tool
parsing and message conversion
# Test Plan:
```
LLAMA_STACK_CONFIG=dev pytest -s -v tests/integration/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B --text-model openai/gpt-4o-mini
```
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/1550).
* #1556
* __->__ #1550
# What does this PR do?
This avoids flaky timeout issue observed in CI builds, e.g.
3891286596
## Test Plan
Ran multiple times and pass consistently.
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
This PR adds dependabot updates for Python dependencies. In addition:
* Consistent weekly schedule on a specific day
* Specific commit messages
* `open-pull-requests-limit` is intentional to avoid upgrading
dependencies that will likely cause regressions. We want to keep the
focus here on security updates only
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
I think this was included accidentally via
https://github.com/meta-llama/llama-stack/pull/1475.
@raghotham @ashwinb let me know if it's intentional to include this.
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
rather than have unit and functional tests run on all PRs, we should
only have them run on PRs changing relevant files
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
A PTY is unnecessary for interactive mode since `subprocess.run()`
already inherits the calling terminal’s stdin, stdout, and stderr,
allowing natural interaction. Using a PTY can introduce unwanted side
effects like buffering issues and inconsistent signal handling. Standard
input/output is sufficient for most interactive programs.
This commit simplifies the command execution by:
1. Removing PTY-based execution in favor of direct subprocess handling
2. Consolidating command execution into a single run_command function
3. Improving error handling with specific subprocess error types
4. Adding proper type hints and documentation
5. Maintaining Ctrl+C handling for graceful interruption
## Test Plan
```
llama stack run
```
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Currently there is no shutdown method implemented for the `ProviderImpl`
class
This leads to the following warning
```shell
INFO: Waiting for application shutdown.
INFO 2025-03-17 17:25:13,280 __main__:145 server: Shutting down
INFO 2025-03-17 17:25:13,282 __main__:129 server: Shutting down ModelsRoutingTable
INFO 2025-03-17 17:25:13,284 __main__:129 server: Shutting down DatasetsRoutingTable
INFO 2025-03-17 17:25:13,286 __main__:129 server: Shutting down DatasetIORouter
INFO 2025-03-17 17:25:13,287 __main__:129 server: Shutting down TelemetryAdapter
INFO 2025-03-17 17:25:13,288 __main__:129 server: Shutting down InferenceRouter
INFO 2025-03-17 17:25:13,290 __main__:129 server: Shutting down ShieldsRoutingTable
INFO 2025-03-17 17:25:13,291 __main__:129 server: Shutting down SafetyRouter
INFO 2025-03-17 17:25:13,292 __main__:129 server: Shutting down VectorDBsRoutingTable
INFO 2025-03-17 17:25:13,293 __main__:129 server: Shutting down VectorIORouter
INFO 2025-03-17 17:25:13,294 __main__:129 server: Shutting down ToolGroupsRoutingTable
INFO 2025-03-17 17:25:13,295 __main__:129 server: Shutting down ToolRuntimeRouter
INFO 2025-03-17 17:25:13,296 __main__:129 server: Shutting down MetaReferenceAgentsImpl
INFO 2025-03-17 17:25:13,297 __main__:129 server: Shutting down ScoringFunctionsRoutingTable
INFO 2025-03-17 17:25:13,298 __main__:129 server: Shutting down ScoringRouter
INFO 2025-03-17 17:25:13,299 __main__:129 server: Shutting down BenchmarksRoutingTable
INFO 2025-03-17 17:25:13,300 __main__:129 server: Shutting down EvalRouter
INFO 2025-03-17 17:25:13,301 __main__:129 server: Shutting down DistributionInspectImpl
INFO 2025-03-17 17:25:13,303 __main__:129 server: Shutting down ProviderImpl
WARNING 2025-03-17 17:25:13,304 __main__:134 server: No shutdown method for ProviderImpl
INFO: Application shutdown complete.
INFO: Finished server process [1]
```
## Test Plan
Start a server and shut it down
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
**Description:** Updates the client example output as well as add a
suggested formatting for some of the required and optional cli flags.
If the re-formatting is unnecessary, I can remove it from this PR and
just have this fix the example output
# What does this PR do?
Update the container build script so that it is compatible with podman.
The --progress=plain is now the default option and can be overriden.
## Test Plan
N/A
[//]: # (## Documentation)
Signed-off-by: Jeff MAURY <jmaury@redhat.com>
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]
The generate_response_prompt had an import error, fixed that error.
Co-authored-by: sarthakdeshpande <sarthak.deshpande@engati.com>
# What does this PR do?
current passthrough impl returns chatcompletion_message.content as a
TextItem() , not a straight string. so it's not compatible with other
providers, and causes parsing error downstream.
change away from the generic pydantic conversion, and explicitly parse
out content.text
## Test Plan
setup llama server with passthrough
```
llama-stack-client eval run-benchmark "MMMU_Pro_standard" --model-id meta-llama/Llama-3-8B --output-dir /tmp/ --num-examples 20
```
works without parsing error
# What does this PR do?
Web updates to point to latest releases for Mobile SDK
- point to `latest-release` branch for mobile sdk repos to minimize the
number of change points on the site.
- updates to some instructions
# What does this PR do?
current docs are very tailored to `conda`
also adds guidance around running code examples within virtual
environment for both `conda` and `virtualenv`
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
create a new dataset BFCL_v3 from
https://gorilla.cs.berkeley.edu/blogs/13_bfcl_v3_multi_turn.html
overall each question asks the model to perform a task described in
natural language, and additionally a set of available functions and
their schema are given for the model to choose from. the model is
required to write the function call form including function name and
parameters , to achieve the stated purpose. the results are validated
against provided ground truth, to make sure that the generated function
call and the ground truth function call are syntactically and
semantically equivalent, by checking their AST .
## Test Plan
start server by
```
llama stack run ./llama_stack/templates/ollama/run.yaml
```
then send traffic
```
llama-stack-client eval run-benchmark "bfcl" --model-id meta-llama/Llama-3.2-3B-Instruct --output-dir /tmp/gpqa --num-examples 2
```
[//]: # (## Documentation)
# What does this PR do?
a user should be able to store a static logging configuration outside of
their environment. This would make sense to store in the run yaml given
that we store other things like server configuration in there.
The environment variable settings override the config settings if both
are available.
The format in the config looks like this:
```
logging_config:
category_levels:
VALID_CATEGORY: VALID_STRING_LOG_LEVEL
```
any specified category out of the following:
`core | server | router | inference | agents | safety | eval | tools |
client`
combined with any of the following log levels:
`debug | info | warning | error | critical`
can be placed in the category_levels list in order to achieve the
desired log level
## Test Plan
Test locally with a run config like the following:
```
version: '2'
image_name: ollama
logging_config:
category_levels:
server: debug
apis:
...
```
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
It will fail if the newly generated spec docs are different.
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
```
$ pre-commit run --all-files
check for merge conflicts................................................Passed
trim trailing whitespace.................................................Passed
check for added large files..............................................Passed
fix end of files.........................................................Passed
Insert license in comments...............................................Passed
ruff.....................................................................Passed
ruff-format..............................................................Passed
blacken-docs.............................................................Passed
uv-lock..................................................................Passed
uv-export................................................................Passed
mypy.....................................................................Passed
Distribution Template Codegen............................................Passed
API Spec Codegen.........................................................Passed
```
Now add a field to existing API. Repeat:
```
$ pre-commit run --all-files
check for merge conflicts................................................Passed
trim trailing whitespace.................................................Passed
check for added large files..............................................Passed
fix end of files.........................................................Passed
Insert license in comments...............................................Passed
ruff.....................................................................Passed
ruff-format..............................................................Passed
blacken-docs.............................................................Passed
uv-lock..................................................................Passed
uv-export................................................................Passed
mypy.....................................................................Passed
Distribution Template Codegen............................................Passed
API Spec Codegen.........................................................Failed
- hook id: openapi-codegen
- files were modified by this hook
```
[//]: # (## Documentation)
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
Useful for local development. Now you can just trigger the script and
not care about specific arguments to pass to run unit tests.
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
```
$ . ./venv/bin/activate
$ ./scripts/run_tests.sh
$ echo $?
0
```
[//]: # (## Documentation)
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Co-authored-by: Nathan Weinberg <31703736+nathan-weinberg@users.noreply.github.com>
# What does this PR do?
quick fix as the vision_inference test dog.jpg path has been changed.
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
[//]: # (## Documentation)
# What does this PR do?
Updates the help text for the `llama model prompt-format` command to
clarify that users should provide a specific model name (e.g.,
Llama3.1-8B, Llama3.2-11B-Vision), not a model family. Removes the
default value and field for `--model-name` to prevent users from
mistakenly thinking a model family name is acceptable. Adds guidance to
run `llama model list` to view valid model names.
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
Output of `llama model prompt-format -h` Before:
```
(venv) alina@fedora:~/dev/llama/llama-stack$ llama model prompt-format -h
usage: llama model prompt-format [-h] [-m MODEL_NAME]
Show llama model message formats
options:
-h, --help show this help message and exit
-m MODEL_NAME, --model-name MODEL_NAME
Model Family (llama3_1, llama3_X, etc.)
Example:
llama model prompt-format <options>
(venv) alina@fedora:~/dev/llama/llama-stack$ llama model prompt-format --model-name llama3_1
usage: llama model prompt-format [-h] [-m MODEL_NAME]
llama model prompt-format: error: llama3_1 is not a valid Model. Choose one from --
Llama3.1-8B
Llama3.1-70B
Llama3.1-405B
Llama3.1-8B-Instruct
Llama3.1-70B-Instruct
Llama3.1-405B-Instruct
Llama3.2-1B
Llama3.2-3B
Llama3.2-1B-Instruct
Llama3.2-3B-Instruct
Llama3.2-11B-Vision
Llama3.2-90B-Vision
Llama3.2-11B-Vision-Instruct
Llama3.2-90B-Vision-Instruct
```
Output of `llama model prompt-format -h` After:
```
(venv) alina@fedora:~/dev/llama/llama-stack$ llama model prompt-format -h
usage: llama model prompt-format [-h] [-m MODEL_NAME]
Show llama model message formats
options:
-h, --help show this help message and exit
-m MODEL_NAME, --model-name MODEL_NAME
Example: Llama3.1-8B or Llama3.2-11B-Vision, etc
(Run `llama model list` to see a list of valid model names)
Example:
llama model prompt-format <options>
```
Signed-off-by: Alina Ryan <aliryan@redhat.com>
# What does this PR do?
Updated all instances of datetime.now() to use timezone.utc for
consistency in handling time across different systems. This ensures that
timestamps are always in Coordinated Universal Time (UTC), avoiding
issues with time zone discrepancies and promoting uniformity in
time-related data.
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
currently the `inspect` API for providers is really a `list` API. Create
a new `providers` API which has a GET `providers/{provider_id}` inspect
API
which returns "user friendly" configuration to the end user. Also add a
GET `/providers` endpoint which returns the list of providers as
`inspect/providers` does today.
This API follows CRUD and is more intuitive/RESTful.
This work is part of the RFC at
https://github.com/meta-llama/llama-stack/pull/1359
sensitive fields are redacted using `redact_sensetive_fields` on the
server side before returning a response:
<img width="456" alt="Screenshot 2025-03-13 at 4 40 21 PM"
src="https://github.com/user-attachments/assets/9465c221-2a26-42f8-a08a-6ac4a9fecce8"
/>
## Test Plan
using https://github.com/meta-llama/llama-stack-client-python/pull/181 a
user is able to to run the following:
`llama stack build --template ollama --image-type venv`
`llama stack run --image-type venv
~/.llama/distributions/ollama/ollama-run.yaml`
`llama-stack-client providers inspect ollama`
<img width="378" alt="Screenshot 2025-03-13 at 4 39 35 PM"
src="https://github.com/user-attachments/assets/8273d05d-8bc3-44c6-9e4b-ef95e48d5466"
/>
also, was able to run the new test_list integration test locally with
ollama:
<img width="1509" alt="Screenshot 2025-03-13 at 11 03 40 AM"
src="https://github.com/user-attachments/assets/9b9db166-f02f-45b0-86a4-306d85149bc8"
/>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
Before the change, it was only doing it during the merge.
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
```
$ git checkout d263edbf90
$ pre-commit run --all-files
check for merge conflicts................................................Failed
- hook id: check-merge-conflict
- exit code: 1
docs/_static/llama-stack-spec.yaml:3179: Merge conflict string '<<<<<<<' found
docs/_static/llama-stack-spec.yaml:3185: Merge conflict string '=======' found
docs/_static/llama-stack-spec.yaml:3190: Merge conflict string '>>>>>>>' found
[...]
```
[//]: # (## Documentation)
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
- Issues/PRs inactive for 60 days are marked as stale
- Stale items are closed after 30 additional days of inactivity
- Adds appropriate warning and closing messages
- Sets daily schedule for stale checks
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Added a GitHub Action to run inference tests for the Ollama provider.
This ensures we have coverage for Ollama integration.
---------
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
# What does this PR do?
Introduced a new CI job that dynamically generates a build matrix based
on available templates from `llama_stack/templates/*/build.yaml`.
This allows automated testing for all templates without manual
intervention.
The CI currently builds for venv and containers.
Signed-off-by: Sébastien Han <seb@redhat.com>
~Will pass once https://github.com/meta-llama/llama-stack/pull/1228
merges.~
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
- Fix issue w/ passthrough provider
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
llama stack run
[//]: # (## Documentation)