# What does this PR do?
Moves code around. This makes the copies from llama-models _much_ easier
to maintain and ensures we don't entangle meta-reference-specific
tidbits into llama-models code even by accident.
Also kills the meta-reference-quantized-gpu distro and rolls the
quantization deps into meta-reference-gpu.
## Test Plan
```
LLAMA_MODELS_DEBUG=1 \
with-proxy llama stack run meta-reference-gpu \
--env INFERENCE_MODEL=meta-llama/Llama-4-Scout-17B-16E-Instruct \
--env INFERENCE_CHECKPOINT_DIR=<DIR> \
--env MODEL_PARALLEL_SIZE=4 \
--env QUANTIZATION_TYPE=fp8_mixed
```
Start a server with and without quantization. Point integration tests to
it using:
```
pytest -s -v tests/integration/inference/test_text_inference.py \
--stack-config http://localhost:8321 --text-model meta-llama/Llama-4-Scout-17B-16E-Instruct
```
Running full Tool Calling required some updates to work end-to-end (a
sketch of a multi-turn exchange follows the prompt below):
- Remove `python_start` and `python_end` tags
- Tool Call messages and Tool Response messages should end with
`<|eom|>`
- System prompt needed updates
```
You are a helpful assistant who can answer general questions or invoke tools when necessary.
In addition to tool calls, you should also augment your responses by using the tool outputs.
```
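For reference, a minimal multi-turn tool-call exchange against the server could look like the sketch below. This is an illustration only: the tool schema, parameter names, and exact message fields are assumptions based on the llama_stack_client API, not code from the new tests.
```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")
model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"

messages = [
    {"role": "system", "content": "You are a helpful assistant who can answer general questions or invoke tools when necessary."},
    {"role": "user", "content": "What is the weather in Seattle?"},
]
tools = [
    {
        "tool_name": "get_weather",  # hypothetical tool for illustration
        "description": "Get the current weather for a city",
        "parameters": {
            "city": {"param_type": "string", "description": "City name", "required": True},
        },
    }
]

# Turn 1: the model should respond with a tool call.
response = client.inference.chat_completion(model_id=model_id, messages=messages, tools=tools)
tool_call = response.completion_message.tool_calls[0]

# Turn 2: feed the tool response back; the model should use its output.
messages.append(response.completion_message)
messages.append(
    {"role": "tool", "call_id": tool_call.call_id, "tool_name": "get_weather", "content": '{"temperature": "57F"}'}
)
follow_up = client.inference.chat_completion(model_id=model_id, messages=messages, tools=tools)
print(follow_up.completion_message.content)
```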
### Test Plan
- Start server with meta-reference
```
LLAMA_STACK_DISABLE_VERSION_CHECK=1 LLAMA_MODELS_DEBUG=1 INFERENCE_MODEL=meta-llama/$MODEL llama stack run meta-reference-gpu
```
- Added **NEW** tests with 5 test cases for multi-turn tool calls
```
pytest -s -v --stack-config http://localhost:8321 tests/integration/inference/test_text_inference.py --text-model meta-llama/Llama-4-Scout-17B-16E-Instruct
```
- Also verified all vision and agent tests pass
# What does this PR do?
This PR modifies some of the docs to help them map to (1) the mental
model of software engineers building AI models, starting with RAG and
then moving to Agents, and (2) to align the navbar more closely with the
diagram on the home page.
## Test Plan
N/A Tested locally.
## Documentation
Take a look at the screenshots below for before and after.
## Before

## After

---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
- **chore: mypy for strong_typing**
- **chore: mypy for remote::vllm**
- **chore: mypy for remote::ollama**
- **chore: mypy for providers.datatype**
---------
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
Another for https://github.com/meta-llama/llama-stack/issues/1815
This links the `CONTRIBUTING.md` file directly so that we don't have to
maintain two different files.
Also I updated the title for RAG under Building AI Applications.
## Changes
Here is what the Contributing page looks like, as proof that it sources
directly from the markdown file.

---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
Don't return list for runtime tools. Instead return Response object for
pagination and consistency with other APIs.
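For illustration, the wrapper could look roughly like this (class and field names are assumptions, not the exact types from this change):
```python
from pydantic import BaseModel

class ToolDef(BaseModel):
    name: str
    description: str | None = None

class ListToolDefsResponse(BaseModel):
    # Wrapping the list in a response object leaves room to add
    # pagination fields later without breaking existing callers.
    data: list[ToolDef]
```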
---------
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Previously, the integration tests started the server, but never really
used it because `--stack-config=ollama` uses the ollama template and the
inline "llama stack as library" client, not the HTTP client.
This PR makes sure we test it both ways.
We also add agents tests to the mix.
## Test Plan
GitHub CI.
---------
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Move pagination logic from LocalFS and HuggingFace implementations into
a common helper function to ensure consistent pagination behavior across
providers. This reduces code duplication and centralizes pagination
logic in one place.
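A minimal sketch of what such a shared helper can look like (function and parameter names here are illustrative, not necessarily the ones used in the PR):
```python
def paginate_records(
    records: list[dict],
    start_index: int | None = None,
    limit: int | None = None,
) -> tuple[list[dict], int | None]:
    """Slice records and return the next start index, or None when exhausted."""
    start = start_index or 0
    end = len(records) if limit is None else min(start + limit, len(records))
    page = records[start:end]
    next_index = end if end < len(records) else None
    return page, next_index
```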
## Test Plan
Run this script:
```
from llama_stack_client import LlamaStackClient
# Initialize the client
client = LlamaStackClient(base_url="http://localhost:8321")
# Register a dataset
response = client.datasets.register(
    purpose="eval/messages-answer",  # or "eval/question-answer" or "post-training/messages"
    source={"type": "uri", "uri": "huggingface://datasets/llamastack/simpleqa?split=train"},
    dataset_id="my_dataset",  # optional, will be auto-generated if not provided
    metadata={"description": "My evaluation dataset"},  # optional
)
# Verify the dataset was registered by listing all datasets
datasets = client.datasets.list()
print(f"Registered datasets: {[d.identifier for d in datasets]}")
# You can then access the data using the datasetio API
# rows = client.datasets.iterrows(dataset_id="my_dataset", start_index=1, limit=2)
rows = client.datasets.iterrows(dataset_id="my_dataset")
print(f"Data: {rows.data}")
```
And play with `start_index` and `limit`.
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
The goal of this PR is to make the pages easier to navigate by surfacing
the child pages on the navbar, updating some of the copy, and moving
some of the files around.
Some changes:
1. Clarified titles
2. Restructured "Distributions" more formally into its own page, to be
consistent with Providers, and added some clarity to the child pages to
surface them and make them easier to navigate
3. Updated sphinx config to not collapse navigation by default
4. Updated copyright year to be calculated dynamically
5. Moved `docs/source/distributions/index.md` ->
`docs/source/distributions/starting_llama_stack_server.md`
Another for https://github.com/meta-llama/llama-stack/issues/1815
## Test Plan
Tested locally; the pages build (see the screenshots below).
## Documentation
### Before:

### After:

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
* Added `--text-model` in example command.
* Added a link to the integration tests instructions and a note on
specifying models.
This is to avoid confusion when all tests are skipped because no model
is provided.
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
This PR updates the sink name configuration for traces and metrics in
LlamaStack to align with the latest changes introduced in version 0.1.8.
Previously, when using the `otel` sink along with other sinks (like
`console` and `sqlite`), the system threw a **ValueError** with the
message:
```shell
Value error, 'otel' is not a valid TelemetrySink [type=value_error, input_value='console,otel,sqlite', input_type=str]
For further information visit https://errors.pydantic.dev/2.10/v/value_error
```
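Conceptually, the set of accepted sink values now looks something like the sketch below (consult the 0.1.8 source for the authoritative definition):
```python
from enum import Enum

class TelemetrySink(str, Enum):
    # The single "otel" sink was split into separate trace and metric sinks.
    OTEL_TRACE = "otel_trace"
    OTEL_METRIC = "otel_metric"
    SQLITE = "sqlite"
    CONSOLE = "console"
```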
## Test Plan
- **Test 1:**
Ran the LlamaStack server with a configuration containing
`console,otel,sqlite` as sinks.
- **Expected result:** No errors related to invalid sink names.
- **Result:** The system ran without throwing a `ValueError`.
- **Test 2:**
Verified that the `otel_trace` and `otel_metric` sinks now work in
combination with other sinks (`console`, `sqlite`).
- **Expected result:** Telemetry data is correctly sent to all specified
sinks without errors.
- **Result:** All telemetry data was successfully sent to the specified
sinks.
# What does this PR do?
https://github.com/meta-llama/llama-stack/pull/1828 removed the
`__root_span__` attribute, which is still needed.
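For context, the attribute marks the top-level span of a trace so that queries can filter for trace roots. A hypothetical sketch of how it is set (using the OpenTelemetry API; the exact call site in llama-stack may differ):
```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("request") as span:
    # Mark the top-level span so trace queries can find the root.
    span.set_attribute("__root_span__", "true")
```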
## Test Plan
Added a telemetry integration test:
`LLAMA_STACK_CONFIG=http://localhost:5001 pytest -s -v tests/integration/telemetry --safety-shield meta-llama/Llama-Guard-3-8B --text-model accounts/fireworks/models/llama-v3p3-70b-instruct`
# What does this PR do?
This PR converts blocking Milvus Client calls to non-blocking.
Another one for https://github.com/meta-llama/llama-stack/issues/1489
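The general pattern is to push each synchronous client call onto a worker thread so the event loop stays free; a minimal sketch, assuming a `query` call on pymilvus's `MilvusClient`:
```python
import asyncio

from pymilvus import MilvusClient

async def query_collection(client: MilvusClient, collection_name: str, expr: str) -> list[dict]:
    # MilvusClient.query is blocking; run it in a worker thread so the
    # event loop is not stalled while Milvus does its work.
    return await asyncio.to_thread(client.query, collection_name=collection_name, filter=expr)
```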
## Test Plan
I ran the integration tests from
https://github.com/meta-llama/llama-stack/pull/1467 with:
```
pytest -s -v tests/integration/vector_io/test_vector_io.py \
--stack-config inference=sentence-transformers,vector_io=inline::milvus \
--embedding-model all-miniLM-L6-V2 --env MILVUS_DB_PATH=/tmp/moo.db
INFO 2025-03-28 21:35:22,726 tests.integration.conftest:41 tests: Setting DISABLE_CODE_SANDBOX=1 for macOS
/Users/farceo/dev/llama-stack/.venv/lib/python3.10/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"
warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
=============================================================================================================================================================================================================================================================== test session starts ===============================================================================================================================================================================================================================================================
platform darwin -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /Users/farceo/dev/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.10.16', 'Platform': 'macOS-15.3.1-arm64-arm-64bit', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'cov': '6.0.0', 'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'nbval': '0.11.0'}}
rootdir: /Users/farceo/dev/llama-stack
configfile: pyproject.toml
plugins: cov-6.0.0, html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, nbval-0.11.0
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None
collected 7 items
tests/integration/vector_io/test_vector_io.py::test_vector_db_retrieve[emb=all-miniLM-L6-V2] PASSED
tests/integration/vector_io/test_vector_io.py::test_vector_db_register[emb=all-miniLM-L6-V2] PASSED
tests/integration/vector_io/test_vector_io.py::test_insert_chunks[emb=all-miniLM-L6-V2-test_case0] PASSED
tests/integration/vector_io/test_vector_io.py::test_insert_chunks[emb=all-miniLM-L6-V2-test_case1] PASSED
tests/integration/vector_io/test_vector_io.py::test_insert_chunks[emb=all-miniLM-L6-V2-test_case2] PASSED
tests/integration/vector_io/test_vector_io.py::test_insert_chunks[emb=all-miniLM-L6-V2-test_case3] PASSED
tests/integration/vector_io/test_vector_io.py::test_insert_chunks[emb=all-miniLM-L6-V2-test_case4] PASSED
========================================================================================================================================================================================================================================================= 7 passed, 2 warnings in 40.33s ==========================================================================================================================================================================================================================================================
```
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
- This is a flaky test that depends on model output.
## Test Plan
<img width="853" alt="image"
src="https://github.com/user-attachments/assets/e7607877-22a9-48e3-adac-e991d1070ec0"
/>
# What does this PR do?
Adds `chunk_size_in_tokens` to the playground `rag_tool` insert.
Closes #1825
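For reference, the underlying client call that the playground now exercises looks roughly like this (a sketch based on the llama_stack_client API; the document and chunk size values are illustrative):
```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")
client.tool_runtime.rag_tool.insert(
    documents=[
        {
            "document_id": "doc-1",
            "content": "https://example.com/guide.md",
            "mime_type": "text/plain",
            "metadata": {},
        }
    ],
    vector_db_id="my_documents",
    chunk_size_in_tokens=512,  # now configurable from the playground UI
)
```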
## Test Plan
Tested locally.
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
This is the second attempt to switch to system packages by default, now
with a hack to detect a conda environment, in which case the conda
image type is used.
Note: Conda will only be used when --image-name is unset *and*
CONDA_DEFAULT_ENV is set. This means that users without conda will
correctly fall back to using system packages when no --image-* arguments
are passed at all.
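A sketch of the resolution order described above (hypothetical names; not the actual implementation):
```python
import os

def resolve_image_type(image_type: str | None, image_name: str | None) -> str:
    if image_type is not None:
        # An explicit --image-type always wins.
        return image_type
    if image_name is None and os.environ.get("CONDA_DEFAULT_ENV"):
        # No --image-* arguments, but an active conda environment: use conda.
        return "conda"
    # Otherwise fall back to system/environment packages.
    return "system"
```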
## Test Plan
Uses virtualenv:
```
$ llama stack build --template ollama --image-type venv
$ llama stack run --image-type venv ~/.llama/distributions/ollama/ollama-run.yaml
[...]
Using virtual environment: /home/ec2-user/src/llama-stack/schedule/.local
[...]
```
Uses system packages (virtualenv already initialized):
```
$ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml
[...]
INFO 2025-03-27 20:46:22,882 llama_stack.cli.stack.run:142 server: No image type or image name provided. Assuming environment packages.
[...]
```
Attempt to run from environment packages without the necessary packages
installed:
```
$ python -m venv barebones
$ . ./barebones/bin/activate
$ pip install -e . # to install llama command
$ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml
[...]
ModuleNotFoundError: No module named 'fastapi'
```
^ failed as expected because the environment doesn't have the necessary
packages installed.
Now install some packages in the new environment:
```
$ pip install fastapi opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp aiosqlite ollama openai datasets faiss-cpu mcp autoevals
$ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml
[...]
Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
```
Now see if setting CONDA_DEFAULT_ENV will change what happens by
default:
```
$ export CONDA_DEFAULT_ENV=base
$ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml
[...]
Using conda environment: base
Conda environment base does not exist.
[...]
```
---------
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
- Hide the distro doc (Docker needs to be thoroughly tested).
## Test Plan
- docs
# What does this PR do?
No violators are currently in-tree. This is just hardening the API specs
for future consistency.
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
* Fix location of `run.yaml` relative to the cloned llama stack
repository
* Drop `-it` from `docker run` commands as it's not needed when running
services
## Test Plan
* Verified running the llama stack following the updated instructions
CC: @ashwinb
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
# What does this PR do?
Set a formatter for the log file handler that does not pollute log
messages with color tags.
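A minimal sketch of the idea (the handler setup is illustrative, not the exact llama-stack code): the console handler keeps its colorized formatter while the file handler gets a plain one, so ANSI color tags never end up in the log file.
```python
import logging

file_handler = logging.FileHandler("server.log")
# Plain formatter: no color tags, just timestamp, logger, level, message.
file_handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s:%(lineno)d %(levelname)s: %(message)s")
)
logging.getLogger().addHandler(file_handler)
```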
## Test Plan
Successfully tested with `LLAMA_STACK_LOG_FILE=server.log llama stack
run ...`
# What does this PR do?
- We no longer query the vector DB when uploading documents as attachments (see the sketch below).
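A hedged sketch of the flow this change affects (field names follow my understanding of the llama_stack_client API and should be treated as assumptions): documents passed as turn attachments are now only ingested for later retrieval instead of also triggering an immediate vector-DB query.
```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")
response = client.agents.turn.create(
    agent_id="<agent-id>",
    session_id="<session-id>",
    messages=[{"role": "user", "content": "Summarize the attached document."}],
    # Attachments are ingested for later retrieval; they no longer cause
    # an up-front vector DB query during upload.
    documents=[{"content": "https://example.com/report.txt", "mime_type": "text/plain"}],
    stream=True,
)
```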
## Test Plan
```
pytest --stack-config="http://localhost:8321" -v tests/integration/agents/test_agents.py --text-model meta-llama/Llama-3.3-70B-Instruct
```
```
pytest --stack-config=fireworks -v tests/integration/agents/test_agents.py --text-model meta-llama/Llama-3.3-70B-Instruct --record-responses
```
<img width="1160" alt="image"
src="https://github.com/user-attachments/assets/90700f79-c002-4474-bb41-7bc0a39dc91c"
/>
# What does this PR do?
Re-enable isort enforcement.
It was disabled in 1a73f8305b, probably by
mistake.
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
- The distribution folder references templates and has dead docker
compose scripts
## Test Plan
Fixes multiple issues:
1. `llama stack build` of dependencies was breaking with incompatible
numpy/pandas versions when importing datasets
Moved the notebook to start a local server instead of using the library
as a client. This way the setup is cleaner since it's all contained, and
by using `uv run --with` we can also test the server setup process in CI
and at release time.
2. The change in [1] surfaced some other issues:
- running `llama stack run` was defaulting to the conda env name
- provider data was not being managed properly
- some notebook cells (telemetry for evals) were not updated with the
latest changes
Fixed all the issues and updated the notebook.
### Test
1. Manually run it all in local env
2. `pytest -v -s --nbval-lax docs/getting_started.ipynb`