# What does this PR do?
- this is a flaky test dependent on model output
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
<img width="853" alt="image"
src="https://github.com/user-attachments/assets/e7607877-22a9-48e3-adac-e991d1070ec0"
/>
[//]: # (## Documentation)
# What does this PR do?
Adding chunk_size_in_tokens to playground rag_tool insert.
# Closes#1825
## Test Plan
Tested locally.
[//]: # (## Documentation)
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
This is the second attempt to switch to system packages by default. Now
with a hack to detect conda environment - in which case conda image-type
is used.
Note: Conda will only be used when --image-name is unset *and*
CONDA_DEFAULT_ENV is set. This means that users without conda will
correctly fall back to using system packages when no --image-* arguments
are passed at all.
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
Uses virtualenv:
```
$ llama stack build --template ollama --image-type venv
$ llama stack run --image-type venv ~/.llama/distributions/ollama/ollama-run.yaml
[...]
Using virtual environment: /home/ec2-user/src/llama-stack/schedule/.local
[...]
```
Uses system packages (virtualenv already initialized):
```
$ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml
[...]
INFO 2025-03-27 20:46:22,882 llama_stack.cli.stack.run:142 server: No image type or image name provided. Assuming environment packages.
[...]
```
Attempt to run from environment packages without necessary packages
installed:
```
$ python -m venv barebones
$ . ./barebones/bin/activate
$ pip install -e . # to install llama command
$ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml
[...]
ModuleNotFoundError: No module named 'fastapi'
```
^ failed as expected because the environment doesn't have necessary
packages installed.
Now install some packages in the new environment:
```
$ pip install fastapi opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp aiosqlite ollama openai datasets faiss-cpu mcp autoevals
$ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml
[...]
Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
```
Now see if setting CONDA_DEFAULT_ENV will change what happens by
default:
```
$ export CONDA_DEFAULT_ENV=base
$ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml
[...]
Using conda environment: base
Conda environment base does not exist.
[...]
```
---------
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
- hide distro doc (docker needs to be thoroughly tested).
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
- docs
[//]: # (## Documentation)
# What does this PR do?
No violators are currently in-tree. This is just hardening the api specs
for future consistency.
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
* Fix location of `run.yaml` relative to the cloned llama stack
repository
* Drop `-it` from `docker run` commands as its not needed running
services
## Test Plan
* Verified running the llama stack following updated instruction
CC: @ashwinb
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
# What does this PR do?
Set a formatter for log file handler that does not pollute log messages
with color tags.
## Test Plan
Successfully tested with `LLAMA_STACK_LOG_FILE=server.log llama stack
run ...`
# What does this PR do?
- we no longer query vector db when uploading documents as attachments
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
```
pytest --stack-config="http://localhost:8321" -v tests/integration/agents/test_agents.py --text-model meta-llama/Llama-3.3-70B-Instruct
```
```
pytest --stack-config=fireworks -v tests/integration/agents/test_agents.py --text-model meta-llama/Llama-3.3-70B-Instruct --record-responses
```
<img width="1160" alt="image"
src="https://github.com/user-attachments/assets/90700f79-c002-4474-bb41-7bc0a39dc91c"
/>
[//]: # (## Documentation)
# What does this PR do?
Re-enable isort enforcement.
It was disabled in 1a73f8305b, probably by
mistake.
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
- the distribution folder is referencing template, and have dead docker
compose scripts
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
[//]: # (## Documentation)
Fixes multiple issues
1. llama stack build of dependencies was breaking with incompatible
numpy / pandas when importing datasets
Moved the notebook to start a local server instead of using library as a
client. This way the setup is cleaner since its all contained and by
using `uv run --with` we can test both the server setup process too in
CI and release time.
2. The change to [1] surfaced some other issues
- running `llama stack run` was defaulting to conda env name
- provider data was not being managed properly
- Some notebook cells (telemetry for evals) were not updated with latest
changes
Fixed all the issues and update the notebook.
### Test
1. Manually run it all in local env
2. `pytest -v -s --nbval-lax docs/getting_started.ipynb`
# What does this PR do?
This is to stay consistent with other APIs.
This change registers files in API, even though there are still no
providers. Removing tests that require a provider existing for a merged
API to enable it in API layer.
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
[//]: # (## Documentation)
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
- as title
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
notebook
[//]: # (## Documentation)
# What does this PR do?
Enable mypy pydantic plugin.
Since the project heavily relies on pydantic models, it's probably wise
to enable the plugin to avoid some potential spurious violation warnings
the further we expand mypy coverage for the code base.
It should be generally risk-free to enable the plugin for the repo.
Some info on what plugin brings to the table:
https://docs.pydantic.dev/latest/integrations/mypy/#mypy-plugin-capabilities
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
This updates the changelog manually for now until we fix the changelog
workflow that requires change in repo settings (see [my comment in
Discord](1354127000)).
---------
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
This PR adds support for NVIDIA's NeMo Customizer API to the Llama Stack
post-training module. The integration enables users to fine-tune models
using NVIDIA's cloud-based customization service through a consistent
Llama Stack interface.
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
Yet to be done
Things pending under this PR:
- [x] Integration of fine-tuned model(new checkpoint) for inference with
nvidia llm distribution
- [x] distribution integration of API
- [x] Add test cases for customizer(In Progress)
- [x] Documentation
```
LLAMA_STACK_BASE_URL=http://localhost:5002 pytest -v tests/client-sdk/post_training/test_supervised_fine_tuning.py
============================================================================================================================================================================ test session starts =============================================================================================================================================================================
platform linux -- Python 3.10.0, pytest-8.3.4, pluggy-1.5.0 -- /home/ubuntu/llama-stack/.venv/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.10.0', 'Platform': 'Linux-6.8.0-1021-gcp-x86_64-with-glibc2.35', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'nbval': '0.11.0', 'metadata': '3.1.1', 'anyio': '4.8.0', 'html': '4.1.1', 'asyncio': '0.25.3'}}
rootdir: /home/ubuntu/llama-stack
configfile: pyproject.toml
plugins: nbval-0.11.0, metadata-3.1.1, anyio-4.8.0, html-4.1.1, asyncio-0.25.3
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None
collected 2 items
tests/client-sdk/post_training/test_supervised_fine_tuning.py::test_post_training_provider_registration[txt=8B] PASSED [ 50%]
tests/client-sdk/post_training/test_supervised_fine_tuning.py::test_list_training_jobs[txt=8B] PASSED [100%]
======================================================================================================================================================================== 2 passed, 1 warning in 0.10s ========================================================================================================================================================================
```
cc: @mattf @dglogo @sumitb
---------
Co-authored-by: Ubuntu <ubuntu@llama-stack-customizer-dev-inst-2tx95fyisatvlic4we8hidx5tfj.us-central1-a.c.brevdevprod.internal>
# What does this PR do?
* Removes the use of `huggingface-cli`
* Simplifies HF cache mount path
* Simplifies vLLM server startup command
* Separates PVC/secret creation from deployment/service
* Fixes a typo: "pod" should be "deployment"
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
This PR updates the sqlite-vec database calls to be non-blocking. Note
that each operation creates a new connection, which incurs some
performance overhead but is reasonable given [SQLite's threading and
connections constraints](https://www.sqlite.org/threadsafe.html).
Summary of changes:
- Refactored `SQLiteVecIndex` class to store database path instead of
connection object
- Added `_create_sqlite_connection()` helper function to create
connections on demand
- Ensured proper connection closure in all database operations
- Fixed test fixtures to use a file-based SQLite database for
thread-safety
- Updated the `SQLiteVecVectorIOAdapter` class to handle per-operation
connections
This PR helps chip away at
https://github.com/meta-llama/llama-stack/issues/1489
## Test Plan
sqlite-vec unit tests passed locally as well as a test script using the
client as a library.
## Misc
FYI @varshaprasad96 @kevincogan
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
**What**
Instead of adhoc creating a vectordb and chunking when documents ae sent
as an attachment to agent turn, we directly pass raw text from document
into messages to model for user context, and let model perform
summarization directly.
This removes the magic behaviour, and yields better performance than
existing approach.
**Improved Performance**
- RAG lifecycle notebook
- Model: 0.3 factuality score
- (+ websearch) Agent: 0.44 factuality score
- (+ vector db) Agent: 0.3 factuality score
- (+ raw context) Agent: 0.6 factuality score
Closes https://github.com/meta-llama/llama-stack/issues/1478
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
- [NEW] added section in RAG lifecycle notebook shows better performance
<img width="840" alt="image"
src="https://github.com/user-attachments/assets/a0c4e816-809a-41c0-9124-89825983e3f5"
/>
[//]: # (## Documentation)
# What does this PR do?
Gets rid of errors like the below, which is on all webmethod decorated
functions
llama_stack/apis/agents/agents.py:398: error: Value of type variable "T"
of function cannot be "Callable[[Agents, AgentConfig], Coroutine[Any,
Any, AgentCreateResponse]]" [type-var]
## Test Plan
Run mypy and observes mypy errors gone
# What does this PR do?
1) Uses otel compatible id generation for stack
2) Stack starts returning trace id info in the header of response
3) We inject the same trace id that we have into otel in order to force
it to use our trace ids.
## Test Plan
```
curl -i --request POST \
--url http://localhost:8321/v1/inference/chat-completion \
--header 'content-type: application/json' \
--data '{
"model_id": "meta-llama/Llama-3.1-70B-Instruct",
"messages": [
{
"role": "user",
"content": {
"type": "text",
"text": "where do humans live"
}
}
],
"stream": false
}'
HTTP/1.1 200 OK
date: Fri, 21 Mar 2025 21:51:19 GMT
server: uvicorn
content-length: 1712
content-type: application/json
x-trace-id: 595101ede31ece116ebe35b26d67e8cf
{"metrics":[{"metric":"prompt_tokens","value":10,"unit":null},{"metric":"completion_tokens","value":320,"unit":null},{"metric":"total_tokens","value":330,"unit":null}],"completion_message":{"role":"assistant","content":"Humans live on the planet Earth, specifically on its landmasses and in its oceans. Here's a breakdown of where humans live:\n\n1. **Continents:** Humans inhabit all seven continents:\n\t* Africa\n\t* Antarctica ( temporary residents, mostly scientists and researchers)\n\t* Asia\n\t* Australia\n\t* Europe\n\t* North America\n\t* South America\n2. **Countries:** There are 196 countries recognized by the United Nations, and humans live in almost all of them.\n3. **Cities and towns:** Many humans live in urban areas, such as cities and towns, which are often located near coastlines, rivers, or other bodies of water.\n4. **Rural areas:** Some humans live in rural areas, such as villages, farms, and countryside.\n5. **Islands:** Humans inhabit many islands around the world, including tropical islands, island nations, and islands in the Arctic and Antarctic regions.\n6. **Underwater habitats:** A few humans live in underwater habitats, such as research stations and submarines.\n7. **Space:** A small number of humans have lived in space, including astronauts on the International Space Station and those who have visited the Moon.\n\nIn terms of specific environments, humans live in a wide range of ecosystems, including:\n\n* Deserts\n* Forests\n* Grasslands\n* Mountains\n* Oceans\n* Rivers\n* Tundras\n* Wetlands\n\nOverall, humans are incredibly adaptable and can be found living in almost every corner of the globe.","stop_reason":"end_of_turn","tool_calls":[]},"logprobs":null}
```
Same trace id in Jaeger and sqlite:


# What does this PR do?
## Test Plan
LLAMA_STACK_CONFIG=dev pytest -s -v
tests/integration/agents/test_agents.py::test_custom_tool
--safety-shield meta-llama/Llama-Guard-3-8B --text-model
accounts/fireworks/models/llama-v3p1-8b-instruct
and verify trace in jaeger UI
https://llama-stack.readthedocs.io/en/latest/building_applications/telemetry.html#
# What does this PR do?
- We cannot directly return a literal type
> Note: this is not final jobs API change
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
<img width="837" alt="image"
src="https://github.com/user-attachments/assets/18a17561-35f9-443d-987d-54afdd6ff40c"
/>
[//]: # (## Documentation)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]
Typo fix for `output_dir` flag and misspelling of aggregate
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
N/A
[//]: # (## Documentation)
Required to startup a distribution with prompt guard
Closes: #1723
## Test Plan
distribution starts with patch applied
Signed-off-by: Derek Higgins <derekh@redhat.com>
# What does this PR do?
This is to avoid errors like the following when running inference
integration tests:
```
ERROR tests/integration/inference/test_text_inference.py::test_text_completion_stop_sequence[txt=8B-inference:completion:stop_sequence] - llama_stack.distribution.stack.EnvVarError: Environment variable 'VLLM_URL' not set or empty at providers.inference[0].config.url
```
It's also good to have a default, which is consistent with vLLM API
server.
## Test Plan
Integration tests can run without the error above.
---------
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>