# What does this PR do?
Imported `ToolConfig` from the `llama_stack.apis.inference` module into
`groq.py` to resolve the `NameError` raised for the missing name when the
module is loaded.
Signed-off-by: Sébastien Han <seb@redhat.com>
## Test Plan
Without the change, pytest will run with the following error:
```
uv run pytest -v -s -k "ollama" llama_stack/providers/tests/
/Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.13/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"
warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
============================================ test session starts =============================================
platform darwin -- Python 3.13.1, pytest-8.3.4, pluggy-1.5.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.13.1', 'Platform': 'macOS-15.3-arm64-arm-64bit-Mach-O', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'nbval': '0.11.0'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, nbval-0.11.0
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None
collected 379 items / 1 error / 349 deselected / 30 selected
=================================================== ERRORS ===================================================
__________________ ERROR collecting llama_stack/providers/tests/inference/groq/test_init.py __________________
llama_stack/providers/tests/inference/groq/test_init.py:11: in <module>
from llama_stack.providers.remote.inference.groq.groq import GroqInferenceAdapter
llama_stack/providers/remote/inference/groq/groq.py:72: in <module>
class GroqInferenceAdapter(Inference, ModelRegistryHelper, NeedsRequestProviderData):
llama_stack/providers/remote/inference/groq/groq.py:102: in GroqInferenceAdapter
tool_config: Optional[ToolConfig] = None,
E NameError: name 'ToolConfig' is not defined
========================================== short test summary info ===========================================
ERROR llama_stack/providers/tests/inference/groq/test_init.py - NameError: name 'ToolConfig' is not defined
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
=============================== 349 deselected, 22 warnings, 1 error in 0.28s ================================
```
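The `NameError` in the log above arises because `Optional[ToolConfig]` is an ordinary expression: once Python evaluates the annotation, `ToolConfig` has to be bound in the module namespace. A minimal sketch of that behaviour follows. It uses a stand-in class rather than the real `llama_stack.apis.inference.ToolConfig`, and the helper name `evaluate_annotation` is hypothetical.

```python
from typing import Optional


class ToolConfig:
    """Stand-in for the real ToolConfig; used only for this illustration."""


def evaluate_annotation(expression: str, namespace: dict) -> object:
    """Evaluate an annotation expression against an explicit namespace."""
    # Copy the namespace so eval() does not add __builtins__ to the caller's dict.
    return eval(expression, dict(namespace))


# Namespace of a module that forgot the import: only `Optional` is bound.
missing_import = {"Optional": Optional}
# Namespace after the import has been added.
with_import = {"Optional": Optional, "ToolConfig": ToolConfig}

try:
    evaluate_annotation("Optional[ToolConfig]", missing_import)
    error_message = None
except NameError as exc:
    error_message = str(exc)

resolved = evaluate_annotation("Optional[ToolConfig]", with_import)
```

With the name bound, the same expression resolves cleanly, which is all the added import achieves.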
With the change, collection gets past `groq.py` and the run fails with a
different error:
```
uv run pytest -v -s llama_stack/providers/tests/
/Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.13/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"
warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
============================================ test session starts =============================================
platform darwin -- Python 3.13.1, pytest-8.3.4, pluggy-1.5.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.13.1', 'Platform': 'macOS-15.3-arm64-arm-64bit-Mach-O', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'nbval': '0.11.0'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, nbval-0.11.0
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None
collected 342 items / 1 error
=================================================== ERRORS ===================================================
______________ ERROR collecting llama_stack/providers/tests/inference/test_vision_inference.py _______________
llama_stack/providers/tests/inference/test_vision_inference.py:29: in <module>
class TestVisionModelInference:
llama_stack/providers/tests/inference/test_vision_inference.py:35: in TestVisionModelInference
ImageContentItem(image=dict(data=PASTA_IMAGE)),
E pydantic_core._pydantic_core.ValidationError: 1 validation error for ImageContentItem
E image.data
E Input should be a valid string, unable to parse raw data as a unicode string [type=string_unicode, input_value=b'\xff\xd8\xff\xe0\x00\x1...0\xe6\x9f5\xb5?\xff\xd9', input_type=bytes]
E For further information visit https://errors.pydantic.dev/2.10/v/string_unicode
========================================== short test summary info ===========================================
ERROR llama_stack/providers/tests/inference/test_vision_inference.py - pydantic_core._pydantic_core.ValidationError: 1 validation error for ImageContentItem
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
======================================= 22 warnings, 1 error in 0.25s ========================================
```
That error is fixed in https://github.com/meta-llama/llama-stack/pull/1003.
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
Fixes #966.
Verified that:
1. The correct list of APIs is printed when running `llama stack
list-providers`.
2. `llama stack list-providers <api>` works as expected.
---------
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
Refactored tests by removing the unused exception alias (`as exc_info`) in
`pytest.raises`, improving code clarity and reducing lint warnings.
`exc_info` was never used.
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Replaced references to `memory` with `vector_io` in
`DEFAULT_PROVIDER_COMBINATIONS` and adjusted corresponding fixture
imports to ensure proper configuration for vector I/O during tests. This
change aligns with the new testing structure.
Follow-up to https://github.com/meta-llama/llama-stack/pull/830, which
removed the memory fixture.
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Very small fix. I noticed some unused arguments, and this one seems like
the easiest to remove since it's passed in explicitly.
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
The current default system prompt for llama3.2 tends to overindex on
tool calling and doesn't work well when the prompt does not require tool
calling.
This PR adds an option to override the default system prompt, and
organizes tool-related configs into a new config object.
## Test Plan
`LLAMA_STACK_CONFIG=together pytest --inference-model=meta-llama/Llama-3.3-70B-Instruct -s -v tests/client-sdk/agents/test_agents.py::test_override_system_message_behavior`
datasets.rst was removed from torchtune repo.
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
Replace a missing (404) document with another one that exists. (Removed it
from the list when `memory_optimizations.rst` was already pulled.)
## Before submitting
- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
The example script can exit gracefully if the boolean returned from
`initialize` is used properly.
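A minimal sketch of the pattern, assuming `initialize()` returns `True` on success and `False` on failure. `StubClient` and `main` are hypothetical stand-ins for illustration, not the actual example script.

```python
import sys


class StubClient:
    """Hypothetical client whose initialize() reports success as a boolean."""

    def __init__(self, healthy: bool) -> None:
        self._healthy = healthy

    def initialize(self) -> bool:
        return self._healthy


def main(client: StubClient) -> int:
    """Return a process exit code instead of continuing with a broken client."""
    if not client.initialize():
        print("initialization failed, exiting", file=sys.stderr)
        return 1
    # ... the rest of the example would use the client here ...
    return 0
```

A script would typically end with `sys.exit(main(client))`, so a failed initialization becomes a non-zero exit status rather than a traceback later on.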
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
The current default system prompt for llama3.2 tends to overindex on
tool calling and doesn't work well when the prompt does not require tool
calling.
This PR adds an option to override the default system prompt, and
organizes tool-related configs into a new config object.
## Test Plan
`python -m unittest llama_stack.providers.tests.inference.test_prompt_adapter`
This fixes the following timeout issue when installing PyTorch via uv.
Also see reference: https://github.com/astral-sh/uv/pull/1694,
https://github.com/astral-sh/uv/issues/1549
```
Installing pip dependencies
Using Python 3.10.16 environment at: /home/yutang/.conda/envs/distribution-myenv
× Failed to download and build `antlr4-python3-runtime==4.9.3`
├─▶ Failed to extract archive
├─▶ failed to unpack
│ `/home/yutang/.cache/uv/sdists-v7/.tmpDWX4iK/antlr4-python3-runtime-4.9.3/src/antlr4/ListTokenSource.py`
├─▶ failed to unpack
│ `antlr4-python3-runtime-4.9.3/src/antlr4/ListTokenSource.py` into
│ `/home/yutang/.cache/uv/sdists-v7/.tmpDWX4iK/antlr4-python3-runtime-4.9.3/src/antlr4/ListTokenSource.py`
├─▶ error decoding response body
├─▶ request or response body error
╰─▶ operation timed out
help: `antlr4-python3-runtime` (v4.9.3) was included because `torchtune`
(v0.5.0) depends on `omegaconf` (v2.3.0) which depends on
`antlr4-python3-runtime>=4.9.dev0, <4.10.dev0`
Failed to build target distribution-myenv with return code 1
```
---------
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
Changes needed to work with the updated iOSCalendarAssistantWithLocalInf
[here](https://github.com/meta-llama/llama-stack-apps/compare/ios_local).
The lint check on the main branch is failing. This fixes it after we
moved to ruff in https://github.com/meta-llama/llama-stack/pull/921. We
need to move to a `ruff.toml` file as well as fix and ignore some
additional checks.
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
Add support for image inputs to the NVIDIA Inference provider.
## Test Plan
1. Run a local [Llama 3.2 11B Vision
Instruct](https://build.nvidia.com/meta/llama-3.2-11b-vision-instruct?snippet_tab=Docker)
NIM
2. Start a stack, e.g. `llama stack run
llama_stack/templates/nvidia/run.yaml --env
NVIDIA_BASE_URL=http://localhost:8000`
3. Run image tests, e.g. `LLAMA_STACK_BASE_URL=http://localhost:8321
pytest -v tests/client-sdk/inference/test_inference.py
--vision-inference-model meta-llama/Llama-3.2-11B-Vision-Instruct -k
image`
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.
## What does this PR do?
See issue: #747 -- `uv` is just plain better. This PR does the bare
minimum of replacing `pip install` by `uv pip install` and ensuring `uv`
exists in the environment.
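The two steps described above (check that `uv` is available, then call `uv pip install` instead of `pip install`) could look roughly like this. This is only a sketch: the function name `uv_pip_install_command` and the injectable `which` parameter are assumptions made for illustration, and the real build scripts may do this differently.

```python
import shutil
from typing import Callable, Optional, Sequence


def uv_pip_install_command(
    packages: Sequence[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Build the install command, failing early when uv is not on PATH."""
    if which("uv") is None:
        raise RuntimeError("uv was not found on PATH; install it first")
    # Same arguments as before, only the front of the command changes.
    return ["uv", "pip", "install", *packages]
```

The returned list can be handed to `subprocess.run(...)`; keeping command construction separate makes the check easy to exercise without installing anything.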
## Test Plan
First: create new conda, `uv pip install -e .` on `llama-stack` -- all
is good.
Next: run `llama stack build --template together` followed by `llama
stack run together` -- all good
Next: run `llama stack build --template together --image-name yoyo`
followed by `llama stack run together --image-name yoyo` -- all good
Next: fresh conda and `uv pip install -e .` and `llama stack build
--template together --image-type venv` -- all good.
Docker: `llama stack build --template together --image-type container`
works!
This commit adds support for XPU and CPU devices to the meta-reference
stack for text models. On creation, the stack automatically picks a
device by checking available accelerator capabilities in the following
order: CUDA, then XPU, then CPU. This behaviour can be overridden with the
`DEVICE` environment variable, in which case the explicitly specified
device is used.
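The selection order can be sketched as a small pure function. This is an illustration under stated assumptions, not the provider's code: the function name `select_device` is hypothetical, the device strings `"cuda"`, `"xpu"` and `"cpu"` are assumed, and in practice the availability flags would presumably come from the framework's own capability checks.

```python
from typing import Mapping, Optional


def select_device(
    cuda_available: bool,
    xpu_available: bool,
    env: Mapping[str, str],
) -> str:
    """Pick a device: explicit DEVICE override first, then CUDA, XPU, CPU."""
    override: Optional[str] = env.get("DEVICE")
    if override:
        # An explicitly specified device always wins.
        return override
    if cuda_available:
        return "cuda"
    if xpu_available:
        return "xpu"
    return "cpu"
```

Passing `os.environ` as `env` gives the described behaviour; passing a plain dict keeps the function easy to check on machines without any accelerator.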
Tested with:
```
torchrun pytest llama_stack/providers/tests/inference/test_text_inference.py -k meta_reference
```
Results:
* Tested on: a system with a single CUDA device, a system with a single
XPU device, and a pure CPU system
* Results: all tests pass except `test_completion_logprobs`
* `test_completion_logprobs` fails in the same way as on the baseline,
i.e. unrelated to this change: `AssertionError: Unexpected top_k=3`
Requires: https://github.com/meta-llama/llama-models/pull/233
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
# What does this PR do?
Fixes a bug where agents were not working when both rag and
code-interpreter were added as tools.
## Test Plan
Added a new client_sdk test which tests for this scenario
```
LLAMA_STACK_CONFIG=together pytest -s -v tests/client-sdk -k 'test_rag_and_code_agent'
```
---------
Co-authored-by: Hardik Shah <hjshah@fb.com>
# What does this PR do?
- Discussion in
https://github.com/meta-llama/llama-stack/pull/906#discussion_r1936260819
- `image.data` should accept a base64 string as input instead of binary
bytes; change `prompt_adapter` to account for that.
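For callers, the practical consequence is that raw image bytes need to be base64-encoded into a string before being passed as `image.data`. A minimal sketch with the standard library; the helper names `encode_image_data` and `decode_image_data` are hypothetical and not part of llama-stack.

```python
import base64


def encode_image_data(raw: bytes) -> str:
    """Encode raw image bytes as an ASCII base64 string."""
    return base64.b64encode(raw).decode("ascii")


def decode_image_data(encoded: str) -> bytes:
    """Recover the raw bytes from a base64 string."""
    return base64.b64decode(encoded)


# First four bytes of a JPEG file, as seen in validation errors for raw bytes.
jpeg_prefix = b"\xff\xd8\xff\xe0"
encoded = encode_image_data(jpeg_prefix)
round_tripped = decode_image_data(encoded)
```

The encoded value is a plain `str`, so it passes string validation, and decoding it returns the original bytes unchanged.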
## Test Plan
```
pytest -v tests/client-sdk/inference/test_inference.py
```
with test in https://github.com/meta-llama/llama-stack/pull/906
# What does this PR do?
- Fix typo
- Support Llama 3.3 70B
## Test Plan
Run the following scripts and obtain the test results
Script
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming --env SAMBANOVA_API_KEY={API_KEY}
```
Result
```
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming[-sambanova] PASSED
=========================================== 1 passed, 1 warning in 1.26s ============================================
```
Script
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming --env SAMBANOVA_API_KEY={API_KEY}
```
Result
```
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming[-sambanova] PASSED
=========================================== 1 passed, 1 warning in 0.52s ============================================
```
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [x] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
# What does this PR do?
1) As per @mattf's suggestion, we want to mark the pytest tests as xfail
for providers that do not support the functionality. In this diff, we
xfail the log probs inference tests for providers that do not support log
probs (log probs are only supported by together, fireworks and vllm).
2) Added log probs support for together according to their developer
[doc](https://docs.together.ai/docs/logprobs).
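The xfail decision in item 1 amounts to a lookup against the set of providers that support log probs. A rough sketch, assuming provider types of the form `remote::<name>`; the names `LOGPROBS_PROVIDERS` and `logprobs_xfail_reason` are hypothetical and the actual tests may be structured differently.

```python
from typing import Optional

# Providers named in the description as supporting log probs.
LOGPROBS_PROVIDERS = frozenset({"together", "fireworks", "vllm"})


def logprobs_xfail_reason(provider_type: str) -> Optional[str]:
    """Return an xfail reason, or None when the provider supports log probs."""
    # Provider types are assumed to look like "remote::ollama";
    # keep only the part after the last "::".
    name = provider_type.rsplit("::", 1)[-1]
    if name in LOGPROBS_PROVIDERS:
        return None
    return f"{provider_type} doesn't support log probs yet"
```

Inside a test, a non-`None` reason would be passed to `pytest.xfail(reason)` before any log probs request is made.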
## Test Plan
1) Together & Fireworks
```
export LLAMA_STACK_CONFIG=/Users/sxyi/llama-stack/llama_stack/templates/together/run.yaml
/opt/miniconda3/envs/stack/bin/pytest -s -v /Users/sxyi/llama-stack/tests/client-sdk/inference/test_inference.py
```
```
tests/client-sdk/inference/test_inference.py::test_text_completion_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_completion_log_probs_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_completion_log_probs_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_text_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-What are the names of planets in our solar system?-Earth] PASSED
tests/client-sdk/inference/test_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-What are the names of the planets that have rings around them?-Saturn] PASSED
tests/client-sdk/inference/test_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What's the name of the Sun in latin?-Sol] PASSED
tests/client-sdk/inference/test_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What is the name of the US captial?-Washington] PASSED
tests/client-sdk/inference/test_inference.py::test_text_chat_completion_with_tool_calling_and_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_text_chat_completion_with_tool_calling_and_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_text_chat_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_image_chat_completion_non_streaming[meta-llama/Llama-3.2-11B-Vision-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_image_chat_completion_streaming[meta-llama/Llama-3.2-11B-Vision-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_image_chat_completion_base64_url[meta-llama/Llama-3.2-11B-Vision-Instruct] PASSED
========================================================================================== 15 passed, 2 warnings in 19.46s ===========================================================================================
```
```
export LLAMA_STACK_CONFIG=/Users/sxyi/llama-stack/llama_stack/templates/fireworks/run.yaml
/opt/miniconda3/envs/stack/bin/pytest -s -v /Users/sxyi/llama-stack/tests/client-sdk/inference/test_inference.py
```
All tests passed
2) Ollama - LogProbs tests are marked as xfailed.
```
tests/client-sdk/inference/test_inference.py::test_completion_log_probs_non_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote::ollama doesn't support log probs yet)
tests/client-sdk/inference/test_inference.py::test_completion_log_probs_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote::ollama doesn't support log probs yet)
```
Fixes: #902
To test, verified that llama stack can run when built:
* With the default "base" conda environment
* With a new custom conda environment using the `--image-name XXX` option
In both cases llama stack starts fine (it was failing with "base" before
this patch).
CC: @ashwinb
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
We desperately need to document our APIs. This is the basic requirement
of having a Spec :)
This PR updates the OpenAPI generator so documentation for request
parameters and object fields can be properly added to the OpenAPI specs.
From there, this should get picked up by Stainless, etc.
## Test Plan:
Updated client-sdk (See
https://github.com/meta-llama/llama-stack-client-python/pull/104) and
then ran:
```bash
cd tests/client-sdk
LLAMA_STACK_CONFIG=../../llama_stack/templates/fireworks/run.yaml pytest -s -v inference/test_inference.py agents/test_agents.py
```
# What does this PR do?
Allows the template distribution to connect to a hosted or local NIM:
- use `--env NVIDIA_BASE_URL=http://localhost:8000` to connect to a local
NIM running at localhost:8000
- use `--env NVIDIA_API_KEY=blah` when connecting to a hosted NIM, e.g.
`NVIDIA_BASE_URL=https://integrate.api.nvidia.com`
## Test Plan
- `llama stack run ./llama_stack/templates/nvidia/run.yaml` -> error,
e.g. API key is required for hosted NVIDIA NIM
- `llama stack run ./llama_stack/templates/nvidia/run.yaml --env
NVIDIA_BASE_URL=https://integrate.api.nvidia.com` -> error, e.g. API key
is required for hosted NVIDIA NIM
- `llama stack run ./llama_stack/templates/nvidia/run.yaml --env
NVIDIA_API_KEY=REDACTED` -> successful connection to NIM on
https://integrate.api.nvidia.com
- `llama stack run ./llama_stack/templates/nvidia/run.yaml --env
NVIDIA_BASE_URL=https://integrate.api.nvidia.com --env
NVIDIA_API_KEY=REDACTED` -> successful connection to NIM running on
integrate.api.nvidia.com
- `llama stack run ./llama_stack/templates/nvidia/run.yaml --env
NVIDIA_BASE_URL=http://localhost:8000` -> successful connection to NIM
running on localhost:8000
- `llama stack run ./llama_stack/templates/nvidia/run.yaml --env
NVIDIA_BASE_URL=http://localhost:8000 --env NVIDIA_API_KEY=REDACTED` ->
successful connection to NIM running on http://localhost:8000
- `llama stack run ./llama_stack/templates/nvidia/run.yaml --env
NVIDIA_BASE_URL=http://bogus` -> runtime error, e.g. ConnectionError
(TODO: this should be a startup error)
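The startup cases in the list above follow one rule: an API key is mandatory only when the target is the hosted endpoint. A minimal sketch of that rule, assuming the hosted endpoint is the default when `NVIDIA_BASE_URL` is unset and that "hosted" is detected by hostname; the function name `resolve_nim_endpoint` and the constant `HOSTED_NIM_URL` are hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumed default, taken from the hosted URL used in the test plan.
HOSTED_NIM_URL = "https://integrate.api.nvidia.com"


def resolve_nim_endpoint(base_url: Optional[str], api_key: Optional[str]) -> str:
    """Return the URL to connect to, rejecting hosted use without an API key."""
    url = base_url or HOSTED_NIM_URL
    is_hosted = urlparse(url).hostname == urlparse(HOSTED_NIM_URL).hostname
    if is_hosted and not api_key:
        raise ValueError("API key is required for hosted NVIDIA NIM")
    return url
```

A reachability check is deliberately left out here; an unreachable URL such as `http://bogus` would still only fail at request time, as the TODO in the list notes.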
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
# What does this PR do?
Fix a type mismatch in `/v1/inference/completion`.
## Test Plan
`llama stack run ./llama_stack/templates/nvidia/run.yaml`
`LLAMA_STACK_BASE_URL="http://localhost:8321" pytest -v
tests/client-sdk/inference/test_inference.py`
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
Chroma method had the wrong signature.
## Test Plan
Start Chroma: `chroma run --path /tmp/foo/chroma2 --host localhost
--port 6001`
Modify run.yaml to include Chroma server pointing to localhost:6001 and
run `llama stack run`
Then:
```bash
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -s -v agents/test_agents.py -k rag
```
passes
# What does this PR do?
Add a Windows run command for the stack.
- [x] Addresses issue (#issue)
## Sources
https://github.com/meta-llama/llama-stack/pull/889
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
# What does this PR do?
This PR implements Windows support for running `build_container.sh` from
the terminal. Additionally, it resolves the "no support for termios and
PTY on Windows" issues.
- [x] Addresses issue (#issue)
Related issues: https://github.com/meta-llama/llama-stack/issues/826,
https://github.com/meta-llama/llama-stack/issues/726
## Test Plan
Changes were tested manually by running the standard commands from the
Llama guide:
- `llama stack build --template ollama --image-type container`
- `llama stack build --list-templates`
- `llama stack build`
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
# What does this PR do?
Add response format for agents structured output.
- [ ] Using structured output for agents (interior_design app as an
example) (#issue)
https://github.com/meta-llama/llama-stack-apps/issues/122
## Test Plan
E2E test with the llama-stack-apps `interior_design` example.
Start your distro:
`llama stack run llama_stack/templates/fireworks/run.yaml --env FIREWORKS_API_KEY=<API_KEY>`
Run the API test:
```PYTHONPATH=. python examples/interior_design_assistant/api.py localhost 5000 examples/interior_design_assistant/resources/documents/ examples/interior_design_assistant/resources/images/fireplaces```
## Sources
Results:
https://github.com/meta-llama/llama-stack-client-python/pull/72
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
# What does this PR do?
Fixed report generation:
1) Do not initialize a new client in `report.py`; instead get it from
the pytest fixture
2) Add "provider" for the "safety" and "agents" sections
3) Add logprobs functionality in the "inference" section
## Test Plan
See the regenerated report
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
# What does this PR do?
- Fix loading SambaNovaImpl issue
- Add LlamaGuard model support for inference
## Test Plan
Ran the following unit test scripts; the results follow each command.
### Embedding
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_embeddings.py --inference-model meta-llama/Llama-3.2-11B-Vision-Instruct --env SAMBANOVA_API_KEY={SAMBANOVA_API_KEY}
```
```
llama_stack/providers/tests/inference/test_embeddings.py::TestEmbeddings::test_embeddings[-sambanova] SKIPPED (This test is only applicable for embedding models)
llama_stack/providers/tests/inference/test_embeddings.py::TestEmbeddings::test_batch_embeddings[-sambanova] SKIPPED (This test is only applicable for embedding models)
=================================================================================================================== 2 skipped, 1 warning in 0.32s ===================================================================================================================
```
### Vision
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_vision_inference.py --inference-model meta-llama/Llama-3.2-11B-Vision-Instruct --env SAMBANOVA_API_KEY={SAMBANOVA_API_KEY}
```
```
llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_non_streaming[-sambanova-image0-expected_strings0] PASSED
llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_non_streaming[-sambanova-image1-expected_strings1] PASSED
llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_streaming[-sambanova] PASSED
=================================================================================================================== 3 passed, 1 warning in 2.68s ====================================================================================================================
```
### Text
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming --env SAMBANOVA_API_KEY={SAMBANOVA_API_KEY}
```
```
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming[-sambanova] PASSED
=================================================================================================================== 1 passed, 1 warning in 0.46s ====================================================================================================================
```
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming --env SAMBANOVA_API_KEY={SAMBANOVA_API_KEY}
```
```
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming[-sambanova] PASSED
=================================================================================================================== 1 passed, 1 warning in 0.48s ====================================================================================================================
```
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [x] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.
# What does this PR do?
When re-initializing the library client in a notebook, we were seeing
this error:
```
Getting traces for session_id=5c8d1969-0957-49d2-b852-32cbb8ef8caf
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
[<ipython-input-11-d74bb6cdd3ab>](https://localhost:8080/#) in <cell line: 0>()
7 agent_logs = []
8
----> 9 for span in client.telemetry.query_spans(
10 attribute_filters=[
11 {"key": "session_id", "op": "eq", "value": session_id},
10 frames
[/usr/local/lib/python3.11/dist-packages/llama_stack/providers/inline/telemetry/meta_reference/telemetry.py](https://localhost:8080/#) in query_traces(self, attribute_filters, limit, offset, order_by)
246 ) -> QueryTracesResponse:
247 return QueryTracesResponse(
--> 248 data=await self.trace_store.query_traces(
249 attribute_filters=attribute_filters,
250 limit=limit,
AttributeError: 'TelemetryAdapter' object has no attribute 'trace_store'
```
This happens because we were skipping some required initialization of
the object state as part of the global `_TRACE_PROVIDER` check. This PR
moves the initialization of the object state out of the
`_TRACE_PROVIDER` init.