# What does this PR do?
Moved model availability check logic into a dedicated
check_model_availability function. Eliminated redundant code by reusing
the helper function in both embedding and non-embedding model
registration.
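For reference, a minimal sketch of what the shared helper can look like (the `client.ps()` call is an assumption based on the Ollama Python client; the error message mirrors the one in the test output below):
```python
# Illustrative sketch only, not the exact implementation.
async def check_model_availability(self, model: str) -> None:
    response = await self.client.ps()  # assumed: list the currently served models
    available = [m["model"] for m in response["models"]]
    if model not in available:
        raise ValueError(
            f"Model '{model}' is not available in Ollama. "
            f"Available models: {', '.join(available)}"
        )
```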
Signed-off-by: Sébastien Han <seb@redhat.com>
## Test Plan
Run Ollama and serve two models so that most of the unit tests pass:
```
ollama run llama3.2:3b-instruct-fp16 --keepalive 2m &
ollama run llama3.1:8b --keepalive 2m &
```
Run the unit test:
```
uv run pytest -v -k "ollama" --inference-model=llama3.2:3b-instruct-fp16 llama_stack/providers/tests/inference/test_model_registration.py
/Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.13/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"
warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
============================================ test session starts =============================================
platform darwin -- Python 3.13.1, pytest-8.3.4, pluggy-1.5.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.13.1', 'Platform': 'macOS-15.3-arm64-arm-64bit-Mach-O', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'nbval': '0.11.0'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, nbval-0.11.0
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None
collected 65 items / 60 deselected / 5 selected
llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_unsupported_model[-ollama] PASSED [ 20%]
llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_nonexistent_model[-ollama] PASSED [ 40%]
llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_with_llama_model[-ollama] FAILED [ 60%]
llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_initialize_model_during_registering[-ollama] FAILED [ 80%]
llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_with_invalid_llama_model[-ollama] PASSED [100%]
================================================== FAILURES ==================================================
_______________________ TestModelRegistration.test_register_with_llama_model[-ollama] ________________________
llama_stack/providers/tests/inference/test_model_registration.py:54: in test_register_with_llama_model
_ = await models_impl.register_model(
llama_stack/providers/utils/telemetry/trace_protocol.py:91: in async_wrapper
result = await method(self, *args, **kwargs)
llama_stack/distribution/routers/routing_tables.py:245: in register_model
registered_model = await self.register_object(model)
llama_stack/distribution/routers/routing_tables.py:192: in register_object
registered_obj = await register_object_with_provider(obj, p)
llama_stack/distribution/routers/routing_tables.py:53: in register_object_with_provider
return await p.register_model(obj)
llama_stack/providers/utils/telemetry/trace_protocol.py:91: in async_wrapper
result = await method(self, *args, **kwargs)
llama_stack/providers/remote/inference/ollama/ollama.py:368: in register_model
await check_model_availability(model.provider_resource_id)
llama_stack/providers/remote/inference/ollama/ollama.py:359: in check_model_availability
raise ValueError(
E ValueError: Model 'custom-model' is not available in Ollama. Available models: llama3.1:8b, llama3.2:3b-instruct-fp16
__________________ TestModelRegistration.test_initialize_model_during_registering[-ollama] ___________________
llama_stack/providers/tests/inference/test_model_registration.py:85: in test_initialize_model_during_registering
mock_load_model.assert_called_once()
/opt/homebrew/Cellar/python@3.13/3.13.1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/unittest/mock.py:956: in assert_called_once
raise AssertionError(msg)
E AssertionError: Expected 'load_model' to have been called once. Called 0 times.
-------------------------------------------- Captured stderr call --------------------------------------------
W0207 11:55:26.777000 90854 .venv/lib/python3.13/site-packages/torch/distributed/elastic/multiprocessing/redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs.
========================================== short test summary info ===========================================
FAILED llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_with_llama_model[-ollama] - ValueError: Model 'custom-model' is not available in Ollama. Available models: llama3.1:8b, llama3.2:3b-i...
FAILED llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_initialize_model_during_registering[-ollama] - AssertionError: Expected 'load_model' to have been called once. Called 0 times.
=========================== 2 failed, 3 passed, 60 deselected, 2 warnings in 1.84s ===========================
```
We only "care" about the `test_register_nonexistent_model` for this
code.
## Sources
Please link relevant resources if necessary.
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Currently, this is the output when you run a distribution locally without
first running `llama stack build`:
```
Traceback (most recent call last):
File "/Users/charliedoern/Documents/llama-sdk.py", line 25, in <module>
models = client.models.list()
^^^^^^^^^^^^^^^^^^^^
File "/Users/charliedoern/Documents/llama-stack-client-python/src/llama_stack_client/resources/models.py", line 107, in list
raise exc
File "/Users/charliedoern/Documents/llama-stack-client-python/src/llama_stack_client/resources/models.py", line 95, in list
return self._get(
^^^^^^^^^^
File "/Users/charliedoern/Documents/llama-stack-client-python/src/llama_stack_client/_base_client.py", line 1212, in get
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/charliedoern/Documents/llama-stack/llama_stack/distribution/library_client.py", line 168, in request
return asyncio.run(self.async_client.request(*args, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.11/3.11.10/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 190, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.11/3.11.10/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.11/3.11.10/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/Users/charliedoern/Documents/llama-stack/llama_stack/distribution/library_client.py", line 258, in request
if not self.endpoint_impls:
^^^^^^^^^^^^^^^^^^^
AttributeError: 'AsyncLlamaStackAsLibraryClient' object has no attribute 'endpoint_impls'
```
The intended exception is never raised. Add an `except` clause for
`AttributeError` so that when users call methods like `models.list()` on an
uninitialized client, a more useful error is printed telling them that the
client is not properly initialized, as sketched below.
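A rough sketch of the guard (the exact message wording is illustrative; attribute and method names follow the traceback above):
```python
# Illustrative sketch: catch the AttributeError from the uninitialized client
# and surface a clearer message instead.
async def request(self, *args, **kwargs):
    try:
        if not self.endpoint_impls:
            raise ValueError("Client not initialized")
    except AttributeError as exc:
        raise ValueError(
            "Client not properly initialized. Please call initialize() first."
        ) from exc
    ...
```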
## Test Plan
Please describe:
- I ran the script found here:
https://llama-stack.readthedocs.io/en/latest/getting_started/index.html#run-inference-with-python-sdk
locally with the changes in this PR and the exception was caught
successfully.
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
---------
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
# What does this PR do?
Previously, the test was failing due to a pydantic validation error
caused by passing raw binary image data instead of a valid Unicode
string. This fix encodes the image data as base64, ensuring it is a
valid string format compatible with `ImageContentItem`.
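A minimal sketch of the encoding (the file name here is just an example):
```python
import base64
from pathlib import Path

# Read the raw image bytes, then base64-encode them into a str so that
# ImageContentItem's string validation on image.data passes.
PASTA_IMAGE = base64.b64encode(Path("pasta.jpeg").read_bytes()).decode("utf-8")
```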
Error:
```
______________ ERROR collecting llama_stack/providers/tests/inference/test_vision_inference.py _______________
llama_stack/providers/tests/inference/test_vision_inference.py:31: in <module>
class TestVisionModelInference:
llama_stack/providers/tests/inference/test_vision_inference.py:37: in TestVisionModelInference
ImageContentItem(image=dict(data=PASTA_IMAGE)),
E pydantic_core._pydantic_core.ValidationError: 1 validation error for ImageContentItem
E image.data
E Input should be a valid string, unable to parse raw data as a unicode string [type=string_unicode, input_value=b'\xff\xd8\xff\xe0\x00\x1...0\xe6\x9f5\xb5?\xff\xd9', input_type=bytes]
E For further information visit
https://errors.pydantic.dev/2.10/v/string_unicode
```
Signed-off-by: Sébastien Han <seb@redhat.com>
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
Execute the following:
```
ollama run llama3.2-vision --keepalive 2m &
uv run pytest -v -s -k "ollama" --inference-model=llama3.2-vision:latest llama_stack/providers/tests/inference/test_vision_inference.py
llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_non_streaming[-ollama-image0-expected_strings0] PASSED
llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_non_streaming[-ollama-image1-expected_strings1] FAILED
llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_streaming[-ollama] FAILED
```
The last two tests are failing because Cloudflare blocked me from
accessing
https://www.healthypawspetinsurance.com/Images/V3/DogAndPuppyInsurance/Dog_CTA_Desktop_HeroImage.jpg
but this has no impact on the current fix.
[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Enables HTTPS option for Llama Stack.
While doing so, introduces a `ServerConfig` sub-structure to house all
server-related configuration (port, SSL, etc.)
Also simplified the `start_container.sh` entrypoint to a plain `python`
invocation instead of a complex bash command line.
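A hedged sketch of the kind of sub-structure `ServerConfig` introduces (field names beyond the port are guesses based on the `--tls-keyfile` / `--tls-certfile` flags used below):
```python
from typing import Optional

from pydantic import BaseModel

# Illustrative only; the real ServerConfig may carry more fields.
class ServerConfig(BaseModel):
    port: int = 8321
    tls_keyfile: Optional[str] = None   # path to the SSL key
    tls_certfile: Optional[str] = None  # path to the SSL certificate
```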
## Test Plan
Conda:
Run:
```bash
$ llama stack build --template together
$ llama stack run --port 8322 # ensure server starts
$ llama-stack-client configure --endpoint http://localhost:8322
$ llama-stack-client models list
```
Create a self-signed SSL key / cert pair. Then, using a local checkout
of `llama-stack-client-python`, change
https://github.com/meta-llama/llama-stack-client-python/blob/main/src/llama_stack_client/_base_client.py#L759
to add `kwargs.setdefault("verify", False)` so SSL verification is
disabled. Then:
```bash
$ llama stack run --port 8322 --tls-keyfile <KEYFILE> --tls-certfile <CERTFILE>
$ llama-stack-client configure --endpoint https://localhost:8322 # notice the `https`
$ llama-stack-client models list
```
Also tested with containers (but of course one needs to make sure the
cert and key files are appropriately provided to the container.)
# What does this PR do?
Imported `ToolConfig` from the `llama_stack.apis.inference` module to
resolve a missing reference and ensure proper functionality within the
`groq.py` file.
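The change boils down to making the name resolvable, roughly:
```python
from llama_stack.apis.inference import ToolConfig  # resolves the NameError below
```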
Signed-off-by: Sébastien Han <seb@redhat.com>
## Test Plan
Without the change, pytest will run with the following error:
```
uv run pytest -v -s -k "ollama" llama_stack/providers/tests/
/Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.13/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"
warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
============================================ test session starts =============================================
platform darwin -- Python 3.13.1, pytest-8.3.4, pluggy-1.5.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.13.1', 'Platform': 'macOS-15.3-arm64-arm-64bit-Mach-O', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'nbval': '0.11.0'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, nbval-0.11.0
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None
collected 379 items / 1 error / 349 deselected / 30 selected
=================================================== ERRORS ===================================================
__________________ ERROR collecting llama_stack/providers/tests/inference/groq/test_init.py __________________
llama_stack/providers/tests/inference/groq/test_init.py:11: in <module>
from llama_stack.providers.remote.inference.groq.groq import GroqInferenceAdapter
llama_stack/providers/remote/inference/groq/groq.py:72: in <module>
class GroqInferenceAdapter(Inference, ModelRegistryHelper, NeedsRequestProviderData):
llama_stack/providers/remote/inference/groq/groq.py:102: in GroqInferenceAdapter
tool_config: Optional[ToolConfig] = None,
E NameError: name 'ToolConfig' is not defined
========================================== short test summary info ===========================================
ERROR llama_stack/providers/tests/inference/groq/test_init.py - NameError: name 'ToolConfig' is not defined
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
=============================== 349 deselected, 22 warnings, 1 error in 0.28s ================================
```
With the change the test continues to run and fails with a different
error:
```
uv run pytest -v -s llama_stack/providers/tests/
/Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.13/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"
warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
============================================ test session starts =============================================
platform darwin -- Python 3.13.1, pytest-8.3.4, pluggy-1.5.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.13.1', 'Platform': 'macOS-15.3-arm64-arm-64bit-Mach-O', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'nbval': '0.11.0'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, nbval-0.11.0
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None
collected 342 items / 1 error
=================================================== ERRORS ===================================================
______________ ERROR collecting llama_stack/providers/tests/inference/test_vision_inference.py _______________
llama_stack/providers/tests/inference/test_vision_inference.py:29: in <module>
class TestVisionModelInference:
llama_stack/providers/tests/inference/test_vision_inference.py:35: in TestVisionModelInference
ImageContentItem(image=dict(data=PASTA_IMAGE)),
E pydantic_core._pydantic_core.ValidationError: 1 validation error for ImageContentItem
E image.data
E Input should be a valid string, unable to parse raw data as a unicode string [type=string_unicode, input_value=b'\xff\xd8\xff\xe0\x00\x1...0\xe6\x9f5\xb5?\xff\xd9', input_type=bytes]
E For further information visit https://errors.pydantic.dev/2.10/v/string_unicode
========================================== short test summary info ===========================================
ERROR llama_stack/providers/tests/inference/test_vision_inference.py - pydantic_core._pydantic_core.ValidationError: 1 validation error for ImageContentItem
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
======================================= 22 warnings, 1 error in 0.25s ========================================
```
Which is fixed in https://github.com/meta-llama/llama-stack/pull/1003.
## Sources
Please link relevant resources if necessary.
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
Signed-off-by: Sébastien Han <seb@redhat.com>
Fixes #966.
Verified that:
1. The correct list of APIs is printed out when running `llama stack
list-providers`
2. `llama stack list-providers <api>` works as expected.
---------
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
Refactored tests by removing the unused exception alias (`as exc_info`) in
`pytest.raises`, improving code clarity and reducing lint warnings;
`exc_info` was never used.
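For illustration, the pattern being cleaned up (with a hypothetical `do_something()` standing in for the real call under test):
```python
import pytest

def do_something():
    # Stand-in for the real call under test.
    raise ValueError("boom")

# Before: the alias is bound but never used.
def test_old_style():
    with pytest.raises(ValueError) as exc_info:
        do_something()

# After: same assertion, no unused name.
def test_new_style():
    with pytest.raises(ValueError):
        do_something()
```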
Signed-off-by: Sébastien Han <seb@redhat.com>
## Test Plan
Please describe:
- tests you ran to verify your changes with result summaries.
- provide instructions so it can be reproduced.
## Sources
Please link relevant resources if necessary.
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Replaced references to `memory` with `vector_io` in
`DEFAULT_PROVIDER_COMBINATIONS` and adjusted corresponding fixture
imports to ensure proper configuration for vector I/O during tests. This
change aligns with the new testing structure.
Follow-up to https://github.com/meta-llama/llama-stack/pull/830, where the
memory fixture was removed.
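A hedged sketch of what an entry looks like after the rename (provider names here are placeholders, not the actual combinations):
```python
import pytest

DEFAULT_PROVIDER_COMBINATIONS = [
    pytest.param(
        {
            "inference": "ollama",
            "vector_io": "faiss",  # previously keyed as "memory"
            "safety": "llama_guard",
            "agents": "meta_reference",
        },
        id="ollama",
        marks=pytest.mark.ollama,
    ),
]
```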
Signed-off-by: Sébastien Han <seb@redhat.com>
## Test Plan
Please describe:
- tests you ran to verify your changes with result summaries.
- provide instructions so it can be reproduced.
## Sources
Please link relevant resources if necessary.
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Very small fix. I noticed some unused arguments, and this seemed like the
easiest one to remove since it's passed in explicitly.
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
The current default system prompt for llama3.2 tends to overindex on
tool calling and doesn't work well when the prompt does not require tool
calling.
This PR adds an option to override the default system prompt, and
organizes tool-related configs into a new config object.
- [ ] Addresses issue (#issue)
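A hedged illustration of the new knob; the `system_message_behavior` field name is inferred from the test name below and should be treated as an assumption:
```python
from llama_stack.apis.inference import ToolConfig

# Assumed field/value: "replace" swaps the default tool-calling system prompt
# for the caller's own instructions instead of appending to it.
tool_config = ToolConfig(system_message_behavior="replace")
```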
## Test Plan
```
LLAMA_STACK_CONFIG=together pytest --inference-model=meta-llama/Llama-3.3-70B-Instruct -s -v tests/client-sdk/agents/test_agents.py::test_override_system_message_behavior
```
## Sources
Please link relevant resources if necessary.
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
datasets.rst was removed from the torchtune repo.
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
Replace a missing (404) document with another one that exists. (Removed it
from the list when memory_optimizations.rst was already pulled.)
## Test Plan
Please describe:
- tests you ran to verify your changes with result summaries.
- provide instructions so it can be reproduced.
## Sources
Please link relevant resources if necessary.
## Before submitting
- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
The example script can gracefully exit if the boolean returned from
`initialize()` is used properly; see the sketch below.
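A minimal sketch of the pattern, assuming the library client is constructed from a template name:
```python
import sys

from llama_stack.distribution.library_client import LlamaStackAsLibraryClient

client = LlamaStackAsLibraryClient("ollama")  # template name is just an example
if not client.initialize():
    print("llama stack not built properly; run `llama stack build` first")
    sys.exit(1)
```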
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
The current default system prompt for llama3.2 tends to overindex on
tool calling and doesn't work well when the prompt does not require tool
calling.
This PR adds an option to override the default system prompt, and
organizes tool-related configs into a new config object.
- [ ] Addresses issue (#issue)
## Test Plan
```
python -m unittest llama_stack.providers.tests.inference.test_prompt_adapter
```
## Sources
Please link relevant resources if necessary.
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/937).
* #938
* __->__ #937
This fixes the following timeout issue when installing PyTorch via uv.
Also see reference: https://github.com/astral-sh/uv/pull/1694,
https://github.com/astral-sh/uv/issues/1549
```
Installing pip dependencies
Using Python 3.10.16 environment at: /home/yutang/.conda/envs/distribution-myenv
× Failed to download and build `antlr4-python3-runtime==4.9.3`
├─▶ Failed to extract archive
├─▶ failed to unpack
│ `/home/yutang/.cache/uv/sdists-v7/.tmpDWX4iK/antlr4-python3-runtime-4.9.3/src/antlr4/ListTokenSource.py`
├─▶ failed to unpack
│ `antlr4-python3-runtime-4.9.3/src/antlr4/ListTokenSource.py` into
│ `/home/yutang/.cache/uv/sdists-v7/.tmpDWX4iK/antlr4-python3-runtime-4.9.3/src/antlr4/ListTokenSource.py`
├─▶ error decoding response body
├─▶ request or response body error
╰─▶ operation timed out
help: `antlr4-python3-runtime` (v4.9.3) was included because `torchtune`
(v0.5.0) depends on `omegaconf` (v2.3.0) which depends on
`antlr4-python3-runtime>=4.9.dev0, <4.10.dev0`
Failed to build target distribution-myenv with return code 1
```
---------
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
To work with the updated iOSCalendarAssistantWithLocalInf
[here](https://github.com/meta-llama/llama-stack-apps/compare/ios_local).
In short, provide a summary of what this PR does and why. Usually, the
relevant context should be present in a linked issue.
- [ ] Addresses issue (#issue)
## Test Plan
Please describe:
- tests you ran to verify your changes with result summaries.
- provide instructions so it can be reproduced.
## Sources
Please link relevant resources if necessary.
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
The lint check on the main branch is failing. This fixes the lint check after we
moved to ruff in https://github.com/meta-llama/llama-stack/pull/921. We
need to move to a `ruff.toml` file as well as fix and ignore some
additional checks.
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
add support to the NVIDIA Inference provider for image inputs
## Test Plan
1. Run local [Llama 3.2 11b vision
instruct](https://build.nvidia.com/meta/llama-3.2-11b-vision-instruct?snippet_tab=Docker)
NIM
2. Start a stack, e.g. `llama stack run
llama_stack/templates/nvidia/run.yaml --env
NVIDIA_BASE_URL=http://localhost:8000`
3. Run image tests, e.g. `LLAMA_STACK_BASE_URL=http://localhost:8321
pytest -v tests/client-sdk/inference/test_inference.py
--vision-inference-model meta-llama/Llama-3.2-11B-Vision-Instruct -k
image`
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.
# What does this PR do?
See issue: #747 -- `uv` is just plain better. This PR does the bare
minimum of replacing `pip install` by `uv pip install` and ensuring `uv`
exists in the environment.
## Test Plan
First: create new conda, `uv pip install -e .` on `llama-stack` -- all
is good.
Next: run `llama stack build --template together` followed by `llama
stack run together` -- all good
Next: run `llama stack build --template together --image-name yoyo`
followed by `llama stack run together --image-name yoyo` -- all good
Next: fresh conda and `uv pip install -e .` and `llama stack build
--template together --image-type venv` -- all good.
Docker: `llama stack build --template together --image-type container`
works!
This commit adds support for XPU and CPU devices to the meta-reference
stack for text models. On creation, the stack automatically identifies which
device to use by checking the available accelerator capabilities in the
following order: CUDA, then XPU, and finally CPU. This behaviour can be
overridden with the `DEVICE` environment variable, in which case the
explicitly specified device will be used.
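An illustrative sketch of that selection order (details may differ from the actual implementation):
```python
import os

import torch

def resolve_device() -> str:
    env_device = os.environ.get("DEVICE")
    if env_device:
        return env_device  # explicit override wins
    if torch.cuda.is_available():
        return "cuda"
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return "xpu"
    return "cpu"
```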
Tested with:
```
torchrun pytest llama_stack/providers/tests/inference/test_text_inference.py -k meta_reference
```
Results:
* Tested on: a system with a single CUDA device, a system with a single XPU
device, and a pure CPU system
* Results: all tests pass except `test_completion_logprobs`
* `test_completion_logprobs` fails in the same way as on the baseline,
i.e. it is unrelated to this change: `AssertionError: Unexpected top_k=3`
Requires: https://github.com/meta-llama/llama-models/pull/233
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
# What does this PR do?
Fixes a bug where agents were not working when both rag and
code-interpreter were added as tools.
## Test Plan
Added a new client_sdk test which tests for this scenario
```
LLAMA_STACK_CONFIG=together pytest -s -v tests/client-sdk -k 'test_rag_and_code_agent'
```
---------
Co-authored-by: Hardik Shah <hjshah@fb.com>
# What does this PR do?
- Discussion in
https://github.com/meta-llama/llama-stack/pull/906#discussion_r1936260819
- `image.data` should accept a base64 string as input instead of binary
bytes; change `prompt_adapter` to account for that (see the sketch after this list).
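A complementary sketch of the conversion (the helper name is hypothetical):
```python
import base64

def image_data_to_raw_bytes(data: str) -> bytes:
    # The adapter decodes the base64 string back to raw bytes before
    # handing the image to the model.
    return base64.b64decode(data)
```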
## Test Plan
```
pytest -v tests/client-sdk/inference/test_inference.py
```
with test in https://github.com/meta-llama/llama-stack/pull/906
## Sources
Please link relevant resources if necessary.
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
# What does this PR do?
- Fix typo
- Support Llama 3.3 70B
## Test Plan
Run the following scripts and obtain the test results
Script
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming --env SAMBANOVA_API_KEY={API_KEY}
```
Result
```
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming[-sambanova] PASSED
=========================================== 1 passed, 1 warning in 1.26s ============================================
```
Script
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming --env SAMBANOVA_API_KEY={API_KEY}
```
Result
```
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming[-sambanova] PASSED
=========================================== 1 passed, 1 warning in 0.52s ============================================
```
## Sources
Please link relevant resources if necessary.
## Before submitting
- [N] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [Y] Ran pre-commit to handle lint / formatting issues.
- [Y] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [Y] Updated relevant documentation.
- [N] Wrote necessary unit or integration tests.
# What does this PR do?
1) As per @mattf's suggestion, we want to mark the pytest tests as xfail for
providers that do not support the functionality. In this diff, we xfail
the logprobs inference tests for providers that do not support log
probs (log probs are only supported by Together, Fireworks, and vLLM); see
the sketch after this list.
2) Added logprobs support for Together according to their developer
[doc](https://docs.together.ai/docs/logprobs).
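A sketch of the xfail gate (provider identifiers follow the `remote::<name>` convention visible in the test output below):
```python
import pytest

LOGPROBS_PROVIDERS = {"remote::together", "remote::fireworks", "remote::vllm"}

def maybe_xfail_logprobs(inference_provider_type: str) -> None:
    if inference_provider_type not in LOGPROBS_PROVIDERS:
        pytest.xfail(f"{inference_provider_type} doesn't support log probs yet")
```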
## Test Plan
1) Together & Fireworks
```
export LLAMA_STACK_CONFIG=/Users/sxyi/llama-stack/llama_stack/templates/together/run.yaml
/opt/miniconda3/envs/stack/bin/pytest -s -v /Users/sxyi/llama-stack/tests/client-sdk/inference/test_inference.py
```
```
tests/client-sdk/inference/test_inference.py::test_text_completion_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_completion_log_probs_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_completion_log_probs_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_text_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-What are the names of planets in our solar system?-Earth] PASSED
tests/client-sdk/inference/test_inference.py::test_text_chat_completion_non_streaming[meta-llama/Llama-3.1-8B-Instruct-What are the names of the planets that have rings around them?-Saturn] PASSED
tests/client-sdk/inference/test_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What's the name of the Sun in latin?-Sol] PASSED
tests/client-sdk/inference/test_inference.py::test_text_chat_completion_streaming[meta-llama/Llama-3.1-8B-Instruct-What is the name of the US captial?-Washington] PASSED
tests/client-sdk/inference/test_inference.py::test_text_chat_completion_with_tool_calling_and_non_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_text_chat_completion_with_tool_calling_and_streaming[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_text_chat_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_image_chat_completion_non_streaming[meta-llama/Llama-3.2-11B-Vision-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_image_chat_completion_streaming[meta-llama/Llama-3.2-11B-Vision-Instruct] PASSED
tests/client-sdk/inference/test_inference.py::test_image_chat_completion_base64_url[meta-llama/Llama-3.2-11B-Vision-Instruct] PASSED
========================================================================================== 15 passed, 2 warnings in 19.46s ===========================================================================================
```
```
export LLAMA_STACK_CONFIG=/Users/sxyi/llama-stack/llama_stack/templates/fireworks/run.yaml
/opt/miniconda3/envs/stack/bin/pytest -s -v /Users/sxyi/llama-stack/tests/client-sdk/inference/test_inference.py
```
All tests passed
2) Ollama - LogProbs tests are marked as xfailed.
```
tests/client-sdk/inference/test_inference.py::test_completion_log_probs_non_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote::ollama doesn't support log probs yet)
tests/client-sdk/inference/test_inference.py::test_completion_log_probs_streaming[meta-llama/Llama-3.1-8B-Instruct] XFAIL (remote::ollama doesn't support log probs yet)
```
## Sources
Please link relevant resources if necessary.
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
Fixes: #902
For the test, verified that llama stack can run if built:
* With the default "base" conda environment
* With a new custom conda environment using the `--image-name XXX` option
In both cases llama stack starts fine; it was failing with "base" before
this patch.
CC: @ashwinb
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
We desperately need to document our APIs. This is the basic requirement
of having a spec :)
This PR updates the OpenAPI generator so documentation for request
parameters and object fields can be properly added to the OpenAPI specs.
From there, this should get picked up by Stainless, etc.
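As a hedged illustration of the kind of field documentation that can now flow into the spec (pydantic `Field` descriptions here are just an example; the generator may source docstrings instead):
```python
from pydantic import BaseModel, Field

class CompletionRequestExample(BaseModel):
    model: str = Field(description="Identifier of the model to use.")
    content: str = Field(description="The prompt to complete.")
```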
## Test Plan:
Updated client-sdk (See
https://github.com/meta-llama/llama-stack-client-python/pull/104) and
then ran:
```bash
cd tests/client-sdk
LLAMA_STACK_CONFIG=../../llama_stack/templates/fireworks/run.yaml pytest -s -v inference/test_inference.py agents/test_agents.py
```
# What does this PR do?
Allows the template distribution to connect to a hosted or local NIM:
- use `--env NVIDIA_BASE_URL=http://localhost:8000` to connect to a local
NIM running at localhost:8000
- use `--env NVIDIA_API_KEY=blah` when connecting to a hosted NIM, e.g.
NVIDIA_BASE_URL=https://integrate.api.nvidia.com
## Test Plan
- `llama stack run ./llama_stack/templates/nvidia/run.yaml` -> error,
e.g. API key is required for hosted NVIDIA NIM
- `llama stack run ./llama_stack/templates/nvidia/run.yaml --env
NVIDIA_BASE_URL=https://integrate.api.nvidia.com` -> error, e.g. API key
is required for hosted NVIDIA NIM
- `llama stack run ./llama_stack/templates/nvidia/run.yaml --env
NVIDIA_API_KEY=REDACTED` -> successful connection to NIM on
https://integrate.api.nvidia.com
- `llama stack run ./llama_stack/templates/nvidia/run.yaml --env
NVIDIA_BASE_URL=https://integrate.api.nvidia.com --env
NVIDIA_API_KEY=REDACTED` -> successful connection to NIM running on
integrate.api.nvidia.com
- `llama stack run ./llama_stack/templates/nvidia/run.yaml --env
NVIDIA_BASE_URL=http://localhost:8000` -> successful connection to NIM
running on localhost:8000
- `llama stack run ./llama_stack/templates/nvidia/run.yaml --env
NVIDIA_BASE_URL=http://localhost:8000 --env NVIDIA_API_KEY=REDACTED` ->
successful connection to NIM running on http://localhost:8000
- `llama stack run ./llama_stack/templates/nvidia/run.yaml --env
NVIDIA_BASE_URL=http://bogus` -> runtime error, e.g. ConnectionError
(TODO: this should be a startup error)
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
# What does this PR do?
Fix a type mismatch in `/v1/inference/completion`.
## Test Plan
`llama stack run ./llama_stack/templates/nvidia/run.yaml`
`LLAMA_STACK_BASE_URL="http://localhost:8321" pytest -v
tests/client-sdk/inference/test_inference.py`
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
Chroma method had the wrong signature.
## Test Plan
Start Chroma: `chroma run --path /tmp/foo/chroma2 --host localhost
--port 6001`
Modify run.yaml to include Chroma server pointing to localhost:6001 and
run `llama stack run`
Then:
```bash
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -s -v agents/test_agents.py -k rag
```
passes
# What does this PR do?
Add a Windows platform run command for the stack.
- [x] Addresses issue (#issue)
## Test Plan
Please describe:
- tests you ran to verify your changes with result summaries.
- provide instructions so it can be reproduced.
## Sources
Please link relevant resources if necessary.
https://github.com/meta-llama/llama-stack/pull/889
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
# What does this PR do?
This PR implements Windows platform support for `build_container.sh`
execution from the terminal. Additionally, it resolves the "no support for
termios and PTY on Windows PCs" issues.
- [x] Addresses issue (#issue)
Related issues: https://github.com/meta-llama/llama-stack/issues/826,
https://github.com/meta-llama/llama-stack/issues/726
## Test Plan
Changes were tested manually by executing standard scripts from the Llama
guide:
- llama stack build --template ollama --image-type container
- llama stack build --list-templates
- llama stack build
## Sources
Please link relevant resources if necessary.
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
# What does this PR do?
Add a response format for agents' structured output.
- [ ] Using structured output for agents (interior_design app as an
example) (#issue)
https://github.com/meta-llama/llama-stack-apps/issues/122
## Test Plan
E2E test plan with llama-stack-apps interior_design
Please describe:
Test ran:
- provide instructions so it can be reproduced.
Start your distro:
```
llama stack run llama_stack/templates/fireworks/run.yaml --env FIREWORKS_API_KEY=<API_KEY>
```
Run api test:
```
PYTHONPATH=. python examples/interior_design_assistant/api.py localhost 5000 examples/interior_design_assistant/resources/documents/ examples/interior_design_assistant/resources/images/fireplaces
```
## Sources
Results:
https://github.com/meta-llama/llama-stack-client-python/pull/72
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
# What does this PR do?
Fixed report generation:
1) do not initialize a new client in report.py; instead get it from the
pytest fixture
2) add "provider" for the "safety" and "agents" sections
3) add logprobs functionality to the "inference" section
## Test Plan
See the regenerated report
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.