The provider selling point *is* using the same provider for both.
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
reduces duplication and centralizes information to be easier to find for
contributors
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
switch sambanova inference adaptor to LiteLLM usage to simplify
integration and solve issues with current adaptor when streaming and
tool calling, models and templates updated
## Test Plan
pytest -s -v tests/integration/inference/test_text_inference.py
--stack-config=sambanova
--text-model=sambanova/Meta-Llama-3.3-70B-Instruct
pytest -s -v tests/integration/inference/test_vision_inference.py
--stack-config=sambanova
--vision-model=sambanova/Llama-3.2-11B-Vision-Instruct
# Issue
Closes#2073
# What does this PR do?
- Removes the `datasets.rst` from the list of document urls as it no
longer exists in torchtune. Referenced PR:
https://github.com/pytorch/torchtune/pull/1781
- Added a step to run `uv sync`. Previously, I would get the following
error:
```
➜ llama-stack git:(remove-deprecated-rst) uv venv --python 3.10
source .venv/bin/activate
Using CPython 3.10.13 interpreter at: /usr/bin/python3.10
Creating virtual environment at: .venv
Activate with: source .venv/bin/activate
(llama-stack) ➜ llama-stack git:(remove-deprecated-rst) INFERENCE_MODEL=llama3.2:3b llama stack build --template ollama --image-type venv --run
zsh: llama: command not found...
```
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
To test: Run through `rag_agent` example in the `detailed_tutorial.md`
file.
[//]: # (## Documentation)
# What does this PR do?
For the Issue :-
#[2010](https://github.com/meta-llama/llama-stack/issues/2010)
Currently, if we try to connect the Llama stack server to a remote
Milvus instance that has TLS enabled, the connection fails because TLS
support is not implemented in the Llama stack codebase. As a result,
users are unable to use secured Milvus deployments out of the box.
After adding this , the user will be able to connect to remote::Milvus
which is TLS enabled .
if TLS enabled :-
```
vector_io:
- provider_id: milvus
provider_type: remote::milvus
config:
uri: "http://<host>:<port>"
token: "<user>:<password>"
secure: True
server_pem_path: "path/to/server.pem"
```
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
I have already tested it by connecting to a Milvus instance which is TLS
enabled and i was able to start llama stack server .
# What does this PR do?
The builtin implementation of code interpreter is not robust and has a
really weak sandboxing shell (the `bubblewrap` container). Given the
availability of better MCP code interpreter servers coming up, we should
use them instead of baking an implementation into the Stack and
expanding the vulnerability surface to the rest of the Stack.
This PR only does the removal. We will add examples with how to
integrate with MCPs in subsequent ones.
## Test Plan
Existing tests.
# What does this PR do?
This commit adds a new authentication system to the Llama Stack server
with support for Kubernetes and custom authentication providers. Key
changes include:
- Implemented KubernetesAuthProvider for validating Kubernetes service
account tokens
- Implemented CustomAuthProvider for validating tokens against external
endpoints - this is the same code that was already present.
- Added test for Kubernetes
- Updated server configuration to support authentication settings
- Added documentation for authentication configuration and usage
The authentication system supports:
- Bearer token validation
- Kubernetes service account token validation
- Custom authentication endpoints
## Test Plan
Setup a Kube cluster using Kind or Minikube.
Run a server with:
```
server:
port: 8321
auth:
provider_type: kubernetes
config:
api_server_url: http://url
ca_cert_path: path/to/cert (optional)
```
Run:
```
curl -s -L -H "Authorization: Bearer $(kubectl create token my-user)" http://127.0.0.1:8321/v1/providers
```
Or replace "my-user" with your service account.
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Implemetation of NeMO Datastore register, unregister API.
Open Issues:
- provider_id gets set to `localfs` in client.datasets.register() as it
is specified in routing_tables.py: DatasetsRoutingTable
see: #1860
Currently I have passed `"provider_id":"nvidia"` in metadata and have
parsed that in `DatasetsRoutingTable`
(Not the best approach, but just a quick workaround to make it work for
now.)
## Test Plan
- Unit test cases: `pytest
tests/unit/providers/nvidia/test_datastore.py`
```bash
========================================================== test session starts ===========================================================
platform linux -- Python 3.10.0, pytest-8.3.5, pluggy-1.5.0
rootdir: /home/ubuntu/llama-stack
configfile: pyproject.toml
plugins: anyio-4.9.0, asyncio-0.26.0, nbval-0.11.0, metadata-3.1.1, html-4.1.1, cov-6.1.0
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 2 items
tests/unit/providers/nvidia/test_datastore.py .. [100%]
============================================================ warnings summary ============================================================
====================================================== 2 passed, 1 warning in 0.84s ======================================================
```
cc: @dglogo, @mattf, @yanxi0830
adding the --gpu all flag to Docker run commands
for meta-reference-gpu distributions ensures models are loaded into GPU
instead of CPU.
Remove docs for meta-reference-quantized-gpu
The distribution was removed in #1887
but these files were left behind.
Fixes: #1798
# What does this PR do?
Fixes doc to add --gpu all command to docker run
[//]: # (If resolving an issue, uncomment and update the line below)
Closes#1798
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
verified in docker documentation but untested
---------
Signed-off-by: Derek Higgins <derekh@redhat.com>
# What does this PR do?
IBM watsonx ai added as the inference [#1741
](https://github.com/meta-llama/llama-stack/issues/1741)
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
---------
Co-authored-by: Sajikumar JS <sajikumar.js@ibm.com>
# What does this PR do?
Adds custom model registration functionality to NVIDIAInferenceAdapter
which let's the inference happen on:
- post-training model
- non-llama models in API Catalogue(behind
https://integrate.api.nvidia.com and endpoints compatible with
AyncOpenAI)
## Example Usage:
```python
from llama_stack.apis.models import Model, ModelType
from llama_stack.distribution.library_client import LlamaStackAsLibraryClient
client = LlamaStackAsLibraryClient("nvidia")
_ = client.initialize()
client.models.register(
model_id=model_name,
model_type=ModelType.llm,
provider_id="nvidia"
)
response = client.inference.chat_completion(
model_id=model_name,
messages=[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"Write a limerick about the wonders of GPU computing."}],
)
```
## Test Plan
```bash
pytest tests/unit/providers/nvidia/test_supervised_fine_tuning.py
========================================================== test session starts ===========================================================
platform linux -- Python 3.10.0, pytest-8.3.5, pluggy-1.5.0
rootdir: /home/ubuntu/llama-stack
configfile: pyproject.toml
plugins: anyio-4.9.0
collected 6 items
tests/unit/providers/nvidia/test_supervised_fine_tuning.py ...... [100%]
============================================================ warnings summary ============================================================
../miniconda/envs/nvidia-1/lib/python3.10/site-packages/pydantic/fields.py:1076
/home/ubuntu/miniconda/envs/nvidia-1/lib/python3.10/site-packages/pydantic/fields.py:1076: PydanticDeprecatedSince20: Using extra keyword arguments on `Field` is deprecated and will be removed. Use `json_schema_extra` instead. (Extra keys: 'contentEncoding'). Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
====================================================== 6 passed, 1 warning in 1.51s ======================================================
```
[//]: # (## Documentation)
Updated Readme.md
cc: @dglogo, @sumitb, @mattf
# What does this PR do?
This PR adds support for NVIDIA's NeMo Evaluator API to the Llama Stack
eval module. The integration enables users to evaluate models via the
Llama Stack interface.
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
1. Added unit tests and successfully ran from root of project:
`./scripts/unit-tests.sh tests/unit/providers/nvidia/test_eval.py`
```
tests/unit/providers/nvidia/test_eval.py::TestNVIDIAEvalImpl::test_job_cancel PASSED
tests/unit/providers/nvidia/test_eval.py::TestNVIDIAEvalImpl::test_job_result PASSED
tests/unit/providers/nvidia/test_eval.py::TestNVIDIAEvalImpl::test_job_status PASSED
tests/unit/providers/nvidia/test_eval.py::TestNVIDIAEvalImpl::test_register_benchmark PASSED
tests/unit/providers/nvidia/test_eval.py::TestNVIDIAEvalImpl::test_run_eval PASSED
```
2. Verified I could build the Llama Stack image: `LLAMA_STACK_DIR=$(pwd)
llama stack build --template nvidia --image-type venv`
Documentation added to
`llama_stack/providers/remote/eval/nvidia/README.md`
---------
Co-authored-by: Jash Gulabrai <jgulabrai@nvidia.com>
# What does this PR do?
the ramalama team has decided to rename their external provider
`ramalama-stack` (more catchy!). Update docs accordingly
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
Add examples for how to define RAGDocuments. Not sure if this is the
best place for these docs. @raghotham Please advise
## Test Plan
None, documentation
[//]: # (## Documentation)
Signed-off-by: Kevin <kpostlet@redhat.com>
# What does this PR do?
- Update NVIDIA documentation links to GA docs
- Remove reference to notebooks until merged
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
[//]: # (## Documentation)
Co-authored-by: Jash Gulabrai <jgulabrai@nvidia.com>
# What does this PR do?
This is helpful when debugging issues with vLLM + Llama Stack after this
PR https://github.com/vllm-project/vllm/pull/15593
---------
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
This was not caught as part of the CI build:
dd62a2388c.
[This PR](https://github.com/meta-llama/llama-stack/pull/1354) was too
old and didn't include the additional CI builds yet.
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
- Adds a note about unexpected Brave Search output appearing even when
Tavily Search is called. This behavior is expected for now and is a work
in progress https://github.com/meta-llama/llama-stack/issues/1229. The
note aims to clear any confusion for new users.
- Adds two example scripts demonstrating how to build an agent using:
1. WebSearch tool
2. WolframAlpha tool
These examples provide new users with an instant understanding of how to
integrate these tools.
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
Tested these example scripts using following steps:
step 1. `ollama run llama3.2:3b-instruct-fp16 --keepalive 60m`
step 2.
```
export INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
export LLAMA_STACK_PORT=8321
```
step 3: `llama stack run --image-type conda
~/llama-stack/llama_stack/templates/ollama/run.yaml`
step 4: run the example script with your api keys.
expected output:


[//]: # (## Documentation)
# What does this PR do?
Add NVIDIA platform docs that serve as a starting point for Llama Stack
users and explains all supported microservices.
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
[//]: # (## Documentation)
---------
Co-authored-by: Jash Gulabrai <jgulabrai@nvidia.com>
# What does this PR do?
PR adds instructions to setup vLLM remote endpoint for vllm-remote llama
stack distribution.
## Test Plan
* Verified with manual tests of the configured vllm-remote against vllm
endpoint running on the system with Intel GPU
* Also verified with ci pytests (see cmdline below). Test passes in the
same capacity as it does on the A10 Nvidia setup (some tests do fail
which seems to be known issues with vllm remote llama stack
distribution)
```
pytest -s -v tests/integration/inference/test_text_inference.py \
--stack-config=http://localhost:5001 \
--text-model=meta-llama/Llama-3.2-3B-Instruct
```
CC: @ashwinb
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
# What does this PR do?
TLDR: Changes needed to get 100% passing tests for OpenAI API
verification tests when run against Llama Stack with the `together`,
`fireworks`, and `openai` providers. And `groq` is better than before,
at 88% passing.
This cleans up the OpenAI API support for image message types
(specifically `image_url` types) and handling of the `response_format`
chat completion parameter. Both of these required a few more Pydantic
model definitions in our Inference API, just to move from the
not-quite-right stubs I had in place to something fleshed out to match
the actual OpenAI API specs.
As part of testing this, I also found and fixed a bug in the litellm
implementation of openai_completion and openai_chat_completion, so the
providers based on those should actually be working now.
The method `prepare_openai_completion_params` in
`llama_stack/providers/utils/inference/openai_compat.py` was improved to
actually recursively clean up input parameters, including handling of
lists, dicts, and dumping of Pydantic models to dicts. These changes
were required to get to 100% passing tests on the OpenAI API
verification against the `openai` provider.
With the above, the together.ai provider was passing as well as it is
without Llama Stack. But, since we have Llama Stack in the middle, I
took the opportunity to clean up the together.ai provider so that it now
also passes the OpenAI API spec tests we have at 100%. That means
together.ai is now passing our verification test better when using an
OpenAI client talking to Llama Stack than it is when hitting together.ai
directly, without Llama Stack in the middle.
And, another round of work for Fireworks to improve translation of
incoming OpenAI chat completion requests to Llama Stack chat completion
requests gets the fireworks provider passing at 100%. The server-side
fireworks.ai tool calling support with OpenAI chat completions and Llama
4 models isn't great yet, but by pointing the OpenAI clients at Llama
Stack's API we can clean things up and get everything working as
expected for Llama 4 models.
## Test Plan
### OpenAI API Verification Tests
I ran the OpenAI API verification tests as below and 100% of the tests
passed.
First, start a Llama Stack server that runs the `openai` provider with
the `gpt-4o` and `gpt-4o-mini` models deployed. There's not a template
setup to do this out of the box, so I added a
`tests/verifications/openai-api-verification-run.yaml` to do this.
First, ensure you have the necessary API key environment variables set:
```
export TOGETHER_API_KEY="..."
export FIREWORKS_API_KEY="..."
export OPENAI_API_KEY="..."
```
Then, run a Llama Stack server that serves up all these providers:
```
llama stack run \
--image-type venv \
tests/verifications/openai-api-verification-run.yaml
```
Finally, generate a new verification report against all these providers,
both with and without the Llama Stack server in the middle.
```
python tests/verifications/generate_report.py \
--run-tests \
--provider \
together \
fireworks \
groq \
openai \
together-llama-stack \
fireworks-llama-stack \
groq-llama-stack \
openai-llama-stack
```
You'll see that most of the configurations with Llama Stack in the
middle now pass at 100%, even though some of them do not pass at 100%
when hitting the backend provider's API directly with an OpenAI client.
### OpenAI Completion Integration Tests with vLLM:
I also ran the smaller `test_openai_completion.py` test suite (that's
not yet merged with the verification tests) on multiple of the
providers, since I had to adjust the method signature of
openai_chat_completion a bit and thus had to touch lots of these
providers to match. Here's the tests I ran there, all passing:
```
VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" llama stack build --template remote-vllm --image-type venv --run
```
in another terminal
```
LLAMA_STACK_CONFIG=http://localhost:8321 INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "meta-llama/Llama-3.2-3B-Instruct"
```
### OpenAI Completion Integration Tests with ollama
```
INFERENCE_MODEL="llama3.2:3b-instruct-q8_0" llama stack build --template ollama --image-type venv --run
```
in another terminal
```
LLAMA_STACK_CONFIG=http://localhost:8321 INFERENCE_MODEL="llama3.2:3b-instruct-q8_0" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "llama3.2:3b-instruct-q8_0"
```
### OpenAI Completion Integration Tests with together.ai
```
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct-Turbo" llama stack build --template together --image-type venv --run
```
in another terminal
```
LLAMA_STACK_CONFIG=http://localhost:8321 INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct-Turbo" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "meta-llama/Llama-3.2-3B-Instruct-Turbo"
```
### OpenAI Completion Integration Tests with fireworks.ai
```
INFERENCE_MODEL="meta-llama/Llama-3.1-8B-Instruct" llama stack build --template fireworks --image-type venv --run
```
in another terminal
```
LLAMA_STACK_CONFIG=http://localhost:8321 INFERENCE_MODEL="meta-llama/Llama-3.1-8B-Instruct" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "meta-llama/Llama-3.1-8B-Instruct"
---------
Signed-off-by: Ben Browning <bbrownin@redhat.com>
# What does this PR do?
current text for 'llama stack build' and 'llama stack run' says that if
no argument is passed to '--image-name' that the active Conda
environment will be used
in reality, the active enviroment is used whether it is from conda,
virtualenv, etc.
## Test Plan
N/A
## Documentation
N/A
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
Small docs update and an update for `start-stack.sh` with missing color
and if statment logic.
# What does this PR do?
1. Makes a small change to start-stack.sh to resolve this error:
```cmd
/home/aireilly/.local/lib/python3.13/site-packages/llama_stack/distribution/start_stack.sh: line 76: [: missing ]'
```
2. Adds a missing $GREEN colour to start-stack.sh
3. Updated `docs/source/getting_started/detailed_tutorial.md` with some
small changes and corrections.
## Test Plan
Procedures described in
`docs/source/getting_started/detailed_tutorial.md` were verified on
Linux Fedora 41.
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
[//]: # (## Documentation)
---------
Co-authored-by: Francisco Arceo <farceo@redhat.com>
# What does this PR do?
Incorporating some feedback into the docs.
- **`docs/source/getting_started/index.md`:**
- Demo actually does RAG now
- Simplified the installation command for dependencies.
- Updated demo script examples to align with the latest API changes.
- Replaced manual document manipulation with `RAGDocument` for clarity
and maintainability.
- Introduced new logic for model and embedding selection using the Llama
Stack Client SDK.
- Enhanced examples to showcase proper agent initialization and logging.
- **`docs/source/getting_started/detailed_tutorial.md`:**
- Updated the section for listing models to include proper code
formatting with `bash`.
- Removed and reorganized the "Run the Demos" section for clarity.
- Adjusted tab-item structures and added new instructions for demo
scripts.
- **`docs/_static/css/my_theme.css`:**
- Updated heading styles to include `h2`, `h3`, and `h4` for consistent
font weight.
- Added a new style for `pre` tags to wrap text and break long words,
this is particularly useful for rendering long output from generation.
## Test Plan
Tested locally. Screenshot for reference:
<img width="1250" alt="Screenshot 2025-04-10 at 10 12 12 PM"
src="https://github.com/user-attachments/assets/ce1c8986-e072-4c6f-a697-ed0d8fb75b34"
/>
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]
Fixes a couple of errors in PVC/Secret setup and adds context for
expected Hugging Face token
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
[//]: # (## Documentation)
# What does this PR do?
Another doc enhancement for
https://github.com/meta-llama/llama-stack/issues/1818
Summary of changes:
- `docs/source/distributions/configuration.md`
- Updated dropdown title to include a more user-friendly description.
- `docs/_static/css/my_theme.css`
- Added styling for `<h3>` elements to set a normal font weight.
- `docs/source/distributions/starting_llama_stack_server.md`
- Changed section headers from bold text to proper markdown headers
(e.g., `##`).
- Improved descriptions for starting Llama Stack server using different
methods (library, container, conda, Kubernetes).
- Enhanced clarity and structure by converting instructions into
markdown headers and improved formatting.
- `docs/source/getting_started/index.md`
- Major restructuring of the "Quick Start" guide:
- Added new introductory section for Llama Stack and its capabilities.
- Reorganized steps into clearer subsections with proper markdown
headers.
- Replaced dropdowns with tabbed content for OS-specific instructions.
- Added detailed steps for setting up and running the Llama Stack server
and client.
- Introduced new sections for running basic inference and building
agents.
- Enhanced readability and visual structure with emojis, admonitions,
and examples.
- `docs/source/providers/index.md`
- Updated the list of LLM inference providers to include "Ollama."
- Expanded the list of vector databases to include "SQLite-Vec."
Let me know if you need further details!
## Test Plan
Renders locally, included screenshot.
# Documentation
For https://github.com/meta-llama/llama-stack/issues/1818
<img width="1332" alt="Screenshot 2025-04-09 at 11 07 12 AM"
src="https://github.com/user-attachments/assets/c106efb9-076c-4059-a4e0-a30fa738585b"
/>
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
* Manage UI deps in pyproject
* Use a new "ui" dep group to pull the deps with "uv"
* Simplify the run command
* Bump versions in requirements.txt
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
1. Adding some lightweight JS to detect the default browser setting for
dark/light mode
3. Setting default screen setting to light mode as to not change default
behavior.
From the docs: https://github.com/MrDogeBro/sphinx_rtd_dark_mode
>This lets you choose which theme the user sees when they load the docs
for the first time ever. After the first time however, this setting has
no effect as the users preference is stored in local storage within
their browser. This option accepts a boolean for the value. If this
option is true (the default option), users will start in dark mode when
first visiting the site. If this option is false, users will start in
light mode when they first visit the site.
# Closes#1915
## Test Plan
Tested locally on my Mac on Safari and Chrome.
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
download the getting started w/ ollama model instead of downloading and
running it.
directly running it was necessary before
https://github.com/meta-llama/llama-stack/pull/1854
## Test Plan
run the code on the page
# What does this PR do?
Providers that live outside of the llama-stack codebase are now
supported.
A new property `external_providers_dir` has been added to the main
config and can be configured as follow:
```
external_providers_dir: /etc/llama-stack/providers.d/
```
Where the expected structure is:
```
providers.d/
inference/
custom_ollama.yaml
vllm.yaml
vector_io/
qdrant.yaml
```
Where `custom_ollama.yaml` is:
```
adapter:
adapter_type: custom_ollama
pip_packages: ["ollama", "aiohttp"]
config_class: llama_stack_ollama_provider.config.OllamaImplConfig
module: llama_stack_ollama_provider
api_dependencies: []
optional_api_dependencies: []
```
Obviously the package must be installed on the system, here is the
`llama_stack_ollama_provider` example:
```
$ uv pip show llama-stack-ollama-provider
Using Python 3.10.16 environment at: /Users/leseb/Documents/AI/llama-stack/.venv
Name: llama-stack-ollama-provider
Version: 0.1.0
Location: /Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.10/site-packages
Editable project location: /private/var/folders/mq/rnm5w_7s2d3fxmtkx02knvhm0000gn/T/tmp.ZBHU5Ezxg4/ollama/llama-stack-ollama-provider
Requires:
Required-by:
```
Closes: https://github.com/meta-llama/llama-stack/issues/658
Signed-off-by: Sébastien Han <seb@redhat.com>
Add the content to use AMD GPU as the vLLM server. Split the original
part to two sub chapters,
1. AMD vLLM server
2. NVIDIA vLLM server (orignal)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
[//]: # (## Documentation)
---------
Signed-off-by: Alex He <alehe@amd.com>
# What does this PR do?
## Test Plan
export MODEL=accounts/fireworks/models/llama4-scout-instruct-basic;
LLAMA_STACK_CONFIG=verification pytest -s -v tests/integration/inference
--vision-model $MODEL --text-model $MODEL