# What does this PR do?
Fixes an `ImportError` in the NVIDIA inference provider: `CompletionMessage` can no longer be imported from `llama_models.llama3.api.datatypes`, so the adapter fails at import time (traceback below).
- [x] Addresses issue (#issue)
```
from .nvidia import NVIDIAInferenceAdapter
File "/localhome/local-cdgamarose/llama-stack/llama_stack/providers/remote/inference/nvidia/nvidia.py", line 37, in <module>
from .openai_utils import (
File "/localhome/local-cdgamarose/llama-stack/llama_stack/providers/remote/inference/nvidia/openai_utils.py", line 11, in <module>
from llama_models.llama3.api.datatypes import (
ImportError: cannot import name 'CompletionMessage' from 'llama_models.llama3.api.datatypes' (/localhome/local-cdgamarose/.local/lib/python3.10/site-packages/llama_models/llama3/api/datatypes.py)
++ error_handler 62
```
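For reference, the failure can be reproduced with a bare import of the adapter package (a minimal sketch, assuming a checkout with the affected `llama_models` version installed):
```python
# Minimal repro sketch for the ImportError above: importing the NVIDIA adapter
# module pulls in openai_utils, which (before this fix) imported
# CompletionMessage from llama_models.llama3.api.datatypes.
from llama_stack.providers.remote.inference.nvidia.nvidia import NVIDIAInferenceAdapter  # noqa: F401
```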
## Test Plan
Deploy the NIM using Docker, following
https://build.nvidia.com/meta/llama-3_1-8b-instruct?snippet_tab=Docker
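Before running the tests, a quick smoke test can confirm the NIM is up. A minimal sketch, assuming the standard NIM readiness and OpenAI-compatible model-listing endpoints on the port used below:
```python
import requests

BASE_URL = "http://localhost:8000"

# NIM exposes a readiness probe; 200 means the model is loaded and serving.
ready = requests.get(f"{BASE_URL}/v1/health/ready", timeout=10)
print("ready:", ready.status_code == 200)

# The OpenAI-compatible /v1/models endpoint lists the served model ids.
models = requests.get(f"{BASE_URL}/v1/models", timeout=10).json()
print("models:", [m["id"] for m in models.get("data", [])])
```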
```
(lsmyenv) local-cdgamarose@a4u8g-0006:~/llama-stack$ python3 -m pytest -s -v --providers inference=nvidia llama_stack/providers/tests/inference/ --env NVIDIA_BASE_URL=http://localhost:8000 -k test_completion --inference-model Llama3.1-8B-Instruct
======================================================================================== test session starts =========================================================================================
platform linux -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /localhome/local-cdgamarose/anaconda3/envs/lsmyenv/bin/python3
cachedir: .pytest_cache
rootdir: /localhome/local-cdgamarose/llama-stack
configfile: pyproject.toml
plugins: anyio-4.7.0, asyncio-0.25.0
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None
collected 24 items / 21 deselected / 3 selected
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion[-nvidia] Initializing NVIDIAInferenceAdapter(http://localhost:8000)...
Checking NVIDIA NIM health...
Checking NVIDIA NIM health...
PASSED
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_logprobs[-nvidia] SKIPPED (Other inference providers don't support completion() yet)
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output[-nvidia] SKIPPED (This test is not quite robust)
====================================================================== 1 passed, 2 skipped, 21 deselected, 2 warnings in 1.57s =======================================================================
```
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.
# What does this PR do?
Add the completion API to the NVIDIA inference provider.
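For context, the NIM exposes an OpenAI-compatible `/v1/completions` endpoint; here is a hedged sketch of the underlying request shape the new completion path maps onto (model name and URL assume the local deployment described below; this is not the adapter code itself):
```python
import requests

# Hedged sketch of the request the adapter's completion API translates into.
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "meta/llama-3.1-8b-instruct",
        "prompt": "The capital of France is",
        "max_tokens": 16,
        "temperature": 0.0,
    },
    timeout=30,
)
print(resp.json()["choices"][0]["text"])
```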
## Test Plan
While running the meta/llama-3.1-8b-instruct NIM from
https://build.nvidia.com/meta/llama-3_1-8b-instruct?snippet_tab=Docker
```
➜ pytest -s -v --providers inference=nvidia llama_stack/providers/tests/inference/ --env NVIDIA_BASE_URL=http://localhost:8000 -k test_completion --inference-model Llama3.1-8B-Instruct
=============================================== test session starts ===============================================
platform linux -- Python 3.10.15, pytest-8.3.3, pluggy-1.5.0 -- /home/matt/.conda/envs/stack/bin/python
cachedir: .pytest_cache
rootdir: /home/matt/Documents/Repositories/meta-llama/llama-stack
configfile: pyproject.toml
plugins: anyio-4.6.2.post1, asyncio-0.24.0, httpx-0.34.0
asyncio: mode=strict, default_loop_scope=None
collected 20 items / 18 deselected / 2 selected
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion[-nvidia] PASSED
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output[-nvidia] SKIPPED
============================= 1 passed, 1 skipped, 18 deselected, 6 warnings in 5.40s =============================
```
The structured output functionality works, but the accuracy assertion in the test fails (hence the skip above).
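Here, structured output means constraining generation to a JSON schema. A hedged sketch of such a request made directly against the NIM, assuming NVIDIA's non-standard `nvext.guided_json` extension field (an assumption on my part; check the NIM docs for the current shape):
```python
import requests

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "year": {"type": "integer"}},
    "required": ["name", "year"],
}

# Assumption: NIM's structured-generation extension accepts the schema under
# the non-standard "nvext" field of a chat completion request.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta/llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": "Who founded NVIDIA and when?"}],
        "nvext": {"guided_json": schema},
    },
    timeout=30,
)
print(resp.json()["choices"][0]["message"]["content"])
```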
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.
# What does this PR do?
this PR adds a basic inference adapter to NVIDIA NIMs
what it does -
- chat completion api
- tool calls
- streaming
- structured output
- logprobs
- support hosted NIM on integrate.api.nvidia.com
- support downloaded NIM containers
what it does not do -
- completion api
- embedding api
- vision models
- builtin tools
- have certainty that sampling strategies are correct
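A hedged sketch of the protocol those features map onto: a NIM, hosted or local, speaks the OpenAI chat-completions API, so a streaming call against the hosted endpoint looks roughly like this (model name and endpoint are taken from the list above; this is the wrapped surface, not the adapter code itself):
```python
import os

from openai import OpenAI

# Sketch only: the adapter wraps this OpenAI-compatible protocol.
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

stream = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    stream=True,  # streaming is one of the supported features listed above
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```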
## Feature/Issue validation/testing/test plan
`pytest -s -v --providers inference=nvidia llama_stack/providers/tests/inference/ --env NVIDIA_API_KEY=...`

All tests should pass. There are Pydantic v1 warnings.
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Did you read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Was this discussed/approved via a GitHub issue? Please add a link
to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
- [x] Did you write any new necessary tests?
Thanks for contributing 🎉!