llama-stack/llama_stack/providers/remote/inference/nvidia

fix: NVIDIA embedding results in InternalServerError (#1851)
Rashmi Pawar (c169c164b3), 2025-04-01 13:31:29 +02:00

Closes #1819

## Test Plan

```bash
pytest -v tests/integration/inference/test_embedding.py  --stack-config=http://localhost:5002 --embedding-model=nvidia/llama-3.2-nv-embedqa-1b-v2
=============================================================================== test session starts ================================================================================
platform linux -- Python 3.10.0, pytest-8.3.5, pluggy-1.5.0 -- /home/ubuntu/miniconda/envs/nvidia-1/bin/python
cachedir: .pytest_cache
rootdir: /home/ubuntu/llama-stack
configfile: pyproject.toml
plugins: anyio-4.9.0
collected 23 items                                                                                                                                                                 

tests/integration/inference/test_embedding.py::test_embedding_text[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-list[string]] PASSED                                                [  4%]
tests/integration/inference/test_embedding.py::test_embedding_text[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-list[text]] PASSED                                                  [  8%]
tests/integration/inference/test_embedding.py::test_embedding_image[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-list[url,base64]] XFAIL (nvidia/llama-3.2-nv-embedqa-1b-v2 doe...) [ 13%]
tests/integration/inference/test_embedding.py::test_embedding_image[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-list[url,string,base64,text]] XFAIL (nvidia/llama-3.2-nv-embed...) [ 17%]
tests/integration/inference/test_embedding.py::test_embedding_truncation[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-long-end] PASSED                                              [ 21%]
tests/integration/inference/test_embedding.py::test_embedding_truncation[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-long-start] PASSED                                            [ 26%]
tests/integration/inference/test_embedding.py::test_embedding_truncation[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-short-end] PASSED                                             [ 30%]
tests/integration/inference/test_embedding.py::test_embedding_truncation[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-short-start] PASSED                                           [ 34%]
tests/integration/inference/test_embedding.py::test_embedding_truncation_error[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-long-text-None] PASSED                                  [ 39%]
tests/integration/inference/test_embedding.py::test_embedding_truncation_error[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-long-text-none] PASSED                                  [ 43%]
tests/integration/inference/test_embedding.py::test_embedding_truncation_error[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-long-str-None] PASSED                                   [ 47%]
tests/integration/inference/test_embedding.py::test_embedding_truncation_error[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-long-str-none] PASSED                                   [ 52%]
tests/integration/inference/test_embedding.py::test_embedding_output_dimension[emb=nvidia/llama-3.2-nv-embedqa-1b-v2] PASSED                                                 [ 56%]
tests/integration/inference/test_embedding.py::test_embedding_task_type[emb=nvidia/llama-3.2-nv-embedqa-1b-v2] PASSED                                                        [ 60%]
tests/integration/inference/test_embedding.py::test_embedding_text_truncation[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-None] PASSED                                             [ 65%]
tests/integration/inference/test_embedding.py::test_embedding_text_truncation[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-none] PASSED                                             [ 69%]
tests/integration/inference/test_embedding.py::test_embedding_text_truncation[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-end] PASSED                                              [ 73%]
tests/integration/inference/test_embedding.py::test_embedding_text_truncation[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-start] PASSED                                            [ 78%]
tests/integration/inference/test_embedding.py::test_embedding_text_truncation_error[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-NONE] PASSED                                       [ 82%]
tests/integration/inference/test_embedding.py::test_embedding_text_truncation_error[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-END] PASSED                                        [ 86%]
tests/integration/inference/test_embedding.py::test_embedding_text_truncation_error[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-START] PASSED                                      [ 91%]
tests/integration/inference/test_embedding.py::test_embedding_text_truncation_error[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-left] PASSED                                       [ 95%]
tests/integration/inference/test_embedding.py::test_embedding_text_truncation_error[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-right] PASSED                                      [100%]

===================================================================== 21 passed, 2 xfailed, 1 warning in 7.18s =====================================================================
```
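For reference, below is a minimal sketch (not part of this PR) of exercising the embeddings API by hand against the same locally running stack the tests target. It assumes a stack at http://localhost:5002 with `nvidia/llama-3.2-nv-embedqa-1b-v2` registered as an embedding model; the `llama_stack_client` call and response shape shown are an approximation of what the integration tests exercise and may vary across client versions.

```python
# Hypothetical manual check, assuming the stack from the test command above is
# running at http://localhost:5002 with the NVIDIA embedding model registered.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5002")

response = client.inference.embeddings(
    model_id="nvidia/llama-3.2-nv-embedqa-1b-v2",
    contents=["What is the capital of France?"],
)

# response.embeddings is expected to hold one vector (a list of floats) per input.
print(len(response.embeddings), len(response.embeddings[0]))
```

The truncation, output-dimension, and task-type behaviors covered by the tests above are driven by additional keyword arguments to the same embeddings call.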


cc: @dglogo @mattf @sumitb

| File | Last commit | Date |
| --- | --- | --- |
| __init__.py | add NVIDIA NIM inference adapter (#355) | 2024-11-23 15:59:00 -08:00 |
| config.py | chore: move all Llama Stack types from llama-models to llama-stack (#1098) | 2025-02-14 09:10:59 -08:00 |
| models.py | fix: register provider model name and HF alias in run.yaml (#1304) | 2025-02-27 16:39:23 -08:00 |
| nvidia.py | fix: NVIDIA embedding results in InternalServerError (#1851) | 2025-04-01 13:31:29 +02:00 |
| openai_utils.py | chore(lint): update Ruff ignores for project conventions and maintainability (#1184) | 2025-02-28 09:36:49 -08:00 |
| utils.py | style: remove prints in codebase (#1146) | 2025-02-18 19:41:37 -08:00 |