llama-stack/llama_stack/providers/remote
Rashmi Pawar c169c164b3
fix: NVIDIA embedding results in InternalServerError (#1851)
Closes #1819 

## Test Plan

```bash
pytest -v tests/integration/inference/test_embedding.py  --stack-config=http://localhost:5002 --embedding-model=nvidia/llama-3.2-nv-embedqa-1b-v2
=============================================================================== test session starts ================================================================================
platform linux -- Python 3.10.0, pytest-8.3.5, pluggy-1.5.0 -- /home/ubuntu/miniconda/envs/nvidia-1/bin/python
cachedir: .pytest_cache
rootdir: /home/ubuntu/llama-stack
configfile: pyproject.toml
plugins: anyio-4.9.0
collected 23 items                                                                                                                                                                 

tests/integration/inference/test_embedding.py::test_embedding_text[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-list[string]] PASSED                                                [  4%]
tests/integration/inference/test_embedding.py::test_embedding_text[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-list[text]] PASSED                                                  [  8%]
tests/integration/inference/test_embedding.py::test_embedding_image[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-list[url,base64]] XFAIL (nvidia/llama-3.2-nv-embedqa-1b-v2 doe...) [ 13%]
tests/integration/inference/test_embedding.py::test_embedding_image[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-list[url,string,base64,text]] XFAIL (nvidia/llama-3.2-nv-embed...) [ 17%]
tests/integration/inference/test_embedding.py::test_embedding_truncation[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-long-end] PASSED                                              [ 21%]
tests/integration/inference/test_embedding.py::test_embedding_truncation[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-long-start] PASSED                                            [ 26%]
tests/integration/inference/test_embedding.py::test_embedding_truncation[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-short-end] PASSED                                             [ 30%]
tests/integration/inference/test_embedding.py::test_embedding_truncation[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-short-start] PASSED                                           [ 34%]
tests/integration/inference/test_embedding.py::test_embedding_truncation_error[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-long-text-None] PASSED                                  [ 39%]
tests/integration/inference/test_embedding.py::test_embedding_truncation_error[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-long-text-none] PASSED                                  [ 43%]
tests/integration/inference/test_embedding.py::test_embedding_truncation_error[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-long-str-None] PASSED                                   [ 47%]
tests/integration/inference/test_embedding.py::test_embedding_truncation_error[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-long-str-none] PASSED                                   [ 52%]
tests/integration/inference/test_embedding.py::test_embedding_output_dimension[emb=nvidia/llama-3.2-nv-embedqa-1b-v2] PASSED                                                 [ 56%]
tests/integration/inference/test_embedding.py::test_embedding_task_type[emb=nvidia/llama-3.2-nv-embedqa-1b-v2] PASSED                                                        [ 60%]
tests/integration/inference/test_embedding.py::test_embedding_text_truncation[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-None] PASSED                                             [ 65%]
tests/integration/inference/test_embedding.py::test_embedding_text_truncation[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-none] PASSED                                             [ 69%]
tests/integration/inference/test_embedding.py::test_embedding_text_truncation[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-end] PASSED                                              [ 73%]
tests/integration/inference/test_embedding.py::test_embedding_text_truncation[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-start] PASSED                                            [ 78%]
tests/integration/inference/test_embedding.py::test_embedding_text_truncation_error[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-NONE] PASSED                                       [ 82%]
tests/integration/inference/test_embedding.py::test_embedding_text_truncation_error[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-END] PASSED                                        [ 86%]
tests/integration/inference/test_embedding.py::test_embedding_text_truncation_error[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-START] PASSED                                      [ 91%]
tests/integration/inference/test_embedding.py::test_embedding_text_truncation_error[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-left] PASSED                                       [ 95%]
tests/integration/inference/test_embedding.py::test_embedding_text_truncation_error[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-right] PASSED                                      [100%]

===================================================================== 21 passed, 2 xfailed, 1 warning in 7.18s =====================================================================
```

[//]: # (## Documentation)

cc: @dglogo @mattf @sumitb
2025-04-01 13:31:29 +02:00
..
agents test: add unit test to ensure all config types are instantiable (#1601) 2025-03-12 22:29:58 -07:00
datasetio refactor: extract pagination logic into shared helper function (#1770) 2025-03-31 13:08:29 -07:00
inference fix: NVIDIA embedding results in InternalServerError (#1851) 2025-04-01 13:31:29 +02:00
post_training feat: Add nemo customizer (#1448) 2025-03-25 11:01:10 -07:00
safety feat: added nvidia as safety provider (#1248) 2025-03-17 14:39:23 -07:00
tool_runtime fix(api): don't return list for runtime tools (#1686) 2025-04-01 09:53:11 +02:00
vector_io chore: Updating Milvus Client calls to be non-blocking (#1830) 2025-03-28 22:14:07 -04:00
__init__.py impls -> inline, adapters -> remote (#381) 2024-11-06 14:54:05 -08:00