llama-stack/llama_stack
Rashmi Pawar c169c164b3
fix: NVIDIA embedding results in InternalServerError (#1851)
Closes #1819 

## Test Plan

```bash
pytest -v tests/integration/inference/test_embedding.py  --stack-config=http://localhost:5002 --embedding-model=nvidia/llama-3.2-nv-embedqa-1b-v2
=============================================================================== test session starts ================================================================================
platform linux -- Python 3.10.0, pytest-8.3.5, pluggy-1.5.0 -- /home/ubuntu/miniconda/envs/nvidia-1/bin/python
cachedir: .pytest_cache
rootdir: /home/ubuntu/llama-stack
configfile: pyproject.toml
plugins: anyio-4.9.0
collected 23 items                                                                                                                                                                 

tests/integration/inference/test_embedding.py::test_embedding_text[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-list[string]] PASSED                                                [  4%]
tests/integration/inference/test_embedding.py::test_embedding_text[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-list[text]] PASSED                                                  [  8%]
tests/integration/inference/test_embedding.py::test_embedding_image[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-list[url,base64]] XFAIL (nvidia/llama-3.2-nv-embedqa-1b-v2 doe...) [ 13%]
tests/integration/inference/test_embedding.py::test_embedding_image[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-list[url,string,base64,text]] XFAIL (nvidia/llama-3.2-nv-embed...) [ 17%]
tests/integration/inference/test_embedding.py::test_embedding_truncation[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-long-end] PASSED                                              [ 21%]
tests/integration/inference/test_embedding.py::test_embedding_truncation[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-long-start] PASSED                                            [ 26%]
tests/integration/inference/test_embedding.py::test_embedding_truncation[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-short-end] PASSED                                             [ 30%]
tests/integration/inference/test_embedding.py::test_embedding_truncation[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-short-start] PASSED                                           [ 34%]
tests/integration/inference/test_embedding.py::test_embedding_truncation_error[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-long-text-None] PASSED                                  [ 39%]
tests/integration/inference/test_embedding.py::test_embedding_truncation_error[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-long-text-none] PASSED                                  [ 43%]
tests/integration/inference/test_embedding.py::test_embedding_truncation_error[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-long-str-None] PASSED                                   [ 47%]
tests/integration/inference/test_embedding.py::test_embedding_truncation_error[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-long-str-none] PASSED                                   [ 52%]
tests/integration/inference/test_embedding.py::test_embedding_output_dimension[emb=nvidia/llama-3.2-nv-embedqa-1b-v2] PASSED                                                 [ 56%]
tests/integration/inference/test_embedding.py::test_embedding_task_type[emb=nvidia/llama-3.2-nv-embedqa-1b-v2] PASSED                                                        [ 60%]
tests/integration/inference/test_embedding.py::test_embedding_text_truncation[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-None] PASSED                                             [ 65%]
tests/integration/inference/test_embedding.py::test_embedding_text_truncation[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-none] PASSED                                             [ 69%]
tests/integration/inference/test_embedding.py::test_embedding_text_truncation[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-end] PASSED                                              [ 73%]
tests/integration/inference/test_embedding.py::test_embedding_text_truncation[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-start] PASSED                                            [ 78%]
tests/integration/inference/test_embedding.py::test_embedding_text_truncation_error[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-NONE] PASSED                                       [ 82%]
tests/integration/inference/test_embedding.py::test_embedding_text_truncation_error[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-END] PASSED                                        [ 86%]
tests/integration/inference/test_embedding.py::test_embedding_text_truncation_error[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-START] PASSED                                      [ 91%]
tests/integration/inference/test_embedding.py::test_embedding_text_truncation_error[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-left] PASSED                                       [ 95%]
tests/integration/inference/test_embedding.py::test_embedding_text_truncation_error[emb=nvidia/llama-3.2-nv-embedqa-1b-v2-right] PASSED                                      [100%]

===================================================================== 21 passed, 2 xfailed, 1 warning in 7.18s =====================================================================
```

[//]: # (## Documentation)

cc: @dglogo @mattf @sumitb
2025-04-01 13:31:29 +02:00
..
apis fix(api): don't return list for runtime tools (#1686) 2025-04-01 09:53:11 +02:00
cli fix: skip apis with no providers during llama stack build (#1835) 2025-03-29 08:39:35 -07:00
distribution fix(api): don't return list for runtime tools (#1686) 2025-04-01 09:53:11 +02:00
models/llama feat: Support "stop" parameter in remote:vLLM (#1715) 2025-03-24 12:42:55 -07:00
providers fix: NVIDIA embedding results in InternalServerError (#1851) 2025-04-01 13:31:29 +02:00
strong_typing fix: Support types.UnionType in schemas (#1721) 2025-03-20 09:54:02 -07:00
templates docs: fix remote-vllm instructions (#1805) 2025-03-27 10:19:51 -04:00
__init__.py export LibraryClient 2024-12-13 12:08:00 -08:00
env.py refactor(test): move tools, evals, datasetio, scoring and post training tests (#1401) 2025-03-04 14:53:47 -08:00
log.py chore: Remove style tags from log formatter (#1808) 2025-03-27 10:18:21 -04:00
schema_utils.py chore: make mypy happy with webmethod (#1758) 2025-03-22 08:17:23 -07:00