Commit graph

6 commits

Author SHA1 Message Date
Sébastien Han
6fa257b475
chore(lint): update Ruff ignores for project conventions and maintainability (#1184)
- Added new ignores from flake8-bugbear (`B007`, `B008`)
- Ignored `C901` (high function complexity) for now, pending review
- Maintained PyTorch conventions (`N812`, `N817`)
- Allowed `E731` (lambda assignments) for flexibility
- Consolidated existing ignores (`E402`, `E501`, `F405`, `C408`, `N812`)
- Documented rationale for each ignored rule

This keeps our linting aligned with project needs while tracking
potential fixes.

Signed-off-by: Sébastien Han <seb@redhat.com>

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-28 09:36:49 -08:00
Yuan Tang
6824d23dc9
test: Only run embedding tests for remote::nvidia (#1317)
This fixes release build failure
3796497240:

```
=================================== FAILURES ===================================
______ test_embedding_truncation_error[txt=8B:emb=MiniLM-long-text-None] _______
llama-stack/tests/client-sdk/inference/test_embedding.py:166: in test_embedding_truncation_error
    with pytest.raises(BadRequestError) as excinfo:
E   Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
______ test_embedding_truncation_error[txt=8B:emb=MiniLM-long-text-none] _______
llama-stack/tests/client-sdk/inference/test_embedding.py:166: in test_embedding_truncation_error
    with pytest.raises(BadRequestError) as excinfo:
E   Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
_______ test_embedding_truncation_error[txt=8B:emb=MiniLM-long-str-None] _______
llama-stack/tests/client-sdk/inference/test_embedding.py:166: in test_embedding_truncation_error
    with pytest.raises(BadRequestError) as excinfo:
E   Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
_______ test_embedding_truncation_error[txt=8B:emb=MiniLM-long-str-none] _______
llama-stack/tests/client-sdk/inference/test_embedding.py:166: in test_embedding_truncation_error
    with pytest.raises(BadRequestError) as excinfo:
E   Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
_________ test_embedding_text_truncation_error[txt=8B:emb=MiniLM-NONE] _________
llama-stack/tests/client-sdk/inference/test_embedding.py:223: in test_embedding_text_truncation_error
    with pytest.raises(BadRequestError) as excinfo:
E   Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
_________ test_embedding_text_truncation_error[txt=8B:emb=MiniLM-END] __________
llama-stack/tests/client-sdk/inference/test_embedding.py:223: in test_embedding_text_truncation_error
    with pytest.raises(BadRequestError) as excinfo:
E   Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
________ test_embedding_text_truncation_error[txt=8B:emb=MiniLM-START] _________
llama-stack/tests/client-sdk/inference/test_embedding.py:223: in test_embedding_text_truncation_error
    with pytest.raises(BadRequestError) as excinfo:
E   Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
_________ test_embedding_text_truncation_error[txt=8B:emb=MiniLM-left] _________
llama-stack/tests/client-sdk/inference/test_embedding.py:223: in test_embedding_text_truncation_error
    with pytest.raises(BadRequestError) as excinfo:
E   Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
________ test_embedding_text_truncation_error[txt=8B:emb=MiniLM-right] _________
llama-stack/tests/client-sdk/inference/test_embedding.py:223: in test_embedding_text_truncation_error
    with pytest.raises(BadRequestError) as excinfo:
E   Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
=========================== short test summary info ============================
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_truncation_error[txt=8B:emb=MiniLM-long-text-None] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_truncation_error[txt=8B:emb=MiniLM-long-text-none] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_truncation_error[txt=8B:emb=MiniLM-long-str-None] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_truncation_error[txt=8B:emb=MiniLM-long-str-none] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_text_truncation_error[txt=8B:emb=MiniLM-NONE] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_text_truncation_error[txt=8B:emb=MiniLM-END] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_text_truncation_error[txt=8B:emb=MiniLM-START] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_text_truncation_error[txt=8B:emb=MiniLM-left] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_text_truncation_error[txt=8B:emb=MiniLM-right] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
= 9 failed, 48 passed, 2 skipped, 3 deselected, 3 xfailed, 1 xpassed, 121 warnings in 90.16s (0:01:30) =
Error: Process completed with exit code 1.
```

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-27 22:35:52 -05:00
Matthew Farrellee
e28cedd833
feat: add nvidia embedding implementation for new signature, task_type, output_dimention, text_truncation (#1213)
# What does this PR do?

updates nvidia inference provider's embedding implementation to use new
signature

add support for task_type, output_dimensions, text_truncation parameters

## Test Plan

`LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v
tests/client-sdk/inference/test_embedding.py --embedding-model
baai/bge-m3`
2025-02-27 16:58:11 -08:00
Matthew Farrellee
46da187c07
fix: remove list of list tests, no longer relevant after #1161 (#1205)
# What does this PR do?

#1161 updated the embedding signature making the nested list tests
irrelevant
2025-02-21 08:07:35 -08:00
Matthew Farrellee
3099c5243f
fix: update URL import, URL -> ImageContentItemImageURL (#1204)
# What does this PR do?

fixes test to use new name for URL import

## Test Plan

`LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v
tests/client-sdk/inference/test_embedding.py --embedding-model
baai/bge-m3`
2025-02-21 08:02:21 -08:00
Matthew Farrellee
832c535aaf
feat(providers): add NVIDIA Inference embedding provider and tests (#935)
# What does this PR do?

add /v1/inference/embeddings implementation to NVIDIA provider

**open topics** -
- *asymmetric models*. NeMo Retriever includes asymmetric models, which
are models that embed differently depending on if the input is destined
for storage or lookup against storage. the /v1/inference/embeddings api
does not allow the user to indicate the type of embedding to perform.
see https://github.com/meta-llama/llama-stack/issues/934
- *truncation*. embedding models typically have a limited context
window, e.g. 1024 tokens is common though newer models have 8k windows.
when the input is larger than this window the endpoint cannot perform
its designed function. two options: 0. return an error so the user can
reduce the input size and retry; 1. perform truncation for the user and
proceed (common strategies are left or right truncation). many users
encounter context window size limits and will struggle to write reliable
programs. this struggle is especially acute without access to the
model's tokenizer. the /v1/inference/embeddings api does not allow the
user to delegate truncation policy. see
https://github.com/meta-llama/llama-stack/issues/933
- *dimensions*. "Matryoshka" embedding models are available. they allow
users to control the number of embedding dimensions the model produces.
this is a critical feature for managing storage constraints. embeddings
of 1024 dimensions what achieve 95% recall for an application may not be
worth the storage cost if a 512 dimensions can achieve 93% recall.
controlling embedding dimensions allows applications to determine their
recall and storage tradeoffs. the /v1/inference/embeddings api does not
allow the user to control the output dimensions. see
https://github.com/meta-llama/llama-stack/issues/932

## Test Plan

- `llama stack run llama_stack/templates/nvidia/run.yaml`
- `LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v
tests/client-sdk/inference/test_embedding.py --embedding-model
baai/bge-m3`


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-02-20 16:59:48 -08:00