llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-10-16 23:03:49 +00:00

Jiayi Ni d875e427bf refactor: use `extra_body` to pass in `input_type` params for asymmetric embedding models for NVIDIA Inference Provider (#3804)	2025-10-14 13:52:55 -07:00

# What does this PR do?

Previously, the NVIDIA inference provider implemented a custom `openai_embeddings` method with a hardcoded `input_type="query"` parameter, which is required by NVIDIA asymmetric embedding models ([https://github.com/llamastack/llama-stack/pull/3205](https://github.com/llamastack/llama-stack/pull/3205)). An `extra_body` parameter was recently added to the embeddings API ([https://github.com/llamastack/llama-stack/pull/3794](https://github.com/llamastack/llama-stack/pull/3794)). This PR therefore updates the NVIDIA inference provider to use the base `OpenAIMixin.openai_embeddings` method and to pass `input_type` through the `extra_body` parameter for asymmetric embedding models.

## Test Plan

Run the following command for each `embedding_model`: `nvidia/llama-3.2-nv-embedqa-1b-v2`, `nvidia/nv-embedqa-e5-v5`, `nvidia/nv-embedqa-mistral-7b-v2`, and `snowflake/arctic-embed-l`.

```
pytest -s -v tests/integration/inference/test_openai_embeddings.py \
  --stack-config="inference=nvidia" \
  --embedding-model={embedding_model} \
  --env NVIDIA_API_KEY={nvidia_api_key} \
  --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com" \
  --inference-mode=record
```
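The change described in the commit message above moves `input_type` from a hardcoded provider argument into the caller-supplied `extra_body`. A minimal sketch of what that could look like on the caller side follows. The helper `build_embedding_request` is hypothetical and is not part of llama-stack; it only shows how the provider-specific field might be nested under `extra_body` in an OpenAI-compatible embeddings request.

```python
from typing import Any, Optional


def build_embedding_request(
    model: str,
    texts: list[str],
    input_type: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble keyword arguments for an OpenAI-compatible embeddings call.

    Hypothetical helper for illustration only; not a llama-stack API.
    """
    request: dict[str, Any] = {"model": model, "input": texts}
    if input_type is not None:
        # Provider-specific fields travel in `extra_body` instead of being
        # hardcoded inside the provider implementation.
        request["extra_body"] = {"input_type": input_type}
    return request


# Asymmetric model: the caller states that the text is a query.
query_request = build_embedding_request(
    "nvidia/nv-embedqa-e5-v5",
    ["What is an asymmetric embedding model?"],
    input_type="query",
)

# No `input_type` given: no `extra_body` key is added at all.
plain_request = build_embedding_request("nvidia/nv-embedqa-e5-v5", ["hello"])
```

With an OpenAI-compatible Python client, a dictionary like `query_request` could presumably be unpacked into `client.embeddings.create(**query_request)`, assuming the client accepts an `extra_body` keyword as the `openai` package does.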
..
anthropic	feat: use SecretStr for inference provider auth credentials (#3724)	2025-10-10 07:32:50 -07:00
azure	feat: use SecretStr for inference provider auth credentials (#3724)	2025-10-10 07:32:50 -07:00
bedrock	feat(api)!: support extra_body to embeddings and vector_stores APIs (#3794)	2025-10-12 19:01:52 -07:00
cerebras	feat(api)!: support extra_body to embeddings and vector_stores APIs (#3794)	2025-10-12 19:01:52 -07:00
databricks	feat(api)!: BREAKING CHANGE: support passing `extra_body` through to providers (#3777)	2025-10-10 16:21:44 -07:00
fireworks	feat: use SecretStr for inference provider auth credentials (#3724)	2025-10-10 07:32:50 -07:00
gemini	feat: use SecretStr for inference provider auth credentials (#3724)	2025-10-10 07:32:50 -07:00
groq	feat: use SecretStr for inference provider auth credentials (#3724)	2025-10-10 07:32:50 -07:00
llama_openai_compat	feat(api)!: support extra_body to embeddings and vector_stores APIs (#3794)	2025-10-12 19:01:52 -07:00
nvidia	refactor: use `extra_body` to pass in `input_type` params for asymmetric embedding models for NVIDIA Inference Provider (#3804)	2025-10-14 13:52:55 -07:00
ollama	feat: use SecretStr for inference provider auth credentials (#3724)	2025-10-10 07:32:50 -07:00
openai	feat: use SecretStr for inference provider auth credentials (#3724)	2025-10-10 07:32:50 -07:00
passthrough	feat(api)!: support extra_body to embeddings and vector_stores APIs (#3794)	2025-10-12 19:01:52 -07:00
runpod	feat(api)!: BREAKING CHANGE: support passing `extra_body` through to providers (#3777)	2025-10-10 16:21:44 -07:00
sambanova	feat: use SecretStr for inference provider auth credentials (#3724)	2025-10-10 07:32:50 -07:00
tgi	feat(api)!: support extra_body to embeddings and vector_stores APIs (#3794)	2025-10-12 19:01:52 -07:00
together	feat(api)!: support extra_body to embeddings and vector_stores APIs (#3794)	2025-10-12 19:01:52 -07:00
vertexai	feat: use SecretStr for inference provider auth credentials (#3724)	2025-10-10 07:32:50 -07:00
vllm	feat(api)!: BREAKING CHANGE: support passing `extra_body` through to providers (#3777)	2025-10-10 16:21:44 -07:00
watsonx	fix: Fixed WatsonX remote inference provider (#3801)	2025-10-14 14:52:32 +02:00
__init__.py	`impls` -> `inline`, `adapters` -> `remote` (#381)	2024-11-06 14:54:05 -08:00