refactor: use extra_body to pass in input_type params for asymmetric embedding models for NVIDIA Inference Provider (#3804)

# What does this PR do?

Previously, the NVIDIA inference provider implemented a custom `openai_embeddings` method with a hardcoded `input_type="query"` parameter, which is required by NVIDIA asymmetric embedding models ([https://github.com/llamastack/llama-stack/pull/3205](https://github.com/llamastack/llama-stack/pull/3205)). An `extra_body` parameter was recently added to the embeddings API ([https://github.com/llamastack/llama-stack/pull/3794](https://github.com/llamastack/llama-stack/pull/3794)). This PR therefore updates the NVIDIA inference provider to use the base `OpenAIMixin.openai_embeddings` method and to pass `input_type` through the `extra_body` parameter for asymmetric embedding models.

## Test Plan

Run the following command once for each `embedding_model`: `nvidia/llama-3.2-nv-embedqa-1b-v2`, `nvidia/nv-embedqa-e5-v5`, `nvidia/nv-embedqa-mistral-7b-v2`, and `snowflake/arctic-embed-l`.

```
pytest -s -v tests/integration/inference/test_openai_embeddings.py --stack-config="inference=nvidia" --embedding-model={embedding_model} --env NVIDIA_API_KEY={nvidia_api_key} --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com" --inference-mode=record
```
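As a rough illustration of the change described above, the sketch below shows how a caller might build the arguments for an OpenAI-compatible embeddings request now that `input_type` is no longer hardcoded by the provider. The helper `build_embedding_kwargs` and the constant `ASSUMED_INPUT_TYPES` are hypothetical and not part of llama-stack. The `"passage"` value is an assumption based on NVIDIA's embedding API and does not come from this PR.

```python
from typing import Any, Dict, List, Optional

# Assumption: the input_type values NVIDIA asymmetric embedding models accept.
# Only "query" appears in this PR; "passage" is assumed for illustration.
ASSUMED_INPUT_TYPES = {"query", "passage"}


def build_embedding_kwargs(
    model: str,
    texts: List[str],
    input_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build kwargs for client.embeddings.create().

    For asymmetric models, input_type travels in extra_body because it is
    not part of the standard OpenAI embeddings parameters.
    """
    kwargs: Dict[str, Any] = {"model": model, "input": texts}
    if input_type is not None:
        if input_type not in ASSUMED_INPUT_TYPES:
            raise ValueError(f"unsupported input_type: {input_type!r}")
        kwargs["extra_body"] = {"input_type": input_type}
    return kwargs


query_kwargs = build_embedding_kwargs(
    "nvidia/llama-3.2-nv-embedqa-1b-v2",
    ["What is the capital of France?"],
    input_type="query",
)
passage_kwargs = build_embedding_kwargs(
    "nvidia/llama-3.2-nv-embedqa-1b-v2",
    ["Paris is the capital of France."],
    input_type="passage",
)

# With a configured client, the request would then be sent as:
# response = client.embeddings.create(**query_kwargs)
```

Keeping `input_type` in `extra_body` leaves the standard `model` and `input` arguments untouched, so the same call shape works for symmetric models by simply omitting `input_type`.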
2025-12-05 18:27:22 +00:00 · 2025-10-14 13:52:55 -07:00 · 2025-10-14 13:52:55 -07:00 · d875e427bf
commit d875e427bf
parent 866c13cdc2
3 changed files with 75 additions and 70 deletions
--- a/llama_stack/providers/remote/inference/nvidia/NVIDIA.md
+++ b/llama_stack/providers/remote/inference/nvidia/NVIDIA.md
@@ -139,16 +139,13 @@ print(f"Structured Response: {structured_response.choices[0].message.content}")

 The following example shows how to create embeddings for an NVIDIA NIM.

-> [!NOTE]
-> NVIDIA asymmetric embedding models (e.g., `nvidia/llama-3.2-nv-embedqa-1b-v2`) require an `input_type` parameter not present in the standard OpenAI embeddings API. The NVIDIA Inference Adapter automatically sets `input_type="query"` when using the OpenAI-compatible embeddings endpoint for NVIDIA. For passage embeddings, use the `embeddings` API with `task_type="document"`.
-
 ```python
-response = client.inference.embeddings(
-    model_id="nvidia/llama-3.2-nv-embedqa-1b-v2",
-    contents=["What is the capital of France?"],
-    task_type="query",
+response = client.embeddings.create(
+    model="nvidia/llama-3.2-nv-embedqa-1b-v2",
+    input=["What is the capital of France?"],
+    extra_body={"input_type": "query"},
 )
-print(f"Embeddings: {response.embeddings}")
+print(f"Embeddings: {response.data}")
 ```

 ### Vision Language Models Example