docs: fix the docs for NVIDIA Inference Provider (#3055)

# What does this PR do?
Fix the NVIDIA inference provider docs by updating the API method calls, the model IDs, and the embedding example.

## Test Plan
N/A
Jiayi Ni authored 2025-08-08 02:27:55 -07:00, committed by GitHub
parent e90fe25890
commit 9e78f2da96
3 changed files with 11 additions and 9 deletions

````diff
@@ -157,7 +157,7 @@ docker run \
 If you've set up your local development environment, you can also build the image using your local virtual environment.
 ```bash
-INFERENCE_MODEL=meta-llama/Llama-3.1-8b-Instruct
+INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
 llama stack build --distro nvidia --image-type venv
 llama stack run ./run.yaml \
   --port 8321 \
````

````diff
@@ -129,7 +129,7 @@ docker run \
 If you've set up your local development environment, you can also build the image using your local virtual environment.
 ```bash
-INFERENCE_MODEL=meta-llama/Llama-3.1-8b-Instruct
+INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
 llama stack build --distro nvidia --image-type venv
 llama stack run ./run.yaml \
   --port 8321 \
````
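Once a server from either of the distribution docs above is running, the updated client examples below can be exercised against it. A minimal connection sketch, assuming the `llama_stack_client` package is installed and the server is listening on the default port 8321 (the model-listing check is illustrative and not part of this diff):

```python
from llama_stack_client import LlamaStackClient

# Point the client at the server started by `llama stack run` above.
client = LlamaStackClient(base_url="http://localhost:8321")

# Sanity check: the registered models should include the inference and
# embedding model IDs referenced in the updated docs.
for model in client.models.list():
    print(model.identifier)
```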

````diff
@@ -42,8 +42,8 @@ client.initialize()
 ### Create Completion
 ```python
-response = client.completion(
-    model_id="meta-llama/Llama-3.1-8b-Instruct",
+response = client.inference.completion(
+    model_id="meta-llama/Llama-3.1-8B-Instruct",
     content="Complete the sentence using one word: Roses are red, violets are :",
     stream=False,
     sampling_params={
````
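Applied in full, the corrected completion example reads roughly as follows; the `max_tokens` value is an illustrative assumption (the hunk is truncated at `sampling_params`), and the closing `print` is taken from the context line of the next hunk:

```python
response = client.inference.completion(
    model_id="meta-llama/Llama-3.1-8B-Instruct",
    content="Complete the sentence using one word: Roses are red, violets are :",
    stream=False,
    sampling_params={
        "max_tokens": 50,  # illustrative; the diff truncates before this value
    },
)
print(f"Response: {response.content}")
```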
````diff
@@ -56,8 +56,8 @@ print(f"Response: {response.content}")
 ### Create Chat Completion
 ```python
-response = client.chat_completion(
-    model_id="meta-llama/Llama-3.1-8b-Instruct",
+response = client.inference.chat_completion(
+    model_id="meta-llama/Llama-3.1-8B-Instruct",
     messages=[
         {
             "role": "system",
````
````diff
@@ -78,8 +78,10 @@ print(f"Response: {response.completion_message.content}")
 ### Create Embeddings
 ```python
-response = client.embeddings(
-    model_id="meta-llama/Llama-3.1-8b-Instruct", contents=["foo", "bar", "baz"]
+response = client.inference.embeddings(
+    model_id="nvidia/llama-3.2-nv-embedqa-1b-v2",
+    contents=["What is the capital of France?"],
+    task_type="query",
 )
 print(f"Embeddings: {response.embeddings}")
 ```
````
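The updated embedding example is query-side. A possible companion sketch for indexing passages, assuming the NVIDIA embedding provider also accepts the `document` task type (not shown in this diff):

```python
# Embed passages for indexing; "document" is the counterpart of the
# "query" task_type used in the updated docs (assumed supported here).
response = client.inference.embeddings(
    model_id="nvidia/llama-3.2-nv-embedqa-1b-v2",
    contents=[
        "Paris is the capital and most populous city of France.",
        "Berlin is the capital and largest city of Germany.",
    ],
    task_type="document",
)
print(f"Number of embeddings: {len(response.embeddings)}")
```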