# NVIDIA Inference Provider for LlamaStack

This provider enables running inference using NVIDIA NIM.

## Features

- Endpoints for completions, chat completions, and embeddings for registered models

## Getting Started

### Prerequisites

- LlamaStack with NVIDIA configuration
- Access to an NVIDIA NIM deployment
- A NIM deployed for the model you want to use for inference

### Setup

Build the NVIDIA environment:

```bash
llama stack build --distro nvidia --image-type venv
```
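
If you want to run the stack as a standalone server instead of embedding it in your process (the examples below use the library client), you can start it after building. A minimal sketch, assuming the default distro name and port:

```bash
# Assumed invocation; adjust the port to your deployment.
llama stack run nvidia --port 8321
```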

## Basic Usage using the LlamaStack Python Client

### Initialize the client

```python
import os

os.environ["NVIDIA_API_KEY"] = (
    ""  # Required if using hosted NIM endpoint. If self-hosted, not required.
)
os.environ["NVIDIA_BASE_URL"] = "http://nim.test"  # NIM URL

from llama_stack.core.library_client import LlamaStackAsLibraryClient

client = LlamaStackAsLibraryClient("nvidia")
client.initialize()
```
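
If you are running the stack as a server (see Setup above), the same examples work over HTTP with the `llama_stack_client` package instead of the library client. A minimal sketch, assuming the server listens on localhost:8321:

```python
from llama_stack_client import LlamaStackClient

# Assumes a running `llama stack` server; point base_url at your deployment.
client = LlamaStackClient(base_url="http://localhost:8321")
```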

### Create Completion

```python
response = client.inference.completion(
    model_id="meta-llama/Llama-3.1-8B-Instruct",
    content="Complete the sentence using one word: Roses are red, violets are :",
    stream=False,
    sampling_params={
        "max_tokens": 50,
    },
)
print(f"Response: {response.content}")
```

### Create Chat Completion

```python
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {
            "role": "system",
            "content": "You must respond to each message with only one word",
        },
        {
            "role": "user",
            "content": "Complete the sentence using one word: Roses are red, violets are:",
        },
    ],
    stream=False,
    sampling_params={
        "max_tokens": 50,
    },
)
print(f"Response: {response.completion_message.content}")
```

### Create Embeddings

```python
response = client.inference.embeddings(
    model_id="nvidia/llama-3.2-nv-embedqa-1b-v2",
    contents=["What is the capital of France?"],
    task_type="query",
)
print(f"Embeddings: {response.embeddings}")
```