llama-stack-mirror/tests/integration/inference/test_openai_embeddings.py
slekkala1 8d8261961e
chore: Refactor fireworks to use OpenAIMixin (#3480)
# What does this PR do?
Refactor Fireworks to use OpenAIMixin

Closes https://github.com/llamastack/llama-stack/issues/3391
Related to https://github.com/llamastack/llama-stack/issues/3387

## Test Plan
```
(llama-stack) (base) swapna942@swapna942-mac llama-stack % FIREWORKS_API_KEY=**** ./scripts/integration-tests.sh --stack-config server:ci-tests --setup fireworks --subdirs inference --pattern openai

tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[openai_client-emb=nomic-ai/nomic-embed-text-v1.5] 
instantiating llama_stack_client
Port 8321 is already in use, assuming server is already running...
llama_stack_client instantiated in 0.031s
PASSED [  2%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_multiple_strings[openai_client-emb=nomic-ai/nomic-embed-text-v1.5] PASSED [  4%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_float[openai_client-emb=nomic-ai/nomic-embed-text-v1.5] PASSED [  6%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_dimensions[openai_client-emb=nomic-ai/nomic-embed-text-v1.5] PASSED [  8%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_user_parameter[openai_client-emb=nomic-ai/nomic-embed-text-v1.5] SKIPPED [ 10%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_empty_list_error[openai_client-emb=nomic-ai/nomic-embed-text-v1.5] PASSED [ 12%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_invalid_model_error[openai_client-emb=nomic-ai/nomic-embed-text-v1.5] PASSED [ 14%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_different_inputs_different_outputs[openai_client-emb=nomic-ai/nomic-embed-text-v1.5] PASSED [ 17%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_base64[openai_client-emb=nomic-ai/nomic-embed-text-v1.5] SKIPPED [ 19%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_base64_batch_processing[openai_client-emb=nomic-ai/nomic-embed-text-v1.5] SKIPPED [ 21%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming[txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:completion:sanity] PASSED [ 23%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming_suffix[txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:completion:suffix] SKIPPED [ 25%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_streaming[txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:completion:sanity] PASSED [ 27%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_prompt_logprobs[txt=accounts/fireworks/models/llama-v3p1-8b-instruct-1] SKIPPED [ 29%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=accounts/fireworks/models/llama-v3p1-8b-instruct] SKIPPED [ 31%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:chat_completion:non_streaming_01] PASSED [ 34%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:chat_completion:streaming_01] PASSED [ 36%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:chat_completion:streaming_01] PASSED [ 38%]
tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-True] PASSED [ 40%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-True] PASSED [ 42%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming_with_file[txt=accounts/fireworks/models/llama-v3p1-8b-instruct] SKIPPED [ 44%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[llama_stack_client-emb=nomic-ai/nomic-embed-text-v1.5] PASSED [ 46%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_multiple_strings[llama_stack_client-emb=nomic-ai/nomic-embed-text-v1.5] PASSED [ 48%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_float[llama_stack_client-emb=nomic-ai/nomic-embed-text-v1.5] PASSED [ 51%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_dimensions[llama_stack_client-emb=nomic-ai/nomic-embed-text-v1.5] PASSED [ 53%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_user_parameter[llama_stack_client-emb=nomic-ai/nomic-embed-text-v1.5] SKIPPED [ 55%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_empty_list_error[llama_stack_client-emb=nomic-ai/nomic-embed-text-v1.5] PASSED [ 57%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_invalid_model_error[llama_stack_client-emb=nomic-ai/nomic-embed-text-v1.5] PASSED [ 59%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_different_inputs_different_outputs[llama_stack_client-emb=nomic-ai/nomic-embed-text-v1.5] PASSED [ 61%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_encoding_format_base64[llama_stack_client-emb=nomic-ai/nomic-embed-text-v1.5] SKIPPED [ 63%]
tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_base64_batch_processing[llama_stack_client-emb=nomic-ai/nomic-embed-text-v1.5] SKIPPED [ 65%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_prompt_logprobs[txt=accounts/fireworks/models/llama-v3p1-8b-instruct-0] SKIPPED [ 68%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:chat_completion:non_streaming_02] PASSED [ 70%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:chat_completion:streaming_02] PASSED [ 72%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:chat_completion:streaming_02] PASSED [ 74%]
tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-False] PASSED [ 76%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-False] PASSED [ 78%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:chat_completion:non_streaming_01] PASSED [ 80%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:chat_completion:streaming_01] PASSED [ 82%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:chat_completion:streaming_01] PASSED [ 85%]
tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-True] PASSED [ 87%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-True] PASSED [ 89%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:chat_completion:non_streaming_02] PASSED [ 91%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:chat_completion:streaming_02] PASSED [ 93%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:chat_completion:streaming_02] PASSED [ 95%]
tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-False] PASSED [ 97%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-False] PASSED [100%]

========================================== slowest 10 durations ==========================================
30.01s teardown tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_multiple_strings[llama_stack_client-emb=nomic-ai/nomic-embed-text-v1.5]
30.01s teardown tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-False]
30.01s teardown tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_different_inputs_different_outputs[openai_client-emb=nomic-ai/nomic-embed-text-v1.5]
30.01s teardown tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_with_user_parameter[openai_client-emb=nomic-ai/nomic-embed-text-v1.5]
30.01s teardown tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-True]
30.01s teardown tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_different_inputs_different_outputs[llama_stack_client-emb=nomic-ai/nomic-embed-text-v1.5]
30.01s teardown tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=accounts/fireworks/models/llama-v3p1-8b-instruct-inference:chat_completion:non_streaming_02]
30.01s teardown tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_single_string[llama_stack_client-emb=nomic-ai/nomic-embed-text-v1.5]
30.01s teardown tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_base64_batch_processing[openai_client-emb=nomic-ai/nomic-embed-text-v1.5]
30.01s teardown tests/integration/inference/test_openai_embeddings.py::test_openai_embeddings_invalid_model_error[openai_client-emb=nomic-ai/nomic-embed-text-v1.5]
================= 36 passed, 11 skipped, 50 deselected, 4 warnings in 1429.05s (0:23:49) =================
+ exit_code=0
+ set +x
All tests completed successfully
```
2025-09-22 13:19:36 -04:00

# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.

import base64
import struct

import pytest
from openai import OpenAI

from llama_stack.core.library_client import LlamaStackAsLibraryClient


def decode_base64_to_floats(base64_string: str) -> list[float]:
    """Helper function to decode base64 string to list of float32 values."""
    embedding_bytes = base64.b64decode(base64_string)
    float_count = len(embedding_bytes) // 4  # 4 bytes per float32
    embedding_floats = struct.unpack(f"{float_count}f", embedding_bytes)
    return list(embedding_floats)
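

# A quick round-trip sketch of the helper above (purely illustrative, not used by the tests):
#
#   packed = struct.pack("3f", 0.1, 0.2, 0.3)        # three float32 values -> 12 bytes
#   encoded = base64.b64encode(packed).decode()      # base64 text, as the API would return it
#   decode_base64_to_floats(encoded)                 # ~[0.1, 0.2, 0.3], up to float32 precision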


def provider_from_model(client_with_models, model_id):
    models = {m.identifier: m for m in client_with_models.models.list()}
    models.update({m.provider_resource_id: m for m in client_with_models.models.list()})
    provider_id = models[model_id].provider_id
    providers = {p.provider_id: p for p in client_with_models.providers.list()}
    return providers[provider_id]
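
# Note on the lookup above: models are indexed under both `identifier` and
# `provider_resource_id`, so callers may pass the model id in either form.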


def skip_if_model_doesnt_support_user_param(client, model_id):
    provider = provider_from_model(client, model_id)
    if provider.provider_type in (
        "remote::together",  # service returns 400
        "remote::fireworks",  # service returns 400 malformed input
    ):
        pytest.skip(f"Model {model_id} hosted by {provider.provider_type} does not support user param.")


def skip_if_model_doesnt_support_encoding_format_base64(client, model_id):
    provider = provider_from_model(client, model_id)
    if provider.provider_type in (
        "remote::together",  # param silently ignored, always returns floats
        "remote::fireworks",  # param silently ignored, always returns list of floats
    ):
        pytest.skip(f"Model {model_id} hosted by {provider.provider_type} does not support encoding_format='base64'.")


def skip_if_model_doesnt_support_variable_dimensions(client_with_models, model_id):
    provider = provider_from_model(client_with_models, model_id)
    if provider.provider_type in (
        "remote::together",  # returns 400
        "inline::sentence-transformers",
    ):
        pytest.skip(
            f"Model {model_id} hosted by {provider.provider_type} does not support variable output embedding dimensions."
        )
    if provider.provider_type == "remote::openai" and "text-embedding-3" not in model_id:
        pytest.skip(
            f"Model {model_id} hosted by {provider.provider_type} does not support variable output embedding dimensions."
        )


@pytest.fixture(params=["openai_client", "llama_stack_client"])
def compat_client(request, client_with_models):
    if request.param == "openai_client" and isinstance(client_with_models, LlamaStackAsLibraryClient):
        pytest.skip("OpenAI client tests not supported with library client")
    return request.getfixturevalue(request.param)
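
# The parametrized fixture above is what produces the `[openai_client-...]` and
# `[llama_stack_client-...]` variants visible in the test-plan output: each test
# that takes `compat_client` runs once against each client.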


def skip_if_model_doesnt_support_openai_embeddings(client, model_id):
    provider = provider_from_model(client, model_id)
    if provider.provider_type in (
        "inline::meta-reference",
        "remote::bedrock",
        "remote::cerebras",
        "remote::databricks",
        "remote::runpod",
        "remote::sambanova",
        "remote::tgi",
    ):
        pytest.skip(f"Model {model_id} hosted by {provider.provider_type} doesn't support OpenAI embeddings.")


@pytest.fixture
def openai_client(client_with_models):
    base_url = f"{client_with_models.base_url}/v1/openai/v1"
    return OpenAI(base_url=base_url, api_key="fake")
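
# The fixture above points a stock OpenAI client at the stack's OpenAI-compatible
# endpoint; `api_key="fake"` is a placeholder, on the assumption that the local
# test server does not validate the key.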


def test_openai_embeddings_single_string(compat_client, client_with_models, embedding_model_id):
    """Test OpenAI embeddings endpoint with a single string input."""
    skip_if_model_doesnt_support_openai_embeddings(client_with_models, embedding_model_id)

    input_text = "Hello, world!"

    response = compat_client.embeddings.create(
        model=embedding_model_id,
        input=input_text,
        encoding_format="float",
    )

    assert response.object == "list"
    assert response.model == embedding_model_id
    assert len(response.data) == 1
    assert response.data[0].object == "embedding"
    assert response.data[0].index == 0
    assert isinstance(response.data[0].embedding, list)
    assert len(response.data[0].embedding) > 0
    assert all(isinstance(x, float) for x in response.data[0].embedding)


def test_openai_embeddings_multiple_strings(compat_client, client_with_models, embedding_model_id):
    """Test OpenAI embeddings endpoint with multiple string inputs."""
    skip_if_model_doesnt_support_openai_embeddings(client_with_models, embedding_model_id)

    input_texts = ["Hello, world!", "How are you today?", "This is a test."]

    response = compat_client.embeddings.create(
        model=embedding_model_id,
        input=input_texts,
        encoding_format="float",
    )

    assert response.object == "list"
    assert response.model == embedding_model_id
    assert len(response.data) == len(input_texts)

    for i, embedding_data in enumerate(response.data):
        assert embedding_data.object == "embedding"
        assert embedding_data.index == i
        assert isinstance(embedding_data.embedding, list)
        assert len(embedding_data.embedding) > 0
        assert all(isinstance(x, float) for x in embedding_data.embedding)


def test_openai_embeddings_with_encoding_format_float(compat_client, client_with_models, embedding_model_id):
    """Test OpenAI embeddings endpoint with float encoding format."""
    skip_if_model_doesnt_support_openai_embeddings(client_with_models, embedding_model_id)

    input_text = "Test encoding format"

    response = compat_client.embeddings.create(
        model=embedding_model_id,
        input=input_text,
        encoding_format="float",
    )

    assert response.object == "list"
    assert len(response.data) == 1
    assert isinstance(response.data[0].embedding, list)
    assert all(isinstance(x, float) for x in response.data[0].embedding)


def test_openai_embeddings_with_dimensions(compat_client, client_with_models, embedding_model_id):
    """Test OpenAI embeddings endpoint with custom dimensions parameter."""
    skip_if_model_doesnt_support_openai_embeddings(client_with_models, embedding_model_id)
    skip_if_model_doesnt_support_variable_dimensions(client_with_models, embedding_model_id)

    input_text = "Test dimensions parameter"
    dimensions = 16

    response = compat_client.embeddings.create(
        model=embedding_model_id,
        input=input_text,
        dimensions=dimensions,
    )

    assert response.object == "list"
    assert len(response.data) == 1
    # Note: Not all models support custom dimensions, so we don't assert the exact dimension
    assert isinstance(response.data[0].embedding, list)
    assert len(response.data[0].embedding) > 0


def test_openai_embeddings_with_user_parameter(compat_client, client_with_models, embedding_model_id):
    """Test OpenAI embeddings endpoint with user parameter."""
    skip_if_model_doesnt_support_openai_embeddings(client_with_models, embedding_model_id)
    skip_if_model_doesnt_support_user_param(client_with_models, embedding_model_id)

    input_text = "Test user parameter"
    user_id = "test-user-123"

    response = compat_client.embeddings.create(
        model=embedding_model_id,
        input=input_text,
        user=user_id,
    )

    assert response.object == "list"
    assert len(response.data) == 1
    assert isinstance(response.data[0].embedding, list)
    assert len(response.data[0].embedding) > 0


def test_openai_embeddings_empty_list_error(compat_client, client_with_models, embedding_model_id):
    """Test that empty list input raises an appropriate error."""
    skip_if_model_doesnt_support_openai_embeddings(client_with_models, embedding_model_id)

    with pytest.raises(Exception):  # noqa: B017
        compat_client.embeddings.create(
            model=embedding_model_id,
            input=[],
        )


def test_openai_embeddings_invalid_model_error(compat_client, client_with_models, embedding_model_id):
    """Test that invalid model ID raises an appropriate error."""
    skip_if_model_doesnt_support_openai_embeddings(client_with_models, embedding_model_id)

    with pytest.raises(Exception):  # noqa: B017
        compat_client.embeddings.create(
            model="invalid-model-id",
            input="Test text",
        )


def test_openai_embeddings_different_inputs_different_outputs(compat_client, client_with_models, embedding_model_id):
    """Test that different inputs produce different embeddings."""
    skip_if_model_doesnt_support_openai_embeddings(client_with_models, embedding_model_id)

    input_text1 = "This is the first text"
    input_text2 = "This is completely different content"

    response1 = compat_client.embeddings.create(
        model=embedding_model_id,
        input=input_text1,
        encoding_format="float",
    )
    response2 = compat_client.embeddings.create(
        model=embedding_model_id,
        input=input_text2,
        encoding_format="float",
    )

    embedding1 = response1.data[0].embedding
    embedding2 = response2.data[0].embedding

    assert len(embedding1) == len(embedding2)
    # Embeddings should be different for different inputs
    assert embedding1 != embedding2


def test_openai_embeddings_with_encoding_format_base64(compat_client, client_with_models, embedding_model_id):
    """Test OpenAI embeddings endpoint with base64 encoding format."""
    skip_if_model_doesnt_support_openai_embeddings(client_with_models, embedding_model_id)
    skip_if_model_doesnt_support_encoding_format_base64(client_with_models, embedding_model_id)
    skip_if_model_doesnt_support_variable_dimensions(client_with_models, embedding_model_id)

    input_text = "Test base64 encoding format"
    dimensions = 12

    response = compat_client.embeddings.create(
        model=embedding_model_id,
        input=input_text,
        encoding_format="base64",
        dimensions=dimensions,
    )

    # Validate response structure
    assert response.object == "list"
    assert len(response.data) == 1

    # With base64 encoding, embedding should be a string, not a list
    embedding_data = response.data[0]
    assert embedding_data.object == "embedding"
    assert embedding_data.index == 0
    assert isinstance(embedding_data.embedding, str)

    # Verify it's valid base64 and decode to floats
    embedding_floats = decode_base64_to_floats(embedding_data.embedding)

    # Verify we got valid floats
    assert len(embedding_floats) == dimensions, f"Got embedding length {len(embedding_floats)}, expected {dimensions}"
    assert all(isinstance(x, float) for x in embedding_floats)
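
# Size check for the test above (arithmetic only, not an extra assertion): 12 float32
# values occupy 12 * 4 = 48 bytes, and base64 encodes every 3 bytes as 4 characters,
# so the returned string should be 48 / 3 * 4 = 64 characters long.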


def test_openai_embeddings_base64_batch_processing(compat_client, client_with_models, embedding_model_id):
    """Test OpenAI embeddings endpoint with base64 encoding for batch processing."""
    skip_if_model_doesnt_support_openai_embeddings(client_with_models, embedding_model_id)
    skip_if_model_doesnt_support_encoding_format_base64(client_with_models, embedding_model_id)

    input_texts = ["First text for base64", "Second text for base64", "Third text for base64"]

    response = compat_client.embeddings.create(
        model=embedding_model_id,
        input=input_texts,
        encoding_format="base64",
    )

    # Validate response structure
    assert response.object == "list"
    assert response.model == embedding_model_id
    assert len(response.data) == len(input_texts)

    # Validate each embedding in the batch
    embedding_dimensions = []
    for i, embedding_data in enumerate(response.data):
        assert embedding_data.object == "embedding"
        assert embedding_data.index == i

        # With base64 encoding, embedding should be a string, not a list
        assert isinstance(embedding_data.embedding, str)
        embedding_floats = decode_base64_to_floats(embedding_data.embedding)
        assert len(embedding_floats) > 0
        assert all(isinstance(x, float) for x in embedding_floats)
        embedding_dimensions.append(len(embedding_floats))

    # All embeddings should have the same dimensionality
    assert all(dim == embedding_dimensions[0] for dim in embedding_dimensions)
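
# As shown in the test plan above, this file is typically exercised through the
# integration-test wrapper rather than bare pytest, e.g.:
#   FIREWORKS_API_KEY=... ./scripts/integration-tests.sh --stack-config server:ci-tests \
#       --setup fireworks --subdirs inference --pattern openai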