forked from phoenix-oss/llama-stack-mirror
# What does this PR do?

This provides an initial [OpenAI Responses API](https://platform.openai.com/docs/api-reference/responses) implementation. The API is not yet complete; this is more of a proof of concept showing how we can store responses in our key-value stores and use them to support Responses API concepts like `previous_response_id`.

## Test Plan

I've added a new `tests/integration/openai_responses/test_openai_responses.py` as part of test-driven development for this new API. I'm only testing this locally with the remote-vllm provider for now, but it should work with any of our inference providers, since the only API it requires from the inference provider is the `openai_chat_completion` endpoint.

```
VLLM_URL="http://localhost:8000/v1" \
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
llama stack build --template remote-vllm --image-type venv --run
```

```
LLAMA_STACK_CONFIG="http://localhost:8321" \
python -m pytest -v \
tests/integration/openai_responses/test_openai_responses.py \
--text-model "meta-llama/Llama-3.2-3B-Instruct"
```

---------

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
40 lines
1.2 KiB
Python
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.

import json
import pathlib


class TestCase:
    _apis = [
        "inference/chat_completion",
        "inference/completion",
        "openai/responses",
    ]
    _jsonblob = {}

    def __init__(self, name):
        # loading all test cases
        if self._jsonblob == {}:
            for api in self._apis:
                with open(pathlib.Path(__file__).parent / f"{api}.json", "r") as f:
                    coloned = api.replace("/", ":")
                    try:
                        loaded = json.load(f)
                    except json.JSONDecodeError as e:
                        raise ValueError(f"There is a syntax error in {api}.json: {e}") from e
                    TestCase._jsonblob.update({f"{coloned}:{k}": v for k, v in loaded.items()})

        # loading this test case
        tc = self._jsonblob.get(name)
        if tc is None:
            raise ValueError(f"Test case {name} not found")

        # these are the only fields we need
        self.data = tc.get("data")

    def __getitem__(self, key):
        return self.data[key]
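The key scheme used by `TestCase` (the API path with `/` replaced by `:`, followed by the case name) can be illustrated with a minimal standalone sketch. It assumes each JSON file maps a case name to an object with a `data` field, which is what the loader above implies. The helper `load_test_cases`, the case name `sanity`, and the fixture contents are hypothetical and exist only for this example; the real JSON files are not shown here.

```python
import json
import pathlib
import tempfile


def load_test_cases(base_dir, apis):
    """Flatten per-API JSON files into one dict keyed by '<api with colons>:<case name>'.

    Hypothetical helper mirroring the loading loop in TestCase.__init__.
    """
    blob = {}
    for api in apis:
        path = pathlib.Path(base_dir) / f"{api}.json"
        with open(path, "r") as f:
            loaded = json.load(f)
        prefix = api.replace("/", ":")
        blob.update({f"{prefix}:{k}": v for k, v in loaded.items()})
    return blob


# Assumed fixture content: one case named "sanity" with a "data" payload.
sample = {"sanity": {"data": {"question": "Which planet do humans live on?", "expected": "Earth"}}}

with tempfile.TemporaryDirectory() as tmp:
    # Recreate the directory layout implied by the "inference/chat_completion" API path.
    fixture = pathlib.Path(tmp) / "inference" / "chat_completion.json"
    fixture.parent.mkdir(parents=True)
    fixture.write_text(json.dumps(sample))
    cases = load_test_cases(tmp, ["inference/chat_completion"])

# Lookup uses the flattened key, the same shape of name TestCase expects.
case_data = cases["inference:chat_completion:sanity"]["data"]
print(case_data["expected"])
```

With the class itself, the equivalent lookup would be along the lines of `TestCase("inference:chat_completion:sanity")["expected"]`, given a matching entry in `inference/chat_completion.json` next to the module.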