Francisco Arceo
e7d21e1ee3
feat: Add support for Conversations in Responses API ( #3743 )
...
# What does this PR do?
This PR adds support for Conversations in Responses.
## Test Plan
Unit tests
Integration tests
<details>
<summary>Manual testing with this script (click to expand)</summary>
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/", api_key="none")


def test_conversation_create():
    print("Testing conversation create...")
    conversation = client.conversations.create(
        metadata={"topic": "demo"},
        items=[
            {"type": "message", "role": "user", "content": "Hello!"}
        ],
    )
    print(f"Created: {conversation}")
    return conversation


def test_conversation_retrieve(conv_id):
    print(f"Testing conversation retrieve for {conv_id}...")
    retrieved = client.conversations.retrieve(conv_id)
    print(f"Retrieved: {retrieved}")
    return retrieved


def test_conversation_update(conv_id):
    print(f"Testing conversation update for {conv_id}...")
    updated = client.conversations.update(
        conv_id,
        metadata={"topic": "project-x"},
    )
    print(f"Updated: {updated}")
    return updated


def test_conversation_delete(conv_id):
    print(f"Testing conversation delete for {conv_id}...")
    deleted = client.conversations.delete(conv_id)
    print(f"Deleted: {deleted}")
    return deleted


def test_conversation_items_create(conv_id):
    print(f"Testing conversation items create for {conv_id}...")
    items = client.conversations.items.create(
        conv_id,
        items=[
            {
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": "Hello!"}],
            },
            {
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": "How are you?"}],
            },
        ],
    )
    print(f"Items created: {items}")
    return items


def test_conversation_items_list(conv_id):
    print(f"Testing conversation items list for {conv_id}...")
    items = client.conversations.items.list(conv_id, limit=10)
    print(f"Items list: {items}")
    return items


def test_conversation_item_retrieve(conv_id, item_id):
    print(f"Testing conversation item retrieve for {conv_id}/{item_id}...")
    item = client.conversations.items.retrieve(conversation_id=conv_id, item_id=item_id)
    print(f"Item retrieved: {item}")
    return item


def test_conversation_item_delete(conv_id, item_id):
    print(f"Testing conversation item delete for {conv_id}/{item_id}...")
    deleted = client.conversations.items.delete(conversation_id=conv_id, item_id=item_id)
    print(f"Item deleted: {deleted}")
    return deleted


def test_conversation_responses_create():
    print("\nTesting conversation create for a responses example...")
    conversation = client.conversations.create()
    print(f"Created: {conversation}")
    response = client.responses.create(
        model="gpt-4.1",
        input=[{"role": "user", "content": "What are the 5 Ds of dodgeball?"}],
        conversation=conversation.id,
    )
    print(f"Created response: {response} for conversation {conversation.id}")
    return response, conversation


def test_conversations_responses_create_followup(
    conversation,
    content="Repeat what you just said but add 'this is my second time saying this'",
):
    print(f"Using: {conversation.id}")
    response = client.responses.create(
        model="gpt-4.1",
        input=[{"role": "user", "content": content}],
        conversation=conversation.id,
    )
    print(f"Created response: {response} for conversation {conversation.id}")
    conv_items = client.conversations.items.list(conversation.id)
    print(f"\nRetrieving list of items for conversation {conversation.id}:")
    print(conv_items.model_dump_json(indent=2))


def test_response_with_fake_conv_id():
    fake_conv_id = "conv_zzzzzzzzz5dc81908289d62779d2ac510a2b0b602ef00a44"
    print(f"Using {fake_conv_id}")
    try:
        response = client.responses.create(
            model="gpt-4.1",
            input=[{"role": "user", "content": "say hello"}],
            conversation=fake_conv_id,
        )
        print(f"Created response: {response} for conversation {fake_conv_id}")
    except Exception as e:
        print(f"Failed to create response for conversation {fake_conv_id} with error {e}")


def main():
    print("Testing OpenAI Conversations API...")

    # Create conversation
    conversation = test_conversation_create()
    conv_id = conversation.id

    # Retrieve conversation
    test_conversation_retrieve(conv_id)

    # Update conversation
    test_conversation_update(conv_id)

    # Create items
    items = test_conversation_items_create(conv_id)

    # List items
    items_list = test_conversation_items_list(conv_id)

    # Retrieve and delete a specific item
    if items_list.data:
        item_id = items_list.data[0].id
        test_conversation_item_retrieve(conv_id, item_id)
        test_conversation_item_delete(conv_id, item_id)

    # Delete conversation
    test_conversation_delete(conv_id)

    response, conversation2 = test_conversation_responses_create()

    print("\nTesting response retrieval")
    test_conversation_retrieve(conversation2.id)

    print("\nTesting responses follow up")
    test_conversations_responses_create_followup(conversation2)

    print("\nTesting responses follow up x2!")
    test_conversations_responses_create_followup(
        conversation2,
        content="Repeat what you just said but add 'this is my third time saying this'",
    )

    test_response_with_fake_conv_id()
    print("All tests completed!")


if __name__ == "__main__":
    main()
```
</details>
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-10-10 11:57:40 -07:00
Francisco Arceo
a20e8eac8c
feat: Add OpenAI Conversations API ( #3429 )
...
# What does this PR do?
Initial implementation for `Conversations` and `ConversationItems` using
`AuthorizedSqlStore` with endpoints to:
- CREATE
- UPDATE
- GET/RETRIEVE/LIST
- DELETE
Set `level=LLAMA_STACK_API_V1`.
NOTE: This does not currently incorporate changes for Responses, that'll
be done in a subsequent PR.
Closes https://github.com/llamastack/llama-stack/issues/3235
## Test Plan
- Unit tests
- Integration tests
Also compared against the [OpenAPI spec for the OpenAI
API](https://github.com/openai/openai-openapi/tree/manual_spec):
```bash
oasdiff breaking --fail-on ERR docs/static/llama-stack-spec.yaml https://raw.githubusercontent.com/openai/openai-openapi/refs/heads/manual_spec/openapi.yaml --strip-prefix-base "/v1/openai/v1" \
--match-path '(^/v1/openai/v1/conversations.*|^/conversations.*)'
```
Note: I still have some uncertainty about this check. I borrowed it from
@cdoern on https://github.com/llamastack/llama-stack/pull/3514 and need
to spend more time confirming it works; at the moment it suggests it
does.
UPDATE on `oasdiff`: I investigated the OpenAI spec further and it
currently does not include Conversations at all, so that analysis is not
meaningful. Noting for future reference.
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-10-03 08:47:18 -07:00
Charlie Doern
aac42ddcc2
feat(api): level inference/rerank and remove experimental ( #3565 )
...
# What does this PR do?
`inference/rerank` is the one route in the API that is not intended to be
deprecated. Level it as `v1alpha`.
Additionally, remove `experimental` in favor of `v1alpha`, which itself
implies an experimental state per the original proposal.
Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-09-29 12:42:09 -07:00
Francisco Arceo
ad6ea7fb91
feat: Adding OpenAI Prompts API ( #3319 )
...
# What does this PR do?
This PR adds support for OpenAI Prompts API.
Note: OpenAI does not explicitly expose the Prompts API; it is instead
made available through the Responses API and in the [Prompts
Dashboard](https://platform.openai.com/docs/guides/prompting#create-a-prompt).
I have added the following APIs:
- CREATE
- GET
- LIST
- UPDATE
- Set Default Version
The Set Default Version API is made available only in the Prompts
Dashboard and configures which prompt version is returned in the GET
(the latest version is the default).
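The version semantics above (GET returns the latest version unless a default has been explicitly pinned) can be sketched with a hypothetical in-memory store; the class and method names here are illustrative only, not the actual llama-stack implementation:

```python
# Hypothetical sketch of "set default version" semantics: GET returns the
# pinned default if one was set, otherwise the latest version.
class PromptStore:
    def __init__(self):
        self.versions = {}  # prompt_id -> {version: template}
        self.default = {}   # prompt_id -> explicitly pinned version

    def create(self, prompt_id, template):
        versions = self.versions.setdefault(prompt_id, {})
        version = len(versions) + 1
        versions[version] = template
        return version

    def set_default_version(self, prompt_id, version):
        if version not in self.versions.get(prompt_id, {}):
            raise ValueError(f"unknown version {version} for {prompt_id}")
        self.default[prompt_id] = version

    def get(self, prompt_id, version=None):
        versions = self.versions[prompt_id]
        if version is None:
            # Pinned default wins; otherwise fall back to the latest version.
            version = self.default.get(prompt_id, max(versions))
        return versions[version]


store = PromptStore()
store.create("pmpt_demo", "v1 template")
store.create("pmpt_demo", "v2 template")
print(store.get("pmpt_demo"))  # latest wins when nothing is pinned
store.set_default_version("pmpt_demo", 1)
print(store.get("pmpt_demo"))  # pinned default wins afterwards
```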
Overall, the expected functionality in Responses will look like this:
```python
from openai import OpenAI
client = OpenAI()
response = client.responses.create(
prompt={
"id": "pmpt_68b0c29740048196bd3a6e6ac3c4d0e20ed9a13f0d15bf5e",
"version": "2",
"variables": {
"city": "San Francisco",
"age": 30,
}
}
)
```
### Resolves https://github.com/llamastack/llama-stack/issues/3276
## Test Plan
Unit tests added. Integration tests can be added after client
generation.
## Next Steps
1. Update Responses API to support Prompt API
2. I'll enhance the UI to implement the Prompt Dashboard.
3. Add cache for lower latency
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-09-08 11:05:13 -04:00
ehhuang
d948e63340
chore: Improve error message for missing provider dependencies ( #3315 )
...
Generated with CC:
Replace cryptic KeyError with clear, actionable error message that
shows:
- Which API the failing provider belongs to
- The provider ID and type that's failing
- Which dependency is missing
- Clear instructions on how to fix the issue
## Test plan
Use a run config with Agents API and no safety provider
Before: KeyError: <Api.safety: 'safety'>
After: Failed to resolve 'agents' provider 'meta-reference' of type
'inline::meta-reference': required dependency 'safety' is not available.
Please add a 'safety' provider to your configuration or check if the
provider is properly configured.
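The before/after above boils down to one pattern: catch the missing key during dependency resolution and raise with full context instead of letting a bare `KeyError` escape. A minimal sketch of that pattern, with hypothetical names (`resolve_deps`, the dict shapes) that are not the actual llama-stack code:

```python
# Sketch: replace a raw KeyError during provider resolution with an
# actionable error naming the API, provider, and missing dependency.
def resolve_deps(api, provider_id, provider_type, required, available):
    deps = {}
    for dep in required:
        if dep not in available:
            raise RuntimeError(
                f"Failed to resolve '{api}' provider '{provider_id}' of type "
                f"'{provider_type}': required dependency '{dep}' is not available. "
                f"Please add a '{dep}' provider to your configuration or check if "
                f"the provider is properly configured."
            )
        deps[dep] = available[dep]
    return deps


try:
    resolve_deps(
        "agents", "meta-reference", "inline::meta-reference",
        required=["safety"], available={"inference": object()},
    )
except RuntimeError as e:
    print(e)
```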
2025-09-03 16:11:59 +02:00
Matthew Farrellee
914c7be288
feat: add batches API with OpenAI compatibility (with inference replay) ( #3162 )
...
Add complete batches API implementation with protocol, providers, and
tests:
Core Infrastructure:
- Add batches API protocol using OpenAI Batch types directly
- Add Api.batches enum value and protocol mapping in resolver
- Add OpenAI "batch" file purpose support
- Include proper error handling (ConflictError, ResourceNotFoundError)
Reference Provider:
- Add ReferenceBatchesImpl with full CRUD operations (create, retrieve,
cancel, list)
- Implement background batch processing with configurable concurrency
- Add SQLite KVStore backend for persistence
- Support /v1/chat/completions endpoint with request validation
Comprehensive Test Suite:
- Add unit tests for provider implementation with validation
- Add integration tests for end-to-end batch processing workflows
- Add error handling tests for validation, malformed inputs, and edge
cases
Configuration:
- Add max_concurrent_batches and max_concurrent_requests_per_batch
options
- Add provider documentation with sample configurations
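The concurrency-limited background processing described above can be sketched with an `asyncio.Semaphore`; the request shape and the stubbed processing step are assumptions for illustration (the real provider issues `/v1/chat/completions` calls), not the reference implementation:

```python
# Sketch of concurrency-limited batch processing in the spirit of
# max_concurrent_requests_per_batch: at most N requests in flight at once.
import asyncio


async def process_batch(requests, max_concurrent_requests):
    semaphore = asyncio.Semaphore(max_concurrent_requests)

    async def process_one(req):
        async with semaphore:
            # Stand-in for the real inference call.
            await asyncio.sleep(0)
            return {"custom_id": req["custom_id"], "status": "completed"}

    return await asyncio.gather(*(process_one(r) for r in requests))


results = asyncio.run(
    process_batch(
        [{"custom_id": f"req-{i}"} for i in range(5)],
        max_concurrent_requests=2,
    )
)
print(results)
```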
Test with -
```
$ uv run llama stack build --image-type venv --providers inference=YOU_PICK,files=inline::localfs,batches=inline::reference --run &
$ LLAMA_STACK_CONFIG=http://localhost:8321 uv run pytest tests/unit/providers/batches tests/integration/batches --text-model YOU_PICK
```
addresses #3066
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-08-15 15:34:15 -07:00
Ashwin Bharambe
ee7631b6cf
Revert "feat: add batches API with OpenAI compatibility" ( #3149 )
...
Reverts llamastack/llama-stack#3088
The PR broke integration tests.
2025-08-14 10:08:54 -07:00
Matthew Farrellee
de692162af
feat: add batches API with OpenAI compatibility ( #3088 )
...
Add complete batches API implementation with protocol, providers, and
tests:
Core Infrastructure:
- Add batches API protocol using OpenAI Batch types directly
- Add Api.batches enum value and protocol mapping in resolver
- Add OpenAI "batch" file purpose support
- Include proper error handling (ConflictError, ResourceNotFoundError)
Reference Provider:
- Add ReferenceBatchesImpl with full CRUD operations (create, retrieve,
cancel, list)
- Implement background batch processing with configurable concurrency
- Add SQLite KVStore backend for persistence
- Support /v1/chat/completions endpoint with request validation
Comprehensive Test Suite:
- Add unit tests for provider implementation with validation
- Add integration tests for end-to-end batch processing workflows
- Add error handling tests for validation, malformed inputs, and edge
cases
Configuration:
- Add max_concurrent_batches and max_concurrent_requests_per_batch
options
- Add provider documentation with sample configurations
Test with -
```
$ uv run llama stack build --image-type venv --providers inference=YOU_PICK,files=inline::localfs,batches=inline::reference --run &
$ LLAMA_STACK_CONFIG=http://localhost:8321 uv run pytest tests/unit/providers/batches tests/integration/batches --text-model YOU_PICK
```
addresses #3066
2025-08-14 09:42:02 -04:00
Ashwin Bharambe
2665f00102
chore(rename): move llama_stack.distribution to llama_stack.core ( #2975 )
...
We would like to rename the term `template` to `distribution`. To
prepare for that, this is a precursor.
cc @leseb
2025-07-30 23:30:53 -07:00