feat: Add suffix to openai_completions (#2449)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Integration Tests / test-matrix (http, 3.10, vector_io) (push) Failing after 6s
Integration Tests / test-matrix (http, 3.10, inference) (push) Failing after 8s
Integration Tests / test-matrix (http, 3.11, inference) (push) Failing after 6s
Integration Tests / test-matrix (http, 3.10, inspect) (push) Failing after 11s
Integration Tests / test-matrix (http, 3.12, post_training) (push) Failing after 6s
Integration Tests / test-matrix (http, 3.12, scoring) (push) Failing after 7s
Integration Tests / test-matrix (http, 3.11, inspect) (push) Failing after 10s
Integration Tests / test-matrix (http, 3.10, post_training) (push) Failing after 10s
Integration Tests / test-matrix (http, 3.12, inspect) (push) Failing after 14s
Integration Tests / test-matrix (library, 3.10, inference) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.10, agents) (push) Failing after 9s
Integration Tests / test-matrix (http, 3.10, scoring) (push) Failing after 11s
Integration Tests / test-matrix (http, 3.10, tool_runtime) (push) Failing after 14s
Integration Tests / test-matrix (http, 3.12, tool_runtime) (push) Failing after 13s
Integration Tests / test-matrix (http, 3.12, agents) (push) Failing after 8s
Integration Tests / test-matrix (http, 3.12, inference) (push) Failing after 9s
Integration Tests / test-matrix (http, 3.10, providers) (push) Failing after 8s
Integration Tests / test-matrix (http, 3.11, post_training) (push) Failing after 13s
Integration Tests / test-matrix (http, 3.11, vector_io) (push) Failing after 5s
Integration Tests / test-matrix (http, 3.11, datasets) (push) Failing after 8s
Integration Tests / test-matrix (http, 3.10, agents) (push) Failing after 13s
Integration Tests / test-matrix (http, 3.11, tool_runtime) (push) Failing after 8s
Integration Tests / test-matrix (http, 3.11, agents) (push) Failing after 15s
Integration Tests / test-matrix (http, 3.12, datasets) (push) Failing after 11s
Integration Tests / test-matrix (library, 3.10, tool_runtime) (push) Failing after 8s
Integration Tests / test-matrix (http, 3.10, datasets) (push) Failing after 16s
Integration Tests / test-matrix (http, 3.12, providers) (push) Failing after 11s
Integration Tests / test-matrix (library, 3.10, datasets) (push) Failing after 9s
Integration Tests / test-matrix (http, 3.11, scoring) (push) Failing after 14s
Integration Tests / test-matrix (http, 3.12, vector_io) (push) Failing after 8s
Integration Tests / test-matrix (library, 3.10, vector_io) (push) Failing after 12s
Integration Tests / test-matrix (library, 3.10, inspect) (push) Failing after 9s
Integration Tests / test-matrix (http, 3.11, providers) (push) Failing after 5s
Integration Tests / test-matrix (library, 3.10, providers) (push) Failing after 7s
Integration Tests / test-matrix (library, 3.10, post_training) (push) Failing after 10s
Integration Tests / test-matrix (library, 3.10, scoring) (push) Failing after 13s
Integration Tests / test-matrix (library, 3.11, agents) (push) Failing after 7s
Integration Tests / test-matrix (library, 3.11, inference) (push) Failing after 6s
Integration Tests / test-matrix (library, 3.11, datasets) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.11, inspect) (push) Failing after 8s
Integration Tests / test-matrix (library, 3.11, post_training) (push) Failing after 7s
Integration Tests / test-matrix (library, 3.11, providers) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.11, scoring) (push) Failing after 7s
Integration Tests / test-matrix (library, 3.11, vector_io) (push) Failing after 8s
Integration Tests / test-matrix (library, 3.11, tool_runtime) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.12, agents) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.12, datasets) (push) Failing after 7s
Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 7s
Integration Tests / test-matrix (library, 3.12, inference) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.12, providers) (push) Failing after 7s
Integration Tests / test-matrix (library, 3.12, post_training) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.12, scoring) (push) Failing after 11s
Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 9s
Test External Providers / test-external-providers (venv) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.12, vector_io) (push) Failing after 14s
Unit Tests / unit-tests (3.10) (push) Failing after 19s
Unit Tests / unit-tests (3.11) (push) Failing after 20s
Unit Tests / unit-tests (3.12) (push) Failing after 18s
Unit Tests / unit-tests (3.13) (push) Failing after 16s
Update ReadTheDocs / update-readthedocs (push) Failing after 8s
Pre-commit / pre-commit (push) Successful in 58s

For code completion apps need "fill in the middle" capabilities. 
Added option of `suffix` to `openai_completion` to enable this. 
Updated ollama provider to showcase the same. 

### Test Plan 
```
pytest -sv --stack-config="inference=ollama"  tests/integration/inference/test_openai_completion.py --text-model qwen2.5-coder:1.5b -k test_openai_completion_non_streaming_suffix
```

### OpenAI Sample script
```
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1")

response = client.completions.create(
    model="qwen2.5-coder:1.5b",
    prompt="The capital of ",
    suffix="is Paris.",
    max_tokens=10,
)

print(response.choices[0].text)
``` 
### Output
```
France is ____.

To answer this question, we 
```
This commit is contained in:
Hardik Shah 2025-06-13 16:06:06 -07:00 committed by GitHub
parent 2e8054bede
commit 985d0b156c
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
16 changed files with 74 additions and 3 deletions

View file

@ -22,9 +22,6 @@ def provider_from_model(client_with_models, model_id):
def skip_if_model_doesnt_support_openai_completion(client_with_models, model_id):
if isinstance(client_with_models, LlamaStackAsLibraryClient):
pytest.skip("OpenAI completions are not supported when testing with library client yet.")
provider = provider_from_model(client_with_models, model_id)
if provider.provider_type in (
"inline::meta-reference",
@ -44,6 +41,23 @@ def skip_if_model_doesnt_support_openai_completion(client_with_models, model_id)
pytest.skip(f"Model {model_id} hosted by {provider.provider_type} doesn't support OpenAI completions.")
def skip_if_model_doesnt_support_suffix(client_with_models, model_id):
# To test `fim` ( fill in the middle ) completion, we need to use a model that supports suffix.
# Use this to specifically test this API functionality.
# pytest -sv --stack-config="inference=ollama" \
# tests/integration/inference/test_openai_completion.py \
# --text-model qwen2.5-coder:1.5b \
# -k test_openai_completion_non_streaming_suffix
if model_id != "qwen2.5-coder:1.5b":
pytest.skip(f"Suffix is not supported for the model: {model_id}.")
provider = provider_from_model(client_with_models, model_id)
if provider.provider_type != "remote::ollama":
pytest.skip(f"Provider {provider.provider_type} doesn't support suffix.")
def skip_if_model_doesnt_support_openai_chat_completion(client_with_models, model_id):
if isinstance(client_with_models, LlamaStackAsLibraryClient):
pytest.skip("OpenAI chat completions are not supported when testing with library client yet.")
@ -102,6 +116,32 @@ def test_openai_completion_non_streaming(llama_stack_client, client_with_models,
assert len(choice.text) > 10
@pytest.mark.parametrize(
"test_case",
[
"inference:completion:suffix",
],
)
def test_openai_completion_non_streaming_suffix(llama_stack_client, client_with_models, text_model_id, test_case):
skip_if_model_doesnt_support_openai_completion(client_with_models, text_model_id)
skip_if_model_doesnt_support_suffix(client_with_models, text_model_id)
tc = TestCase(test_case)
# ollama needs more verbose prompting for some reason here...
response = llama_stack_client.completions.create(
model=text_model_id,
prompt=tc["content"],
stream=False,
suffix=tc["suffix"],
max_tokens=10,
)
assert len(response.choices) > 0
choice = response.choices[0]
assert len(choice.text) > 5
assert "france" in choice.text.lower()
@pytest.mark.parametrize(
"test_case",
[

View file

@ -4,6 +4,12 @@
"content": "Complete the sentence using one word: Roses are red, violets are "
}
},
"suffix": {
"data": {
"content": "The capital of ",
"suffix": "is Paris."
}
},
"non_streaming": {
"data": {
"content": "Micheael Jordan is born in ",