llama-stack-mirror/llama_stack
Ben Browning e92f571f47
fix: ollama chat completion needs unique ids (#2344)
# What does this PR do?

The chat completion ids generated by Ollama are not unique enough to use
with stored chat completions, as they rely on only three digits of
randomness to produce "unique" values - e.g. `chatcmpl-373`. This causes
frequent collisions between chat completion ids in Ollama, which breaks
our SQL storage of chat completions by id, where ids are expected to
actually be unique.
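
For a rough sense of how quickly such collisions happen, here is a
birthday-problem estimate. It assumes ids are drawn uniformly from the
roughly 1,000 possible three-digit suffixes, which is an illustration
rather than Ollama's actual id scheme:

```python
# Birthday-problem estimate: probability of at least one id collision when
# drawing `num_completions` ids uniformly from an `id_space` of ~1000 values
# (assumed for illustration; not Ollama's actual generator).
def collision_probability(num_completions: int, id_space: int = 1000) -> float:
    p_no_collision = 1.0
    for i in range(num_completions):
        p_no_collision *= (id_space - i) / id_space
    return 1.0 - p_no_collision

print(collision_probability(38))   # ~0.51 -- better than even odds after ~38 completions
print(collision_probability(100))  # ~0.99 -- near-certain over a longer test run
```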

So, this adjusts Ollama responses to use UUIDs as unique ids. This does
mean we're replacing the ids generated natively by Ollama. If we don't
wish to do this, we'll either need to relax the unique constraint on our
chat completion id field in the inference storage or convince Ollama
upstream to use something closer to UUID values here.
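
The shape of the change, as a minimal sketch (the helper name below is
hypothetical, not the exact code added in this PR):

```python
# Sketch: overwrite the low-entropy id Ollama assigns to an OpenAI-compatible
# chat completion response with a UUID-backed one before it is stored.
import uuid

def ensure_unique_completion_id(response: dict) -> dict:
    # e.g. 'chatcmpl-373' -> 'chatcmpl-4f6c1d2e-...'
    response["id"] = f"chatcmpl-{uuid.uuid4()}"
    return response
```

The trade-off, as noted above, is that the stored id no longer matches
the id Ollama generated natively.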

Closes #2315

## Test Plan

I tested by running the OpenAI completion / chat completion integration
tests in a loop. Without this change, I regularly hit unique id
collisions; with this change, I do not. We sometimes see flakes from
these unique id collisions in our CI tests, and this will resolve those.

```
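# Terminal 1: start the stack server against the Ollama distribution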
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
llama stack run llama_stack/templates/ollama/run.yaml

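# Terminal 2: run the OpenAI integration tests in a loop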
while true; do \
  INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
  pytest -s -v \
    tests/integration/inference/test_openai_completion.py \
    --stack-config=http://localhost:8321 \
    --text-model="meta-llama/Llama-3.2-3B-Instruct"; \
done
```

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-06-02 17:43:20 -07:00
| Name | Last commit | Date |
| --- | --- | --- |
| apis | feat(responses): implement full multi-turn support (#2295) | 2025-06-02 15:35:49 -07:00 |
| cli | refactor: remove container from list of run image types (#2178) | 2025-06-02 09:57:55 +02:00 |
| distribution | refactor: remove container from list of run image types (#2178) | 2025-06-02 09:57:55 +02:00 |
| models | chore: remove usage of load_tiktoken_bpe (#2276) | 2025-06-02 07:33:37 -07:00 |
| providers | fix: ollama chat completion needs unique ids (#2344) | 2025-06-02 17:43:20 -07:00 |
| strong_typing | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |
| templates | revert: "chore: Remove zero-width space characters from OTEL service" (#2331) | 2025-06-02 14:21:35 -07:00 |
| ui | chore: revert llama-stack-client dep (#2342) | 2025-06-02 16:05:21 -07:00 |
| __init__.py | export LibraryClient | 2024-12-13 12:08:00 -08:00 |
| env.py | refactor(test): move tools, evals, datasetio, scoring and post training tests (#1401) | 2025-03-04 14:53:47 -08:00 |
| log.py | chore: make cprint write to stderr (#2250) | 2025-05-24 23:39:57 -07:00 |
| schema_utils.py | chore: enable pyupgrade fixes (#1806) | 2025-05-01 14:23:50 -07:00 |