# What does this PR do?
Migrates MD5 and SHA-1 hash algorithms to SHA-256.
In particular, replaces:
- MD5 in chunk ID generation.
- MD5 in file verification.
- SHA-1 in model identifier digests.
And updates all related test expectations.
Original discussion:
https://github.com/llamastack/llama-stack/discussions/3413
<!-- If resolving an issue, uncomment and update the line below -->
Closes#3424.
## Test Plan
Unit tests from scripts/unit-tests.sh were updated to match the new hash
output, and ran to verify the tests pass.
Signed-off-by: Doug Edgar <dedgar@redhat.com>
# What does this PR do?
some providers do not produce spec compliant outputs. when this happens
the replay infra will fail to construct the proper types and will return
a dict to the client. the client likely does not expect a dict.
this was discovered with tgi, which returns finish_reason="" when valid
values are "stop", "length" or "content_filter"
## Test Plan
ci
One needed to specify record-replay related environment variables for
running integration tests. We could not use defaults because integration
tests could be run against Ollama instances which could be running
different models. For example, text vs vision tests needed separate
instances of Ollama because a single instance typically cannot serve
both of these models if you assume the standard CI worker configuration
on Github. As a result, `client.list()` as returned by the Ollama client
would be different between these runs and we'd end up overwriting
responses.
This PR "solves" it by adding a small amount of complexity -- we store
model list responses specially, keyed by the hashes of the models they
return. At replay time, we merge all of them and pretend that we have
the union of all models available.
## Test Plan
Re-recorded all the tests using `scripts/integration-tests.sh
--inference-mode record`, including the vision tests.
Recording files use a predictable naming format, making the SQLite index
redundant. The binary SQLite file was causing frequent git conflicts.
Simplify by calculating file paths directly from request hashes.
Signed-off-by: Derek Higgins <derekh@redhat.com>
I started this PR trying to unbreak a newly broken test
`test_agent_name`. This test was broken all along but did not show up
because during testing we were pulling the "non-updated" llama stack
client. See this comment:
https://github.com/llamastack/llama-stack/pull/3119#discussion_r2270988205
While fixing this, I encountered a large amount of badness in our CI
workflow definitions.
- We weren't passing `LLAMA_STACK_DIR` or `LLAMA_STACK_CLIENT_DIR`
overrides to `llama stack build` at all in some cases.
- Even when we did, we used `uv run` liberally. The first thing `uv run`
does is "syncs" the project environment. This means, it is going to undo
any mutations we might have done ourselves. But we make many mutations
in our CI runners to these environments. The most important of which is
why `llama stack build` where we install distro dependencies. As a
result, when you tried to run the integration tests, you would see old,
strange versions.
## Test Plan
Re-record using:
```
sh scripts/integration-tests.sh --stack-config ci-tests \
--provider ollama --test-pattern test_agent_name --inference-mode record
```
Then re-run with `--inference-mode replay`. But:
Eventually, this test turned out to be quite flaky for telemetry
reasons. I haven't investigated it for now and just disabled it sadly
since we have a release to push out.
Implements a comprehensive recording and replay system for inference API
calls that eliminates dependency on online inference providers during
testing. The system treats inference as deterministic by recording real
API responses and replaying them in subsequent test runs. Applies to
OpenAI clients (which should cover many inference requests) as well as
Ollama AsyncClient.
For storing, we use a hybrid system: Sqlite for fast lookups and JSON
files for easy greppability / debuggability.
As expected, tests become much much faster (more than 3x in just
inference testing.)
```bash
LLAMA_STACK_TEST_INFERENCE_MODE=record LLAMA_STACK_TEST_RECORDING_DIR=<...> \
uv run pytest -s -v tests/integration/inference \
--stack-config=starter \
-k "not( builtin_tool or safety_with_image or code_interpreter or test_rag )" \
--text-model="ollama/llama3.2:3b-instruct-fp16" \
--embedding-model=sentence-transformers/all-MiniLM-L6-v2
```
```bash
LLAMA_STACK_TEST_INFERENCE_MODE=replay LLAMA_STACK_TEST_RECORDING_DIR=<...> \
uv run pytest -s -v tests/integration/inference \
--stack-config=starter \
-k "not( builtin_tool or safety_with_image or code_interpreter or test_rag )" \
--text-model="ollama/llama3.2:3b-instruct-fp16" \
--embedding-model=sentence-transformers/all-MiniLM-L6-v2
```
- `LLAMA_STACK_TEST_INFERENCE_MODE`: `live` (default), `record`, or
`replay`
- `LLAMA_STACK_TEST_RECORDING_DIR`: Storage location (must be specified
for record or replay modes)