fix(ci, tests): ensure uv environments in CI are kosher, record tests (#3193)

I started this PR trying to unbreak a newly broken test `test_agent_name`. This test was broken all along but did not show up because during testing we were pulling the "non-updated" llama stack client. See this comment: https://github.com/llamastack/llama-stack/pull/3119#discussion_r2270988205 While fixing this, I encountered a large amount of badness in our CI workflow definitions. - We weren't passing `LLAMA_STACK_DIR` or `LLAMA_STACK_CLIENT_DIR` overrides to `llama stack build` at all in some cases. - Even when we did, we used `uv run` liberally. The first thing `uv run` does is "syncs" the project environment. This means, it is going to undo any mutations we might have done ourselves. But we make many mutations in our CI runners to these environments. The most important of which is why `llama stack build` where we install distro dependencies. As a result, when you tried to run the integration tests, you would see old, strange versions. ## Test Plan Re-record using: ``` sh scripts/integration-tests.sh --stack-config ci-tests \ --provider ollama --test-pattern test_agent_name --inference-mode record ``` Then re-run with `--inference-mode replay`. But: Eventually, this test turned out to be quite flaky for telemetry reasons. I haven't investigated it for now and just disabled it sadly since we have a release to push out.
2025-12-03 09:53:45 +00:00 · 2025-08-18 17:02:24 -07:00 · 2025-08-18 17:02:24 -07:00 · eb07a0f86a
commit eb07a0f86a
parent ac78e9f66a
14 changed files with 366 additions and 36 deletions
--- a/llama_stack/testing/inference_recorder.py
+++ b/llama_stack/testing/inference_recorder.py
@ -261,7 +261,7 @@ async def _patched_inference_method(original_method, self, client_type, endpoint
        else:
            raise RuntimeError(
                f"No recorded response found for request hash: {request_hash}\n"
-                f"Endpoint: {endpoint}\n"
+                f"Request: {method} {url} {body}\n"
                f"Model: {body.get('model', 'unknown')}\n"
                f"To record this response, run with LLAMA_STACK_INFERENCE_MODE=record"
            )