# What does this PR do?
closes#3268closes#3498
When resuming from previous response ID, currently we attempt to convert
from the stored responses input to chat completion messages, which is
not always possible, e.g. for tool calls where some data is lost once
converted from chat completion message to repsonses input format.
This PR stores the chat completion messages that correspond to the
_last_ call to chat completion, which is sufficient to be resumed from
in the next responses API call, where we load these saved messages and
skip conversion entirely.
Separate issue to optimize storage:
https://github.com/llamastack/llama-stack/issues/3646
## Test Plan
existing CI tests
# What does this PR do?
Mirroring the same changes that was used for inference_store:
https://github.com/llamastack/llama-stack/pull/3383
Will follow up with a shared internal API for managing these write
queues.
## Test Plan
existing tests
# What does this PR do?
previously, developers who ran `./scripts/unit-tests.sh` would get
`asyncio-mode=auto`, which meant `@pytest.mark.asyncio` and
`@pytest_asyncio.fixture` were redundent. developers who ran `pytest`
directly would get pytest's default (strict mode), would run into errors
leading them to add `@pytest.mark.asyncio` / `@pytest_asyncio.fixture`
to their code.
with this change -
- `asyncio_mode=auto` is included in `pyproject.toml` making behavior
consistent for all invocations of pytest
- removes all redundant `@pytest_asyncio.fixture` and
`@pytest.mark.asyncio`
- for good measure, requires `pytest>=8.4` and `pytest-asyncio>=1.0`
## Test Plan
- `./scripts/unit-tests.sh`
- `uv run pytest tests/unit`
# What does this PR do?
Inference/Response stores now store user attributes when inserting, and
respects them when fetching.
## Test Plan
pytest tests/unit/utils/test_sqlstore.py