Mirror of https://github.com/meta-llama/llama-stack.git, synced 2025-10-25 17:11:12 +00:00
		
| # What does this PR do? closes #3268 closes #3498 When resuming from a previous response ID, we currently attempt to convert the stored Responses input back into chat completion messages. This is not always possible: for tool calls, for example, some data is lost once a chat completion message has been converted to the Responses input format. This PR stores the chat completion messages that correspond to the _last_ call to chat completion, which is sufficient to resume from in the next Responses API call, where we load these saved messages and skip the conversion entirely. Separate issue to optimize storage: https://github.com/llamastack/llama-stack/issues/3646 ## Test Plan Existing CI tests | ||
|---|---|---|
| .. | ||
| __init__.py | ||
| responses_store.py | ||