llama-stack-mirror/llama_stack/providers/remote
Ben Browning 544a804678 fix: Together provider shutdown and default to non-streaming
The Together inference provider was throwing a stack trace every time
it shut down, as it was trying to call a non-existent `close` method
on the AsyncTogether client. While fixing that, I also adjusted its
shutdown logic to close the OpenAI client if we've created one of
those, since that client does have a `close` method.
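A minimal sketch of the adjusted shutdown path (the attribute name
`_openai_client` and the lazy-creation detail are assumptions for
illustration, not the provider's actual internals):

    from typing import Optional

    from openai import AsyncOpenAI  # this client does expose close()


    class TogetherInferenceAdapter:
        """Sketch of the shutdown path only; other methods omitted."""

        def __init__(self) -> None:
            # Hypothetical: created lazily when OpenAI-compatible calls
            # are made; stays None until then.
            self._openai_client: Optional[AsyncOpenAI] = None

        async def shutdown(self) -> None:
            # AsyncTogether has no close() method, so we no longer call
            # one on it. Only the OpenAI client has close(), so close
            # that one if (and only if) we actually created it.
            if self._openai_client is not None:
                await self._openai_client.close()
                self._openai_client = None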

While testing that, I also realized we were defaulting to treating
all requests as streaming instead of non-streaming. So, this flips
that default to non-streaming to match how the other providers work.
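
A hedged sketch of the flipped default, where an unset `stream` flag
now means non-streaming (the helper and its name are hypothetical,
not code from the provider):

    from typing import Optional


    def is_streaming(stream: Optional[bool]) -> bool:
        # Before the fix, an unset flag was effectively treated as
        # streaming. Now None (unset) and False both mean non-streaming,
        # and only an explicit True enables streaming.
        return bool(stream)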

I tested this by ensuring the Together inference provider no longer
spits out a long stack trace when shutting it down and by running the
OpenAI API chat completion verification suite to ensure the change in
default streaming logic didn't mess anything else up.

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-04-21 17:06:44 -04:00
agents         test: add unit test to ensure all config types are instantiable (#1601)    2025-03-12 22:29:58 -07:00
datasetio      refactor: extract pagination logic into shared helper function (#1770)     2025-03-31 13:08:29 -07:00
inference      fix: Together provider shutdown and default to non-streaming               2025-04-21 17:06:44 -04:00
post_training  fix: Handle case when Customizer Job status is unknown (#1965)             2025-04-17 10:27:07 +02:00
safety         docs: Add NVIDIA platform distro docs (#1971)                              2025-04-17 05:54:30 -07:00
tool_runtime   fix(api): don't return list for runtime tools (#1686)                      2025-04-01 09:53:11 +02:00
vector_io      chore: Updating Milvus Client calls to be non-blocking (#1830)              2025-03-28 22:14:07 -04:00
__init__.py    impls -> inline, adapters -> remote (#381)                                  2024-11-06 14:54:05 -08:00