llama-stack-mirror

Mirror of https://github.com/meta-llama/llama-stack.git, synced 2025-12-29 16:14:45 +00:00


Ben Browning, commit 544a804678, 2025-04-21 17:06:44 -04:00

fix: Together provider shutdown and default to non-streaming

The together inference provider was throwing a stack trace every time it shut down, as it was trying to call a non-existent `close` method on the AsyncTogether client. While fixing that, I also adjusted its shutdown logic to close the OpenAI client if we've created one of those, as that client does have a `close` method.

In testing that, I also realized we were defaulting to treating all requests as streaming requests instead of defaulting to non-streaming. So, this flips that default to non-streaming to match how the other providers work.

I tested this by ensuring the together inference provider no longer spits out a long stack trace when shutting it down and by running the OpenAI API chat completion verification suite to ensure the change in default streaming logic didn't mess anything else up.

Signed-off-by: Ben Browning <bbrownin@redhat.com>
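The two behaviours described in this commit (closing only the client that actually has a `close` method, and treating requests as non-streaming unless asked otherwise) can be sketched as follows. This is a minimal illustration, not the code in together.py: `TogetherAdapterSketch`, `FakeTogetherClient`, and `FakeOpenAIClient` are hypothetical stand-ins, and the sketch assumes the OpenAI-compatible client is created lazily and that the streaming flag is read from a dict of request parameters.

```python
import asyncio
from typing import Any, Dict, Optional


class FakeTogetherClient:
    """Hypothetical stand-in for AsyncTogether; it deliberately has no close() method."""


class FakeOpenAIClient:
    """Hypothetical stand-in for an OpenAI-compatible async client, which does have close()."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


class TogetherAdapterSketch:
    """Hypothetical adapter illustrating the shutdown fix and the non-streaming default."""

    def __init__(self) -> None:
        self._together_client = FakeTogetherClient()
        # Assumption: the OpenAI-compatible client is created lazily, so it may never exist.
        self._openai_client: Optional[FakeOpenAIClient] = None

    def get_openai_client(self) -> FakeOpenAIClient:
        # Create the OpenAI-compatible client on first use and reuse it afterwards.
        if self._openai_client is None:
            self._openai_client = FakeOpenAIClient()
        return self._openai_client

    async def shutdown(self) -> None:
        # Never call close() on the Together client: it does not define one.
        # Close the OpenAI client only if one was actually created.
        if self._openai_client is not None:
            await self._openai_client.close()

    @staticmethod
    def is_streaming(params: Dict[str, Any]) -> bool:
        # Default to non-streaming when the caller does not set "stream".
        return bool(params.get("stream", False))


if __name__ == "__main__":
    adapter = TogetherAdapterSketch()
    asyncio.run(adapter.shutdown())  # no OpenAI client yet, so nothing is closed
    client = adapter.get_openai_client()
    asyncio.run(adapter.shutdown())
    print(client.closed, TogetherAdapterSketch.is_streaming({}))  # prints: True False
```

Reading the flag as `params.get("stream", False)` means a request that omits it is handled as non-streaming, which is the default the commit message says the provider now uses.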
__init__.py	Fix precommit check after moving to ruff (#927)	2025-02-02 06:46:45 -08:00
config.py	feat: Add open benchmark template codegen (#1579)	2025-03-12 11:12:08 -07:00
models.py	test: verification on provider's OAI endpoints (#1893)	2025-04-07 23:06:28 -07:00
together.py	fix: Together provider shutdown and default to non-streaming	2025-04-21 17:06:44 -04:00