llama-stack-mirror/llama_stack/providers
Ben Browning 544a804678 fix: Together provider shutdown and default to non-streaming
The Together inference provider was throwing a stack trace every time
it shut down, because it was trying to call a non-existent `close`
method on the AsyncTogether client. While fixing that, I also adjusted
its shutdown logic to close the OpenAI client if we've created one,
since that client does have a `close` method.
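
A minimal sketch of the corrected shutdown logic, assuming the adapter
keeps lazily-created clients in `_client` and `_openai_client`
attributes (the names here are illustrative, not necessarily the ones
in the patch):

    from openai import AsyncOpenAI
    from together import AsyncTogether

    class TogetherInferenceAdapter:
        def __init__(self) -> None:
            self._client: AsyncTogether | None = None
            self._openai_client: AsyncOpenAI | None = None

        async def shutdown(self) -> None:
            # AsyncTogether exposes no `close` method, so the old
            # `await self._client.close()` raised an AttributeError
            # (and a long stack trace) on every shutdown; just drop it.
            self._client = None
            # AsyncOpenAI does have an async `close`, so close the
            # OpenAI client if we ever created one.
            if self._openai_client is not None:
                await self._openai_client.close()
                self._openai_client = None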

While testing that, I also realized we were treating every request as
a streaming request by default instead of defaulting to non-streaming.
This flips that default to non-streaming to match how the other
providers work.
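
As a sketch of the default flip (the `stream` flag below follows the
OpenAI-style request shape; this is an illustration, not the exact
code from the patch):

    def wants_streaming(stream: bool | None) -> bool:
        # Before the fix the provider effectively did:
        #     return stream is not False   # unset -> streaming
        # After the fix, an unset flag means non-streaming, which is
        # how the other inference providers behave:
        return stream is True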

I tested this by ensuring the Together inference provider no longer
spits out a long stack trace when shut down, and by running the OpenAI
API chat completion verification suite to ensure the change in default
streaming logic didn't break anything else.

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-04-21 17:06:44 -04:00
inline        fix: OAI compat endpoint for meta reference inference provider (#1962)       2025-04-17 11:16:04 -07:00
registry      fix: use torchao 0.8.0 for inference (#1925)                                 2025-04-10 13:39:20 -07:00
remote        fix: Together provider shutdown and default to non-streaming                 2025-04-21 17:06:44 -04:00
tests         refactor: move all llama code to models/llama out of meta reference (#1887)  2025-04-07 15:03:58 -07:00
utils         fix: OAI compat endpoint for meta reference inference provider (#1962)       2025-04-17 11:16:04 -07:00
__init__.py   API Updates (#73)                                                            2024-09-17 19:51:35 -07:00
datatypes.py  feat: add health to all providers through providers endpoint (#1418)         2025-04-14 11:59:36 +02:00