llama-stack-mirror/llama_stack/providers/remote/inference
Ben Browning 825ce39879
fix: Together provider shutdown and default to non-streaming (#2001)
# What does this PR do?

The Together inference provider was throwing a stack trace every time it
shut down, because it tried to call a non-existent `close` method on the
AsyncTogether client. While fixing that, I also adjusted its shutdown
logic to close the OpenAI client if we've created one, since that
client does have a `close` method.
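
The shutdown behavior described above can be sketched roughly as follows. This is a minimal illustration, not the provider's actual code: `FakeTogetherClient`, `FakeOpenAIClient`, `SketchAdapter`, and `get_openai_client` are hypothetical stand-ins chosen for the example, and the real adapter may track its clients differently.

```python
import asyncio
from typing import Optional


class FakeOpenAIClient:
    """Hypothetical stand-in for a client that exposes an async close()."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


class FakeTogetherClient:
    """Hypothetical stand-in for a client that has no close() method."""


class SketchAdapter:
    """Illustrative adapter; names and attributes are assumptions."""

    def __init__(self) -> None:
        self._client: Optional[FakeTogetherClient] = FakeTogetherClient()
        # The OpenAI-style client is created lazily, so it may never exist.
        self._openai_client: Optional[FakeOpenAIClient] = None

    def get_openai_client(self) -> FakeOpenAIClient:
        if self._openai_client is None:
            self._openai_client = FakeOpenAIClient()
        return self._openai_client

    async def shutdown(self) -> None:
        # No close() to call on this client, so just drop the reference.
        self._client = None
        # Only close the OpenAI-style client if one was actually created.
        if self._openai_client is not None:
            await self._openai_client.close()
            self._openai_client = None
```

With this shape, shutting down an adapter that never created the second client is a no-op for that client, and shutting down one that did closes it exactly once.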

In testing that, I also realized we were defaulting to treating all
requests as streaming requests instead of defaulting to non-streaming.
So, this flips that default to non-streaming to match how the other
providers work.
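
As a rough illustration of the default flip, the sketch below treats an unset `stream` flag as non-streaming. The helper names `resolve_stream` and `build_request_params` are hypothetical and exist only for this example; the provider's real request-building code is not shown in this PR description.

```python
from typing import Any, Dict, List, Optional


def resolve_stream(stream: Optional[bool] = None) -> bool:
    """Treat a missing or None stream flag as non-streaming."""
    return bool(stream)


def build_request_params(
    model: str,
    messages: List[Dict[str, str]],
    stream: Optional[bool] = None,
) -> Dict[str, Any]:
    """Hypothetical request builder that applies the non-streaming default."""
    return {
        "model": model,
        "messages": messages,
        "stream": resolve_stream(stream),
    }
```

A caller that omits `stream` gets `"stream": False` in the resulting parameters, while an explicit `stream=True` is passed through unchanged.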

## Test Plan

I tested this by ensuring the Together inference provider no longer
emits a long stack trace when shutting down, and by running the
OpenAI API chat completion verification suite to ensure the change in
default streaming logic didn't break anything else.

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-04-22 17:47:53 +02:00

| Name | Last commit | Date |
| --- | --- | --- |
| anthropic | feat(providers): Groq now uses LiteLLM openai-compat (#1303) | 2025-02-27 13:16:50 -08:00 |
| bedrock | fix: 100% OpenAI API verification for together and fireworks (#1946) | 2025-04-14 08:56:29 -07:00 |
| cerebras | fix: 100% OpenAI API verification for together and fireworks (#1946) | 2025-04-14 08:56:29 -07:00 |
| cerebras_openai_compat | test: verification on provider's OAI endpoints (#1893) | 2025-04-07 23:06:28 -07:00 |
| databricks | fix: 100% OpenAI API verification for together and fireworks (#1946) | 2025-04-14 08:56:29 -07:00 |
| fireworks | fix: OpenAI Completions API and Fireworks (#1997) | 2025-04-21 11:49:12 -07:00 |
| fireworks_openai_compat | test: verification on provider's OAI endpoints (#1893) | 2025-04-07 23:06:28 -07:00 |
| gemini | feat(providers): Groq now uses LiteLLM openai-compat (#1303) | 2025-02-27 13:16:50 -08:00 |
| groq | fix: 100% OpenAI API verification for together and fireworks (#1946) | 2025-04-14 08:56:29 -07:00 |
| groq_openai_compat | test: verification on provider's OAI endpoints (#1893) | 2025-04-07 23:06:28 -07:00 |
| nvidia | feat: update nvidia inference provider to use model_store (#1988) | 2025-04-18 10:16:43 +02:00 |
| ollama | feat: allow ollama to use 'latest' if available but not specified (#1903) | 2025-04-14 09:03:54 -07:00 |
| openai | feat(providers): Groq now uses LiteLLM openai-compat (#1303) | 2025-02-27 13:16:50 -08:00 |
| passthrough | fix: 100% OpenAI API verification for together and fireworks (#1946) | 2025-04-14 08:56:29 -07:00 |
| runpod | fix: 100% OpenAI API verification for together and fireworks (#1946) | 2025-04-14 08:56:29 -07:00 |
| sambanova | fix: 100% OpenAI API verification for together and fireworks (#1946) | 2025-04-14 08:56:29 -07:00 |
| sambanova_openai_compat | test: verification on provider's OAI endpoints (#1893) | 2025-04-07 23:06:28 -07:00 |
| tgi | fix: 100% OpenAI API verification for together and fireworks (#1946) | 2025-04-14 08:56:29 -07:00 |
| together | fix: Together provider shutdown and default to non-streaming (#2001) | 2025-04-22 17:47:53 +02:00 |
| together_openai_compat | test: verification on provider's OAI endpoints (#1893) | 2025-04-07 23:06:28 -07:00 |
| vllm | fix: Do not send an empty 'tools' list to remote vllm (#1957) | 2025-04-15 20:31:12 -04:00 |
| __init__.py | impls -> inline, adapters -> remote (#381) | 2024-11-06 14:54:05 -08:00 |