llama-stack-mirror/llama_stack/providers/remote
Ben Browning c014571258 fix: OpenAI API - together.ai extra usage chunks
This fixes an issue where, with some models (ie the Llama 4 models),
together.ai is sending a final usage chunk for streaming responses
even if the user didn't ask to include usage.

With this change, the OpenAI API verification tests now pass 100% when
using Llama Stack as your API server and together.ai as the backend
provider.

As part of this, I also cleaned up the streaming/non-streaming return
types of the `openai_chat_completion` method to keep type checking happy.

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-04-13 13:39:56 -04:00
..
agents test: add unit test to ensure all config types are instantiable (#1601) 2025-03-12 22:29:58 -07:00
datasetio refactor: extract pagination logic into shared helper function (#1770) 2025-03-31 13:08:29 -07:00
inference fix: OpenAI API - together.ai extra usage chunks 2025-04-13 13:39:56 -04:00
post_training fix: remove extra sft args in NvidiaPostTrainingAdapter (#1939) 2025-04-11 10:17:57 -07:00
safety feat: Add unit tests for NVIDIA safety (#1897) 2025-04-11 11:49:55 -07:00
tool_runtime fix(api): don't return list for runtime tools (#1686) 2025-04-01 09:53:11 +02:00
vector_io chore: Updating Milvus Client calls to be non-blocking (#1830) 2025-03-28 22:14:07 -04:00
__init__.py impls -> inline, adapters -> remote (#381) 2024-11-06 14:54:05 -08:00