llama-stack/llama_stack/providers/utils/inference
Hardik Shah 999195fe5b
fix: [Litellm] Do not swallow first token (#1316)
A `ChatCompletionResponseEventType: start` event is ignored and not yielded in
`agent_instance`, since we expect it to carry no content.

However, litellm sends its first event as `ChatCompletionResponseEventType:
start` *with* content, so that first token was being skipped.
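
The fix in `openai_compat.py` boils down to never letting token text ride on the discarded `start` event. Here is a minimal sketch of that idea, assuming a simplified `EventType` enum and dict-shaped chunks and events; `convert_stream` is a hypothetical helper for illustration, not the actual llama-stack code:

```python
from enum import Enum


class EventType(Enum):
    START = "start"
    PROGRESS = "progress"
    COMPLETE = "complete"


def convert_stream(chunks):
    """Map provider chunks to events without swallowing the first token.

    litellm may put real text on its very first chunk; since consumers
    drop `start` events, that text must still go out as `progress`.
    """
    started = False
    for chunk in chunks:
        text = chunk.get("delta", "")
        if not started:
            started = True
            # Emit a content-free start event first ...
            yield {"event_type": EventType.START, "delta": ""}
        if text:
            # ... then deliver all text, even text arriving on the
            # provider's first chunk, as a progress event.
            yield {"event_type": EventType.PROGRESS, "delta": text}
    yield {"event_type": EventType.COMPLETE, "delta": ""}


# litellm-style stream where the first chunk already carries a token:
events = list(convert_stream([{"delta": "hello"}, {"delta": ", world"}]))
assert [e["delta"] for e in events] == ["", "hello", ", world", ""]
```

A consumer that drops `start` events, as `agent_instance` does, then still receives every token, including the first.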

```
LLAMA_STACK_CONFIG=dev pytest -s -v tests/client-sdk/agents/test_agents.py --inference-model "openai/gpt-4o-mini" -k test_agent_simple
``` 
This test was failing before the fix, since the word "hello" was missing from
the final response.
2025-02-27 20:53:47 -08:00
| File | Last commit | Date |
| --- | --- | --- |
| `__init__.py` | chore: move all Llama Stack types from llama-models to llama-stack (#1098) | 2025-02-14 09:10:59 -08:00 |
| `embedding_mixin.py` | fix: dont assume SentenceTransformer is imported | 2025-02-25 16:53:01 -08:00 |
| `litellm_openai_mixin.py` | fix: Structured outputs for recursive models (#1311) | 2025-02-27 17:31:53 -08:00 |
| `model_registry.py` | feat(providers): support non-llama models for inference providers (#1200) | 2025-02-21 13:21:28 -08:00 |
| `openai_compat.py` | fix: [Litellm]Do not swallow first token (#1316) | 2025-02-27 20:53:47 -08:00 |
| `prompt_adapter.py` | fix: set default tool_prompt_format in inference api (#1214) | 2025-02-24 12:38:37 -08:00 |