llama-stack-mirror/llama_stack/apis
Ashwin Bharambe 0c9eb3341c Separate chat_completion stream and non-stream implementations
This is a pretty important requirement. The streaming response type is
an AsyncGenerator while the non-stream one is a single object. So far
this has worked _sometimes_ due to various pre-existing hacks (and in
some cases, it just failed).
2024-10-08 17:23:40 -07:00
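
The distinction the commit describes, a single awaitable response object for the non-stream case versus an AsyncGenerator for the stream case, is easiest to see in a small sketch. The names below (ChatCompletionResponse, ChatCompletionResponseStreamChunk, _stream_chat_completion, _nonstream_chat_completion) are illustrative stand-ins and not the actual llama-stack API surface:

```python
from typing import AsyncGenerator, List, Union


# Illustrative stand-ins for the real response types; names are assumptions.
class ChatCompletionResponse:
    ...


class ChatCompletionResponseStreamChunk:
    ...


async def _nonstream_chat_completion(messages: List[dict]) -> ChatCompletionResponse:
    # One request, one fully materialized response object.
    return ChatCompletionResponse()


async def _stream_chat_completion(
    messages: List[dict],
) -> AsyncGenerator[ChatCompletionResponseStreamChunk, None]:
    # An async generator: chunks are yielded as they are produced.
    for _ in range(3):
        yield ChatCompletionResponseStreamChunk()


async def chat_completion(
    messages: List[dict], stream: bool = False
) -> Union[ChatCompletionResponse, AsyncGenerator]:
    # Dispatch to a dedicated implementation per mode. The streaming branch
    # returns the generator without awaiting it, so the caller receives
    # either a single object or something to `async for` over, never an
    # awkward hybrid of the two.
    if stream:
        return _stream_chat_completion(messages)
    return await _nonstream_chat_completion(messages)
```

A caller then picks one shape explicitly: `resp = await chat_completion(msgs)` for a single object, or `async for chunk in await chat_completion(msgs, stream=True)` for incremental chunks.
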
agents                     Push registration methods onto the backing providers            2024-10-08 17:23:02 -07:00
batch_inference            API Updates (#73)                                               2024-09-17 19:51:35 -07:00
common                     API Updates (#73)                                               2024-09-17 19:51:35 -07:00
dataset                    API Updates (#73)                                               2024-09-17 19:51:35 -07:00
evals                      API Updates (#73)                                               2024-09-17 19:51:35 -07:00
inference                  Separate chat_completion stream and non-stream implementations  2024-10-08 17:23:40 -07:00
inspect                    memory bank registration fixes                                  2024-10-08 17:23:02 -07:00
memory                     more memory related fixes; memory.client now works              2024-10-08 17:23:02 -07:00
memory_banks               more memory related fixes; memory.client now works              2024-10-08 17:23:02 -07:00
models                     Redo the { models, shields, memory_banks } typeset              2024-10-08 17:23:02 -07:00
post_training              API Updates (#73)                                               2024-09-17 19:51:35 -07:00
reward_scoring             API Updates (#73)                                               2024-09-17 19:51:35 -07:00
safety                     Introduce model_store, shield_store, memory_bank_store          2024-10-08 17:23:02 -07:00
shields                    Redo the { models, shields, memory_banks } typeset              2024-10-08 17:23:02 -07:00
synthetic_data_generation  API Updates (#73)                                               2024-09-17 19:51:35 -07:00
telemetry                  API Updates (#73)                                               2024-09-17 19:51:35 -07:00
__init__.py                API Updates (#73)                                               2024-09-17 19:51:35 -07:00