llama-stack-mirror/llama_stack/distribution
Ashwin Bharambe 0c9eb3341c Separate chat_completion stream and non-stream implementations
This is an important requirement. The streaming response type is an
AsyncGenerator, while the non-stream response is a single object. So far
this has worked only _sometimes_, thanks to various pre-existing hacks
(and in some cases it simply failed).
2024-10-08 17:23:40 -07:00
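The split this commit describes can be sketched as follows. This is a minimal illustration of the pattern, not the actual llama-stack code; the helper names (`_chat_completion`, `_chat_completion_stream`) are hypothetical:

```python
import asyncio
from typing import AsyncGenerator

async def _chat_completion(messages: list[str]) -> dict:
    # Non-stream implementation: run to completion, return a single object.
    return {"completion_message": " ".join(messages)}

async def _chat_completion_stream(messages: list[str]) -> AsyncGenerator[dict, None]:
    # Stream implementation: an async generator that yields chunks as produced.
    for token in messages:
        yield {"delta": token}

def chat_completion(messages: list[str], stream: bool = False):
    # Dispatch up front so each path has a stable, distinct return type:
    # an awaitable of one object, or an AsyncGenerator -- never both.
    if stream:
        return _chat_completion_stream(messages)
    return _chat_completion(messages)

async def main() -> None:
    print(await chat_completion(["hello", "world"]))   # single object
    async for chunk in chat_completion(["hi"], stream=True):
        print(chunk)                                   # streamed chunks

asyncio.run(main())
```

Keeping the dispatcher synchronous means a caller never has to await a result just to discover whether it is a generator, which is exactly the kind of ambiguity the pre-existing hacks papered over.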
Name                    Last commit message                                                Date
routers/                Separate chat_completion stream and non-stream implementations    2024-10-08 17:23:40 -07:00
server/                 Fix a bug in meta-reference inference when stream=False           2024-10-08 17:23:02 -07:00
templates/              fix db path                                                       2024-10-06 11:46:08 -07:00
utils/                  Add an introspection "Api.inspect" API                            2024-10-02 15:41:14 -07:00
__init__.py             API Updates (#73)                                                 2024-09-17 19:51:35 -07:00
build.py                Kill a derpy import                                               2024-10-03 11:25:58 -07:00
build_conda_env.sh      fix prompt guard (#177)                                           2024-10-03 11:07:53 -07:00
build_container.sh      [CLI] avoid configure twice (#171)                                2024-10-03 11:20:54 -07:00
common.sh               API Updates (#73)                                                 2024-09-17 19:51:35 -07:00
configure.py            A few bug fixes for covering corner cases                         2024-10-08 17:23:02 -07:00
configure_container.sh  docker: Check for selinux before using --security-opt (#167)     2024-10-02 10:37:41 -07:00
datatypes.py            Add inference test                                                2024-10-08 17:23:02 -07:00
distribution.py         A bit cleanup to avoid breakages                                  2024-10-02 21:31:09 -07:00
inspect.py              memory bank registration fixes                                    2024-10-08 17:23:02 -07:00
request_headers.py      provider_id => provider_type, adapter_id => adapter_type          2024-10-02 14:05:59 -07:00
resolver.py             Add really basic testing for memory API                           2024-10-08 17:23:02 -07:00
start_conda_env.sh      API Updates (#73)                                                 2024-09-17 19:51:35 -07:00
start_container.sh      docker: Check for selinux before using --security-opt (#167)     2024-10-02 10:37:41 -07:00