llama-stack-mirror/llama_toolchain/inference
Latest commit: 2024-07-22 19:09:55 -07:00
Name               Last commit                                     Date
api/               update batch completion endpoint                2024-07-22 16:08:28 -07:00
quantization/      rename toolchain/ --> llama_toolchain/          2024-07-21 23:48:38 -07:00
__init__.py        rename toolchain/ --> llama_toolchain/          2024-07-21 23:48:38 -07:00
api_instance.py    rename toolchain/ --> llama_toolchain/          2024-07-21 23:48:38 -07:00
client.py          Remove configurations                           2024-07-22 16:03:37 -07:00
event_logger.py    add EventLogger for inference                   2024-07-22 15:11:34 -07:00
generation.py      Don't load as bf16 on CPU unless fp8 is active  2024-07-22 19:09:55 -07:00
inference.py       rename toolchain/ --> llama_toolchain/          2024-07-21 23:48:38 -07:00
model_parallel.py  Don't load as bf16 on CPU unless fp8 is active  2024-07-22 19:09:55 -07:00
parallel_utils.py  rename toolchain/ --> llama_toolchain/          2024-07-21 23:48:38 -07:00
server.py          rename toolchain/ --> llama_toolchain/          2024-07-21 23:48:38 -07:00