llama-stack-mirror/toolchain/inference
Latest commit: 0746a0f62b "fp8 inference" (Ashwin Bharambe, 2024-07-20 23:13:47 -07:00)
Name               Last commit message                                       Date
api                make inference server load checkpoints for fp8 inference  2024-07-20 22:54:48 -07:00
quantization       fp8 inference                                             2024-07-20 23:13:47 -07:00
__init__.py        Add toolchain from agentic system here                    2024-07-19 12:30:35 -07:00
api_instance.py    Add toolchain from agentic system here                    2024-07-19 12:30:35 -07:00
client.py          fp8 inference                                             2024-07-20 23:13:47 -07:00
generation.py      make inference server load checkpoints for fp8 inference  2024-07-20 22:54:48 -07:00
inference.py       make inference server load checkpoints for fp8 inference  2024-07-20 22:54:48 -07:00
model_parallel.py  make inference server load checkpoints for fp8 inference  2024-07-20 22:54:48 -07:00
parallel_utils.py  Add toolchain from agentic system here                    2024-07-19 12:30:35 -07:00
server.py          Add toolchain from agentic system here                    2024-07-19 12:30:35 -07:00