llama-stack-mirror/toolchain/inference/api
Ashwin Bharambe ad62e2e1f3 make inference server load checkpoints for fp8 inference
- introduce quantization related args for inference config
- also kill GeneratorArgs
2024-07-20 22:54:48 -07:00
__init__.py Add toolchain from agentic system here 2024-07-19 12:30:35 -07:00
config.py make inference server load checkpoints for fp8 inference 2024-07-20 22:54:48 -07:00
datatypes.py make inference server load checkpoints for fp8 inference 2024-07-20 22:54:48 -07:00
endpoints.py Add toolchain from agentic system here 2024-07-19 12:30:35 -07:00