Commit graph

4 commits

Author | SHA1 | Message | Date
Ashwin Bharambe | ad62e2e1f3 | make inference server load checkpoints for fp8 inference | 2024-07-20 22:54:48 -07:00
    - introduce quantization related args for inference config
    - also kill GeneratorArgs
Ashwin Bharambe | 7d2c0b14b8 | Changes from the main repo | 2024-07-20 22:52:29 -07:00
Hardik Shah | 2ed2881a21 | fixed imports models.llama3. --> models.llama3_1.api. | 2024-07-19 17:42:14 -07:00
Ashwin Bharambe | 95781ec85d | Add toolchain from agentic system here | 2024-07-19 12:30:35 -07:00