Commit graph

9 commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Hardik Shah | c64b8cba22 | from models.llama3_1 --> from llama_models.llama3_1 | 2024-07-21 19:07:02 -07:00 |
| rsm | 67f0510edd | rename ModelInference to Inference | 2024-07-21 12:20:32 -07:00 |
| Hardik Shah | c9f33d8f68 | cli updates | 2024-07-21 01:51:54 -07:00 |
| Ashwin Bharambe | d73fed5cc3 | cleanup for fp8 and requirements etc | 2024-07-20 23:21:55 -07:00 |
| Ashwin Bharambe | 0746a0f62b | fp8 inference | 2024-07-20 23:13:47 -07:00 |
| Ashwin Bharambe | ad62e2e1f3 | make inference server load checkpoints for fp8 inference; introduce quantization related args for inference config; also kill GeneratorArgs | 2024-07-20 22:54:48 -07:00 |
| Ashwin Bharambe | 7d2c0b14b8 | Changes from the main repo | 2024-07-20 22:52:29 -07:00 |
| Hardik Shah | 2ed2881a21 | fixed imports models.llama3. --> models.llama3_1.api. | 2024-07-19 17:42:14 -07:00 |
| Ashwin Bharambe | 95781ec85d | Add toolchain from agentic system here | 2024-07-19 12:30:35 -07:00 |