Hardik Shah | c64b8cba22 | from models.llama3_1 --> from llama_models.llama3_1 | 2024-07-21 19:07:02 -07:00 |
rsm | 67f0510edd | rename ModelInference to Inference | 2024-07-21 12:20:32 -07:00 |
Hardik Shah | c9f33d8f68 | cli updates | 2024-07-21 01:51:54 -07:00 |
Ashwin Bharambe | d73fed5cc3 | cleanup for fp8 and requirements etc | 2024-07-20 23:21:55 -07:00 |
Ashwin Bharambe | 0746a0f62b | fp8 inference | 2024-07-20 23:13:47 -07:00 |
Ashwin Bharambe | ad62e2e1f3 | make inference server load checkpoints for fp8 inference; introduce quantization related args for inference config; also kill GeneratorArgs | 2024-07-20 22:54:48 -07:00 |
Ashwin Bharambe | 7d2c0b14b8 | Changes from the main repo | 2024-07-20 22:52:29 -07:00 |
Hardik Shah | 2ed2881a21 | fixed imports models.llama3. --> models.llama3_1.api. | 2024-07-19 17:42:14 -07:00 |
Ashwin Bharambe | 95781ec85d | Add toolchain from agentic system here | 2024-07-19 12:30:35 -07:00 |