llama-stack-mirror/llama_stack/providers/impls/meta_reference/inference
Latest commit: c05fbf14b3 by Sachin Mehta, 2024-10-25 12:58:48 -07:00
Added hadamard transform for spinquant (#326)

* Added hadamard transform for spinquant
* Changed from config to model_args
* Added an assertion for model args
* Use enum.value to check against str
* pre-commit

Co-authored-by: Sachin Mehta <sacmehta@fb.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
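For orientation, the commit above adds a Hadamard transform used for SpinQuant-style quantization, where weights and activations are rotated by an orthogonal matrix to spread outliers across channels before quantizing. The snippet below is a minimal illustrative sketch only, not the code added in #326: it assumes PyTorch, a power-of-two hidden size, and hypothetical helper names (hadamard_matrix, rotate) chosen here for clarity.

import torch


def hadamard_matrix(n: int, dtype: torch.dtype = torch.float32) -> torch.Tensor:
    """Sylvester construction of an n x n Hadamard matrix, normalized to be orthonormal.

    n must be a power of two. (Hypothetical helper, not from the repository.)
    """
    assert n > 0 and (n & (n - 1)) == 0, f"size must be a power of two, got {n}"
    H = torch.ones(1, 1, dtype=dtype)
    while H.shape[0] < n:
        # [[H, H], [H, -H]] doubles the matrix size at each step.
        H = torch.cat([torch.cat([H, H], dim=1), torch.cat([H, -H], dim=1)], dim=0)
    return H / (n ** 0.5)


def rotate(weight: torch.Tensor) -> torch.Tensor:
    """Rotate the last dimension of a weight tensor with the Hadamard matrix.

    Because the rotation is orthogonal, applying the matching inverse rotation to
    the adjacent layer preserves the model's outputs while making the rotated
    values easier to quantize.
    """
    H = hadamard_matrix(weight.shape[-1], dtype=weight.dtype)
    return weight @ H

The key property this sketch relies on is orthonormality (H @ H.T equals the identity), which is what allows such a rotation to be folded into neighboring linear layers without changing the network's function.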
File               Last commit                                                                    Date
quantization/      Added hadamard transform for spinquant (#326)                                  2024-10-25 12:58:48 -07:00
__init__.py        Split off meta-reference-quantized provider                                    2024-10-10 16:03:19 -07:00
config.py          Allow overridding checkpoint_dir via config                                    2024-10-18 14:28:06 -07:00
generation.py      Added hadamard transform for spinquant (#326)                                  2024-10-25 12:58:48 -07:00
inference.py       Add support for Structured Output / Guided decoding (#281)                     2024-10-22 12:53:34 -07:00
model_parallel.py  Make all methods async def again; add completion() for meta-reference (#270)   2024-10-18 20:50:59 -07:00
parallel_utils.py  Make all methods async def again; add completion() for meta-reference (#270)   2024-10-18 20:50:59 -07:00