Mirror of https://github.com/meta-llama/llama-stack.git, synced 2025-06-28 19:04:19 +00:00
I only tested with "on-the-fly" bf16 -> fp8 conversion, not the "load from fp8" codepath.

YAML I tested with:

```
providers:
  - provider_id: quantized
    provider_type: meta-reference-quantized
    config:
      model: Llama3.1-8B-Instruct
      quantization:
        type: fp8
```

| | | |
---|---|---|
.. | ||
scripts | ||
__init__.py | ||
fp8_impls.py | ||
fp8_txest_disabled.py | ||
loader.py |
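
The provider entry quoted in the commit message can be read as nested mappings. Below is a minimal sketch, assuming the YAML has already been parsed into plain Python dicts and lists (for example by a YAML library). The helper name `quantization_type` is hypothetical and is not part of llama-stack; it only illustrates where the `quantization.type` value sits in that structure.

```python
from typing import Any, Optional

# Mirrors the YAML from the commit message as already-parsed Python data.
run_config: dict[str, Any] = {
    "providers": [
        {
            "provider_id": "quantized",
            "provider_type": "meta-reference-quantized",
            "config": {
                "model": "Llama3.1-8B-Instruct",
                "quantization": {"type": "fp8"},
            },
        }
    ]
}


def quantization_type(config: dict[str, Any], provider_id: str) -> Optional[str]:
    """Return the quantization type for a provider, or None if absent.

    Hypothetical helper for illustration only; not a llama-stack API.
    """
    for provider in config.get("providers", []):
        if provider.get("provider_id") == provider_id:
            # Treat a missing or empty quantization block as "no quantization".
            quant = provider.get("config", {}).get("quantization") or {}
            return quant.get("type")
    return None


print(quantization_type(run_config, "quantized"))  # fp8
```

A lookup for a provider id that is not listed returns `None`, which lets a caller distinguish "no quantization configured" from an explicit `fp8` setting.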