I only tested the on-the-fly bf16 -> fp8 conversion, not the "load
from fp8" codepath.

The YAML I tested with:
```yaml
providers:
  - provider_id: quantized
    provider_type: meta-reference-quantized
    config:
      model: Llama3.1-8B-Instruct
      quantization:
        type: fp8
```
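
For reference, a minimal sketch of what a per-tensor bf16 -> fp8 cast can look like using PyTorch's `float8_e4m3fn` dtype. The function name and scaling scheme here are illustrative assumptions, not the actual implementation in `fp8_impls.py`:

```python
# Illustrative sketch only: per-tensor scaled bf16 -> fp8 cast,
# assuming PyTorch >= 2.1 (which ships the float8_e4m3fn dtype).
import torch

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in float8_e4m3fn

def quantize_bf16_to_fp8(weight: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Cast a bf16 weight to float8_e4m3fn with a per-tensor scale.

    Dequantize with: weight_fp8.to(torch.bfloat16) * scale
    """
    assert weight.dtype == torch.bfloat16
    amax = weight.abs().max().float()
    # Map the largest value onto fp8 max; guard against all-zero weights.
    scale = (amax / FP8_E4M3_MAX).clamp(min=1e-12)
    fp8 = (weight.float() / scale).clamp(-FP8_E4M3_MAX, FP8_E4M3_MAX)
    return fp8.to(torch.float8_e4m3fn), scale

w = torch.randn(4096, 4096, dtype=torch.bfloat16)
w_fp8, w_scale = quantize_bf16_to_fp8(w)
print(w_fp8.dtype, w_scale.item())  # torch.float8_e4m3fn, per-tensor scale
```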
Files:

- scripts/
- __init__.py
- fp8_impls.py
- fp8_test_disabled.py
- loader.py