I only tested the on-the-fly bf16 -> fp8 conversion path, not the "load
from fp8" codepath.
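
For reference, here is a minimal sketch of what an on-the-fly bf16 -> fp8 weight conversion looks like. This is an illustrative example only (the `quantize_bf16_to_fp8` helper and per-tensor scaling scheme are assumptions using PyTorch's `float8_e4m3fn` dtype, not the provider's actual implementation):

```
# Illustrative sketch only: quantize a bf16 weight tensor to fp8 with a
# per-tensor scale, roughly what an on-the-fly bf16 -> fp8 conversion has
# to do at model load time. Not the provider's actual code.
import torch

def quantize_bf16_to_fp8(w_bf16: torch.Tensor):
    # e4m3 fp8 tops out at 448; scale so the tensor's abs-max lands in range.
    fp8_max = torch.finfo(torch.float8_e4m3fn).max
    scale = w_bf16.abs().max().float() / fp8_max
    w_fp8 = (w_bf16.float() / scale).to(torch.float8_e4m3fn)
    return w_fp8, scale  # keep the scale to dequantize during matmul

w = torch.randn(4096, 4096, dtype=torch.bfloat16)
w_fp8, scale = quantize_bf16_to_fp8(w)
err = (w_fp8.float() * scale - w.float()).abs().max()
print(f"max abs round-trip error: {err.item():.4f}")
```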
YAML I tested with:
```
providers:
- provider_id: quantized
  provider_type: meta-reference-quantized
  config:
    model: Llama3.1-8B-Instruct
    quantization:
      type: fp8
```
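
On the provider side, the `quantization` block gets parsed into a small config object. A hedged sketch of what such a config model could look like (class and field names here are assumptions for illustration, not necessarily what `config.py` defines):

```
from typing import Optional
from pydantic import BaseModel

# Hypothetical shapes, for illustration only.
class QuantizationConfig(BaseModel):
    type: str  # e.g. "fp8"

class MetaReferenceQuantizedConfig(BaseModel):
    model: str
    quantization: Optional[QuantizationConfig] = None
```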
Files in the provider directory:

- `quantization/`
- `__init__.py`
- `config.py`
- `generation.py`
- `inference.py`
- `model_parallel.py`
- `parallel_utils.py`