I only tested the "on-the-fly" bf16 -> fp8 conversion, not the "load from fp8" codepath.
YAML I tested with:
```yaml
providers:
  - provider_id: quantized
    provider_type: meta-reference-quantized
    config:
      model: Llama3.1-8B-Instruct
      quantization:
        type: fp8
```
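
For context, the on-the-fly path converts the bf16 checkpoint weights to fp8 at load time instead of reading pre-quantized fp8 tensors. The sketch below is not the meta-reference implementation, just a minimal illustration of that idea assuming PyTorch's `torch.float8_e4m3fn` dtype; the function names and the per-row scaling scheme are illustrative.

```python
# Minimal sketch of on-the-fly bf16 -> fp8 weight conversion.
# Assumes PyTorch >= 2.1 (torch.float8_e4m3fn); not the actual
# meta-reference-quantized code, just an illustration of the idea.
import torch

FP8_MAX = 448.0  # largest finite value representable in float8_e4m3fn


def quantize_fp8(w_bf16: torch.Tensor):
    """Cast a bf16 weight matrix to fp8 with per-row scales."""
    # One scale per output row preserves per-channel dynamic range.
    amax = w_bf16.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12)
    scale = FP8_MAX / amax.to(torch.float32)
    w_fp8 = (w_bf16.to(torch.float32) * scale).to(torch.float8_e4m3fn)
    return w_fp8, scale


def dequantize_fp8(w_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover a bf16 approximation of the original weights."""
    return (w_fp8.to(torch.float32) / scale).to(torch.bfloat16)


if __name__ == "__main__":
    w = torch.randn(4096, 4096, dtype=torch.bfloat16)
    w_fp8, scale = quantize_fp8(w)
    w_round_trip = dequantize_fp8(w_fp8, scale)
    print("max abs error:", (w.float() - w_round_trip.float()).abs().max().item())
```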