llama-stack-mirror/llama_stack/providers/impls/meta_reference/inference/quantization
Ashwin Bharambe 09b793c4d6 Fix fp8 implementation which had bit-rotten a bit
I only tested with "on-the-fly" bf16 -> fp8 conversion, not the "load
from fp8" codepath.

YAML I tested with:

```
providers:
  - provider_id: quantized
    provider_type: meta-reference-quantized
    config:
      model: Llama3.1-8B-Instruct
      quantization:
        type: fp8
```
2024-10-15 13:57:01 -07:00
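
For readers unfamiliar with what an "on-the-fly" bf16 -> fp8 conversion involves, here is a minimal pure-Python sketch of the general idea. It is not the code in `fp8_impls.py` or `loader.py`. The per-tensor scaling scheme, the choice of the e4m3 format, and all function names (`round_to_e4m3`, `quantize_per_tensor`, `dequantize`) are assumptions made for illustration only.

```python
import math

# Assumed target format: fp8 e4m3 (4 exponent bits, 3 mantissa bits, bias 7).
FP8_E4M3_MAX = 448.0   # largest finite e4m3 value
MANTISSA_BITS = 3
MIN_NORMAL_EXP = -6    # smallest normal exponent; below this the spacing stays fixed


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest e4m3-representable value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), FP8_E4M3_MAX)
    # frexp returns mag = m * 2**e with 0.5 <= m < 1, so floor(log2(mag)) == e - 1.
    exp = max(math.frexp(mag)[1] - 1, MIN_NORMAL_EXP)
    step = 2.0 ** (exp - MANTISSA_BITS)  # spacing between neighbouring fp8 values
    return sign * min(round(mag / step) * step, FP8_E4M3_MAX)


def quantize_per_tensor(values):
    """Scale values so the largest magnitude maps to 448, then round each to e4m3."""
    max_abs = max(abs(v) for v in values)
    scale = FP8_E4M3_MAX / max_abs if max_abs > 0 else 1.0
    return [round_to_e4m3(v * scale) for v in values], scale


def dequantize(quantized, scale):
    """Undo the scaling to recover approximate original values."""
    return [q / scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]  # hypothetical weights, standing in for a bf16 tensor
quantized, scale = quantize_per_tensor(weights)
restored = dequantize(quantized, scale)
```

The scale factor is stored alongside the quantized values so they can be mapped back at inference time. Real implementations typically work on tensors and may scale per row or per channel rather than per tensor.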
scripts API Updates (#73) 2024-09-17 19:51:35 -07:00
__init__.py API Updates (#73) 2024-09-17 19:51:35 -07:00
fp8_impls.py API Updates (#73) 2024-09-17 19:51:35 -07:00
fp8_txest_disabled.py Add a test runner and 2 very simple tests for agents 2024-09-19 12:22:48 -07:00
loader.py Fix fp8 implementation which had bit-rotten a bit 2024-10-15 13:57:01 -07:00