llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-03 18:00:36 +00:00

Ashwin Bharambe 09b793c4d6 Fix fp8 implementation which had bit-rotten a bit	2024-10-15 13:57:01 -07:00

I only tested with "on-the-fly" bf16 -> fp8 conversion, not the "load from fp8" codepath.

YAML I tested with:

```
providers:
  - provider_id: quantized
    provider_type: meta-reference-quantized
    config:
      model: Llama3.1-8B-Instruct
      quantization:
        type: fp8
```

quantization	Fix fp8 implementation which had bit-rotten a bit	2024-10-15 13:57:01 -07:00
__init__.py	Split off meta-reference-quantized provider	2024-10-10 16:03:19 -07:00
config.py	Split off meta-reference-quantized provider	2024-10-10 16:03:19 -07:00
generation.py	Fix fp8 implementation which had bit-rotten a bit	2024-10-15 13:57:01 -07:00
inference.py	Split off meta-reference-quantized provider	2024-10-10 16:03:19 -07:00
model_parallel.py	Split off meta-reference-quantized provider	2024-10-10 16:03:19 -07:00
parallel_utils.py	Split off meta-reference-quantized provider	2024-10-10 16:03:19 -07:00