llama-stack-mirror/llama_stack/providers/impls/meta_reference/inference
Latest commit: 7a8aa775e5 by Dalton Flanagan — JSON serialization for parallel processing queue (#232), 2024-10-09 17:24:12 -04:00

* send/recv pydantic json over socket
* fixup
* address feedback
* bidirectional wrapper
* second round of feedback
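Per the commit message above, the parallel processing queue moved to sending and receiving pydantic objects as JSON over a socket. A minimal stdlib sketch of that framing idea, assuming a length-prefixed wire format (the `TaskRequest` dataclass and function names here are illustrative stand-ins; the actual code in `parallel_utils.py` uses pydantic models for serialization):

```python
import json
import socket
from dataclasses import asdict, dataclass


# Hypothetical message type standing in for a pydantic model.
@dataclass
class TaskRequest:
    task_id: int
    prompt: str


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    # Read exactly n bytes, since recv() may return short reads.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf += chunk
    return buf


def send_obj(sock: socket.socket, obj: TaskRequest) -> None:
    # Length-prefixed JSON frame so the receiver knows where a message ends.
    payload = json.dumps(asdict(obj)).encode("utf-8")
    sock.sendall(len(payload).to_bytes(4, "big") + payload)


def recv_obj(sock: socket.socket) -> TaskRequest:
    size = int.from_bytes(_recv_exact(sock, 4), "big")
    data = json.loads(_recv_exact(sock, size).decode("utf-8"))
    return TaskRequest(**data)


if __name__ == "__main__":
    # socketpair() gives two connected sockets, mimicking parent/worker ends.
    parent, worker = socket.socketpair()
    send_obj(parent, TaskRequest(task_id=1, prompt="hello"))
    print(recv_obj(worker))
```

With pydantic, the `json.dumps(asdict(...))` and `TaskRequest(**data)` steps would be replaced by the model's own JSON serialization and validation methods, which also type-check the fields on receipt.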
quantization        fix fp8 imports                                           2024-10-03 14:40:21 -07:00
__init__.py         API Updates (#73)                                         2024-09-17 19:51:35 -07:00
config.py           Use inference APIs for executing Llama Guard (#121)       2024-09-28 15:40:06 -07:00
generation.py       JSON serialization for parallel processing queue (#232)   2024-10-09 17:24:12 -04:00
inference.py        [bugfix] Fix logprobs on meta-reference impl (#213)       2024-10-07 19:42:39 -07:00
model_parallel.py   JSON serialization for parallel processing queue (#232)   2024-10-09 17:24:12 -04:00
parallel_utils.py   JSON serialization for parallel processing queue (#232)   2024-10-09 17:24:12 -04:00