llama-stack-mirror/llama_stack/providers/inline/inference/meta_reference
Dmitry Rogozhkin 7ea14ae62e
feat: enable xpu support for meta-reference stack (#558)
This commit adds support for XPU and CPU devices to the meta-reference
stack for text models. On creation, the stack automatically identifies which
device to use by checking available accelerator capabilities in the
following order: CUDA, then XPU, and finally CPU. This behaviour can be
overridden with the `DEVICE` environment variable, in which case the
explicitly specified device is used.
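
For illustration, a minimal sketch of the selection order described above, assuming PyTorch device APIs; the helper name `resolve_device` is hypothetical and not necessarily how the stack implements it:
```python
import os
import torch

def resolve_device() -> torch.device:
    # Hypothetical helper mirroring the order above:
    # an explicit DEVICE env var wins, otherwise CUDA, then XPU, then CPU.
    explicit = os.environ.get("DEVICE")
    if explicit:
        return torch.device(explicit)
    if torch.cuda.is_available():
        return torch.device("cuda")
    # torch.xpu is only present in builds with Intel XPU support
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return torch.device("xpu")
    return torch.device("cpu")
```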

Tested with:
```
torchrun pytest llama_stack/providers/tests/inference/test_text_inference.py -k meta_reference
```

Results:
* Tested on: a system with a single CUDA device, a system with a single XPU
device, and a pure CPU system
* Results: all tests pass except `test_completion_logprobs`
* `test_completion_logprobs` fails in the same way as on the baseline,
i.e. unrelated to this change: `AssertionError: Unexpected top_k=3`

Requires: https://github.com/meta-llama/llama-models/pull/233

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
2025-01-31 12:11:49 -08:00
quantization Make llama stack build not create a new conda by default (#788) 2025-01-16 13:44:53 -08:00
__init__.py Add provider deprecation support; change directory structure (#397) 2024-11-07 13:04:53 -08:00
config.py [remove import *] clean up import *'s (#689) 2024-12-27 15:45:44 -08:00
generation.py feat: enable xpu support for meta-reference stack (#558) 2025-01-31 12:11:49 -08:00
inference.py [inference api] modify content types so they follow a more standard structure (#841) 2025-01-22 12:16:18 -08:00
model_parallel.py Fix Meta reference GPU implementation (#663) 2024-12-19 14:09:45 -08:00
parallel_utils.py Fix meta-reference GPU implementation for inference 2025-01-22 18:31:59 -08:00