This PR adds instructions for setting up a vLLM remote endpoint for the vllm-remote Llama Stack distribution.

* Verified with manual tests of the configured vllm-remote distribution against a vLLM endpoint running on a system with an Intel GPU.
* Also verified with the CI pytests (see the command line below). The tests pass to the same extent as they do on the A10 Nvidia setup (some tests fail, which appears to be a known issue with the vllm-remote Llama Stack distribution).

```
pytest -s -v tests/integration/inference/test_text_inference.py \
  --stack-config=http://localhost:5001 \
  --text-model=meta-llama/Llama-3.2-3B-Instruct
```

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
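For context, the remote endpoint referenced above is a vLLM OpenAI-compatible server. The command below is a minimal sketch of how such an endpoint can be launched; the port is illustrative (not taken from this PR) and must match whatever URL the remote vLLM provider is configured to use in the distribution's run config.

```
# Sketch: start vLLM's OpenAI-compatible server for the test model.
# The port (8000) is an assumption for illustration; point the llama-stack
# remote vLLM provider at this host:port before running the tests.
vllm serve meta-llama/Llama-3.2-3B-Instruct --port 8000
```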