llama-stack-mirror/llama_stack/templates
Commit 71ed47ea76 by Dmitry Rogozhkin
docs: add example for intel gpu in vllm remote (#1952)
# What does this PR do?

This PR adds instructions for setting up a vLLM remote endpoint for the vllm-remote llama
stack distribution.
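
For context, a minimal sketch of the flow these instructions cover: start an OpenAI-compatible vLLM server on the Intel GPU host, then launch the remote-vllm distribution pointed at it. The hostnames, ports, and the `VLLM_URL`/`INFERENCE_MODEL` environment variables below are assumptions based on the usual remote-vllm template configuration, and Intel-GPU-specific vLLM flags are omitted; see the added docs for the exact commands.

```
# Assumed sketch, not the exact commands added by this PR.
# 1. On the Intel GPU host, start an OpenAI-compatible vLLM server
#    (device-specific setup/flags omitted; see the vLLM Intel GPU docs).
vllm serve meta-llama/Llama-3.2-3B-Instruct --port 8000

# 2. Launch the remote-vllm llama stack distribution pointed at that server
#    (VLLM_URL / INFERENCE_MODEL are the variables the remote-vllm template
#    typically expects; adjust host and port to your deployment).
llama stack run remote-vllm \
   --port 5001 \
   --env VLLM_URL=http://<vllm-host>:8000/v1 \
   --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
```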

## Test Plan

* Verified with manual tests of the configured vllm-remote distribution against a vLLM
endpoint running on a system with an Intel GPU
* Also verified with CI pytests (see the command line below). Tests pass in the
same capacity as they do on the A10 Nvidia setup (some tests do fail,
which appears to be a known issue with the vllm-remote llama stack
distribution)

```
pytest -s -v tests/integration/inference/test_text_inference.py \
   --stack-config=http://localhost:5001 \
   --text-model=meta-llama/Llama-3.2-3B-Instruct
```
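
As a quick sanity check before running the integration tests, the vLLM endpoint can be probed directly through its OpenAI-compatible API (the host and port here are assumptions matching the sketch above; adjust to your deployment):

```
# List the models served by the vLLM endpoint.
curl http://<vllm-host>:8000/v1/models

# Optionally exercise a single completion to confirm the model responds.
curl http://<vllm-host>:8000/v1/completions \
   -H "Content-Type: application/json" \
   -d '{"model": "meta-llama/Llama-3.2-3B-Instruct", "prompt": "Hello", "max_tokens": 16}'
```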

CC: @ashwinb

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
2025-04-15 07:56:23 -07:00
| Name | Last commit | Last updated |
|------|-------------|--------------|
| bedrock | chore: Revert "chore(telemetry): remove service_name entirely" (#1785) | 2025-03-25 14:42:05 -07:00 |
| cerebras | chore: Revert "chore(telemetry): remove service_name entirely" (#1785) | 2025-03-25 14:42:05 -07:00 |
| ci-tests | test: verification on provider's OAI endpoints (#1893) | 2025-04-07 23:06:28 -07:00 |
| dell | chore: Revert "chore(telemetry): remove service_name entirely" (#1785) | 2025-03-25 14:42:05 -07:00 |
| dev | fix: 100% OpenAI API verification for together and fireworks (#1946) | 2025-04-14 08:56:29 -07:00 |
| experimental-post-training | fix: fix experimental-post-training template (#1740) | 2025-03-20 23:07:19 -07:00 |
| fireworks | test: verification on provider's OAI endpoints (#1893) | 2025-04-07 23:06:28 -07:00 |
| groq | fix: 100% OpenAI API verification for together and fireworks (#1946) | 2025-04-14 08:56:29 -07:00 |
| hf-endpoint | chore: Revert "chore(telemetry): remove service_name entirely" (#1785) | 2025-03-25 14:42:05 -07:00 |
| hf-serverless | chore: Revert "chore(telemetry): remove service_name entirely" (#1785) | 2025-03-25 14:42:05 -07:00 |
| meta-reference-gpu | feat: add batch inference API to llama stack inference (#1945) | 2025-04-12 11:41:12 -07:00 |
| nvidia | chore: Revert "chore(telemetry): remove service_name entirely" (#1785) | 2025-03-25 14:42:05 -07:00 |
| ollama | chore: Revert "chore(telemetry): remove service_name entirely" (#1785) | 2025-03-25 14:42:05 -07:00 |
| open-benchmark | chore: Revert "chore(telemetry): remove service_name entirely" (#1785) | 2025-03-25 14:42:05 -07:00 |
| passthrough | chore: Revert "chore(telemetry): remove service_name entirely" (#1785) | 2025-03-25 14:42:05 -07:00 |
| remote-vllm | docs: add example for intel gpu in vllm remote (#1952) | 2025-04-15 07:56:23 -07:00 |
| sambanova | test: verification on provider's OAI endpoints (#1893) | 2025-04-07 23:06:28 -07:00 |
| tgi | chore: Revert "chore(telemetry): remove service_name entirely" (#1785) | 2025-03-25 14:42:05 -07:00 |
| together | test: verification on provider's OAI endpoints (#1893) | 2025-04-07 23:06:28 -07:00 |
| verification | fix: 100% OpenAI API verification for together and fireworks (#1946) | 2025-04-14 08:56:29 -07:00 |
| vllm-gpu | chore: Revert "chore(telemetry): remove service_name entirely" (#1785) | 2025-03-25 14:42:05 -07:00 |
| __init__.py | Auto-generate distro yamls + docs (#468) | 2024-11-18 14:57:06 -08:00 |
| dependencies.json | fix: use torchao 0.8.0 for inference (#1925) | 2025-04-10 13:39:20 -07:00 |
| template.py | feat(api): (1/n) datasets api clean up (#1573) | 2025-03-17 16:55:45 -07:00 |