llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-03 18:00:36 +00:00

Dmitry Rogozhkin 71ed47ea76 docs: add example for intel gpu in vllm remote (#1952)	2025-04-15 07:56:23 -07:00

# What does this PR do?

PR adds instructions to set up a vLLM remote endpoint for the vllm-remote llama stack distribution.

## Test Plan

* Verified with manual tests of the configured vllm-remote against a vLLM endpoint running on a system with an Intel GPU
* Also verified with CI pytests (see the command line below). The tests pass to the same extent as on the A10 Nvidia setup (some tests fail, which seem to be known issues with the vllm remote llama stack distribution)

```
pytest -s -v tests/integration/inference/test_text_inference.py \
  --stack-config=http://localhost:5001 \
  --text-model=meta-llama/Llama-3.2-3B-Instruct
```

CC: @ashwinb

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>

bedrock	chore: Revert "chore(telemetry): remove service_name entirely" (#1785)	2025-03-25 14:42:05 -07:00
cerebras	chore: Revert "chore(telemetry): remove service_name entirely" (#1785)	2025-03-25 14:42:05 -07:00
ci-tests	test: verification on provider's OAI endpoints (#1893)	2025-04-07 23:06:28 -07:00
dell	chore: Revert "chore(telemetry): remove service_name entirely" (#1785)	2025-03-25 14:42:05 -07:00
dev	fix: 100% OpenAI API verification for together and fireworks (#1946)	2025-04-14 08:56:29 -07:00
experimental-post-training	fix: fix experimental-post-training template (#1740)	2025-03-20 23:07:19 -07:00
fireworks	test: verification on provider's OAI endpoints (#1893)	2025-04-07 23:06:28 -07:00
groq	fix: 100% OpenAI API verification for together and fireworks (#1946)	2025-04-14 08:56:29 -07:00
hf-endpoint	chore: Revert "chore(telemetry): remove service_name entirely" (#1785)	2025-03-25 14:42:05 -07:00
hf-serverless	chore: Revert "chore(telemetry): remove service_name entirely" (#1785)	2025-03-25 14:42:05 -07:00
meta-reference-gpu	feat: add batch inference API to llama stack inference (#1945)	2025-04-12 11:41:12 -07:00
nvidia	chore: Revert "chore(telemetry): remove service_name entirely" (#1785)	2025-03-25 14:42:05 -07:00
ollama	chore: Revert "chore(telemetry): remove service_name entirely" (#1785)	2025-03-25 14:42:05 -07:00
open-benchmark	chore: Revert "chore(telemetry): remove service_name entirely" (#1785)	2025-03-25 14:42:05 -07:00
passthrough	chore: Revert "chore(telemetry): remove service_name entirely" (#1785)	2025-03-25 14:42:05 -07:00
remote-vllm	docs: add example for intel gpu in vllm remote (#1952)	2025-04-15 07:56:23 -07:00
sambanova	test: verification on provider's OAI endpoints (#1893)	2025-04-07 23:06:28 -07:00
tgi	chore: Revert "chore(telemetry): remove service_name entirely" (#1785)	2025-03-25 14:42:05 -07:00
together	test: verification on provider's OAI endpoints (#1893)	2025-04-07 23:06:28 -07:00
verification	fix: 100% OpenAI API verification for together and fireworks (#1946)	2025-04-14 08:56:29 -07:00
vllm-gpu	chore: Revert "chore(telemetry): remove service_name entirely" (#1785)	2025-03-25 14:42:05 -07:00
__init__.py	Auto-generate distro yamls + docs (#468)	2024-11-18 14:57:06 -08:00
dependencies.json	fix: use torchao 0.8.0 for inference (#1925)	2025-04-10 13:39:20 -07:00
template.py	feat(api): (1/n) datasets api clean up (#1573)	2025-03-17 16:55:45 -07:00