From 08da5d003a6b43c06bbb8a8e9d178e6f726c7a03 Mon Sep 17 00:00:00 2001
From: Russell Bryant
Date: Sat, 28 Sep 2024 19:10:04 +0000
Subject: [PATCH] Add a local-vllm template

This is just like `local`, which uses `meta-reference` for everything,
except that it uses `vllm` for inference.

Docker works, but so far `conda` is a bit easier to use with the vllm
provider. The default container base image does not include all the
libraries needed for all vllm features; more CUDA dependencies are
necessary. I started changing the base image used in this template, but
that also required changes to the Dockerfile, so it was getting too
involved to include in this first PR.

Signed-off-by: Russell Bryant
---
 .../distribution/templates/local-vllm-build.yaml | 10 ++++++++++
 1 file changed, 10 insertions(+)
 create mode 100644 llama_stack/distribution/templates/local-vllm-build.yaml

diff --git a/llama_stack/distribution/templates/local-vllm-build.yaml b/llama_stack/distribution/templates/local-vllm-build.yaml
new file mode 100644
index 000000000..e907cb7c9
--- /dev/null
+++ b/llama_stack/distribution/templates/local-vllm-build.yaml
@@ -0,0 +1,10 @@
+name: local-vllm
+distribution_spec:
+  description: Like local, but use vLLM for running LLM inference
+  providers:
+    inference: vllm
+    memory: meta-reference
+    safety: meta-reference
+    agents: meta-reference
+    telemetry: meta-reference
+image_type: conda
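
For context, a build template like this is consumed through the `llama stack` CLI.
The invocation below is a sketch, not part of this patch; it assumes the
`--config` option accepted by `llama stack build` around the time of this change,
and exact flags and names may differ between versions:

    # Build a conda-based distribution from the new template (flags assumed)
    llama stack build --config llama_stack/distribution/templates/local-vllm-build.yaml

    # Then run the resulting local-vllm stack (name assumed from the template)
    llama stack run local-vllm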