Distributions updates (slight updates to ollama, add inline-vllm and remote-vllm) (#408)

* remote vllm distro
* add inline-vllm details, fix things
* Write some docs
parent ba82021d4b
commit 4986e46188
19 changed files with 365 additions and 46 deletions
@@ -45,7 +45,7 @@ def available_providers() -> List[ProviderSpec]:
         ),
         InlineProviderSpec(
             api=Api.inference,
-            provider_type="vllm",
+            provider_type="inline::vllm",
             pip_packages=[
                 "vllm",
             ],
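With the provider type now namespaced as inline::vllm, a build config that selects the in-process vLLM engine references the new identifier. Below is a minimal sketch of such a template, reusing the build.yaml layout of the templates added further down in this commit; the name, description, and non-inference provider choices are illustrative assumptions rather than content of this change.

# hypothetical build.yaml sketch for the renamed inline vLLM provider
# (layout follows the templates added in this commit; values here are illustrative)
name: inline-vllm
distribution_spec:
  description: Use the inline vLLM engine for running LLM inference
  providers:
    inference: inline::vllm
    memory: meta-reference
    safety: meta-reference
    agents: meta-reference
    telemetry: meta-reference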
llama_stack/templates/inline-vllm/build.yaml (new file, 13 lines)

@@ -0,0 +1,13 @@
+name: meta-reference-gpu
+distribution_spec:
+  docker_image: pytorch/pytorch:2.5.0-cuda12.4-cudnn9-runtime
+  description: Use code from `llama_stack` itself to serve all llama stack APIs
+  providers:
+    inference: meta-reference
+    memory:
+    - meta-reference
+    - remote::chromadb
+    - remote::pgvector
+    safety: meta-reference
+    agents: meta-reference
+    telemetry: meta-reference
llama_stack/templates/remote-vllm/build.yaml (new file, 12 lines)

@@ -0,0 +1,12 @@
+name: remote-vllm
+distribution_spec:
+  description: Use (an external) vLLM server for running LLM inference
+  providers:
+    inference: remote::vllm
+    memory:
+    - meta-reference
+    - remote::chromadb
+    - remote::pgvector
+    safety: meta-reference
+    agents: meta-reference
+    telemetry: meta-reference
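The remote-vllm template above routes inference to remote::vllm, which expects an already-running vLLM server. A minimal run-time configuration sketch for that provider follows; the provider_id, the config field names (notably url), and the endpoint path are assumptions about the adapter's configuration and are not shown in this diff.

# hypothetical run-config fragment for the remote::vllm provider
# (assumed: the adapter is pointed at vLLM's OpenAI-compatible endpoint via a url field)
inference:
- provider_id: vllm
  provider_type: remote::vllm
  config:
    url: http://localhost:8000/v1  # assumed field name and path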
(deleted file, 9 lines)

@@ -1,9 +0,0 @@
-name: vllm
-distribution_spec:
-  description: Like local, but use vLLM for running LLM inference
-  providers:
-    inference: vllm
-    memory: meta-reference
-    safety: meta-reference
-    agents: meta-reference
-    telemetry: meta-reference