Distributions updates (slight updates to ollama, add inline-vllm and remote-vllm) (#408)

* remote vllm distro

* add inline-vllm details, fix things

* Write some docs
Ashwin Bharambe 2024-11-08 18:09:39 -08:00 committed by GitHub
parent ba82021d4b
commit 4986e46188
19 changed files with 365 additions and 46 deletions


@@ -45,7 +45,7 @@ def available_providers() -> List[ProviderSpec]:
         ),
         InlineProviderSpec(
             api=Api.inference,
-            provider_type="vllm",
+            provider_type="inline::vllm",
             pip_packages=[
                 "vllm",
             ],
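
For context, a build spec that selects the renamed provider would reference the new namespaced key. The sketch below is illustrative only: the name, description, and the non-inference providers are assumptions, not part of this diff.

```yaml
# Illustrative build spec using the renamed inline provider
# (name, description, and non-inference providers are assumed)
name: inline-vllm
distribution_spec:
  description: Run vLLM inside the Llama Stack server process
  providers:
    inference: inline::vllm
    memory: meta-reference
    safety: meta-reference
    agents: meta-reference
    telemetry: meta-reference
```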


@@ -0,0 +1,13 @@
+name: meta-reference-gpu
+distribution_spec:
+  docker_image: pytorch/pytorch:2.5.0-cuda12.4-cudnn9-runtime
+  description: Use code from `llama_stack` itself to serve all llama stack APIs
+  providers:
+    inference: meta-reference
+    memory:
+    - meta-reference
+    - remote::chromadb
+    - remote::pgvector
+    safety: meta-reference
+    agents: meta-reference
+    telemetry: meta-reference


@@ -0,0 +1,12 @@
+name: remote-vllm
+distribution_spec:
+  description: Use (an external) vLLM server for running LLM inference
+  providers:
+    inference: remote::vllm
+    memory:
+    - meta-reference
+    - remote::chromadb
+    - remote::pgvector
+    safety: meta-reference
+    agents: meta-reference
+    telemetry: meta-reference
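
The remote-vllm distribution assumes a vLLM server is already running elsewhere. A minimal run-config sketch for wiring the remote::vllm provider to that server is shown below; the provider_id and the url config field are assumptions about the adapter, not taken from this diff.

```yaml
# Hypothetical run-config excerpt for the remote vLLM provider
# (provider_id and the `url` field are assumed, not part of this diff)
providers:
  inference:
  - provider_id: vllm
    provider_type: remote::vllm
    config:
      url: http://localhost:8000/v1  # assumed: base URL of the external vLLM server
```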


@ -1,9 +0,0 @@
name: vllm
distribution_spec:
description: Like local, but use vLLM for running LLM inference
providers:
inference: vllm
memory: meta-reference
safety: meta-reference
agents: meta-reference
telemetry: meta-reference
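
Summarizing the rename across the hunks above: the old bare `vllm` key is split into two namespaced provider types, making it explicit where inference runs (sketch only; comments are mine).

```yaml
# Sketch of the provider_type rename introduced by this commit
# old (removed):   inference: vllm
# new, in-process: inference: inline::vllm
# new, external:   inference: remote::vllm
```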