llama-stack-mirror/distributions/remote-vllm
Ashwin Bharambe 96e7ef646f
add support for ${env.FOO_BAR} placeholders in run.yaml files (#439)
# What does this PR do?

We'd like our docker steps to require _ZERO EDITS_ to a YAML file in
order to get going. This is often not possible because, depending on the
provider, we need some configuration input from the user. Environment
variables are the best way to obtain this information.

This PR allows our run.yaml to contain `${env.FOO_BAR}` placeholders,
which can be filled in using `docker run -e FOO_BAR=baz` (or the
`docker compose` equivalent).
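
To make the substitution concrete, here is a minimal sketch of how such placeholders could be resolved. It is not the code from this PR: the helper name `resolve_placeholders`, the regular expression, and the choice to raise an error for a missing variable are all assumptions made for illustration. The optional `:default` suffix is inferred from the `run.yaml` example in the Test Plan.

```python
import os
import re

# Matches ${env.NAME} and ${env.NAME:default}. The pattern is an assumption
# based on the placeholder syntax in this description, not the PR's code.
_PLACEHOLDER = re.compile(
    r"\$\{env\.(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::(?P<default>[^}]*))?\}"
)


def resolve_placeholders(text, environ=None):
    """Replace ${env.NAME[:default]} placeholders in text (hypothetical helper)."""
    environ = os.environ if environ is None else environ

    def _sub(match):
        name = match.group("name")
        default = match.group("default")
        if name in environ:
            return environ[name]
        if default is not None:
            return default
        # Assumed behavior: fail loudly rather than leave the placeholder in place.
        raise KeyError(f"environment variable {name} is not set and has no default")

    return _PLACEHOLDER.sub(_sub, text)


line = "url: ${env.LLAMA_INFERENCE_VLLM_URL:http://host.docker.internal:5100/v1}"
# With no override, the default after the first colon is used.
print(resolve_placeholders(line, {}))
# With the variable set (as `docker run -e ...` would do), its value wins.
print(resolve_placeholders(line, {"LLAMA_INFERENCE_VLLM_URL": "http://localhost:8000/v1"}))
```

Only the first colon after the variable name separates name from default, so defaults that themselves contain colons (such as URLs) pass through unchanged.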

## Test Plan

For remote-vllm, an example `run.yaml` snippet looks like this:
```yaml
providers:
  inference:
  # serves main inference model
  - provider_id: vllm-0
    provider_type: remote::vllm
    config:
      # NOTE: replace with "localhost" if you are running in "host" network mode
      url: ${env.LLAMA_INFERENCE_VLLM_URL:http://host.docker.internal:5100/v1}
      max_tokens: ${env.MAX_TOKENS:4096}
      api_token: fake
  # serves safety llama_guard model
  - provider_id: vllm-1
    provider_type: remote::vllm
    config:
      # NOTE: replace with "localhost" if you are running in "host" network mode
      url: ${env.LLAMA_SAFETY_VLLM_URL:http://host.docker.internal:5101/v1}
      max_tokens: ${env.MAX_TOKENS:4096}
      api_token: fake
```

The corresponding `compose.yaml` snippet looks like this:
```yaml
llamastack:
    depends_on:
    - vllm-0
    - vllm-1
    # image: llamastack/distribution-remote-vllm
    image: llamastack/distribution-remote-vllm:test-0.0.52rc3
    volumes:
      - ~/.llama:/root/.llama
      - ~/local/llama-stack/distributions/remote-vllm/run.yaml:/root/llamastack-run-remote-vllm.yaml
    # network_mode: "host"
    environment:
      - LLAMA_INFERENCE_VLLM_URL=${LLAMA_INFERENCE_VLLM_URL:-http://host.docker.internal:5100/v1}
      - LLAMA_INFERENCE_MODEL=${LLAMA_INFERENCE_MODEL:-Llama3.1-8B-Instruct}
      - MAX_TOKENS=${MAX_TOKENS:-4096}
      - SQLITE_STORE_DIR=${SQLITE_STORE_DIR:-$HOME/.llama/distributions/remote-vllm}
      - LLAMA_SAFETY_VLLM_URL=${LLAMA_SAFETY_VLLM_URL:-http://host.docker.internal:5101/v1}
      - LLAMA_SAFETY_MODEL=${LLAMA_SAFETY_MODEL:-Llama-Guard-3-1B}
```
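
The `${VAR:-default}` form in `compose.yaml` is Compose's own interpolation, evaluated against the host environment before the container starts; the default applies when the variable is unset or empty. The sketch below imitates that rule for a single entry so the two placeholder syntaxes are easier to tell apart. The function name `interpolate_compose_default` is hypothetical, and the sketch deliberately ignores other Compose forms and nested references such as `$HOME` inside a default.

```python
import re

# Handles only the ${NAME:-default} form used in the snippet above; nested
# references such as $HOME inside a default are left untouched.
_COMPOSE_DEFAULT = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*):-(?P<default>[^}]*)\}"
)


def interpolate_compose_default(value, host_env):
    """Imitate Compose's ${NAME:-default} rule (hypothetical helper)."""

    def _sub(match):
        current = host_env.get(match.group("name"), "")
        # ":-" falls back to the default when the variable is unset OR empty.
        return current if current else match.group("default")

    return _COMPOSE_DEFAULT.sub(_sub, value)


entry = "MAX_TOKENS=${MAX_TOKENS:-4096}"
print(interpolate_compose_default(entry, {}))                      # MAX_TOKENS=4096
print(interpolate_compose_default(entry, {"MAX_TOKENS": "8192"}))  # MAX_TOKENS=8192
print(interpolate_compose_default(entry, {"MAX_TOKENS": ""}))      # MAX_TOKENS=4096
```

Because every entry in the `environment` list above carries a default, a container started from this file would see these variables set either way, so the defaults written in `run.yaml` mainly matter when the image is started by other means.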
2024-11-13 11:25:58 -08:00
| File | Last commit | Date |
| --- | --- | --- |
| build.yaml | Distributions updates (slight updates to ollama, add inline-vllm and remote-vllm) (#408) | 2024-11-08 18:09:39 -08:00 |
| compose.yaml | add support for ${env.FOO_BAR} placeholders in run.yaml files (#439) | 2024-11-13 11:25:58 -08:00 |
| run.yaml | add support for ${env.FOO_BAR} placeholders in run.yaml files (#439) | 2024-11-13 11:25:58 -08:00 |