llama-stack/distributions
Ashwin Bharambe 96e7ef646f
add support for ${env.FOO_BAR} placeholders in run.yaml files (#439)
# What does this PR do?

We'd like our docker steps to require _ZERO EDITS_ to a YAML file in
order to get going. This is often not possible because, depending on the
provider, we do need some configuration input from the user. Environment
variables are the best way to obtain this information.

This PR allows our run.yaml to contain `${env.FOO_BAR}` placeholders
which can be replaced using `docker run -e FOO_BAR=baz` (or the
equivalent `docker compose` environment settings).
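
The resolution this enables amounts to substituting into the raw YAML text before it is parsed, with the `${env.VAR:default}` form (seen in the snippets below) falling back to a default when the variable is unset. A minimal Python sketch of that kind of substitution, not necessarily the exact implementation in this PR:

```python
import os
import re

# Hypothetical sketch: matches ${env.VAR} and ${env.VAR:default} placeholders.
_ENV_PLACEHOLDER = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*)(?::([^}]*))?\}")

def substitute_env_placeholders(raw_yaml: str) -> str:
    """Replace ${env.VAR[:default]} placeholders with values from os.environ."""
    def _replace(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        value = os.environ.get(name, default)
        if value is None:
            raise ValueError(f"Environment variable '{name}' is not set and has no default")
        return value
    return _ENV_PLACEHOLDER.sub(_replace, raw_yaml)
```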

## Test Plan

For remote-vllm, an example `run.yaml` snippet looks like this:
```yaml
providers:
  inference:
  # serves main inference model
  - provider_id: vllm-0
    provider_type: remote::vllm
    config:
      # NOTE: replace with "localhost" if you are running in "host" network mode
      url: ${env.LLAMA_INFERENCE_VLLM_URL:http://host.docker.internal:5100/v1}
      max_tokens: ${env.MAX_TOKENS:4096}
      api_token: fake
  # serves safety llama_guard model
  - provider_id: vllm-1
    provider_type: remote::vllm
    config:
      # NOTE: replace with "localhost" if you are running in "host" network mode
      url: ${env.LLAMA_SAFETY_VLLM_URL:http://host.docker.internal:5101/v1}
      max_tokens: ${env.MAX_TOKENS:4096}
      api_token: fake
```
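
As a quick sanity check, the defaults can be verified by resolving the snippet with no variables set (this uses the hypothetical `substitute_env_placeholders` helper sketched above and assumes the snippet is saved as `run.yaml`):

```python
import yaml  # pip install pyyaml

with open("run.yaml") as f:
    resolved = substitute_env_placeholders(f.read())

config = yaml.safe_load(resolved)
# With nothing set in the environment, the defaults apply, e.g.
#   url: http://host.docker.internal:5100/v1
#   max_tokens: 4096
print(config["providers"]["inference"][0]["config"])
```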

The corresponding `compose.yaml` snippet looks like this:
```yaml
llamastack:
    depends_on:
    - vllm-0
    - vllm-1
    # image: llamastack/distribution-remote-vllm
    image: llamastack/distribution-remote-vllm:test-0.0.52rc3
    volumes:
      - ~/.llama:/root/.llama
      - ~/local/llama-stack/distributions/remote-vllm/run.yaml:/root/llamastack-run-remote-vllm.yaml
    # network_mode: "host"
    environment:
      - LLAMA_INFERENCE_VLLM_URL=${LLAMA_INFERENCE_VLLM_URL:-http://host.docker.internal:5100/v1}
      - LLAMA_INFERENCE_MODEL=${LLAMA_INFERENCE_MODEL:-Llama3.1-8B-Instruct}
      - MAX_TOKENS=${MAX_TOKENS:-4096}
      - SQLITE_STORE_DIR=${SQLITE_STORE_DIR:-$HOME/.llama/distributions/remote-vllm}
      - LLAMA_SAFETY_VLLM_URL=${LLAMA_SAFETY_VLLM_URL:-http://host.docker.internal:5101/v1}
      - LLAMA_SAFETY_MODEL=${LLAMA_SAFETY_MODEL:-Llama-Guard-3-1B}
```
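
The compose-level `${MAX_TOKENS:-4096}` default and the run.yaml-level `${env.MAX_TOKENS:4096}` default layer cleanly: whatever value ends up in the container environment wins over the run.yaml default. A tiny check using the hypothetical helper sketched earlier:

```python
import os

# Simulate `docker run -e MAX_TOKENS=2048` (or compose's environment: block).
os.environ["MAX_TOKENS"] = "2048"
resolved = substitute_env_placeholders("max_tokens: ${env.MAX_TOKENS:4096}\n")
assert resolved == "max_tokens: 2048\n"
```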
2024-11-13 11:25:58 -08:00
| Directory | Last commit | Date |
| --- | --- | --- |
| bedrock | Rename all inline providers with an inline:: prefix (#423) | 2024-11-11 22:19:16 -08:00 |
| databricks | fix broken --list-templates with adding build.yaml files for packaging (#327) | 2024-10-25 12:51:22 -07:00 |
| dell-tgi | Update provider types and prefix with inline:: | 2024-11-12 12:54:44 -08:00 |
| fireworks | Rename all inline providers with an inline:: prefix (#423) | 2024-11-11 22:19:16 -08:00 |
| hf-endpoint | fix broken --list-templates with adding build.yaml files for packaging (#327) | 2024-10-25 12:51:22 -07:00 |
| hf-serverless | fix broken --list-templates with adding build.yaml files for packaging (#327) | 2024-10-25 12:51:22 -07:00 |
| inline-vllm | Rename all inline providers with an inline:: prefix (#423) | 2024-11-11 22:19:16 -08:00 |
| meta-reference-gpu | Rename all inline providers with an inline:: prefix (#423) | 2024-11-11 22:19:16 -08:00 |
| meta-reference-quantized-gpu | Rename all inline providers with an inline:: prefix (#423) | 2024-11-11 22:19:16 -08:00 |
| ollama | Rename all inline providers with an inline:: prefix (#423) | 2024-11-11 22:19:16 -08:00 |
| ollama-gpu | Rename all inline providers with an inline:: prefix (#423) | 2024-11-11 22:19:16 -08:00 |
| remote-vllm | add support for ${env.FOO_BAR} placeholders in run.yaml files (#439) | 2024-11-13 11:25:58 -08:00 |
| tgi | Rename all inline providers with an inline:: prefix (#423) | 2024-11-11 22:19:16 -08:00 |
| together | Rename all inline providers with an inline:: prefix (#423) | 2024-11-11 22:19:16 -08:00 |